--- JSON-XS/XS.pm 2009/01/21 05:34:08 1.114
+++ JSON-XS/XS.pm 2009/02/17 23:41:20 1.116
@@ -104,7 +104,7 @@
 no warnings;
 use strict;
 
-our $VERSION = '2.231';
+our $VERSION = '2.232';
 our @ISA = qw(Exporter);
 
 our @EXPORT = qw(encode_json decode_json to_json from_json);
@@ -1187,6 +1187,82 @@
 
 =back
 
+=head2 JSON and ECMAscript
+
+JSON syntax is based on how literals are represented in javascript (the
+not-standardised predecessor of ECMAscript) which is presumably why it is
+called "JavaScript Object Notation".
+
+However, JSON is not a subset (and also not a superset of course) of
+ECMAscript (the standard) or javascript (whatever browsers actually
+implement).
+
+If you want to use javascript's C<eval> function to "parse" JSON, you
+might run into parse errors for valid JSON texts, or the resulting data
+structure might not be queryable:
+
+One of the problems is that U+2028 and U+2029 are valid characters inside
+JSON strings, but are not allowed in ECMAscript string literals, so the
+following Perl fragment will not output something that can be guaranteed
+to be parsable by javascript's C<eval>:
+
+   use JSON::XS;
+
+   print encode_json [chr 0x2028];
+
+The right fix for this is to use a proper JSON parser in your javascript
+programs, and not rely on C<eval>.
+
+If this is not an option, you can, as a stop-gap measure, simply encode to
+ASCII-only JSON:
+
+   use JSON::XS;
+
+   print JSON::XS->new->ascii->encode ([chr 0x2028]);
+
+And if you are concerned about the size of the resulting JSON text, you
+can run some regexes to only escape U+2028 and U+2029:
+
+   use JSON::XS;
+
+   my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
+   $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
+   $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
+   print $json;
+
+This works because U+2028/U+2029 are not allowed outside of strings and
+are not used for syntax, so replacing them unconditionally just works.
+
+Note, however, that fixing the broken JSON parser is better than working
+around it in every other generator. The above regexes should work well in
+other languages, as long as they operate on UTF-8. It is equally valid to
+replace all occurrences of U+2028/2029 directly by their \\u-escaped forms
+in unicode texts, so they can simply be used to fix any parsers relying on
+C<eval> by first applying the regexes on the encoded texts.
+
+Note also that the above only works for U+2028 and U+2029 and thus
+only for fully ECMAscript-compliant parsers. Many existing javascript
+implementations misparse other characters as well. Best rely on a good
+JSON parser, such as Douglas Crockford's F<json2.js>, which escapes the
+above and many more problematic characters properly before passing them
+into C<eval>.
+
+Another problem is that some javascript implementations reserve
+some property names for their own purposes (which probably makes
+them non-ECMAscript-compliant). For example, Iceweasel reserves the
+C<__proto__> property name for its own purposes.
+
+If that is a problem, you could try to filter the resulting JSON
+output for these property strings, e.g.:
+
+   $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
+
+This works because C<__proto__> is not valid outside of strings, so every
+occurrence of C<"__proto__"\s*:> must be a string used as property name.
+
+If you know of other incompatibilities, please let me know.
+
+
 =head2 JSON and YAML
 
 You often hear that JSON is a subset of YAML. This is, however, a mass