--- JSON-XS/README 2008/09/29 03:09:27 1.28
+++ JSON-XS/README 2009/02/19 01:13:46 1.29
@@ -686,16 +686,19 @@
 
     $json->incr_skip
         This will reset the state of the incremental parser and will remove
-        the parsed text from the input buffer. This is useful after
+        the parsed text from the input buffer so far. This is useful after
         "incr_parse" died, in which case the input buffer and incremental
         parser state is left unchanged, to skip the text parsed so far and
         to reset the parse state.
 
+        The difference to "incr_reset" is that only text until the parse
+        error occurred is removed.
+
     $json->incr_reset
         This completely resets the incremental parser, that is, after this
         call, it will be as if the parser had never parsed anything.
 
-        This is useful if you want ot repeatedly parse JSON objects and want
+        This is useful if you want to repeatedly parse JSON objects and want
         to ignore any trailing data, which means you have to reset the
         parser after each successful decode.
 
@@ -1071,6 +1074,69 @@
     in mail), and works because ASCII is a proper subset of most 8-bit and
     multibyte encodings in use in the world.
 
+  JSON and ECMAscript
+    JSON syntax is based on how literals are represented in javascript (the
+    not-standardised predecessor of ECMAscript) which is presumably why it
+    is called "JavaScript Object Notation".
+
+    However, JSON is not a subset (and also not a superset of course) of
+    ECMAscript (the standard) or javascript (whatever browsers actually
+    implement).
+
+    If you want to use javascript's "eval" function to "parse" JSON, you
+    might run into parse errors for valid JSON texts, or the resulting data
+    structure might not be queryable:
+
+    One of the problems is that U+2028 and U+2029 are valid characters
+    inside JSON strings, but are not allowed in ECMAscript string literals,
+    so the following Perl fragment will not output something that can be
+    guaranteed to be parsable by javascript's "eval":
+
+       use JSON::XS;
+
+       print encode_json [chr 0x2028];
+
+    The right fix for this is to use a proper JSON parser in your javascript
+    programs, and not rely on "eval" (see for example Douglas Crockford's
+    json2.js parser).
+
+    If this is not an option, you can, as a stop-gap measure, simply encode
+    to ASCII-only JSON:
+
+       use JSON::XS;
+
+       print JSON::XS->new->ascii->encode ([chr 0x2028]);
+
+    Note that this will enlarge the resulting JSON text quite a bit if you
+    have many non-ASCII characters. You might be tempted to run some regexes
+    to only escape U+2028 and U+2029, e.g.:
+
+       # DO NOT USE THIS!
+       my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
+       $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
+       $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
+       print $json;
+
+    Note that *this is a bad idea*: the above only works for U+2028 and
+    U+2029 and thus only for fully ECMAscript-compliant parsers. Many
+    existing javascript implementations, however, have issues with other
+    characters as well - using "eval" naively simply *will* cause problems.
+
+    Another problem is that some javascript implementations reserve some
+    property names for their own purposes (which probably makes them
+    non-ECMAscript-compliant). For example, Iceweasel reserves the
+    "__proto__" property name for its own purposes.
+
+    If that is a problem, you could try to filter the resulting JSON
+    output for these property strings, e.g.:
+
+       $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
+
+    This works because "__proto__" is not valid outside of strings, so every
+    occurrence of ""__proto__"\s*:" must be a string used as property name.
+
+    If you know of other incompatibilities, please let me know.
+
   JSON and YAML
     You often hear that JSON is a subset of YAML. This is, however, a mass
     hysteria(*) and very far from the truth (as of the time of this