--- JSON-XS/XS.pm	2009/02/17 23:29:38	1.115
+++ JSON-XS/XS.pm	2009/02/18 00:08:28	1.117
@@ -1211,7 +1211,8 @@
    print encode_json [chr 0x2028];
 
 The right fix for this is to use a proper JSON parser in your javascript
-programs, and not rely on C.
+programs, and not rely on C (see for example Douglas Crockford's
+F parser).
 
 If this is not an option, you can, as a stop-gap measure, simply encode to
 ASCII-only JSON:
@@ -1220,25 +1221,20 @@
 
    print JSON::XS->new->ascii->encode ([chr 0x2028]);
 
-And if you are concerned about the size of the resulting JSON text, you
-can run some regexes to only escape U+2028 and U+2029:
-
-   use JSON::XS;
+Note that this will enlarge the resulting JSON text quite a bit if you
+have many non-ASCII characters. You might be tempted to run some regexes
+to only escape U+2028 and U+2029, e.g.:
 
+   # DO NOT USE THIS!
    my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
    $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
    $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
    print $json;
 
-This works because U+2028/U+2029 are not allowed outside of strings and
-are not used for syntax, so replacing them unconditionally just works.
-
-Note, however, that fixing the broken JSON parser is better than working
-around it in every other generator. The above regexes should work well in
-other languages, as long as they operate on UTF-8. It is equally valid to
-replace all occurences of U+2028/2029 directly by their \\u-escaped forms
-in unicode texts, so they can simply be used to fix any parsers relying on
-C by first applying the regexes on the encoded texts.
+Note that I: the above only works for U+2028 and
+U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing
+javascript implementations, however, have issues with other characters as
+well - using C naively simply I cause problems.
 
 Another problem is that some javascript implementations reserve
 some property names for their own purposes (which probably makes
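The stop-gap that the patched text keeps (ASCII-only output) can be checked with a short script. This is a minimal sketch, not part of the patch. It assumes the core module JSON::PP as a stand-in for JSON::XS, since both expose the same `new`, `utf8`, `ascii` and `encode` methods. It shows that UTF-8 output passes U+2028 through as the raw bytes E2 80 A8, while ASCII-only output turns it into a `\u2028` escape. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;   # core module; assumed here to stand in for JSON::XS

# U+2028 LINE SEPARATOR is legal inside a JSON string
my $data = [chr 0x2028];

# UTF-8 output: U+2028 is emitted unescaped, as the bytes E2 80 A8
my $utf8_json = JSON::PP->new->utf8->encode($data);

# ASCII-only output: every non-ASCII character becomes a \uXXXX escape
my $ascii_json = JSON::PP->new->ascii->encode($data);

# Dump the UTF-8 result as hex bytes so the raw separator is visible
printf "utf8:  %s\n", join ' ', map { sprintf '%02x', ord } split //, $utf8_json;
print  "ascii: $ascii_json\n";
```

Running it should print `utf8:  5b 22 e2 80 a8 22 5d` and `ascii: ["\u2028"]`. The ASCII-only form contains no raw line terminator for a pre-ES2019 javascript parser to trip over. As the patched text notes, the price is a larger document when the data contains many non-ASCII characters.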