… | |
… | |
102 | package JSON::XS; |
102 | package JSON::XS; |
103 | |
103 | |
104 | no warnings; |
104 | no warnings; |
105 | use strict; |
105 | use strict; |
106 | |
106 | |
107 | our $VERSION = '2.232'; |
107 | our $VERSION = '2.24'; |
108 | our @ISA = qw(Exporter); |
108 | our @ISA = qw(Exporter); |
109 | |
109 | |
110 | our @EXPORT = qw(encode_json decode_json to_json from_json); |
110 | our @EXPORT = qw(encode_json decode_json to_json from_json); |
111 | |
111 | |
112 | sub to_json($) { |
112 | sub to_json($) { |
… | |
… | |
1209 | use JSON::XS; |
1209 | use JSON::XS; |
1210 | |
1210 | |
1211 | print encode_json [chr 0x2028]; |
1211 | print encode_json [chr 0x2028]; |
1212 | |
1212 | |
1213 | The right fix for this is to use a proper JSON parser in your javascript |
1213 | The right fix for this is to use a proper JSON parser in your javascript |
1214 | programs, and not rely on C<eval>. |
1214 | programs, and not rely on C<eval> (see for example Douglas Crockford's |
1215 | F<json2.js> parser). |
1215 | |
1216 | |
1216 | If this is not an option, you can, as a stop-gap measure, simply encode to |
1217 | If this is not an option, you can, as a stop-gap measure, simply encode to |
1217 | ASCII-only JSON: |
1218 | ASCII-only JSON: |
1218 | |
1219 | |
1219 | use JSON::XS; |
1220 | use JSON::XS; |
1220 | |
1221 | |
1221 | print JSON::XS->new->ascii->encode ([chr 0x2028]); |
1222 | print JSON::XS->new->ascii->encode ([chr 0x2028]); |
1222 | |
1223 | |
1223 | And if you are concerned about the size of the resulting JSON text, you |
1224 | Note that this will enlarge the resulting JSON text quite a bit if you |
1224 | can run some regexes to only escape U+2028 and U+2029: |
1225 | have many non-ASCII characters. You might be tempted to run some regexes |
1226 | to only escape U+2028 and U+2029, e.g.: |
1225 | |
1227 | |
1226 | use JSON::XS; |
1228 | # DO NOT USE THIS! |
1227 | |
1228 | my $json = JSON::XS->new->utf8->encode ([chr 0x2028]); |
1229 | my $json = JSON::XS->new->utf8->encode ([chr 0x2028]); |
1229 | $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028 |
1230 | $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028 |
1230 | $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029 |
1231 | $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029 |
1231 | print $json; |
1232 | print $json; |
1232 | |
1233 | |
1233 | This works because U+2028/U+2029 are not allowed outside of strings and |
1234 | Note that I<this is a bad idea>: the above only works for U+2028 and |
1234 | are not used for syntax, so replacing them unconditionally just works. |
1235 | |
1236 | Note, however, that fixing the broken JSON parser is better than working |
1237 | around it in every other generator. The above regexes should work well in |
1238 | other languages, as long as they operate on UTF-8. It is equally valid to |
1239 | replace all occurences of U+2028/2029 directly by their \\u-escaped forms |
1240 | in unicode texts, so they can simply be used to fix any parsers relying on |
1241 | C<eval> by first applying the regexes on the encoded texts. |
1242 | |
1243 | Note also that the above only works for U+2028 and U+2029 and thus |
1244 | only for fully ECMAscript-compliant parsers. Many existing javascript |
1235 | U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing |
1245 | implementations misparse other characters as well. Best rely on a good |
1236 | javascript implementations, however, have issues with other characters as |
1246 | JSON parser, such as Douglas Crockfords F<json2.js>, which escapes the |
1237 | well - using C<eval> naively simply I<will> cause problems. |
1247 | above and many more problematic characters properly before passing them |
1248 | into C<eval>. |
1249 | |
1238 | |
1250 | Another problem is that some javascript implementations reserve |
1239 | Another problem is that some javascript implementations reserve |
1251 | some property names for their own purposes (which probably makes |
1240 | some property names for their own purposes (which probably makes |
1252 | them non-ECMAscript-compliant). For example, Iceweasel reserves the |
1241 | them non-ECMAscript-compliant). For example, Iceweasel reserves the |
1253 | C<__proto__> property name for it's own purposes. |
1242 | C<__proto__> property name for it's own purposes. |