| 1 |
NAME |
| 2 |
JSON::XS - JSON serialising/deserialising, done correctly and fast |
| 3 |
|
| 5 |
(http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html) |
| 6 |
|
| 7 |
SYNOPSIS |
| 8 |
use JSON::XS; |
| 9 |
|
| 10 |
# exported functions, they croak on error |
| 11 |
# and expect/generate UTF-8 |
| 12 |
|
| 13 |
$utf8_encoded_json_text = encode_json $perl_hash_or_arrayref; |
| 14 |
$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text; |
| 15 |
|
| 16 |
# OO-interface |
| 17 |
|
| 18 |
$coder = JSON::XS->new->ascii->pretty->allow_nonref; |
| 19 |
$pretty_printed_unencoded = $coder->encode ($perl_scalar); |
| 20 |
$perl_scalar = $coder->decode ($unicode_json_text); |
| 21 |
|
| 22 |
# Note that JSON version 2.0 and above will automatically use JSON::XS |
| 23 |
# if available, at virtually no speed overhead either, so you should |
| 24 |
# be able to just: |
| 25 |
|
| 26 |
use JSON; |
| 27 |
|
| 28 |
# and do the same things, except that you have a pure-perl fallback now. |
| 29 |
|
| 30 |
DESCRIPTION |
| 31 |
This module converts Perl data structures to JSON and vice versa. Its |
| 32 |
primary goal is to be *correct* and its secondary goal is to be *fast*. |
| 33 |
To reach the latter goal it was written in C. |
| 34 |
|
| 35 |
See MAPPING, below, on how JSON::XS maps perl values to JSON values and |
| 36 |
vice versa. |
| 37 |
|
| 38 |
FEATURES |
| 39 |
* correct Unicode handling |
| 40 |
|
| 41 |
This module knows how to handle Unicode, documents how and when it |
| 42 |
does so, and even documents what "correct" means. |
| 43 |
|
| 44 |
* round-trip integrity |
| 45 |
|
| 46 |
When you serialise a perl data structure using only data types |
| 47 |
supported by JSON and Perl, the deserialised data structure is |
| 48 |
identical on the Perl level. (e.g. the string "2.0" doesn't suddenly |
| 49 |
become "2" just because it looks like a number). There *are* minor |
| 50 |
exceptions to this, read the MAPPING section below to learn about |
| 51 |
those. |
| 52 |
|
| 53 |
* strict checking of JSON correctness |
| 54 |
|
| 55 |
There is no guessing, no generating of illegal JSON texts by |
| 56 |
default, and only JSON is accepted as input by default (the latter |
| 57 |
is a security feature). |
| 58 |
|
| 59 |
* fast |
| 60 |
|
| 61 |
Compared to other JSON modules and other serialisers such as |
| 62 |
Storable, this module usually compares favourably in terms of speed, |
| 63 |
too. |
| 64 |
|
| 65 |
* simple to use |
| 66 |
|
| 67 |
This module has both a simple functional interface as well as an |
| 68 |
object oriented interface. |
| 69 |
|
| 70 |
* reasonably versatile output formats |
| 71 |
|
| 72 |
You can choose between the most compact guaranteed-single-line |
| 73 |
format possible (nice for simple line-based protocols), a pure-ASCII |
| 74 |
format (for when your transport is not 8-bit clean, still supports |
| 75 |
the whole Unicode range), or a pretty-printed format (for when you |
| 76 |
want to read that stuff). Or you can combine those features in |
| 77 |
whatever way you like. |
| 78 |
|
| 79 |
FUNCTIONAL INTERFACE |
| 80 |
The following convenience functions are provided by this module. They are |
| 81 |
exported by default: |
| 82 |
|
| 83 |
$json_text = encode_json $perl_scalar |
| 84 |
Converts the given Perl data structure to a UTF-8 encoded, binary |
| 85 |
string (that is, the string contains octets only). Croaks on error. |
| 86 |
|
| 87 |
This function call is functionally identical to: |
| 88 |
|
| 89 |
$json_text = JSON::XS->new->utf8->encode ($perl_scalar) |
| 90 |
|
| 91 |
Except being faster. |
| 92 |
|
| 93 |
$perl_scalar = decode_json $json_text |
| 94 |
The opposite of "encode_json": expects a UTF-8 (binary) string and |
| 95 |
tries to parse that as a UTF-8 encoded JSON text, returning the |
| 96 |
resulting reference. Croaks on error. |
| 97 |
|
| 98 |
This function call is functionally identical to: |
| 99 |
|
| 100 |
$perl_scalar = JSON::XS->new->utf8->decode ($json_text) |
| 101 |
|
| 102 |
Except being faster. |
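A minimal round-trip sketch of the two exported functions. The sample data and variable names are illustrative assumptions, not part of the module:

```perl
use strict;
use warnings;
use JSON::XS;

# Illustrative data; any hashref or arrayref works.
my $data = { name => "caf\x{e9}", tags => ["a", "b"] };

# encode_json returns UTF-8 encoded octets, ready for binary I/O.
my $octets = encode_json $data;

# decode_json expects UTF-8 octets and returns the Perl structure.
my $copy = decode_json $octets;

# Both functions croak on error, so wrap untrusted input in eval.
my $ok = eval { decode_json '{"broken":'; 1 };
print "decode failed: $@" unless $ok;
```

Wrapping the call in eval is one common way to handle the croak; other exception-handling styles work just as well.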
| 103 |
|
| 104 |
A FEW NOTES ON UNICODE AND PERL |
| 105 |
Since this often leads to confusion, here are a few very clear words on |
| 106 |
how Unicode works in Perl, modulo bugs. |
| 107 |
|
| 108 |
1. Perl strings can store characters with ordinal values > 255. |
| 109 |
This enables you to store Unicode characters as single characters in |
| 110 |
a Perl string - very natural. |
| 111 |
|
| 112 |
2. Perl does *not* associate an encoding with your strings. |
| 113 |
... until you force it to, e.g. when matching it against a regex, or |
| 114 |
printing the scalar to a file, in which case Perl either interprets |
| 115 |
your string as locale-encoded text, octets/binary, or as Unicode, |
| 116 |
depending on various settings. In no case is an encoding stored |
| 117 |
together with your data, it is *use* that decides encoding, not any |
| 118 |
magical meta data. |
| 119 |
|
| 120 |
3. The internal utf-8 flag has no meaning with regards to the encoding |
| 121 |
of your string. |
| 122 |
Just ignore that flag unless you debug a Perl bug, a module written |
| 123 |
in XS or want to dive into the internals of perl. Otherwise it will |
| 124 |
only confuse you, as, despite the name, it says nothing about how |
| 125 |
your string is encoded. You can have Unicode strings with that flag |
| 126 |
set, with that flag clear, and you can have binary data with that |
| 127 |
flag set and that flag clear. Other possibilities exist, too. |
| 128 |
|
| 129 |
If you didn't know about that flag, so much the better: pretend it |
| 130 |
doesn't exist. |
| 131 |
|
| 132 |
4. A "Unicode String" is simply a string where each character can be |
| 133 |
validly interpreted as a Unicode code point. |
| 134 |
If you have UTF-8 encoded data, it is no longer a Unicode string, |
| 135 |
but a Unicode string encoded in UTF-8, giving you a binary string. |
| 136 |
|
| 137 |
5. A string containing "high" (> 255) character values is *not* a UTF-8 |
| 138 |
string. |
| 139 |
It's a fact. Learn to live with it. |
| 140 |
|
| 141 |
I hope this helps :) |
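The difference between a Unicode string and its UTF-8 encoded form can be sketched as follows. This assumes the core Encode module; the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode ();
use JSON::XS;

# One character with an ordinal value > 255 (U+263A).
my $chars = "\x{263A}";

# Encoding it yields a binary string: three octets, each <= 255.
my $octets = Encode::encode("UTF-8", $chars);
printf "%d character, %d octets\n", length $chars, length $octets;

# Without ->utf8, encode returns a Unicode (character) string ...
my $json_chars = JSON::XS->new->encode([$chars]);

# ... with ->utf8 it returns the UTF-8 encoded octets of the same text.
my $json_octets = JSON::XS->new->utf8->encode([$chars]);
```

Here $json_chars is five characters long, while $json_octets holds the same JSON text as seven octets.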
| 142 |
|
| 143 |
OBJECT-ORIENTED INTERFACE |
| 144 |
The object oriented interface lets you configure your own encoding or |
| 145 |
decoding style, within the limits of supported formats. |
| 146 |
|
| 147 |
$json = new JSON::XS |
| 148 |
Creates a new JSON::XS object that can be used to de/encode JSON |
| 149 |
strings. All boolean flags described below are by default *disabled* |
| 150 |
(with the exception of "allow_nonref", which defaults to *enabled* |
| 151 |
since version 4.0). |
| 152 |
|
| 153 |
The mutators for flags all return the JSON object again and thus |
| 154 |
calls can be chained: |
| 155 |
|
| 156 |
my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]}) |
| 157 |
=> {"a": [1, 2]} |
| 158 |
|
| 159 |
$json = $json->ascii ([$enable]) |
| 160 |
$enabled = $json->get_ascii |
| 161 |
If $enable is true (or missing), then the "encode" method will not |
| 162 |
generate characters outside the code range 0..127 (which is ASCII). |
| 163 |
Any Unicode characters outside that range will be escaped using |
| 164 |
either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLL |
| 165 |
escape sequence, as per RFC4627. The resulting encoded JSON text can |
| 166 |
be treated as a native Unicode string, an ascii-encoded, |
| 167 |
latin1-encoded or UTF-8 encoded string, or any other superset of |
| 168 |
ASCII. |
| 169 |
|
| 170 |
If $enable is false, then the "encode" method will not escape |
| 171 |
Unicode characters unless required by the JSON syntax or other |
| 172 |
flags. This results in a faster and more compact format. |
| 173 |
|
| 174 |
See also the section *ENCODING/CODESET FLAG NOTES* later in this |
| 175 |
document. |
| 176 |
|
| 177 |
The main use for this flag is to produce JSON texts that can be |
| 178 |
transmitted over a 7-bit channel, as the encoded JSON texts will not |
| 179 |
contain any 8 bit characters. |
| 180 |
|
| 181 |
JSON::XS->new->ascii (1)->encode ([chr 0x10401]) |
| 182 |
=> ["\ud801\udc01"] |
| 183 |
|
| 184 |
$json = $json->latin1 ([$enable]) |
| 185 |
$enabled = $json->get_latin1 |
| 186 |
If $enable is true (or missing), then the "encode" method will |
| 187 |
encode the resulting JSON text as latin1 (or iso-8859-1), escaping |
| 188 |
any characters outside the code range 0..255. The resulting string |
| 189 |
can be treated as a latin1-encoded JSON text or a native Unicode |
| 190 |
string. The "decode" method will not be affected in any way by this |
| 191 |
flag, as "decode" by default expects Unicode, which is a strict |
| 192 |
superset of latin1. |
| 193 |
|
| 194 |
If $enable is false, then the "encode" method will not escape |
| 195 |
Unicode characters unless required by the JSON syntax or other |
| 196 |
flags. |
| 197 |
|
| 198 |
See also the section *ENCODING/CODESET FLAG NOTES* later in this |
| 199 |
document. |
| 200 |
|
| 201 |
The main use for this flag is efficiently encoding binary data as |
| 202 |
JSON text, as most octets will not be escaped, resulting in a |
| 203 |
smaller encoded size. The disadvantage is that the resulting JSON |
| 204 |
text is encoded in latin1 (and must correctly be treated as such |
| 205 |
when storing and transferring), a rare encoding for JSON. It is |
| 206 |
therefore most useful when you want to store data structures known |
| 207 |
to contain binary data efficiently in files or databases, not when |
| 208 |
talking to other JSON encoders/decoders. |
| 209 |
|
| 210 |
JSON::XS->new->latin1->encode (["\x{89}\x{abc}"]) |
| 211 |
=> ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not) |
| 212 |
|
| 213 |
$json = $json->utf8 ([$enable]) |
| 214 |
$enabled = $json->get_utf8 |
| 215 |
If $enable is true (or missing), then the "encode" method will |
| 216 |
encode the JSON result into UTF-8, as required by many protocols, |
| 217 |
while the "decode" method expects to be handed a UTF-8-encoded |
| 218 |
string. Please note that UTF-8-encoded strings do not contain any |
| 219 |
characters outside the range 0..255, they are thus useful for |
| 220 |
bytewise/binary I/O. In future versions, enabling this option might |
| 221 |
enable autodetection of the UTF-16 and UTF-32 encoding families, as |
| 222 |
described in RFC4627. |
| 223 |
|
| 224 |
If $enable is false, then the "encode" method will return the JSON |
| 225 |
string as a (non-encoded) Unicode string, while "decode" expects |
| 226 |
a Unicode string as input. Any decoding or encoding (e.g. to UTF-8 or |
| 227 |
UTF-16) needs to be done yourself, e.g. using the Encode module. |
| 228 |
|
| 229 |
See also the section *ENCODING/CODESET FLAG NOTES* later in this |
| 230 |
document. |
| 231 |
|
| 232 |
Example, output UTF-16BE-encoded JSON: |
| 233 |
|
| 234 |
use Encode; |
| 235 |
$jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object); |
| 236 |
|
| 237 |
Example, decode UTF-32LE-encoded JSON: |
| 238 |
|
| 239 |
use Encode; |
| 240 |
$object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext); |
| 241 |
|
| 242 |
$json = $json->pretty ([$enable]) |
| 243 |
This enables (or disables) all of the "indent", "space_before" and |
| 244 |
"space_after" (and in the future possibly more) flags in one call to |
| 245 |
generate the most readable (or most compact) form possible. |
| 246 |
|
| 247 |
Example, pretty-print some simple structure: |
| 248 |
|
| 249 |
my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]}) |
| 250 |
=> |
| 251 |
{ |
| 252 |
"a" : [ |
| 253 |
1, |
| 254 |
2 |
| 255 |
] |
| 256 |
} |
| 257 |
|
| 258 |
$json = $json->indent ([$enable]) |
| 259 |
$enabled = $json->get_indent |
| 260 |
If $enable is true (or missing), then the "encode" method will use a |
| 261 |
multiline format as output, putting every array member or |
| 262 |
object/hash key-value pair into its own line, indenting them |
| 263 |
properly. |
| 264 |
|
| 265 |
If $enable is false, no newlines or indenting will be produced, and |
| 266 |
the resulting JSON text is guaranteed not to contain any "newlines". |
| 267 |
|
| 268 |
This setting has no effect when decoding JSON texts. |
| 269 |
|
| 270 |
$json = $json->space_before ([$enable]) |
| 271 |
$enabled = $json->get_space_before |
| 272 |
If $enable is true (or missing), then the "encode" method will add |
| 273 |
an extra optional space before the ":" separating keys from values |
| 274 |
in JSON objects. |
| 275 |
|
| 276 |
If $enable is false, then the "encode" method will not add any extra |
| 277 |
space at those places. |
| 278 |
|
| 279 |
This setting has no effect when decoding JSON texts. You will also |
| 280 |
most likely combine this setting with "space_after". |
| 281 |
|
| 282 |
Example, space_before enabled, space_after and indent disabled: |
| 283 |
|
| 284 |
{"key" :"value"} |
| 285 |
|
| 286 |
$json = $json->space_after ([$enable]) |
| 287 |
$enabled = $json->get_space_after |
| 288 |
If $enable is true (or missing), then the "encode" method will add |
| 289 |
an extra optional space after the ":" separating keys from values in |
| 290 |
JSON objects and extra whitespace after the "," separating key-value |
| 291 |
pairs and array members. |
| 292 |
|
| 293 |
If $enable is false, then the "encode" method will not add any extra |
| 294 |
space at those places. |
| 295 |
|
| 296 |
This setting has no effect when decoding JSON texts. |
| 297 |
|
| 298 |
Example, space_before and indent disabled, space_after enabled: |
| 299 |
|
| 300 |
{"key": "value"} |
| 301 |
|
| 302 |
$json = $json->relaxed ([$enable]) |
| 303 |
$enabled = $json->get_relaxed |
| 304 |
If $enable is true (or missing), then "decode" will accept some |
| 305 |
extensions to normal JSON syntax (see below). "encode" will not be |
| 306 |
affected in any way. *Be aware that this option makes you accept |
| 307 |
invalid JSON texts as if they were valid!*. I suggest you only use |
| 308 |
this option to parse application-specific files written by humans |
| 309 |
(configuration files, resource files etc.) |
| 310 |
|
| 311 |
If $enable is false (the default), then "decode" will only accept |
| 312 |
valid JSON texts. |
| 313 |
|
| 314 |
Currently accepted extensions are: |
| 315 |
|
| 316 |
* list items can have an end-comma |
| 317 |
|
| 318 |
JSON *separates* array elements and key-value pairs with commas. |
| 319 |
This can be annoying if you write JSON texts manually and want |
| 320 |
to be able to quickly append elements, so this extension accepts |
| 321 |
a comma at the end of such items, not just between them: |
| 322 |
|
| 323 |
[ |
| 324 |
1, |
| 325 |
2, <- this comma not normally allowed |
| 326 |
] |
| 327 |
{ |
| 328 |
"k1": "v1", |
| 329 |
"k2": "v2", <- this comma not normally allowed |
| 330 |
} |
| 331 |
|
| 332 |
* shell-style '#'-comments |
| 333 |
|
| 334 |
Whenever JSON allows whitespace, shell-style comments are |
| 335 |
additionally allowed. They are terminated by the first |
| 336 |
carriage-return or line-feed character, after which more |
| 337 |
white-space and comments are allowed. |
| 338 |
|
| 339 |
[ |
| 340 |
1, # this comment not allowed in JSON |
| 341 |
# neither this one... |
| 342 |
] |
| 343 |
|
| 344 |
* literal ASCII TAB characters in strings |
| 345 |
|
| 346 |
Literal ASCII TAB characters are now allowed in strings (and |
| 347 |
treated as "\t"). |
| 348 |
|
| 349 |
[ |
| 350 |
"Hello\tWorld", |
| 351 |
"Hello<TAB>World", # literal <TAB> would not normally be allowed |
| 352 |
] |
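The relaxed extensions can be tried out with a short sketch like this one. The "configuration" text is a made-up example of a hand-written file:

```perl
use strict;
use warnings;
use JSON::XS;

# Hand-written text using '#'-comments and a trailing comma.
my $config_text = <<'END';
[
   1, # a comment
   2, # a trailing comma follows
]
END

# With relaxed enabled, decode accepts the extensions ...
my $list = JSON::XS->new->relaxed->decode($config_text);

# ... while the default (strict) decoder croaks on the same text.
my $strict_ok = eval { JSON::XS->new->decode($config_text); 1 };
print "strict decoder rejected the text\n" unless $strict_ok;
```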
| 353 |
|
| 354 |
$json = $json->canonical ([$enable]) |
| 355 |
$enabled = $json->get_canonical |
| 356 |
If $enable is true (or missing), then the "encode" method will |
| 357 |
output JSON objects by sorting their keys. This adds a |
| 358 |
comparatively high overhead. |
| 359 |
|
| 360 |
If $enable is false, then the "encode" method will output key-value |
| 361 |
pairs in the order Perl stores them (which will likely change |
| 362 |
between runs of the same script, and can change even within the same |
| 363 |
run from 5.18 onwards). |
| 364 |
|
| 365 |
This option is useful if you want the same data structure to be |
| 366 |
encoded as the same JSON text (given the same overall settings). If |
| 367 |
it is disabled, the same hash might be encoded differently even if |
| 368 |
it contains the same data, as key-value pairs have no inherent ordering |
| 369 |
in Perl. |
| 370 |
|
| 371 |
This setting has no effect when decoding JSON texts. |
| 372 |
|
| 373 |
This setting currently has no effect on tied hashes. |
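A small sketch of what "canonical" buys you; the hash is arbitrary sample data:

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { b => 2, c => 3, a => 1 };

# With canonical enabled, keys are emitted in sorted order, so the
# same data always produces the same JSON text.
my $text = JSON::XS->new->canonical->encode($data);

print "$text\n";   # {"a":1,"b":2,"c":3}
```

This is handy when the JSON text is compared, hashed or stored in version control, at the cost of the sorting overhead mentioned above.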
| 374 |
|
| 375 |
$json = $json->allow_nonref ([$enable]) |
| 376 |
$enabled = $json->get_allow_nonref |
| 377 |
Unlike other boolean options, this option is enabled by default |
| 378 |
beginning with version 4.0. See "SECURITY CONSIDERATIONS" for the |
| 379 |
gory details. |
| 380 |
|
| 381 |
If $enable is true (or missing), then the "encode" method can |
| 382 |
convert a non-reference into its corresponding string, number or |
| 383 |
null JSON value, which is an extension to RFC4627. Likewise, |
| 384 |
"decode" will accept those JSON values instead of croaking. |
| 385 |
|
| 386 |
If $enable is false, then the "encode" method will croak if it isn't |
| 387 |
passed an arrayref or hashref, as JSON texts must either be an |
| 388 |
object or array. Likewise, "decode" will croak if given something |
| 389 |
that is not a JSON object or array. |
| 390 |
|
| 391 |
Example, encode a Perl scalar as a JSON value after disabling |
| 392 |
"allow_nonref", resulting in an error: |
| 393 |
|
| 394 |
JSON::XS->new->allow_nonref (0)->encode ("Hello, World!") |
| 395 |
=> hash- or arrayref expected... |
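Both settings can be put next to each other in a short sketch. The flag is set explicitly here, so the example does not depend on the version-specific default:

```perl
use strict;
use warnings;
use JSON::XS;

# Enabled: plain scalars are valid top-level values.
my $text  = JSON::XS->new->allow_nonref(1)->encode("Hello, World!");
my $value = JSON::XS->new->allow_nonref(1)->decode('42');

# Disabled: the same input makes decode croak.
my $ok = eval { JSON::XS->new->allow_nonref(0)->decode('42'); 1 };

print "$text\n";                            # "Hello, World!" (with the quotes)
print "strict decoder croaked\n" unless $ok;
```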
| 396 |
|
| 397 |
$json = $json->allow_unknown ([$enable]) |
| 398 |
$enabled = $json->get_allow_unknown |
| 399 |
If $enable is true (or missing), then "encode" will *not* throw an |
| 400 |
exception when it encounters values it cannot represent in JSON (for |
| 401 |
example, filehandles) but instead will encode a JSON "null" value. |
| 402 |
Note that blessed objects are not included here and are handled |
| 403 |
separately by "allow_blessed". |
| 404 |
|
| 405 |
If $enable is false (the default), then "encode" will throw an |
| 406 |
exception when it encounters anything it cannot encode as JSON. |
| 407 |
|
| 408 |
This option does not affect "decode" in any way, and it is |
| 409 |
recommended to leave it off unless you know your communications |
| 410 |
partner. |
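A sketch of the effect, using a code reference as the value that JSON cannot represent (a filehandle would serve equally well):

```perl
use strict;
use warnings;
use JSON::XS;

# A code reference has no JSON representation.
my $data = [ 1, sub { "not representable" } ];

# With allow_unknown, the offending value is encoded as null.
my $text = JSON::XS->new->allow_unknown->encode($data);

# Without it (the default), encode croaks.
my $ok = eval { JSON::XS->new->encode($data); 1 };

print "$text\n";   # [1,null]
```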
| 411 |
|
| 412 |
$json = $json->allow_blessed ([$enable]) |
| 413 |
$enabled = $json->get_allow_blessed |
| 414 |
See "OBJECT SERIALISATION" for details. |
| 415 |
|
| 416 |
If $enable is true (or missing), then the "encode" method will not |
| 417 |
barf when it encounters a blessed reference that it cannot convert |
| 418 |
otherwise. Instead, a JSON "null" value is encoded instead of the |
| 419 |
object. |
| 420 |
|
| 421 |
If $enable is false (the default), then "encode" will throw an |
| 422 |
exception when it encounters a blessed object that it cannot convert |
| 423 |
otherwise. |
| 424 |
|
| 425 |
This setting has no effect on "decode". |
| 426 |
|
| 427 |
$json = $json->convert_blessed ([$enable]) |
| 428 |
$enabled = $json->get_convert_blessed |
| 429 |
See "OBJECT SERIALISATION" for details. |
| 430 |
|
| 431 |
If $enable is true (or missing), then "encode", upon encountering a |
| 432 |
blessed object, will check for the availability of the "TO_JSON" |
| 433 |
method on the object's class. If found, it will be called in scalar |
| 434 |
context and the resulting scalar will be encoded instead of the |
| 435 |
object. |
| 436 |
|
| 437 |
The "TO_JSON" method may safely call die if it wants. If "TO_JSON" |
| 438 |
returns other blessed objects, those will be handled in the same |
| 439 |
way. "TO_JSON" must take care of not causing an endless recursion |
| 440 |
cycle (== crash) in this case. The name of "TO_JSON" was chosen |
| 441 |
because other methods called by the Perl core (== not by the user of |
| 442 |
the object) are usually in upper case letters and to avoid |
| 443 |
collisions with any "to_json" function or method. |
| 444 |
|
| 445 |
If $enable is false (the default), then "encode" will not consider |
| 446 |
this type of conversion. |
| 447 |
|
| 448 |
This setting has no effect on "decode". |
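The interplay of "allow_blessed" and "convert_blessed" can be sketched with a tiny class. "Point" and its fields are hypothetical and exist only for this example:

```perl
use strict;
use warnings;
use JSON::XS;

{
    package Point;   # hypothetical example class

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Called by encode when convert_blessed is enabled.
    sub TO_JSON {
        my ($self) = @_;
        return { x => $self->{x}, y => $self->{y} };
    }
}

my $p = Point->new(x => 1, y => 2);

# convert_blessed: TO_JSON is called and its result is encoded.
my $converted = JSON::XS->new->canonical->convert_blessed->encode([$p]);

# allow_blessed alone: the object is replaced by null.
my $nulled = JSON::XS->new->allow_blessed->encode([$p]);

# Neither flag: encode croaks on the blessed reference.
my $ok = eval { JSON::XS->new->encode([$p]); 1 };

print "$converted\n$nulled\n";   # [{"x":1,"y":2}] and [null]
```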
| 449 |
|
| 450 |
$json = $json->allow_tags ([$enable]) |
| 451 |
$enabled = $json->get_allow_tags |
| 452 |
See "OBJECT SERIALISATION" for details. |
| 453 |
|
| 454 |
If $enable is true (or missing), then "encode", upon encountering a |
| 455 |
blessed object, will check for the availability of the "FREEZE" |
| 456 |
method on the object's class. If found, it will be used to serialise |
| 457 |
the object into a nonstandard tagged JSON value (that JSON decoders |
| 458 |
cannot decode). |
| 459 |
|
| 460 |
It also causes "decode" to parse such tagged JSON values and |
| 461 |
deserialise them via a call to the "THAW" method. |
| 462 |
|
| 463 |
If $enable is false (the default), then "encode" will not consider |
| 464 |
this type of conversion, and tagged JSON values will cause a parse |
| 465 |
error in "decode", as if tags were not part of the grammar. |
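A round-trip sketch for tagged values. It assumes the calling conventions described under "OBJECT SERIALISATION" ("FREEZE" receives the object and the serialiser name, "THAW" receives the class, the serialiser name and the frozen values); "My::Object" and its fields are hypothetical:

```perl
use strict;
use warnings;
use JSON::XS;

{
    package My::Object;   # hypothetical example class

    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Return the values needed to rebuild the object later.
    sub FREEZE {
        my ($self, $serialiser) = @_;
        return ($self->{type}, $self->{id});
    }

    # Rebuild an object from the values FREEZE returned.
    sub THAW {
        my ($class, $serialiser, $type, $id) = @_;
        return $class->new(type => $type, id => $id);
    }
}

my $json = JSON::XS->new->allow_tags;

# The resulting text is not standard JSON.
my $text = $json->encode([ My::Object->new(type => "widget", id => 5) ]);

# Decoding with allow_tags calls THAW and yields a new object.
my $copy = $json->decode($text)->[0];
```

Only decode tagged values from sources you trust, as "THAW" is called on whatever class the text names.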
| 466 |
|
| 467 |
$json->boolean_values ([$false, $true]) |
| 468 |
($false, $true) = $json->get_boolean_values |
| 469 |
By default, JSON booleans will be decoded as overloaded |
| 470 |
$Types::Serialiser::false and $Types::Serialiser::true objects. |
| 471 |
|
| 472 |
With this method you can specify your own boolean values for |
| 473 |
decoding - on decode, JSON "false" will be decoded as a copy of |
| 474 |
$false, and JSON "true" will be decoded as $true ("copy" here is the |
| 475 |
same thing as assigning a value to another variable, i.e. "$copy = |
| 476 |
$false"). |
| 477 |
|
| 478 |
Calling this method without any arguments will reset the booleans to |
| 479 |
their default values. |
| 480 |
|
| 481 |
"get_boolean_values" will return both $false and $true values, or |
| 482 |
the empty list when they are set to the default. |
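A sketch of replacing the boolean objects with plain numbers. It assumes a JSON::XS version that provides "boolean_values"; the choice of 0 and 1 is just one possibility:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Default: booleans decode to Types::Serialiser objects.
my $default = $json->decode('[true,false]');

# Decode JSON false as 0 and JSON true as 1 instead.
$json->boolean_values(0, 1);
my $plain = $json->decode('[true,false]');

# No arguments: back to the default objects.
$json->boolean_values;
my @current = $json->get_boolean_values;   # empty list again
```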
| 483 |
|
| 484 |
$json = $json->filter_json_object ([$coderef->($hashref)]) |
| 485 |
When $coderef is specified, it will be called from "decode" each |
| 486 |
time it decodes a JSON object. The only argument is a reference to |
| 487 |
the newly-created hash. If the code reference returns a single |
| 488 |
scalar (which need not be a reference), this value (or rather a copy |
| 489 |
of it) is inserted into the deserialised data structure. If it |
| 490 |
returns an empty list (NOTE: *not* "undef", which is a valid |
| 491 |
scalar), the original deserialised hash will be inserted. This |
| 492 |
setting can slow down decoding considerably. |
| 493 |
|
| 494 |
When $coderef is omitted or undefined, any existing callback will be |
| 495 |
removed and "decode" will not change the deserialised hash in any |
| 496 |
way. |
| 497 |
|
| 498 |
Example, convert all JSON objects into the integer 5: |
| 499 |
|
| 500 |
my $js = JSON::XS->new->filter_json_object (sub { 5 }); |
| 501 |
# returns [5] |
| 502 |
$js->decode ('[{}]') |
| 503 |
# returns 5, as allow_nonref is enabled by default; with |
| 504 |
# allow_nonref disabled, a lone 5 would throw an exception. |
| 505 |
$js->decode ('{"a":1, "b":2}'); |
| 506 |
|
| 507 |
$json = $json->filter_json_single_key_object ($key [=> |
| 508 |
$coderef->($value)]) |
| 509 |
Works somewhat similarly to "filter_json_object", but is only called |
| 510 |
for JSON objects having a single key named $key. |
| 511 |
|
| 512 |
This $coderef is called before the one specified via |
| 513 |
"filter_json_object", if any. It gets passed the single value in the |
| 514 |
JSON object. If it returns a single value, it will be inserted into |
| 515 |
the data structure. If it returns nothing (not even "undef" but the |
| 516 |
empty list), the callback from "filter_json_object" will be called |
| 517 |
next, as if no single-key callback were specified. |
| 518 |
|
| 519 |
If $coderef is omitted or undefined, the corresponding callback will |
| 520 |
be disabled. There can only ever be one callback for a given key. |
| 521 |
|
| 522 |
As this callback gets called less often than the |
| 523 |
"filter_json_object" one, decoding speed will not usually suffer as |
| 524 |
much. Therefore, single-key objects make excellent targets to |
| 525 |
serialise Perl objects into, especially as single-key JSON objects |
| 526 |
are as close to the type-tagged value concept as JSON gets (it's |
| 527 |
basically an ID/VALUE tuple). Of course, JSON does not support this |
| 528 |
in any way, so you need to make sure your data never looks like a |
| 529 |
serialised Perl hash. |
| 530 |
|
| 531 |
Typical names for the single object key are "__class_whatever__", or |
| 532 |
"$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or even |
| 533 |
things like "__class_md5sum(classname)__", to reduce the risk of |
| 534 |
clashing with real hashes. |
| 535 |
|
| 536 |
Example, decode JSON objects of the form "{ "__widget__" => <id> }" |
| 537 |
into the corresponding $WIDGET{<id>} object: |
| 538 |
|
| 539 |
# return whatever is in $WIDGET{5}: |
| 540 |
JSON::XS |
| 541 |
->new |
| 542 |
->filter_json_single_key_object (__widget__ => sub { |
| 543 |
$WIDGET{ $_[0] } |
| 544 |
}) |
| 545 |
->decode ('{"__widget__": 5}') |
| 546 |
|
| 547 |
# this can be used with a TO_JSON method in some "widget" class |
| 548 |
# for serialisation to json: |
| 549 |
sub WidgetBase::TO_JSON { |
| 550 |
my ($self) = @_; |
| 551 |
|
| 552 |
unless ($self->{id}) { |
| 553 |
$self->{id} = ..get..some..id..; |
| 554 |
$WIDGET{$self->{id}} = $self; |
| 555 |
} |
| 556 |
|
| 557 |
{ __widget__ => $self->{id} } |
| 558 |
} |
| 559 |
|
| 560 |
$json = $json->shrink ([$enable]) |
| 561 |
$enabled = $json->get_shrink |
| 562 |
Perl usually over-allocates memory a bit when allocating space for |
| 563 |
strings. This flag optionally resizes strings generated by either |
| 564 |
"encode" or "decode" to their minimum size possible. This can save |
| 565 |
memory when your JSON texts are either very very long or you have |
| 566 |
many short strings. It will also try to downgrade any strings to |
| 567 |
octet-form if possible: perl stores strings internally either in an |
| 568 |
encoding called UTF-X or in octet-form. The latter cannot store |
| 569 |
everything but uses less space in general (and some buggy Perl or C |
| 570 |
code might even rely on that internal representation being used). |
| 571 |
|
| 572 |
The actual definition of what shrink does might change in future |
| 573 |
versions, but it will always try to save space at the expense of |
| 574 |
time. |
| 575 |
|
| 576 |
If $enable is true (or missing), the string returned by "encode" |
| 577 |
will be shrunk-to-fit, while all strings generated by "decode" will |
| 578 |
also be shrunk-to-fit. |
| 579 |
|
| 580 |
If $enable is false, then the normal perl allocation algorithms are |
| 581 |
used. If you work with your data, then this is likely to be faster. |
| 582 |
|
| 583 |
In the future, this setting might control other things, such as |
| 584 |
converting strings that look like integers or floats into integers |
| 585 |
or floats internally (there is no difference on the Perl level), |
| 586 |
saving space. |
| 587 |
|
| 588 |
$json = $json->max_depth ([$maximum_nesting_depth]) |
| 589 |
$max_depth = $json->get_max_depth |
| 590 |
Sets the maximum nesting level (default 512) accepted while encoding |
| 591 |
or decoding. If a higher nesting level is detected in JSON text or a |
| 592 |
Perl data structure, then the encoder and decoder will stop and |
| 593 |
croak at that point. |
| 594 |
|
| 595 |
Nesting level is defined by number of hash- or arrayrefs that the |
| 596 |
encoder needs to traverse to reach a given point or the number of |
| 597 |
"{" or "[" characters without their matching closing parenthesis |
| 598 |
crossed to reach a given character in a string. |
| 599 |
|
| 600 |
Setting the maximum depth to one disallows any nesting, so that |
| 601 |
ensures that the object is only a single hash/object or array. |
| 602 |
|
| 603 |
If no argument is given, the highest possible setting will be used, |
| 604 |
which is rarely useful. |
| 605 |
|
| 606 |
Note that nesting is implemented by recursion in C. The default |
| 607 |
value has been chosen to be as large as typical operating systems |
| 608 |
allow without crashing. |
| 609 |
|
| 610 |
See SECURITY CONSIDERATIONS, below, for more info on why this is |
| 611 |
useful. |
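A sketch of the nesting limit in action, using the smallest useful setting:

```perl
use strict;
use warnings;
use JSON::XS;

# A depth of 1 allows a single array or object without any nesting.
my $json = JSON::XS->new->max_depth(1);

my $flat = $json->decode('[1,2,3]');               # fine

# One more level of nesting makes both decode and encode croak.
my $dec_ok = eval { $json->decode('[[1]]'); 1 };
my $enc_ok = eval { $json->encode([[1]]);   1 };

print "nested input rejected\n" unless $dec_ok || $enc_ok;
```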
| 612 |
|
| 613 |
$json = $json->max_size ([$maximum_string_size]) |
| 614 |
$max_size = $json->get_max_size |
| 615 |
Set the maximum length a JSON text may have (in bytes) where |
| 616 |
decoding is being attempted. The default is 0, meaning no limit. |
| 617 |
When "decode" is called on a string that is longer than this many |
| 618 |
bytes, it will not attempt to decode the string but throw an |
| 619 |
exception. This setting has no effect on "encode" (yet). |
| 620 |
|
| 621 |
If no argument is given, the limit check will be deactivated (same |
| 622 |
as when 0 is specified). |
| 623 |
|
| 624 |
See SECURITY CONSIDERATIONS, below, for more info on why this is |
| 625 |
useful. |
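A sketch of the size limit; the limit of 16 bytes is an arbitrary value chosen to keep the example short:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->max_size(16);

# 7 bytes: below the limit, decoded normally.
my $small = $json->decode('[1,2,3]');

# Far more than 16 bytes: decode croaks without trying to parse.
my $big = '[' . join(',', 1 .. 100) . ']';
my $ok  = eval { $json->decode($big); 1 };

print "oversized text rejected: $@" unless $ok;
```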
| 626 |
|
| 627 |
$json_text = $json->encode ($perl_scalar) |
| 628 |
Converts the given Perl value or data structure to its JSON |
| 629 |
representation. Croaks on error. |
| 630 |
|
| 631 |
$perl_scalar = $json->decode ($json_text) |
| 632 |
The opposite of "encode": expects a JSON text and tries to parse it, |
| 633 |
returning the resulting simple scalar or reference. Croaks on error. |
| 634 |
|
| 635 |
($perl_scalar, $characters) = $json->decode_prefix ($json_text) |
| 636 |
This works like the "decode" method, but instead of raising an |
| 637 |
exception when there is trailing garbage after the first JSON |
| 638 |
object, it will silently stop parsing there and return the number of |
| 639 |
characters consumed so far. |
| 640 |
|
| 641 |
This is useful if your JSON texts are not delimited by an outer |
| 642 |
protocol and you need to know where the JSON text ends. |
| 643 |
|
| 644 |
JSON::XS->new->decode_prefix ("[1] the tail") |
| 645 |
=> ([1], 3) |
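Building on that, a loop can pull several concatenated JSON texts out of one buffer. This is a sketch only; it sticks to ASCII input so that character and byte counts coincide, and the buffer contents are made up:

```perl
use strict;
use warnings;
use JSON::XS;

my $json   = JSON::XS->new;
my $buffer = '[1] [2,3] {"k":4}';
my @values;

# Keep going while anything but whitespace is left.
while ($buffer =~ /\S/) {
    my ($value, $consumed) = $json->decode_prefix($buffer);
    push @values, $value;

    # Drop the part of the buffer that has just been decoded.
    substr($buffer, 0, $consumed, '');
}

print scalar(@values), " values decoded\n";
```

For streams that arrive in pieces, the incremental parser described in the next section is usually the better tool.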
| 646 |
|
| 647 |
INCREMENTAL PARSING |
| 648 |
In some cases, there is the need for incremental parsing of JSON texts. |
| 649 |
While this module always has to keep both JSON text and resulting Perl |
| 650 |
data structure in memory at one time, it does allow you to parse a JSON |
| 651 |
stream incrementally. It does so by accumulating text until it has a |
| 652 |
full JSON object, which it then can decode. This process is similar to |
| 653 |
using "decode_prefix" to see if a full JSON object is available, but is |
| 654 |
much more efficient (and can be implemented with a minimum of method |
| 655 |
calls). |
| 656 |
|
| 657 |
JSON::XS will only attempt to parse the JSON text once it is sure it has |
| 658 |
enough text to get a decisive result, using a very simple but truly |
| 659 |
incremental parser. This means that it sometimes won't stop as early as |
| 660 |
the full parser, for example, it doesn't detect mismatched parentheses. |
| 661 |
The only thing it guarantees is that it starts decoding as soon as a |
| 662 |
syntactically valid JSON text has been seen. This means you need to set |
| 663 |
resource limits (e.g. "max_size") to ensure the parser will stop parsing |
| 664 |
in the presence of syntax errors. |
| 665 |
|
| 666 |
The following methods implement this incremental parser. |
| 667 |
|
| 668 |
[void, scalar or list context] = $json->incr_parse ([$string]) |
| 669 |
This is the central parsing function. It can both append new text |
| 670 |
and extract objects from the stream accumulated so far (both of |
| 671 |
these functions are optional). |
| 672 |
|
| 673 |
If $string is given, then this string is appended to the already |
| 674 |
existing JSON fragment stored in the $json object. |
| 675 |
|
| 676 |
After that, if the function is called in void context, it will |
| 677 |
simply return without doing anything further. This can be used to |
| 678 |
add more text in as many chunks as you want. |
| 679 |
|
| 680 |
If the method is called in scalar context, then it will try to |
| 681 |
extract exactly *one* JSON object. If that is successful, it will |
| 682 |
return this object, otherwise it will return "undef". If there is a |
| 683 |
parse error, this method will croak just as "decode" would do (one |
| 684 |
can then use "incr_skip" to skip the erroneous part). This is the |
| 685 |
most common way of using the method. |
| 686 |
|
| 687 |
And finally, in list context, it will try to extract as many objects |
| 688 |
from the stream as it can find and return them, or the empty list |
| 689 |
otherwise. For this to work, there must be no separators (other than |
| 690 |
whitespace) between the JSON objects or arrays, instead they must be |
| 691 |
concatenated back-to-back. If an error occurs, an exception will be |
| 692 |
raised as in the scalar context case. Note that in this case, any |
| 693 |
previously-parsed JSON texts will be lost. |
| 694 |
|
| 695 |
Example: Parse some JSON arrays/objects in a given string and return |
| 696 |
them. |
| 697 |
|
| 698 |
my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]"); |
| 699 |
|
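The three calling contexts can be contrasted in one short sketch. The
input chunks are made up for illustration; the point is only which call
appends, which extracts one text, and which extracts all of them.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# void context: the text is only appended, nothing is parsed
$json->incr_parse ('[1,2');
$json->incr_parse (',3] {"a"');

# scalar context: extract at most one complete JSON text
my $first  = $json->incr_parse;   # the array [1,2,3]
my $second = $json->incr_parse;   # undef, '{"a"' is still incomplete

# append the missing part, then collect everything in list context
$json->incr_parse (':1}');
my @rest = $json->incr_parse;     # one hash reference
```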
| 700 |
$lvalue_string = $json->incr_text |
| 701 |
This method returns the currently stored JSON fragment as an lvalue, |
| 702 |
that is, you can manipulate it. This *only* works when a preceding |
| 703 |
call to "incr_parse" in *scalar context* successfully returned an |
| 704 |
object. Under all other circumstances you must not call this |
| 705 |
function (I mean it. although in simple tests it might actually |
| 706 |
work, it *will* fail under real world conditions). As a special |
| 707 |
exception, you can also call this method before having parsed |
| 708 |
anything. |
| 709 |
|
| 710 |
That means you can only use this function to look at or manipulate |
| 711 |
text before or after complete JSON objects, not while the parser is |
| 712 |
in the middle of parsing a JSON object. |
| 713 |
|
| 714 |
This function is useful in two cases: a) finding the trailing text |
| 715 |
after a JSON object or b) parsing multiple JSON objects separated by |
| 716 |
non-JSON text (such as commas). |
| 717 |
|
| 718 |
$json->incr_skip |
| 719 |
This will reset the state of the incremental parser and will remove |
| 720 |
the parsed text from the input buffer so far. This is useful after |
| 721 |
"incr_parse" died, in which case the input buffer and incremental |
| 722 |
parser state is left unchanged, to skip the text parsed so far and |
| 723 |
to reset the parse state. |
| 724 |
|
| 725 |
The difference to "incr_reset" is that only text until the parse |
| 726 |
error occurred is removed. |
| 727 |
|
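A small sketch of error recovery with "incr_skip", assuming an input in
which a bracket-balanced but invalid text is followed by a valid one
(the input string is invented for this illustration):

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
$json->incr_parse ('{"a":tru} [3]');   # void context, append only

# the first text is balanced but invalid, so decoding croaks
my $obj   = eval { $json->incr_parse };
my $error = $@;

# drop the text that caused the error and reset the parser state
$json->incr_skip if $error;

# the next call continues with whatever follows the bad text
my $next = $json->incr_parse;
```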
| 728 |
$json->incr_reset |
| 729 |
This completely resets the incremental parser, that is, after this |
| 730 |
call, it will be as if the parser had never parsed anything. |
| 731 |
|
| 732 |
This is useful if you want to repeatedly parse JSON objects and want |
| 733 |
to ignore any trailing data, which means you have to reset the |
| 734 |
parser after each successful decode. |
| 735 |
|
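The following sketch shows that pattern: one coder object is reused for
several inputs, and "incr_reset" discards whatever was left over after
each decode. The message strings are hypothetical.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
my @decoded;

# each message carries one JSON text plus trailing data we ignore
for my $message ('[1] trailing junk', '[2] more junk') {
    my $obj = $json->incr_parse ($message)
        or die "expected a JSON text at the start of the message";
    push @decoded, $obj;

    # forget the unparsed remainder so it cannot leak into the next message
    $json->incr_reset;
}
```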
| 736 |
LIMITATIONS |
| 737 |
The incremental parser is a non-exact parser: it works by gathering as |
| 738 |
much text as possible that *could* be a valid JSON text, followed by |
| 739 |
trying to decode it. |
| 740 |
|
| 741 |
That means it sometimes needs to read more data than strictly necessary |
| 742 |
to diagnose an invalid JSON text. For example, after parsing the |
| 743 |
following fragment, the parser *could* stop with an error, as this |
| 744 |
fragment *cannot* be the beginning of a valid JSON text: |
| 745 |
|
| 746 |
[, |
| 747 |
|
| 748 |
In reality, however, the parser might continue to read data until a
length limit is exceeded or it finds a closing bracket.
| 750 |
|
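To make the difference concrete, here is a sketch that feeds the same
invalid fragment to the full parser and to the incremental parser, and
then shows "max_size" acting as the length limit mentioned above. The
limit of 16 bytes is an arbitrary value picked for the demonstration.

```perl
use strict;
use warnings;
use JSON::XS;

# the full parser rejects the fragment immediately...
my $full_ok = eval { JSON::XS->new->decode ('[,'); 1 };

# ...while the incremental parser only reports "nothing complete yet"
my $pending = JSON::XS->new->incr_parse ('[,');

# with max_size set, an ever-growing invalid text is eventually refused
my $limited = JSON::XS->new->max_size (16);
$limited->incr_parse ('[' . (',' x 100));   # void context, append only
my $limit_ok = eval { my $obj = $limited->incr_parse; 1 };
```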
| 751 |
EXAMPLES |
| 752 |
Some examples will make all this clearer. First, a simple example that |
| 753 |
works similarly to "decode_prefix": We want to decode the JSON object at |
| 754 |
the start of a string and identify the portion after the JSON object: |
| 755 |
|
| 756 |
my $text = "[1,2,3] hello"; |
| 757 |
|
| 758 |
my $json = new JSON::XS; |
| 759 |
|
| 760 |
my $obj = $json->incr_parse ($text) |
| 761 |
or die "expected JSON object or array at beginning of string"; |
| 762 |
|
| 763 |
my $tail = $json->incr_text; |
| 764 |
# $tail now contains " hello" |
| 765 |
|
| 766 |
Easy, isn't it? |
| 767 |
|
| 768 |
Now for a more complicated example: Imagine a hypothetical protocol |
| 769 |
where you read some requests from a TCP stream, and each request is a |
| 770 |
JSON array, without any separation between them (in fact, it is often |
| 771 |
useful to use newlines as "separators", as these get interpreted as |
| 772 |
whitespace at the start of the JSON text, which makes it possible to |
| 773 |
test said protocol with "telnet"...). |
| 774 |
|
| 775 |
Here is how you'd do it (it is trivial to write this in an event-based |
| 776 |
manner): |
| 777 |
|
| 778 |
my $json = new JSON::XS; |
| 779 |
|
| 780 |
# read some data from the socket |
| 781 |
while (sysread $socket, my $buf, 4096) { |
| 782 |
|
| 783 |
# split and decode as many requests as possible |
| 784 |
for my $request ($json->incr_parse ($buf)) { |
| 785 |
# act on the $request |
| 786 |
} |
| 787 |
} |
| 788 |
|
| 789 |
Another complicated example: Assume you have a string with JSON objects |
| 790 |
or arrays, all separated by (optional) comma characters (e.g. "[1],[2], |
| 791 |
[3]"). To parse them, we have to skip the commas between the JSON texts, |
| 792 |
and here is where the lvalue-ness of "incr_text" comes in useful: |
| 793 |
|
| 794 |
my $text = "[1],[2], [3]"; |
| 795 |
my $json = new JSON::XS; |
| 796 |
|
| 797 |
# void context, so no parsing done |
| 798 |
$json->incr_parse ($text); |
| 799 |
|
| 800 |
# now extract as many objects as possible. note the |
| 801 |
# use of scalar context so incr_text can be called. |
| 802 |
while (my $obj = $json->incr_parse) { |
| 803 |
# do something with $obj |
| 804 |
|
| 805 |
# now skip the optional comma |
| 806 |
$json->incr_text =~ s/^ \s* , //x; |
| 807 |
} |
| 808 |
|
| 809 |
Now let's go for a very complex example: Assume that you have a gigantic
JSON array-of-objects, many gigabytes in size, and you want to parse it,
but you cannot load it into memory fully (this has actually happened in
the real world :).
| 813 |
|
| 814 |
Well, you lost, you have to implement your own JSON parser. But JSON::XS |
| 815 |
can still help you: You implement a (very simple) array parser and let |
| 816 |
JSON decode the array elements, which are all full JSON objects on their |
| 817 |
own (this wouldn't work if the array elements could be JSON numbers, for |
| 818 |
example): |
| 819 |
|
| 820 |
    my $json = JSON::XS->new;

    # open the monster
    open my $fh, "<", "bigfile.json"
        or die "bigfile: $!";

    # first parse the initial "["
    for (;;) {
        sysread $fh, my $buf, 65536
            or die "read error: $!";
        $json->incr_parse ($buf); # void context, so no parsing

        # Exit the loop once we found and removed(!) the initial "[".
        # In essence, we are (ab-)using the $json object as a simple scalar
        # we append data to.
        last if $json->incr_text =~ s/^ \s* \[ //x;
    }

    # now we have skipped the initial "[", so continue
    # parsing all the elements.
    for (;;) {
        # in this loop we read data until we got a single JSON object
        for (;;) {
            if (my $obj = $json->incr_parse) {
                # do something with $obj
                last;
            }

            # add more data
            sysread $fh, my $buf, 65536
                or die "read error: $!";
            $json->incr_parse ($buf); # void context, so no parsing
        }

        # in this loop we read data until we either found and parsed the
        # separating "," between elements, or the final "]"
        for (;;) {
            # first skip whitespace
            $json->incr_text =~ s/^\s*//;

            # if we find "]", we are done
            if ($json->incr_text =~ s/^\]//) {
                print "finished.\n";
                exit;
            }

            # if we find ",", we can continue with the next element
            if ($json->incr_text =~ s/^,//) {
                last;
            }

            # if we find anything else, we have a parse error!
            if (length $json->incr_text) {
                die "parse error near ", $json->incr_text;
            }

            # else add more data
            sysread $fh, my $buf, 65536
                or die "read error: $!";
            $json->incr_parse ($buf); # void context, so no parsing
        }
    }
| 881 |
|
| 882 |
This is a complex example, but most of the complexity comes from the |
| 883 |
fact that we are trying to be correct (bear with me if I am wrong, I |
| 884 |
never ran the above example :). |
| 885 |
|
| 886 |
MAPPING |
| 887 |
This section describes how JSON::XS maps Perl values to JSON values and |
| 888 |
vice versa. These mappings are designed to "do the right thing" in most |
| 889 |
circumstances automatically, preserving round-tripping characteristics |
| 890 |
(what you put in comes out as something equivalent). |
| 891 |
|
| 892 |
For the more enlightened: note that in the following descriptions, |
| 893 |
lowercase *perl* refers to the Perl interpreter, while uppercase *Perl* |
| 894 |
refers to the abstract Perl language itself. |
| 895 |
|
| 896 |
JSON -> PERL |
| 897 |
object |
| 898 |
A JSON object becomes a reference to a hash in Perl. No ordering of |
| 899 |
object keys is preserved (JSON does not preserve object key ordering |
| 900 |
itself). |
| 901 |
|
| 902 |
array |
| 903 |
A JSON array becomes a reference to an array in Perl. |
| 904 |
|
| 905 |
string |
| 906 |
A JSON string becomes a string scalar in Perl - Unicode codepoints |
| 907 |
in JSON are represented by the same codepoints in the Perl string, |
| 908 |
so no manual decoding is necessary. |
| 909 |
|
| 910 |
number |
| 911 |
A JSON number becomes either an integer, numeric (floating point) or |
| 912 |
string scalar in perl, depending on its range and any fractional |
| 913 |
parts. On the Perl level, there is no difference between those as |
| 914 |
Perl handles all the conversion details, but an integer may take |
| 915 |
slightly less memory and might represent more values exactly than |
| 916 |
floating point numbers. |
| 917 |
|
| 918 |
If the number consists of digits only, JSON::XS will try to |
| 919 |
represent it as an integer value. If that fails, it will try to |
| 920 |
represent it as a numeric (floating point) value if that is possible |
| 921 |
without loss of precision. Otherwise it will preserve the number as |
| 922 |
a string value (in which case you lose roundtripping ability, as the |
| 923 |
JSON number will be re-encoded to a JSON string). |
| 924 |
|
| 925 |
Numbers containing a fractional or exponential part will always be |
| 926 |
represented as numeric (floating point) values, possibly at a loss |
| 927 |
of precision (in which case you might lose perfect roundtripping |
| 928 |
ability, but the JSON number will still be re-encoded as a JSON |
| 929 |
number). |
| 930 |
|
| 931 |
Note that precision is not accuracy - binary floating point values |
| 932 |
cannot represent most decimal fractions exactly, and when converting |
| 933 |
from and to floating point, JSON::XS only guarantees precision up to |
| 934 |
but not including the least significant bit. |
| 935 |
|
| 936 |
true, false |
| 937 |
These JSON atoms become "Types::Serialiser::true" and
"Types::Serialiser::false", respectively. They are overloaded to act
almost exactly like the numbers 1 and 0. You can check whether a
scalar is a JSON boolean by using the "Types::Serialiser::is_bool"
function (after "use Types::Serialiser", of course).
| 942 |
|
| 943 |
null |
| 944 |
A JSON null atom becomes "undef" in Perl. |
| 945 |
|
| 946 |
shell-style comments ("# *text*") |
| 947 |
As a nonstandard extension to the JSON syntax that is enabled by the |
| 948 |
"relaxed" setting, shell-style comments are allowed. They can start |
| 949 |
anywhere outside strings and go till the end of the line. |
| 950 |
|
| 951 |
tagged values ("(*tag*)*value*"). |
| 952 |
Another nonstandard extension to the JSON syntax, enabled with the |
| 953 |
"allow_tags" setting, are tagged values. In this implementation, the |
| 954 |
*tag* must be a perl package/class name encoded as a JSON string, |
| 955 |
and the *value* must be a JSON array encoding optional constructor |
| 956 |
arguments. |
| 957 |
|
| 958 |
See "OBJECT SERIALISATION", below, for details. |
| 959 |
|
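The mappings listed above can be checked with a short sketch. The JSON
text and variable names are invented for the illustration; only the
"relaxed" part needs a non-default setting.

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $data = decode_json '{"list":[1,2.5,"text",true,null]}';
my $list = $data->{list};

my $is_hash  = ref $data eq 'HASH';                      # object -> hash ref
my $is_array = ref $list eq 'ARRAY';                     # array  -> array ref
my $sum      = $list->[0] + $list->[1];                  # numbers act as numbers
my $is_bool  = Types::Serialiser::is_bool ($list->[3]);  # true -> boolean object
my $is_null  = !defined $list->[4];                      # null -> undef

# the "relaxed" setting additionally accepts shell-style comments
my $relaxed = JSON::XS->new->relaxed->decode ("[1, # a comment\n 2]");
```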
| 960 |
PERL -> JSON |
| 961 |
The mapping from Perl to JSON is slightly more difficult, as Perl is a |
| 962 |
truly typeless language, so we can only guess which JSON type is meant |
| 963 |
by a Perl value. |
| 964 |
|
| 965 |
hash references |
| 966 |
Perl hash references become JSON objects. As there is no inherent |
| 967 |
ordering in hash keys (or JSON objects), they will usually be |
| 968 |
encoded in a pseudo-random order. JSON::XS can optionally sort the |
| 969 |
hash keys (determined by the *canonical* flag), so the same |
| 970 |
datastructure will serialise to the same JSON text (given same |
| 971 |
settings and version of JSON::XS), but this incurs a runtime |
| 972 |
overhead and is only rarely useful, e.g. when you want to compare |
| 973 |
some JSON text against another for equality. |
| 974 |
|
| 975 |
array references |
| 976 |
Perl array references become JSON arrays. |
| 977 |
|
| 978 |
other references |
| 979 |
Other unblessed references are generally not allowed and will cause |
| 980 |
an exception to be thrown, except for references to the integers 0 |
| 981 |
and 1, which get turned into "false" and "true" atoms in JSON. |
| 982 |
|
| 983 |
Since "JSON::XS" uses the boolean model from Types::Serialiser, you |
| 984 |
can also "use Types::Serialiser" and then use |
| 985 |
"Types::Serialiser::false" and "Types::Serialiser::true" to improve |
| 986 |
readability. |
| 987 |
|
| 988 |
use Types::Serialiser; |
| 989 |
encode_json [\0, Types::Serialiser::true] # yields [false,true] |
| 990 |
|
| 991 |
Types::Serialiser::true, Types::Serialiser::false |
| 992 |
These special values from the Types::Serialiser module become JSON |
| 993 |
true and JSON false values, respectively. You can also use "\1" and |
| 994 |
"\0" directly if you want. |
| 995 |
|
| 996 |
blessed objects |
| 997 |
Blessed objects are not directly representable in JSON, but |
| 998 |
"JSON::XS" allows various ways of handling objects. See "OBJECT |
| 999 |
SERIALISATION", below, for details. |
| 1000 |
|
| 1001 |
simple scalars |
| 1002 |
Simple Perl scalars (any scalar that is not a reference) are the |
| 1003 |
most difficult objects to encode: JSON::XS will encode undefined |
| 1004 |
scalars as JSON "null" values, scalars that have last been used in a |
| 1005 |
string context before encoding as JSON strings, and anything else as |
| 1006 |
number value: |
| 1007 |
|
| 1008 |
# dump as number |
| 1009 |
encode_json [2] # yields [2] |
| 1010 |
encode_json [-3.0e17] # yields [-3e+17] |
| 1011 |
my $value = 5; encode_json [$value] # yields [5] |
| 1012 |
|
| 1013 |
# used as string, so dump as string |
| 1014 |
print $value; |
| 1015 |
encode_json [$value] # yields ["5"] |
| 1016 |
|
| 1017 |
# undef becomes null |
| 1018 |
encode_json [undef] # yields [null] |
| 1019 |
|
| 1020 |
You can force the type to be a JSON string by stringifying it: |
| 1021 |
|
| 1022 |
my $x = 3.1; # some variable containing a number |
| 1023 |
"$x"; # stringified |
| 1024 |
$x .= ""; # another, more awkward way to stringify |
| 1025 |
print $x; # perl does it for you, too, quite often |
| 1026 |
|
| 1027 |
You can force the type to be a JSON number by numifying it: |
| 1028 |
|
| 1029 |
my $x = "3"; # some variable containing a string |
| 1030 |
$x += 0; # numify it, ensuring it will be dumped as a number |
| 1031 |
$x *= 1; # same thing, the choice is yours. |
| 1032 |
|
| 1033 |
You can not currently force the type in other, less obscure, ways. |
| 1034 |
Tell me if you need this capability (but don't forget to explain why |
| 1035 |
it's needed :). |
| 1036 |
|
| 1037 |
Note that numerical precision has the same meaning as under Perl (so
binary to decimal conversion follows the same rules as in Perl,
which can differ from other languages). Also, your perl interpreter
might expose extensions to the floating point numbers of your
platform, such as infinities or NaNs - these cannot be represented
in JSON, and it is an error to pass those in.
| 1043 |
|
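A compact sketch of the rules above, using the "canonical" flag for
reproducible key order and forcing string or number output through
fresh temporary values rather than by modifying a variable in place:

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

# canonical sorts hash keys, so the output is reproducible
my $object = $coder->encode ({ b => 1, a => 2 });   # {"a":2,"b":1}

# references to 0 and 1 become JSON booleans
my $bools = $coder->encode ([\0, \1]);              # [false,true]

# a value produced by string concatenation is dumped as a string,
# one produced by arithmetic is dumped as a number
my $number    = 3;
my $as_string = $coder->encode ([$number . ""]);    # ["3"]
my $as_number = $coder->encode (["3" + 0]);         # [3]
```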
| 1044 |
OBJECT SERIALISATION |
| 1045 |
As JSON cannot directly represent Perl objects, you have to choose |
| 1046 |
between a pure JSON representation (without the ability to deserialise |
| 1047 |
the object automatically again), and a nonstandard extension to the JSON |
| 1048 |
syntax, tagged values. |
| 1049 |
|
| 1050 |
SERIALISATION |
| 1051 |
What happens when "JSON::XS" encounters a Perl object depends on the |
| 1052 |
"allow_blessed", "convert_blessed" and "allow_tags" settings, which are |
| 1053 |
used in this order: |
| 1054 |
|
| 1055 |
1. "allow_tags" is enabled and the object has a "FREEZE" method. |
| 1056 |
In this case, "JSON::XS" uses the Types::Serialiser object |
| 1057 |
serialisation protocol to create a tagged JSON value, using a |
| 1058 |
nonstandard extension to the JSON syntax. |
| 1059 |
|
| 1060 |
This works by invoking the "FREEZE" method on the object, with the |
| 1061 |
first argument being the object to serialise, and the second |
| 1062 |
argument being the constant string "JSON" to distinguish it from |
| 1063 |
other serialisers. |
| 1064 |
|
| 1065 |
The "FREEZE" method can return any number of values (i.e. zero or
more). These values and the package/classname of the object will
then be encoded as a tagged JSON value in the following format:
| 1068 |
|
| 1069 |
("classname")[FREEZE return values...] |
| 1070 |
|
| 1071 |
e.g.: |
| 1072 |
|
| 1073 |
("URI")["http://www.google.com/"] |
| 1074 |
("MyDate")[2013,10,29] |
| 1075 |
("ImageData::JPEG")["Z3...VlCg=="] |
| 1076 |
|
| 1077 |
For example, the hypothetical "My::Object" "FREEZE" method might use
the object's "type" and "id" members to encode the object:
| 1079 |
|
| 1080 |
sub My::Object::FREEZE { |
| 1081 |
my ($self, $serialiser) = @_; |
| 1082 |
|
| 1083 |
($self->{type}, $self->{id}) |
| 1084 |
} |
| 1085 |
|
| 1086 |
2. "convert_blessed" is enabled and the object has a "TO_JSON" method. |
| 1087 |
In this case, the "TO_JSON" method of the object is invoked in |
| 1088 |
scalar context. It must return a single scalar that can be directly |
| 1089 |
encoded into JSON. This scalar replaces the object in the JSON text. |
| 1090 |
|
| 1091 |
For example, the following "TO_JSON" method will convert all URI
objects to JSON strings when serialised. The fact that these values
originally were URI objects is lost.
| 1094 |
|
| 1095 |
sub URI::TO_JSON { |
| 1096 |
my ($uri) = @_; |
| 1097 |
$uri->as_string |
| 1098 |
} |
| 1099 |
|
| 1100 |
3. "allow_blessed" is enabled. |
| 1101 |
The object will be serialised as a JSON null value. |
| 1102 |
|
| 1103 |
4. none of the above |
| 1104 |
If none of the settings are enabled or the respective methods are |
| 1105 |
missing, "JSON::XS" throws an exception. |
| 1106 |
|
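Cases 2 to 4 can be compared side by side. "My::Point" is a hypothetical
class written only for this sketch; it has a "TO_JSON" method but no
"FREEZE" method.

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical class used only for this illustration
package My::Point {
    sub new     { my ($class, $x, $y) = @_; bless { x => $x, y => $y }, $class }
    sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }
}

my $point = My::Point->new (1, 2);

# case 2: convert_blessed calls TO_JSON and encodes its result
my $converted = JSON::XS->new->canonical->convert_blessed->encode ([$point]);

# case 3: allow_blessed alone replaces the object by null
my $nulled = JSON::XS->new->allow_blessed->encode ([$point]);

# case 4: without any of the settings, encoding croaks
my $plain_ok = eval { JSON::XS->new->encode ([$point]); 1 };
```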
| 1107 |
DESERIALISATION |
| 1108 |
For deserialisation there are only two cases to consider: either
nonstandard tagging was used, in which case "allow_tags" decides, or
objects cannot be automatically deserialised, in which case you can
use postprocessing or the "filter_json_object" or
"filter_json_single_key_object" callbacks to get some real objects out
of your JSON.
| 1114 |
|
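For the untagged case, a sketch of the callback approach might look like
this. Both the "My::URI" class and the "__uri__" key are assumptions made
up for the example; any single-key convention agreed on by encoder and
decoder would do.

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical wrapper class, standing in for a real URI class
package My::URI {
    sub new       { my ($class, $string) = @_; bless { string => $string }, $class }
    sub as_string { $_[0]{string} }
}

# JSON objects of the form {"__uri__": "..."} are replaced by My::URI
# objects; the callback receives the value of the single key
my $coder = JSON::XS->new->filter_json_single_key_object (
    '__uri__' => sub { My::URI->new ($_[0]) },
);

my $data = $coder->decode ('{"home":{"__uri__":"http://example.org/"}}');
```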
| 1115 |
This section only considers the tagged value case: If a tagged JSON
object is encountered during decoding and "allow_tags" is disabled, a
parse error will result (as if tagged values were not part of the
grammar).
| 1119 |
|
| 1120 |
If "allow_tags" is enabled, "JSON::XS" will look up the "THAW" method of |
| 1121 |
the package/classname used during serialisation (it will not attempt to |
| 1122 |
load the package as a Perl module). If there is no such method, the |
| 1123 |
decoding will fail with an error. |
| 1124 |
|
| 1125 |
Otherwise, the "THAW" method is invoked with the classname as first |
| 1126 |
argument, the constant string "JSON" as second argument, and all the |
| 1127 |
values from the JSON array (the values originally returned by the |
| 1128 |
"FREEZE" method) as remaining arguments. |
| 1129 |
|
| 1130 |
The method must then return the object. While technically you can return
any Perl scalar, you might have to enable the "allow_nonref" setting to
make that work in all cases, so better return an actual blessed
reference.
| 1134 |
|
| 1135 |
As an example, let's implement a "THAW" function that regenerates the |
| 1136 |
"My::Object" from the "FREEZE" example earlier: |
| 1137 |
|
| 1138 |
sub My::Object::THAW { |
| 1139 |
my ($class, $serialiser, $type, $id) = @_; |
| 1140 |
|
| 1141 |
$class->new (type => $type, id => $id) |
| 1142 |
} |
| 1143 |
|
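Putting both halves together, a round trip through a tagged value could
be sketched as follows. The constructor is an assumption (the examples
above do not show "My::Object::new"), and the exact JSON text is best
treated as an implementation detail.

```perl
use strict;
use warnings;
use JSON::XS;

package My::Object {
    # assumed constructor: stores its named arguments in a hash
    sub new    { my ($class, %args) = @_; bless { %args }, $class }

    sub FREEZE { my ($self, $serialiser) = @_; ($self->{type}, $self->{id}) }

    sub THAW   { my ($class, $serialiser, $type, $id) = @_;
                 $class->new (type => $type, id => $id) }
}

# allow_tags must be enabled on both the encoding and the decoding side
my $coder = JSON::XS->new->allow_tags;

my $text = $coder->encode ([My::Object->new (type => "user", id => 42)]);
my $copy = $coder->decode ($text)->[0];
```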
| 1144 |
ENCODING/CODESET FLAG NOTES |
| 1145 |
The interested reader might have seen a number of flags that signify |
| 1146 |
encodings or codesets - "utf8", "latin1" and "ascii". There seems to be |
| 1147 |
some confusion on what these do, so here is a short comparison: |
| 1148 |
|
"utf8" controls whether the JSON text created by "encode" (and expected
by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
control whether "encode" escapes character values outside their
respective codeset range. None of these flags conflict with each
other, although some combinations make less sense than others.
| 1154 |
|
| 1155 |
Care has been taken to make all flags symmetrical with respect to |
| 1156 |
"encode" and "decode", that is, texts encoded with any combination of |
| 1157 |
these flag values will be correctly decoded when the same flags are used |
| 1158 |
- in general, if you use different flag settings while encoding vs. when |
| 1159 |
decoding you likely have a bug somewhere. |
| 1160 |
|
| 1161 |
Below comes a verbose discussion of these flags. Note that a "codeset" |
| 1162 |
is simply an abstract set of character-codepoint pairs, while an |
| 1163 |
encoding takes those codepoint numbers and *encodes* them, in our case |
| 1164 |
into octets. Unicode is (among other things) a codeset, UTF-8 is an |
| 1165 |
encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets *and* |
| 1166 |
encodings at the same time, which can be confusing. |
| 1167 |
|
| 1168 |
"utf8" flag disabled |
| 1169 |
When "utf8" is disabled (the default), then "encode"/"decode" |
| 1170 |
generate and expect Unicode strings, that is, characters with high |
| 1171 |
ordinal Unicode values (> 255) will be encoded as such characters, |
| 1172 |
and likewise such characters are decoded as-is, no changes to them |
| 1173 |
will be done, except "(re-)interpreting" them as Unicode codepoints |
| 1174 |
or Unicode characters, respectively (to Perl, these are the same |
| 1175 |
thing in strings unless you do funny/weird/dumb stuff). |
| 1176 |
|
| 1177 |
This is useful when you want to do the encoding yourself (e.g. when |
| 1178 |
you want to have UTF-16 encoded JSON texts) or when some other layer |
| 1179 |
does the encoding for you (for example, when printing to a terminal |
| 1180 |
using a filehandle that transparently encodes to UTF-8 you certainly |
| 1181 |
do NOT want to UTF-8 encode your data first and have Perl encode it |
| 1182 |
another time). |
| 1183 |
|
| 1184 |
"utf8" flag enabled |
| 1185 |
If the "utf8"-flag is enabled, "encode"/"decode" will encode all |
| 1186 |
characters using the corresponding UTF-8 multi-byte sequence, and |
| 1187 |
will expect your input strings to be encoded as UTF-8, that is, no |
| 1188 |
"character" of the input string must have any value > 255, as UTF-8 |
| 1189 |
does not allow that. |
| 1190 |
|
| 1191 |
The "utf8" flag therefore switches between two modes: disabled means |
| 1192 |
you will get a Unicode string in Perl, enabled means you get a UTF-8 |
| 1193 |
encoded octet/binary string in Perl. |
| 1194 |
|
| 1195 |
"latin1" or "ascii" flags enabled |
| 1196 |
With "latin1" (or "ascii") enabled, "encode" will escape characters |
| 1197 |
with ordinal values > 255 (> 127 with "ascii") and encode the |
| 1198 |
remaining characters as specified by the "utf8" flag. |
| 1199 |
|
| 1200 |
If "utf8" is disabled, then the result is also correctly encoded in
those character sets (as both are proper subsets of Unicode, meaning
that a Unicode string with all character values < 256 is the same
thing as an ISO-8859-1 string, and a Unicode string with all
character values < 128 is the same thing as an ASCII string in
Perl).
| 1206 |
|
| 1207 |
If "utf8" is enabled, you still get a correct UTF-8-encoded string,
regardless of these flags, just some more characters will be escaped
using "\uXXXX" than before.
| 1210 |
|
| 1211 |
Note that ISO-8859-1-*encoded* strings are not compatible with UTF-8 |
| 1212 |
encoding, while ASCII-encoded strings are. That is because the |
| 1213 |
ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the ISO-8859-1 |
| 1214 |
*codeset* being a subset of Unicode), while ASCII is. |
| 1215 |
|
| 1216 |
Surprisingly, "decode" will ignore these flags and so treat all
input values as governed by the "utf8" flag. If it is disabled, this
allows you to decode ISO-8859-1- and ASCII-encoded strings, as both
are strict subsets of Unicode. If it is enabled, you can correctly
decode UTF-8 encoded strings.
| 1221 |
|
| 1222 |
So neither "latin1" nor "ascii" are incompatible with the "utf8" |
| 1223 |
flag - they only govern when the JSON output engine escapes a |
| 1224 |
character or not. |
| 1225 |
|
| 1226 |
The main use for "latin1" is to relatively efficiently store binary |
| 1227 |
data as JSON, at the expense of breaking compatibility with most |
| 1228 |
JSON decoders. |
| 1229 |
|
| 1230 |
The main use for "ascii" is to force the output to not contain
characters with values > 127, which means you can interpret the
resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about
any character set and 8-bit encoding, and still get the same data
structure back. This is useful when your channel for JSON transfer
is not 8-bit clean or the encoding might be mangled in between (e.g.
in mail), and works because ASCII is a proper subset of most 8-bit
and multibyte encodings in use in the world.
| 1238 |
|
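The effect of the three flags is perhaps easiest to see by encoding the
same two characters four ways. This is only a sketch; the test string is
arbitrary, chosen so that one character fits into Latin-1 and the other
does not.

```perl
use strict;
use warnings;
use JSON::XS;

# one Latin-1 character (U+00E9) and one outside Latin-1 (U+20AC)
my $data = ["\x{e9}\x{20ac}"];

my $unicode = JSON::XS->new->encode ($data);          # Unicode string, 6 characters
my $utf8    = JSON::XS->new->utf8->encode ($data);    # UTF-8 octets, 9 bytes
my $ascii   = JSON::XS->new->ascii->encode ($data);   # both characters escaped
my $latin1  = JSON::XS->new->latin1->encode ($data);  # only U+20AC escaped

# using the same flag for decoding gives the original data back
my $back = JSON::XS->new->utf8->decode ($utf8);
```

Comparing "length $unicode" with "length $utf8" shows the difference
between a character string and its encoded octets.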
| 1239 |
JSON and ECMAscript |
| 1240 |
JSON syntax is based on how literals are represented in javascript (the |
| 1241 |
not-standardised predecessor of ECMAscript) which is presumably why it |
| 1242 |
is called "JavaScript Object Notation". |
| 1243 |
|
| 1244 |
However, JSON is not a subset (and also not a superset of course) of |
| 1245 |
ECMAscript (the standard) or javascript (whatever browsers actually |
| 1246 |
implement). |
| 1247 |
|
| 1248 |
If you want to use javascript's "eval" function to "parse" JSON, you |
| 1249 |
might run into parse errors for valid JSON texts, or the resulting data |
| 1250 |
structure might not be queryable: |
| 1251 |
|
| 1252 |
One of the problems is that U+2028 and U+2029 are valid characters |
| 1253 |
inside JSON strings, but are not allowed in ECMAscript string literals, |
| 1254 |
so the following Perl fragment will not output something that can be |
| 1255 |
guaranteed to be parsable by javascript's "eval": |
| 1256 |
|
| 1257 |
use JSON::XS; |
| 1258 |
|
| 1259 |
print encode_json [chr 0x2028]; |
| 1260 |
|
| 1261 |
The right fix for this is to use a proper JSON parser in your javascript |
| 1262 |
programs, and not rely on "eval" (see for example Douglas Crockford's |
| 1263 |
json2.js parser). |
| 1264 |
|
| 1265 |
If this is not an option, you can, as a stop-gap measure, simply encode |
| 1266 |
to ASCII-only JSON: |
| 1267 |
|
| 1268 |
use JSON::XS; |
| 1269 |
|
| 1270 |
print JSON::XS->new->ascii->encode ([chr 0x2028]); |
| 1271 |
|
| 1272 |
Note that this will enlarge the resulting JSON text quite a bit if you |
| 1273 |
have many non-ASCII characters. You might be tempted to run some regexes |
| 1274 |
to only escape U+2028 and U+2029, e.g.: |
| 1275 |
|
| 1276 |
# DO NOT USE THIS! |
| 1277 |
my $json = JSON::XS->new->utf8->encode ([chr 0x2028]); |
| 1278 |
$json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028 |
| 1279 |
$json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029 |
| 1280 |
print $json; |
| 1281 |
|
| 1282 |
Note that *this is a bad idea*: the above only works for U+2028 and |
| 1283 |
U+2029 and thus only for fully ECMAscript-compliant parsers. Many |
| 1284 |
existing javascript implementations, however, have issues with other |
| 1285 |
characters as well - using "eval" naively simply *will* cause problems. |
| 1286 |
|
| 1287 |
Another problem is that some javascript implementations reserve some |
| 1288 |
property names for their own purposes (which probably makes them |
| 1289 |
non-ECMAscript-compliant). For example, Iceweasel reserves the |
| 1290 |
"__proto__" property name for its own purposes. |
| 1291 |
|
| 1292 |
If that is a problem, you could try to filter the resulting JSON
output for these property strings, e.g.:
| 1294 |
|
| 1295 |
$json =~ s/"__proto__"\s*:/"__proto__renamed":/g; |
| 1296 |
|
| 1297 |
This works because "__proto__" is not valid outside of strings, so every |
| 1298 |
occurrence of ""__proto__"\s*:" must be a string used as property name. |
| 1299 |
|
| 1300 |
If you know of other incompatibilities, please let me know. |
| 1301 |
|
| 1302 |
JSON and YAML |
| 1303 |
You often hear that JSON is a subset of YAML. This is, however, a mass |
| 1304 |
hysteria(*) and very far from the truth (as of the time of this |
| 1305 |
writing), so let me state it clearly: *in general, there is no way to |
| 1306 |
configure JSON::XS to output a data structure as valid YAML* that works |
| 1307 |
in all cases. |
| 1308 |
|
| 1309 |
If you really must use JSON::XS to generate YAML, you should use this |
| 1310 |
algorithm (subject to change in future versions): |
| 1311 |
|
| 1312 |
my $to_yaml = JSON::XS->new->utf8->space_after (1); |
| 1313 |
my $yaml = $to_yaml->encode ($ref) . "\n"; |
| 1314 |
|
| 1315 |
This will *usually* generate JSON texts that also parse as valid YAML. |
| 1316 |
Please note that YAML has hardcoded limits on (simple) object key |
| 1317 |
lengths that JSON doesn't have and also has different and incompatible |
| 1318 |
unicode character escape syntax, so you should make sure that your hash |
| 1319 |
keys are noticeably shorter than the 1024 "stream characters" YAML |
| 1320 |
allows and that you do not have characters with codepoint values outside |
| 1321 |
the Unicode BMP (basic multilingual page). YAML also does not allow "\/" |
| 1322 |
sequences in strings (which JSON::XS does not *currently* generate, but |
| 1323 |
other JSON generators might). |
| 1324 |
|
| 1325 |
There might be other incompatibilities that I am not aware of (or the |
| 1326 |
YAML specification has been changed yet again - it does so quite often). |
| 1327 |
In general you should not try to generate YAML with a JSON generator or |
| 1328 |
vice versa, or try to parse JSON with a YAML parser or vice versa: |
| 1329 |
chances are high that you will run into severe interoperability problems |
| 1330 |
when you least expect it. |
| 1331 |
|
| 1332 |
(*) I have been pressured multiple times by Brian Ingerson (one of the |
| 1333 |
authors of the YAML specification) to remove this paragraph, despite |
| 1334 |
him acknowledging that the actual incompatibilities exist. As I was |
| 1335 |
personally bitten by this "JSON is YAML" lie, I refused and said I |
| 1336 |
will continue to educate people about these issues, so others do not |
| 1337 |
run into the same problem again and again. After this, Brian called |
| 1338 |
me a (quote)*complete and worthless idiot*(unquote). |
| 1339 |
|
| 1340 |
In my opinion, instead of pressuring and insulting people who |
| 1341 |
actually clarify issues with YAML and the wrong statements of some |
| 1342 |
of its proponents, I would kindly suggest reading the JSON spec |
| 1343 |
(which is not that difficult or long) and finally make YAML |
| 1344 |
compatible to it, and educating users about the changes, instead of |
| 1345 |
spreading lies about the real compatibility for many *years* and |
| 1346 |
trying to silence people who point out that it isn't true. |
| 1347 |
|
| 1348 |
Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, |
| 1349 |
even though the incompatibilities have been documented (and are |
| 1350 |
known to Brian) for many years and the spec makes explicit claims |
| 1351 |
that YAML is a superset of JSON. It would be so easy to fix, but |
| 1352 |
apparently, bullying people and corrupting userdata is so much |
| 1353 |
easier. |
| 1354 |
|
| 1355 |
SPEED |
| 1356 |
It seems that JSON::XS is surprisingly fast, as shown in the following |
| 1357 |
tables. They have been generated with the help of the "eg/bench" program |
| 1358 |
in the JSON::XS distribution, to make it easy to compare on your own |
| 1359 |
system. |
| 1360 |
|
| 1361 |
First comes a comparison between various modules using a very short |
| 1362 |
single-line JSON string (also available at |
| 1363 |
<http://dist.schmorp.de/misc/json/short.json>). |
| 1364 |
|
| 1365 |
{"method": "handleMessage", "params": ["user1", |
| 1366 |
"we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7, |
| 1367 |
1, 0]} |
| 1368 |
|
| 1369 |
It shows the number of encodes/decodes per second (JSON::XS uses the |
| 1370 |
functional interface, while JSON::XS/2 uses the OO interface with |
| 1371 |
pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink. |
| 1372 |
JSON::DWIW/DS uses the deserialise function, while JSON::DWIW::FJ uses |
| 1373 |
the from_json method). Higher is better: |
| 1374 |
|
| 1375 |
module | encode | decode | |
| 1376 |
--------------|------------|------------| |
| 1377 |
JSON::DWIW/DS | 86302.551 | 102300.098 | |
| 1378 |
JSON::DWIW/FJ | 86302.551 | 75983.768 | |
| 1379 |
JSON::PP | 15827.562 | 6638.658 | |
| 1380 |
JSON::Syck | 63358.066 | 47662.545 | |
| 1381 |
JSON::XS | 511500.488 | 511500.488 | |
| 1382 |
JSON::XS/2 | 291271.111 | 388361.481 | |
| 1383 |
JSON::XS/3 | 361577.931 | 361577.931 | |
| 1384 |
Storable | 66788.280 | 265462.278 | |
| 1385 |
--------------+------------+------------+ |
| 1386 |
|
| 1387 |
That is, JSON::XS is almost six times faster than JSON::DWIW on |
| 1388 |
encoding, about five times faster on decoding, and over thirty to |
| 1389 |
seventy times faster than JSON's pure perl implementation. It also |
| 1390 |
compares favourably to Storable for small amounts of data. |
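
If you only want a rough comparison on your own machine, a sketch along the following lines might do. It is not the "eg/bench" program, it compares only JSON::XS and JSON::PP on decoding, and it assumes both modules are installed; the numbers it prints will differ from the table above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::XS ();
use JSON::PP ();

# The short test string shown above.
my $text = '{"method": "handleMessage", "params": ["user1", '
         . '"we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7, 1, 0]}';

my $xs = JSON::XS->new->utf8;
my $pp = JSON::PP->new->utf8;

# Run each decoder for at least one CPU second and print a rate table.
cmpthese(-1, {
    'JSON::XS' => sub { $xs->decode($text) },
    'JSON::PP' => sub { $pp->decode($text) },
});
```

Benchmark is a core module, so nothing beyond the two JSON modules is needed.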
| 1391 |
|
| 1392 |
Using a longer test string (roughly 18KB, generated from Yahoo! Locals |
| 1393 |
search API, available at <http://dist.schmorp.de/misc/json/long.json>): |
| 1394 |
|
| 1395 |
module | encode | decode | |
| 1396 |
--------------|------------|------------| |
| 1397 |
JSON::DWIW/DS | 1647.927 | 2673.916 | |
| 1398 |
JSON::DWIW/FJ | 1630.249 | 2596.128 | |
| 1399 |
JSON::PP | 400.640 | 62.311 | |
| 1400 |
JSON::Syck | 1481.040 | 1524.869 | |
| 1401 |
JSON::XS | 20661.596 | 9541.183 | |
| 1402 |
JSON::XS/2 | 10683.403 | 9416.938 | |
| 1403 |
JSON::XS/3 | 20661.596 | 9400.054 | |
| 1404 |
Storable | 19765.806 | 10000.725 | |
| 1405 |
--------------+------------+------------+ |
| 1406 |
|
| 1407 |
Again, JSON::XS leads by far (except for Storable, which unsurprisingly |
| 1408 |
decodes a bit faster). |
| 1409 |
|
| 1410 |
On large strings containing lots of high Unicode characters, some |
| 1411 |
modules (such as JSON::PC) seem to decode faster than JSON::XS, but the |
| 1412 |
result will be broken due to missing (or wrong) Unicode handling. Others |
| 1413 |
refuse to decode or encode properly, so it was impossible to prepare a |
| 1414 |
fair comparison table for that case. |
| 1415 |
|
| 1416 |
SECURITY CONSIDERATIONS |
| 1417 |
When you are using JSON in a protocol, talking to untrusted potentially |
| 1418 |
hostile creatures requires relatively few measures. |
| 1419 |
|
| 1420 |
First of all, your JSON decoder should be secure, that is, should not |
| 1421 |
have any buffer overflows. Obviously, this module should ensure that and |
| 1422 |
I am trying hard to make that true, but you never know. |
| 1423 |
|
| 1424 |
Second, you need to avoid resource-starving attacks. That means you |
| 1425 |
should limit the size of JSON texts you accept, or make sure that when |
| 1426 |
your resources run out, that's just fine (e.g. by using a separate |
| 1427 |
process that can crash safely). The size of a JSON text in octets or |
| 1428 |
characters is usually a good indication of the size of the resources |
| 1429 |
required to decode it into a Perl structure. While JSON::XS can check |
| 1430 |
the size of the JSON text, it might be too late when you already have it |
| 1431 |
in memory, so you might want to check the size before you accept the |
| 1432 |
string. |
| 1433 |
|
| 1434 |
Third, JSON::XS recurses using the C stack when decoding objects and |
| 1435 |
arrays. The C stack is a limited resource: for instance, on my amd64 |
| 1436 |
machine with 8MB of stack size I can decode around 180k nested arrays |
| 1437 |
but only 14k nested JSON objects (due to perl itself recursing deeply on |
| 1438 |
croak to free the temporary). If that is exceeded, the program crashes. |
| 1439 |
To be conservative, the default nesting limit is set to 512. If your |
| 1440 |
process has a smaller stack, you should adjust this setting accordingly |
| 1441 |
with the "max_depth" method. |
| 1442 |
|
| 1443 |
Something else could bomb you, too, that I forgot to think of. In that |
| 1444 |
case, you get to keep the pieces. I am always open for hints, though... |
| 1445 |
|
| 1446 |
Also keep in mind that JSON::XS might leak contents of your Perl data |
| 1447 |
structures in its error messages, so when you serialise sensitive |
| 1448 |
information you might want to make sure that exceptions thrown by |
| 1449 |
JSON::XS will not end up in front of untrusted eyes. |
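
The measures above could be combined roughly as follows. This is a sketch under assumptions: the 64KB limit, the depth of 32, the helper name decode_untrusted and the error strings are all made up for illustration, while "max_size" and "max_depth" are the JSON::XS methods for these limits.

```perl
use strict;
use warnings;
use JSON::XS;

my $MAX_OCTETS = 64 * 1024;   # assumed limit, pick one that suits your protocol

# max_size and max_depth make the decoder itself croak on oversized or
# too deeply nested input.
my $coder = JSON::XS->new->utf8->max_size($MAX_OCTETS)->max_depth(32);

sub decode_untrusted {
    my ($octets) = @_;

    # Reject oversized input before handing it to the parser.
    die "request too large\n" if length($octets) > $MAX_OCTETS;

    my $data = eval { $coder->decode($octets) };
    if (my $err = $@) {
        warn "JSON decode failed: $err";   # detailed message stays in the local log
        die "malformed request\n";         # generic message for the peer
    }
    return $data;
}
```

Ideally the length check happens while the text is still being read from the network, so that an oversized text never ends up in memory in the first place.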
| 1450 |
|
| 1451 |
If you are using JSON::XS to return packets for consumption by JavaScript |
| 1452 |
scripts in a browser you should have a look at |
| 1453 |
<http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/> |
| 1454 |
to see whether you are vulnerable to some common attack vectors (which |
| 1455 |
really are browser design bugs, but it is still you who will have to |
| 1456 |
deal with it, as major browser developers care only for features, not |
| 1457 |
about getting security right). |
| 1458 |
|
| 1459 |
"OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159) |
| 1460 |
JSON originally required JSON texts to represent an array or object - |
| 1461 |
scalar values were explicitly not allowed. This has changed, and |
| 1462 |
versions of JSON::XS beginning with 4.0 reflect this by allowing scalar |
| 1463 |
values by default. |
| 1464 |
|
| 1465 |
One reason why one might not want this is that this removes a |
| 1466 |
fundamental property of JSON texts, namely that they are self-delimited |
| 1467 |
and self-contained, or in other words, you could take any number of |
| 1468 |
"old" JSON texts and paste them together, and the result would be |
| 1469 |
unambiguously parseable: |
| 1470 |
|
| 1471 |
[1,3]{"k":5}[][null] # four JSON texts, without doubt |
| 1472 |
|
| 1473 |
By allowing scalars, this property is lost: in the following example, is |
| 1474 |
this one JSON text (the number 12) or two JSON texts (the numbers 1 and |
| 1475 |
2): |
| 1476 |
|
| 1477 |
12 # could be 12, or 1 and 2 |
| 1478 |
|
| 1479 |
Another lost property of "old" JSON is that no lookahead is required to |
| 1480 |
know the end of a JSON text, i.e. the JSON text definitely ended at the |
| 1481 |
last "]" or "}" character, there was no need to read extra characters. |
| 1482 |
|
| 1483 |
For example, a viable network protocol with "old" JSON was to simply |
| 1484 |
exchange JSON texts without delimiter. For "new" JSON, you have to use a |
| 1485 |
suitable delimiter (such as a newline) after every JSON text or ensure |
| 1486 |
you never encode/decode scalar values. |
| 1487 |
|
| 1488 |
Most protocols do work by only transferring arrays or objects, and the |
| 1489 |
easiest way to avoid problems with the "new" JSON definition is to |
| 1490 |
explicitly disallow scalar values in your encoder and decoder: |
| 1491 |
|
| 1492 |
$json_coder = JSON::XS->new->allow_nonref (0) |
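
A small sketch of what this buys you, using the sample texts from above: with scalars disallowed, "decode" croaks on a bare number, and concatenated arrays and objects can be split with "incr_parse" in list context without any delimiter.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->allow_nonref(0);

# A scalar JSON text is rejected: decode croaks, so eval returns false.
my $accepted = eval { $coder->decode('12'); 1 };
print "scalar text accepted: ", ($accepted ? "yes" : "no"), "\n";

# Concatenated arrays and objects can be split without any delimiter.
my @texts = $coder->incr_parse('[1,3]{"k":5}[][null]');
print "parsed ", scalar(@texts), " JSON texts\n";
```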
| 1493 |
|
| 1494 |
This is a somewhat unhappy situation, and the blame can fully be put on |
| 1495 |
JSON's inventor, Douglas Crockford, who unilaterally changed the format |
| 1496 |
in 2006 without consulting the IETF, forcing the IETF to either fork the |
| 1497 |
format or go with it (as I was told, the IETF wasn't amused). |
| 1498 |
|
| 1499 |
RELATIONSHIP WITH I-JSON |
| 1500 |
JSON is a somewhat sloppily-defined format - it carries around obvious |
| 1501 |
Javascript baggage, such as not really defining number range, probably |
| 1502 |
because Javascript only has one type of numbers: IEEE 64 bit floats |
| 1503 |
("binary64"). |
| 1504 |
|
| 1505 |
For this reason, RFC7493 defines "Internet JSON", which is a restricted |
| 1506 |
subset of JSON that is supposedly more interoperable on the internet. |
| 1507 |
|
| 1508 |
While "JSON::XS" does not offer specific support for I-JSON, it of |
| 1509 |
course accepts valid I-JSON and by default implements some of the |
| 1510 |
limitations of I-JSON, such as parsing numbers as perl numbers, which |
| 1511 |
are usually a superset of binary64 numbers. |
| 1512 |
|
| 1513 |
To generate I-JSON, follow these rules: |
| 1514 |
|
| 1515 |
* always generate UTF-8 |
| 1516 |
|
| 1517 |
I-JSON must be encoded in UTF-8, the default for "encode_json". |
| 1518 |
|
| 1519 |
* numbers should be within IEEE 754 binary64 range |
| 1520 |
|
| 1521 |
Basically all existing perl installations use binary64 to represent |
| 1522 |
floating point numbers, so all you need to do is to avoid large |
| 1523 |
integers. |
| 1524 |
|
| 1525 |
* objects must not have duplicate keys |
| 1526 |
|
| 1527 |
This is trivially done, as "JSON::XS" does not allow duplicate keys. |
| 1528 |
|
| 1529 |
* do not generate scalar JSON texts, use "->allow_nonref (0)" |
| 1530 |
|
| 1531 |
I-JSON strongly requests you to only encode arrays and objects into |
| 1532 |
JSON. |
| 1533 |
|
| 1534 |
* times should be strings in ISO 8601 format |
| 1535 |
|
| 1536 |
There are a myriad of modules on CPAN dealing with ISO 8601 - search |
| 1537 |
for "ISO8601" on CPAN and use one. |
| 1538 |
|
| 1539 |
* encode binary data as base64 |
| 1540 |
|
| 1541 |
While it's tempting to just dump binary data as a string (and let |
| 1542 |
"JSON::XS" do the escaping), for I-JSON, it's *recommended* to |
| 1543 |
encode binary data as base64. |
| 1544 |
|
| 1545 |
There are some other considerations - read RFC7493 for the details if |
| 1546 |
interested. |
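
The rules above might be followed like this. It is only a sketch: the record layout and field names are invented, the timestamp is a fixed epoch value formatted with POSIX "strftime" rather than a dedicated ISO 8601 module, and "canonical" is enabled merely to make the output order predictable.

```perl
use strict;
use warnings;
use JSON::XS;
use MIME::Base64 qw(encode_base64);
use POSIX qw(strftime);

# UTF-8 output, no scalar JSON texts, sorted keys for stable output.
my $ijson = JSON::XS->new->utf8->canonical->allow_nonref(0);

my $record = {
    id      => 42,                                          # small integer, well within binary64
    created => strftime('%Y-%m-%dT%H:%M:%SZ', gmtime(0)),   # ISO 8601 string
    payload => encode_base64("\x00\x01\xff", ''),           # binary data as base64
};

my $octets = $ijson->encode($record);
print "$octets\n";
```

Since the top-level value is a hash reference, the result is a JSON object, as I-JSON requests.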
| 1547 |
|
| 1548 |
INTEROPERABILITY WITH OTHER MODULES |
| 1549 |
"JSON::XS" uses the Types::Serialiser module to provide boolean |
| 1550 |
constants. That means that the JSON true and false values will be |
| 1551 |
compatible with true and false values of other modules that do the same, |
| 1552 |
such as JSON::PP and CBOR::XS. |
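
In practice this means that booleans can be tested and created through Types::Serialiser directly, as this short sketch (assuming Types::Serialiser is installed alongside JSON::XS) illustrates:

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $data = decode_json('[true,false]');

print "first is true\n"   if Types::Serialiser::is_true($data->[0]);
print "second is false\n" if Types::Serialiser::is_false($data->[1]);

# Booleans obtained from Types::Serialiser encode as JSON true/false.
my $json = encode_json([$Types::Serialiser::true, $Types::Serialiser::false]);
print "$json\n";
```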
| 1553 |
|
| 1554 |
INTEROPERABILITY WITH OTHER JSON DECODERS |
| 1555 |
As long as you only serialise data that can be directly expressed in |
| 1556 |
JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo |
| 1557 |
bugs, but "JSON::XS" has found more bugs in the official JSON testsuite |
| 1558 |
(1) than the official JSON testsuite has found in "JSON::XS" (0)). |
| 1559 |
|
| 1560 |
When you have trouble decoding JSON generated by this module using other |
| 1561 |
decoders, then it is very likely that you have an encoding mismatch or |
| 1562 |
the other decoder is broken. |
| 1563 |
|
| 1564 |
When decoding, "JSON::XS" is strict by default and will likely catch all |
| 1565 |
errors. There are currently two settings that change this: "relaxed" |
| 1566 |
makes "JSON::XS" accept (but not generate) some non-standard extensions, |
| 1567 |
and "allow_tags" will allow you to encode and decode Perl objects, at |
| 1568 |
the cost of not outputting valid JSON anymore. |
| 1569 |
|
| 1570 |
TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS |
| 1571 |
When you use "allow_tags" to use the extended (and also nonstandard and |
| 1572 |
invalid) JSON syntax for serialised objects, and you still want to |
| 1573 |
decode the generated text with a standard JSON decoder, you can run a |
| 1574 |
regex to replace the tagged syntax by standard JSON arrays (it only |
| 1575 |
works for "normal" package names without comma, newlines or single |
| 1576 |
colons). First, the readable Perl version: |
| 1577 |
|
| 1578 |
# if your FREEZE methods return no values, you need this replace first: |
| 1579 |
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx; |
| 1580 |
|
| 1581 |
# this works for non-empty constructor arg lists: |
| 1582 |
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx; |
| 1583 |
|
| 1584 |
And here is a less readable version that is easy to adapt to other |
| 1585 |
languages: |
| 1586 |
|
| 1587 |
$json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g; |
| 1588 |
|
| 1589 |
Here is an ECMAScript version (same regex): |
| 1590 |
|
| 1591 |
json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,"); |
| 1592 |
|
| 1593 |
Since this syntax converts to standard JSON arrays, it might be hard to |
| 1594 |
distinguish serialised objects from normal arrays. You can prepend a |
| 1595 |
"magic number" as first array element to reduce chances of a collision: |
| 1596 |
|
| 1597 |
$json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g; |
| 1598 |
|
| 1599 |
And after decoding the JSON text, you could walk the data structure |
| 1600 |
looking for arrays with a first element of |
| 1601 |
"XU1peReLzT4ggEllLanBYq4G9VzliwKF". |
| 1602 |
|
| 1603 |
The same approach can be used to create the tagged format with another |
| 1604 |
encoder. First, you create an array with the magic string as first |
| 1605 |
member, the classname as second, and constructor arguments last, encode |
| 1606 |
it as part of your JSON structure, and then: |
| 1607 |
|
| 1608 |
$json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g; |
| 1609 |
|
| 1610 |
Again, this has some limitations - the magic string must not be encoded |
| 1611 |
with character escapes, and the constructor arguments must be non-empty. |
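
The two substitutions can be tried end to end as follows. The tagged sample text and the class name My::Object are made up for illustration; the regexes are the ones given above, with the magic string kept in a variable.

```perl
use strict;
use warnings;
use JSON::XS;

my $magic = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# Hypothetical text in the tagged syntax, as produced with allow_tags.
my $tagged = '{"obj":("My::Object")[1,"two"]}';

# Tagged syntax -> standard JSON array with the magic string first.
(my $plain = $tagged) =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["$magic",$1,/g;

# A standard decoder can now read the text.
my $data = JSON::XS->new->decode($plain);
print "found serialised $data->{obj}[1]\n" if $data->{obj}[0] eq $magic;

# Standard JSON array -> tagged syntax again.
(my $back = $plain) =~ s/\[\s*"\Q$magic\E"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
print "$back\n";
```

For this input the second substitution restores the original tagged text.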
| 1612 |
|
| 1613 |
(I-)THREADS |
| 1614 |
This module is *not* guaranteed to be ithread (or MULTIPLICITY-) safe |
| 1615 |
and there are no plans to change this. Note that perl's builtin |
| 1616 |
so-called threads/ithreads are officially deprecated and should not be |
| 1617 |
used. |
| 1618 |
|
| 1619 |
THE PERILS OF SETLOCALE |
| 1620 |
Sometimes people avoid the Perl locale support and directly call the |
| 1621 |
system's setlocale function with "LC_ALL". |
| 1622 |
|
| 1623 |
This breaks both perl and modules such as JSON::XS, as stringification |
| 1624 |
of numbers no longer works correctly (e.g. "$x = 0.1; print "$x"+1" |
| 1625 |
might print 1, and JSON::XS might output illegal JSON as JSON::XS relies |
| 1626 |
on perl to stringify numbers). |
| 1627 |
|
| 1628 |
The solution is simple: don't call "setlocale", or use it for only those |
| 1629 |
categories you need, such as "LC_MESSAGES" or "LC_CTYPE". |
| 1630 |
|
| 1631 |
If you need "LC_NUMERIC", you should enable it only around the code that |
| 1632 |
actually needs it (avoiding stringification of numbers), and restore it |
| 1633 |
afterwards. |
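
One way to scope the change is a small wrapper that switches "LC_NUMERIC" only for the duration of a callback. This is a sketch: with_numeric_locale is a hypothetical helper name, and the callback itself must still avoid stringifying numbers that later go through JSON::XS.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

# Run $code with LC_NUMERIC set to $locale, then restore the old value.
# with_numeric_locale is a hypothetical helper name.
sub with_numeric_locale {
    my ($locale, $code) = @_;

    my $old = setlocale(LC_NUMERIC);          # query the current setting
    setlocale(LC_NUMERIC, $locale);

    my @result = eval { $code->() };
    my $err    = $@;

    setlocale(LC_NUMERIC, $old);              # always restore
    die $err if $err;
    return @result;
}

# Usage: only the callback sees the changed locale.
my @r = with_numeric_locale("C", sub { return 1 + 1 });
```

The helper is meant to be called in list context; the previous locale is restored even when the callback dies.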
| 1634 |
|
| 1635 |
SOME HISTORY |
| 1636 |
At the time this module was created there already were a number of JSON |
| 1637 |
modules available on CPAN, so what was the reason to write yet another |
| 1638 |
JSON module? While it seems there are many JSON modules, none of them |
| 1639 |
correctly handled all corner cases, and in most cases their maintainers |
| 1640 |
are unresponsive, gone missing, or not listening to bug reports for |
| 1641 |
other reasons. |
| 1642 |
|
| 1643 |
Beginning with version 2.0 of the JSON module, when both JSON and |
| 1644 |
JSON::XS are installed, then JSON will use JSON::XS as backend (this can |
| 1645 |
be overridden) with no overhead due to emulation (by inheriting |
| 1646 |
constructor and methods). If JSON::XS is not available, it will fall |
| 1647 |
back to the compatible JSON::PP module as backend, so using JSON instead |
| 1648 |
of JSON::XS gives you a portable JSON API that can be fast when you need |
| 1649 |
it and doesn't require a C compiler when that is a problem. |
| 1650 |
|
| 1651 |
Somewhere around version 3, this module was forked into |
| 1652 |
"Cpanel::JSON::XS", because its maintainer had serious trouble |
| 1653 |
understanding JSON and insisted on a fork with many bugs "fixed" that |
| 1654 |
weren't actually bugs, while spreading FUD about this module without |
| 1655 |
actually giving any details on his accusations. You be the judge, but in |
| 1656 |
my personal opinion, if you want quality, you will stay away from |
| 1657 |
dangerous forks like that. |
| 1658 |
|
| 1659 |
BUGS |
| 1660 |
While the goal of this module is to be correct, that unfortunately does |
| 1661 |
not mean it's bug-free, only that I think its design is bug-free. If you |
| 1662 |
keep reporting bugs they will be fixed swiftly, though. |
| 1663 |
|
| 1664 |
Please refrain from using rt.cpan.org or any other bug reporting |
| 1665 |
service. I put the contact address into my modules for a reason. |
| 1666 |
|
| 1667 |
SEE ALSO |
| 1668 |
The json_xs command line utility for quick experiments. |
| 1669 |
|
| 1670 |
AUTHOR |
| 1671 |
Marc Lehmann <schmorp@schmorp.de> |
| 1672 |
http://home.schmorp.de/ |
| 1673 |
|