
Comparing JSON-XS/README (file contents):
Revision 1.42 by root, Thu Aug 17 03:47:54 2017 UTC vs.
Revision 1.43 by root, Thu Nov 15 23:07:55 2018 UTC

29 29
30DESCRIPTION 30DESCRIPTION
31 This module converts Perl data structures to JSON and vice versa. Its 31 This module converts Perl data structures to JSON and vice versa. Its
32 primary goal is to be *correct* and its secondary goal is to be *fast*. 32 primary goal is to be *correct* and its secondary goal is to be *fast*.
33 To reach the latter goal it was written in C. 33 To reach the latter goal it was written in C.
34
35 Beginning with version 2.0 of the JSON module, when both JSON and
36 JSON::XS are installed, then JSON will fall back on JSON::XS (this can
37 be overridden) with no overhead due to emulation (by inheriting
38 constructor and methods). If JSON::XS is not available, it will fall
39 back to the compatible JSON::PP module as backend, so using JSON instead
40 of JSON::XS gives you a portable JSON API that can be fast when you need
41 it and doesn't require a C compiler when that is a problem.
42
43 As this is the n-th-something JSON module on CPAN, what was the reason
44 to write yet another JSON module? While it seems there are many JSON
45 modules, none of them correctly handle all corner cases, and in most
46 cases their maintainers are unresponsive, gone missing, or not listening
47 to bug reports for other reasons.
48 34
49 See MAPPING, below, on how JSON::XS maps perl values to JSON values and 35 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
50 vice versa. 36 vice versa.
51 37
52 FEATURES 38 FEATURES
103 $json_text = JSON::XS->new->utf8->encode ($perl_scalar) 89 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
104 90
105 Except being faster. 91 Except being faster.
106 92
107 $perl_scalar = decode_json $json_text 93 $perl_scalar = decode_json $json_text
108 The opposite of "encode_json": expects an UTF-8 (binary) string and 94 The opposite of "encode_json": expects a UTF-8 (binary) string and
109 tries to parse that as an UTF-8 encoded JSON text, returning the 95 tries to parse that as a UTF-8 encoded JSON text, returning the
110 resulting reference. Croaks on error. 96 resulting reference. Croaks on error.
111 97
112 This function call is functionally identical to: 98 This function call is functionally identical to:
113 99
114 $perl_scalar = JSON::XS->new->utf8->decode ($json_text) 100 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
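A minimal round-trip sketch of the functional interface described above, assuming JSON::XS is installed. The variable names and sample data are illustrative only and do not come from the README.

```perl
use strict;
use warnings;
use JSON::XS;

# Illustrative data: a character string with a non-ASCII character and a list.
my $data = { name => "caf\x{e9}", list => [1, 2, 3] };

# encode_json returns a UTF-8 encoded (binary) string.
my $json_text = encode_json $data;

# decode_json expects such a UTF-8 string and returns a reference.
my $copy = decode_json $json_text;

# The equivalent object-oriented call.
my $same = JSON::XS->new->utf8->decode ($json_text);

# decode_json croaks on malformed input, so wrap untrusted input in eval.
my $ok = eval { decode_json '{"unterminated"'; 1 };
print "decode failed: $@" unless $ok;
```

Wrapping the call in "eval" is one common way to turn the croak into an error value; whether to do so depends on the application.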
158 The object oriented interface lets you configure your own encoding or 144 The object oriented interface lets you configure your own encoding or
159 decoding style, within the limits of supported formats. 145 decoding style, within the limits of supported formats.
160 146
161 $json = new JSON::XS 147 $json = new JSON::XS
162 Creates a new JSON::XS object that can be used to de/encode JSON 148 Creates a new JSON::XS object that can be used to de/encode JSON
163 strings. All boolean flags described below are by default 149 strings. All boolean flags described below are by default *disabled*
164 *disabled*. 150 (with the exception of "allow_nonref", which defaults to *enabled*
151 since version 4.0).
165 152
166 The mutators for flags all return the JSON object again and thus 153 The mutators for flags all return the JSON object again and thus
167 calls can be chained: 154 calls can be chained:
168 155
169 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]}) 156 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
225 212
226 $json = $json->utf8 ([$enable]) 213 $json = $json->utf8 ([$enable])
227 $enabled = $json->get_utf8 214 $enabled = $json->get_utf8
228 If $enable is true (or missing), then the "encode" method will 215 If $enable is true (or missing), then the "encode" method will
229 encode the JSON result into UTF-8, as required by many protocols, 216 encode the JSON result into UTF-8, as required by many protocols,
230 while the "decode" method expects to be handled an UTF-8-encoded 217 while the "decode" method expects to be handed a UTF-8-encoded
231 string. Please note that UTF-8-encoded strings do not contain any 218 string. Please note that UTF-8-encoded strings do not contain any
232 characters outside the range 0..255, they are thus useful for 219 characters outside the range 0..255, they are thus useful for
233 bytewise/binary I/O. In future versions, enabling this option might 220 bytewise/binary I/O. In future versions, enabling this option might
234 enable autodetection of the UTF-16 and UTF-32 encoding families, as 221 enable autodetection of the UTF-16 and UTF-32 encoding families, as
235 described in RFC4627. 222 described in RFC4627.
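The effect of the "utf8" flag can be seen by comparing string lengths. This is a small sketch, assuming JSON::XS is installed; the euro sign is only an illustrative character above 255.

```perl
use strict;
use warnings;
use JSON::XS;

my $euro = "\x{20ac}";   # one character, three octets in UTF-8

# utf8 enabled: the result is an octet string, ready for binary I/O.
my $bytes = JSON::XS->new->utf8->encode ([$euro]);

# utf8 disabled: the result is a Perl character (Unicode) string.
my $chars = JSON::XS->new->encode ([$euro]);

print length ($bytes), "\n";   # 7: [ " 3 octets " ]
print length ($chars), "\n";   # 5: [ " 1 character " ]

# Decoding with the same flag restores the original character.
my $back = JSON::XS->new->utf8->decode ($bytes);
```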
314 301
315 $json = $json->relaxed ([$enable]) 302 $json = $json->relaxed ([$enable])
316 $enabled = $json->get_relaxed 303 $enabled = $json->get_relaxed
317 If $enable is true (or missing), then "decode" will accept some 304 If $enable is true (or missing), then "decode" will accept some
318 extensions to normal JSON syntax (see below). "encode" will not be 305 extensions to normal JSON syntax (see below). "encode" will not be
319 affected in anyway. *Be aware that this option makes you accept 306 affected in any way. *Be aware that this option makes you accept
320 invalid JSON texts as if they were valid!*. I suggest only to use 307 invalid JSON texts as if they were valid!*. I suggest only to use
321 this option to parse application-specific files written by humans 308 this option to parse application-specific files written by humans
322 (configuration files, resource files etc.) 309 (configuration files, resource files etc.)
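As a hedged illustration of "relaxed", the sketch below parses a hand-written configuration fragment that uses two of the extensions the full JSON::XS documentation lists (a trailing comma and shell-style "#" comments). The fragment itself is made up for this example.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical human-written configuration text, not valid strict JSON.
my $config_text = <<'EOT';
[
   1, # comments are accepted in relaxed mode
   2, # and so is a trailing comma
]
EOT

# Relaxed decoding accepts the text.
my $config = JSON::XS->new->relaxed->decode ($config_text);

# A default (strict) decoder croaks on the same input.
my $strict_ok = eval { JSON::XS->new->decode ($config_text); 1 };
print "strict decode rejected the text\n" unless $strict_ok;
```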
323 310
324 If $enable is false (the default), then "decode" will only accept 311 If $enable is false (the default), then "decode" will only accept
385 372
386 This setting has currently no effect on tied hashes. 373 This setting has currently no effect on tied hashes.
387 374
388 $json = $json->allow_nonref ([$enable]) 375 $json = $json->allow_nonref ([$enable])
389 $enabled = $json->get_allow_nonref 376 $enabled = $json->get_allow_nonref
377 Unlike other boolean options, this option is enabled by default
378 beginning with version 4.0. See "SECURITY CONSIDERATIONS" for the
379 gory details.
380
390 If $enable is true (or missing), then the "encode" method can 381 If $enable is true (or missing), then the "encode" method can
391 convert a non-reference into its corresponding string, number or 382 convert a non-reference into its corresponding string, number or
392 null JSON value, which is an extension to RFC4627. Likewise, 383 null JSON value, which is an extension to RFC4627. Likewise,
393 "decode" will accept those JSON values instead of croaking. 384 "decode" will accept those JSON values instead of croaking.
394 385
395 If $enable is false, then the "encode" method will croak if it isn't 386 If $enable is false, then the "encode" method will croak if it isn't
396 passed an arrayref or hashref, as JSON texts must either be an 387 passed an arrayref or hashref, as JSON texts must either be an
397 object or array. Likewise, "decode" will croak if given something 388 object or array. Likewise, "decode" will croak if given something
398 that is not a JSON object or array. 389 that is not a JSON object or array.
399 390
400 Example, encode a Perl scalar as JSON value with enabled 391 Example, encode a Perl scalar as JSON value without enabled
401 "allow_nonref", resulting in an invalid JSON text: 392 "allow_nonref", resulting in an error:
402 393
403 JSON::XS->new->allow_nonref->encode ("Hello, World!") 394 JSON::XS->new->allow_nonref (0)->encode ("Hello, World!")
404 => "Hello, World!" 395 => hash- or arrayref expected...
405 396
406 $json = $json->allow_unknown ([$enable]) 397 $json = $json->allow_unknown ([$enable])
407 $enabled = $json->get_allow_unknown 398 $enabled = $json->get_allow_unknown
408 If $enable is true (or missing), then "encode" will *not* throw an 399 If $enable is true (or missing), then "encode" will *not* throw an
409 exception when it encounters values it cannot represent in JSON (for 400 exception when it encounters values it cannot represent in JSON (for
455 this type of conversion. 446 this type of conversion.
456 447
457 This setting has no effect on "decode". 448 This setting has no effect on "decode".
458 449
459 $json = $json->allow_tags ([$enable]) 450 $json = $json->allow_tags ([$enable])
460 $enabled = $json->allow_tags 451 $enabled = $json->get_allow_tags
461 See "OBJECT SERIALISATION" for details. 452 See "OBJECT SERIALISATION" for details.
462 453
463 If $enable is true (or missing), then "encode", upon encountering a 454 If $enable is true (or missing), then "encode", upon encountering a
464 blessed object, will check for the availability of the "FREEZE" 455 blessed object, will check for the availability of the "FREEZE"
465 method on the object's class. If found, it will be used to serialise 456 method on the object's class. If found, it will be used to serialise
471 462
472 If $enable is false (the default), then "encode" will not consider 463 If $enable is false (the default), then "encode" will not consider
473 this type of conversion, and tagged JSON values will cause a parse 464 this type of conversion, and tagged JSON values will cause a parse
474 error in "decode", as if tags were not part of the grammar. 465 error in "decode", as if tags were not part of the grammar.
475 466
467 $json->boolean_values ([$false, $true])
468 ($false, $true) = $json->get_boolean_values
469 By default, JSON booleans will be decoded as overloaded
470 $Types::Serialiser::false and $Types::Serialiser::true objects.
471
472 With this method you can specify your own boolean values for
473 decoding - on decode, JSON "false" will be decoded as a copy of
474 $false, and JSON "true" will be decoded as $true ("copy" here is the
475 same thing as assigning a value to another variable, i.e. "$copy =
476 $false").
477
478 Calling this method without any arguments will reset the booleans to
479 their default values.
480
481 "get_boolean_values" will return both $false and $true values, or
482 the empty list when they are set to the default.
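A short sketch of "boolean_values", assuming JSON::XS 4.0 or later (the version that introduced this method). Using plain 0 and 1 as replacement values is just one possible choice.

```perl
use strict;
use warnings;
use JSON::XS 4.0;

my $json = JSON::XS->new;

# Decode JSON false/true as plain 0/1 (note the order: false first).
$json->boolean_values (0, 1);
my $flags  = $json->decode ('[true, false]');
my @custom = $json->get_boolean_values;    # two values while custom ones are set

# Calling it without arguments restores the default boolean objects.
$json->boolean_values;
my @default = $json->get_boolean_values;   # empty list again
```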
483
476 $json = $json->filter_json_object ([$coderef->($hashref)]) 484 $json = $json->filter_json_object ([$coderef->($hashref)])
477 When $coderef is specified, it will be called from "decode" each 485 When $coderef is specified, it will be called from "decode" each
478 time it decodes a JSON object. The only argument is a reference to 486 time it decodes a JSON object. The only argument is a reference to
479 the newly-created hash. If the code references returns a single 487 the newly-created hash. If the code reference returns a single
480 scalar (which need not be a reference), this value (i.e. a copy of 488 scalar (which need not be a reference), this value (or rather a copy
481 that scalar to avoid aliasing) is inserted into the deserialised 489 of it) is inserted into the deserialised data structure. If it
482 data structure. If it returns an empty list (NOTE: *not* "undef", 490 returns an empty list (NOTE: *not* "undef", which is a valid
483 which is a valid scalar), the original deserialised hash will be 491 scalar), the original deserialised hash will be inserted. This
484 inserted. This setting can slow down decoding considerably. 492 setting can slow down decoding considerably.
485 493
486 When $coderef is omitted or undefined, any existing callback will be 494 When $coderef is omitted or undefined, any existing callback will be
487 removed and "decode" will not change the deserialised hash in any 495 removed and "decode" will not change the deserialised hash in any
488 way. 496 way.
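The callback contract above (return one scalar to replace the hash, or an empty list to keep it) can be sketched as follows. The "__point__" marker key and the x/y fields are hypothetical names chosen for this example only.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->filter_json_object (sub {
   my ($hash) = @_;

   # Replace marked objects by an array ref, keep all others unchanged
   # by returning the empty list (not undef).
   return exists $hash->{__point__}
      ? [ $hash->{x}, $hash->{y} ]
      : ();
});

my $data = $json->decode ('[{"__point__":1,"x":3,"y":4},{"plain":true}]');

print ref $data->[0], "\n";   # ARRAY
print ref $data->[1], "\n";   # HASH
```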
489 497
724 This is useful if you want to repeatedly parse JSON objects and want 732 This is useful if you want to repeatedly parse JSON objects and want
725 to ignore any trailing data, which means you have to reset the 733 to ignore any trailing data, which means you have to reset the
726 parser after each successful decode. 734 parser after each successful decode.
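A minimal sketch of that pattern, using the "incr_parse" and "incr_reset" methods of the incremental interface. The input chunks are invented for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
my @decoded;

# Each chunk starts with one JSON text followed by data we want to ignore.
for my $chunk ('[1,2] ignored tail', '{"a":3} more junk') {
   # Scalar context: extract at most one JSON text from the buffer.
   my $value = $json->incr_parse ($chunk);
   push @decoded, $value;

   # Throw away the unparsed remainder before feeding the next chunk.
   $json->incr_reset;
}
```

Without the reset, the leftover text would stay in the buffer and be parsed together with the next chunk.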
727 735
728 LIMITATIONS 736 LIMITATIONS
729 All options that affect decoding are supported, except "allow_nonref". 737 The incremental parser is a non-exact parser: it works by gathering as
730 The reason for this is that it cannot be made to work sensibly: JSON 738 much text as possible that *could* be a valid JSON text, followed by
731 objects and arrays are self-delimited, i.e. you can concatenate them 739 trying to decode it.
732 back to back and still decode them perfectly. This does not hold true
733 for JSON numbers, however.
734 740
735 For example, is the string 1 a single JSON number, or is it simply the 741 That means it sometimes needs to read more data than strictly necessary
736 start of 12? Or is 12 a single JSON number, or the concatenation of 1 742 to diagnose an invalid JSON text. For example, after parsing the
737 and 2? In neither case you can tell, and this is why JSON::XS takes the 743 following fragment, the parser *could* stop with an error, as this
738 conservative route and disallows this case. 744 fragment *cannot* be the beginning of a valid JSON text:
745
746 [,
747
748 In reality, however, the parser might continue to read data until a
749 length limit is exceeded or it finds a closing bracket.
739 750
740 EXAMPLES 751 EXAMPLES
741 Some examples will make all this clearer. First, a simple example that 752 Some examples will make all this clearer. First, a simple example that
742 works similarly to "decode_prefix": We want to decode the JSON object at 753 works similarly to "decode_prefix": We want to decode the JSON object at
743 the start of a string and identify the portion after the JSON object: 754 the start of a string and identify the portion after the JSON object:
1176 will expect your input strings to be encoded as UTF-8, that is, no 1187 will expect your input strings to be encoded as UTF-8, that is, no
1177 "character" of the input string must have any value > 255, as UTF-8 1188 "character" of the input string must have any value > 255, as UTF-8
1178 does not allow that. 1189 does not allow that.
1179 1190
1180 The "utf8" flag therefore switches between two modes: disabled means 1191 The "utf8" flag therefore switches between two modes: disabled means
1181 you will get a Unicode string in Perl, enabled means you get an 1192 you will get a Unicode string in Perl, enabled means you get a UTF-8
1182 UTF-8 encoded octet/binary string in Perl. 1193 encoded octet/binary string in Perl.
1183 1194
1184 "latin1" or "ascii" flags enabled 1195 "latin1" or "ascii" flags enabled
1185 With "latin1" (or "ascii") enabled, "encode" will escape characters 1196 With "latin1" (or "ascii") enabled, "encode" will escape characters
1186 with ordinal values > 255 (> 127 with "ascii") and encode the 1197 with ordinal values > 255 (> 127 with "ascii") and encode the
1187 remaining characters as specified by the "utf8" flag. 1198 remaining characters as specified by the "utf8" flag.
1443 to see whether you are vulnerable to some common attack vectors (which 1454 to see whether you are vulnerable to some common attack vectors (which
1444 really are browser design bugs, but it is still you who will have to 1455 really are browser design bugs, but it is still you who will have to
1445 deal with it, as major browser developers care only for features, not 1456 deal with it, as major browser developers care only for features, not
1446 about getting security right). 1457 about getting security right).
1447 1458
1448"OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159) 1459 "OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159)
1449 TL;DR: Due to security concerns, JSON::XS will not allow scalar data in 1460 JSON originally required JSON texts to represent an array or object -
1450 JSON texts by default - you need to create your own JSON::XS object and 1461 scalar values were explicitly not allowed. This has changed, and
1451 enable "allow_nonref": 1462 versions of JSON::XS beginning with 4.0 reflect this by allowing scalar
1463 values by default.
1452 1464
1465 One reason why one might not want this is that this removes a
1466 fundamental property of JSON texts, namely that they are self-delimited
1467 and self-contained, or in other words, you could take any number of
1468 "old" JSON texts and paste them together, and the result would be
1469 unambiguously parseable:
1470
1471 [1,3]{"k":5}[][null] # four JSON texts, without doubt
1472
1473 By allowing scalars, this property is lost: in the following example, is
1474 this one JSON text (the number 12) or two JSON texts (the numbers 1 and
1475 2):
1476
1477 12 # could be 12, or 1 and 2
1478
1479 Another lost property of "old" JSON is that no lookahead is required to
1480 know the end of a JSON text, i.e. the JSON text definitely ended at the
1481 last "]" or "}" character, there was no need to read extra characters.
1482
1483 For example, a viable network protocol with "old" JSON was to simply
1484 exchange JSON texts without delimiter. For "new" JSON, you have to use a
1485 suitable delimiter (such as a newline) after every JSON text or ensure
1486 you never encode/decode scalar values.
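One way to apply the delimiter advice is sketched below: every JSON text is followed by a newline, which is safe because the default (non-pretty) encoder escapes newlines inside strings. The message values are illustrative, and "allow_nonref" is enabled explicitly so the sketch does not depend on the version-specific default.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->utf8->allow_nonref;

# Sender: one JSON text per line.
my @messages = (10, 1000, "text", [1, 2]);
my $stream   = join "", map { $coder->encode ($_) . "\n" } @messages;

# Receiver: split on the delimiter, then decode each text on its own.
my @received = map { $coder->decode ($_) } split /\n/, $stream;

print scalar @received, "\n";   # 4, the numbers 10 and 1000 stay separate
```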
1487
1488 Most protocols do work by only transferring arrays or objects, and the
1489 easiest way to avoid problems with the "new" JSON definition is to
1490 explicitly disallow scalar values in your encoder and decoder:
1491
1453 my $json = JSON::XS->new->allow_nonref; 1492 $json_coder = JSON::XS->new->allow_nonref (0)
1454 1493
1455 $text = $json->encode ($data); 1494 This is a somewhat unhappy situation, and the blame can fully be put on
1456 $data = $json->decode ($text);
1457
1458 The long version: JSON being an important and supposedly stable format,
1459 the IETF standardised it as RFC 4627 in 2006. Unfortunately, the
1460 inventor of JSON, Dougles Crockford, unilaterally changed the definition 1495 JSON's inventor, Douglas Crockford, who unilaterally changed the format
1461 of JSON in javascript. Rather than create a fork, the IETF decided to 1496 in 2006 without consulting the IETF, forcing the IETF to either fork the
1462 standardise the new syntax (apparently, so Iw as told, without finding 1497 format or go with it (as I was told, the IETF wasn't amused).
1463 it very amusing).
1464 1498
1465 The biggest difference between thed original JSON and the new JSON is 1499RELATIONSHIP WITH I-JSON
1466 that the new JSON supports scalars (anything other than arrays and 1500 JSON is a somewhat sloppily-defined format - it carries around obvious
1467 objects) at the toplevel of a JSON text. While this is strictly 1501 Javascript baggage, such as not really defining number range, probably
1468 backwards compatible to older versions, it breaks a number of protocols 1502 because Javascript only has one type of numbers: IEEE 64 bit floats
1469 that relied on sending JSON back-to-back, and is a minor security 1503 ("binary64").
1470 concern.
1471 1504
1472 For example, imagine you have two banks communicating, and on one side, 1505 For this reason, RFC7493 defines "Internet JSON", which is a restricted
1473 trhe JSON coder gets upgraded. Two messages, such as 10 and 1000 might 1506 subset of JSON that is supposedly more interoperable on the internet.
1474 then be confused to mean 101000, something that couldn't happen in the
1475 original JSON, because niether of these messages would be valid JSON.
1476 1507
1477 If one side accepts these messages, then an upgrade in the coder on 1508 While "JSON::XS" does not offer specific support for I-JSON, it of
1478 either side could result in this becoming exploitable. 1509 course accepts valid I-JSON and by default implements some of the
1510 limitations of I-JSON, such as parsing numbers as perl numbers, which
1511 are usually a superset of binary64 numbers.
1479 1512
1480 This module has always allowed these messages as an optional extension, 1513 To generate I-JSON, follow these rules:
1481 by default disabled. The security concerns are the reason why the 1514
1482 default is still disabled, but future versions might/will likely upgrade 1515 * always generate UTF-8
1483 to the newer RFC as default format, so you are advised to check your 1516
1484 implementation and/or override the default with "->allow_nonref (0)" to 1517 I-JSON must be encoded in UTF-8, the default for "encode_json".
1485 ensure that future versions are safe. 1518
1519 * numbers should be within IEEE 754 binary64 range
1520
1521 Basically all existing perl installations use binary64 to represent
1522 floating point numbers, so all you need to do is to avoid large
1523 integers.
1524
1525 * objects must not have duplicate keys
1526
1527 This is trivially done, as "JSON::XS" does not allow duplicate keys.
1528
1529 * do not generate scalar JSON texts, use "->allow_nonref (0)"
1530
1531 I-JSON strongly requests you to only encode arrays and objects into
1532 JSON.
1533
1534 * times should be strings in ISO 8601 format
1535
1536 There are a myriad of modules on CPAN dealing with ISO 8601 - search
1537 for "ISO8601" on CPAN and use one.
1538
1539 * encode binary data as base64
1540
1541 While it's tempting to just dump binary data as a string (and let
1542 "JSON::XS" do the escaping), for I-JSON, it's *recommended* to
1543 encode binary data as base64.
1544
1545 There are some other considerations - read RFC7493 for the details if
1546 interested.
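The rules above could be combined roughly as follows. This is a sketch under stated assumptions, not a complete I-JSON validator: the record layout and field names are hypothetical, the core modules MIME::Base64 and POSIX stand in for whichever base64 and ISO 8601 modules you prefer, and "canonical" is used only to make the output order predictable.

```perl
use strict;
use warnings;
use JSON::XS;
use MIME::Base64 qw(encode_base64);
use POSIX qw(strftime);

# UTF-8 output, no scalar JSON texts, sorted keys.
my $coder = JSON::XS->new->utf8->allow_nonref (0)->canonical;

my $record = {
   # times as ISO 8601 strings (here: the epoch, in UTC)
   created => strftime ("%Y-%m-%dT%H:%M:%SZ", gmtime (0)),
   # binary data as base64 (second argument "" suppresses line breaks)
   blob    => encode_base64 ("\x00\xff\x10", ""),
};

my $text = $coder->encode ($record);
print "$text\n";   # {"blob":"AP8Q","created":"1970-01-01T00:00:00Z"}

# Scalar JSON texts are refused by this coder.
my $scalar_ok = eval { $coder->encode ("just a string"); 1 };
```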
1486 1547
1487INTEROPERABILITY WITH OTHER MODULES 1548INTEROPERABILITY WITH OTHER MODULES
1488 "JSON::XS" uses the Types::Serialiser module to provide boolean 1549 "JSON::XS" uses the Types::Serialiser module to provide boolean
1489 constants. That means that the JSON true and false values will be 1550 constants. That means that the JSON true and false values will be
1490 compatible to true and false values of other modules that do the same, 1551 compatible to true and false values of other modules that do the same,
1547 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g; 1608 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
1548 1609
1549 Again, this has some limitations - the magic string must not be encoded 1610 Again, this has some limitations - the magic string must not be encoded
1550 with character escapes, and the constructor arguments must be non-empty. 1611 with character escapes, and the constructor arguments must be non-empty.
1551 1612
1552RFC7159
1553 Since this module was written, Google has written a new JSON RFC, RFC
1554 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with
1555 both the original JSON specification on www.json.org and RFC4627.
1556
1557 As far as I can see, you can get partial compatibility when parsing by
1558 using "->allow_nonref". However, consider the security implications of
1559 doing so.
1560
1561 I haven't decided yet when to break compatibility with RFC4627 by
1562 default (and potentially leave applications insecure) and change the
1563 default to follow RFC7159, but application authors are well advised to
1564 call "->allow_nonref(0)" even if this is the current default, if they
1565 cannot handle non-reference values, in preparation for the day when the
1566 default will change.
1567
1568(I-)THREADS 1613(I-)THREADS
1569 This module is *not* guaranteed to be ithread (or MULTIPLICITY-) safe 1614 This module is *not* guaranteed to be ithread (or MULTIPLICITY-) safe
1570 and there are no plans to change this. Note that perl's builtin 1615 and there are no plans to change this. Note that perl's builtin
1571 so-called theeads/ithreads are officially deprecated and should not be 1616 so-called threads/ithreads are officially deprecated and should not be
1572 used. 1617 used.
1573 1618
1574THE PERILS OF SETLOCALE 1619THE PERILS OF SETLOCALE
1575 Sometimes people avoid the Perl locale support and directly call the 1620 Sometimes people avoid the Perl locale support and directly call the
1576 system's setlocale function with "LC_ALL". 1621 system's setlocale function with "LC_ALL".
1585 1630
1586 If you need "LC_NUMERIC", you should enable it only around the code that 1631 If you need "LC_NUMERIC", you should enable it only around the code that
1587 actually needs it (avoiding stringification of numbers), and restore it 1632 actually needs it (avoiding stringification of numbers), and restore it
1588 afterwards. 1633 afterwards.
1589 1634
1635SOME HISTORY
1636 At the time this module was created there already were a number of JSON
1637 modules available on CPAN, so what was the reason to write yet another
1638 JSON module? While it seems there are many JSON modules, none of them
1639 correctly handled all corner cases, and in most cases their maintainers
1640 are unresponsive, gone missing, or not listening to bug reports for
1641 other reasons.
1642
1643 Beginning with version 2.0 of the JSON module, when both JSON and
1644 JSON::XS are installed, then JSON will fall back on JSON::XS (this can
1645 be overridden) with no overhead due to emulation (by inheriting
1646 constructor and methods). If JSON::XS is not available, it will fall
1647 back to the compatible JSON::PP module as backend, so using JSON instead
1648 of JSON::XS gives you a portable JSON API that can be fast when you need
1649 it and doesn't require a C compiler when that is a problem.
1650
1651 Somewhere around version 3, this module was forked into
1652 "Cpanel::JSON::XS", because its maintainer had serious trouble
1653 understanding JSON and insisted on a fork with many bugs "fixed" that
1654 weren't actually bugs, while spreading FUD about this module without
1655 actually giving any details on his accusations. You be the judge, but in
1656 my personal opinion, if you want quality, you will stay away from
1657 dangerous forks like that.
1658
1590BUGS 1659BUGS
1591 While the goal of this module is to be correct, that unfortunately does 1660 While the goal of this module is to be correct, that unfortunately does
1592 not mean it's bug-free, only that I think its design is bug-free. If you 1661 not mean it's bug-free, only that I think its design is bug-free. If you
1593 keep reporting bugs they will be fixed swiftly, though. 1662 keep reporting bugs they will be fixed swiftly, though.
1594 1663
