… | |
… | |
350 | white-space and comments are allowed. |
350 | white-space and comments are allowed. |
351 | |
351 | |
352 | [ |
352 | [ |
353 | 1, # this comment not allowed in JSON |
353 | 1, # this comment not allowed in JSON |
354 | # neither this one... |
354 | # neither this one... |
|
|
355 | ] |
|
|
356 | |
|
|
357 | * literal ASCII TAB characters in strings |
|
|
358 | |
|
|
359 | Literal ASCII TAB characters are now allowed in strings (and |
|
|
360 | treated as "\t"). |
|
|
361 | |
|
|
362 | [ |
|
|
363 | "Hello\tWorld", |
|
|
364 | "Hello<TAB>World", # literal <TAB> would not normally be allowed |
355 | ] |
365 | ] |
356 | |
366 | |
357 | $json = $json->canonical ([$enable]) |
367 | $json = $json->canonical ([$enable]) |
358 | $enabled = $json->get_canonical |
368 | $enabled = $json->get_canonical |
359 | If $enable is true (or missing), then the "encode" method will |
369 | If $enable is true (or missing), then the "encode" method will |
… | |
… | |
622 | |
632 | |
623 | This is useful if your JSON texts are not delimited by an outer |
633 | This is useful if your JSON texts are not delimited by an outer |
624 | protocol and you need to know where the JSON text ends. |
634 | protocol and you need to know where the JSON text ends. |
625 | |
635 | |
626 | JSON::XS->new->decode_prefix ("[1] the tail") |
636 | JSON::XS->new->decode_prefix ("[1] the tail") |
627 | => ([], 3) |
637 | => ([1], 3) |
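A minimal sketch of how the returned character count might be used to split off the trailing text. The variable names are illustrative only and are not part of the module.

```perl
use strict;
use warnings;
use JSON::XS;

my $text = "[1] the tail";

# decode_prefix returns the decoded data plus the number of
# characters of $text that made up the JSON text
my ($data, $consumed) = JSON::XS->new->decode_prefix ($text);

# whatever follows the consumed prefix is the non-JSON remainder
my $tail = substr $text, $consumed;
```

Here $tail holds the part of the string that the decoder did not look at.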
628 | |
638 | |
629 | INCREMENTAL PARSING |
639 | INCREMENTAL PARSING |
630 | In some cases, there is the need for incremental parsing of JSON texts. |
640 | In some cases, there is the need for incremental parsing of JSON texts. |
631 | While this module always has to keep both JSON text and resulting Perl |
641 | While this module always has to keep both JSON text and resulting Perl |
632 | data structure in memory at one time, it does allow you to parse a JSON |
642 | data structure in memory at one time, it does allow you to parse a JSON |
… | |
… | |
666 | can then use "incr_skip" to skip the erroneous part). This is the |
676 | can then use "incr_skip" to skip the erroneous part). This is the |
667 | most common way of using the method. |
677 | most common way of using the method. |
668 | |
678 | |
669 | And finally, in list context, it will try to extract as many objects |
679 | And finally, in list context, it will try to extract as many objects |
670 | from the stream as it can find and return them, or the empty list |
680 | from the stream as it can find and return them, or the empty list |
671 | otherwise. For this to work, there must be no separators between the |
681 | otherwise. For this to work, there must be no separators (other than |
672 | JSON objects or arrays, instead they must be concatenated |
682 | whitespace) between the JSON objects or arrays, instead they must be |
673 | back-to-back. If an error occurs, an exception will be raised as in |
683 | concatenated back-to-back. If an error occurs, an exception will be |
674 | the scalar context case. Note that in this case, any |
684 | raised as in the scalar context case. Note that in this case, any |
675 | previously-parsed JSON texts will be lost. |
685 | previously-parsed JSON texts will be lost. |
676 | |
686 | |
677 | Example: Parse some JSON arrays/objects in a given string and return |
687 | Example: Parse some JSON arrays/objects in a given string and return |
678 | them. |
688 | them. |
679 | |
689 | |
… | |
… | |
687 | function (I mean it. although in simple tests it might actually |
697 | function (I mean it. although in simple tests it might actually |
688 | work, it *will* fail under real world conditions). As a special |
698 | work, it *will* fail under real world conditions). As a special |
689 | exception, you can also call this method before having parsed |
699 | exception, you can also call this method before having parsed |
690 | anything. |
700 | anything. |
691 | |
701 | |
|
|
702 | That means you can only use this function to look at or manipulate |
|
|
703 | text before or after complete JSON objects, not while the parser is |
|
|
704 | in the middle of parsing a JSON object. |
|
|
705 | |
692 | This function is useful in two cases: a) finding the trailing text |
706 | This function is useful in two cases: a) finding the trailing text |
693 | after a JSON object or b) parsing multiple JSON objects separated by |
707 | after a JSON object or b) parsing multiple JSON objects separated by |
694 | non-JSON text (such as commas). |
708 | non-JSON text (such as commas). |
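As a rough sketch of case b), the loop below extracts arrays that are separated by commas, removing each comma via "incr_text" only after a complete JSON text has been returned. The input string and the @items array are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $text = "[1],[2], [3]";
my $json = JSON::XS->new;

# void context: the text is only accumulated, nothing is parsed yet
$json->incr_parse ($text);

my @items;

# scalar context: at most one JSON text is extracted per call,
# so incr_text may be used between the calls
while (my $obj = $json->incr_parse) {
   push @items, $obj;

   # skip the optional comma before the next JSON text
   $json->incr_text =~ s/^ \s* , //x;
}
```

The substitution runs only between complete JSON texts, which is the only time "incr_text" may be called.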
695 | |
709 | |
696 | $json->incr_skip |
710 | $json->incr_skip |
… | |
… | |
1040 | The "FREEZE" method can return any number of values (i.e. zero or |
1054 | The "FREEZE" method can return any number of values (i.e. zero or |
1041 | more). These values and the package/classname of the object will |
1055 | more). These values and the package/classname of the object will |
1042 | then be encoded as a tagged JSON value in the following format: |
1056 | then be encoded as a tagged JSON value in the following format: |
1043 | |
1057 | |
1044 | ("classname")[FREEZE return values...] |
1058 | ("classname")[FREEZE return values...] |
|
|
1059 | |
|
|
1060 | e.g.: |
|
|
1061 | |
|
|
1062 | ("URI")["http://www.google.com/"] |
|
|
1063 | ("MyDate")[2013,10,29] |
|
|
1064 | ("ImageData::JPEG")["Z3...VlCg=="] |
1045 | |
1065 | |
1046 | For example, the hypothetical "My::Object" "FREEZE" method might use |
1066 | For example, the hypothetical "My::Object" "FREEZE" method might use |
1047 | the object's "type" and "id" members to encode the object: |
1067 | the object's "type" and "id" members to encode the object: |
1048 | |
1068 | |
1049 | sub My::Object::FREEZE { |
1069 | sub My::Object::FREEZE { |
… | |
… | |
1423 | to see whether you are vulnerable to some common attack vectors (which |
1443 | to see whether you are vulnerable to some common attack vectors (which |
1424 | really are browser design bugs, but it is still you who will have to |
1444 | really are browser design bugs, but it is still you who will have to |
1425 | deal with it, as major browser developers care only for features, not |
1445 | deal with it, as major browser developers care only for features, not |
1426 | about getting security right). |
1446 | about getting security right). |
1427 | |
1447 | |
|
|
1448 | "OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159) |
|
|
1449 | TL;DR: Due to security concerns, JSON::XS will not allow scalar data in |
|
|
1450 | JSON texts by default - you need to create your own JSON::XS object and |
|
|
1451 | enable "allow_nonref": |
|
|
1452 | |
|
|
1453 | my $json = JSON::XS->new->allow_nonref; |
|
|
1454 | |
|
|
1455 | $text = $json->encode ($data); |
|
|
1456 | $data = $json->decode ($text); |
|
|
1457 | |
|
|
1458 | The long version: JSON being an important and supposedly stable format, |
|
|
1459 | the IETF standardised it as RFC 4627 in 2006. Unfortunately, the |
|
|
1460 | inventor of JSON, Douglas Crockford, unilaterally changed the definition |
|
|
1461 | of JSON in javascript. Rather than create a fork, the IETF decided to |
|
|
1462 | standardise the new syntax (apparently, so I was told, without finding |
|
|
1463 | it very amusing). |
|
|
1464 | |
|
|
1465 | The biggest difference between the original JSON and the new JSON is |
|
|
1466 | that the new JSON supports scalars (anything other than arrays and |
|
|
1467 | objects) at the toplevel of a JSON text. While this is strictly |
|
|
1468 | backwards compatible to older versions, it breaks a number of protocols |
|
|
1469 | that relied on sending JSON back-to-back, and is a minor security |
|
|
1470 | concern. |
|
|
1471 | |
|
|
1472 | For example, imagine you have two banks communicating, and on one side, |
|
|
1473 | the JSON coder gets upgraded. Two messages, such as 10 and 1000, might |
|
|
1474 | then be confused to mean 101000, something that couldn't happen in the |
|
|
1475 | original JSON, because neither of these messages would be valid JSON. |
|
|
1476 | |
|
|
1477 | If one side accepts these messages, then an upgrade in the coder on |
|
|
1478 | either side could result in this becoming exploitable. |
|
|
1479 | |
|
|
1480 | This module has always allowed these messages as an optional extension, |
|
|
1481 | by default disabled. The security concerns are the reason why the |
|
|
1482 | default is still disabled, but future versions might/will likely upgrade |
|
|
1483 | to the newer RFC as default format, so you are advised to check your |
|
|
1484 | implementation and/or override the default with "->allow_nonref (0)" to |
|
|
1485 | ensure that future versions are safe. |
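The following sketch illustrates the bank example, assuming the two messages 10 and 1000 arrive back-to-back in one buffer. Both coder objects set "allow_nonref" explicitly, so the result does not depend on which default the installed version uses. The variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

# two scalar messages sent back-to-back without any framing
my $stream = "10" . "1000";

# explicitly strict coder: toplevel scalars are rejected with an exception
my $strict   = JSON::XS->new->allow_nonref (0);
my $accepted = eval { $strict->decode ($stream); 1 } ? 1 : 0;

# coder that accepts toplevel scalars: both messages are read as one number
my $lax   = JSON::XS->new->allow_nonref;
my $value = $lax->decode ($stream);
```

With "allow_nonref (0)" the confusion is detected as a decoding error, while the other coder silently returns the combined number.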
|
|
1486 | |
1428 | INTEROPERABILITY WITH OTHER MODULES |
1487 | INTEROPERABILITY WITH OTHER MODULES |
1429 | "JSON::XS" uses the Types::Serialiser module to provide boolean |
1488 | "JSON::XS" uses the Types::Serialiser module to provide boolean |
1430 | constants. That means that the JSON true and false values will be |
1489 | constants. That means that the JSON true and false values will be |
1431 | comaptible to true and false values of iother modules that do the same, |
1490 | compatible with true and false values of other modules that do the same, |
1432 | such as JSON::PP and CBOR::XS. |
1491 | such as JSON::PP and CBOR::XS. |
|
|
1492 | |
|
|
1493 | INTEROPERABILITY WITH OTHER JSON DECODERS |
|
|
1494 | As long as you only serialise data that can be directly expressed in |
|
|
1495 | JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo |
|
|
1496 | bugs, but "JSON::XS" has found more bugs in the official JSON testsuite |
|
|
1497 | (1) than the official JSON testsuite has found in "JSON::XS" (0)). |
|
|
1498 | |
|
|
1499 | When you have trouble decoding JSON generated by this module using other |
|
|
1500 | decoders, then it is very likely that you have an encoding mismatch or |
|
|
1501 | the other decoder is broken. |
|
|
1502 | |
|
|
1503 | When decoding, "JSON::XS" is strict by default and will likely catch all |
|
|
1504 | errors. There are currently two settings that change this: "relaxed" |
|
|
1505 | makes "JSON::XS" accept (but not generate) some non-standard extensions, |
|
|
1506 | and "allow_tags" will allow you to encode and decode Perl objects, at |
|
|
1507 | the cost of not outputting valid JSON anymore. |
|
|
1508 | |
|
|
1509 | TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS |
|
|
1510 | When you use "allow_tags" to use the extended (and also nonstandard and |
|
|
1511 | invalid) JSON syntax for serialised objects, and you still want to |
|
|
1512 | decode the generated text with a standard JSON decoder, you can run a |
|
|
1513 | regex to replace the tagged syntax by standard JSON arrays (it only |
|
|
1514 | works for "normal" package names without comma, newlines or single |
|
|
1515 | colons). First, the readable Perl version: |
|
|
1516 | |
|
|
1517 | # if your FREEZE methods return no values, you need this replace first: |
|
|
1518 | $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx; |
|
|
1519 | |
|
|
1520 | # this works for non-empty constructor arg lists: |
|
|
1521 | $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx; |
|
|
1522 | |
|
|
1523 | And here is a less readable version that is easy to adapt to other |
|
|
1524 | languages: |
|
|
1525 | |
|
|
1526 | $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g; |
|
|
1527 | |
|
|
1528 | Here is an ECMAScript version (same regex): |
|
|
1529 | |
|
|
1530 | json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,"); |
|
|
1531 | |
|
|
1532 | Since this syntax converts to standard JSON arrays, it might be hard to |
|
|
1533 | distinguish serialised objects from normal arrays. You can prepend a |
|
|
1534 | "magic number" as first array element to reduce chances of a collision: |
|
|
1535 | |
|
|
1536 | $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g; |
|
|
1537 | |
|
|
1538 | And after decoding the JSON text, you could walk the data structure |
|
|
1539 | looking for arrays with a first element of |
|
|
1540 | "XU1peReLzT4ggEllLanBYq4G9VzliwKF". |
|
|
1541 | |
|
|
1542 | The same approach can be used to create the tagged format with another |
|
|
1543 | encoder. First, you create an array with the magic string as first |
|
|
1544 | member, the classname as second, and constructor arguments last, encode |
|
|
1545 | it as part of your JSON structure, and then: |
|
|
1546 | |
|
|
1547 | $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g; |
|
|
1548 | |
|
|
1549 | Again, this has some limitations - the magic string must not be encoded |
|
|
1550 | with character escapes, and the constructor arguments must be non-empty. |
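As a small worked example, the sketch below applies the substitutions above to a hand-written tagged text (no JSON module is needed for this step) and then converts the result back. The sample input reuses the "URI" and "MyDate" examples; the variable names are illustrative.

```perl
use strict;
use warnings;

my $magic  = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";
my $tagged = '[("URI")["http://www.google.com/"],("MyDate")[2013,10,29]]';

# tagged syntax -> plain JSON arrays
(my $plain = $tagged) =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;

# tagged syntax -> plain JSON arrays marked with the magic string
(my $marked = $tagged) =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["$magic",$1,/g;

# marked arrays -> tagged syntax again
(my $back = $marked) =~ s/\[\s*"$magic"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
```

After this, $plain is '[["URI","http://www.google.com/"],["MyDate",2013,10,29]]' and $back equals the original tagged text.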
|
|
1551 | |
|
|
1552 | RFC7159 |
|
|
1553 | Since this module was written, Google has written a new JSON RFC, RFC |
|
|
1554 | 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with |
|
|
1555 | both the original JSON specification on www.json.org and RFC4627. |
|
|
1556 | |
|
|
1557 | As far as I can see, you can get partial compatibility when parsing by |
|
|
1558 | using "->allow_nonref". However, consider the security implications of |
|
|
1559 | doing so. |
|
|
1560 | |
|
|
1561 | I haven't decided yet when to break compatibility with RFC4627 by |
|
|
1562 | default (and potentially leave applications insecure) and change the |
|
|
1563 | default to follow RFC7159, but application authors are well advised to |
|
|
1564 | call "->allow_nonref(0)" even if this is the current default, if they |
|
|
1565 | cannot handle non-reference values, in preparation for the day when the |
|
|
1566 | default will change. |
1433 | |
1567 | |
1434 | THREADS |
1568 | THREADS |
1435 | This module is *not* guaranteed to be thread safe and there are no plans |
1569 | This module is *not* guaranteed to be thread safe and there are no plans |
1436 | to change this until Perl gets thread support (as opposed to the |
1570 | to change this until Perl gets thread support (as opposed to the |
1437 | horribly slow so-called "threads" which are simply slow and bloated |
1571 | horribly slow so-called "threads" which are simply slow and bloated |