
Comparing JSON-XS/XS.pm (file contents):
Revision 1.146 by root, Tue Oct 29 00:18:55 2013 UTC vs.
Revision 1.167 by root, Tue Aug 28 16:16:17 2018 UTC

40Beginning with version 2.0 of the JSON module, when both JSON and 40Beginning with version 2.0 of the JSON module, when both JSON and
41JSON::XS are installed, then JSON will fall back on JSON::XS (this can be 41JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
42overridden) with no overhead due to emulation (by inheriting constructor 42overridden) with no overhead due to emulation (by inheriting constructor
43and methods). If JSON::XS is not available, it will fall back to the 43and methods). If JSON::XS is not available, it will fall back to the
44compatible JSON::PP module as backend, so using JSON instead of JSON::XS 44compatible JSON::PP module as backend, so using JSON instead of JSON::XS
45gives you a portable JSON API that can be fast when you need and doesn't 45gives you a portable JSON API that can be fast when you need it and
46require a C compiler when that is a problem. 46doesn't require a C compiler when that is a problem.
47 47
48As this is the n-th-something JSON module on CPAN, what was the reason 48As this is the n-th-something JSON module on CPAN, what was the reason
49to write yet another JSON module? While it seems there are many JSON 49to write yet another JSON module? While it seems there are many JSON
50modules, none of them correctly handle all corner cases, and in most cases 50modules, none of them correctly handle all corner cases, and in most cases
51their maintainers are unresponsive, gone missing, or not listening to bug 51their maintainers are unresponsive, gone missing, or not listening to bug
101 101
102package JSON::XS; 102package JSON::XS;
103 103
104use common::sense; 104use common::sense;
105 105
106our $VERSION = '3.0'; 106our $VERSION = 3.04;
107our @ISA = qw(Exporter); 107our @ISA = qw(Exporter);
108 108
109our @EXPORT = qw(encode_json decode_json); 109our @EXPORT = qw(encode_json decode_json);
110 110
111use Exporter; 111use Exporter;
131 131
132Except being faster. 132Except being faster.
133 133
134=item $perl_scalar = decode_json $json_text 134=item $perl_scalar = decode_json $json_text
135 135
136The opposite of C<encode_json>: expects an UTF-8 (binary) string and tries 136The opposite of C<encode_json>: expects a UTF-8 (binary) string and tries
137to parse that as an UTF-8 encoded JSON text, returning the resulting 137to parse that as a UTF-8 encoded JSON text, returning the resulting
138reference. Croaks on error. 138reference. Croaks on error.
139 139
140This function call is functionally identical to: 140This function call is functionally identical to:
141 141
142 $perl_scalar = JSON::XS->new->utf8->decode ($json_text) 142 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
270 270
271=item $enabled = $json->get_utf8 271=item $enabled = $json->get_utf8
272 272
273If C<$enable> is true (or missing), then the C<encode> method will encode 273If C<$enable> is true (or missing), then the C<encode> method will encode
274the JSON result into UTF-8, as required by many protocols, while the 274the JSON result into UTF-8, as required by many protocols, while the
275C<decode> method expects to be handled an UTF-8-encoded string. Please 275C<decode> method expects to be handed a UTF-8-encoded string. Please
276note that UTF-8-encoded strings do not contain any characters outside the 276note that UTF-8-encoded strings do not contain any characters outside the
277range C<0..255>, they are thus useful for bytewise/binary I/O. In future 277range C<0..255>, they are thus useful for bytewise/binary I/O. In future
278versions, enabling this option might enable autodetection of the UTF-16 278versions, enabling this option might enable autodetection of the UTF-16
279and UTF-32 encoding families, as described in RFC4627. 279and UTF-32 encoding families, as described in RFC4627.
280 280
365 365
366=item $enabled = $json->get_relaxed 366=item $enabled = $json->get_relaxed
367 367
368If C<$enable> is true (or missing), then C<decode> will accept some 368If C<$enable> is true (or missing), then C<decode> will accept some
369extensions to normal JSON syntax (see below). C<encode> will not be 369extensions to normal JSON syntax (see below). C<encode> will not be
370affected in anyway. I<Be aware that this option makes you accept invalid 370affected in any way. I<Be aware that this option makes you accept invalid
371JSON texts as if they were valid!>. I suggest using this option only to 371JSON texts as if they were valid!>. I suggest using this option only to
372parse application-specific files written by humans (configuration files, 372parse application-specific files written by humans (configuration files,
373resource files etc.) 373resource files etc.)
374 374
375If C<$enable> is false (the default), then C<decode> will only accept 375If C<$enable> is false (the default), then C<decode> will only accept
404 [ 404 [
405 1, # this comment not allowed in JSON 405 1, # this comment not allowed in JSON
406 # neither this one... 406 # neither this one...
407 ] 407 ]
408 408
409=item * literal ASCII TAB characters in strings
410
411Literal ASCII TAB characters are now allowed in strings (and treated as
412C<\t>).
413
414 [
415 "Hello\tWorld",
416 "Hello<TAB>World", # literal <TAB> would not normally be allowed
417 ]
418
409=back 419=back
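
As a rough illustration of the C<relaxed> setting (a sketch only, assuming
JSON::XS is installed; the input text and variable names are made up for
this example), the following decodes a text with a shell-style comment
that the strict default rejects:

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical config-like input with a "#" comment, which is not valid JSON
my $text = "[\n  1, # this comment not allowed in JSON\n  2\n]";

# the strict default croaks on the comment, so trap the exception with eval
my $strict_ok = eval { JSON::XS->new->decode ($text); 1 };

# with relaxed enabled, the comment is skipped while decoding
my $data = JSON::XS->new->relaxed->decode ($text);

print "strict decode ", ($strict_ok ? "succeeded" : "failed"), "\n";
print "relaxed decode returned: @$data\n";
```

Wrapping the strict call in C<eval> keeps the exception from ending the
program, which is usually what you want when input comes from humans.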
410 420
411=item $json = $json->canonical ([$enable]) 421=item $json = $json->canonical ([$enable])
412 422
413=item $enabled = $json->get_canonical 423=item $enabled = $json->get_canonical
467 477
468=item $json = $json->allow_blessed ([$enable]) 478=item $json = $json->allow_blessed ([$enable])
469 479
470=item $enabled = $json->get_allow_blessed 480=item $enabled = $json->get_allow_blessed
471 481
472See "OBJECT SERIALISATION" for details. 482See L<OBJECT SERIALISATION> for details.
473 483
474If C<$enable> is true (or missing), then the C<encode> method will not 484If C<$enable> is true (or missing), then the C<encode> method will not
475barf when it encounters a blessed reference that it cannot convert 485barf when it encounters a blessed reference that it cannot convert
476otherwise. Instead, a JSON C<null> value is encoded in place of the object. 486otherwise. Instead, a JSON C<null> value is encoded in place of the object.
477 487
483 493
484=item $json = $json->convert_blessed ([$enable]) 494=item $json = $json->convert_blessed ([$enable])
485 495
486=item $enabled = $json->get_convert_blessed 496=item $enabled = $json->get_convert_blessed
487 497
488See "OBJECT SERIALISATION" for details. 498See L<OBJECT SERIALISATION> for details.
489 499
490If C<$enable> is true (or missing), then C<encode>, upon encountering a 500If C<$enable> is true (or missing), then C<encode>, upon encountering a
491blessed object, will check for the availability of the C<TO_JSON> method 501blessed object, will check for the availability of the C<TO_JSON> method
492on the object's class. If found, it will be called in scalar context and 502on the object's class. If found, it will be called in scalar context and
493the resulting scalar will be encoded instead of the object. 503the resulting scalar will be encoded instead of the object.
507 517
508=item $json = $json->allow_tags ([$enable]) 518=item $json = $json->allow_tags ([$enable])
509 519
510=item $enabled = $json->get_allow_tags 520=item $enabled = $json->get_allow_tags
511 521
512See "OBJECT SERIALISATION" for details. 522See L<OBJECT SERIALISATION> for details.
513 523
514If C<$enable> is true (or missing), then C<encode>, upon encountering a 524If C<$enable> is true (or missing), then C<encode>, upon encountering a
515blessed object, will check for the availability of the C<FREEZE> method on 525blessed object, will check for the availability of the C<FREEZE> method on
516the object's class. If found, it will be used to serialise the object into 526the object's class. If found, it will be used to serialise the object into
517a nonstandard tagged JSON value (that JSON decoders cannot decode). 527a nonstandard tagged JSON value (that JSON decoders cannot decode).
687 697
688This is useful if your JSON texts are not delimited by an outer protocol 698This is useful if your JSON texts are not delimited by an outer protocol
689and you need to know where the JSON text ends. 699and you need to know where the JSON text ends.
690 700
691 JSON::XS->new->decode_prefix ("[1] the tail") 701 JSON::XS->new->decode_prefix ("[1] the tail")
692 => ([], 3) 702 => ([1], 3)
693 703
694=back 704=back
695 705
696 706
697=head1 INCREMENTAL PARSING 707=head1 INCREMENTAL PARSING
738C<incr_skip> to skip the erroneous part). This is the most common way of 748C<incr_skip> to skip the erroneous part). This is the most common way of
739using the method. 749using the method.
740 750
741And finally, in list context, it will try to extract as many objects 751And finally, in list context, it will try to extract as many objects
742from the stream as it can find and return them, or the empty list 752from the stream as it can find and return them, or the empty list
743otherwise. For this to work, there must be no separators between the JSON 753otherwise. For this to work, there must be no separators (other than
744objects or arrays, instead they must be concatenated back-to-back. If 754whitespace) between the JSON objects or arrays, instead they must be
745an error occurs, an exception will be raised as in the scalar context 755concatenated back-to-back. If an error occurs, an exception will be
746case. Note that in this case, any previously-parsed JSON texts will be 756raised as in the scalar context case. Note that in this case, any
747lost. 757previously-parsed JSON texts will be lost.
748 758
749Example: Parse some JSON arrays/objects in a given string and return 759Example: Parse some JSON arrays/objects in a given string and return
750them. 760them.
751 761
752 my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]"); 762 my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]");
758C<incr_parse> in I<scalar context> successfully returned an object. Under 768C<incr_parse> in I<scalar context> successfully returned an object. Under
759all other circumstances you must not call this function (I mean it: 769all other circumstances you must not call this function (I mean it:
760although in simple tests it might actually work, it I<will> fail under 770although in simple tests it might actually work, it I<will> fail under
761real world conditions). As a special exception, you can also call this 771real world conditions). As a special exception, you can also call this
762method before having parsed anything. 772method before having parsed anything.
773
774That means you can only use this function to look at or manipulate text
775before or after complete JSON objects, not while the parser is in the
776middle of parsing a JSON object.
763 777
764This function is useful in two cases: a) finding the trailing text after a 778This function is useful in two cases: a) finding the trailing text after a
765JSON object or b) parsing multiple JSON objects separated by non-JSON text 779JSON object or b) parsing multiple JSON objects separated by non-JSON text
766(such as commas). 780(such as commas).
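
Case b) might look roughly like the following sketch, which assumes the
function described here is the lvalue method C<incr_text> and that
JSON::XS is installed. The input string and the C<@objs> array are only
for illustration:

```perl
use strict;
use warnings;
use JSON::XS;

my $text = "[1],[2], [3]";
my $json = JSON::XS->new;
my @objs;

# void context: only adds the text to the internal buffer, no parsing yet
$json->incr_parse ($text);

# scalar context: extract one object at a time, so incr_text may be used
while (my $obj = $json->incr_parse) {
   push @objs, $obj;

   # remove the optional comma separator before the next object
   $json->incr_text =~ s/^ \s* , //x;
}

print scalar @objs, " objects parsed\n";
```

The substitution only runs right after a complete object was returned,
which is the one situation in which touching the buffer is allowed.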
767 781
1017Another nonstandard extension to the JSON syntax, enabled with the 1031Another nonstandard extension to the JSON syntax, enabled with the
1018C<allow_tags> setting, are tagged values. In this implementation, the 1032C<allow_tags> setting, are tagged values. In this implementation, the
1019I<tag> must be a perl package/class name encoded as a JSON string, and the 1033I<tag> must be a perl package/class name encoded as a JSON string, and the
1020I<value> must be a JSON array encoding optional constructor arguments. 1034I<value> must be a JSON array encoding optional constructor arguments.
1021 1035
1022See "OBJECT SERIALISATION", below, for details. 1036See L<OBJECT SERIALISATION>, below, for details.
1023 1037
1024=back 1038=back
1025 1039
1026 1040
1027=head2 PERL -> JSON 1041=head2 PERL -> JSON
1066directly if you want. 1080directly if you want.
1067 1081
1068=item blessed objects 1082=item blessed objects
1069 1083
1070Blessed objects are not directly representable in JSON, but C<JSON::XS> 1084Blessed objects are not directly representable in JSON, but C<JSON::XS>
1071allows various ways of handling objects. See "OBJECT SERIALISATION", 1085allows various ways of handling objects. See L<OBJECT SERIALISATION>,
1072below, for details. 1086below, for details.
1073 1087
1074=item simple scalars 1088=item simple scalars
1075 1089
1076Simple Perl scalars (any scalar that is not a reference) are the most 1090Simple Perl scalars (any scalar that is not a reference) are the most
1129C<allow_blessed>, C<convert_blessed> and C<allow_tags> settings, which are 1143C<allow_blessed>, C<convert_blessed> and C<allow_tags> settings, which are
1130used in this order: 1144used in this order:
1131 1145
1132=over 4 1146=over 4
1133 1147
1134=item 1. C<allow_tags> is enabled and object has a C<FREEZE> method. 1148=item 1. C<allow_tags> is enabled and the object has a C<FREEZE> method.
1135 1149
1136In this case, C<JSON::XS> uses the L<Types::Serialiser> object 1150In this case, C<JSON::XS> uses the L<Types::Serialiser> object
1137serialisation protocol to create a tagged JSON value, using a nonstandard 1151serialisation protocol to create a tagged JSON value, using a nonstandard
1138extension to the JSON syntax. 1152extension to the JSON syntax.
1139 1153
1145more). These values and the package/classname of the object will then be 1159more). These values and the package/classname of the object will then be
1146encoded as a tagged JSON value in the following format: 1160encoded as a tagged JSON value in the following format:
1147 1161
1148 ("classname")[FREEZE return values...] 1162 ("classname")[FREEZE return values...]
1149 1163
1164e.g.:
1165
1166 ("URI")["http://www.google.com/"]
1167 ("MyDate")[2013,10,29]
1168 ("ImageData::JPEG")["Z3...VlCg=="]
1169
1150For example, the hypothetical C<My::Object> C<FREEZE> method might use the 1170For example, the hypothetical C<My::Object> C<FREEZE> method might use the
1151object's C<type> and C<id> members to encode the object: 1171object's C<type> and C<id> members to encode the object:
1152 1172
1153 sub My::Object::FREEZE { 1173 sub My::Object::FREEZE {
1154 my ($self, $serialiser) = @_; 1174 my ($self, $serialiser) = @_;
1155 1175
1156 ($self->{type}, $self->{id}) 1176 ($self->{type}, $self->{id})
1157 } 1177 }
1158 1178
1159=item 2. C<convert_blessed> is enabled and object has a C<TO_JSON> method. 1179=item 2. C<convert_blessed> is enabled and the object has a C<TO_JSON> method.
1160 1180
1161In this case, the C<TO_JSON> method of the object is invoked in scalar 1181In this case, the C<TO_JSON> method of the object is invoked in scalar
1162context. It must return a single scalar that can be directly encoded into 1182context. It must return a single scalar that can be directly encoded into
1163JSON. This scalar replaces the object in the JSON text. 1183JSON. This scalar replaces the object in the JSON text.
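
A minimal sketch of this case, assuming JSON::XS is installed. The
C<My::Point> class and its members are hypothetical and exist only for
this example:

```perl
use strict;
use warnings;
use JSON::XS;

{
   package My::Point;   # hypothetical class, not part of JSON::XS

   sub new {
      my ($class, %args) = @_;
      bless {%args}, $class
   }

   # called in scalar context by encode, must return one encodable scalar
   sub TO_JSON {
      my ($self) = @_;
      return { x => $self->{x}, y => $self->{y} };
   }
}

# canonical only sorts the hash keys, so the output is reproducible
my $json = JSON::XS->new->canonical->convert_blessed;
my $text = $json->encode ([My::Point->new (x => 1, y => 2)]);

print "$text\n";   # prints [{"x":1,"y":2}]
```

Note that the class name is lost in the output: a decoder only sees a
plain JSON object.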
1164 1184
1269expect your input strings to be encoded as UTF-8, that is, no "character" 1289expect your input strings to be encoded as UTF-8, that is, no "character"
1270of the input string must have any value > 255, as UTF-8 does not allow 1290of the input string must have any value > 255, as UTF-8 does not allow
1271that. 1291that.
1272 1292
1273The C<utf8> flag therefore switches between two modes: disabled means you 1293The C<utf8> flag therefore switches between two modes: disabled means you
1274will get a Unicode string in Perl, enabled means you get an UTF-8 encoded 1294will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
1275octet/binary string in Perl. 1295octet/binary string in Perl.
1276 1296
1277=item C<latin1> or C<ascii> flags enabled 1297=item C<latin1> or C<ascii> flags enabled
1278 1298
1279With C<latin1> (or C<ascii>) enabled, C<encode> will escape characters 1299With C<latin1> (or C<ascii>) enabled, C<encode> will escape characters
1547are browser design bugs, but it is still you who will have to deal with 1567are browser design bugs, but it is still you who will have to deal with
1548it, as major browser developers care only for features, not about getting 1568it, as major browser developers care only for features, not about getting
1549security right). 1569security right).
1550 1570
1551 1571
1572=head1 "OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)
1573
1574TL;DR: Due to security concerns, JSON::XS will not allow scalar data in
1575JSON texts by default - you need to create your own JSON::XS object and
1576enable C<allow_nonref>:
1577
1578
1579 my $json = JSON::XS->new->allow_nonref;
1580
1581 $text = $json->encode ($data);
1582 $data = $json->decode ($text);
1583
1584The long version: JSON being an important and supposedly stable format,
1585the IETF standardised it as RFC 4627 in 2006. Unfortunately, the inventor
1586of JSON, Douglas Crockford, unilaterally changed the definition of JSON in
1587JavaScript. Rather than create a fork, the IETF decided to standardise the
1588new syntax (apparently, so I was told, without finding it very amusing).
1589
1590The biggest difference between the original JSON and the new JSON is that
1591the new JSON supports scalars (anything other than arrays and objects) at
1592the toplevel of a JSON text. While this is strictly backwards compatible
1593to older versions, it breaks a number of protocols that relied on sending
1594JSON back-to-back, and is a minor security concern.
1595
1596For example, imagine you have two banks communicating, and on one side,
1597the JSON coder gets upgraded. Two messages, such as C<10> and C<1000>
1598might then be confused to mean C<101000>, something that couldn't happen
1599in the original JSON, because neither of these messages would be valid
1600JSON.
1601
1602If one side accepts these messages, then an upgrade in the coder on either
1603side could result in this becoming exploitable.
1604
1605This module has always allowed these messages as an optional extension, by
1606default disabled. The security concerns are the reason why the default is
1607still disabled, but future versions might/will likely upgrade to the newer
1608RFC as default format, so you are advised to check your implementation
1609and/or override the default with C<< ->allow_nonref (0) >> to ensure that
1610future versions are safe.
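
The difference between the two settings can be sketched as follows
(assuming JSON::XS is installed; both settings are given explicitly so the
example does not depend on whichever default your version uses, and the
variable names are illustrative only):

```perl
use strict;
use warnings;
use JSON::XS;

my $strict = JSON::XS->new->allow_nonref (0);
my $lax    = JSON::XS->new->allow_nonref (1);

# a toplevel scalar is rejected when allow_nonref is disabled
my $strict_ok = eval { $strict->decode ("10"); 1 };

# with allow_nonref enabled, scalars can be decoded and encoded
my $value = $lax->decode ("10");
my $text  = $lax->encode ("hello");

print "strict decode ", ($strict_ok ? "succeeded" : "failed"), "\n";
print "lax decode gave $value, lax encode gave $text\n";
```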
1611
1612
1552=head1 INTEROPERABILITY WITH OTHER MODULES 1613=head1 INTEROPERABILITY WITH OTHER MODULES
1553 1614
1554C<JSON::XS> uses the L<Types::Serialiser> module to provide boolean 1615C<JSON::XS> uses the L<Types::Serialiser> module to provide boolean
1555constants. That means that the JSON true and false values will be 1616constants. That means that the JSON true and false values will be
1556compatible with true and false values of iother modules that do the same, 1617compatible with true and false values of other modules that do the same,
1557such as L<JSON::PP> and L<CBOR::XS>. 1618such as L<JSON::PP> and L<CBOR::XS>.
1558 1619
1559 1620
1621=head1 INTEROPERABILITY WITH OTHER JSON DECODERS
1622
1623As long as you only serialise data that can be directly expressed in JSON,
1624C<JSON::XS> is incapable of generating invalid JSON output (modulo bugs,
1625but C<JSON::XS> has found more bugs in the official JSON testsuite (1)
1626than the official JSON testsuite has found in C<JSON::XS> (0)).
1627
1628When you have trouble decoding JSON generated by this module using other
1629decoders, then it is very likely that you have an encoding mismatch or the
1630other decoder is broken.
1631
1632When decoding, C<JSON::XS> is strict by default and will likely catch all
1633errors. There are currently two settings that change this: C<relaxed>
1634makes C<JSON::XS> accept (but not generate) some non-standard extensions,
1635and C<allow_tags> will allow you to encode and decode Perl objects, at the
1636cost of not outputting valid JSON anymore.
1637
1638=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
1639
1640When you use C<allow_tags> to use the extended (and also nonstandard and
1641invalid) JSON syntax for serialised objects, and you still want to decode
1642the generated JSON text with a standard JSON decoder, you can run a regex
1643to replace the tagged syntax by standard JSON arrays (it only works for
1644"normal" package names without commas, newlines or single colons). First,
1645the readable Perl version:
1646
1647 # if your FREEZE methods return no values, you need this replace first:
1648 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
1649
1650 # this works for non-empty constructor arg lists:
1651 $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
1652
1653And here is a less readable version that is easy to adapt to other
1654languages:
1655
1656 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
1657
1658Here is an ECMAScript version (same regex):
1659
1660 json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
1661
1662Since this syntax converts to standard JSON arrays, it might be hard to
1663distinguish serialised objects from normal arrays. You can prepend a
1664"magic number" as first array element to reduce chances of a collision:
1665
1666 $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
1667
1668And after decoding the JSON text, you could walk the data
1669structure looking for arrays with a first element of
1670C<XU1peReLzT4ggEllLanBYq4G9VzliwKF>.
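
Such a walk could be sketched like this. The C<find_tagged> helper and the
sample data are hypothetical, and the sketch only collects class names and
arguments; it does not construct any objects:

```perl
use strict;
use warnings;

my $MAGIC = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# recursively collect [classname, \@args] pairs from marked arrays
sub find_tagged {
   my ($data, $found) = @_;
   $found ||= [];

   if (ref $data eq "ARRAY") {
      if (@$data >= 2 && defined $data->[0] && !ref $data->[0] && $data->[0] eq $MAGIC) {
         my (undef, $class, @args) = @$data;
         push @$found, [$class, \@args];
         find_tagged ($_, $found) for @args;   # arguments may nest further
      } else {
         find_tagged ($_, $found) for @$data;
      }
   } elsif (ref $data eq "HASH") {
      find_tagged ($_, $found) for values %$data;
   }

   $found
}

# what a standard decoder might return after the regex replacement
my $decoded = { when => [$MAGIC, "MyDate", 2013, 10, 29], list => [1, 2] };
my $tagged  = find_tagged ($decoded);

print "$_->[0]: @{ $_->[1] }\n" for @$tagged;   # prints MyDate: 2013 10 29
```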
1671
1672The same approach can be used to create the tagged format with another
1673encoder. First, you create an array with the magic string as first member,
1674the classname as second, and constructor arguments last, encode it as part
1675of your JSON structure, and then:
1676
1677 $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
1678
1679Again, this has some limitations - the magic string must not be encoded
1680with character escapes, and the constructor arguments must be non-empty.
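
Both directions can be tried out on a plain string, without any JSON
module. This is only a sketch: the sample text is hypothetical, and the
magic string is put into a variable (quoted with C<\Q...\E> in the
pattern) instead of being written out twice:

```perl
use strict;
use warnings;

my $MAGIC = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# hypothetical tagged text, as an allow_tags encoder might produce it
my $tagged = '{"when":("MyDate")[2013,10,29]}';

# tagged syntax -> standard JSON array with the magic string first
(my $plain = $tagged)
   =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["$MAGIC",$1,/g;

# standard JSON array -> tagged syntax again
(my $back = $plain)
   =~ s/\[\s*"\Q$MAGIC\E"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;

print "$plain\n$back\n";
```

Because the argument list here is non-empty and the magic string is not
escaped, the round trip reproduces the original text.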
1681
1682
1683=head1 RFC7159
1684
1685Since this module was written, Google has written a new JSON RFC, RFC 7159
1686(and RFC7158). Unfortunately, this RFC breaks compatibility with both the
1687original JSON specification on www.json.org and RFC4627.
1688
1689As far as I can see, you can get partial compatibility when parsing by
1690using C<< ->allow_nonref >>. However, consider the security implications
1691of doing so.
1692
1693I haven't decided yet when to break compatibility with RFC4627 by default
1694(and potentially leave applications insecure) and change the default to
1695follow RFC7159, but application authors are well advised to call C<<
1696->allow_nonref(0) >> even if this is the current default, if they cannot
1697handle non-reference values, in preparation for the day when the default
1698will change.
1699
1700
1560=head1 THREADS 1701=head1 (I-)THREADS
1561 1702
1562This module is I<not> guaranteed to be thread safe and there are no 1703This module is I<not> guaranteed to be ithread (or MULTIPLICITY-) safe
1563plans to change this until Perl gets thread support (as opposed to the 1704and there are no plans to change this. Note that perl's builtin so-called
1564horribly slow so-called "threads" which are simply slow and bloated 1705threads/ithreads are officially deprecated and should not be used.
1565process simulations - use fork, it's I<much> faster, cheaper, better).
1566
1567(It might actually work, but you have been warned).
1568 1706
1569 1707
1570=head1 THE PERILS OF SETLOCALE 1708=head1 THE PERILS OF SETLOCALE
1571 1709
1572Sometimes people avoid the Perl locale support and directly call the 1710Sometimes people avoid the Perl locale support and directly call the
