--- JSON-XS/XS.pm 2013/10/29 00:19:08 1.147
+++ JSON-XS/XS.pm 2017/09/05 13:07:09 1.165
@@ -42,8 +42,8 @@
 overridden) with no overhead due to emulation (by inheriting constructor
 and methods). If JSON::XS is not available, it will fall back to the
 compatible JSON::PP module as backend, so using JSON instead of JSON::XS
-gives you a portable JSON API that can be fast when you need and doesn't
-require a C compiler when that is a problem.
+gives you a portable JSON API that can be fast when you need it and
+doesn't require a C compiler when that is a problem.

 As this is the n-th-something JSON module on CPAN, what was the reason
 to write yet another JSON module? While it seems there are many JSON
@@ -103,7 +103,7 @@

 use common::sense;

-our $VERSION = '3.0';
+our $VERSION = 3.04;
 our @ISA = qw(Exporter);

 our @EXPORT = qw(encode_json decode_json);
@@ -133,8 +133,8 @@

 =item $perl_scalar = decode_json $json_text

-The opposite of C<encode_json>: expects an UTF-8 (binary) string and tries
-to parse that as an UTF-8 encoded JSON text, returning the resulting
+The opposite of C<encode_json>: expects a UTF-8 (binary) string and tries
+to parse that as a UTF-8 encoded JSON text, returning the resulting
 reference. Croaks on error.

 This function call is functionally identical to:
@@ -272,7 +272,7 @@

 If C<$enable> is true (or missing), then the C<encode> method will encode
 the JSON result into UTF-8, as required by many protocols, while the
-C<decode> method expects to be handled an UTF-8-encoded string. Please
+C<decode> method expects to be handed a UTF-8-encoded string. Please
 note that UTF-8-encoded strings do not contain any characters outside the
 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
 versions, enabling this option might enable autodetection of the UTF-16
@@ -406,6 +406,16 @@
       # neither this one...
    ]

+=item * literal ASCII TAB characters in strings
+
+Literal ASCII TAB characters are now allowed in strings (and treated as
+C<\t>).
+
+   [
+      "Hello\tWorld",
+      "Hello<TAB>World", # literal <TAB> would not normally be allowed
+   ]
+
 =back

 =item $json = $json->canonical ([$enable])
@@ -485,7 +495,7 @@

 =item $enabled = $json->get_convert_blessed

-See "OBJECT SERIALISATION" for details.
+See L<OBJECT SERIALISATION> for details.

 If C<$enable> is true (or missing), then C<encode>, upon encountering a
 blessed object, will check for the availability of the C<TO_JSON> method
@@ -509,7 +519,7 @@

 =item $enabled = $json->allow_tags

-See "OBJECT SERIALISATION" for details.
+See L<OBJECT SERIALISATION> for details.

 If C<$enable> is true (or missing), then C<encode>, upon encountering a
 blessed object, will check for the availability of the C<FREEZE> method on
@@ -689,7 +699,7 @@
 and you need to know where the JSON text ends.

    JSON::XS->new->decode_prefix ("[1] the tail")
-   => ([], 3)
+   => ([1], 3)

 =back

@@ -740,11 +750,11 @@

 And finally, in list context, it will try to extract as many objects
 from the stream as it can find and return them, or the empty list
-otherwise. For this to work, there must be no separators between the JSON
-objects or arrays, instead they must be concatenated back-to-back. If
-an error occurs, an exception will be raised as in the scalar context
-case. Note that in this case, any previously-parsed JSON texts will be
-lost.
+otherwise. For this to work, there must be no separators (other than
+whitespace) between the JSON objects or arrays, instead they must be
+concatenated back-to-back. If an error occurs, an exception will be
+raised as in the scalar context case. Note that in this case, any
+previously-parsed JSON texts will be lost.

 Example: Parse some JSON arrays/objects in a given string and return
 them.
@@ -761,6 +771,10 @@
 real world conditions). As a special exception, you can also call this
 method before having parsed anything.

+That means you can only use this function to look at or manipulate text
+before or after complete JSON objects, not while the parser is in the
+middle of parsing a JSON object.
+
+
 This function is useful in two cases: a) finding the trailing text after a
 JSON object or b) parsing multiple JSON objects separated by non-JSON text
 (such as commas).
@@ -1019,7 +1033,7 @@
 I<tag> must be a perl package/class name encoded as a JSON string, and the
 I<value> must be a JSON array encoding optional constructor arguments.

-See "OBJECT SERIALISATION", below, for details.
+See L<OBJECT SERIALISATION>, below, for details.

 =back

@@ -1068,7 +1082,7 @@
 =item blessed objects

 Blessed objects are not directly representable in JSON, but C<JSON::XS>
-allows various ways of handling objects. See "OBJECT SERIALISATION",
+allows various ways of handling objects. See L<OBJECT SERIALISATION>,
 below, for details.

 =item simple scalars
@@ -1131,7 +1145,7 @@

 =over 4

-=item 1. C<allow_tags> is enabled and object has a C<FREEZE> method.
+=item 1. C<allow_tags> is enabled and the object has a C<FREEZE> method.

 In this case, C<JSON::XS> uses the L<Types::Serialiser> object
 serialisation protocol to create a tagged JSON value, using a nonstandard
@@ -1147,6 +1161,12 @@

    ("classname")[FREEZE return values...]

+e.g.:
+
+   ("URI")["http://www.google.com/"]
+   ("MyDate")[2013,10,29]
+   ("ImageData::JPEG")["Z3...VlCg=="]
+
 For example, the hypothetical C<My::Object> C<FREEZE> method might use the
 objects C<type> and C<id> members to encode the object:

@@ -1156,7 +1176,7 @@
       ($self->{type}, $self->{id})
    }

-=item 2. C<convert_blessed> is enabled and object has a C<TO_JSON> method.
+=item 2. C<convert_blessed> is enabled and the object has a C<TO_JSON> method.

 In this case, the C<TO_JSON> method of the object is invoked in scalar
 context. It must return a single scalar that can be directly encoded into
@@ -1271,7 +1291,7 @@
 that.

 The C<utf8> flag therefore switches between two modes: disabled means you
-will get a Unicode string in Perl, enabled means you get an UTF-8 encoded
+will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
 octet/binary string in Perl.

 =item C<latin1> or C<ascii> flags enabled
@@ -1549,22 +1569,140 @@
 security right).


+=head1 "OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)
+
+TL;DR: Due to security concerns, JSON::XS will not allow scalar data in
+JSON texts by default - you need to create your own JSON::XS object and
+enable C<allow_nonref>:
+
+
+   my $json = JSON::XS->new->allow_nonref;
+
+   $text = $json->encode ($data);
+   $data = $json->decode ($text);
+
+The long version: JSON being an important and supposedly stable format,
+the IETF standardised it as RFC 4627 in 2006. Unfortunately, the inventor
+of JSON, Douglas Crockford, unilaterally changed the definition of JSON in
+javascript. Rather than create a fork, the IETF decided to standardise the
+new syntax (apparently, so I was told, without finding it very amusing).
+
+The biggest difference between the original JSON and the new JSON is that
+the new JSON supports scalars (anything other than arrays and objects) at
+the toplevel of a JSON text. While this is strictly backwards compatible
+to older versions, it breaks a number of protocols that relied on sending
+JSON back-to-back, and is a minor security concern.
+
+For example, imagine you have two banks communicating, and on one side,
+the JSON coder gets upgraded. Two messages, such as C<10> and C<1000>
+might then be confused to mean C<101000>, something that couldn't happen
+in the original JSON, because neither of these messages would be valid
+JSON.
+
+If one side accepts these messages, then an upgrade in the coder on either
+side could result in this becoming exploitable.
+
+This module has always allowed these messages as an optional extension, by
+default disabled. The security concerns are the reason why the default is
+still disabled, but future versions might/will likely upgrade to the newer
+RFC as default format, so you are advised to check your implementation
+and/or override the default with C<< ->allow_nonref (0) >> to ensure that
+future versions are safe.
+
+
 =head1 INTEROPERABILITY WITH OTHER MODULES

 C<JSON::XS> uses the L<Types::Serialiser> module to provide boolean constants.
 That means that the JSON true and false values will be
-comaptible to true and false values of iother modules that do the same,
+compatible to true and false values of other modules that do the same,
 such as L<JSON::PP> and L<CBOR::XS>.

-=head1 THREADS
+=head1 INTEROPERABILITY WITH OTHER JSON DECODERS
+
+As long as you only serialise data that can be directly expressed in JSON,
+C<JSON::XS> is incapable of generating invalid JSON output (modulo bugs,
+but C<JSON::XS> has found more bugs in the official JSON testsuite (1)
+than the official JSON testsuite has found in C<JSON::XS> (0)).
+
+When you have trouble decoding JSON generated by this module using other
+decoders, then it is very likely that you have an encoding mismatch or the
+other decoder is broken.
+
+When decoding, C<JSON::XS> is strict by default and will likely catch all
+errors. There are currently two settings that change this: C<relaxed>
+makes C<JSON::XS> accept (but not generate) some non-standard extensions,
+and C<allow_tags> will allow you to encode and decode Perl objects, at the
+cost of not outputting valid JSON anymore.
+
+=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
+
+When you use C<allow_tags> to use the extended (and also nonstandard and
+invalid) JSON syntax for serialised objects, and you still want to decode
+the generated JSON with a standard decoder, you can run a regex
+to replace the tagged syntax by standard JSON arrays (it only works for
+"normal" package names without comma, newlines or single colons).
+First, the readable Perl version:
+
+   # if your FREEZE methods return no values, you need this replace first:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
+
+   # this works for non-empty constructor arg lists:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
+
+And here is a less readable version that is easy to adapt to other
+languages:
+
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
+
+Here is an ECMAScript version (same regex):
+
+   json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
+
+Since this syntax converts to standard JSON arrays, it might be hard to
+distinguish serialised objects from normal arrays. You can prepend a
+"magic number" as first array element to reduce chances of a collision:
+
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
+
+And after decoding the JSON text, you could walk the data
+structure looking for arrays with a first element of
+C<XU1peReLzT4ggEllLanBYq4G9VzliwKF>.
+
+The same approach can be used to create the tagged format with another
+encoder. First, you create an array with the magic string as first member,
+the classname as second, and constructor arguments last, encode it as part
+of your JSON structure, and then:
+
+   $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
+
+Again, this has some limitations - the magic string must not be encoded
+with character escapes, and the constructor arguments must be non-empty.
+
+
+=head1 RFC7159
+
+Since this module was written, Google has written a new JSON RFC, RFC 7159
+(and RFC7158). Unfortunately, this RFC breaks compatibility with both the
+original JSON specification on www.json.org and RFC4627.
+
+As far as I can see, you can get partial compatibility when parsing by
+using C<< ->allow_nonref >>. However, consider the security implications
+of doing so.
+
+I haven't decided yet when to break compatibility with RFC4627 by default
+(and potentially leave applications insecure) and change the default to
+follow RFC7159, but application authors are well advised to call C<<
+->allow_nonref(0) >> even if this is the current default, if they cannot
+handle non-reference values, in preparation for the day when the default
+will change.
+

-This module is I<not> guaranteed to be thread safe and there are no
-plans to change this until Perl gets thread support (as opposed to the
-horribly slow so-called "threads" which are simply slow and bloated
-process simulations - use fork, it's I<much> faster, cheaper, better).
+=head1 (I-)THREADS

-(It might actually work, but you have been warned).
+This module is I<not> guaranteed to be ithread (or MULTIPLICITY-) safe
+and there are no plans to change this. Note that perl's builtin so-called
+threads/ithreads are officially deprecated and should not be used.


 =head1 THE PERILS OF SETLOCALE