--- JSON-XS/XS.pm 2013/10/25 19:53:08 1.141 +++ JSON-XS/XS.pm 2017/08/17 01:42:19 1.163 @@ -42,8 +42,8 @@ overridden) with no overhead due to emulation (by inheriting constructor and methods). If JSON::XS is not available, it will fall back to the compatible JSON::PP module as backend, so using JSON instead of JSON::XS -gives you a portable JSON API that can be fast when you need and doesn't -require a C compiler when that is a problem. +gives you a portable JSON API that can be fast when you need it and +doesn't require a C compiler when that is a problem. As this is the n-th-something JSON module on CPAN, what was the reason to write yet another JSON module? While it seems there are many JSON @@ -103,7 +103,7 @@ use common::sense; -our $VERSION = 2.34; +our $VERSION = 3.03; our @ISA = qw(Exporter); our @EXPORT = qw(encode_json decode_json); @@ -111,6 +111,8 @@ use Exporter; use XSLoader; +use Types::Serialiser (); + =head1 FUNCTIONAL INTERFACE The following convenience methods are provided by this module. They are @@ -141,15 +143,6 @@ Except being faster. -=item $is_boolean = JSON::XS::is_bool $scalar - -Returns true if the passed scalar represents either JSON::XS::true or -JSON::XS::false, two constants that act like C<1> and C<0>, respectively -and are used to represent JSON C and C values in Perl. - -See MAPPING, below, for more information on how JSON values are mapped to -Perl. - =back @@ -413,6 +406,16 @@ # neither this one... ] +=item * literal ASCII TAB characters in strings + +Literal ASCII TAB characters are now allowed in strings (and treated as +C<\t>). + + [ + "Hello\tWorld", + "HelloWorld", # literal would not normally be allowed + ] + =back =item $json = $json->canonical ([$enable]) @@ -476,26 +479,28 @@ =item $enabled = $json->get_allow_blessed +See L for details. + If C<$enable> is true (or missing), then the C method will not -barf when it encounters a blessed reference. Instead, the value of the -B option will decide whether C (C -disabled or no C method found) or a representation of the -object (C enabled and C method found) is being -encoded. Has no effect on C. +barf when it encounters a blessed reference that it cannot convert +otherwise. Instead, a JSON C value is encoded instead of the object. If C<$enable> is false (the default), then C will throw an -exception when it encounters a blessed object. +exception when it encounters a blessed object that it cannot convert +otherwise. + +This setting has no effect on C. =item $json = $json->convert_blessed ([$enable]) =item $enabled = $json->get_convert_blessed +See L for details. + If C<$enable> is true (or missing), then C, upon encountering a blessed object, will check for the availability of the C method -on the object's class. If found, it will be called in scalar context -and the resulting scalar will be encoded instead of the object. If no -C method is found, the value of C will decide what -to do. +on the object's class. If found, it will be called in scalar context and +the resulting scalar will be encoded instead of the object. The C method may safely call die if it wants. If C returns other blessed objects, those will be handled in the same @@ -505,12 +510,28 @@ usually in upper case letters and to avoid collisions with any C function or method. -This setting does not yet influence C in any way, but in the -future, global hooks might get installed that influence C and are -enabled by this setting. +If C<$enable> is false (the default), then C will not consider +this type of conversion. -If C<$enable> is false, then the C setting will decide what -to do when a blessed object is found. +This setting has no effect on C. + +=item $json = $json->allow_tags ([$enable]) + +=item $enabled = $json->allow_tags + +See L for details. + +If C<$enable> is true (or missing), then C, upon encountering a +blessed object, will check for the availability of the C method on +the object's class. If found, it will be used to serialise the object into +a nonstandard tagged JSON value (that JSON decoders cannot decode). + +It also causes C to parse such tagged JSON values and deserialise +them via a call to the C method. + +If C<$enable> is false (the default), then C will not consider +this type of conversion, and tagged JSON values will cause a parse error +in C, as if tags were not part of the grammar. =item $json = $json->filter_json_object ([$coderef->($hashref)]) @@ -675,11 +696,10 @@ so far. This is useful if your JSON texts are not delimited by an outer protocol -(which is not the brightest thing to do in the first place) and you need -to know where the JSON text ends. +and you need to know where the JSON text ends. JSON::XS->new->decode_prefix ("[1] the tail") - => ([], 3) + => ([1], 3) =back @@ -730,11 +750,11 @@ And finally, in list context, it will try to extract as many objects from the stream as it can find and return them, or the empty list -otherwise. For this to work, there must be no separators between the JSON -objects or arrays, instead they must be concatenated back-to-back. If -an error occurs, an exception will be raised as in the scalar context -case. Note that in this case, any previously-parsed JSON texts will be -lost. +otherwise. For this to work, there must be no separators (other than +whitespace) between the JSON objects or arrays, instead they must be +concatenated back-to-back. If an error occurs, an exception will be +raised as in the scalar context case. Note that in this case, any +previously-parsed JSON texts will be lost. Example: Parse some JSON arrays/objects in a given string and return them. @@ -751,6 +771,10 @@ real world conditions). As a special exception, you can also call this method before having parsed anything. +That means you can only use this function to look at or manipulate text +before or after complete JSON objects, not while the parser is in the +middle of parsing a JSON object. + This function is useful in two cases: a) finding the trailing text after a JSON object or b) parsing multiple JSON objects separated by non-JSON text (such as commas). @@ -780,10 +804,10 @@ =head2 LIMITATIONS All options that affect decoding are supported, except -C. The reason for this is that it cannot be made to -work sensibly: JSON objects and arrays are self-delimited, i.e. you can concatenate -them back to back and still decode them perfectly. This does not hold true -for JSON numbers, however. +C. The reason for this is that it cannot be made to work +sensibly: JSON objects and arrays are self-delimited, i.e. you can +concatenate them back to back and still decode them perfectly. This does +not hold true for JSON numbers, however. For example, is the string C<1> a single JSON number, or is it simply the start of C<12>? Or is C<12> a single JSON number, or the concatenation @@ -986,15 +1010,31 @@ =item true, false -These JSON atoms become C and C, -respectively. They are overloaded to act almost exactly like the numbers -C<1> and C<0>. You can check whether a scalar is a JSON boolean by using -the C function. +These JSON atoms become C and +C, respectively. They are overloaded to act +almost exactly like the numbers C<1> and C<0>. You can check whether +a scalar is a JSON boolean by using the C +function (after C, of course). =item null A JSON null atom becomes C in Perl. +=item shell-style comments (C<< # I >>) + +As a nonstandard extension to the JSON syntax that is enabled by the +C setting, shell-style comments are allowed. They can start +anywhere outside strings and go till the end of the line. + +=item tagged values (C<< (I)I >>). + +Another nonstandard extension to the JSON syntax, enabled with the +C setting, are tagged values. In this implementation, the +I must be a perl package/class name encoded as a JSON string, and the +I must be a JSON array encoding optional constructor arguments. + +See L, below, for details. + =back @@ -1008,15 +1048,13 @@ =item hash references -Perl hash references become JSON objects. As there is no inherent ordering -in hash keys (or JSON objects), they will usually be encoded in a -pseudo-random order that can change between runs of the same program but -stays generally the same within a single run of a program. JSON::XS can -optionally sort the hash keys (determined by the I flag), so -the same datastructure will serialise to the same JSON text (given same -settings and version of JSON::XS), but this incurs a runtime overhead -and is only rarely useful, e.g. when you want to compare some JSON text -against another for equality. +Perl hash references become JSON objects. As there is no inherent +ordering in hash keys (or JSON objects), they will usually be encoded +in a pseudo-random order. JSON::XS can optionally sort the hash keys +(determined by the I flag), so the same datastructure will +serialise to the same JSON text (given same settings and version of +JSON::XS), but this incurs a runtime overhead and is only rarely useful, +e.g. when you want to compare some JSON text against another for equality. =item array references @@ -1026,23 +1064,26 @@ Other unblessed references are generally not allowed and will cause an exception to be thrown, except for references to the integers C<0> and -C<1>, which get turned into C and C atoms in JSON. You can -also use C and C to improve readability. +C<1>, which get turned into C and C atoms in JSON. + +Since C uses the boolean model from L, you +can also C and then use C +and C to improve readability. - encode_json [\0, JSON::XS::true] # yields [false,true] + use Types::Serialiser; + encode_json [\0, Types::Serialiser::true] # yields [false,true] -=item JSON::XS::true, JSON::XS::false +=item Types::Serialiser::true, Types::Serialiser::false -These special values become JSON true and JSON false values, -respectively. You can also use C<\1> and C<\0> directly if you want. +These special values from the L module become JSON true +and JSON false values, respectively. You can also use C<\1> and C<\0> +directly if you want. =item blessed objects -Blessed objects are not directly representable in JSON. See the -C and C methods on various options on -how to deal with this: basically, you can choose between throwing an -exception, encoding the reference as if it weren't blessed, or provide -your own serialiser method. +Blessed objects are not directly representable in JSON, but C +allows various ways of handling objects. See L, +below, for details. =item simple scalars @@ -1089,6 +1130,114 @@ =back +=head2 OBJECT SERIALISATION + +As JSON cannot directly represent Perl objects, you have to choose between +a pure JSON representation (without the ability to deserialise the object +automatically again), and a nonstandard extension to the JSON syntax, +tagged values. + +=head3 SERIALISATION + +What happens when C encounters a Perl object depends on the +C, C and C settings, which are +used in this order: + +=over 4 + +=item 1. C is enabled and the object has a C method. + +In this case, C uses the L object +serialisation protocol to create a tagged JSON value, using a nonstandard +extension to the JSON syntax. + +This works by invoking the C method on the object, with the first +argument being the object to serialise, and the second argument being the +constant string C to distinguish it from other serialisers. + +The C method can return any number of values (i.e. zero or +more). These values and the paclkage/classname of the object will then be +encoded as a tagged JSON value in the following format: + + ("classname")[FREEZE return values...] + +e.g.: + + ("URI")["http://www.google.com/"] + ("MyDate")[2013,10,29] + ("ImageData::JPEG")["Z3...VlCg=="] + +For example, the hypothetical C C method might use the +objects C and C members to encode the object: + + sub My::Object::FREEZE { + my ($self, $serialiser) = @_; + + ($self->{type}, $self->{id}) + } + +=item 2. C is enabled and the object has a C method. + +In this case, the C method of the object is invoked in scalar +context. It must return a single scalar that can be directly encoded into +JSON. This scalar replaces the object in the JSON text. + +For example, the following C method will convert all L +objects to JSON strings when serialised. The fatc that these values +originally were L objects is lost. + + sub URI::TO_JSON { + my ($uri) = @_; + $uri->as_string + } + +=item 3. C is enabled. + +The object will be serialised as a JSON null value. + +=item 4. none of the above + +If none of the settings are enabled or the respective methods are missing, +C throws an exception. + +=back + +=head3 DESERIALISATION + +For deserialisation there are only two cases to consider: either +nonstandard tagging was used, in which case C decides, +or objects cannot be automatically be deserialised, in which +case you can use postprocessing or the C or +C callbacks to get some real objects our of +your JSON. + +This section only considers the tagged value case: I a tagged JSON object +is encountered during decoding and C is disabled, a parse +error will result (as if tagged values were not part of the grammar). + +If C is enabled, C will look up the C method +of the package/classname used during serialisation (it will not attempt +to load the package as a Perl module). If there is no such method, the +decoding will fail with an error. + +Otherwise, the C method is invoked with the classname as first +argument, the constant string C as second argument, and all the +values from the JSON array (the values originally returned by the +C method) as remaining arguments. + +The method must then return the object. While technically you can return +any Perl scalar, you might have to enable the C setting to +make that work in all cases, so better return an actual blessed reference. + +As an example, let's implement a C function that regenerates the +C from the C example earlier: + + sub My::Object::THAW { + my ($class, $serialiser, $type, $id) = @_; + + $class->new (type => $type, id => $id) + } + =head1 ENCODING/CODESET FLAG NOTES @@ -1420,14 +1569,140 @@ security right). -=head1 THREADS +=head1 "OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159) + +TL;DR: Due to security concerns, JSON::XS will not allow scalar data in +JSON texts by default - you need to create your own JSON::XS object and +enable C: + + + my $json = JSON::XS->new->allow_nonref; + + $text = $json->encode ($data); + $data = $json->decode ($text); + +The long version: JSON being an important and supposedly stable format, +the IETF standardised it as RFC 4627 in 2006. Unfortunately, the inventor +of JSON, Dougles Crockford, unilaterally changed the definition of JSON in +javascript. Rather than create a fork, the IETF decided to standardise the +new syntax (apparently, so Iw as told, without finding it very amusing). + +The biggest difference between thed original JSON and the new JSON is that +the new JSON supports scalars (anything other than arrays and objects) at +the toplevel of a JSON text. While this is strictly backwards compatible +to older versions, it breaks a number of protocols that relied on sending +JSON back-to-back, and is a minor security concern. + +For example, imagine you have two banks communicating, and on one side, +trhe JSON coder gets upgraded. Two messages, such as C<10> and C<1000> +might then be confused to mean C<101000>, something that couldn't happen +in the original JSON, because niether of these messages would be valid +JSON. + +If one side accepts these messages, then an upgrade in the coder on either +side could result in this becoming exploitable. + +This module has always allowed these messages as an optional extension, by +default disabled. The security concerns are the reason why the default is +still disabled, but future versions might/will likely upgrade to the newer +RFC as default format, so you are advised to check your implementation +and/or override the default with C<< ->allow_nonref (0) >> to ensure that +future versions are safe. + + +=head1 INTEROPERABILITY WITH OTHER MODULES + +C uses the L module to provide boolean +constants. That means that the JSON true and false values will be +comaptible to true and false values of other modules that do the same, +such as L and L. + + +=head1 INTEROPERABILITY WITH OTHER JSON DECODERS + +As long as you only serialise data that can be directly expressed in JSON, +C is incapable of generating invalid JSON output (modulo bugs, +but C has found more bugs in the official JSON testsuite (1) +than the official JSON testsuite has found in C (0)). + +When you have trouble decoding JSON generated by this module using other +decoders, then it is very likely that you have an encoding mismatch or the +other decoder is broken. + +When decoding, C is strict by default and will likely catch all +errors. There are currently two settings that change this: C +makes C accept (but not generate) some non-standard extensions, +and C will allow you to encode and decode Perl objects, at the +cost of not outputting valid JSON anymore. + +=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS -This module is I guaranteed to be thread safe and there are no -plans to change this until Perl gets thread support (as opposed to the -horribly slow so-called "threads" which are simply slow and bloated -process simulations - use fork, it's I faster, cheaper, better). +When you use C to use the extended (and also nonstandard and +invalid) JSON syntax for serialised objects, and you still want to decode +the generated When you want to serialise objects, you can run a regex +to replace the tagged syntax by standard JSON arrays (it only works for +"normal" package names without comma, newlines or single colons). First, +the readable Perl version: -(It might actually work, but you have been warned). + # if your FREEZE methods return no values, you need this replace first: + $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx; + + # this works for non-empty constructor arg lists: + $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx; + +And here is a less readable version that is easy to adapt to other +languages: + + $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g; + +Here is an ECMAScript version (same regex): + + json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,"); + +Since this syntax converts to standard JSON arrays, it might be hard to +distinguish serialised objects from normal arrays. You can prepend a +"magic number" as first array element to reduce chances of a collision: + + $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g; + +And after decoding the JSON text, you could walk the data +structure looking for arrays with a first element of +C. + +The same approach can be used to create the tagged format with another +encoder. First, you create an array with the magic string as first member, +the classname as second, and constructor arguments last, encode it as part +of your JSON structure, and then: + + $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g; + +Again, this has some limitations - the magic string must not be encoded +with character escapes, and the constructor arguments must be non-empty. + + +=head1 RFC7159 + +Since this module was written, Google has written a new JSON RFC, RFC 7159 +(and RFC7158). Unfortunately, this RFC breaks compatibility with both the +original JSON specification on www.json.org and RFC4627. + +As far as I can see, you can get partial compatibility when parsing by +using C<< ->allow_nonref >>. However, consider the security implications +of doing so. + +I haven't decided yet when to break compatibility with RFC4627 by default +(and potentially leave applications insecure) and change the default to +follow RFC7159, but application authors are well advised to call C<< +->allow_nonref(0) >> even if this is the current default, if they cannot +handle non-reference values, in preparation for the day when the default +will change. + + +=head1 (I-)THREADS + +This module is I guaranteed to be ithread (or MULTIPLICITY-) safe +and there are no plans to change this. Note that perl's builtin so-called +theeads/ithreads are officially deprecated and should not be used. =head1 THE PERILS OF SETLOCALE @@ -1459,29 +1734,18 @@ =cut -our $true = do { bless \(my $dummy = 1), "JSON::XS::Boolean" }; -our $false = do { bless \(my $dummy = 0), "JSON::XS::Boolean" }; - -sub true() { $true } -sub false() { $false } +BEGIN { + *true = \$Types::Serialiser::true; + *true = \&Types::Serialiser::true; + *false = \$Types::Serialiser::false; + *false = \&Types::Serialiser::false; + *is_bool = \&Types::Serialiser::is_bool; -sub is_bool($) { - UNIVERSAL::isa $_[0], "JSON::XS::Boolean" -# or UNIVERSAL::isa $_[0], "JSON::Literal" + *JSON::XS::Boolean:: = *Types::Serialiser::Boolean::; } XSLoader::load "JSON::XS", $VERSION; -package JSON::XS::Boolean; - -use overload - "0+" => sub { ${$_[0]} }, - "++" => sub { $_[0] = ${$_[0]} + 1 }, - "--" => sub { $_[0] = ${$_[0]} - 1 }, - fallback => 1; - -1; - =head1 SEE ALSO The F command line utility for quick experiments. @@ -1493,3 +1757,5 @@ =cut +1 +