--- JSON-XS/README 2013/05/23 09:32:02 1.37 +++ JSON-XS/README 2016/02/26 21:46:45 1.40 @@ -79,7 +79,7 @@ * simple to use This module has both a simple functional interface as well as an - object oriented interface interface. + object oriented interface. * reasonably versatile output formats @@ -115,15 +115,6 @@ Except being faster. - $is_boolean = JSON::XS::is_bool $scalar - Returns true if the passed scalar represents either JSON::XS::true - or JSON::XS::false, two constants that act like 1 and 0, - respectively and are used to represent JSON "true" and "false" - values in Perl. - - See MAPPING, below, for more information on how JSON values are - mapped to Perl. - A FEW NOTES ON UNICODE AND PERL Since this often leads to confusion, here are a few very clear words on how Unicode works in Perl, modulo bugs. @@ -363,6 +354,16 @@ # neither this one... ] + * literal ASCII TAB characters in strings + + Literal ASCII TAB characters are now allowed in strings (and + treated as "\t"). + + [ + "Hello\tWorld", + "HelloWorld", # literal would not normally be allowed + ] + $json = $json->canonical ([$enable]) $enabled = $json->get_canonical If $enable is true (or missing), then the "encode" method will @@ -419,24 +420,28 @@ $json = $json->allow_blessed ([$enable]) $enabled = $json->get_allow_blessed + See "OBJECT SERIALISATION" for details. + If $enable is true (or missing), then the "encode" method will not - barf when it encounters a blessed reference. Instead, the value of - the convert_blessed option will decide whether "null" - ("convert_blessed" disabled or no "TO_JSON" method found) or a - representation of the object ("convert_blessed" enabled and - "TO_JSON" method found) is being encoded. Has no effect on "decode". + barf when it encounters a blessed reference that it cannot convert + otherwise. Instead, a JSON "null" value is encoded instead of the + object. 
If $enable is false (the default), then "encode" will throw an - exception when it encounters a blessed object. + exception when it encounters a blessed object that it cannot convert + otherwise. + + This setting has no effect on "decode". $json = $json->convert_blessed ([$enable]) $enabled = $json->get_convert_blessed + See "OBJECT SERIALISATION" for details. + If $enable is true (or missing), then "encode", upon encountering a blessed object, will check for the availability of the "TO_JSON" method on the object's class. If found, it will be called in scalar context and the resulting scalar will be encoded instead of the - object. If no "TO_JSON" method is found, the value of - "allow_blessed" will decide what to do. + object. The "TO_JSON" method may safely call die if it wants. If "TO_JSON" returns other blessed objects, those will be handled in the same @@ -446,12 +451,27 @@ the object) are usually in upper case letters and to avoid collisions with any "to_json" function or method. - This setting does not yet influence "decode" in any way, but in the - future, global hooks might get installed that influence "decode" and - are enabled by this setting. + If $enable is false (the default), then "encode" will not consider + this type of conversion. + + This setting has no effect on "decode". + + $json = $json->allow_tags ([$enable]) + $enabled = $json->get_allow_tags + See "OBJECT SERIALISATION" for details. - If $enable is false, then the "allow_blessed" setting will decide - what to do when a blessed object is found. + If $enable is true (or missing), then "encode", upon encountering a + blessed object, will check for the availability of the "FREEZE" + method on the object's class. If found, it will be used to serialise + the object into a nonstandard tagged JSON value (that JSON decoders + cannot decode). + + It also causes "decode" to parse such tagged JSON values and + deserialise them via a call to the "THAW" method.
+ + If $enable is false (the default), then "encode" will not consider + this type of conversion, and tagged JSON values will cause a parse + error in "decode", as if tags were not part of the grammar. $json = $json->filter_json_object ([$coderef->($hashref)]) When $coderef is specified, it will be called from "decode" each @@ -597,22 +617,13 @@ useful. $json_text = $json->encode ($perl_scalar) - Converts the given Perl data structure (a simple scalar or a - reference to a hash or array) to its JSON representation. Simple - scalars will be converted into JSON string or number sequences, - while references to arrays become JSON arrays and references to - hashes become JSON objects. Undefined Perl values (e.g. "undef") - become JSON "null" values. Neither "true" nor "false" values will be - generated. + Converts the given Perl value or data structure to its JSON + representation. Croaks on error. $perl_scalar = $json->decode ($json_text) The opposite of "encode": expects a JSON text and tries to parse it, returning the resulting simple scalar or reference. Croaks on error. - JSON numbers and strings become simple Perl scalars. JSON arrays - become Perl arrayrefs and JSON objects become Perl hashrefs. "true" - becomes 1, "false" becomes 0 and "null" becomes "undef". - ($perl_scalar, $characters) = $json->decode_prefix ($json_text) This works like the "decode" method, but instead of raising an exception when there is trailing garbage after the first JSON @@ -620,11 +631,10 @@ characters consumed so far. This is useful if your JSON texts are not delimited by an outer - protocol (which is not the brightest thing to do in the first place) - and you need to know where the JSON text ends. + protocol and you need to know where the JSON text ends. JSON::XS->new->decode_prefix ("[1] the tail") - => ([], 3) + => ([1], 3) INCREMENTAL PARSING In some cases, there is the need for incremental parsing of JSON texts. @@ -663,7 +673,7 @@ extract exactly *one* JSON object. 
If that is successful, it will return this object, otherwise it will return "undef". If there is a parse error, this method will croak just as "decode" would do (one - can then use "incr_skip" to skip the errornous part). This is the + can then use "incr_skip" to skip the erroneous part). This is the most common way of using the method. And finally, in list context, it will try to extract as many objects @@ -701,7 +711,7 @@ to reset the parse state. The difference to "incr_reset" is that only text until the parse - error occured is removed. + error occurred is removed. $json->incr_reset This completely resets the incremental parser, that is, after this @@ -895,7 +905,7 @@ represent it as a numeric (floating point) value if that is possible without loss of precision. Otherwise it will preserve the number as a string value (in which case you lose roundtripping ability, as the - JSON number will be re-encoded toa JSON string). + JSON number will be re-encoded to a JSON string). Numbers containing a fractional or exponential part will always be represented as numeric (floating point) values, possibly at a loss @@ -906,17 +916,32 @@ Note that precision is not accuracy - binary floating point values cannot represent most decimal fractions exactly, and when converting from and to floating point, JSON::XS only guarantees precision up to - but not including the leats significant bit. + but not including the least significant bit. true, false - These JSON atoms become "JSON::XS::true" and "JSON::XS::false", - respectively. They are overloaded to act almost exactly like the - numbers 1 and 0. You can check whether a scalar is a JSON boolean by - using the "JSON::XS::is_bool" function. + These JSON atoms become "Types::Serialiser::true" and + "Types::Serialiser::false", respectively. They are overloaded to act + almost exactly like the numbers 1 and 0. 
You can check whether a + scalar is a JSON boolean by using the "Types::Serialiser::is_bool" + function (after "use Types::Serialiser", of course). null A JSON null atom becomes "undef" in Perl. + shell-style comments ("# *text*") + As a nonstandard extension to the JSON syntax that is enabled by the + "relaxed" setting, shell-style comments are allowed. They can start + anywhere outside strings and go till the end of the line. + + tagged values ("(*tag*)*value*"). + Another nonstandard extension to the JSON syntax, enabled with the + "allow_tags" setting, are tagged values. In this implementation, the + *tag* must be a perl package/class name encoded as a JSON string, + and the *value* must be a JSON array encoding optional constructor + arguments. + + See "OBJECT SERIALISATION", below, for details. + PERL -> JSON The mapping from Perl to JSON is slightly more difficult, as Perl is a truly typeless language, so we can only guess which JSON type is meant @@ -925,14 +950,12 @@ hash references Perl hash references become JSON objects. As there is no inherent ordering in hash keys (or JSON objects), they will usually be - encoded in a pseudo-random order that can change between runs of the - same program but stays generally the same within a single run of a - program. JSON::XS can optionally sort the hash keys (determined by - the *canonical* flag), so the same datastructure will serialise to - the same JSON text (given same settings and version of JSON::XS), - but this incurs a runtime overhead and is only rarely useful, e.g. - when you want to compare some JSON text against another for - equality. + encoded in a pseudo-random order. JSON::XS can optionally sort the + hash keys (determined by the *canonical* flag), so the same + datastructure will serialise to the same JSON text (given same + settings and version of JSON::XS), but this incurs a runtime + overhead and is only rarely useful, e.g. when you want to compare + some JSON text against another for equality.
array references Perl array references become JSON arrays. @@ -940,22 +963,25 @@ other references Other unblessed references are generally not allowed and will cause an exception to be thrown, except for references to the integers 0 - and 1, which get turned into "false" and "true" atoms in JSON. You - can also use "JSON::XS::false" and "JSON::XS::true" to improve + and 1, which get turned into "false" and "true" atoms in JSON. + + Since "JSON::XS" uses the boolean model from Types::Serialiser, you + can also "use Types::Serialiser" and then use + "Types::Serialiser::false" and "Types::Serialiser::true" to improve readability. - encode_json [\0, JSON::XS::true] # yields [false,true] + use Types::Serialiser; + encode_json [\0, Types::Serialiser::true] # yields [false,true] - JSON::XS::true, JSON::XS::false - These special values become JSON true and JSON false values, - respectively. You can also use "\1" and "\0" directly if you want. + Types::Serialiser::true, Types::Serialiser::false + These special values from the Types::Serialiser module become JSON + true and JSON false values, respectively. You can also use "\1" and + "\0" directly if you want. blessed objects - Blessed objects are not directly representable in JSON. See the - "allow_blessed" and "convert_blessed" methods on various options on - how to deal with this: basically, you can choose between throwing an - exception, encoding the reference as if it weren't blessed, or - provide your own serialiser method. + Blessed objects are not directly representable in JSON, but + "JSON::XS" allows various ways of handling objects. See "OBJECT + SERIALISATION", below, for details. simple scalars Simple Perl scalars (any scalar that is not a reference) are the @@ -1000,6 +1026,106 @@ platform, such as infinities or NaN's - these cannot be represented in JSON, and it is an error to pass those in. 
+ OBJECT SERIALISATION + As JSON cannot directly represent Perl objects, you have to choose + between a pure JSON representation (without the ability to deserialise + the object automatically again), and a nonstandard extension to the JSON + syntax, tagged values. + + SERIALISATION + What happens when "JSON::XS" encounters a Perl object depends on the + "allow_blessed", "convert_blessed" and "allow_tags" settings, which are + used in this order: + + 1. "allow_tags" is enabled and the object has a "FREEZE" method. + In this case, "JSON::XS" uses the Types::Serialiser object + serialisation protocol to create a tagged JSON value, using a + nonstandard extension to the JSON syntax. + + This works by invoking the "FREEZE" method on the object, with the + first argument being the object to serialise, and the second + argument being the constant string "JSON" to distinguish it from + other serialisers. + + The "FREEZE" method can return any number of values (i.e. zero or + more). These values and the package/classname of the object will + then be encoded as a tagged JSON value in the following format: + + ("classname")[FREEZE return values...] + + e.g.: + + ("URI")["http://www.google.com/"] + ("MyDate")[2013,10,29] + ("ImageData::JPEG")["Z3...VlCg=="] + + For example, the hypothetical "My::Object" "FREEZE" method might use + the object's "type" and "id" members to encode the object: + + sub My::Object::FREEZE { + my ($self, $serialiser) = @_; + + ($self->{type}, $self->{id}) + } + + 2. "convert_blessed" is enabled and the object has a "TO_JSON" method. + In this case, the "TO_JSON" method of the object is invoked in + scalar context. It must return a single scalar that can be directly + encoded into JSON. This scalar replaces the object in the JSON text. + + For example, the following "TO_JSON" method will convert all URI + objects to JSON strings when serialised. The fact that these values + originally were URI objects is lost.
+ + sub URI::TO_JSON { + my ($uri) = @_; + $uri->as_string + } + + 3. "allow_blessed" is enabled. + The object will be serialised as a JSON null value. + + 4. none of the above + If none of the settings are enabled or the respective methods are + missing, "JSON::XS" throws an exception. + + DESERIALISATION + For deserialisation there are only two cases to consider: either + nonstandard tagging was used, in which case "allow_tags" decides, or + objects cannot automatically be deserialised, in which case you can + use postprocessing or the "filter_json_object" or + "filter_json_single_key_object" callbacks to get some real objects out + of your JSON. + + This section only considers the tagged value case: If a tagged JSON + object is encountered during decoding and "allow_tags" is disabled, a + parse error will result (as if tagged values were not part of the + grammar). + + If "allow_tags" is enabled, "JSON::XS" will look up the "THAW" method of + the package/classname used during serialisation (it will not attempt to + load the package as a Perl module). If there is no such method, the + decoding will fail with an error. + + Otherwise, the "THAW" method is invoked with the classname as first + argument, the constant string "JSON" as second argument, and all the + values from the JSON array (the values originally returned by the + "FREEZE" method) as remaining arguments. + + The method must then return the object. While technically you can return + any Perl scalar, you might have to enable the "allow_nonref" setting to + make that work in all cases, so better return an actual blessed + reference.
+ + As an example, let's implement a "THAW" function that regenerates the + "My::Object" from the "FREEZE" example earlier: + + sub My::Object::THAW { + my ($class, $serialiser, $type, $id) = @_; + + $class->new (type => $type, id => $id) + } + ENCODING/CODESET FLAG NOTES The interested reader might have seen a number of flags that signify encodings or codesets - "utf8", "latin1" and "ascii". There seems to be @@ -1028,7 +1154,7 @@ When "utf8" is disabled (the default), then "encode"/"decode" generate and expect Unicode strings, that is, characters with high ordinal Unicode values (> 255) will be encoded as such characters, - and likewise such characters are decoded as-is, no canges to them + and likewise such characters are decoded as-is, no changes to them will be done, except "(re-)interpreting" them as Unicode codepoints or Unicode characters, respectively (to Perl, these are the same thing in strings unless you do funny/weird/dumb stuff). @@ -1154,7 +1280,7 @@ $json =~ s/"__proto__"\s*:/"__proto__renamed":/g; This works because "__proto__" is not valid outside of strings, so every - occurence of ""__proto__"\s*:" must be a string used as property name. + occurrence of ""__proto__"\s*:" must be a string used as property name. If you know of other incompatibilities, please let me know. @@ -1315,6 +1441,126 @@ deal with it, as major browser developers care only for features, not about getting security right). +"OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159) + TL;DR: Due to security concerns, JSON::XS will not allow scalar data in + JSON texts by default - you need to create your own JSON::XS object and + enable "allow_nonref": + + my $json = JSON::XS->new->allow_nonref; + + $text = $json->encode ($data); + $data = $json->decode ($text); + + The long version: JSON being an important and supposedly stable format, + the IETF standardised it as RFC 4627 in 2006. 
Unfortunately, the + inventor of JSON, Douglas Crockford, unilaterally changed the definition + of JSON in JavaScript. Rather than create a fork, the IETF decided to + standardise the new syntax (apparently, so I was told, without finding + it very amusing). + + The biggest difference between the original JSON and the new JSON is + that the new JSON supports scalars (anything other than arrays and + objects) at the toplevel of a JSON text. While this is strictly + backwards compatible to older versions, it breaks a number of protocols + that relied on sending JSON back-to-back, and is a minor security + concern. + + For example, imagine you have two banks communicating, and on one side, + the JSON coder gets upgraded. Two messages, such as 10 and 1000 might + then be confused to mean 101000, something that couldn't happen in the + original JSON, because neither of these messages would be valid JSON. + + If one side accepts these messages, then an upgrade in the coder on + either side could result in this becoming exploitable. + + This module has always allowed these messages as an optional extension, + by default disabled. The security concerns are the reason why the + default is still disabled, but future versions might/will likely upgrade + to the newer RFC as default format, so you are advised to check your + implementation and/or override the default with "->allow_nonref (0)" to + ensure that future versions are safe. + +INTEROPERABILITY WITH OTHER MODULES + "JSON::XS" uses the Types::Serialiser module to provide boolean + constants. That means that the JSON true and false values will be + compatible with true and false values of other modules that do the same, + such as JSON::PP and CBOR::XS.
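A short sketch of what this shared boolean model looks like in practice. It assumes JSON::XS 3.0 or later with Types::Serialiser installed; the variable names are illustrative only and not part of the README.

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

# JSON true/false decode to the Types::Serialiser boolean objects,
# while the JSON number 1 stays a plain Perl scalar.
my $data = decode_json '[true,false,1]';

my @kinds = map { Types::Serialiser::is_bool ($_) ? "bool" : "plain" } @$data;

# The shared constants, as well as \1 and \0, encode as JSON atoms.
my $text = encode_json [Types::Serialiser::true, Types::Serialiser::false, \1, \0];

print "@kinds\n";   # bool bool plain
print "$text\n";    # [true,false,true,false]
```

Because the boolean objects come from Types::Serialiser rather than from JSON::XS itself, the same "is_bool" check should also apply to values produced by other modules that use this model.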
+ +INTEROPERABILITY WITH OTHER JSON DECODERS + As long as you only serialise data that can be directly expressed in + JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo + bugs, but "JSON::XS" has found more bugs in the official JSON testsuite + (1) than the official JSON testsuite has found in "JSON::XS" (0)). + + When you have trouble decoding JSON generated by this module using other + decoders, then it is very likely that you have an encoding mismatch or + the other decoder is broken. + + When decoding, "JSON::XS" is strict by default and will likely catch all + errors. There are currently two settings that change this: "relaxed" + makes "JSON::XS" accept (but not generate) some non-standard extensions, + and "allow_tags" will allow you to encode and decode Perl objects, at + the cost of not outputting valid JSON anymore. + + TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS + When you use "allow_tags" to use the extended (and also nonstandard and + invalid) JSON syntax for serialised objects, and you still want to + decode the generated JSON texts with a standard JSON decoder, you can run a + regex to replace the tagged syntax by standard JSON arrays (it only + works for "normal" package names without commas, newlines or single + colons). First, the readable Perl version: + + # if your FREEZE methods return no values, you need this replace first: + $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx; + + # this works for non-empty constructor arg lists: + $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx; + + And here is a less readable version that is easy to adapt to other + languages: + + $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g; + + Here is an ECMAScript version (same regex): + + json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,"); + + Since this syntax converts to standard JSON arrays, it might be hard to + distinguish serialised objects from normal arrays.
You can prepend a + "magic number" as first array element to reduce chances of a collision: + + $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g; + + And after decoding the JSON text, you could walk the data structure + looking for arrays with a first element of + "XU1peReLzT4ggEllLanBYq4G9VzliwKF". + + The same approach can be used to create the tagged format with another + encoder. First, you create an array with the magic string as first + member, the classname as second, and constructor arguments last, encode + it as part of your JSON structure, and then: + + $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g; + + Again, this has some limitations - the magic string must not be encoded + with character escapes, and the constructor arguments must be non-empty. + +RFC7159 + Since this module was written, Google has written a new JSON RFC, RFC + 7159 (and RFC7158). Unfortunately, this RFC breaks compatibility with + both the original JSON specification on www.json.org and RFC4627. + + As far as I can see, you can get partial compatibility when parsing by + using "->allow_nonref". However, consider the security implications of + doing so. + + I haven't decided yet when to break compatibility with RFC4627 by + default (and potentially leave applications insecure) and change the + default to follow RFC7159, but application authors are well advised to + call "->allow_nonref(0)" even if this is the current default, if they + cannot handle non-reference values, in preparation for the day when the + default will change. + THREADS This module is *not* guaranteed to be thread safe and there are no plans to change this until Perl gets thread support (as opposed to the @@ -1328,9 +1574,9 @@ system's setlocale function with "LC_ALL". This breaks both perl and modules such as JSON::XS, as stringification - of numbers no longer works correcly (e.g.
"$x = 0.1; print "$x"+1" might - print 1, and JSON::XS might output illegal JSON as JSON::XS relies on - perl to stringify numbers). + of numbers no longer works correctly (e.g. "$x = 0.1; print "$x"+1" + might print 1, and JSON::XS might output illegal JSON as JSON::XS relies + on perl to stringify numbers). The solution is simple: don't call "setlocale", or use it for only those categories you need, such as "LC_MESSAGES" or "LC_CTYPE".
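As a minimal sketch of the "allow_tags" round trip described under OBJECT SERIALISATION in this diff: the class "My::Point" and its "x"/"y" members are hypothetical and exist only for this illustration. The example assumes a JSON::XS version that provides "allow_tags" (3.0 or later) and a Perl new enough for the "package NAME { ... }" block syntax (5.14 or later).

```perl
use strict;
use warnings;
use JSON::XS;

# My::Point is a hypothetical class, not part of JSON::XS.
package My::Point {
    sub new {
        my ($class, %args) = @_;
        return bless { x => $args{x}, y => $args{y} }, $class;
    }

    # Invoked by encode when allow_tags is enabled; the second
    # argument is the constant string "JSON".
    sub FREEZE {
        my ($self, $serialiser) = @_;
        return ($self->{x}, $self->{y});
    }

    # Invoked by decode with the class name, the string "JSON",
    # and the values that FREEZE returned.
    sub THAW {
        my ($class, $serialiser, $x, $y) = @_;
        return $class->new (x => $x, y => $y);
    }
}

my $json = JSON::XS->new->allow_tags;

# The object is wrapped in an array so the top level is a reference.
my $text = $json->encode ([My::Point->new (x => 1, y => 2)]);

# $text now contains a tagged value of the form ("My::Point")[...],
# which is not valid standard JSON.
my $copy = $json->decode ($text)->[0];

printf "%s at (%d, %d)\n", ref $copy, $copy->{x}, $copy->{y};
```

Without "->allow_tags", the same "encode" call would throw an exception on the blessed object, and "decode" would report a parse error on the tagged text, as the sections above describe.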