--- JSON-XS/XS.pm 2013/10/29 00:19:45 1.148
+++ JSON-XS/XS.pm 2018/11/15 22:35:35 1.170
@@ -37,26 +37,12 @@
 primary goal is to be I and its secondary goal is to be I. To reach the
 latter goal it was written in C.
 
-Beginning with version 2.0 of the JSON module, when both JSON and
-JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
-overridden) with no overhead due to emulation (by inheriting constructor
-and methods). If JSON::XS is not available, it will fall back to the
-compatible JSON::PP module as backend, so using JSON instead of JSON::XS
-gives you a portable JSON API that can be fast when you need and doesn't
-require a C compiler when that is a problem.
-
-As this is the n-th-something JSON module on CPAN, what was the reason
-to write yet another JSON module? While it seems there are many JSON
-modules, none of them correctly handle all corner cases, and in most cases
-their maintainers are unresponsive, gone missing, or not listening to bug
-reports for other reasons.
-
 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
 vice versa.
 
 =head2 FEATURES
 
-=over 4
+=over
 
 =item * correct Unicode handling
 
@@ -103,7 +89,7 @@
 
 use common::sense;
 
-our $VERSION = '3.0';
+our $VERSION = '4.0';
 our @ISA = qw(Exporter);
 
 our @EXPORT = qw(encode_json decode_json);
@@ -118,7 +104,7 @@
 The following convenience methods are provided by this module. They are
 exported by default:
 
-=over 4
+=over
 
 =item $json_text = encode_json $perl_scalar
 
@@ -133,8 +119,8 @@
 
 =item $perl_scalar = decode_json $json_text
 
-The opposite of C: expects an UTF-8 (binary) string and tries
-to parse that as an UTF-8 encoded JSON text, returning the resulting
+The opposite of C: expects a UTF-8 (binary) string and tries
+to parse that as a UTF-8 encoded JSON text, returning the resulting
 reference. Croaks on error.
 This function call is functionally identical to:
@@ -151,7 +137,7 @@
 Since this often leads to confusion, here are a few very clear words on
 how Unicode works in Perl, modulo bugs.
 
-=over 4
+=over
 
 =item 1. Perl strings can store characters with ordinal values > 255.
 
@@ -199,12 +185,14 @@
 The object oriented interface lets you configure your own encoding or
 decoding style, within the limits of supported formats.
 
-=over 4
+=over
 
 =item $json = new JSON::XS
 
 Creates a new JSON::XS object that can be used to de/encode JSON
-strings. All boolean flags described below are by default I.
+strings. All boolean flags described below are by default I
+(with the exception of C, which defaults to I since
+version C<4.0>).
 
 The mutators for flags all return the JSON object again and thus calls can
 be chained:
@@ -272,7 +260,7 @@
 
 If C<$enable> is true (or missing), then the C method will encode
 the JSON result into UTF-8, as required by many protocols, while the
-C method expects to be handled an UTF-8-encoded string. Please
+C method expects to be handed a UTF-8-encoded string. Please
 note that UTF-8-encoded strings do not contain any characters outside the
 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
 versions, enabling this option might enable autodetection of the UTF-16
@@ -367,7 +355,7 @@
 
 If C<$enable> is true (or missing), then C will accept some
 extensions to normal JSON syntax (see below). C will not be
-affected in anyway. I. I suggest only to use this option to
 parse application-specific files written by humans (configuration files,
 resource files etc.)
 
@@ -377,7 +365,7 @@
 
 Currently accepted extensions are:
 
-=over 4
+=over
 
 =item * list items can have an end-comma
 
@@ -406,6 +394,16 @@
       # neither this one...
   ]
 
+=item * literal ASCII TAB characters in strings
+
+Literal ASCII TAB characters are now allowed in strings (and treated as
+C<\t>).
+
+   [
+      "Hello\tWorld",
+      "HelloWorld", # literal would not normally be allowed
+   ]
+
 =back
 
 =item $json = $json->canonical ([$enable])
@@ -433,6 +431,9 @@
 
 =item $enabled = $json->get_allow_nonref
 
+Unlike other boolean options, this opotion is enabled by default beginning
+with version C<4.0>. See L for the gory details.
+
 If C<$enable> is true (or missing), then the C method can convert a
 non-reference into its corresponding string, number or null JSON value,
 which is an extension to RFC4627. Likewise, C will accept those JSON
@@ -443,11 +444,11 @@
 or array. Likewise, C will croak if given something that is not a
 JSON object or array.
 
-Example, encode a Perl scalar as JSON value with enabled C,
-resulting in an invalid JSON text:
+Example, encode a Perl scalar as JSON value without enabled C,
+resulting in an error:
 
-   JSON::XS->new->allow_nonref->encode ("Hello, World!")
-   => "Hello, World!"
+   JSON::XS->new->allow_nonref (0)->encode ("Hello, World!")
+   => hash- or arrayref expected...
 
 =item $json = $json->allow_unknown ([$enable])
 
@@ -507,7 +508,7 @@
 
 =item $json = $json->allow_tags ([$enable])
 
-=item $enabled = $json->allow_tags
+=item $enabled = $json->get_allow_tags
 
 See L for details.
 
@@ -526,13 +527,13 @@
 =item $json = $json->filter_json_object ([$coderef->($hashref)])
 
 When C<$coderef> is specified, it will be called from C each
-time it decodes a JSON object. The only argument is a reference to the
-newly-created hash. If the code references returns a single scalar (which
-need not be a reference), this value (i.e. a copy of that scalar to avoid
-aliasing) is inserted into the deserialised data structure. If it returns
-an empty list (NOTE: I C, which is a valid scalar), the
-original deserialised hash will be inserted. This setting can slow down
-decoding considerably.
+time it decodes a JSON object. The only argument is a reference to
+the newly-created hash.
If the code reference returns a single scalar
+(which need not be a reference), this value (or rather a copy of it) is
+inserted into the deserialised data structure. If it returns an empty
+list (NOTE: I C, which is a valid scalar), the original
+deserialised hash will be inserted. This setting can slow down decoding
+considerably.
 
 When C<$coderef> is omitted or undefined, any existing callback will
 be removed and C will not change the deserialised hash in any
@@ -689,7 +690,7 @@
 and you need to know where the JSON text ends.
 
    JSON::XS->new->decode_prefix ("[1] the tail")
-   => ([], 3)
+   => ([1], 3)
 
 =back
 
@@ -716,7 +717,7 @@
 
 The following methods implement this incremental parser.
 
-=over 4
+=over
 
 =item [void, scalar or list context] = $json->incr_parse ([$string])
 
@@ -740,11 +741,11 @@
 
 And finally, in list context, it will try to extract as many objects
 from the stream as it can find and return them, or the empty list
-otherwise. For this to work, there must be no separators between the JSON
-objects or arrays, instead they must be concatenated back-to-back. If
-an error occurs, an exception will be raised as in the scalar context
-case. Note that in this case, any previously-parsed JSON texts will be
-lost.
+otherwise. For this to work, there must be no separators (other than
+whitespace) between the JSON objects or arrays, instead they must be
+concatenated back-to-back. If an error occurs, an exception will be
+raised as in the scalar context case. Note that in this case, any
+previously-parsed JSON texts will be lost.
 
 Example: Parse some JSON arrays/objects in a given string and return
 them.
@@ -761,6 +762,10 @@
 real world conditions). As a special exception, you can also call this
 method before having parsed anything.
 
+That means you can only use this function to look at or manipulate text
+before or after complete JSON objects, not while the parser is in the
+middle of parsing a JSON object.
+ This function is useful in two cases: a) finding the trailing text after a JSON object or b) parsing multiple JSON objects separated by non-JSON text (such as commas). @@ -789,16 +794,19 @@ =head2 LIMITATIONS -All options that affect decoding are supported, except -C. The reason for this is that it cannot be made to work -sensibly: JSON objects and arrays are self-delimited, i.e. you can -concatenate them back to back and still decode them perfectly. This does -not hold true for JSON numbers, however. - -For example, is the string C<1> a single JSON number, or is it simply the -start of C<12>? Or is C<12> a single JSON number, or the concatenation -of C<1> and C<2>? In neither case you can tell, and this is why JSON::XS -takes the conservative route and disallows this case. +The incremental parser is a non-exact parser: it works by gathering as +much text as possible that I be a valid JSON text, followed by +trying to decode it. + +That means it sometimes needs to read more data than strictly necessary to +diagnose an invalid JSON text. For example, after parsing the following +fragment, the parser I stop with an error, as this fragment +I be the beginning of a valid JSON text: + + [, + +In reality, hopwever, the parser might continue to read data until a +length limit is exceeded or it finds a closing bracket. =head2 EXAMPLES @@ -952,7 +960,7 @@ =head2 JSON -> PERL -=over 4 +=over =item object @@ -1030,7 +1038,7 @@ truly typeless language, so we can only guess which JSON type is meant by a Perl value. -=over 4 +=over =item hash references @@ -1129,9 +1137,9 @@ C, C and C settings, which are used in this order: -=over 4 +=over -=item 1. C is enabled and object has a C method. +=item 1. C is enabled and the object has a C method. In this case, C uses the L object serialisation protocol to create a tagged JSON value, using a nonstandard @@ -1147,6 +1155,12 @@ ("classname")[FREEZE return values...] 
+e.g.:
+
+   ("URI")["http://www.google.com/"]
+   ("MyDate")[2013,10,29]
+   ("ImageData::JPEG")["Z3...VlCg=="]
+
 For example, the hypothetical C C method might use the
 objects C and C members to encode the object:
 
@@ -1156,7 +1170,7 @@
       ($self->{type}, $self->{id})
    }
 
-=item 2. C is enabled and object has a C method.
+=item 2. C is enabled and the object has a C method.
 
 In this case, the C method of the object is invoked in scalar
 context. It must return a single scalar that can be directly encoded into
@@ -1244,7 +1258,7 @@
 and ISO-8859-1 (= latin 1) and ASCII are both codesets I encodings at
 the same time, which can be confusing.
 
-=over 4
+=over
 
 =item C flag disabled
 
@@ -1271,7 +1285,7 @@
 that.
 
 The C flag therefore switches between two modes: disabled means you
-will get a Unicode string in Perl, enabled means you get an UTF-8 encoded
+will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
 octet/binary string in Perl.
 
 =item C or C flags enabled
@@ -1413,7 +1427,7 @@
 high that you will run into severe interoperability problems when you least
 expect it.
 
-=over 4
+=over
 
 =item (*)
 
@@ -1549,22 +1563,176 @@
 security right).
 
 
+=head2 "OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159)
+
+JSON originally required JSON texts to represent an array or object -
+scalar values were explicitly not allowed. This has changed, and versions
+of JSON::XS beginning with C<4.0> reflect this by allowing scalar values
+by default.
+
+One reason why one might not want this is that this removes a fundamental
+property of JSON texts, namely that they are self-delimited and
+self-contained, or in other words, you could take any number of "old"
+JSON texts and paste them together, and the result would be unambiguously
+parseable:
+
+   [1,3]{"k":5}[][null] # four JSON texts, without doubt
+
+By allowing scalars, this property is lost: in the following example, is
+this one JSON text (the number 12) or two JSON texts (the numbers 1 and
+2):
+
+   12 # could be 12, or 1 and 2
+
+Another lost property of "old" JSON is that no lookahead is required to
+know the end of a JSON text, i.e. the JSON text definitely ended at the
+last C<]> or C<}> character, there was no need to read extra characters.
+
+For example, a viable network protocol with "old" JSON was to simply
+exchange JSON texts without delimiter. For "new" JSON, you have to use a
+suitable delimiter (such as a newline) after every JSON text or ensure you
+never encode/decode scalar values.
+
+Most protocols do work by only transferring arrays or objects, and the
+easiest way to avoid problems with the "new" JSON definition is to
+explicitly disallow scalar values in your encoder and decoder:
+
+   $json_coder = JSON::XS->new->allow_nonref (0)
+
+This is a somewhat unhappy situation, and the blame can fully be put on
+JSON's inmventor, Douglas Crockford, who unilaterally changed the format
+in 2006 without consulting the IETF, forcing the IETF to either fork the
+format or go with it (as I was told, the IETF wasn't amused).
+
+
+=head1 RELATIONSHIP WITH I-JSON
+
+JSON is a somewhat sloppily-defined format - it carries around obvious
+Javascript baggage, such as not really defining number range, probably
+because Javascript only has one type of numbers: IEEE 64 bit floats
+("binary64").
+
+For this reaosn, RFC7493 defines "Internet JSON", which is a restricted
+subset of JSON that is supposedly more interoperable on the internet.
+
+While C does not offer specific support for I-JSON, it of course
+accepts valid I-JSON and by default implements some of the limitations
+of I-JSON, such as parsing numbers as perl numbers, which are usually a
+superset of binary64 numbers.
+
+To generate I-JSON, follow these rules:
+
+=over
+
+=item * always generate UTF-8
+
+I-JSON must be encoded in UTF-8, the default for C.
+
+=item * numbers should be within IEEE 754 binary64 range
+
+Basically all existing perl installations use binary64 to represent
+floating point numbers, so all you need to do is to avoid large integers.
+
+=item * objects must not have duplicate keys
+
+This is trivially done, as C does not allow duplicate keys.
+
+=item * do not generate scalar JSON texts, use C<< ->allow_nonref (0) >>
+
+I-JSON strongly requests you to only encode arrays and objects into JSON.
+
+=item * times should be strings in ISO 8601 format
+
+There are a myriad of modules on CPAN dealing with ISO 8601 - search for
+C on CPAN and use one.
+
+=item * encode binary data as base64
+
+While it's tempting to just dump binary data as a string (and let
+C do the escaping), for I-JSON, it's I to encode
+binary data as base64.
+
+=back
+
+There are some other considerations - read RFC7493 for the details if
+interested.
+
+
 =head1 INTEROPERABILITY WITH OTHER MODULES
 
 C uses the L module to provide boolean
 constants. That means that the JSON true and false values will be
-comaptible to true and false values of iother modules that do the same,
+comaptible to true and false values of other modules that do the same,
 such as L and L.
 
 
-=head1 THREADS
+=head1 INTEROPERABILITY WITH OTHER JSON DECODERS
+
+As long as you only serialise data that can be directly expressed in JSON,
+C is incapable of generating invalid JSON output (modulo bugs,
+but C has found more bugs in the official JSON testsuite (1)
+than the official JSON testsuite has found in C (0)).
+
+When you have trouble decoding JSON generated by this module using other
+decoders, then it is very likely that you have an encoding mismatch or the
+other decoder is broken.
+
+When decoding, C is strict by default and will likely catch all
+errors. There are currently two settings that change this: C
+makes C accept (but not generate) some non-standard extensions,
+and C will allow you to encode and decode Perl objects, at the
+cost of not outputting valid JSON anymore.
+
+=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
+
+When you use C to use the extended (and also nonstandard and
+invalid) JSON syntax for serialised objects, and you still want to decode
+the generated When you want to serialise objects, you can run a regex
+to replace the tagged syntax by standard JSON arrays (it only works for
+"normal" package names without comma, newlines or single colons). First,
+the readable Perl version:
+
+   # if your FREEZE methods return no values, you need this replace first:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
+
+   # this works for non-empty constructor arg lists:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
+
+And here is a less readable version that is easy to adapt to other
+languages:
 
-This module is I guaranteed to be thread safe and there are no
-plans to change this until Perl gets thread support (as opposed to the
-horribly slow so-called "threads" which are simply slow and bloated
-process simulations - use fork, it's I faster, cheaper, better).
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
 
-(It might actually work, but you have been warned).
+Here is an ECMAScript version (same regex):
+
+   json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
+
+Since this syntax converts to standard JSON arrays, it might be hard to
+distinguish serialised objects from normal arrays.
You can prepend a
+"magic number" as first array element to reduce chances of a collision:
+
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
+
+And after decoding the JSON text, you could walk the data
+structure looking for arrays with a first element of
+C.
+
+The same approach can be used to create the tagged format with another
+encoder. First, you create an array with the magic string as first member,
+the classname as second, and constructor arguments last, encode it as part
+of your JSON structure, and then:
+
+   $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
+
+Again, this has some limitations - the magic string must not be encoded
+with character escapes, and the constructor arguments must be non-empty.
+
+
+=head1 (I-)THREADS
+
+This module is I guaranteed to be ithread (or MULTIPLICITY-) safe
+and there are no plans to change this. Note that perl's builtin so-called
+threads/ithreads are officially deprecated and should not be used.
 
 
 =head1 THE PERILS OF SETLOCALE
@@ -1585,6 +1753,32 @@
 afterwards.
 
 
+=head1 SOME HISTORY
+
+At the time this module was created there already were a number of JSON
+modules available on CPAN, so what was the reason to write yet another
+JSON module? While it seems there are many JSON modules, none of them
+correctly handled all corner cases, and in most cases their maintainers
+are unresponsive, gone missing, or not listening to bug reports for other
+reasons.
+
+Beginning with version 2.0 of the JSON module, when both JSON and
+JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
+overridden) with no overhead due to emulation (by inheriting constructor
+and methods). If JSON::XS is not available, it will fall back to the
+compatible JSON::PP module as backend, so using JSON instead of JSON::XS
+gives you a portable JSON API that can be fast when you need it and
+doesn't require a C compiler when that is a problem.
+
+Somewhere around version 3, this module was forked into
+C, because its maintainer had serious trouble
+understanding JSON and insisted on a fork with many bugs "fixed" that
+weren't actually bugs, while spreading FUD about this module without
+actually giving any details on his accusations. You be the judge, but
+in my personal opinion, if you want quality, you will stay away from
+dangerous forks like that.
+
+
 =head1 BUGS
 
 While the goal of this module is to be correct, that unfortunately does