--- JSON-XS/XS.pm	2013/10/29 00:19:45	1.148
+++ JSON-XS/XS.pm	2018/11/15 22:35:35	1.170
@@ -37,26 +37,12 @@
 primary goal is to be I<correct> and its secondary goal is to be
 I<fast>. To reach the latter goal it was written in C.
 
-Beginning with version 2.0 of the JSON module, when both JSON and
-JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
-overridden) with no overhead due to emulation (by inheriting constructor
-and methods). If JSON::XS is not available, it will fall back to the
-compatible JSON::PP module as backend, so using JSON instead of JSON::XS
-gives you a portable JSON API that can be fast when you need and doesn't
-require a C compiler when that is a problem.
-
-As this is the n-th-something JSON module on CPAN, what was the reason
-to write yet another JSON module? While it seems there are many JSON
-modules, none of them correctly handle all corner cases, and in most cases
-their maintainers are unresponsive, gone missing, or not listening to bug
-reports for other reasons.
-
 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
 vice versa.
 
 =head2 FEATURES
 
-=over 4
+=over
 
 =item * correct Unicode handling
 
@@ -103,7 +89,7 @@
 
 use common::sense;
 
-our $VERSION = '3.0';
+our $VERSION = '4.0';
 our @ISA = qw(Exporter);
 
 our @EXPORT = qw(encode_json decode_json);
@@ -118,7 +104,7 @@
 The following convenience methods are provided by this module. They are
 exported by default:
 
-=over 4
+=over
 
 =item $json_text = encode_json $perl_scalar
 
@@ -133,8 +119,8 @@
 
 =item $perl_scalar = decode_json $json_text
 
-The opposite of C<encode_json>: expects an UTF-8 (binary) string and tries
-to parse that as an UTF-8 encoded JSON text, returning the resulting
+The opposite of C<encode_json>: expects a UTF-8 (binary) string and tries
+to parse that as a UTF-8 encoded JSON text, returning the resulting
 reference. Croaks on error.
 
 This function call is functionally identical to:
@@ -151,7 +137,7 @@
 Since this often leads to confusion, here are a few very clear words on
 how Unicode works in Perl, modulo bugs.
 
-=over 4
+=over
 
 =item 1. Perl strings can store characters with ordinal values > 255.
 
@@ -199,12 +185,14 @@
 The object oriented interface lets you configure your own encoding or
 decoding style, within the limits of supported formats.
 
-=over 4
+=over
 
 =item $json = new JSON::XS
 
 Creates a new JSON::XS object that can be used to de/encode JSON
-strings. All boolean flags described below are by default I<disabled>.
+strings. All boolean flags described below are by default I<disabled>
+(with the exception of C<allow_nonref>, which defaults to I<enabled> since
+version C<4.0>).
 
 The mutators for flags all return the JSON object again and thus calls can
 be chained:
@@ -272,7 +260,7 @@
 
 If C<$enable> is true (or missing), then the C<encode> method will encode
 the JSON result into UTF-8, as required by many protocols, while the
-C<decode> method expects to be handled an UTF-8-encoded string.  Please
+C<decode> method expects to be handed a UTF-8-encoded string.  Please
 note that UTF-8-encoded strings do not contain any characters outside the
 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
 versions, enabling this option might enable autodetection of the UTF-16
@@ -367,7 +355,7 @@
 
 If C<$enable> is true (or missing), then C<decode> will accept some
 extensions to normal JSON syntax (see below). C<encode> will not be
-affected in anyway. I<Be aware that this option makes you accept invalid
+affected in any way. I<Be aware that this option makes you accept invalid
 JSON texts as if they were valid!>. I suggest only to use this option to
 parse application-specific files written by humans (configuration files,
 resource files etc.)
@@ -377,7 +365,7 @@
 
 Currently accepted extensions are:
 
-=over 4
+=over
 
 =item * list items can have an end-comma
 
@@ -406,6 +394,16 @@
         # neither this one...
   ]
 
+=item * literal ASCII TAB characters in strings
+
+Literal ASCII TAB characters are now allowed in strings (and treated as
+C<\t>).
+
+  [
+     "Hello\tWorld",
+     "Hello<TAB>World", # literal <TAB> would not normally be allowed
+  ]
+
 =back
 
 =item $json = $json->canonical ([$enable])
@@ -433,6 +431,9 @@
 
 =item $enabled = $json->get_allow_nonref
 
+Unlike other boolean options, this option is enabled by default beginning
+with version C<4.0>. See L<SECURITY CONSIDERATIONS> for the gory details.
+
 If C<$enable> is true (or missing), then the C<encode> method can convert a
 non-reference into its corresponding string, number or null JSON value,
 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
@@ -443,11 +444,11 @@
 or array. Likewise, C<decode> will croak if given something that is not a
 JSON object or array.
 
-Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
-resulting in an invalid JSON text:
+Example, encode a Perl scalar as a JSON value with C<allow_nonref> disabled,
+resulting in an error:
 
-   JSON::XS->new->allow_nonref->encode ("Hello, World!")
-   => "Hello, World!"
+   JSON::XS->new->allow_nonref (0)->encode ("Hello, World!")
+   => hash- or arrayref expected...
 
 =item $json = $json->allow_unknown ([$enable])
 
@@ -507,7 +508,7 @@
 
 =item $json = $json->allow_tags ([$enable])
 
-=item $enabled = $json->allow_tags
+=item $enabled = $json->get_allow_tags
 
 See L<OBJECT SERIALISATION> for details.
 
@@ -526,13 +527,13 @@
 =item $json = $json->filter_json_object ([$coderef->($hashref)])
 
 When C<$coderef> is specified, it will be called from C<decode> each
-time it decodes a JSON object. The only argument is a reference to the
-newly-created hash. If the code references returns a single scalar (which
-need not be a reference), this value (i.e. a copy of that scalar to avoid
-aliasing) is inserted into the deserialised data structure. If it returns
-an empty list (NOTE: I<not> C<undef>, which is a valid scalar), the
-original deserialised hash will be inserted. This setting can slow down
-decoding considerably.
+time it decodes a JSON object. The only argument is a reference to
+the newly-created hash. If the code reference returns a single scalar
+(which need not be a reference), this value (or rather a copy of it) is
+inserted into the deserialised data structure. If it returns an empty
+list (NOTE: I<not> C<undef>, which is a valid scalar), the original
+deserialised hash will be inserted. This setting can slow down decoding
+considerably.
 
 When C<$coderef> is omitted or undefined, any existing callback will
 be removed and C<decode> will not change the deserialised hash in any
@@ -689,7 +690,7 @@
 and you need to know where the JSON text ends.
 
    JSON::XS->new->decode_prefix ("[1] the tail")
-   => ([], 3)
+   => ([1], 3)
 
 =back
 
@@ -716,7 +717,7 @@
 
 The following methods implement this incremental parser.
 
-=over 4
+=over
 
 =item [void, scalar or list context] = $json->incr_parse ([$string])
 
@@ -740,11 +741,11 @@
 
 And finally, in list context, it will try to extract as many objects
 from the stream as it can find and return them, or the empty list
-otherwise. For this to work, there must be no separators between the JSON
-objects or arrays, instead they must be concatenated back-to-back. If
-an error occurs, an exception will be raised as in the scalar context
-case. Note that in this case, any previously-parsed JSON texts will be
-lost.
+otherwise. For this to work, there must be no separators (other than
+whitespace) between the JSON objects or arrays, instead they must be
+concatenated back-to-back. If an error occurs, an exception will be
+raised as in the scalar context case. Note that in this case, any
+previously-parsed JSON texts will be lost.
 
 Example: Parse some JSON arrays/objects in a given string and return
 them.
@@ -761,6 +762,10 @@
 real world conditions). As a special exception, you can also call this
 method before having parsed anything.
 
+That means you can only use this function to look at or manipulate text
+before or after complete JSON objects, not while the parser is in the
+middle of parsing a JSON object.
+
 This function is useful in two cases: a) finding the trailing text after a
 JSON object or b) parsing multiple JSON objects separated by non-JSON text
 (such as commas).
@@ -789,16 +794,19 @@
 
 =head2 LIMITATIONS
 
-All options that affect decoding are supported, except
-C<allow_nonref>. The reason for this is that it cannot be made to work
-sensibly: JSON objects and arrays are self-delimited, i.e. you can
-concatenate them back to back and still decode them perfectly. This does
-not hold true for JSON numbers, however.
-
-For example, is the string C<1> a single JSON number, or is it simply the
-start of C<12>? Or is C<12> a single JSON number, or the concatenation
-of C<1> and C<2>? In neither case you can tell, and this is why JSON::XS
-takes the conservative route and disallows this case.
+The incremental parser is a non-exact parser: it works by gathering as
+much text as possible that I<could> be a valid JSON text, followed by
+trying to decode it.
+
+That means it sometimes needs to read more data than strictly necessary to
+diagnose an invalid JSON text. For example, after parsing the following
+fragment, the parser I<could> stop with an error, as this fragment
+I<cannot> be the beginning of a valid JSON text:
+
+   [,
+
+In reality, however, the parser might continue to read data until a
+length limit is exceeded or it finds a closing bracket.
 
 =head2 EXAMPLES
 
@@ -952,7 +960,7 @@
 
 =head2 JSON -> PERL
 
-=over 4
+=over
 
 =item object
 
@@ -1030,7 +1038,7 @@
 truly typeless language, so we can only guess which JSON type is meant by
 a Perl value.
 
-=over 4
+=over
 
 =item hash references
 
@@ -1129,9 +1137,9 @@
 C<allow_blessed>, C<convert_blessed> and C<allow_tags> settings, which are
 used in this order:
 
-=over 4
+=over
 
-=item 1. C<allow_tags> is enabled and object has a C<FREEZE> method.
+=item 1. C<allow_tags> is enabled and the object has a C<FREEZE> method.
 
 In this case, C<JSON::XS> uses the L<Types::Serialiser> object
 serialisation protocol to create a tagged JSON value, using a nonstandard
@@ -1147,6 +1155,12 @@
 
    ("classname")[FREEZE return values...]
 
+e.g.:
+
+   ("URI")["http://www.google.com/"]
+   ("MyDate")[2013,10,29]
+   ("ImageData::JPEG")["Z3...VlCg=="]
+
 For example, the hypothetical C<My::Object> C<FREEZE> method might use the
 objects C<type> and C<id> members to encode the object:
 
@@ -1156,7 +1170,7 @@
       ($self->{type}, $self->{id})
    }
 
-=item 2. C<convert_blessed> is enabled and object has a C<TO_JSON> method.
+=item 2. C<convert_blessed> is enabled and the object has a C<TO_JSON> method.
 
 In this case, the C<TO_JSON> method of the object is invoked in scalar
 context. It must return a single scalar that can be directly encoded into
@@ -1244,7 +1258,7 @@
 and ISO-8859-1 (= latin 1) and ASCII are both codesets I<and> encodings at
 the same time, which can be confusing.
 
-=over 4
+=over
 
 =item C<utf8> flag disabled
 
@@ -1271,7 +1285,7 @@
 that.
 
 The C<utf8> flag therefore switches between two modes: disabled means you
-will get a Unicode string in Perl, enabled means you get an UTF-8 encoded
+will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
 octet/binary string in Perl.
 
 =item C<latin1> or C<ascii> flags enabled
@@ -1413,7 +1427,7 @@
 high that you will run into severe interoperability problems when you
 least expect it.
 
-=over 4
+=over
 
 =item (*)
 
@@ -1549,22 +1563,176 @@
 security right).
 
 
+=head2 "OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159)
+
+JSON originally required JSON texts to represent an array or object -
+scalar values were explicitly not allowed. This has changed, and versions
+of JSON::XS beginning with C<4.0> reflect this by allowing scalar values
+by default.
+
+One reason why one might not want this is that this removes a fundamental
+property of JSON texts, namely that they are self-delimited and
+self-contained, or in other words, you could take any number of "old"
+JSON texts and paste them together, and the result would be unambiguously
+parseable:
+
+   [1,3]{"k":5}[][null] # four JSON texts, without doubt
+
+By allowing scalars, this property is lost: in the following example, is
+this one JSON text (the number 12) or two JSON texts (the numbers 1 and
+2):
+
+   12    # could be 12, or 1 and 2
+
+Another lost property of "old" JSON is that no lookahead is required to
+know the end of a JSON text, i.e. the JSON text definitely ended at the
+last C<]> or C<}> character, there was no need to read extra characters.
+
+For example, a viable network protocol with "old" JSON was to simply
+exchange JSON texts without delimiter. For "new" JSON, you have to use a
+suitable delimiter (such as a newline) after every JSON text or ensure you
+never encode/decode scalar values.
+
+Most protocols do work by only transferring arrays or objects, and the
+easiest way to avoid problems with the "new" JSON definition is to
+explicitly disallow scalar values in your encoder and decoder:
+
+   $json_coder = JSON::XS->new->allow_nonref (0)
+
+This is a somewhat unhappy situation, and the blame can fully be put on
+JSON's inventor, Douglas Crockford, who unilaterally changed the format
+in 2006 without consulting the IETF, forcing the IETF to either fork the
+format or go with it (as I was told, the IETF wasn't amused).
+
+
+=head1 RELATIONSHIP WITH I-JSON
+
+JSON is a somewhat sloppily-defined format - it carries around obvious
+Javascript baggage, such as not really defining number range, probably
+because Javascript only has one type of numbers: IEEE 64 bit floats
+("binary64").
+
+For this reason, RFC7493 defines "Internet JSON", which is a restricted
+subset of JSON that is supposedly more interoperable on the internet.
+
+While C<JSON::XS> does not offer specific support for I-JSON, it of course
+accepts valid I-JSON and by default implements some of the limitations
+of I-JSON, such as parsing numbers as perl numbers, which are usually a
+superset of binary64 numbers.
+
+To generate I-JSON, follow these rules:
+
+=over
+
+=item * always generate UTF-8
+
+I-JSON must be encoded in UTF-8, the default for C<encode_json>.
+
+=item * numbers should be within IEEE 754 binary64 range
+
+Basically all existing perl installations use binary64 to represent
+floating point numbers, so all you need to do is to avoid large integers.
+
+=item * objects must not have duplicate keys
+
+This is trivially done, as C<JSON::XS> does not allow duplicate keys.
+
+=item * do not generate scalar JSON texts, use C<< ->allow_nonref (0) >>
+
+I-JSON strongly requests you to only encode arrays and objects into JSON.
+
+=item * times should be strings in ISO 8601 format
+
+There are a myriad of modules on CPAN dealing with ISO 8601 - search for
+C<ISO8601> on CPAN and use one.
+
+=item * encode binary data as base64
+
+While it's tempting to just dump binary data as a string (and let
+C<JSON::XS> do the escaping), for I-JSON, it's I<recommended> to encode
+binary data as base64.
+
+=back
+
+There are some other considerations - read RFC7493 for the details if
+interested.
+
+
 =head1 INTEROPERABILITY WITH OTHER MODULES
 
 C<JSON::XS> uses the L<Types::Serialiser> module to provide boolean
 constants. That means that the JSON true and false values will be
-comaptible to true and false values of iother modules that do the same,
+compatible with true and false values of other modules that do the same,
 such as L<JSON::PP> and L<CBOR::XS>.
 
 
-=head1 THREADS
+=head1 INTEROPERABILITY WITH OTHER JSON DECODERS
+
+As long as you only serialise data that can be directly expressed in JSON,
+C<JSON::XS> is incapable of generating invalid JSON output (modulo bugs,
+but C<JSON::XS> has found more bugs in the official JSON testsuite (1)
+than the official JSON testsuite has found in C<JSON::XS> (0)).
+
+When you have trouble decoding JSON generated by this module using other
+decoders, then it is very likely that you have an encoding mismatch or the
+other decoder is broken.
+
+When decoding, C<JSON::XS> is strict by default and will likely catch all
+errors. There are currently two settings that change this: C<relaxed>
+makes C<JSON::XS> accept (but not generate) some non-standard extensions,
+and C<allow_tags> will allow you to encode and decode Perl objects, at the
+cost of not outputting valid JSON anymore.
+
+=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
+
+When you use C<allow_tags> to use the extended (and also nonstandard and
+invalid) JSON syntax for serialised objects, and you still want to decode
+the generated text with a standard JSON decoder, you can run a regex
+to replace the tagged syntax by standard JSON arrays (it only works for
+"normal" package names without commas, newlines or single colons). First,
+the readable Perl version:
+
+   # if your FREEZE methods return no values, you need this replace first:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;
+
+   # this works for non-empty constructor arg lists:
+   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;
+
+And here is a less readable version that is easy to adapt to other
+languages:
 
-This module is I<not> guaranteed to be thread safe and there are no
-plans to change this until Perl gets thread support (as opposed to the
-horribly slow so-called "threads" which are simply slow and bloated
-process simulations - use fork, it's I<much> faster, cheaper, better).
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;
 
-(It might actually work, but you have been warned).
+Here is an ECMAScript version (same regex):
+
+   json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
+
+Since this syntax converts to standard JSON arrays, it might be hard to
+distinguish serialised objects from normal arrays. You can prepend a
+"magic number" as first array element to reduce chances of a collision:
+
+   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;
+
+And after decoding the JSON text, you could walk the data
+structure looking for arrays with a first element of
+C<XU1peReLzT4ggEllLanBYq4G9VzliwKF>.
+
+The same approach can be used to create the tagged format with another
+encoder. First, you create an array with the magic string as first member,
+the classname as second, and constructor arguments last, encode it as part
+of your JSON structure, and then:
+
+   $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;
+
+Again, this has some limitations - the magic string must not be encoded
+with character escapes, and the constructor arguments must be non-empty.
+
+
+=head1 (I-)THREADS
+
+This module is I<not> guaranteed to be ithread (or MULTIPLICITY-) safe
+and there are no plans to change this. Note that perl's builtin so-called
+threads/ithreads are officially deprecated and should not be used.
 
 
 =head1 THE PERILS OF SETLOCALE
@@ -1585,6 +1753,32 @@
 afterwards.
 
 
+=head1 SOME HISTORY
+
+At the time this module was created there already were a number of JSON
+modules available on CPAN, so what was the reason to write yet another
+JSON module? While it seems there are many JSON modules, none of them
+correctly handled all corner cases, and in most cases their maintainers
+were unresponsive, had gone missing, or were not listening to bug reports
+for other reasons.
+
+Beginning with version 2.0 of the JSON module, when both JSON and
+JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
+overridden) with no overhead due to emulation (by inheriting constructor
+and methods). If JSON::XS is not available, it will fall back to the
+compatible JSON::PP module as backend, so using JSON instead of JSON::XS
+gives you a portable JSON API that can be fast when you need it and
+doesn't require a C compiler when that is a problem.
+
+Somewhere around version 3, this module was forked into
+C<Cpanel::JSON::XS>, because its maintainer had serious trouble
+understanding JSON and insisted on a fork with many bugs "fixed" that
+weren't actually bugs, while spreading FUD about this module without
+actually giving any details on his accusations. You be the judge, but
+in my personal opinion, if you want quality, you will stay away from
+dangerous forks like that.
+
+
 =head1 BUGS
 
 While the goal of this module is to be correct, that unfortunately does