--- JSON-XS/README 2007/03/22 21:13:58 1.2 +++ JSON-XS/README 2018/11/15 23:07:55 1.43 @@ -1,68 +1,144 @@ NAME JSON::XS - JSON serialising/deserialising, done correctly and fast + JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ + (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html) + SYNOPSIS use JSON::XS; + # exported functions, they croak on error + # and expect/generate UTF-8 + + $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref; + $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text; + + # OO-interface + + $coder = JSON::XS->new->ascii->pretty->allow_nonref; + $pretty_printed_unencoded = $coder->encode ($perl_scalar); + $perl_scalar = $coder->decode ($unicode_json_text); + + # Note that JSON version 2.0 and above will automatically use JSON::XS + # if available, at virtually no speed overhead either, so you should + # be able to just: + + use JSON; + + # and do the same things, except that you have a pure-perl fallback now. + DESCRIPTION This module converts Perl data structures to JSON and vice versa. Its primary goal is to be *correct* and its secondary goal is to be *fast*. To reach the latter goal it was written in C. - As this is the n-th-something JSON module on CPAN, what was the reason - to write yet another JSON module? While it seems there are many JSON - modules, none of them correctly handle all corner cases, and in most - cases their maintainers are unresponsive, gone missing, or not listening - to bug reports for other reasons. - - See COMPARISON, below, for a comparison to some other JSON modules. + See MAPPING, below, on how JSON::XS maps perl values to JSON values and + vice versa. FEATURES - * correct handling of unicode issues - This module knows how to handle Unicode, and even documents how it - does so. - - * round-trip integrity - When you serialise a perl data structure using only datatypes - supported by JSON, the deserialised data structure is identical on - the Perl level. (e.g. the string "2.0" doesn't suddenly become "2"). - - * strict checking of JSON correctness - There is no guessing, no generating of illegal JSON strings by - default, and only JSON is accepted as input (the latter is a - security feature). - - * fast - compared to other JSON modules, this module compares favourably. - - * simple to use - This module has both a simple functional interface as well as an OO - interface. - - * reasonably versatile output formats - You can choose between the most compact format possible, a - pure-ascii format, or a pretty-printed format. Or you can combine - those features in whatever way you like. + * correct Unicode handling + + This module knows how to handle Unicode, documents how and when it + does so, and even documents what "correct" means. + + * round-trip integrity + + When you serialise a perl data structure using only data types + supported by JSON and Perl, the deserialised data structure is + identical on the Perl level. (e.g. the string "2.0" doesn't suddenly + become "2" just because it looks like a number). There *are* minor + exceptions to this, read the MAPPING section below to learn about + those. + + * strict checking of JSON correctness + + There is no guessing, no generating of illegal JSON texts by + default, and only JSON is accepted as input by default (the latter + is a security feature). + + * fast + + Compared to other JSON modules and other serialisers such as + Storable, this module usually compares favourably in terms of speed, + too. 
+ + * simple to use + + This module has both a simple functional interface as well as an + object oriented interface. + + * reasonably versatile output formats + + You can choose between the most compact guaranteed-single-line + format possible (nice for simple line-based protocols), a pure-ASCII + format (for when your transport is not 8-bit clean, still supports + the whole Unicode range), or a pretty-printed format (for when you + want to read that stuff). Or you can combine those features in + whatever way you like. FUNCTIONAL INTERFACE - The following convinience methods are provided by this module. They are + The following convenience methods are provided by this module. They are exported by default: - $json_string = to_json $perl_scalar - Converts the given Perl data structure (a simple scalar or a - reference to a hash or array) to a UTF-8 encoded, binary string - (that is, the string contains octets only). Croaks on error. - - This function call is functionally identical to "JSON::XS->new->utf8 - (1)->encode ($perl_scalar)". - - $perl_scalar = from_json $json_string - The opposite of "to_json": expects an UTF-8 (binary) string and - tries to parse that as an UTF-8 encoded JSON string, returning the - resulting simple scalar or reference. Croaks on error. + $json_text = encode_json $perl_scalar + Converts the given Perl data structure to a UTF-8 encoded, binary + string (that is, the string contains octets only). Croaks on error. - This function call is functionally identical to "JSON::XS->new->utf8 - (1)->decode ($json_string)". + This function call is functionally identical to: + + $json_text = JSON::XS->new->utf8->encode ($perl_scalar) + + Except being faster. + + $perl_scalar = decode_json $json_text + The opposite of "encode_json": expects a UTF-8 (binary) string and + tries to parse that as a UTF-8 encoded JSON text, returning the + resulting reference. Croaks on error. + + This function call is functionally identical to: + + $perl_scalar = JSON::XS->new->utf8->decode ($json_text) + + Except being faster. + +A FEW NOTES ON UNICODE AND PERL + Since this often leads to confusion, here are a few very clear words on + how Unicode works in Perl, modulo bugs. + + 1. Perl strings can store characters with ordinal values > 255. + This enables you to store Unicode characters as single characters in + a Perl string - very natural. + + 2. Perl does *not* associate an encoding with your strings. + ... until you force it to, e.g. when matching it against a regex, or + printing the scalar to a file, in which case Perl either interprets + your string as locale-encoded text, octets/binary, or as Unicode, + depending on various settings. In no case is an encoding stored + together with your data, it is *use* that decides encoding, not any + magical meta data. + + 3. The internal utf-8 flag has no meaning with regards to the encoding + of your string. + Just ignore that flag unless you debug a Perl bug, a module written + in XS or want to dive into the internals of perl. Otherwise it will + only confuse you, as, despite the name, it says nothing about how + your string is encoded. You can have Unicode strings with that flag + set, with that flag clear, and you can have binary data with that + flag set and that flag clear. Other possibilities exist, too. + + If you didn't know about that flag, just the better, pretend it + doesn't exist. + + 4. A "Unicode String" is simply a string where each character can be + validly interpreted as a Unicode code point. 
+ If you have UTF-8 encoded data, it is no longer a Unicode string, + but a Unicode string encoded in UTF-8, giving you a binary string. + + 5. A string containing "high" (> 255) character values is *not* a UTF-8 + string. + It's a fact. Learn to live with it. + + I hope this helps :) OBJECT-ORIENTED INTERFACE The object oriented interface lets you configure your own encoding or @@ -70,45 +146,106 @@ $json = new JSON::XS Creates a new JSON::XS object that can be used to de/encode JSON - strings. All boolean flags described below are by default - *disabled*. + strings. All boolean flags described below are by default *disabled* + (with the exception of "allow_nonref", which defaults to *enabled* + since version 4.0). The mutators for flags all return the JSON object again and thus calls can be chained: - my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]}) + my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]}) => {"a": [1, 2]} - $json = $json->ascii ($enable) - If $enable is true, then the "encode" method will not generate - characters outside the code range 0..127. Any unicode characters - outside that range will be escaped using either a single \uXXXX (BMP - characters) or a double \uHHHH\uLLLLL escape sequence, as per - RFC4627. + $json = $json->ascii ([$enable]) + $enabled = $json->get_ascii + If $enable is true (or missing), then the "encode" method will not + generate characters outside the code range 0..127 (which is ASCII). + Any Unicode characters outside that range will be escaped using + either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLLL + escape sequence, as per RFC4627. The resulting encoded JSON text can + be treated as a native Unicode string, an ascii-encoded, + latin1-encoded or UTF-8 encoded string, or any other superset of + ASCII. If $enable is false, then the "encode" method will not escape - Unicode characters unless necessary. + Unicode characters unless required by the JSON syntax or other + flags. This results in a faster and more compact format. - JSON::XS->new->ascii (1)->encode (chr 0x10401) - => \ud801\udc01 + See also the section *ENCODING/CODESET FLAG NOTES* later in this + document. - $json = $json->utf8 ($enable) - If $enable is true, then the "encode" method will encode the JSON - string into UTF-8, as required by many protocols, while the "decode" - method expects to be handled an UTF-8-encoded string. Please note - that UTF-8-encoded strings do not contain any characters outside the - range 0..255, they are thus useful for bytewise/binary I/O. + The main use for this flag is to produce JSON texts that can be + transmitted over a 7-bit channel, as the encoded JSON texts will not + contain any 8 bit characters. + + JSON::XS->new->ascii (1)->encode ([chr 0x10401]) + => ["\ud801\udc01"] + + $json = $json->latin1 ([$enable]) + $enabled = $json->get_latin1 + If $enable is true (or missing), then the "encode" method will + encode the resulting JSON text as latin1 (or iso-8859-1), escaping + any characters outside the code range 0..255. The resulting string + can be treated as a latin1-encoded JSON text or a native Unicode + string. The "decode" method will not be affected in any way by this + flag, as "decode" by default expects Unicode, which is a strict + superset of latin1. + + If $enable is false, then the "encode" method will not escape + Unicode characters unless required by the JSON syntax or other + flags. + + See also the section *ENCODING/CODESET FLAG NOTES* later in this + document. 
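+
+        As a minimal sketch (the file name and handle are made up for
+        illustration), latin1-encoded JSON text has to be written out
+        through a latin1 output layer, not a UTF-8 one:
+
+           my $data = ["\x{89}\x{abc}"];   # same structure as in the example further below
+           open my $fh, ">:encoding(latin1)", "data.json"
+              or die "data.json: $!";
+           print $fh JSON::XS->new->latin1->encode ($data);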
+ + The main use for this flag is efficiently encoding binary data as + JSON text, as most octets will not be escaped, resulting in a + smaller encoded size. The disadvantage is that the resulting JSON + text is encoded in latin1 (and must correctly be treated as such + when storing and transferring), a rare encoding for JSON. It is + therefore most useful when you want to store data structures known + to contain binary data efficiently in files or databases, not when + talking to other JSON encoders/decoders. + + JSON::XS->new->latin1->encode (["\x{89}\x{abc}"] + => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not) + + $json = $json->utf8 ([$enable]) + $enabled = $json->get_utf8 + If $enable is true (or missing), then the "encode" method will + encode the JSON result into UTF-8, as required by many protocols, + while the "decode" method expects to be handed a UTF-8-encoded + string. Please note that UTF-8-encoded strings do not contain any + characters outside the range 0..255, they are thus useful for + bytewise/binary I/O. In future versions, enabling this option might + enable autodetection of the UTF-16 and UTF-32 encoding families, as + described in RFC4627. If $enable is false, then the "encode" method will return the JSON - string as a (non-encoded) unicode string, while "decode" expects - thus a unicode string. Any decoding or encoding (e.g. to UTF-8 or + string as a (non-encoded) Unicode string, while "decode" expects + thus a Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs to be done yourself, e.g. using the Encode module. - $json = $json->pretty ($enable) + See also the section *ENCODING/CODESET FLAG NOTES* later in this + document. + + Example, output UTF-16BE-encoded JSON: + + use Encode; + $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object); + + Example, decode UTF-32LE-encoded JSON: + + use Encode; + $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext); + + $json = $json->pretty ([$enable]) This enables (or disables) all of the "indent", "space_before" and "space_after" (and in the future possibly more) flags in one call to generate the most readable (or most compact) form possible. + Example, pretty-print some simple structure: + my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]}) => { @@ -118,163 +255,1102 @@ ] } - $json = $json->indent ($enable) - If $enable is true, then the "encode" method will use a multiline - format as output, putting every array member or object/hash - key-value pair into its own line, identing them properly. + $json = $json->indent ([$enable]) + $enabled = $json->get_indent + If $enable is true (or missing), then the "encode" method will use a + multiline format as output, putting every array member or + object/hash key-value pair into its own line, indenting them + properly. If $enable is false, no newlines or indenting will be produced, and - the resulting JSON strings is guarenteed not to contain any - "newlines". + the resulting JSON text is guaranteed not to contain any "newlines". - This setting has no effect when decoding JSON strings. + This setting has no effect when decoding JSON texts. - $json = $json->space_before ($enable) - If $enable is true, then the "encode" method will add an extra - optional space before the ":" separating keys from values in JSON - objects. 
+ $json = $json->space_before ([$enable]) + $enabled = $json->get_space_before + If $enable is true (or missing), then the "encode" method will add + an extra optional space before the ":" separating keys from values + in JSON objects. If $enable is false, then the "encode" method will not add any extra space at those places. - This setting has no effect when decoding JSON strings. You will also + This setting has no effect when decoding JSON texts. You will also most likely combine this setting with "space_after". - $json = $json->space_after ($enable) - If $enable is true, then the "encode" method will add an extra - optional space after the ":" separating keys from values in JSON - objects and extra whitespace after the "," separating key-value + Example, space_before enabled, space_after and indent disabled: + + {"key" :"value"} + + $json = $json->space_after ([$enable]) + $enabled = $json->get_space_after + If $enable is true (or missing), then the "encode" method will add + an extra optional space after the ":" separating keys from values in + JSON objects and extra whitespace after the "," separating key-value pairs and array members. If $enable is false, then the "encode" method will not add any extra space at those places. - This setting has no effect when decoding JSON strings. + This setting has no effect when decoding JSON texts. + + Example, space_before and indent disabled, space_after enabled: - $json = $json->canonical ($enable) - If $enable is true, then the "encode" method will output JSON - objects by sorting their keys. This is adding a comparatively high - overhead. + {"key": "value"} + + $json = $json->relaxed ([$enable]) + $enabled = $json->get_relaxed + If $enable is true (or missing), then "decode" will accept some + extensions to normal JSON syntax (see below). "encode" will not be + affected in any way. *Be aware that this option makes you accept + invalid JSON texts as if they were valid!*. I suggest only to use + this option to parse application-specific files written by humans + (configuration files, resource files etc.) + + If $enable is false (the default), then "decode" will only accept + valid JSON texts. + + Currently accepted extensions are: + + * list items can have an end-comma + + JSON *separates* array elements and key-value pairs with commas. + This can be annoying if you write JSON texts manually and want + to be able to quickly append elements, so this extension accepts + comma at the end of such items not just between them: + + [ + 1, + 2, <- this comma not normally allowed + ] + { + "k1": "v1", + "k2": "v2", <- this comma not normally allowed + } + + * shell-style '#'-comments + + Whenever JSON allows whitespace, shell-style comments are + additionally allowed. They are terminated by the first + carriage-return or line-feed character, after which more + white-space and comments are allowed. + + [ + 1, # this comment not allowed in JSON + # neither this one... + ] + + * literal ASCII TAB characters in strings + + Literal ASCII TAB characters are now allowed in strings (and + treated as "\t"). + + [ + "Hello\tWorld", + "HelloWorld", # literal would not normally be allowed + ] + + $json = $json->canonical ([$enable]) + $enabled = $json->get_canonical + If $enable is true (or missing), then the "encode" method will + output JSON objects by sorting their keys. This is adding a + comparatively high overhead. 
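+
+        For instance (a minimal sketch), two logically equal hashes then
+        always encode to the same JSON text, which makes the output
+        suitable for byte-wise comparison or for use as a lookup key:
+
+           my $coder = JSON::XS->new->canonical;
+           print "equal\n"
+              if $coder->encode ({ a => 1, b => 2 })
+              eq $coder->encode ({ b => 2, a => 1 });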
If $enable is false, then the "encode" method will output key-value pairs in the order Perl stores them (which will likely change - between runs of the same script). + between runs of the same script, and can change even within the same + run from 5.18 onwards). This option is useful if you want the same data structure to be - encoded as the same JSON string (given the same overall settings). - If it is disabled, the same hash migh be encoded differently even if + encoded as the same JSON text (given the same overall settings). If + it is disabled, the same hash might be encoded differently even if contains the same data, as key-value pairs have no inherent ordering in Perl. - This setting has no effect when decoding JSON strings. + This setting has no effect when decoding JSON texts. - $json = $json->allow_nonref ($enable) - If $enable is true, then the "encode" method can convert a - non-reference into its corresponding string, number or null JSON - value, which is an extension to RFC4627. Likewise, "decode" will - accept those JSON values instead of croaking. + This setting has currently no effect on tied hashes. + + $json = $json->allow_nonref ([$enable]) + $enabled = $json->get_allow_nonref + Unlike other boolean options, this opotion is enabled by default + beginning with version 4.0. See "SECURITY CONSIDERATIONS" for the + gory details. + + If $enable is true (or missing), then the "encode" method can + convert a non-reference into its corresponding string, number or + null JSON value, which is an extension to RFC4627. Likewise, + "decode" will accept those JSON values instead of croaking. If $enable is false, then the "encode" method will croak if it isn't - passed an arrayref or hashref, as JSON strings must either be an + passed an arrayref or hashref, as JSON texts must either be an object or array. Likewise, "decode" will croak if given something that is not a JSON object or array. - $json_string = $json->encode ($perl_scalar) - Converts the given Perl data structure (a simple scalar or a - reference to a hash or array) to its JSON representation. Simple - scalars will be converted into JSON string or number sequences, - while references to arrays become JSON arrays and references to - hashes become JSON objects. Undefined Perl values (e.g. "undef") - become JSON "null" values. Neither "true" nor "false" values will be - generated. - - $perl_scalar = $json->decode ($json_string) - The opposite of "encode": expects a JSON string and tries to parse - it, returning the resulting simple scalar or reference. Croaks on - error. - - JSON numbers and strings become simple Perl scalars. JSON arrays - become Perl arrayrefs and JSON objects become Perl hashrefs. "true" - becomes 1, "false" becomes 0 and "null" becomes "undef". - -COMPARISON - As already mentioned, this module was created because none of the - existing JSON modules could be made to work correctly. First I will - describe the problems (or pleasures) I encountered with various existing - JSON modules, followed by some benchmark values. JSON::XS was designed - not to suffer from any of these problems or limitations. + Example, encode a Perl scalar as JSON value without enabled + "allow_nonref", resulting in an error: - JSON - Slow (but very portable, as it is written in pure Perl). + JSON::XS->new->allow_nonref (0)->encode ("Hello, World!") + => hash- or arrayref expected... - Undocumented/buggy Unicode handling (how JSON handles unicode values - is undocumented. 
One can get far by feeding it unicode strings and - doing en-/decoding oneself, but unicode escapes are not working - properly). + $json = $json->allow_unknown ([$enable]) + $enabled = $json->get_allow_unknown + If $enable is true (or missing), then "encode" will *not* throw an + exception when it encounters values it cannot represent in JSON (for + example, filehandles) but instead will encode a JSON "null" value. + Note that blessed objects are not included here and are handled + separately by c. + + If $enable is false (the default), then "encode" will throw an + exception when it encounters anything it cannot encode as JSON. + + This option does not affect "decode" in any way, and it is + recommended to leave it off unless you know your communications + partner. + + $json = $json->allow_blessed ([$enable]) + $enabled = $json->get_allow_blessed + See "OBJECT SERIALISATION" for details. + + If $enable is true (or missing), then the "encode" method will not + barf when it encounters a blessed reference that it cannot convert + otherwise. Instead, a JSON "null" value is encoded instead of the + object. + + If $enable is false (the default), then "encode" will throw an + exception when it encounters a blessed object that it cannot convert + otherwise. + + This setting has no effect on "decode". + + $json = $json->convert_blessed ([$enable]) + $enabled = $json->get_convert_blessed + See "OBJECT SERIALISATION" for details. + + If $enable is true (or missing), then "encode", upon encountering a + blessed object, will check for the availability of the "TO_JSON" + method on the object's class. If found, it will be called in scalar + context and the resulting scalar will be encoded instead of the + object. + + The "TO_JSON" method may safely call die if it wants. If "TO_JSON" + returns other blessed objects, those will be handled in the same + way. "TO_JSON" must take care of not causing an endless recursion + cycle (== crash) in this case. The name of "TO_JSON" was chosen + because other methods called by the Perl core (== not by the user of + the object) are usually in upper case letters and to avoid + collisions with any "to_json" function or method. + + If $enable is false (the default), then "encode" will not consider + this type of conversion. + + This setting has no effect on "decode". + + $json = $json->allow_tags ([$enable]) + $enabled = $json->get_allow_tags + See "OBJECT SERIALISATION" for details. + + If $enable is true (or missing), then "encode", upon encountering a + blessed object, will check for the availability of the "FREEZE" + method on the object's class. If found, it will be used to serialise + the object into a nonstandard tagged JSON value (that JSON decoders + cannot decode). + + It also causes "decode" to parse such tagged JSON values and + deserialise them via a call to the "THAW" method. + + If $enable is false (the default), then "encode" will not consider + this type of conversion, and tagged JSON values will cause a parse + error in "decode", as if tags were not part of the grammar. + + $json->boolean_values ([$false, $true]) + ($false, $true) = $json->get_boolean_values + By default, JSON booleans will be decoded as overloaded + $Types::Serialiser::false and $Types::Serialiser::true objects. + + With this method you can specify your own boolean values for + decoding - on decode, JSON "false" will be decoded as a copy of + $false, and JSON "true" will be decoded as $true ("copy" here is the + same thing as assigning a value to another variable, i.e. 
"$copy = + $false"). + + Calling this method without any arguments will reset the booleans to + their default values. + + "get_boolean_values" will return both $false and $true values, or + the empty list when they are set to the default. + + $json = $json->filter_json_object ([$coderef->($hashref)]) + When $coderef is specified, it will be called from "decode" each + time it decodes a JSON object. The only argument is a reference to + the newly-created hash. If the code reference returns a single + scalar (which need not be a reference), this value (or rather a copy + of it) is inserted into the deserialised data structure. If it + returns an empty list (NOTE: *not* "undef", which is a valid + scalar), the original deserialised hash will be inserted. This + setting can slow down decoding considerably. + + When $coderef is omitted or undefined, any existing callback will be + removed and "decode" will not change the deserialised hash in any + way. + + Example, convert all JSON objects into the integer 5: + + my $js = JSON::XS->new->filter_json_object (sub { 5 }); + # returns [5] + $js->decode ('[{}]') + # throw an exception because allow_nonref is not enabled + # so a lone 5 is not allowed. + $js->decode ('{"a":1, "b":2}'); + + $json = $json->filter_json_single_key_object ($key [=> + $coderef->($value)]) + Works remotely similar to "filter_json_object", but is only called + for JSON objects having a single key named $key. + + This $coderef is called before the one specified via + "filter_json_object", if any. It gets passed the single value in the + JSON object. If it returns a single value, it will be inserted into + the data structure. If it returns nothing (not even "undef" but the + empty list), the callback from "filter_json_object" will be called + next, as if no single-key callback were specified. + + If $coderef is omitted or undefined, the corresponding callback will + be disabled. There can only ever be one callback for a given key. + + As this callback gets called less often then the + "filter_json_object" one, decoding speed will not usually suffer as + much. Therefore, single-key objects make excellent targets to + serialise Perl objects into, especially as single-key JSON objects + are as close to the type-tagged value concept as JSON gets (it's + basically an ID/VALUE tuple). Of course, JSON does not support this + in any way, so you need to make sure your data never looks like a + serialised Perl hash. + + Typical names for the single object key are "__class_whatever__", or + "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or even + things like "__class_md5sum(classname)__", to reduce the risk of + clashing with real hashes. + + Example, decode JSON objects of the form "{ "__widget__" => }" + into the corresponding $WIDGET{} object: + + # return whatever is in $WIDGET{5}: + JSON::XS + ->new + ->filter_json_single_key_object (__widget__ => sub { + $WIDGET{ $_[0] } + }) + ->decode ('{"__widget__": 5') + + # this can be used with a TO_JSON method in some "widget" class + # for serialisation to json: + sub WidgetBase::TO_JSON { + my ($self) = @_; + + unless ($self->{id}) { + $self->{id} = ..get..some..id..; + $WIDGET{$self->{id}} = $self; + } - No roundtripping (strings get clobbered if they look like numbers, - e.g. the string 2.0 will encode to 2.0 instead of "2.0", and that - will decode into the number 2. - - JSON::PC - Very fast. - - Undocumented/buggy Unicode handling. - - No roundtripping. - - Has problems handling many Perl values (e.g. 
regex results and other - magic values will make it croak). - - Does not even generate valid JSON ("{1,2}" gets converted to "{1:2}" - which is not a valid JSON string. - - Unmaintained (maintainer unresponsive for many months, bugs are not - getting fixed). - - JSON::Syck - Very buggy (often crashes). - - Very inflexible (no human-readable format supported, format pretty - much undocumented. I need at least a format for easy reading by - humans and a single-line compact format for use in a protocol, and - preferably a way to generate ASCII-only JSON strings). - - Completely broken (and confusingly documented) Unicode handling - (unicode escapes are not working properly, you need to set - ImplicitUnicode to *different* values on en- and decoding to get - symmetric behaviour). - - No roundtripping (simple cases work, but this depends on wether the - scalar value was used in a numeric context or not). - - Dumping hashes may skip hash values depending on iterator state. - - Unmaintained (maintainer unresponsive for many months, bugs are not - getting fixed). + { __widget__ => $self->{id} } + } - Does not check input for validity (i.e. will accept non-JSON input - and return "something" instead of raising an exception. This is a - security issue: imagine two banks transfering money between each - other using JSON. One bank might parse a given non-JSON request and - deduct money, while the other might reject the transaction with a - syntax error. While a good protocol will at least recover, that is - extra unnecessary work and the transaction will still not succeed). + $json = $json->shrink ([$enable]) + $enabled = $json->get_shrink + Perl usually over-allocates memory a bit when allocating space for + strings. This flag optionally resizes strings generated by either + "encode" or "decode" to their minimum size possible. This can save + memory when your JSON texts are either very very long or you have + many short strings. It will also try to downgrade any strings to + octet-form if possible: perl stores strings internally either in an + encoding called UTF-X or in octet-form. The latter cannot store + everything but uses less space in general (and some buggy Perl or C + code might even rely on that internal representation being used). + + The actual definition of what shrink does might change in future + versions, but it will always try to save space at the expense of + time. + + If $enable is true (or missing), the string returned by "encode" + will be shrunk-to-fit, while all strings generated by "decode" will + also be shrunk-to-fit. + + If $enable is false, then the normal perl allocation algorithms are + used. If you work with your data, then this is likely to be faster. + + In the future, this setting might control other things, such as + converting strings that look like integers or floats into integers + or floats internally (there is no difference on the Perl level), + saving space. + + $json = $json->max_depth ([$maximum_nesting_depth]) + $max_depth = $json->get_max_depth + Sets the maximum nesting level (default 512) accepted while encoding + or decoding. If a higher nesting level is detected in JSON text or a + Perl data structure, then the encoder and decoder will stop and + croak at that point. + + Nesting level is defined by number of hash- or arrayrefs that the + encoder needs to traverse to reach a given point or the number of + "{" or "[" characters without their matching closing parenthesis + crossed to reach a given character in a string. 
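+
+        A minimal sketch of the effect of a low limit (wrapped in "eval"
+        here because exceeding the limit makes "decode" croak):
+
+           my $shallow = JSON::XS->new->max_depth (1);
+           $shallow->decode ('[1, 2, 3]');          # fine: a single flat array
+           eval { $shallow->decode ('[[1, 2]]') };  # nested one level too deep
+           print "decode failed: $@" if $@;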
+ + Setting the maximum depth to one disallows any nesting, so that + ensures that the object is only a single hash/object or array. + + If no argument is given, the highest possible setting will be used, + which is rarely useful. + + Note that nesting is implemented by recursion in C. The default + value has been chosen to be as large as typical operating systems + allow without crashing. + + See SECURITY CONSIDERATIONS, below, for more info on why this is + useful. + + $json = $json->max_size ([$maximum_string_size]) + $max_size = $json->get_max_size + Set the maximum length a JSON text may have (in bytes) where + decoding is being attempted. The default is 0, meaning no limit. + When "decode" is called on a string that is longer then this many + bytes, it will not attempt to decode the string but throw an + exception. This setting has no effect on "encode" (yet). + + If no argument is given, the limit check will be deactivated (same + as when 0 is specified). + + See SECURITY CONSIDERATIONS, below, for more info on why this is + useful. + + $json_text = $json->encode ($perl_scalar) + Converts the given Perl value or data structure to its JSON + representation. Croaks on error. + + $perl_scalar = $json->decode ($json_text) + The opposite of "encode": expects a JSON text and tries to parse it, + returning the resulting simple scalar or reference. Croaks on error. + + ($perl_scalar, $characters) = $json->decode_prefix ($json_text) + This works like the "decode" method, but instead of raising an + exception when there is trailing garbage after the first JSON + object, it will silently stop parsing there and return the number of + characters consumed so far. + + This is useful if your JSON texts are not delimited by an outer + protocol and you need to know where the JSON text ends. + + JSON::XS->new->decode_prefix ("[1] the tail") + => ([1], 3) + +INCREMENTAL PARSING + In some cases, there is the need for incremental parsing of JSON texts. + While this module always has to keep both JSON text and resulting Perl + data structure in memory at one time, it does allow you to parse a JSON + stream incrementally. It does so by accumulating text until it has a + full JSON object, which it then can decode. This process is similar to + using "decode_prefix" to see if a full JSON object is available, but is + much more efficient (and can be implemented with a minimum of method + calls). + + JSON::XS will only attempt to parse the JSON text once it is sure it has + enough text to get a decisive result, using a very simple but truly + incremental parser. This means that it sometimes won't stop as early as + the full parser, for example, it doesn't detect mismatched parentheses. + The only thing it guarantees is that it starts decoding as soon as a + syntactically valid JSON text has been seen. This means you need to set + resource limits (e.g. "max_size") to ensure the parser will stop parsing + in the presence if syntax errors. + + The following methods implement this incremental parser. + + [void, scalar or list context] = $json->incr_parse ([$string]) + This is the central parsing function. It can both append new text + and extract objects from the stream accumulated so far (both of + these functions are optional). + + If $string is given, then this string is appended to the already + existing JSON fragment stored in the $json object. + + After that, if the function is called in void context, it will + simply return without doing anything further. 
This can be used to + add more text in as many chunks as you want. + + If the method is called in scalar context, then it will try to + extract exactly *one* JSON object. If that is successful, it will + return this object, otherwise it will return "undef". If there is a + parse error, this method will croak just as "decode" would do (one + can then use "incr_skip" to skip the erroneous part). This is the + most common way of using the method. + + And finally, in list context, it will try to extract as many objects + from the stream as it can find and return them, or the empty list + otherwise. For this to work, there must be no separators (other than + whitespace) between the JSON objects or arrays, instead they must be + concatenated back-to-back. If an error occurs, an exception will be + raised as in the scalar context case. Note that in this case, any + previously-parsed JSON texts will be lost. + + Example: Parse some JSON arrays/objects in a given string and return + them. + + my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]"); + + $lvalue_string = $json->incr_text + This method returns the currently stored JSON fragment as an lvalue, + that is, you can manipulate it. This *only* works when a preceding + call to "incr_parse" in *scalar context* successfully returned an + object. Under all other circumstances you must not call this + function (I mean it. although in simple tests it might actually + work, it *will* fail under real world conditions). As a special + exception, you can also call this method before having parsed + anything. + + That means you can only use this function to look at or manipulate + text before or after complete JSON objects, not while the parser is + in the middle of parsing a JSON object. + + This function is useful in two cases: a) finding the trailing text + after a JSON object or b) parsing multiple JSON objects separated by + non-JSON text (such as commas). + + $json->incr_skip + This will reset the state of the incremental parser and will remove + the parsed text from the input buffer so far. This is useful after + "incr_parse" died, in which case the input buffer and incremental + parser state is left unchanged, to skip the text parsed so far and + to reset the parse state. + + The difference to "incr_reset" is that only text until the parse + error occurred is removed. + + $json->incr_reset + This completely resets the incremental parser, that is, after this + call, it will be as if the parser had never parsed anything. + + This is useful if you want to repeatedly parse JSON objects and want + to ignore any trailing data, which means you have to reset the + parser after each successful decode. + + LIMITATIONS + The incremental parser is a non-exact parser: it works by gathering as + much text as possible that *could* be a valid JSON text, followed by + trying to decode it. + + That means it sometimes needs to read more data than strictly necessary + to diagnose an invalid JSON text. For example, after parsing the + following fragment, the parser *could* stop with an error, as this + fragment *cannot* be the beginning of a valid JSON text: + + [, + + In reality, hopwever, the parser might continue to read data until a + length limit is exceeded or it finds a closing bracket. + + EXAMPLES + Some examples will make all this clearer. 
+    First, a simple example that works similarly to "decode_prefix": We
+    want to decode the JSON object at the start of a string and identify
+    the portion after the JSON object:
+
+       my $text = "[1,2,3] hello";
+
+       my $json = new JSON::XS;
+
+       my $obj = $json->incr_parse ($text)
+          or die "expected JSON object or array at beginning of string";
+
+       my $tail = $json->incr_text;
+       # $tail now contains " hello"
+
+    Easy, isn't it?
+
+    Now for a more complicated example: Imagine a hypothetical protocol
+    where you read some requests from a TCP stream, and each request is a
+    JSON array, without any separation between them (in fact, it is often
+    useful to use newlines as "separators", as these get interpreted as
+    whitespace at the start of the JSON text, which makes it possible to
+    test said protocol with "telnet"...).
+
+    Here is how you'd do it (it is trivial to write this in an event-based
+    manner):
+
+       my $json = new JSON::XS;
+
+       # read some data from the socket
+       while (sysread $socket, my $buf, 4096) {
+
+          # split and decode as many requests as possible
+          for my $request ($json->incr_parse ($buf)) {
+             # act on the $request
+          }
+       }
+
+    Another complicated example: Assume you have a string with JSON
+    objects or arrays, all separated by (optional) comma characters (e.g.
+    "[1],[2], [3]"). To parse them, we have to skip the commas between the
+    JSON texts, and here is where the lvalue-ness of "incr_text" comes in
+    useful:
+
+       my $text = "[1],[2], [3]";
+       my $json = new JSON::XS;
+
+       # void context, so no parsing done
+       $json->incr_parse ($text);
+
+       # now extract as many objects as possible. note the
+       # use of scalar context so incr_text can be called.
+       while (my $obj = $json->incr_parse) {
+          # do something with $obj
+
+          # now skip the optional comma
+          $json->incr_text =~ s/^ \s* , //x;
+       }
+
+    Now let's go for a very complex example: Assume that you have a
+    gigantic JSON array-of-objects, many gigabytes in size, and you want
+    to parse it, but you cannot load it into memory fully (this has
+    actually happened in the real world :).
+
+    Well, you lost, you have to implement your own JSON parser. But
+    JSON::XS can still help you: You implement a (very simple) array
+    parser and let JSON decode the array elements, which are all full JSON
+    objects on their own (this wouldn't work if the array elements could
+    be JSON numbers, for example):
+
+       my $json = new JSON::XS;
+
+       # open the monster (the file name is just an example)
+       open my $fh, "<monster.json"
+          or die "monster.json: $!";
+
+       # first parse the initial "["
+       for (;;) {
+          sysread $fh, my $buf, 65536
+             or die "read error: $!";
+          $json->incr_parse ($buf); # void context, so no parsing
+
+          # Exit the loop once we found and removed(!) the initial "[".
+          # In essence, we are (ab-)using the $json object as a simple scalar
+          # we append data to.
+          last if $json->incr_text =~ s/^ \s* \[ //x;
+       }
+
+       # now we have skipped the initial "[", so continue
+       # parsing all the elements.
+ for (;;) { + # in this loop we read data until we got a single JSON object + for (;;) { + if (my $obj = $json->incr_parse) { + # do something with $obj + last; + } + + # add more data + sysread $fh, my $buf, 65536 + or die "read error: $!"; + $json->incr_parse ($buf); # void context, so no parsing + } + + # in this loop we read data until we either found and parsed the + # separating "," between elements, or the final "]" + for (;;) { + # first skip whitespace + $json->incr_text =~ s/^\s*//; + + # if we find "]", we are done + if ($json->incr_text =~ s/^\]//) { + print "finished.\n"; + exit; + } + + # if we find ",", we can continue with the next element + if ($json->incr_text =~ s/^,//) { + last; + } + + # if we find anything else, we have a parse error! + if (length $json->incr_text) { + die "parse error near ", $json->incr_text; + } + + # else add more data + sysread $fh, my $buf, 65536 + or die "read error: $!"; + $json->incr_parse ($buf); # void context, so no parsing + } + + This is a complex example, but most of the complexity comes from the + fact that we are trying to be correct (bear with me if I am wrong, I + never ran the above example :). + +MAPPING + This section describes how JSON::XS maps Perl values to JSON values and + vice versa. These mappings are designed to "do the right thing" in most + circumstances automatically, preserving round-tripping characteristics + (what you put in comes out as something equivalent). + + For the more enlightened: note that in the following descriptions, + lowercase *perl* refers to the Perl interpreter, while uppercase *Perl* + refers to the abstract Perl language itself. + + JSON -> PERL + object + A JSON object becomes a reference to a hash in Perl. No ordering of + object keys is preserved (JSON does not preserve object key ordering + itself). + + array + A JSON array becomes a reference to an array in Perl. + + string + A JSON string becomes a string scalar in Perl - Unicode codepoints + in JSON are represented by the same codepoints in the Perl string, + so no manual decoding is necessary. + + number + A JSON number becomes either an integer, numeric (floating point) or + string scalar in perl, depending on its range and any fractional + parts. On the Perl level, there is no difference between those as + Perl handles all the conversion details, but an integer may take + slightly less memory and might represent more values exactly than + floating point numbers. + + If the number consists of digits only, JSON::XS will try to + represent it as an integer value. If that fails, it will try to + represent it as a numeric (floating point) value if that is possible + without loss of precision. Otherwise it will preserve the number as + a string value (in which case you lose roundtripping ability, as the + JSON number will be re-encoded to a JSON string). + + Numbers containing a fractional or exponential part will always be + represented as numeric (floating point) values, possibly at a loss + of precision (in which case you might lose perfect roundtripping + ability, but the JSON number will still be re-encoded as a JSON + number). + + Note that precision is not accuracy - binary floating point values + cannot represent most decimal fractions exactly, and when converting + from and to floating point, JSON::XS only guarantees precision up to + but not including the least significant bit. + + true, false + These JSON atoms become "Types::Serialiser::true" and + "Types::Serialiser::false", respectively. 
They are overloaded to act + almost exactly like the numbers 1 and 0. You can check whether a + scalar is a JSON boolean by using the "Types::Serialiser::is_bool" + function (after "use Types::Serialier", of course). + + null + A JSON null atom becomes "undef" in Perl. + + shell-style comments ("# *text*") + As a nonstandard extension to the JSON syntax that is enabled by the + "relaxed" setting, shell-style comments are allowed. They can start + anywhere outside strings and go till the end of the line. + + tagged values ("(*tag*)*value*"). + Another nonstandard extension to the JSON syntax, enabled with the + "allow_tags" setting, are tagged values. In this implementation, the + *tag* must be a perl package/class name encoded as a JSON string, + and the *value* must be a JSON array encoding optional constructor + arguments. + + See "OBJECT SERIALISATION", below, for details. + + PERL -> JSON + The mapping from Perl to JSON is slightly more difficult, as Perl is a + truly typeless language, so we can only guess which JSON type is meant + by a Perl value. + + hash references + Perl hash references become JSON objects. As there is no inherent + ordering in hash keys (or JSON objects), they will usually be + encoded in a pseudo-random order. JSON::XS can optionally sort the + hash keys (determined by the *canonical* flag), so the same + datastructure will serialise to the same JSON text (given same + settings and version of JSON::XS), but this incurs a runtime + overhead and is only rarely useful, e.g. when you want to compare + some JSON text against another for equality. + + array references + Perl array references become JSON arrays. + + other references + Other unblessed references are generally not allowed and will cause + an exception to be thrown, except for references to the integers 0 + and 1, which get turned into "false" and "true" atoms in JSON. + + Since "JSON::XS" uses the boolean model from Types::Serialiser, you + can also "use Types::Serialiser" and then use + "Types::Serialiser::false" and "Types::Serialiser::true" to improve + readability. + + use Types::Serialiser; + encode_json [\0, Types::Serialiser::true] # yields [false,true] + + Types::Serialiser::true, Types::Serialiser::false + These special values from the Types::Serialiser module become JSON + true and JSON false values, respectively. You can also use "\1" and + "\0" directly if you want. + + blessed objects + Blessed objects are not directly representable in JSON, but + "JSON::XS" allows various ways of handling objects. See "OBJECT + SERIALISATION", below, for details. 
+ + simple scalars + Simple Perl scalars (any scalar that is not a reference) are the + most difficult objects to encode: JSON::XS will encode undefined + scalars as JSON "null" values, scalars that have last been used in a + string context before encoding as JSON strings, and anything else as + number value: + + # dump as number + encode_json [2] # yields [2] + encode_json [-3.0e17] # yields [-3e+17] + my $value = 5; encode_json [$value] # yields [5] + + # used as string, so dump as string + print $value; + encode_json [$value] # yields ["5"] + + # undef becomes null + encode_json [undef] # yields [null] + + You can force the type to be a JSON string by stringifying it: + + my $x = 3.1; # some variable containing a number + "$x"; # stringified + $x .= ""; # another, more awkward way to stringify + print $x; # perl does it for you, too, quite often + + You can force the type to be a JSON number by numifying it: + + my $x = "3"; # some variable containing a string + $x += 0; # numify it, ensuring it will be dumped as a number + $x *= 1; # same thing, the choice is yours. + + You can not currently force the type in other, less obscure, ways. + Tell me if you need this capability (but don't forget to explain why + it's needed :). + + Note that numerical precision has the same meaning as under Perl (so + binary to decimal conversion follows the same rules as in Perl, + which can differ to other languages). Also, your perl interpreter + might expose extensions to the floating point numbers of your + platform, such as infinities or NaN's - these cannot be represented + in JSON, and it is an error to pass those in. + + OBJECT SERIALISATION + As JSON cannot directly represent Perl objects, you have to choose + between a pure JSON representation (without the ability to deserialise + the object automatically again), and a nonstandard extension to the JSON + syntax, tagged values. + + SERIALISATION + What happens when "JSON::XS" encounters a Perl object depends on the + "allow_blessed", "convert_blessed" and "allow_tags" settings, which are + used in this order: + + 1. "allow_tags" is enabled and the object has a "FREEZE" method. + In this case, "JSON::XS" uses the Types::Serialiser object + serialisation protocol to create a tagged JSON value, using a + nonstandard extension to the JSON syntax. + + This works by invoking the "FREEZE" method on the object, with the + first argument being the object to serialise, and the second + argument being the constant string "JSON" to distinguish it from + other serialisers. + + The "FREEZE" method can return any number of values (i.e. zero or + more). These values and the paclkage/classname of the object will + then be encoded as a tagged JSON value in the following format: + + ("classname")[FREEZE return values...] + + e.g.: + + ("URI")["http://www.google.com/"] + ("MyDate")[2013,10,29] + ("ImageData::JPEG")["Z3...VlCg=="] - JSON::DWIW - Very fast. Very natural. Very nice. + For example, the hypothetical "My::Object" "FREEZE" method might use + the objects "type" and "id" members to encode the object: - Undocumented unicode handling (but the best of the pack. Unicode - escapes still don't get parsed properly). + sub My::Object::FREEZE { + my ($self, $serialiser) = @_; - Very inflexible. + ($self->{type}, $self->{id}) + } - No roundtripping. + 2. "convert_blessed" is enabled and the object has a "TO_JSON" method. + In this case, the "TO_JSON" method of the object is invoked in + scalar context. 
It must return a single scalar that can be directly + encoded into JSON. This scalar replaces the object in the JSON text. + + For example, the following "TO_JSON" method will convert all URI + objects to JSON strings when serialised. The fatc that these values + originally were URI objects is lost. + + sub URI::TO_JSON { + my ($uri) = @_; + $uri->as_string + } - Does not generate valid JSON (key strings are often unquoted, empty - keys result in nothing being output) + 3. "allow_blessed" is enabled. + The object will be serialised as a JSON null value. - Does not check input for validity. + 4. none of the above + If none of the settings are enabled or the respective methods are + missing, "JSON::XS" throws an exception. + + DESERIALISATION + For deserialisation there are only two cases to consider: either + nonstandard tagging was used, in which case "allow_tags" decides, or + objects cannot be automatically be deserialised, in which case you can + use postprocessing or the "filter_json_object" or + "filter_json_single_key_object" callbacks to get some real objects our + of your JSON. + + This section only considers the tagged value case: I a tagged JSON + object is encountered during decoding and "allow_tags" is disabled, a + parse error will result (as if tagged values were not part of the + grammar). + + If "allow_tags" is enabled, "JSON::XS" will look up the "THAW" method of + the package/classname used during serialisation (it will not attempt to + load the package as a Perl module). If there is no such method, the + decoding will fail with an error. + + Otherwise, the "THAW" method is invoked with the classname as first + argument, the constant string "JSON" as second argument, and all the + values from the JSON array (the values originally returned by the + "FREEZE" method) as remaining arguments. + + The method must then return the object. While technically you can return + any Perl scalar, you might have to enable the "enable_nonref" setting to + make that work in all cases, so better return an actual blessed + reference. + + As an example, let's implement a "THAW" function that regenerates the + "My::Object" from the "FREEZE" example earlier: + + sub My::Object::THAW { + my ($class, $serialiser, $type, $id) = @_; + + $class->new (type => $type, id => $id) + } + +ENCODING/CODESET FLAG NOTES + The interested reader might have seen a number of flags that signify + encodings or codesets - "utf8", "latin1" and "ascii". There seems to be + some confusion on what these do, so here is a short comparison: + + "utf8" controls whether the JSON text created by "encode" (and expected + by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only + control whether "encode" escapes character values outside their + respective codeset range. Neither of these flags conflict with each + other, although some combinations make less sense than others. + + Care has been taken to make all flags symmetrical with respect to + "encode" and "decode", that is, texts encoded with any combination of + these flag values will be correctly decoded when the same flags are used + - in general, if you use different flag settings while encoding vs. when + decoding you likely have a bug somewhere. + + Below comes a verbose discussion of these flags. Note that a "codeset" + is simply an abstract set of character-codepoint pairs, while an + encoding takes those codepoint numbers and *encodes* them, in our case + into octets. 
Unicode is (among other things) a codeset, UTF-8 is an + encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets *and* + encodings at the same time, which can be confusing. + + "utf8" flag disabled + When "utf8" is disabled (the default), then "encode"/"decode" + generate and expect Unicode strings, that is, characters with high + ordinal Unicode values (> 255) will be encoded as such characters, + and likewise such characters are decoded as-is, no changes to them + will be done, except "(re-)interpreting" them as Unicode codepoints + or Unicode characters, respectively (to Perl, these are the same + thing in strings unless you do funny/weird/dumb stuff). + + This is useful when you want to do the encoding yourself (e.g. when + you want to have UTF-16 encoded JSON texts) or when some other layer + does the encoding for you (for example, when printing to a terminal + using a filehandle that transparently encodes to UTF-8 you certainly + do NOT want to UTF-8 encode your data first and have Perl encode it + another time). + + "utf8" flag enabled + If the "utf8"-flag is enabled, "encode"/"decode" will encode all + characters using the corresponding UTF-8 multi-byte sequence, and + will expect your input strings to be encoded as UTF-8, that is, no + "character" of the input string must have any value > 255, as UTF-8 + does not allow that. + + The "utf8" flag therefore switches between two modes: disabled means + you will get a Unicode string in Perl, enabled means you get a UTF-8 + encoded octet/binary string in Perl. + + "latin1" or "ascii" flags enabled + With "latin1" (or "ascii") enabled, "encode" will escape characters + with ordinal values > 255 (> 127 with "ascii") and encode the + remaining characters as specified by the "utf8" flag. + + If "utf8" is disabled, then the result is also correctly encoded in + those character sets (as both are proper subsets of Unicode, meaning + that a Unicode string with all character values < 256 is the same + thing as a ISO-8859-1 string, and a Unicode string with all + character values < 128 is the same thing as an ASCII string in + Perl). + + If "utf8" is enabled, you still get a correct UTF-8-encoded string, + regardless of these flags, just some more characters will be escaped + using "\uXXXX" then before. + + Note that ISO-8859-1-*encoded* strings are not compatible with UTF-8 + encoding, while ASCII-encoded strings are. That is because the + ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the ISO-8859-1 + *codeset* being a subset of Unicode), while ASCII is. + + Surprisingly, "decode" will ignore these flags and so treat all + input values as governed by the "utf8" flag. If it is disabled, this + allows you to decode ISO-8859-1- and ASCII-encoded strings, as both + strict subsets of Unicode. If it is enabled, you can correctly + decode UTF-8 encoded strings. + + So neither "latin1" nor "ascii" are incompatible with the "utf8" + flag - they only govern when the JSON output engine escapes a + character or not. + + The main use for "latin1" is to relatively efficiently store binary + data as JSON, at the expense of breaking compatibility with most + JSON decoders. + + The main use for "ascii" is to force the output to not contain + characters with values > 127, which means you can interpret the + resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or most about + any character set and 8-bit-encoding, and still get the same data + structure back. 
  JSON and ECMAscript
    JSON syntax is based on how literals are represented in javascript (the not-standardised predecessor of ECMAscript) which is presumably why it is called "JavaScript Object Notation".

    However, JSON is not a subset (and also not a superset of course) of ECMAscript (the standard) or javascript (whatever browsers actually implement).

    If you want to use javascript's "eval" function to "parse" JSON, you might run into parse errors for valid JSON texts, or the resulting data structure might not be queryable:

    One of the problems is that U+2028 and U+2029 are valid characters inside JSON strings, but are not allowed in ECMAscript string literals, so the following Perl fragment will not output something that can be guaranteed to be parsable by javascript's "eval":

        use JSON::XS;

        print encode_json [chr 0x2028];

    The right fix for this is to use a proper JSON parser in your javascript programs, and not rely on "eval" (see for example Douglas Crockford's json2.js parser).

    If this is not an option, you can, as a stop-gap measure, simply encode to ASCII-only JSON:

        use JSON::XS;

        print JSON::XS->new->ascii->encode ([chr 0x2028]);

    Note that this will enlarge the resulting JSON text quite a bit if you have many non-ASCII characters. You might be tempted to run some regexes to only escape U+2028 and U+2029, e.g.:

        # DO NOT USE THIS!
        my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
        $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
        $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
        print $json;

    Note that *this is a bad idea*: the above only works for U+2028 and U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing javascript implementations, however, have issues with other characters as well - using "eval" naively simply *will* cause problems.

    Another problem is that some javascript implementations reserve some property names for their own purposes (which probably makes them non-ECMAscript-compliant). For example, Iceweasel reserves the "__proto__" property name for its own purposes.

    If that is a problem, you could try to filter the resulting JSON output for these property strings, e.g.:

        $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;

    This works because "__proto__" is not valid outside of strings, so every occurrence of ""__proto__"\s*:" must be a string used as property name.

    If you know of other incompatibilities, please let me know.

  JSON and YAML
    You often hear that JSON is a subset of YAML. This is, however, a mass hysteria(*) and very far from the truth (as of the time of this writing), so let me state it clearly: *in general, there is no way to configure JSON::XS to output a data structure as valid YAML* that works in all cases.

    If you really must use JSON::XS to generate YAML, you should use this algorithm (subject to change in future versions):

        my $to_yaml = JSON::XS->new->utf8->space_after (1);
        my $yaml = $to_yaml->encode ($ref) . "\n";

    This will *usually* generate JSON texts that also parse as valid YAML.
    Please note that YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax, so you should make sure that your hash keys are noticeably shorter than the 1024 "stream characters" YAML allows and that you do not have characters with codepoint values outside the Unicode BMP (basic multilingual plane). YAML also does not allow "\/" sequences in strings (which JSON::XS does not *currently* generate, but other JSON generators might).

    There might be other incompatibilities that I am not aware of (or the YAML specification has been changed yet again - it does so quite often). In general you should not try to generate YAML with a JSON generator or vice versa, or try to parse JSON with a YAML parser or vice versa: chances are high that you will run into severe interoperability problems when you least expect it.

      (*) I have been pressured multiple times by Brian Ingerson (one of the authors of the YAML specification) to remove this paragraph, despite him acknowledging that the actual incompatibilities exist. As I was personally bitten by this "JSON is YAML" lie, I refused and said I will continue to educate people about these issues, so others do not run into the same problem again and again. After this, Brian called me a (quote)*complete and worthless idiot*(unquote).

      In my opinion, instead of pressuring and insulting people who actually clarify issues with YAML and the wrong statements of some of its proponents, I would kindly suggest reading the JSON spec (which is not that difficult or long) and finally making YAML compatible to it, and educating users about the changes, instead of spreading lies about the real compatibility for many *years* and trying to silence people who point out that it isn't true.

      Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.

SPEED
    It seems that JSON::XS is surprisingly fast, as shown in the following tables, generated with the help of the "eg/bench" program in the JSON::XS distribution, to make it easy to compare on your own system.
    First comes a comparison between various modules using a very short single-line JSON string (also available at ):

        {"method": "handleMessage", "params": ["user1",
        "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
        1, 0]}

    It shows the number of encodes/decodes per second (JSON::XS uses the functional interface, while JSON::XS/2 uses the OO interface with pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink. JSON::DWIW/DS uses the deserialise function, while JSON::DWIW/FJ uses the from_json method). Higher is better:

        module        | encode     | decode     |
        --------------|------------|------------|
        JSON::DWIW/DS |  86302.551 | 102300.098 |
        JSON::DWIW/FJ |  86302.551 |  75983.768 |
        JSON::PP      |  15827.562 |   6638.658 |
        JSON::Syck    |  63358.066 |  47662.545 |
        JSON::XS      | 511500.488 | 511500.488 |
        JSON::XS/2    | 291271.111 | 388361.481 |
        JSON::XS/3    | 361577.931 | 361577.931 |
        Storable      |  66788.280 | 265462.278 |
        --------------+------------+------------+

    That is, JSON::XS is almost six times faster than JSON::DWIW on encoding, about five times faster on decoding, and over thirty to seventy times faster than JSON's pure perl implementation. It also compares favourably to Storable for small amounts of data.

    Using a longer test string (roughly 18KB, generated from the Yahoo! Locals search API):

        module        | encode     | decode     |
        --------------|------------|------------|
        JSON::DWIW/DS |   1647.927 |   2673.916 |
        JSON::DWIW/FJ |   1630.249 |   2596.128 |
        JSON::PP      |    400.640 |     62.311 |
        JSON::Syck    |   1481.040 |   1524.869 |
        JSON::XS      |  20661.596 |   9541.183 |
        JSON::XS/2    |  10683.403 |   9416.938 |
        JSON::XS/3    |  20661.596 |   9400.054 |
        Storable      |  19765.806 |  10000.725 |
        --------------+------------+------------+

    Again, JSON::XS leads by far (except for Storable which, unsurprisingly, decodes a bit faster).

    On large strings containing lots of high Unicode characters, some modules (such as JSON::PC) seem to decode faster than JSON::XS, but the result will be broken due to missing (or wrong) Unicode handling. Others refuse to decode or encode properly, so it was impossible to prepare a fair comparison table for that case.

SECURITY CONSIDERATIONS
    When you are using JSON in a protocol, talking to untrusted potentially hostile creatures requires relatively few measures.

    First of all, your JSON decoder should be secure, that is, should not have any buffer overflows. Obviously, this module should ensure that and I am trying hard on making that true, but you never know.

    Second, you need to avoid resource-starving attacks. That means you should limit the size of JSON texts you accept, or make sure that when your resources run out, that's just fine (e.g. by using a separate process that can crash safely). The size of a JSON text in octets or characters is usually a good indication of the size of the resources required to decode it into a Perl structure. While JSON::XS can check the size of the JSON text, it might be too late when you already have it in memory, so you might want to check the size before you accept the string.

    Third, JSON::XS recurses using the C stack when decoding objects and arrays. The C stack is a limited resource: for instance, on my amd64 machine with 8MB of stack size I can decode around 180k nested arrays but only 14k nested JSON objects (due to perl itself recursing deeply on croak to free the temporary). If that is exceeded, the program crashes. To be conservative, the default nesting limit is set to 512. If your process has a smaller stack, you should adjust this setting accordingly with the "max_depth" method.
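    If you want the decoder itself to enforce such limits, a minimal sketch could look like this (the concrete limits and the $untrusted_json variable are placeholders, not recommendations):

        use JSON::XS;

        # refuse texts larger than 64KiB or nested deeper than 32 levels -
        # decode then croaks instead of exhausting memory or the C stack
        my $coder = JSON::XS->new->utf8->max_size (64 * 1024)->max_depth (32);

        my $data = eval { $coder->decode ($untrusted_json) };
        if ($@) {
            warn "rejecting request: $@";
        }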
    Something else could bomb you, too, that I forgot to think of. In that case, you get to keep the pieces. I am always open for hints, though...

    Also keep in mind that JSON::XS might leak contents of your Perl data structures in its error messages, so when you serialise sensitive information you might want to make sure that exceptions thrown by JSON::XS will not end up in front of untrusted eyes.

    If you are using JSON::XS to return packets for consumption by JavaScript scripts in a browser you should have a look at to see whether you are vulnerable to some common attack vectors (which really are browser design bugs, but it is still you who will have to deal with it, as major browser developers care only for features, not about getting security right).

  "OLD" VS. "NEW" JSON (RFC4627 VS. RFC7159)
    JSON originally required JSON texts to represent an array or object - scalar values were explicitly not allowed. This has changed, and versions of JSON::XS beginning with 4.0 reflect this by allowing scalar values by default.

    One reason why one might not want this is that this removes a fundamental property of JSON texts, namely that they are self-delimited and self-contained, or in other words, you could take any number of "old" JSON texts and paste them together, and the result would be unambiguously parseable:

        [1,3]{"k":5}[][null] # four JSON texts, without doubt

    By allowing scalars, this property is lost: in the following example, is this one JSON text (the number 12) or two JSON texts (the numbers 1 and 2)?

        12 # could be 12, or 1 and 2

    Another lost property of "old" JSON is that no lookahead is required to know the end of a JSON text, i.e. the JSON text definitely ended at the last "]" or "}" character, there was no need to read extra characters.

    For example, a viable network protocol with "old" JSON was to simply exchange JSON texts without delimiter. For "new" JSON, you have to use a suitable delimiter (such as a newline) after every JSON text or ensure you never encode/decode scalar values (see the sketch at the end of this section).

    Most protocols do work by only transferring arrays or objects, and the easiest way to avoid problems with the "new" JSON definition is to explicitly disallow scalar values in your encoder and decoder:

        $json_coder = JSON::XS->new->allow_nonref (0)

    This is a somewhat unhappy situation, and the blame can fully be put on JSON's inventor, Douglas Crockford, who unilaterally changed the format in 2006 without consulting the IETF, forcing the IETF to either fork the format or go with it (as I was told, the IETF wasn't amused).
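    As an illustration of the delimiter approach, here is a minimal sketch of a newline-delimited exchange ($fh and $packet are placeholders; it relies on the compact output format never containing literal newlines):

        use JSON::XS;

        my $coder = JSON::XS->new->utf8;   # compact, single-line output

        # sender side: one JSON text per line
        print $fh $coder->encode ($packet), "\n";

        # receiver side: the newline unambiguously ends each JSON text
        while (my $line = <$fh>) {
            chomp $line;
            my $packet = $coder->decode ($line);
            # ... process $packet here ...
        }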
RELATIONSHIP WITH I-JSON
    JSON is a somewhat sloppily-defined format - it carries around obvious Javascript baggage, such as not really defining number range, probably because Javascript only has one type of numbers: IEEE 64 bit floats ("binary64").

    For this reason, RFC7493 defines "Internet JSON", which is a restricted subset of JSON that is supposedly more interoperable on the internet.

    While "JSON::XS" does not offer specific support for I-JSON, it of course accepts valid I-JSON and by default implements some of the limitations of I-JSON, such as parsing numbers as perl numbers, which are usually a superset of binary64 numbers.

    To generate I-JSON, follow these rules:

    * always generate UTF-8

      I-JSON must be encoded in UTF-8, the default for "encode_json".

    * numbers should be within IEEE 754 binary64 range

      Basically all existing perl installations use binary64 to represent floating point numbers, so all you need to do is to avoid large integers.

    * objects must not have duplicate keys

      This is trivially done, as "JSON::XS" does not allow duplicate keys.

    * do not generate scalar JSON texts, use "->allow_nonref (0)"

      I-JSON strongly requests you to only encode arrays and objects into JSON.

    * times should be strings in ISO 8601 format

      There are a myriad of modules on CPAN dealing with ISO 8601 - search for "ISO8601" on CPAN and use one.

    * encode binary data as base64

      While it's tempting to just dump binary data as a string (and let "JSON::XS" do the escaping), for I-JSON, it's *recommended* to encode binary data as base64.

    There are some other considerations - read RFC7493 for the details if interested.

INTEROPERABILITY WITH OTHER MODULES
    "JSON::XS" uses the Types::Serialiser module to provide boolean constants. That means that the JSON true and false values will be compatible with the true and false values of other modules that do the same, such as JSON::PP and CBOR::XS.

INTEROPERABILITY WITH OTHER JSON DECODERS
    As long as you only serialise data that can be directly expressed in JSON, "JSON::XS" is incapable of generating invalid JSON output (modulo bugs, but "JSON::XS" has found more bugs in the official JSON testsuite (1) than the official JSON testsuite has found in "JSON::XS" (0)).

    When you have trouble decoding JSON generated by this module using other decoders, then it is very likely that you have an encoding mismatch or the other decoder is broken.

    When decoding, "JSON::XS" is strict by default and will likely catch all errors. There are currently two settings that change this: "relaxed" makes "JSON::XS" accept (but not generate) some non-standard extensions, and "allow_tags" will allow you to encode and decode Perl objects, at the cost of not outputting valid JSON anymore.
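    For illustration, here is a small sketch of "relaxed" decoding; the input uses a trailing comma and a shell-style "#" comment, two of the extensions the relaxed parser accepts (the exact set of extensions is documented with the "relaxed" method and may vary between versions):

        use JSON::XS;

        my $relaxed = JSON::XS->new->relaxed;

        # the trailing comma and the comment would both be fatal
        # errors for the strict default parser
        my $data = $relaxed->decode ("[1, 2, 3, # a comment\n]");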
  TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS
    When you use "allow_tags" to use the extended (and also nonstandard and invalid) JSON syntax for serialised objects, and you still want to decode the generated JSON texts with a standard JSON decoder, you can run a regex to replace the tagged syntax by standard JSON arrays (it only works for "normal" package names without comma, newlines or single colons). First, the readable Perl version:

        # if your FREEZE methods return no values, you need this replace first:
        $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;

        # this works for non-empty constructor arg lists:
        $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

    And here is a less readable version that is easy to adapt to other languages:

        $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;

    Here is an ECMAScript version (same regex):

        json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");

    Since this syntax converts to standard JSON arrays, it might be hard to distinguish serialised objects from normal arrays. You can prepend a "magic number" as first array element to reduce chances of a collision:

        $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;

    And after decoding the JSON text, you could walk the data structure looking for arrays with a first element of "XU1peReLzT4ggEllLanBYq4G9VzliwKF".

    The same approach can be used to create the tagged format with another encoder. First, you create an array with the magic string as first member, the classname as second, and constructor arguments last, encode it as part of your JSON structure, and then:

        $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;

    Again, this has some limitations - the magic string must not be encoded with character escapes, and the constructor arguments must be non-empty.

(I-)THREADS
    This module is *not* guaranteed to be ithread (or MULTIPLICITY-) safe and there are no plans to change this. Note that perl's builtin so-called threads/ithreads are officially deprecated and should not be used.

THE PERILS OF SETLOCALE
    Sometimes people avoid the Perl locale support and directly call the system's setlocale function with "LC_ALL".

    This breaks both perl and modules such as JSON::XS, as stringification of numbers no longer works correctly (e.g. "$x = 0.1; print "$x"+1" might print 1, and JSON::XS might output illegal JSON as JSON::XS relies on perl to stringify numbers).

    The solution is simple: don't call "setlocale", or use it only for those categories you need, such as "LC_MESSAGES" or "LC_CTYPE".

    If you need "LC_NUMERIC", you should enable it only around the code that actually needs it (avoiding stringification of numbers), and restore it afterwards, for example as sketched below.
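    A minimal sketch of that save/restore pattern, using the POSIX module (the "de_DE" locale name is only an example and may not exist on your system):

        use POSIX qw(setlocale LC_NUMERIC);

        # remember the current numeric locale, switch temporarily, restore it
        my $saved = setlocale LC_NUMERIC;
        setlocale LC_NUMERIC, "de_DE";

        # ... code that really needs locale-specific number formatting,
        # but no JSON encoding and no other stringification of numbers ...

        setlocale LC_NUMERIC, $saved;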
SOME HISTORY
    At the time this module was created there already were a number of JSON modules available on CPAN, so what was the reason to write yet another JSON module? While it seems there are many JSON modules, none of them correctly handled all corner cases, and in most cases their maintainers were unresponsive, gone missing, or not listening to bug reports for other reasons.

    Beginning with version 2.0 of the JSON module, when both JSON and JSON::XS are installed, then JSON will fall back on JSON::XS (this can be overridden) with no overhead due to emulation (by inheriting constructor and methods). If JSON::XS is not available, it will fall back to the compatible JSON::PP module as backend, so using JSON instead of JSON::XS gives you a portable JSON API that can be fast when you need it and doesn't require a C compiler when that is a problem.

    Somewhere around version 3, this module was forked into "Cpanel::JSON::XS", because its maintainer had serious trouble understanding JSON and insisted on a fork with many bugs "fixed" that weren't actually bugs, while spreading FUD about this module without actually giving any details on his accusations. You be the judge, but in my personal opinion, if you want quality, you will stay away from dangerous forks like that.

BUGS
    While the goal of this module is to be correct, that unfortunately does not mean it's bug-free, only that I think its design is bug-free. If you keep reporting bugs they will be fixed swiftly, though.

    Please refrain from using rt.cpan.org or any other bug reporting service. I put the contact address into my modules for a reason.

SEE ALSO
    The json_xs command line utility for quick experiments.

AUTHOR
    Marc Lehmann