--- JSON-XS/README 2007/03/24 01:15:22 1.5
+++ JSON-XS/README 2007/03/29 02:45:49 1.9
@@ -4,12 +4,17 @@
 SYNOPSIS
      use JSON::XS;

-     # exported functions, croak on error
+     # exported functions, they croak on error
+     # and expect/generate UTF-8

      $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
      $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;

-     # oo-interface
+     # objToJson and jsonToObj aliases to to_json and from_json
+     # are exported for compatibility to the JSON module,
+     # but should not be used in new code.
+
+     # OO-interface

      $coder = JSON::XS->new->ascii->pretty->allow_nonref;
      $pretty_printed_unencoded = $coder->encode ($perl_scalar);
@@ -32,17 +37,18 @@
     vice versa.

 FEATURES
-    * correct handling of unicode issues
+    * correct unicode handling
         This module knows how to handle Unicode, and even documents how and
         when it does so.

     * round-trip integrity
         When you serialise a perl data structure using only datatypes
         supported by JSON, the deserialised data structure is identical on
-        the Perl level. (e.g. the string "2.0" doesn't suddenly become "2").
+        the Perl level. (e.g. the string "2.0" doesn't suddenly become "2"
+        just because it looks like a number).

     * strict checking of JSON correctness
-        There is no guessing, no generating of illegal JSON strings by
+        There is no guessing, no generating of illegal JSON texts by
         default, and only JSON is accepted as input by default (the latter
         is a security feature).

@@ -57,29 +63,36 @@
     * reasonably versatile output formats
         You can choose between the most compact guarenteed single-line
         format possible (nice for simple line-based protocols), a pure-ascii
-        format (for when your transport is not 8-bit clean), or a
-        pretty-printed format (for when you want to read that stuff). Or you
-        can combine those features in whatever way you like.
+        format (for when your transport is not 8-bit clean, still supports
+        the whole unicode range), or a pretty-printed format (for when you
+        want to read that stuff). Or you can combine those features in
+        whatever way you like.

 FUNCTIONAL INTERFACE
     The following convinience methods are provided by this module. They are
     exported by default:

-    $json_string = to_json $perl_scalar
+    $json_text = to_json $perl_scalar
         Converts the given Perl data structure (a simple scalar or a
         reference to a hash or array) to a UTF-8 encoded, binary string
         (that is, the string contains octets only). Croaks on error.

-        This function call is functionally identical to
-        "JSON::XS->new->utf8->encode ($perl_scalar)".
+        This function call is functionally identical to:
+
+           $json_text = JSON::XS->new->utf8->encode ($perl_scalar)

-    $perl_scalar = from_json $json_string
+        except being faster.
+
+    $perl_scalar = from_json $json_text
         The opposite of "to_json": expects an UTF-8 (binary) string and
-        tries to parse that as an UTF-8 encoded JSON string, returning the
+        tries to parse that as an UTF-8 encoded JSON text, returning the
         resulting simple scalar or reference. Croaks on error.

-        This function call is functionally identical to
-        "JSON::XS->new->utf8->decode ($json_string)".
+        This function call is functionally identical to:
+
+           $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
+
+        except being faster.

 OBJECT-ORIENTED INTERFACE
     The object oriented interface lets you configure your own encoding or
@@ -93,36 +106,47 @@
     The mutators for flags all return the JSON object again and thus calls
     can be chained:

-      my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]})
+      my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
       => {"a": [1, 2]}

     $json = $json->ascii ([$enable])
         If $enable is true (or missing), then the "encode" method will not
-        generate characters outside the code range 0..127. Any unicode
-        characters outside that range will be escaped using either a single
-        \uXXXX (BMP characters) or a double \uHHHH\uLLLLL escape sequence,
-        as per RFC4627.
+        generate characters outside the code range 0..127 (which is ASCII).
+        Any unicode characters outside that range will be escaped using
+        either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLLL
+        escape sequence, as per RFC4627.

         If $enable is false, then the "encode" method will not escape
-        Unicode characters unless necessary.
+        Unicode characters unless required by the JSON syntax. This results
+        in a faster and more compact format.

-          JSON::XS->new->ascii (1)->encode (chr 0x10401)
-          => \ud801\udc01
+          JSON::XS->new->ascii (1)->encode ([chr 0x10401])
+          => ["\ud801\udc01"]

     $json = $json->utf8 ([$enable])
         If $enable is true (or missing), then the "encode" method will
-        encode the JSON string into UTF-8, as required by many protocols,
+        encode the JSON result into UTF-8, as required by many protocols,
         while the "decode" method expects to be handled an UTF-8-encoded
         string. Please note that UTF-8-encoded strings do not contain any
         characters outside the range 0..255, they are thus useful for
-        bytewise/binary I/O.
+        bytewise/binary I/O. In future versions, enabling this option might
+        enable autodetection of the UTF-16 and UTF-32 encoding families, as
+        described in RFC4627.

         If $enable is false, then the "encode" method will return the JSON
         string as a (non-encoded) unicode string, while "decode" expects
         thus a unicode string. Any decoding or encoding (e.g. to UTF-8 or
         UTF-16) needs to be done yourself, e.g. using the Encode module.

-        Example, output UTF-16-encoded JSON:
+        Example, output UTF-16BE-encoded JSON:
+
+          use Encode;
+          $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
+
+        Example, decode UTF-32LE-encoded JSON:
+
+          use Encode;
+          $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);

     $json = $json->pretty ([$enable])
         This enables (or disables) all of the "indent", "space_before" and
@@ -147,10 +171,9 @@
         properly.

         If $enable is false, no newlines or indenting will be produced, and
-        the resulting JSON strings is guarenteed not to contain any
-        "newlines".
+        the resulting JSON text is guarenteed not to contain any "newlines".

-        This setting has no effect when decoding JSON strings.
+        This setting has no effect when decoding JSON texts.

     $json = $json->space_before ([$enable])
         If $enable is true (or missing), then the "encode" method will add
@@ -160,7 +183,7 @@
         If $enable is false, then the "encode" method will not add any extra
         space at those places.

-        This setting has no effect when decoding JSON strings. You will also
+        This setting has no effect when decoding JSON texts. You will also
         most likely combine this setting with "space_after".

         Example, space_before enabled, space_after and indent disabled:
@@ -176,7 +199,7 @@
         If $enable is false, then the "encode" method will not add any extra
         space at those places.

-        This setting has no effect when decoding JSON strings.
+        This setting has no effect when decoding JSON texts.

         Example, space_before and indent disabled, space_after enabled:

@@ -192,12 +215,12 @@
         between runs of the same script).

         This option is useful if you want the same data structure to be
-        encoded as the same JSON string (given the same overall settings).
-        If it is disabled, the same hash migh be encoded differently even if
+        encoded as the same JSON text (given the same overall settings). If
+        it is disabled, the same hash migh be encoded differently even if
         contains the same data, as key-value pairs have no inherent ordering
         in Perl.

-        This setting has no effect when decoding JSON strings.
+        This setting has no effect when decoding JSON texts.

     $json = $json->allow_nonref ([$enable])
         If $enable is true (or missing), then the "encode" method can
@@ -206,7 +229,7 @@
         "decode" will accept those JSON values instead of croaking.

         If $enable is false, then the "encode" method will croak if it isn't
-        passed an arrayref or hashref, as JSON strings must either be an
+        passed an arrayref or hashref, as JSON texts must either be an
         object or array. Likewise, "decode" will croak if given something
         that is not a JSON object or array.

@@ -220,11 +243,16 @@
         Perl usually over-allocates memory a bit when allocating space for
         strings. This flag optionally resizes strings generated by either
         "encode" or "decode" to their minimum size possible. This can save
-        memory when your JSON strings are either very very long or you have
+        memory when your JSON texts are either very very long or you have
         many short strings. It will also try to downgrade any strings to
         octet-form if possible: perl stores strings internally either in an
         encoding called UTF-X or in octet-form. The latter cannot store
-        everything but uses less space in general.
+        everything but uses less space in general (and some buggy Perl or C
+        code might even rely on that internal representation being used).
+
+        The actual definition of what shrink does might change in future
+        versions, but it will always try to save space at the expense of
+        time.

         If $enable is true (or missing), the string returned by "encode"
         will be shrunk-to-fit, while all strings generated by "decode" will
@@ -238,7 +266,27 @@
         or floats internally (there is no difference on the Perl level),
         saving space.

-    $json_string = $json->encode ($perl_scalar)
+    $json = $json->max_depth ([$maximum_nesting_depth])
+        Sets the maximum nesting level (default 4096) accepted while
+        encoding or decoding. If the JSON text or Perl data structure has an
+        equal or higher nesting level then this limit, then the encoder and
+        decoder will stop and croak at that point.
+
+        Nesting level is defined by number of hash- or arrayrefs that the
+        encoder needs to traverse to reach a given point or the number of
+        "{" or "[" characters without their matching closing parenthesis
+        crossed to reach a given character in a string.
+
+        Setting the maximum depth to one disallows any nesting, so that
+        ensures that the object is only a single hash/object or array.
+
+        The argument to "max_depth" will be rounded up to the next nearest
+        power of two.
+
+        See SECURITY CONSIDERATIONS, below, for more info on why this is
+        useful.
+
+    $json_text = $json->encode ($perl_scalar)
         Converts the given Perl data structure (a simple scalar or a
         reference to a hash or array) to its JSON representation. Simple
         scalars will be converted into JSON string or number sequences,
@@ -247,10 +295,9 @@
         become JSON "null" values. Neither "true" nor "false" values will be
         generated.

-    $perl_scalar = $json->decode ($json_string)
-        The opposite of "encode": expects a JSON string and tries to parse
-        it, returning the resulting simple scalar or reference. Croaks on
-        error.
+    $perl_scalar = $json->decode ($json_text)
+        The opposite of "encode": expects a JSON text and tries to parse it,
+        returning the resulting simple scalar or reference. Croaks on error.

         JSON numbers and strings become simple Perl scalars. JSON arrays
         become Perl arrayrefs and JSON objects become Perl hashrefs. "true"
@@ -304,17 +351,28 @@

     hash references
         Perl hash references become JSON objects. As there is no inherent
-        ordering in hash keys, they will usually be encoded in a
-        pseudo-random order that can change between runs of the same program
-        but stays generally the same within a single run of a program.
-        JSON::XS can optionally sort the hash keys (determined by the
-        *canonical* flag), so the same datastructure will serialise to the
-        same JSON text (given same settings and version of JSON::XS), but
-        this incurs a runtime overhead.
+        ordering in hash keys (or JSON objects), they will usually be
+        encoded in a pseudo-random order that can change between runs of the
+        same program but stays generally the same within a single run of a
+        program. JSON::XS can optionally sort the hash keys (determined by
+        the *canonical* flag), so the same datastructure will serialise to
+        the same JSON text (given same settings and version of JSON::XS),
+        but this incurs a runtime overhead and is only rarely useful, e.g.
+        when you want to compare some JSON text against another for
+        equality.

     array references
         Perl array references become JSON arrays.

+    other references
+        Other unblessed references are generally not allowed and will cause
+        an exception to be thrown, except for references to the integers 0
+        and 1, which get turned into "false" and "true" atoms in JSON. You
+        can also use "JSON::XS::false" and "JSON::XS::true" to improve
+        readability.
+
+           to_json [\0,JSON::XS::true] # yields [false,true]
+
     blessed objects
         Blessed objects are not allowed. JSON::XS currently tries to encode
         their underlying representation (hash- or arrayref), but this
@@ -388,7 +446,7 @@
         magic values will make it croak).

         Does not even generate valid JSON ("{1,2}" gets converted to "{1:2}"
-        which is not a valid JSON string.
+        which is not a valid JSON text.

         Unmaintained (maintainer unresponsive for many months, bugs are not
         getting fixed).
@@ -399,7 +457,7 @@
         Very inflexible (no human-readable format supported, format pretty
         much undocumented. I need at least a format for easy reading by
         humans and a single-line compact format for use in a protocol, and
-        preferably a way to generate ASCII-only JSON strings).
+        preferably a way to generate ASCII-only JSON texts).

         Completely broken (and confusingly documented) Unicode handling
         (unicode escapes are not working properly, you need to set
@@ -432,8 +490,8 @@

         No roundtripping.

-        Does not generate valid JSON (key strings are often unquoted, empty
-        keys result in nothing being output)
+        Does not generate valid JSON texts (key strings are often unquoted,
+        empty keys result in nothing being output)

         Does not check input for validity.

@@ -444,22 +502,26 @@
     system.

     First comes a comparison between various modules using a very short JSON
-    string (83 bytes), showing the number of encodes/decodes per second
-    (JSON::XS is the functional interface, while JSON::XS/2 is the OO
-    interface with pretty-printing and hashkey sorting enabled). Higher is
-    better:
+    string:
+
+      {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}
+
+    It shows the number of encodes/decodes per second (JSON::XS uses the
+    functional interface, while JSON::XS/2 uses the OO interface with
+    pretty-printing and hashkey sorting enabled). Higher is better:

        module     |     encode |     decode |
        -----------|------------|------------|
-       JSON       |      14006 |       6820 |
-       JSON::DWIW |     200937 |     120386 |
-       JSON::PC   |      85065 |     129366 |
-       JSON::Syck |      59898 |      44232 |
-       JSON::XS   |    1171478 |     342435 |
-       JSON::XS/2 |     730760 |     328714 |
+       JSON       |  11488.516 |   7823.035 |
+       JSON::DWIW |  94708.054 | 129094.260 |
+       JSON::PC   |  63884.157 | 128528.212 |
+       JSON::Syck |  34898.677 |  42096.911 |
+       JSON::XS   | 654027.064 | 396423.669 |
+       JSON::XS/2 | 371564.190 | 371725.613 |
        -----------+------------+------------+

-    That is, JSON::XS is 6 times faster than than JSON::DWIW and about 80
+    That is, JSON::XS is more than six times faster than JSON::DWIW on
+    encoding, more than three times faster on decoding, and about thirty
     times faster than JSON, even with pretty-printing and key sorting.

     Using a longer test string (roughly 18KB, generated from Yahoo! Locals
@@ -467,34 +529,54 @@

        module     |     encode |     decode |
        -----------|------------|------------|
-       JSON       |        673 |         38 |
-       JSON::DWIW |       5271 |        770 |
-       JSON::PC   |       9901 |       2491 |
-       JSON::Syck |       2360 |        786 |
-       JSON::XS   |      37398 |       3202 |
-       JSON::XS/2 |      13765 |       3153 |
+       JSON       |    273.023 |     44.674 |
+       JSON::DWIW |   1089.383 |   1145.704 |
+       JSON::PC   |   3097.419 |   2393.921 |
+       JSON::Syck |    514.060 |    843.053 |
+       JSON::XS   |   6479.668 |   3636.364 |
+       JSON::XS/2 |   3774.221 |   3599.124 |
        -----------+------------+------------+

-    Again, JSON::XS leads by far in the encoding case, while still beating
-    every other module in the decoding case.
+    Again, JSON::XS leads by far.

-    On large strings containing lots of unicode characters, some modules
-    (such as JSON::PC) decode faster than JSON::XS, but the result will be
-    broken due to missing unicode handling. Others refuse to decode or
-    encode properly, so it was impossible to prepare a fair comparison table
-    for that case.
-
-RESOURCE LIMITS
-    JSON::XS does not impose any limits on the size of JSON texts or Perl
-    values they represent - if your machine can handle it, JSON::XS will
-    encode or decode it. Future versions might optionally impose structure
-    depth and memory use resource limits.
+    On large strings containing lots of high unicode characters, some
+    modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
+    result will be broken due to missing (or wrong) unicode handling. Others
+    refuse to decode or encode properly, so it was impossible to prepare a
+    fair comparison table for that case.
+
+SECURITY CONSIDERATIONS
+    When you are using JSON in a protocol, talking to untrusted potentially
+    hostile creatures requires relatively few measures.
+
+    First of all, your JSON decoder should be secure, that is, should not
+    have any buffer overflows. Obviously, this module should ensure that and
+    I am trying hard on making that true, but you never know.
+
+    Second, you need to avoid resource-starving attacks. That means you
+    should limit the size of JSON texts you accept, or make sure then when
+    your resources run out, thats just fine (e.g. by using a separate
+    process that can crash safely). The size of a JSON text in octets or
+    characters is usually a good indication of the size of the resources
+    required to decode it into a Perl structure.
+
+    Third, JSON::XS recurses using the C stack when decoding objects and
+    arrays. The C stack is a limited resource: for instance, on my amd64
+    machine with 8MB of stack size I can decode around 180k nested arrays
+    but only 14k nested JSON objects. If that is exceeded, the program
+    crashes. Thats why the default nesting limit is set to 4096. If your
+    process has a smaller stack, you should adjust this setting accordingly
+    with the "max_depth" method.
+
+    And last but least, something else could bomb you that I forgot to think
+    of. In that case, you get to keep the pieces. I am alway sopen for
+    hints, though...

 BUGS
     While the goal of this module is to be correct, that unfortunately does
     not mean its bug-free, only that I think its design is bug-free. It is
-    still very young and not well-tested. If you keep reporting bugs they
-    will be fixed swiftly, though.
+    still relatively early in its development. If you keep reporting bugs
+    they will be fixed swiftly, though.

 AUTHOR
     Marc Lehmann