--- JSON-XS/XS.pm 2007/03/23 17:40:29 1.10
+++ JSON-XS/XS.pm 2007/03/25 21:19:13 1.23
@@ -6,6 +6,22 @@

  use JSON::XS;

+ # exported functions, they croak on error
+ # and expect/generate UTF-8
+
+ $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
+ $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
+
+ # objToJson and jsonToObj aliases to to_json and from_json
+ # are exported for compatibility with the JSON module,
+ # but should not be used in new code.
+
+ # OO-interface
+
+ $coder = JSON::XS->new->ascii->pretty->allow_nonref;
+ $pretty_printed_unencoded = $coder->encode ($perl_scalar);
+ $perl_scalar = $coder->decode ($unicode_json_text);
+
 =head1 DESCRIPTION

 This module converts Perl data structures to JSON and vice versa. Its
@@ -27,7 +43,7 @@

 =over 4

-=item * correct handling of unicode issues
+=item * correct unicode handling

 This module knows how to handle Unicode, and even documents how and when
 it does so.
@@ -36,11 +52,12 @@

 When you serialise a perl data structure using only datatypes supported
 by JSON, the deserialised data structure is identical on the Perl level.
-(e.g. the string "2.0" doesn't suddenly become "2").
+(e.g. the string "2.0" doesn't suddenly become "2" just because it looks
+like a number).

 =item * strict checking of JSON correctness

-There is no guessing, no generating of illegal JSON strings by default,
+There is no guessing, no generating of illegal JSON texts by default,
 and only JSON is accepted as input by default (the latter is a security
 feature).

@@ -57,10 +74,10 @@
 =item * reasonably versatile output formats

 You can choose between the most compact guarenteed single-line format
-possible (nice for simple line-based protocols), a pure-ascii format (for
-when your transport is not 8-bit clean), or a pretty-printed format (for
-when you want to read that stuff). Or you can combine those features in
-whatever way you like.
+possible (nice for simple line-based protocols), a pure-ascii format
+(for when your transport is not 8-bit clean, still supports the whole
+unicode range), or a pretty-printed format (for when you want to read that
+stuff). Or you can combine those features in whatever way you like.

 =back

@@ -68,11 +85,13 @@

 package JSON::XS;

+use strict;
+
 BEGIN {
-   $VERSION = '0.3';
-   @ISA = qw(Exporter);
+   our $VERSION = '0.8';
+   our @ISA = qw(Exporter);

-   @EXPORT = qw(to_json from_json);
+   our @EXPORT = qw(to_json from_json objToJson jsonToObj);
    require Exporter;

    require XSLoader;
@@ -86,24 +105,33 @@

 =over 4

-=item $json_string = to_json $perl_scalar
+=item $json_text = to_json $perl_scalar

 Converts the given Perl data structure (a simple scalar or a reference
 to a hash or array) to a UTF-8 encoded, binary string (that is, the
 string contains octets only). Croaks on error.

-This function call is functionally identical to C<< JSON::XS->new->utf8->encode ($perl_scalar) >>.
+This function call is functionally identical to:
+
+   $json_text = JSON::XS->new->utf8->encode ($perl_scalar)

-=item $perl_scalar = from_json $json_string
+except that it is faster.
+
+=item $perl_scalar = from_json $json_text

 The opposite of C<to_json>: expects an UTF-8 (binary) string and tries to
-parse that as an UTF-8 encoded JSON string, returning the resulting simple
+parse that as a UTF-8 encoded JSON text, returning the resulting simple
 scalar or reference. Croaks on error.

-This function call is functionally identical to C<< JSON::XS->new->utf8->decode ($json_string) >>.
+This function call is functionally identical to:
+
+   $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
+
+except that it is faster.

 =back

+
 =head1 OBJECT-ORIENTED INTERFACE

 The object oriented interface lets you configure your own encoding or
@@ -119,42 +147,57 @@
 The mutators for flags all return the JSON object again and thus calls can
 be chained:

-   my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]})
+   my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
    => {"a": [1, 2]}

 =item $json = $json->ascii ([$enable])

-If C<$enable> is true (or missing), then the C<encode> method will
-not generate characters outside the code range C<0..127>. Any unicode
-characters outside that range will be escaped using either a single
-\uXXXX (BMP characters) or a double \uHHHH\uLLLLL escape sequence, as per
-RFC4627.
+If C<$enable> is true (or missing), then the C<encode> method will not
+generate characters outside the code range C<0..127> (which is ASCII). Any
+unicode characters outside that range will be escaped using either a
+single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
+as per RFC4627.

 If C<$enable> is false, then the C<encode> method will not escape Unicode
-characters unless necessary.
+characters unless required by the JSON syntax. This results in a faster
+and more compact format.

-   JSON::XS->new->ascii (1)->encode (chr 0x10401)
-   => \ud801\udc01
+   JSON::XS->new->ascii (1)->encode ([chr 0x10401])
+   => ["\ud801\udc01"]

 =item $json = $json->utf8 ([$enable])

 If C<$enable> is true (or missing), then the C<encode> method will encode
-the JSON string into UTF-8, as required by many protocols, while the
+the JSON result into UTF-8, as required by many protocols, while the
 C<decode> method expects to be handled an UTF-8-encoded string. Please
 note that UTF-8-encoded strings do not contain any characters outside the
-range C<0..255>, they are thus useful for bytewise/binary I/O.
+range C<0..255>; they are thus useful for bytewise/binary I/O. In future
+versions, this option might also enable autodetection of the UTF-16
+and UTF-32 encoding families, as described in RFC4627.

 If C<$enable> is false, then the C<encode> method will return the JSON
 string as a (non-encoded) unicode string, while C<decode> expects thus a
 unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
 to be done yourself, e.g. using the Encode module.

+Example, output UTF-16BE-encoded JSON:
+
+   use Encode;
+   $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
+
+Example, decode UTF-32LE-encoded JSON:
+
+   use Encode;
+   $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
+
 =item $json = $json->pretty ([$enable])

 This enables (or disables) all of the C<indent>, C<space_before> and
 C<space_after> (and in the future possibly more) flags in one call to
 generate the most readable (or most compact) form possible.

+Example, pretty-print some simple structure:
+
    my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
    =>
    {
@@ -171,9 +214,9 @@
 into its own line, identing them properly.

 If C<$enable> is false, no newlines or indenting will be produced, and the
-resulting JSON strings is guarenteed not to contain any C<newlines>.
+resulting JSON text is guaranteed not to contain any C<newlines>.

-This setting has no effect when decoding JSON strings.
+This setting has no effect when decoding JSON texts.

 =item $json = $json->space_before ([$enable])

@@ -183,8 +226,12 @@
 If C<$enable> is false, then the C<encode> method will not add any extra
 space at those places.

-This setting has no effect when decoding JSON strings. You will also most
-likely combine this setting with C<space_after>.
+This setting has no effect when decoding JSON texts. You will also
+most likely combine this setting with C<space_after>.
+
+Example, space_before enabled, space_after and indent disabled:
+
+   {"key" :"value"}

 =item $json = $json->space_after ([$enable])

@@ -196,7 +243,11 @@
 If C<$enable> is false, then the C<encode> method will not add any extra
 space at those places.

-This setting has no effect when decoding JSON strings.
+This setting has no effect when decoding JSON texts.
+
+Example, space_before and indent disabled, space_after enabled:
+
+   {"key": "value"}

 =item $json = $json->canonical ([$enable])

@@ -208,11 +259,11 @@
 of the same script).

 This option is useful if you want the same data structure to be encoded as
-the same JSON string (given the same overall settings). If it is disabled,
+the same JSON text (given the same overall settings). If it is disabled,
 the same hash migh be encoded differently even if contains the same data,
 as key-value pairs have no inherent ordering in Perl.

-This setting has no effect when decoding JSON strings.
+This setting has no effect when decoding JSON texts.

 =item $json = $json->allow_nonref ([$enable])

@@ -222,16 +273,22 @@
 values instead of croaking.

 If C<$enable> is false, then the C<encode> method will croak if it isn't
-passed an arrayref or hashref, as JSON strings must either be an object
+passed an arrayref or hashref, as JSON texts must either be an object
 or array. Likewise, C<decode> will croak if given something that is not a
 JSON object or array.

+Example, encode a Perl scalar as a JSON value with C<allow_nonref>
+enabled, resulting in an invalid JSON text:
+
+   JSON::XS->new->allow_nonref->encode ("Hello, World!")
+   => "Hello, World!"
+
 =item $json = $json->shrink ([$enable])

 Perl usually over-allocates memory a bit when allocating space for
 strings. This flag optionally resizes strings generated by either
 C<encode> or C<decode> to their minimum size possible. This can save
-memory when your JSON strings are either very very long or you have many
+memory when your JSON texts are either very very long or you have many
 short strings. It will also try to downgrade any strings to octet-form
 if possible: perl stores strings internally either in an encoding called
 UTF-X or in octet-form. The latter cannot store everything but uses less
@@ -247,7 +304,27 @@
 strings that look like integers or floats into integers or floats
 internally (there is no difference on the Perl level), saving space.

-=item $json_string = $json->encode ($perl_scalar)
+=item $json = $json->max_depth ([$maximum_nesting_depth])
+
+Sets the maximum nesting level (default C<8192>) accepted while encoding
+or decoding. If the JSON text or Perl data structure has a higher nesting
+level than this limit, then the encoder and decoder will stop and croak at
+that point.
+
+Nesting level is defined by the number of hash- or arrayrefs that the
+encoder needs to traverse to reach a given point, or the number of C<{>
+or C<[> characters without their matching closing bracket crossed to
+reach a given character in a string.
+
+Setting the maximum depth to one disallows any nesting, which ensures
+that the object is only a single hash/object or array.
+
+The argument to C<max_depth> will be rounded up to the next power of
+two.
+
+See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
+
+=item $json_text = $json->encode ($perl_scalar)

 Converts the given Perl data structure (a simple scalar or a reference
 to a hash or array) to its JSON representation. Simple scalars will be
@@ -256,9 +333,9 @@
 Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
 nor C<false> values will be generated.

-=item $perl_scalar = $json->decode ($json_string)
+=item $perl_scalar = $json->decode ($json_text)

-The opposite of C<encode>: expects a JSON string and tries to parse it,
+The opposite of C<encode>: expects a JSON text and tries to parse it,
 returning the resulting simple scalar or reference. Croaks on error.

 JSON numbers and strings become simple Perl scalars. JSON arrays become
@@ -267,6 +344,7 @@

 =back

+
 =head1 MAPPING

 This section describes how JSON::XS maps Perl values to JSON values and
@@ -285,7 +363,7 @@
 =item object

 A JSON object becomes a reference to a hash in Perl. No ordering of object
-keys is preserved.
+keys is preserved (JSON does not preserve object key ordering itself).

 =item array

@@ -331,7 +409,7 @@
 Perl hash references become JSON objects. As there is no inherent ordering
 in hash keys, they will usually be encoded in a pseudo-random order that
 can change between runs of the same program but stays generally the same
-within the single run of a program. JSON::XS can optionally sort the hash
+within a single run of a program. JSON::XS can optionally sort the hash
 keys (determined by the I<canonical> flag), so the same datastructure
 will serialise to the same JSON text (given same settings and version of
 JSON::XS), but this incurs a runtime overhead.
@@ -381,8 +459,13 @@
 You can not currently output JSON booleans or force the type in other,
 less obscure, ways. Tell me if you need this capability.

+=item circular data structures
+
+Those will be encoded until memory or stack space runs out.
+
 =back

+
 =head1 COMPARISON

 As already mentioned, this module was created because none of the existing
@@ -417,7 +500,7 @@
 values will make it croak).

 Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
-which is not a valid JSON string.
+which is not a valid JSON text).

 Unmaintained (maintainer unresponsive for many months, bugs are not
 getting fixed).
@@ -429,7 +512,7 @@
 Very inflexible (no human-readable format supported, format pretty much
 undocumented. I need at least a format for easy reading by humans and a
 single-line compact format for use in a protocol, and preferably a way to
-generate ASCII-only JSON strings).
+generate ASCII-only JSON texts).

 Completely broken (and confusingly documented) Unicode handling (unicode
 escapes are not working properly, you need to set ImplicitUnicode to
@@ -462,7 +545,7 @@

 No roundtripping.

-Does not generate valid JSON (key strings are often unquoted, empty keys
+Does not generate valid JSON texts (key strings are often unquoted, empty keys
 result in nothing being output)

 Does not check input for validity.
@@ -476,49 +559,86 @@
 in the JSON::XS distribution, to make it easy to compare on your own
 system.

-First is a comparison between various modules using a very simple JSON
-string, showing the number of encodes/decodes per second (JSON::XS is
-the functional interface, while JSON::XS/2 is the OO interface with
-pretty-printing and hashkey sorting enabled).
+First comes a comparison between various modules using a very short JSON
+string:
+
+   {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}
+
+It shows the number of encodes/decodes per second (JSON::XS uses the
+functional interface, while JSON::XS/2 uses the OO interface with
+pretty-printing and hashkey sorting enabled). Higher is better:

    module     |     encode |     decode |
    -----------|------------|------------|
-   JSON       |      14006 |       6820 |
-   JSON::DWIW |     200937 |     120386 |
-   JSON::PC   |      85065 |     129366 |
-   JSON::Syck |      59898 |      44232 |
-   JSON::XS   |    1171478 |     342435 |
-   JSON::XS/2 |     730760 |     328714 |
+   JSON       |  11488.516 |   7823.035 |
+   JSON::DWIW |  94708.054 | 129094.260 |
+   JSON::PC   |  63884.157 | 128528.212 |
+   JSON::Syck |  34898.677 |  42096.911 |
+   JSON::XS   | 654027.064 | 396423.669 |
+   JSON::XS/2 | 371564.190 | 371725.613 |
    -----------+------------+------------+

-That is, JSON::XS is 6 times faster than than JSON::DWIW and about 80
-times faster than JSON, even with pretty-printing and key sorting.
+That is, JSON::XS is more than six times faster than JSON::DWIW on
+encoding, more than three times faster on decoding, and about thirty times
+faster than JSON, even with pretty-printing and key sorting.

-Using a longer test string (roughly 8KB, generated from Yahoo! Locals
+Using a longer test string (roughly 18KB, generated from Yahoo! Locals
 search API (http://nanoref.com/yahooapis/mgPdGg):

    module     |     encode |     decode |
    -----------|------------|------------|
-   JSON       |        673 |         38 |
-   JSON::DWIW |       5271 |        770 |
-   JSON::PC   |       9901 |       2491 |
-   JSON::Syck |       2360 |        786 |
-   JSON::XS   |      37398 |       3202 |
-   JSON::XS/2 |      13765 |       3153 |
+   JSON       |    273.023 |     44.674 |
+   JSON::DWIW |   1089.383 |   1145.704 |
+   JSON::PC   |   3097.419 |   2393.921 |
+   JSON::Syck |    514.060 |    843.053 |
+   JSON::XS   |   6479.668 |   3636.364 |
+   JSON::XS/2 |   3774.221 |   3599.124 |
    -----------+------------+------------+

-Again, JSON::XS leads by far in the encoding case, while still beating
-every other module in the decoding case.
+Again, JSON::XS leads by far.
+
+On large strings containing lots of high unicode characters, some modules
+(such as JSON::PC) seem to decode faster than JSON::XS, but the result
+will be broken due to missing (or wrong) unicode handling. Others refuse
+to decode or encode properly, so it was impossible to prepare a fair
+comparison table for that case.
+
+
+=head1 SECURITY CONSIDERATIONS
+
+When you are using JSON in a protocol, talking to untrusted, potentially
+hostile creatures requires relatively few measures.
+
+First of all, your JSON decoder should be secure, that is, should not have
+any buffer overflows. Obviously, this module should ensure that and I am
+trying hard to make that true, but you never know.
+
+Second, you need to avoid resource-starving attacks. That means you should
+limit the size of JSON texts you accept, or make sure that when your
+resources run out, that's just fine (e.g. by using a separate process that
+can crash safely). The size of a JSON text in octets or characters is
+usually a good indication of the size of the resources required to decode
+it into a Perl structure.
+
+Third, JSON::XS recurses using the C stack when decoding objects and
+arrays. The C stack is a limited resource: for instance, on my amd64
+machine with 8MB of stack size I can decode around 180k nested arrays
+but only 14k nested JSON objects. If that is exceeded, the program
+crashes. That's why the default nesting limit is set to 8192. If your
+process has a smaller stack, you should adjust this setting accordingly
+with the C<max_depth> method.
+
+And last but not least, something else could bomb you that I forgot to
+think of. In that case, you get to keep the pieces. I am always open to
+hints, though...

-Last example is an almost 8MB large hash with many large binary values
-(PNG files), resulting in a lot of escaping:


 =head1 BUGS

 While the goal of this module is to be correct, that unfortunately does
 not mean its bug-free, only that I think its design is bug-free. It is
-still very young and not well-tested. If you keep reporting bugs they will
-be fixed swiftly, though.
+still relatively early in its development. If you keep reporting bugs they
+will be fixed swiftly, though.

 =cut
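The nesting level that the new max_depth option limits, as defined in its entry in the patch above, can be illustrated without JSON::XS itself. The sketch below is a hypothetical pure-Perl helper, json_nesting_depth, which is not part of the JSON::XS API. It assumes a syntactically valid JSON text and tracks how many `{` or `[` characters are still unmatched, skipping brackets that appear inside strings.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of JSON::XS: returns the maximum nesting
# level of a JSON text, i.e. the largest number of "{" or "[" characters
# that are unmatched at any point. Brackets inside JSON strings are ignored.
# Assumes the input is syntactically valid JSON.
sub json_nesting_depth {
    my ($text) = @_;
    my ($depth, $max, $in_string) = (0, 0, 0);

    for (my $i = 0; $i < length $text; $i++) {
        my $c = substr $text, $i, 1;

        if ($in_string) {
            if    ($c eq '\\') { $i++ }              # skip the escaped character
            elsif ($c eq '"')  { $in_string = 0 }    # end of string
        }
        elsif ($c eq '"') {
            $in_string = 1;                          # start of string
        }
        elsif ($c eq '{' or $c eq '[') {
            $depth++;
            $max = $depth if $depth > $max;
        }
        elsif ($c eq '}' or $c eq ']') {
            $depth--;
        }
    }

    return $max;
}

printf "%d\n", json_nesting_depth ('{"a": [1, 2, "[not nested]"]}');   # 2
printf "%d\n", json_nesting_depth ('"scalar"');                        # 0
```

A single object or array without nested containers yields 1 here, matching the description of a maximum depth of one as permitting only a single hash/object or array.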