--- JSON-XS/README 2007/03/22 16:40:16 1.1 +++ JSON-XS/README 2007/04/04 00:01:44 1.10 @@ -1,132 +1,580 @@ NAME - Convert::Scalar - convert between different representations of perl - scalars + JSON::XS - JSON serialising/deserialising, done correctly and fast SYNOPSIS - use Convert::Scalar; + use JSON::XS; + + # exported functions, they croak on error + # and expect/generate UTF-8 + + $utf8_encoded_json_text = to_json $perl_hash_or_arrayref; + $perl_hash_or_arrayref = from_json $utf8_encoded_json_text; + + # objToJson and jsonToObj aliases to to_json and from_json + # are exported for compatibility to the JSON module, + # but should not be used in new code. + + # OO-interface + + $coder = JSON::XS->new->ascii->pretty->allow_nonref; + $pretty_printed_unencoded = $coder->encode ($perl_scalar); + $perl_scalar = $coder->decode ($unicode_json_text); DESCRIPTION - This module exports various internal perl methods that change the - internal representation or state of a perl scalar. All of these work - in-place, that is, they modify their scalar argument. No functions are - exported by default. - - The following export tags exist: - - :utf8 all functions with utf8 in their name - :taint all functions with taint in their name - :refcnt all functions with refcnt in their name - :ok all *ok-functions. - - utf8 scalar[, mode] - Returns true when the given scalar is marked as utf8, false - otherwise. If the optional mode argument is given, also forces the - interpretation of the string to utf8 (mode true) or plain bytes - (mode false). The actual (byte-) content is not changed. The return - value always reflects the state before any modification is done. - - This function is useful when you "import" utf8-data into perl, or - when some external function (e.g. storing/retrieving from a - database) removes the utf8-flag. - - utf8_on scalar - Similar to "utf8 scalar, 1", but additionally returns the scalar - (the argument is still modified in-place). - - utf8_off scalar - Similar to "utf8 scalar, 0", but additionally returns the scalar - (the argument is still modified in-place). - - utf8_valid scalar [Perl 5.7] - Returns true if the bytes inside the scalar form a valid utf8 - string, false otherwise (the check is independent of the actual - encoding perl thinks the string is in). - - utf8_upgrade scalar - Convert the string content of the scalar in-place to its - UTF8-encoded form (and also returns it). - - utf8_downgrade scalar[, fail_ok=0] - Attempt to convert the string content of the scalar from - UTF8-encoded to ISO-8859-1. This may not be possible if the string - contains characters that cannot be represented in a single byte; if - this is the case, it leaves the scalar unchanged and either returns - false or, if "fail_ok" is not true (the default), croaks. - - utf8_encode scalar - Convert the string value of the scalar to UTF8-encoded, but then - turn off the "SvUTF8" flag so that it looks like bytes to perl - again. (Might be removed in future versions). - - utf8_length scalar - Returns the number of characters in the string, counting wide UTF8 - characters as a single character, independent of wether the scalar - is marked as containing bytes or mulitbyte characters. - - unmagic scalar, type - Remove the specified magic from the scalar (DANGEROUS!). - - weaken scalar - Weaken a reference. (See also WeakRef). - - taint scalar - Taint the scalar. - - tainted scalar - returns true when the scalar is tainted, false otherwise. - - untaint scalar - Remove the tainted flag from the specified scalar. - - grow scalar, newlen - Sets the memory area used for the scalar to the given length, if the - current length is less than the new value. This does not affect the - contents of the scalar, but is only useful to "pre-allocate" memory - space if you know the scalar will grow. The return value is the - modified scalar (the scalar is modified in-place). - - refcnt scalar[, newrefcnt] - Returns the current reference count of the given scalar and - optionally sets it to the given reference count. - - refcnt_inc scalar - Increments the reference count of the given scalar inplace. - - refcnt_dec scalar - Decrements the reference count of the given scalar inplace. Use - "weaken" instead if you understand what this function is fore. - Better yet: don't use this module in this case. - - refcnt_rv scalar[, newrefcnt] - Works like "refcnt", but dereferences the given reference first. - This is useful to find the reference count of arrays or hashes, - which cnanot be passed directly. Remember that taking a reference of - some object increases it's reference count, so the reference count - used by the *_rv-functions tend to be one higher. - - refcnt_inc_rv scalar - Works like "refcnt_inc", but dereferences the given reference first. - - refcnt_dec_rv scalar - Works like "refcnt_dec", but dereferences the given reference first. - - ok scalar - uok scalar - rok scalar - pok scalar - nok scalar - niok scalar - Calls SvOK, SvUOK, SvROK, SvPOK, SvNOK or SvNIOK on the given - scalar, respectively. - - CANDIDATES FOR FUTURE RELEASES - The following API functions (perlapi) are considered for future - inclusion in this module If you want them, write me. - - sv_upgrade - sv_pvn_force - sv_pvutf8n_force - the sv2xx family + This module converts Perl data structures to JSON and vice versa. Its + primary goal is to be *correct* and its secondary goal is to be *fast*. + To reach the latter goal it was written in C. + + As this is the n-th-something JSON module on CPAN, what was the reason + to write yet another JSON module? While it seems there are many JSON + modules, none of them correctly handle all corner cases, and in most + cases their maintainers are unresponsive, gone missing, or not listening + to bug reports for other reasons. + + See COMPARISON, below, for a comparison to some other JSON modules. + + See MAPPING, below, on how JSON::XS maps perl values to JSON values and + vice versa. + + FEATURES + * correct unicode handling + This module knows how to handle Unicode, and even documents how and + when it does so. + + * round-trip integrity + When you serialise a perl data structure using only datatypes + supported by JSON, the deserialised data structure is identical on + the Perl level. (e.g. the string "2.0" doesn't suddenly become "2" + just because it looks like a number). + + * strict checking of JSON correctness + There is no guessing, no generating of illegal JSON texts by + default, and only JSON is accepted as input by default (the latter + is a security feature). + + * fast + Compared to other JSON modules, this module compares favourably in + terms of speed, too. + + * simple to use + This module has both a simple functional interface as well as an OO + interface. + + * reasonably versatile output formats + You can choose between the most compact guarenteed single-line + format possible (nice for simple line-based protocols), a pure-ascii + format (for when your transport is not 8-bit clean, still supports + the whole unicode range), or a pretty-printed format (for when you + want to read that stuff). Or you can combine those features in + whatever way you like. + +FUNCTIONAL INTERFACE + The following convinience methods are provided by this module. They are + exported by default: + + $json_text = to_json $perl_scalar + Converts the given Perl data structure (a simple scalar or a + reference to a hash or array) to a UTF-8 encoded, binary string + (that is, the string contains octets only). Croaks on error. + + This function call is functionally identical to: + + $json_text = JSON::XS->new->utf8->encode ($perl_scalar) + + except being faster. + + $perl_scalar = from_json $json_text + The opposite of "to_json": expects an UTF-8 (binary) string and + tries to parse that as an UTF-8 encoded JSON text, returning the + resulting simple scalar or reference. Croaks on error. + + This function call is functionally identical to: + + $perl_scalar = JSON::XS->new->utf8->decode ($json_text) + + except being faster. + +OBJECT-ORIENTED INTERFACE + The object oriented interface lets you configure your own encoding or + decoding style, within the limits of supported formats. + + $json = new JSON::XS + Creates a new JSON::XS object that can be used to de/encode JSON + strings. All boolean flags described below are by default + *disabled*. + + The mutators for flags all return the JSON object again and thus + calls can be chained: + + my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]}) + => {"a": [1, 2]} + + $json = $json->ascii ([$enable]) + If $enable is true (or missing), then the "encode" method will not + generate characters outside the code range 0..127 (which is ASCII). + Any unicode characters outside that range will be escaped using + either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLLL + escape sequence, as per RFC4627. + + If $enable is false, then the "encode" method will not escape + Unicode characters unless required by the JSON syntax. This results + in a faster and more compact format. + + JSON::XS->new->ascii (1)->encode ([chr 0x10401]) + => ["\ud801\udc01"] + + $json = $json->utf8 ([$enable]) + If $enable is true (or missing), then the "encode" method will + encode the JSON result into UTF-8, as required by many protocols, + while the "decode" method expects to be handled an UTF-8-encoded + string. Please note that UTF-8-encoded strings do not contain any + characters outside the range 0..255, they are thus useful for + bytewise/binary I/O. In future versions, enabling this option might + enable autodetection of the UTF-16 and UTF-32 encoding families, as + described in RFC4627. + + If $enable is false, then the "encode" method will return the JSON + string as a (non-encoded) unicode string, while "decode" expects + thus a unicode string. Any decoding or encoding (e.g. to UTF-8 or + UTF-16) needs to be done yourself, e.g. using the Encode module. + + Example, output UTF-16BE-encoded JSON: + + use Encode; + $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object); + + Example, decode UTF-32LE-encoded JSON: + + use Encode; + $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext); + + $json = $json->pretty ([$enable]) + This enables (or disables) all of the "indent", "space_before" and + "space_after" (and in the future possibly more) flags in one call to + generate the most readable (or most compact) form possible. + + Example, pretty-print some simple structure: + + my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]}) + => + { + "a" : [ + 1, + 2 + ] + } + + $json = $json->indent ([$enable]) + If $enable is true (or missing), then the "encode" method will use a + multiline format as output, putting every array member or + object/hash key-value pair into its own line, identing them + properly. + + If $enable is false, no newlines or indenting will be produced, and + the resulting JSON text is guarenteed not to contain any "newlines". + + This setting has no effect when decoding JSON texts. + + $json = $json->space_before ([$enable]) + If $enable is true (or missing), then the "encode" method will add + an extra optional space before the ":" separating keys from values + in JSON objects. + + If $enable is false, then the "encode" method will not add any extra + space at those places. + + This setting has no effect when decoding JSON texts. You will also + most likely combine this setting with "space_after". + + Example, space_before enabled, space_after and indent disabled: + + {"key" :"value"} + + $json = $json->space_after ([$enable]) + If $enable is true (or missing), then the "encode" method will add + an extra optional space after the ":" separating keys from values in + JSON objects and extra whitespace after the "," separating key-value + pairs and array members. + + If $enable is false, then the "encode" method will not add any extra + space at those places. + + This setting has no effect when decoding JSON texts. + + Example, space_before and indent disabled, space_after enabled: + + {"key": "value"} + + $json = $json->canonical ([$enable]) + If $enable is true (or missing), then the "encode" method will + output JSON objects by sorting their keys. This is adding a + comparatively high overhead. + + If $enable is false, then the "encode" method will output key-value + pairs in the order Perl stores them (which will likely change + between runs of the same script). + + This option is useful if you want the same data structure to be + encoded as the same JSON text (given the same overall settings). If + it is disabled, the same hash migh be encoded differently even if + contains the same data, as key-value pairs have no inherent ordering + in Perl. + + This setting has no effect when decoding JSON texts. + + $json = $json->allow_nonref ([$enable]) + If $enable is true (or missing), then the "encode" method can + convert a non-reference into its corresponding string, number or + null JSON value, which is an extension to RFC4627. Likewise, + "decode" will accept those JSON values instead of croaking. + + If $enable is false, then the "encode" method will croak if it isn't + passed an arrayref or hashref, as JSON texts must either be an + object or array. Likewise, "decode" will croak if given something + that is not a JSON object or array. + + Example, encode a Perl scalar as JSON value with enabled + "allow_nonref", resulting in an invalid JSON text: + + JSON::XS->new->allow_nonref->encode ("Hello, World!") + => "Hello, World!" + + $json = $json->shrink ([$enable]) + Perl usually over-allocates memory a bit when allocating space for + strings. This flag optionally resizes strings generated by either + "encode" or "decode" to their minimum size possible. This can save + memory when your JSON texts are either very very long or you have + many short strings. It will also try to downgrade any strings to + octet-form if possible: perl stores strings internally either in an + encoding called UTF-X or in octet-form. The latter cannot store + everything but uses less space in general (and some buggy Perl or C + code might even rely on that internal representation being used). + + The actual definition of what shrink does might change in future + versions, but it will always try to save space at the expense of + time. + + If $enable is true (or missing), the string returned by "encode" + will be shrunk-to-fit, while all strings generated by "decode" will + also be shrunk-to-fit. + + If $enable is false, then the normal perl allocation algorithms are + used. If you work with your data, then this is likely to be faster. + + In the future, this setting might control other things, such as + converting strings that look like integers or floats into integers + or floats internally (there is no difference on the Perl level), + saving space. + + $json = $json->max_depth ([$maximum_nesting_depth]) + Sets the maximum nesting level (default 512) accepted while encoding + or decoding. If the JSON text or Perl data structure has an equal or + higher nesting level then this limit, then the encoder and decoder + will stop and croak at that point. + + Nesting level is defined by number of hash- or arrayrefs that the + encoder needs to traverse to reach a given point or the number of + "{" or "[" characters without their matching closing parenthesis + crossed to reach a given character in a string. + + Setting the maximum depth to one disallows any nesting, so that + ensures that the object is only a single hash/object or array. + + The argument to "max_depth" will be rounded up to the next nearest + power of two. + + See SECURITY CONSIDERATIONS, below, for more info on why this is + useful. + + $json_text = $json->encode ($perl_scalar) + Converts the given Perl data structure (a simple scalar or a + reference to a hash or array) to its JSON representation. Simple + scalars will be converted into JSON string or number sequences, + while references to arrays become JSON arrays and references to + hashes become JSON objects. Undefined Perl values (e.g. "undef") + become JSON "null" values. Neither "true" nor "false" values will be + generated. + + $perl_scalar = $json->decode ($json_text) + The opposite of "encode": expects a JSON text and tries to parse it, + returning the resulting simple scalar or reference. Croaks on error. + + JSON numbers and strings become simple Perl scalars. JSON arrays + become Perl arrayrefs and JSON objects become Perl hashrefs. "true" + becomes 1, "false" becomes 0 and "null" becomes "undef". + +MAPPING + This section describes how JSON::XS maps Perl values to JSON values and + vice versa. These mappings are designed to "do the right thing" in most + circumstances automatically, preserving round-tripping characteristics + (what you put in comes out as something equivalent). + + For the more enlightened: note that in the following descriptions, + lowercase *perl* refers to the Perl interpreter, while uppcercase *Perl* + refers to the abstract Perl language itself. + + JSON -> PERL + object + A JSON object becomes a reference to a hash in Perl. No ordering of + object keys is preserved (JSON does not preserver object key + ordering itself). + + array + A JSON array becomes a reference to an array in Perl. + + string + A JSON string becomes a string scalar in Perl - Unicode codepoints + in JSON are represented by the same codepoints in the Perl string, + so no manual decoding is necessary. + + number + A JSON number becomes either an integer or numeric (floating point) + scalar in perl, depending on its range and any fractional parts. On + the Perl level, there is no difference between those as Perl handles + all the conversion details, but an integer may take slightly less + memory and might represent more values exactly than (floating point) + numbers. + + true, false + These JSON atoms become 0, 1, respectively. Information is lost in + this process. Future versions might represent those values + differently, but they will be guarenteed to act like these integers + would normally in Perl. + + null + A JSON null atom becomes "undef" in Perl. + + PERL -> JSON + The mapping from Perl to JSON is slightly more difficult, as Perl is a + truly typeless language, so we can only guess which JSON type is meant + by a Perl value. + + hash references + Perl hash references become JSON objects. As there is no inherent + ordering in hash keys (or JSON objects), they will usually be + encoded in a pseudo-random order that can change between runs of the + same program but stays generally the same within a single run of a + program. JSON::XS can optionally sort the hash keys (determined by + the *canonical* flag), so the same datastructure will serialise to + the same JSON text (given same settings and version of JSON::XS), + but this incurs a runtime overhead and is only rarely useful, e.g. + when you want to compare some JSON text against another for + equality. + + array references + Perl array references become JSON arrays. + + other references + Other unblessed references are generally not allowed and will cause + an exception to be thrown, except for references to the integers 0 + and 1, which get turned into "false" and "true" atoms in JSON. You + can also use "JSON::XS::false" and "JSON::XS::true" to improve + readability. + + to_json [\0,JSON::XS::true] # yields [false,true] + + blessed objects + Blessed objects are not allowed. JSON::XS currently tries to encode + their underlying representation (hash- or arrayref), but this + behaviour might change in future versions. + + simple scalars + Simple Perl scalars (any scalar that is not a reference) are the + most difficult objects to encode: JSON::XS will encode undefined + scalars as JSON null value, scalars that have last been used in a + string context before encoding as JSON strings and anything else as + number value: + + # dump as number + to_json [2] # yields [2] + to_json [-3.0e17] # yields [-3e+17] + my $value = 5; to_json [$value] # yields [5] + + # used as string, so dump as string + print $value; + to_json [$value] # yields ["5"] + + # undef becomes null + to_json [undef] # yields [null] + + You can force the type to be a string by stringifying it: + + my $x = 3.1; # some variable containing a number + "$x"; # stringified + $x .= ""; # another, more awkward way to stringify + print $x; # perl does it for you, too, quite often + + You can force the type to be a number by numifying it: + + my $x = "3"; # some variable containing a string + $x += 0; # numify it, ensuring it will be dumped as a number + $x *= 1; # same thing, the choise is yours. + + You can not currently output JSON booleans or force the type in + other, less obscure, ways. Tell me if you need this capability. + +COMPARISON + As already mentioned, this module was created because none of the + existing JSON modules could be made to work correctly. First I will + describe the problems (or pleasures) I encountered with various existing + JSON modules, followed by some benchmark values. JSON::XS was designed + not to suffer from any of these problems or limitations. + + JSON 1.07 + Slow (but very portable, as it is written in pure Perl). + + Undocumented/buggy Unicode handling (how JSON handles unicode values + is undocumented. One can get far by feeding it unicode strings and + doing en-/decoding oneself, but unicode escapes are not working + properly). + + No roundtripping (strings get clobbered if they look like numbers, + e.g. the string 2.0 will encode to 2.0 instead of "2.0", and that + will decode into the number 2. + + JSON::PC 0.01 + Very fast. + + Undocumented/buggy Unicode handling. + + No roundtripping. + + Has problems handling many Perl values (e.g. regex results and other + magic values will make it croak). + + Does not even generate valid JSON ("{1,2}" gets converted to "{1:2}" + which is not a valid JSON text. + + Unmaintained (maintainer unresponsive for many months, bugs are not + getting fixed). + + JSON::Syck 0.21 + Very buggy (often crashes). + + Very inflexible (no human-readable format supported, format pretty + much undocumented. I need at least a format for easy reading by + humans and a single-line compact format for use in a protocol, and + preferably a way to generate ASCII-only JSON texts). + + Completely broken (and confusingly documented) Unicode handling + (unicode escapes are not working properly, you need to set + ImplicitUnicode to *different* values on en- and decoding to get + symmetric behaviour). + + No roundtripping (simple cases work, but this depends on wether the + scalar value was used in a numeric context or not). + + Dumping hashes may skip hash values depending on iterator state. + + Unmaintained (maintainer unresponsive for many months, bugs are not + getting fixed). + + Does not check input for validity (i.e. will accept non-JSON input + and return "something" instead of raising an exception. This is a + security issue: imagine two banks transfering money between each + other using JSON. One bank might parse a given non-JSON request and + deduct money, while the other might reject the transaction with a + syntax error. While a good protocol will at least recover, that is + extra unnecessary work and the transaction will still not succeed). + + JSON::DWIW 0.04 + Very fast. Very natural. Very nice. + + Undocumented unicode handling (but the best of the pack. Unicode + escapes still don't get parsed properly). + + Very inflexible. + + No roundtripping. + + Does not generate valid JSON texts (key strings are often unquoted, + empty keys result in nothing being output) + + Does not check input for validity. + + SPEED + It seems that JSON::XS is surprisingly fast, as shown in the following + tables. They have been generated with the help of the "eg/bench" program + in the JSON::XS distribution, to make it easy to compare on your own + system. + + First comes a comparison between various modules using a very short JSON + string: + + {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null} + + It shows the number of encodes/decodes per second (JSON::XS uses the + functional interface, while JSON::XS/2 uses the OO interface with + pretty-printing and hashkey sorting enabled). Higher is better: + + module | encode | decode | + -----------|------------|------------| + JSON | 11488.516 | 7823.035 | + JSON::DWIW | 94708.054 | 129094.260 | + JSON::PC | 63884.157 | 128528.212 | + JSON::Syck | 34898.677 | 42096.911 | + JSON::XS | 654027.064 | 396423.669 | + JSON::XS/2 | 371564.190 | 371725.613 | + -----------+------------+------------+ + + That is, JSON::XS is more than six times faster than JSON::DWIW on + encoding, more than three times faster on decoding, and about thirty + times faster than JSON, even with pretty-printing and key sorting. + + Using a longer test string (roughly 18KB, generated from Yahoo! Locals + search API (http://nanoref.com/yahooapis/mgPdGg): + + module | encode | decode | + -----------|------------|------------| + JSON | 273.023 | 44.674 | + JSON::DWIW | 1089.383 | 1145.704 | + JSON::PC | 3097.419 | 2393.921 | + JSON::Syck | 514.060 | 843.053 | + JSON::XS | 6479.668 | 3636.364 | + JSON::XS/2 | 3774.221 | 3599.124 | + -----------+------------+------------+ + + Again, JSON::XS leads by far. + + On large strings containing lots of high unicode characters, some + modules (such as JSON::PC) seem to decode faster than JSON::XS, but the + result will be broken due to missing (or wrong) unicode handling. Others + refuse to decode or encode properly, so it was impossible to prepare a + fair comparison table for that case. + +SECURITY CONSIDERATIONS + When you are using JSON in a protocol, talking to untrusted potentially + hostile creatures requires relatively few measures. + + First of all, your JSON decoder should be secure, that is, should not + have any buffer overflows. Obviously, this module should ensure that and + I am trying hard on making that true, but you never know. + + Second, you need to avoid resource-starving attacks. That means you + should limit the size of JSON texts you accept, or make sure then when + your resources run out, thats just fine (e.g. by using a separate + process that can crash safely). The size of a JSON text in octets or + characters is usually a good indication of the size of the resources + required to decode it into a Perl structure. + + Third, JSON::XS recurses using the C stack when decoding objects and + arrays. The C stack is a limited resource: for instance, on my amd64 + machine with 8MB of stack size I can decode around 180k nested arrays + but only 14k nested JSON objects (due to perl itself recursing deeply on + croak to free the temporary). If that is exceeded, the program crashes. + to be conservative, the default nesting limit is set to 512. If your + process has a smaller stack, you should adjust this setting accordingly + with the "max_depth" method. + + And last but least, something else could bomb you that I forgot to think + of. In that case, you get to keep the pieces. I am alway sopen for + hints, though... + +BUGS + While the goal of this module is to be correct, that unfortunately does + not mean its bug-free, only that I think its design is bug-free. It is + still relatively early in its development. If you keep reporting bugs + they will be fixed swiftly, though. AUTHOR Marc Lehmann