--- JSON-XS/README 2007/03/22 16:40:16 1.1 +++ JSON-XS/README 2007/03/22 21:13:58 1.2 @@ -1,132 +1,329 @@ NAME - Convert::Scalar - convert between different representations of perl - scalars + JSON::XS - JSON serialising/deserialising, done correctly and fast SYNOPSIS - use Convert::Scalar; + use JSON::XS; DESCRIPTION - This module exports various internal perl methods that change the - internal representation or state of a perl scalar. All of these work - in-place, that is, they modify their scalar argument. No functions are - exported by default. - - The following export tags exist: - - :utf8 all functions with utf8 in their name - :taint all functions with taint in their name - :refcnt all functions with refcnt in their name - :ok all *ok-functions. - - utf8 scalar[, mode] - Returns true when the given scalar is marked as utf8, false - otherwise. If the optional mode argument is given, also forces the - interpretation of the string to utf8 (mode true) or plain bytes - (mode false). The actual (byte-) content is not changed. The return - value always reflects the state before any modification is done. - - This function is useful when you "import" utf8-data into perl, or - when some external function (e.g. storing/retrieving from a - database) removes the utf8-flag. - - utf8_on scalar - Similar to "utf8 scalar, 1", but additionally returns the scalar - (the argument is still modified in-place). - - utf8_off scalar - Similar to "utf8 scalar, 0", but additionally returns the scalar - (the argument is still modified in-place). - - utf8_valid scalar [Perl 5.7] - Returns true if the bytes inside the scalar form a valid utf8 - string, false otherwise (the check is independent of the actual - encoding perl thinks the string is in). - - utf8_upgrade scalar - Convert the string content of the scalar in-place to its - UTF8-encoded form (and also returns it). 
- - utf8_downgrade scalar[, fail_ok=0] - Attempt to convert the string content of the scalar from - UTF8-encoded to ISO-8859-1. This may not be possible if the string - contains characters that cannot be represented in a single byte; if - this is the case, it leaves the scalar unchanged and either returns - false or, if "fail_ok" is not true (the default), croaks. - - utf8_encode scalar - Convert the string value of the scalar to UTF8-encoded, but then - turn off the "SvUTF8" flag so that it looks like bytes to perl - again. (Might be removed in future versions). - - utf8_length scalar - Returns the number of characters in the string, counting wide UTF8 - characters as a single character, independent of wether the scalar - is marked as containing bytes or mulitbyte characters. - - unmagic scalar, type - Remove the specified magic from the scalar (DANGEROUS!). - - weaken scalar - Weaken a reference. (See also WeakRef). - - taint scalar - Taint the scalar. - - tainted scalar - returns true when the scalar is tainted, false otherwise. - - untaint scalar - Remove the tainted flag from the specified scalar. - - grow scalar, newlen - Sets the memory area used for the scalar to the given length, if the - current length is less than the new value. This does not affect the - contents of the scalar, but is only useful to "pre-allocate" memory - space if you know the scalar will grow. The return value is the - modified scalar (the scalar is modified in-place). - - refcnt scalar[, newrefcnt] - Returns the current reference count of the given scalar and - optionally sets it to the given reference count. - - refcnt_inc scalar - Increments the reference count of the given scalar inplace. - - refcnt_dec scalar - Decrements the reference count of the given scalar inplace. Use - "weaken" instead if you understand what this function is fore. - Better yet: don't use this module in this case. 
- - refcnt_rv scalar[, newrefcnt] - Works like "refcnt", but dereferences the given reference first. - This is useful to find the reference count of arrays or hashes, - which cnanot be passed directly. Remember that taking a reference of - some object increases it's reference count, so the reference count - used by the *_rv-functions tend to be one higher. - - refcnt_inc_rv scalar - Works like "refcnt_inc", but dereferences the given reference first. - - refcnt_dec_rv scalar - Works like "refcnt_dec", but dereferences the given reference first. - - ok scalar - uok scalar - rok scalar - pok scalar - nok scalar - niok scalar - Calls SvOK, SvUOK, SvROK, SvPOK, SvNOK or SvNIOK on the given - scalar, respectively. - - CANDIDATES FOR FUTURE RELEASES - The following API functions (perlapi) are considered for future - inclusion in this module If you want them, write me. - - sv_upgrade - sv_pvn_force - sv_pvutf8n_force - the sv2xx family + This module converts Perl data structures to JSON and vice versa. Its + primary goal is to be *correct* and its secondary goal is to be *fast*. + To reach the latter goal it was written in C. + + As this is the n-th-something JSON module on CPAN, what was the reason + to write yet another JSON module? While it seems there are many JSON + modules, none of them correctly handle all corner cases, and in most + cases their maintainers are unresponsive, gone missing, or not listening + to bug reports for other reasons. + + See COMPARISON, below, for a comparison to some other JSON modules. + + FEATURES + * correct handling of unicode issues + This module knows how to handle Unicode, and even documents how it + does so. + + * round-trip integrity + When you serialise a perl data structure using only datatypes + supported by JSON, the deserialised data structure is identical on + the Perl level. (e.g. the string "2.0" doesn't suddenly become "2"). 
+ + * strict checking of JSON correctness + There is no guessing, no generating of illegal JSON strings by + default, and only JSON is accepted as input (the latter is a + security feature). + + * fast + compared to other JSON modules, this module compares favourably. + + * simple to use + This module has both a simple functional interface and an OO + interface. + + * reasonably versatile output formats + You can choose between the most compact format possible, a + pure-ascii format, or a pretty-printed format. Or you can combine + those features in whatever way you like. + +FUNCTIONAL INTERFACE + The following convenience functions are provided by this module. They are + exported by default: + + $json_string = to_json $perl_scalar + Converts the given Perl data structure (a simple scalar or a + reference to a hash or array) to a UTF-8 encoded, binary string + (that is, the string contains octets only). Croaks on error. + + This function call is functionally identical to "JSON::XS->new->utf8 + (1)->encode ($perl_scalar)". + + $perl_scalar = from_json $json_string + The opposite of "to_json": expects a UTF-8 (binary) string and + tries to parse that as a UTF-8 encoded JSON string, returning the + resulting simple scalar or reference. Croaks on error. + + This function call is functionally identical to "JSON::XS->new->utf8 + (1)->decode ($json_string)". + +OBJECT-ORIENTED INTERFACE + The object-oriented interface lets you configure your own encoding or + decoding style, within the limits of supported formats. + + $json = new JSON::XS + Creates a new JSON::XS object that can be used to de/encode JSON + strings. All boolean flags described below are by default + *disabled*.
+ + The mutators for flags all return the JSON object again and thus + calls can be chained: + + my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]}) + => {"a": [1, 2]} + + $json = $json->ascii ($enable) + If $enable is true, then the "encode" method will not generate + characters outside the code range 0..127. Any unicode characters + outside that range will be escaped using either a single \uXXXX (BMP + characters) or a double \uHHHH\uLLLL escape sequence, as per + RFC4627. + + If $enable is false, then the "encode" method will not escape + Unicode characters unless necessary. + + JSON::XS->new->ascii (1)->encode ([chr 0x10401]) + => ["\ud801\udc01"] + + $json = $json->utf8 ($enable) + If $enable is true, then the "encode" method will encode the JSON + string into UTF-8, as required by many protocols, while the "decode" + method expects to be handed a UTF-8-encoded string. Please note + that UTF-8-encoded strings do not contain any characters outside the + range 0..255, they are thus useful for bytewise/binary I/O. + + If $enable is false, then the "encode" method will return the JSON + string as a (non-encoded) unicode string, while "decode" thus + expects a unicode string. Any decoding or encoding (e.g. to UTF-8 or + UTF-16) needs to be done yourself, e.g. using the Encode module. + + $json = $json->pretty ($enable) + This enables (or disables) all of the "indent", "space_before" and + "space_after" (and in the future possibly more) flags in one call to + generate the most readable (or most compact) form possible. + + my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]}) + => + { + "a" : [ + 1, + 2 + ] + } + + $json = $json->indent ($enable) + If $enable is true, then the "encode" method will use a multiline + format as output, putting every array member or object/hash + key-value pair into its own line, indenting them properly.
+ + If $enable is false, no newlines or indenting will be produced, and + the resulting JSON string is guaranteed not to contain any + "newlines". + + This setting has no effect when decoding JSON strings. + + $json = $json->space_before ($enable) + If $enable is true, then the "encode" method will add an extra + optional space before the ":" separating keys from values in JSON + objects. + + If $enable is false, then the "encode" method will not add any extra + space at those places. + + This setting has no effect when decoding JSON strings. You will also + most likely combine this setting with "space_after". + + $json = $json->space_after ($enable) + If $enable is true, then the "encode" method will add an extra + optional space after the ":" separating keys from values in JSON + objects and extra whitespace after the "," separating key-value + pairs and array members. + + If $enable is false, then the "encode" method will not add any extra + space at those places. + + This setting has no effect when decoding JSON strings. + + $json = $json->canonical ($enable) + If $enable is true, then the "encode" method will output JSON + objects by sorting their keys. This adds a comparatively high + overhead. + + If $enable is false, then the "encode" method will output key-value + pairs in the order Perl stores them (which will likely change + between runs of the same script). + + This option is useful if you want the same data structure to be + encoded as the same JSON string (given the same overall settings). + If it is disabled, the same hash might be encoded differently even if + it contains the same data, as key-value pairs have no inherent ordering + in Perl. + + This setting has no effect when decoding JSON strings. + + $json = $json->allow_nonref ($enable) + If $enable is true, then the "encode" method can convert a + non-reference into its corresponding string, number or null JSON + value, which is an extension to RFC4627.
Likewise, "decode" will + accept those JSON values instead of croaking. + + If $enable is false, then the "encode" method will croak if it isn't + passed an arrayref or hashref, as JSON strings must either be an + object or array. Likewise, "decode" will croak if given something + that is not a JSON object or array. + + $json_string = $json->encode ($perl_scalar) + Converts the given Perl data structure (a simple scalar or a + reference to a hash or array) to its JSON representation. Simple + scalars will be converted into JSON string or number sequences, + while references to arrays become JSON arrays and references to + hashes become JSON objects. Undefined Perl values (e.g. "undef") + become JSON "null" values. Neither "true" nor "false" values will be + generated. + + $perl_scalar = $json->decode ($json_string) + The opposite of "encode": expects a JSON string and tries to parse + it, returning the resulting simple scalar or reference. Croaks on + error. + + JSON numbers and strings become simple Perl scalars. JSON arrays + become Perl arrayrefs and JSON objects become Perl hashrefs. "true" + becomes 1, "false" becomes 0 and "null" becomes "undef". + +COMPARISON + As already mentioned, this module was created because none of the + existing JSON modules could be made to work correctly. First I will + describe the problems (or pleasures) I encountered with various existing + JSON modules, followed by some benchmark values. JSON::XS was designed + not to suffer from any of these problems or limitations. + + JSON + Slow (but very portable, as it is written in pure Perl). + + Undocumented/buggy Unicode handling (how JSON handles unicode values + is undocumented. One can get far by feeding it unicode strings and + doing en-/decoding oneself, but unicode escapes are not working + properly). + + No roundtripping (strings get clobbered if they look like numbers, + e.g. the string 2.0 will encode to 2.0 instead of "2.0", and that + will decode into the number 2).
+ + JSON::PC + Very fast. + + Undocumented/buggy Unicode handling. + + No roundtripping. + + Has problems handling many Perl values (e.g. regex results and other + magic values will make it croak). + + Does not even generate valid JSON ("{1,2}" gets converted to "{1:2}", + which is not a valid JSON string). + + Unmaintained (maintainer unresponsive for many months, bugs are not + getting fixed). + + JSON::Syck + Very buggy (often crashes). + + Very inflexible (no human-readable format supported, format pretty + much undocumented. I need at least a format for easy reading by + humans and a single-line compact format for use in a protocol, and + preferably a way to generate ASCII-only JSON strings). + + Completely broken (and confusingly documented) Unicode handling + (unicode escapes are not working properly, you need to set + ImplicitUnicode to *different* values on en- and decoding to get + symmetric behaviour). + + No roundtripping (simple cases work, but this depends on whether the + scalar value was used in a numeric context or not). + + Dumping hashes may skip hash values depending on iterator state. + + Unmaintained (maintainer unresponsive for many months, bugs are not + getting fixed). + + Does not check input for validity (i.e. will accept non-JSON input + and return "something" instead of raising an exception. This is a + security issue: imagine two banks transferring money between each + other using JSON. One bank might parse a given non-JSON request and + deduct money, while the other might reject the transaction with a + syntax error. While a good protocol will at least recover, that is + extra unnecessary work and the transaction will still not succeed). + + JSON::DWIW + Very fast. Very natural. Very nice. + + Undocumented unicode handling (but the best of the pack. Unicode + escapes still don't get parsed properly). + + Very inflexible. + + No roundtripping.
+ + Does not generate valid JSON (key strings are often unquoted, empty + keys result in nothing being output). + + Does not check input for validity. + + SPEED + It seems that JSON::XS is surprisingly fast, as shown in the following + tables. They have been generated with the help of the "eg/bench" program + in the JSON::XS distribution, to make it easy to compare on your own + system. + + First is a comparison between various modules using a very simple JSON + string, showing the number of encodes/decodes per second (JSON::XS is + the functional interface, while JSON::XS/2 is the OO interface with + pretty-printing and hashkey sorting enabled). + + module | encode | decode | + -----------|------------|------------| + JSON | 14006 | 6820 | + JSON::DWIW | 200937 | 120386 | + JSON::PC | 85065 | 129366 | + JSON::Syck | 59898 | 44232 | + JSON::XS | 1171478 | 342435 | + JSON::XS/2 | 730760 | 328714 | + -----------+------------+------------+ + + That is, when encoding, JSON::XS is about 6 times faster than JSON::DWIW + and about 80 times faster than JSON. Even with pretty-printing and key + sorting it is still more than 3 times faster than JSON::DWIW. + + Using a longer test string (roughly 8KB, generated from Yahoo! Locals + search API (http://nanoref.com/yahooapis/mgPdGg)): + + module | encode | decode | + -----------|------------|------------| + JSON | 673 | 38 | + JSON::DWIW | 5271 | 770 | + JSON::PC | 9901 | 2491 | + JSON::Syck | 2360 | 786 | + JSON::XS | 37398 | 3202 | + JSON::XS/2 | 13765 | 3153 | + -----------+------------+------------+ + + Again, JSON::XS leads by far in the encoding case, while still beating + every other module in the decoding case. + + Last example is an almost 8MB large hash with many large binary values + (PNG files), resulting in a lot of escaping: + +BUGS + While the goal of this module is to be correct, that unfortunately does + not mean it's bug-free, only that I think its design is bug-free. It is + still very young and not well-tested. If you keep reporting bugs they + will be fixed swiftly, though.
AUTHOR Marc Lehmann