/cvs/JSON-XS/README
Revision: 1.29
Committed: Thu Feb 19 01:13:46 2009 UTC (15 years, 3 months ago) by root
Branch: MAIN
CVS Tags: rel-2_2311, rel-2_232, rel-2_24
Changes since 1.28: +68 -2 lines
Log Message:
2.311

File Contents

# User Rev Content
1 root 1.1 NAME
2 root 1.2 JSON::XS - JSON serialising/deserialising, done correctly and fast
3 root 1.1
4 root 1.23 JSON::XS - a correct and fast JSON serialiser/deserialiser (in Japanese)
5 root 1.19 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
6    
7 root 1.1 SYNOPSIS
8 root 1.2 use JSON::XS;
9 root 1.1
10 root 1.8 # exported functions, they croak on error
11     # and expect/generate UTF-8
12 root 1.4
13 root 1.22 $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
14     $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
15 root 1.4
16 root 1.8 # OO-interface
17 root 1.4
18     $coder = JSON::XS->new->ascii->pretty->allow_nonref;
19     $pretty_printed_unencoded = $coder->encode ($perl_scalar);
20     $perl_scalar = $coder->decode ($unicode_json_text);
21    
22 root 1.21 # Note that JSON version 2.0 and above will automatically use JSON::XS
23     # if available, at virtually no speed overhead either, so you should
24     # be able to just:
25 root 1.23
26     use JSON;
27 root 1.21
28     # and do the same things, except that you have a pure-perl fallback now.
29    
30 root 1.1 DESCRIPTION
31 root 1.2 This module converts Perl data structures to JSON and vice versa. Its
32     primary goal is to be *correct* and its secondary goal is to be *fast*.
33     To reach the latter goal it was written in C.
34    
35 root 1.21 Beginning with version 2.0 of the JSON module, when both JSON and
36     JSON::XS are installed, then JSON will fall back on JSON::XS (this can
37 root 1.26 be overridden) with no overhead due to emulation (by inheriting
38 root 1.21 constructor and methods). If JSON::XS is not available, it will fall
39     back to the compatible JSON::PP module as backend, so using JSON instead
40     of JSON::XS gives you a portable JSON API that can be fast when you need
41     and doesn't require a C compiler when that is a problem.
42    
43 root 1.2 As this is the n-th-something JSON module on CPAN, what was the reason
44     to write yet another JSON module? While it seems there are many JSON
45     modules, none of them correctly handle all corner cases, and in most
46     cases their maintainers are unresponsive, gone missing, or not listening
47     to bug reports for other reasons.
48    
49 root 1.4 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
50     vice versa.
51    
52 root 1.2 FEATURES
53 root 1.23 * correct Unicode handling
54    
55     This module knows how to handle Unicode, documents how and when it
56     does so, and even documents what "correct" means.
57    
58     * round-trip integrity
59 root 1.2
60 root 1.26 When you serialise a perl data structure using only data types
61 root 1.2 supported by JSON, the deserialised data structure is identical on
62 root 1.8 the Perl level. (e.g. the string "2.0" doesn't suddenly become "2"
63 root 1.23 just because it looks like a number). There *are* minor exceptions
64     to this, read the MAPPING section below to learn about those.
65    
66     * strict checking of JSON correctness
67 root 1.2
68 root 1.6 There is no guessing, no generating of illegal JSON texts by
69 root 1.4 default, and only JSON is accepted as input by default (the latter
70     is a security feature).
71 root 1.2
72 root 1.23 * fast
73    
74     Compared to other JSON modules and other serialisers such as
75     Storable, this module usually compares favourably in terms of speed,
76     too.
77    
78     * simple to use
79 root 1.2
80 root 1.23 This module has both a simple functional interface as well as an
81 root 1.26 object oriented interface.
82 root 1.23
83     * reasonably versatile output formats
84    
85     You can choose between the most compact guaranteed-single-line
86 root 1.26 format possible (nice for simple line-based protocols), a pure-ASCII
87 root 1.8 format (for when your transport is not 8-bit clean, still supports
88 root 1.20 the whole Unicode range), or a pretty-printed format (for when you
89 root 1.8 want to read that stuff). Or you can combine those features in
90     whatever way you like.
91 root 1.2
92     FUNCTIONAL INTERFACE
93 root 1.20 The following convenience functions are provided by this module. They are
94 root 1.2 exported by default:
95    
96 root 1.22 $json_text = encode_json $perl_scalar
97 root 1.19 Converts the given Perl data structure to a UTF-8 encoded, binary
98     string (that is, the string contains octets only). Croaks on error.
99 root 1.2
100 root 1.6 This function call is functionally identical to:
101 root 1.2
102 root 1.6 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
103    
104 root 1.26 except that it is faster.
105 root 1.6
106 root 1.22 $perl_scalar = decode_json $json_text
107     The opposite of "encode_json": expects a UTF-8 (binary) string and
108 root 1.6 tries to parse that as a UTF-8 encoded JSON text, returning the
109 root 1.19 resulting reference. Croaks on error.
110 root 1.2
111 root 1.6 This function call is functionally identical to:
112    
113     $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
114    
115 root 1.26 except that it is faster.
116 root 1.2
117 root 1.14 $is_boolean = JSON::XS::is_bool $scalar
118     Returns true if the passed scalar represents either JSON::XS::true
119     or JSON::XS::false, two constants that act like 1 and 0,
120     respectively and are used to represent JSON "true" and "false"
121     values in Perl.
122    
123     See MAPPING, below, for more information on how JSON values are
124     mapped to Perl.
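The functional interface described above can be tried out with a short script. This is only a minimal sketch: the sample data and variable names are made up for illustration, and it assumes JSON::XS is installed.

```perl
use strict;
use warnings;
use JSON::XS;   # exports encode_json and decode_json by default

# Encode a Perl structure into a UTF-8 encoded JSON text (octets).
my $json_text = encode_json [1, "two", { three => 3 }];
# $json_text is now: [1,"two",{"three":3}]

# Decode it again; the result is a reference to the top-level array.
my $data = decode_json $json_text;

# JSON true/false decode to special constants that is_bool recognises,
# while an ordinary number is not considered a boolean.
my $flags         = decode_json '{"on":true,"count":1}';
my $on_is_bool    = JSON::XS::is_bool ($flags->{on})    ? 1 : 0;
my $count_is_bool = JSON::XS::is_bool ($flags->{count}) ? 1 : 0;

print "$json_text\n";
print "on is boolean: $on_is_bool, count is boolean: $count_is_bool\n";
```

Both functions croak on error, so real code would usually wrap calls on untrusted input in eval.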
125    
126 root 1.19 A FEW NOTES ON UNICODE AND PERL
127     Since this often leads to confusion, here are a few very clear words on
128     how Unicode works in Perl, modulo bugs.
129    
130     1. Perl strings can store characters with ordinal values > 255.
131 root 1.20 This enables you to store Unicode characters as single characters in
132 root 1.19 a Perl string - very natural.
133    
134     2. Perl does *not* associate an encoding with your strings.
135 root 1.23 ... until you force it to, e.g. when matching it against a regex, or
136 root 1.19 printing the scalar to a file, in which case Perl either interprets
137     your string as locale-encoded text, octets/binary, or as Unicode,
138     depending on various settings. In no case is an encoding stored
139     together with your data, it is *use* that decides encoding, not any
140 root 1.23 magical meta data.
141 root 1.19
142     3. The internal utf-8 flag has no meaning with regards to the encoding
143     of your string.
144     Just ignore that flag unless you debug a Perl bug, a module written
145     in XS or want to dive into the internals of perl. Otherwise it will
146     only confuse you, as, despite the name, it says nothing about how
147 root 1.20 your string is encoded. You can have Unicode strings with that flag
148 root 1.19 set, with that flag clear, and you can have binary data with that
149     flag set and that flag clear. Other possibilities exist, too.
150    
151     If you didn't know about that flag, so much the better: pretend it
152     doesn't exist.
153    
154     4. A "Unicode String" is simply a string where each character can be
155 root 1.26 validly interpreted as a Unicode code point.
156 root 1.19 If you have UTF-8 encoded data, it is no longer a Unicode string,
157     but a Unicode string encoded in UTF-8, giving you a binary string.
158    
159     5. A string containing "high" (> 255) character values is *not* a UTF-8
160     string.
161 root 1.20 It's a fact. Learn to live with it.
162 root 1.19
163     I hope this helps :)
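The difference between a Unicode string and its UTF-8 encoded (binary) form can be made visible by comparing lengths. The following is a small sketch, assuming the core Encode module and an illustrative sample character (U+20AC); it is not taken from the module's own documentation.

```perl
use strict;
use warnings;
use Encode ();
use JSON::XS;

# One Unicode character (U+20AC, EURO SIGN), stored as a single character.
my $unicode = "\x{20ac}";

# Encoding it to UTF-8 yields a binary string of three octets.
my $octets = Encode::encode ("UTF-8", $unicode);

# Without ->utf8, encode returns a Unicode (character) string ...
my $json_chars  = JSON::XS->new->encode ([$unicode]);
# ... with ->utf8, it returns the same text as UTF-8 encoded octets.
my $json_octets = JSON::XS->new->utf8->encode ([$unicode]);

printf "%d %d %d %d\n",
   length ($unicode), length ($octets),
   length ($json_chars), length ($json_octets);
# prints "1 3 5 7"
```

Decoding the octet form with Encode::decode ("UTF-8", ...) gives back the character form, which is the sense in which it is *use* that decides the encoding.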
164    
165 root 1.2 OBJECT-ORIENTED INTERFACE
166     The object oriented interface lets you configure your own encoding or
167     decoding style, within the limits of supported formats.
168    
169     $json = new JSON::XS
170     Creates a new JSON::XS object that can be used to de/encode JSON
171     strings. All boolean flags described below are by default
172     *disabled*.
173    
174     The mutators for flags all return the JSON object again and thus
175     calls can be chained:
176    
177 root 1.6 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
178 root 1.2 => {"a": [1, 2]}
179    
180 root 1.4 $json = $json->ascii ([$enable])
181 root 1.21 $enabled = $json->get_ascii
182 root 1.4 If $enable is true (or missing), then the "encode" method will not
183 root 1.6 generate characters outside the code range 0..127 (which is ASCII).
184 root 1.20 Any Unicode characters outside that range will be escaped using
185 root 1.6 either a single \uXXXX (BMP characters) or a double \uHHHH\uLLLL
186 root 1.11 escape sequence, as per RFC4627. The resulting encoded JSON text can
187 root 1.20 be treated as a native Unicode string, an ascii-encoded,
188 root 1.11 latin1-encoded or UTF-8 encoded string, or any other superset of
189     ASCII.
190 root 1.2
191     If $enable is false, then the "encode" method will not escape
192 root 1.11 Unicode characters unless required by the JSON syntax or other
193     flags. This results in a faster and more compact format.
194    
195 root 1.23 See also the section *ENCODING/CODESET FLAG NOTES* later in this
196     document.
197    
198 root 1.11 The main use for this flag is to produce JSON texts that can be
199     transmitted over a 7-bit channel, as the encoded JSON texts will not
200     contain any 8 bit characters.
201 root 1.2
202 root 1.6 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
203     => ["\ud801\udc01"]
204 root 1.2
205 root 1.11 $json = $json->latin1 ([$enable])
206 root 1.21 $enabled = $json->get_latin1
207 root 1.11 If $enable is true (or missing), then the "encode" method will
208     encode the resulting JSON text as latin1 (or iso-8859-1), escaping
209     any characters outside the code range 0..255. The resulting string
210 root 1.20 can be treated as a latin1-encoded JSON text or a native Unicode
211 root 1.11 string. The "decode" method will not be affected in any way by this
212 root 1.20 flag, as "decode" by default expects Unicode, which is a strict
213 root 1.11 superset of latin1.
214    
215     If $enable is false, then the "encode" method will not escape
216     Unicode characters unless required by the JSON syntax or other
217     flags.
218    
219 root 1.23 See also the section *ENCODING/CODESET FLAG NOTES* later in this
220     document.
221    
222 root 1.11 The main use for this flag is efficiently encoding binary data as
223     JSON text, as most octets will not be escaped, resulting in a
224     smaller encoded size. The disadvantage is that the resulting JSON
225     text is encoded in latin1 (and must correctly be treated as such
226 root 1.20 when storing and transferring), a rare encoding for JSON. It is
227 root 1.11 therefore most useful when you want to store data structures known
228     to contain binary data efficiently in files or databases, not when
229     talking to other JSON encoders/decoders.
230    
231     JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
232     => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
233    
234 root 1.4 $json = $json->utf8 ([$enable])
235 root 1.21 $enabled = $json->get_utf8
236 root 1.4 If $enable is true (or missing), then the "encode" method will
237 root 1.6 encode the JSON result into UTF-8, as required by many protocols,
238 root 1.4 while the "decode" method expects to be handed a UTF-8-encoded
239     string. Please note that UTF-8-encoded strings do not contain any
240     characters outside the range 0..255, they are thus useful for
241 root 1.6 bytewise/binary I/O. In future versions, enabling this option might
242     enable autodetection of the UTF-16 and UTF-32 encoding families, as
243     described in RFC4627.
244 root 1.2
245     If $enable is false, then the "encode" method will return the JSON
246 root 1.20 string as a (non-encoded) Unicode string, while "decode" thus
247     expects a Unicode string. Any decoding or encoding (e.g. to UTF-8 or
248 root 1.2 UTF-16) needs to be done yourself, e.g. using the Encode module.
249    
250 root 1.23 See also the section *ENCODING/CODESET FLAG NOTES* later in this
251     document.
252    
253 root 1.6 Example, output UTF-16BE-encoded JSON:
254    
255     use Encode;
256     $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
257    
258     Example, decode UTF-32LE-encoded JSON:
259    
260     use Encode;
261     $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
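To compare the "ascii", "latin1" and "utf8" flags side by side, the same data can be encoded three times. This is a hedged sketch with a made-up sample string (U+00E9 followed by U+20AC); the hex digits in the escapes are shown in lower case, as in the "ascii" example further up.

```perl
use strict;
use warnings;
use JSON::XS;

# One character inside the latin1 range and one outside of it.
my $data = ["\x{e9}\x{20ac}"];

# ascii: everything above 127 is escaped.
my $ascii  = JSON::XS->new->ascii->encode ($data);
# => ["\u00e9\u20ac"]

# latin1: U+00E9 is kept as a single character, U+20AC is escaped.
my $latin1 = JSON::XS->new->latin1->encode ($data);

# utf8: nothing is escaped, the result is UTF-8 encoded octets.
my $utf8   = JSON::XS->new->utf8->encode ($data);

printf "ascii: %d, latin1: %d, utf8: %d\n",
   length ($ascii), length ($latin1), length ($utf8);
# prints "ascii: 16, latin1: 11, utf8: 9"
```

All three results decode to the same Perl data, provided the decoder is configured to match the encoding that was used.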
262 root 1.4
263     $json = $json->pretty ([$enable])
264 root 1.2 This enables (or disables) all of the "indent", "space_before" and
265     "space_after" (and in the future possibly more) flags in one call to
266     generate the most readable (or most compact) form possible.
267    
268 root 1.4 Example, pretty-print some simple structure:
269    
270 root 1.2 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
271     =>
272     {
273     "a" : [
274     1,
275     2
276     ]
277     }
278    
279 root 1.4 $json = $json->indent ([$enable])
280 root 1.21 $enabled = $json->get_indent
281 root 1.4 If $enable is true (or missing), then the "encode" method will use a
282     multiline format as output, putting every array member or
283 root 1.20 object/hash key-value pair into its own line, indenting them
284 root 1.4 properly.
285 root 1.2
286     If $enable is false, no newlines or indenting will be produced, and
287 root 1.20 the resulting JSON text is guaranteed not to contain any "newlines".
288 root 1.2
289 root 1.6 This setting has no effect when decoding JSON texts.
290 root 1.2
291 root 1.4 $json = $json->space_before ([$enable])
292 root 1.21 $enabled = $json->get_space_before
293 root 1.4 If $enable is true (or missing), then the "encode" method will add
294     an extra optional space before the ":" separating keys from values
295     in JSON objects.
296 root 1.2
297     If $enable is false, then the "encode" method will not add any extra
298     space at those places.
299    
300 root 1.6 This setting has no effect when decoding JSON texts. You will also
301 root 1.2 most likely combine this setting with "space_after".
302    
303 root 1.4 Example, space_before enabled, space_after and indent disabled:
304    
305     {"key" :"value"}
306    
307     $json = $json->space_after ([$enable])
308 root 1.21 $enabled = $json->get_space_after
309 root 1.4 If $enable is true (or missing), then the "encode" method will add
310     an extra optional space after the ":" separating keys from values in
311     JSON objects and extra whitespace after the "," separating key-value
312 root 1.2 pairs and array members.
313    
314     If $enable is false, then the "encode" method will not add any extra
315     space at those places.
316    
317 root 1.6 This setting has no effect when decoding JSON texts.
318 root 1.2
319 root 1.4 Example, space_before and indent disabled, space_after enabled:
320    
321     {"key": "value"}
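The whitespace flags described above ("indent", "space_before", "space_after" and "pretty") can be compared on one small structure. This is an illustrative sketch only; the sample data is an assumption, and the exact indentation produced by "pretty" is deliberately not spelled out here.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => [1, 2] };

my $compact = JSON::XS->new->encode ($data);                 # {"key":[1,2]}
my $before  = JSON::XS->new->space_before->encode ($data);   # {"key" :[1,2]}
my $after   = JSON::XS->new->space_after->encode ($data);    # {"key": [1, 2]}

# pretty enables indent, space_before and space_after in one call,
# producing a multi-line text.
my $pretty  = JSON::XS->new->pretty->encode ($data);

print "$compact\n$before\n$after\n$pretty";
```

None of these flags affects decoding, so all four texts decode to the same structure.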
322    
323 root 1.17 $json = $json->relaxed ([$enable])
324 root 1.21 $enabled = $json->get_relaxed
325 root 1.17 If $enable is true (or missing), then "decode" will accept some
326     extensions to normal JSON syntax (see below). "encode" will not be
327     affected in any way. *Be aware that this option makes you accept
328     invalid JSON texts as if they were valid!* I suggest using this
329     option only to parse application-specific files written by humans
330     (configuration files, resource files etc.)
331    
332     If $enable is false (the default), then "decode" will only accept
333     valid JSON texts.
334    
335     Currently accepted extensions are:
336    
337 root 1.23 * list items can have an end-comma
338    
339 root 1.17 JSON *separates* array elements and key-value pairs with commas.
340     This can be annoying if you write JSON texts manually and want
341     to be able to quickly append elements, so this extension accepts
342     a comma at the end of such items, not just between them:
343    
344     [
345     1,
346     2, <- this comma not normally allowed
347     ]
348     {
349     "k1": "v1",
350     "k2": "v2", <- this comma not normally allowed
351     }
352    
353 root 1.23 * shell-style '#'-comments
354    
355 root 1.18 Whenever JSON allows whitespace, shell-style comments are
356     additionally allowed. They are terminated by the first
357     carriage-return or line-feed character, after which more
358     white-space and comments are allowed.
359    
360     [
361     1, # this comment not allowed in JSON
362     # neither this one...
363     ]
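Both extensions can be exercised together. The sketch below uses a made-up, hand-written text and shows that the same input is rejected when "relaxed" is not enabled.

```perl
use strict;
use warnings;
use JSON::XS;

# A hand-written text using both extensions: a '#'-comment and an end-comma.
my $text = <<'EOT';
[
   1,
   2,   # this comment and the comma before it are not valid JSON
]
EOT

# With relaxed enabled the text is accepted.
my $data = JSON::XS->new->relaxed->decode ($text);

# Without it, decode croaks; eval turns that into a false value here.
my $strict_ok = eval { JSON::XS->new->decode ($text); 1 } ? 1 : 0;

print scalar (@$data), " elements, strict decode ok: $strict_ok\n";
# prints "2 elements, strict decode ok: 0"
```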
364    
365 root 1.4 $json = $json->canonical ([$enable])
366 root 1.21 $enabled = $json->get_canonical
367 root 1.4 If $enable is true (or missing), then the "encode" method will
368     output JSON objects by sorting their keys. This adds a
369     comparatively high overhead.
370 root 1.2
371     If $enable is false, then the "encode" method will output key-value
372     pairs in the order Perl stores them (which will likely change
373     between runs of the same script).
374    
375     This option is useful if you want the same data structure to be
376 root 1.6 encoded as the same JSON text (given the same overall settings). If
377 root 1.20 it is disabled, the same hash might be encoded differently even if
378 root 1.2 it contains the same data, as key-value pairs have no inherent ordering
379     in Perl.
380    
381 root 1.6 This setting has no effect when decoding JSON texts.
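A short sketch of the effect, using made-up hashes that hold the same data but were filled in a different order:

```perl
use strict;
use warnings;
use JSON::XS;

my %first  = (b => 1, a => 2, c => 3);
my %second = (c => 3, a => 2, b => 1);

my $json = JSON::XS->new->canonical;

# With canonical enabled, keys are sorted, so both hashes
# produce the same JSON text.
my $text_first  = $json->encode (\%first);    # {"a":2,"b":1,"c":3}
my $text_second = $json->encode (\%second);   # {"a":2,"b":1,"c":3}

print $text_first eq $text_second ? "identical\n" : "different\n";
# prints "identical"
```

This is handy when JSON texts are compared, hashed or stored in version control.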
382 root 1.2
383 root 1.4 $json = $json->allow_nonref ([$enable])
384 root 1.21 $enabled = $json->get_allow_nonref
385 root 1.4 If $enable is true (or missing), then the "encode" method can
386     convert a non-reference into its corresponding string, number or
387     null JSON value, which is an extension to RFC4627. Likewise,
388     "decode" will accept those JSON values instead of croaking.
389 root 1.2
390     If $enable is false, then the "encode" method will croak if it isn't
391 root 1.6 passed an arrayref or hashref, as JSON texts must either be an
392 root 1.2 object or array. Likewise, "decode" will croak if given something
393     that is not a JSON object or array.
394    
395 root 1.4 Example, encode a Perl scalar as JSON value with enabled
396     "allow_nonref", resulting in an invalid JSON text:
397    
398     JSON::XS->new->allow_nonref->encode ("Hello, World!")
399     => "Hello, World!"
400    
401 root 1.25 $json = $json->allow_unknown ([$enable])
402     $enabled = $json->get_allow_unknown
403     If $enable is true (or missing), then "encode" will *not* throw an
404     exception when it encounters values it cannot represent in JSON (for
405     example, filehandles) but instead will encode a JSON "null" value.
406     Note that blessed objects are not included here and are handled
407     separately by "allow_blessed" and "convert_blessed".
408    
409     If $enable is false (the default), then "encode" will throw an
410     exception when it encounters anything it cannot encode as JSON.
411    
412     This option does not affect "decode" in any way, and it is
413     recommended to leave it off unless you know your communications
414     partner.
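As an illustration, the sketch below uses a code reference as the value that cannot be represented (the text above mentions filehandles; the code reference is merely an assumption chosen for brevity):

```perl
use strict;
use warnings;
use JSON::XS;

# A code reference has no JSON representation.
my $data = [1, sub { 42 }];

# By default, encode throws an exception for such values.
my $strict_ok = eval { JSON::XS->new->encode ($data); 1 } ? 1 : 0;

# With allow_unknown, the value is encoded as a JSON null instead.
my $lenient = JSON::XS->new->allow_unknown->encode ($data);

print "strict ok: $strict_ok, lenient: $lenient\n";
# prints "strict ok: 0, lenient: [1,null]"
```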
415    
416 root 1.15 $json = $json->allow_blessed ([$enable])
417 root 1.21 $enabled = $json->get_allow_blessed
418 root 1.15 If $enable is true (or missing), then the "encode" method will not
419     barf when it encounters a blessed reference. Instead, the value of
420 root 1.20 the convert_blessed option will decide whether "null"
421 root 1.21 ("convert_blessed" disabled or no "TO_JSON" method found) or a
422 root 1.15 representation of the object ("convert_blessed" enabled and
423 root 1.21 "TO_JSON" method found) is being encoded. Has no effect on "decode".
424 root 1.15
425     If $enable is false (the default), then "encode" will throw an
426     exception when it encounters a blessed object.
427    
428     $json = $json->convert_blessed ([$enable])
429 root 1.21 $enabled = $json->get_convert_blessed
430 root 1.15 If $enable is true (or missing), then "encode", upon encountering a
431     blessed object, will check for the availability of the "TO_JSON"
432     method on the object's class. If found, it will be called in scalar
433     context and the resulting scalar will be encoded instead of the
434     object. If no "TO_JSON" method is found, the value of
435     "allow_blessed" will decide what to do.
436    
437     The "TO_JSON" method may safely call die if it wants. If "TO_JSON"
438     returns other blessed objects, those will be handled in the same
439     way. "TO_JSON" must take care of not causing an endless recursion
440     cycle (== crash) in this case. The name of "TO_JSON" was chosen
441     because other methods called by the Perl core (== not by the user of
442     the object) are usually in upper case letters and to avoid
443 root 1.22 collisions with any "to_json" function or method.
444 root 1.15
445     This setting does not yet influence "decode" in any way, but in the
446     future, global hooks might get installed that influence "decode" and
447     are enabled by this setting.
448    
449     If $enable is false, then the "allow_blessed" setting will decide
450     what to do when a blessed object is found.
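The interplay of "allow_blessed" and "convert_blessed" can be sketched with a small class. The Point class, its constructor and its fields are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical class providing a TO_JSON method.
package Point;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }

package main;

my $point = Point->new (x => 1, y => 2);

# Default: a blessed object makes encode throw an exception.
my $rejected = eval { JSON::XS->new->encode ([$point]); 1 } ? 0 : 1;

# allow_blessed alone: the object is encoded as null.
my $as_null = JSON::XS->new->allow_blessed->encode ([$point]);
# => [null]

# convert_blessed: TO_JSON is called and its result is encoded instead
# (canonical is only used here to get a stable key order).
my $converted = JSON::XS->new->canonical->convert_blessed->encode ([$point]);
# => [{"x":1,"y":2}]

print "$rejected $as_null $converted\n";
```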
451    
452     $json = $json->filter_json_object ([$coderef->($hashref)])
453     When $coderef is specified, it will be called from "decode" each
454     time it decodes a JSON object. The only argument is a reference to
455     the newly-created hash. If the code reference returns a single
456     scalar (which need not be a reference), this value (i.e. a copy of
457     that scalar to avoid aliasing) is inserted into the deserialised
458     data structure. If it returns an empty list (NOTE: *not* "undef",
459     which is a valid scalar), the original deserialised hash will be
460     inserted. This setting can slow down decoding considerably.
461    
462     When $coderef is omitted or undefined, any existing callback will be
463     removed and "decode" will not change the deserialised hash in any
464     way.
465    
466     Example, convert all JSON objects into the integer 5:
467    
468     my $js = JSON::XS->new->filter_json_object (sub { 5 });
469     # returns [5]
470     $js->decode ('[{}]');
471     # throws an exception because allow_nonref is not enabled
472     # so a lone 5 is not allowed.
473     $js->decode ('{"a":1, "b":2}');
474    
475     $json = $json->filter_json_single_key_object ($key [=>
476     $coderef->($value)])
477     Works somewhat like "filter_json_object", but is only called
478     for JSON objects having a single key named $key.
479    
480     This $coderef is called before the one specified via
481     "filter_json_object", if any. It gets passed the single value in the
482     JSON object. If it returns a single value, it will be inserted into
483     the data structure. If it returns nothing (not even "undef" but the
484     empty list), the callback from "filter_json_object" will be called
485     next, as if no single-key callback were specified.
486    
487     If $coderef is omitted or undefined, the corresponding callback will
488     be disabled. There can only ever be one callback for a given key.
489    
490     As this callback gets called less often than the
491     "filter_json_object" one, decoding speed will not usually suffer as
492     much. Therefore, single-key objects make excellent targets to
493     serialise Perl objects into, especially as single-key JSON objects
494 root 1.20 are as close to the type-tagged value concept as JSON gets (it's
495 root 1.15 basically an ID/VALUE tuple). Of course, JSON does not support this
496     in any way, so you need to make sure your data never looks like a
497     serialised Perl hash.
498    
499     Typical names for the single object key are "__class_whatever__", or
500     "$__dollars_are_rarely_used__$" or "}ugly_brace_placement", or even
501     things like "__class_md5sum(classname)__", to reduce the risk of
502     clashing with real hashes.
503    
504     Example, decode JSON objects of the form "{ "__widget__" => <id> }"
505     into the corresponding $WIDGET{<id>} object:
506    
507     # return whatever is in $WIDGET{5}:
508     JSON::XS
509     ->new
510     ->filter_json_single_key_object (__widget__ => sub {
511     $WIDGET{ $_[0] }
512     })
513     ->decode ('{"__widget__": 5}')
514    
515     # this can be used with a TO_JSON method in some "widget" class
516     # for serialisation to json:
517     sub WidgetBase::TO_JSON {
518     my ($self) = @_;
519    
520     unless ($self->{id}) {
521     $self->{id} = ..get..some..id..;
522     $WIDGET{$self->{id}} = $self;
523     }
524    
525     { __widget__ => $self->{id} }
526     }
527    
528 root 1.4 $json = $json->shrink ([$enable])
529 root 1.21 $enabled = $json->get_shrink
530 root 1.4 Perl usually over-allocates memory a bit when allocating space for
531     strings. This flag optionally resizes strings generated by either
532     "encode" or "decode" to their minimum size possible. This can save
533 root 1.6 memory when your JSON texts are either very very long or you have
534 root 1.4 many short strings. It will also try to downgrade any strings to
535     octet-form if possible: perl stores strings internally either in an
536     encoding called UTF-X or in octet-form. The latter cannot store
537 root 1.9 everything but uses less space in general (and some buggy Perl or C
538     code might even rely on that internal representation being used).
539    
540     The actual definition of what shrink does might change in future
541     versions, but it will always try to save space at the expense of
542     time.
543 root 1.4
544     If $enable is true (or missing), the string returned by "encode"
545     will be shrunk-to-fit, while all strings generated by "decode" will
546     also be shrunk-to-fit.
547    
548     If $enable is false, then the normal perl allocation algorithms are
549     used. If you work with your data, then this is likely to be faster.
550    
551     In the future, this setting might control other things, such as
552     converting strings that look like integers or floats into integers
553     or floats internally (there is no difference on the Perl level),
554     saving space.
555    
556 root 1.8 $json = $json->max_depth ([$maximum_nesting_depth])
557 root 1.21 $max_depth = $json->get_max_depth
558 root 1.10 Sets the maximum nesting level (default 512) accepted while encoding
559 root 1.25 or decoding. If a higher nesting level is detected in JSON text or a
560     Perl data structure, then the encoder and decoder will stop and
561     croak at that point.
562 root 1.8
563     Nesting level is defined by number of hash- or arrayrefs that the
564     encoder needs to traverse to reach a given point or the number of
565     "{" or "[" characters without their matching closing parenthesis
566     crossed to reach a given character in a string.
567    
568     Setting the maximum depth to one disallows any nesting, so that
569     ensures that the object is only a single hash/object or array.
570    
571 root 1.25 If no argument is given, the highest possible setting will be used,
572     which is rarely useful.
573    
574     Note that nesting is implemented by recursion in C. The default
575     value has been chosen to be as large as typical operating systems
576     allow without crashing.
577 root 1.15
578     See SECURITY CONSIDERATIONS, below, for more info on why this is
579     useful.
580    
581     $json = $json->max_size ([$maximum_string_size])
582 root 1.21 $max_size = $json->get_max_size
583 root 1.15 Set the maximum length a JSON text may have (in bytes) where
584     decoding is being attempted. The default is 0, meaning no limit.
585 root 1.25 When "decode" is called on a string that is longer than this many
586     bytes, it will not attempt to decode the string but throw an
587 root 1.15 exception. This setting has no effect on "encode" (yet).
588    
589 root 1.25 If no argument is given, the limit check will be deactivated (same
590     as when 0 is specified).
591 root 1.8
592     See SECURITY CONSIDERATIONS, below, for more info on why this is
593     useful.
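Both limits ("max_depth" and "max_size") can be set on the same object. The sketch below uses small, arbitrary limits (2 levels, 32 bytes) purely for demonstration; real values depend on the application.

```perl
use strict;
use warnings;
use JSON::XS;

# Arbitrary demonstration limits: at most 2 nesting levels, 32 bytes.
my $json = JSON::XS->new->max_depth (2)->max_size (32);

# Two levels of nesting and a short text: accepted.
my $ok = $json->decode ('[[1]]');

# Three levels of nesting: decode croaks.
my $too_deep = eval { $json->decode ('[[[1]]]'); 1 } ? 0 : 1;

# A text longer than 32 bytes: decode croaks without parsing it.
my $long_text = '[' . join (',', 1 .. 50) . ']';
my $too_long  = eval { $json->decode ($long_text); 1 } ? 0 : 1;

print "too deep rejected: $too_deep, too long rejected: $too_long\n";
# prints "too deep rejected: 1, too long rejected: 1"
```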
594    
595 root 1.6 $json_text = $json->encode ($perl_scalar)
596 root 1.2 Converts the given Perl data structure (a simple scalar or a
597     reference to a hash or array) to its JSON representation. Simple
598     scalars will be converted into JSON string or number sequences,
599     while references to arrays become JSON arrays and references to
600     hashes become JSON objects. Undefined Perl values (e.g. "undef")
601     become JSON "null" values. Neither "true" nor "false" values will be
602     generated.
603    
604 root 1.6 $perl_scalar = $json->decode ($json_text)
605     The opposite of "encode": expects a JSON text and tries to parse it,
606     returning the resulting simple scalar or reference. Croaks on error.
607 root 1.2
608     JSON numbers and strings become simple Perl scalars. JSON arrays
609     become Perl arrayrefs and JSON objects become Perl hashrefs. "true"
610     becomes 1, "false" becomes 0 and "null" becomes "undef".
611    
612 root 1.11 ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
613     This works like the "decode" method, but instead of raising an
614     exception when there is trailing garbage after the first JSON
615     object, it will silently stop parsing there and return the number of
616     characters consumed so far.
617    
618     This is useful if your JSON texts are not delimited by an outer
619     protocol (which is not the brightest thing to do in the first place)
620     and you need to know where the JSON text ends.
621    
622     JSON::XS->new->decode_prefix ("[1] the tail")
623     => ([1], 3)
624    
625 root 1.24 INCREMENTAL PARSING
626     In some cases, there is the need for incremental parsing of JSON texts.
627     While this module always has to keep both JSON text and resulting Perl
628     data structure in memory at one time, it does allow you to parse a JSON
629     stream incrementally. It does so by accumulating text until it has a
630     full JSON object, which it then can decode. This process is similar to
631     using "decode_prefix" to see if a full JSON object is available, but is
632 root 1.27 much more efficient (and can be implemented with a minimum of method
633     calls).
634 root 1.24
635 root 1.27 JSON::XS will only attempt to parse the JSON text once it is sure it has
636     enough text to get a decisive result, using a very simple but truly
637     incremental parser. This means that it sometimes won't stop as early as
638     the full parser, for example, it doesn't detect parenthesis mismatches.
639     The only thing it guarantees is that it starts decoding as soon as a
640     syntactically valid JSON text has been seen. This means you need to set
641     resource limits (e.g. "max_size") to ensure the parser will stop parsing
642     in the presence of syntax errors.
643    
644     The following methods implement this incremental parser.
645 root 1.24
    [void, scalar or list context] = $json->incr_parse ([$string])
        This is the central parsing function. It can both append new text
        and extract objects from the stream accumulated so far (both of
        these functions are optional).

        If $string is given, then this string is appended to the already
        existing JSON fragment stored in the $json object.

        After that, if the function is called in void context, it will
        simply return without doing anything further. This can be used to
        add more text in as many chunks as you want.

        If the method is called in scalar context, then it will try to
        extract exactly *one* JSON object. If that is successful, it will
        return this object, otherwise it will return "undef". If there is a
        parse error, this method will croak just as "decode" would do (one
        can then use "incr_skip" to skip the erroneous part). This is the
        most common way of using the method.

        And finally, in list context, it will try to extract as many objects
        from the stream as it can find and return them, or the empty list
        otherwise. For this to work, there must be no separators between the
        JSON objects or arrays, instead they must be concatenated
        back-to-back. If an error occurs, an exception will be raised as in
        the scalar context case. Note that in this case, any
        previously-parsed JSON texts will be lost.

    $lvalue_string = $json->incr_text
        This method returns the currently stored JSON fragment as an lvalue,
        that is, you can manipulate it. This *only* works when a preceding
        call to "incr_parse" in *scalar context* successfully returned an
        object. Under all other circumstances you must not call this
        function (I mean it: although in simple tests it might actually
        work, it *will* fail under real world conditions). As a special
        exception, you can also call this method before having parsed
        anything.

        This function is useful in two cases: a) finding the trailing text
        after a JSON object or b) parsing multiple JSON objects separated by
        non-JSON text (such as commas).

    $json->incr_skip
        This will reset the state of the incremental parser and will remove
        the parsed text from the input buffer so far. This is useful after
        "incr_parse" died, in which case the input buffer and incremental
        parser state are left unchanged, to skip the text parsed so far and
        to reset the parse state.

        The difference to "incr_reset" is that only text until the parse
        error occurred is removed.

    $json->incr_reset
        This completely resets the incremental parser, that is, after this
        call, it will be as if the parser had never parsed anything.

        This is useful if you want to repeatedly parse JSON objects and want
        to ignore any trailing data, which means you have to reset the
        parser after each successful decode.

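    The following sketch shows the two recovery methods side by side. It
    assumes a buffer that contains one malformed text followed by a valid
    one, and then two chunks with trailing junk. The input strings and the
    variable names are made up for illustration, and the error counter is
    only a safety net for this sketch:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Error recovery: the first text is malformed, the second one is fine.
$json->incr_parse ('[1, 2, oops] [3, 4]');   # void context, only buffers

my @good;
my $errors = 0;
for (;;) {
    # eval {} passes the scalar context through to incr_parse
    my $obj = eval { $json->incr_parse };
    if ($@) {
        die "too many errors, giving up" if ++$errors > 10;
        $json->incr_skip;   # drop the text scanned so far, keep the rest
        next;
    }
    last unless defined $obj;   # buffer exhausted
    push @good, $obj;
}
# @good now holds the one text that could be decoded

# Ignoring trailing data: reset after each successful decode, so the
# leftovers of one chunk cannot corrupt the next one.
my @first;
for my $chunk ('[1] trailing junk', '{"a":2} more junk') {
    my $obj = $json->incr_parse ($chunk);   # scalar context
    push @first, $obj;
    $json->incr_reset;
}
```

    Without the "incr_reset" call, the second chunk would be appended to
    the leftover text of the first one, and the next "incr_parse" would
    see the junk before the JSON object.
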
  LIMITATIONS
    All options that affect decoding are supported, except "allow_nonref".
    The reason for this is that it cannot be made to work sensibly: JSON
    objects and arrays are self-delimited, i.e. you can concatenate them
    back to back and still decode them perfectly. This does not hold true
    for JSON numbers, however.

    For example, is the string 1 a single JSON number, or is it simply the
    start of 12? Or is 12 a single JSON number, or the concatenation of 1
    and 2? In neither case can you tell, and this is why JSON::XS takes the
    conservative route and disallows this case.

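    A small sketch of the self-delimiting property described above.
    Wrapping each scalar in an array is only one possible workaround and
    is an assumption of this example, not something the module requires:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Arrays are self-delimiting, so "[1]" followed by "[12]" is unambiguous
# even without a separator; list context extracts both at once.
my @texts = $json->incr_parse ('[1][12]');

# Bare numbers would not be: "112" could be 1 and 12, 11 and 2, or 112.
# Wrapping each scalar in an array is one way around that.
printf "%d texts, values %s\n", scalar @texts, join ",", map $_->[0], @texts;
```
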
  EXAMPLES
    Some examples will make all this clearer. First, a simple example that
    works similarly to "decode_prefix": We want to decode the JSON object at
    the start of a string and identify the portion after the JSON object:

        my $text = "[1,2,3] hello";

        my $json = JSON::XS->new;

        my $obj = $json->incr_parse ($text)
            or die "expected JSON object or array at beginning of string";

        my $tail = $json->incr_text;
        # $tail now contains " hello"

    Easy, isn't it?

    Now for a more complicated example: Imagine a hypothetical protocol
    where you read some requests from a TCP stream, and each request is a
    JSON array, without any separation between them (in fact, it is often
    useful to use newlines as "separators", as these get interpreted as
    whitespace at the start of the JSON text, which makes it possible to
    test said protocol with "telnet"...).

    Here is how you'd do it (it is trivial to write this in an event-based
    manner):

        my $json = JSON::XS->new;

        # read some data from the socket
        while (sysread $socket, my $buf, 4096) {

            # split and decode as many requests as possible
            for my $request ($json->incr_parse ($buf)) {
                # act on the $request
            }
        }

    Another complicated example: Assume you have a string with JSON objects
    or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
    [3]"). To parse them, we have to skip the commas between the JSON texts,
    and here is where the lvalue-ness of "incr_text" comes in useful:

        my $text = "[1],[2], [3]";
        my $json = JSON::XS->new;

        # void context, so no parsing done
        $json->incr_parse ($text);

        # now extract as many objects as possible. note the
        # use of scalar context so incr_text can be called.
        while (my $obj = $json->incr_parse) {
            # do something with $obj

            # now skip the optional comma
            $json->incr_text =~ s/^ \s* , //x;
        }

    Now let's go for a very complex example: Assume that you have a gigantic
    JSON array-of-objects, many gigabytes in size, and you want to parse it,
    but you cannot load it into memory fully (this has actually happened in
    the real world :).

    Well, you lost, you have to implement your own JSON parser. But JSON::XS
    can still help you: You implement a (very simple) array parser and let
    JSON decode the array elements, which are all full JSON objects on their
    own (this wouldn't work if the array elements could be JSON numbers, for
    example):

        my $json = JSON::XS->new;

        # open the monster
        open my $fh, "<", "bigfile.json"
            or die "bigfile: $!";

        # first parse the initial "["
        for (;;) {
            sysread $fh, my $buf, 65536
                or die "read error: $!";
            $json->incr_parse ($buf); # void context, so no parsing

            # Exit the loop once we found and removed(!) the initial "[".
            # In essence, we are (ab-)using the $json object as a simple scalar
            # we append data to.
            last if $json->incr_text =~ s/^ \s* \[ //x;
        }

        # now that we have skipped the initial "[", continue
        # parsing all the elements.
        for (;;) {
            # in this loop we read data until we got a single JSON object
            for (;;) {
                if (my $obj = $json->incr_parse) {
                    # do something with $obj
                    last;
                }

                # add more data
                sysread $fh, my $buf, 65536
                    or die "read error: $!";
                $json->incr_parse ($buf); # void context, so no parsing
            }

            # in this loop we read data until we either found and parsed the
            # separating "," between elements, or the final "]"
            for (;;) {
                # first skip whitespace
                $json->incr_text =~ s/^\s*//;

                # if we find "]", we are done
                if ($json->incr_text =~ s/^\]//) {
                    print "finished.\n";
                    exit;
                }

                # if we find ",", we can continue with the next element
                if ($json->incr_text =~ s/^,//) {
                    last;
                }

                # if we find anything else, we have a parse error!
                if (length $json->incr_text) {
                    die "parse error near ", $json->incr_text;
                }

                # else add more data
                sysread $fh, my $buf, 65536
                    or die "read error: $!";
                $json->incr_parse ($buf); # void context, so no parsing
            }
        }

    This is a complex example, but most of the complexity comes from the
    fact that we are trying to be correct (bear with me if I am wrong, I
    never ran the above example :).

MAPPING
    This section describes how JSON::XS maps Perl values to JSON values and
    vice versa. These mappings are designed to "do the right thing" in most
    circumstances automatically, preserving round-tripping characteristics
    (what you put in comes out as something equivalent).

    For the more enlightened: note that in the following descriptions,
    lowercase *perl* refers to the Perl interpreter, while uppercase *Perl*
    refers to the abstract Perl language itself.

  JSON -> PERL
    object
        A JSON object becomes a reference to a hash in Perl. No ordering of
        object keys is preserved (JSON does not preserve object key ordering
        itself).

    array
        A JSON array becomes a reference to an array in Perl.

    string
        A JSON string becomes a string scalar in Perl - Unicode codepoints
        in JSON are represented by the same codepoints in the Perl string,
        so no manual decoding is necessary.

    number
        A JSON number becomes either an integer, numeric (floating point) or
        string scalar in perl, depending on its range and any fractional
        parts. On the Perl level, there is no difference between those as
        Perl handles all the conversion details, but an integer may take
        slightly less memory and might represent more values exactly than
        floating point numbers.

        If the number consists of digits only, JSON::XS will try to
        represent it as an integer value. If that fails, it will try to
        represent it as a numeric (floating point) value if that is possible
        without loss of precision. Otherwise it will preserve the number as
        a string value (in which case you lose roundtripping ability, as the
        JSON number will be re-encoded to a JSON string).

        Numbers containing a fractional or exponential part will always be
        represented as numeric (floating point) values, possibly at a loss
        of precision (in which case you might lose perfect roundtripping
        ability, but the JSON number will still be re-encoded as a JSON
        number).

    true, false
        These JSON atoms become "JSON::XS::true" and "JSON::XS::false",
        respectively. They are overloaded to act almost exactly like the
        numbers 1 and 0. You can check whether a scalar is a JSON boolean by
        using the "JSON::XS::is_bool" function.

    null
        A JSON null atom becomes "undef" in Perl.

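    The mappings above can be checked with a short sketch. The JSON text
    and the key names are invented for this example:

```perl
use strict;
use warnings;
use JSON::XS;

my $data = decode_json '{"list":[1,2.5],"text":"caf\u00e9","ok":true,"none":null}';

# object -> hash reference, array -> array reference
my $outer = ref $data;            # HASH
my $inner = ref $data->{list};    # ARRAY

# strings are decoded to characters, not to UTF-8 octets
my $chars = length $data->{text}; # 4

# true/false become special boolean values, null becomes undef
my $is_bool = JSON::XS::is_bool ($data->{ok});
my $is_null = !defined $data->{none};

print "$outer $inner $chars\n";
print "ok is a JSON boolean\n" if $is_bool;
print "none is undef\n"        if $is_null;
```
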
  PERL -> JSON
    The mapping from Perl to JSON is slightly more difficult, as Perl is a
    truly typeless language, so we can only guess which JSON type is meant
    by a Perl value.

    hash references
        Perl hash references become JSON objects. As there is no inherent
        ordering in hash keys (or JSON objects), they will usually be
        encoded in a pseudo-random order that can change between runs of the
        same program but stays generally the same within a single run of a
        program. JSON::XS can optionally sort the hash keys (determined by
        the *canonical* flag), so the same datastructure will serialise to
        the same JSON text (given same settings and version of JSON::XS),
        but this incurs a runtime overhead and is only rarely useful, e.g.
        when you want to compare some JSON text against another for
        equality.

    array references
        Perl array references become JSON arrays.

    other references
        Other unblessed references are generally not allowed and will cause
        an exception to be thrown, except for references to the integers 0
        and 1, which get turned into "false" and "true" atoms in JSON. You
        can also use "JSON::XS::false" and "JSON::XS::true" to improve
        readability.

            encode_json [\0, JSON::XS::true]      # yields [false,true]

    JSON::XS::true, JSON::XS::false
        These special values become JSON true and JSON false values,
        respectively. You can also use "\1" and "\0" directly if you want.

    blessed objects
        Blessed objects are not directly representable in JSON. See the
        "allow_blessed" and "convert_blessed" methods for the various
        options on how to deal with this: basically, you can choose between
        throwing an exception, encoding the reference as if it weren't
        blessed, or providing your own serialiser method.

    simple scalars
        Simple Perl scalars (any scalar that is not a reference) are the
        most difficult objects to encode: JSON::XS will encode undefined
        scalars as JSON "null" values, scalars that have last been used in a
        string context before encoding as JSON strings, and anything else as
        a number value:

            # dump as number
            encode_json [2]                      # yields [2]
            encode_json [-3.0e17]                # yields [-3e+17]
            my $value = 5; encode_json [$value]  # yields [5]

            # used as string, so dump as string
            print $value;
            encode_json [$value]                 # yields ["5"]

            # undef becomes null
            encode_json [undef]                  # yields [null]

        You can force the type to be a JSON string by stringifying it:

            my $x = 3.1; # some variable containing a number
            "$x";        # stringified
            $x .= "";    # another, more awkward way to stringify
            print $x;    # perl does it for you, too, quite often

        You can force the type to be a JSON number by numifying it:

            my $x = "3"; # some variable containing a string
            $x += 0;     # numify it, ensuring it will be dumped as a number
            $x *= 1;     # same thing, the choice is yours.

        You can not currently force the type in other, less obscure, ways.
        Tell me if you need this capability (but don't forget to explain why
        it's needed :).

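    A sketch of the hash-key sorting and blessed-object options mentioned
    above. The "Point" class is hypothetical and exists only for this
    illustration. "TO_JSON" is the method name that "convert_blessed"
    looks for:

```perl
use strict;
use warnings;
use JSON::XS;

# A hypothetical class, used only for this illustration.
package Point;
sub new     { my ($class, %arg) = @_; bless { %arg }, $class }
sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }

package main;

# canonical sorts hash keys, so the output is reproducible
my $sorted = JSON::XS->new->canonical->encode ({ b => 1, a => [\1, \0, undef] });
# $sorted is {"a":[true,false,null],"b":1}

my $point = Point->new (x => 1, y => 2);

# allow_blessed alone: blessed objects are encoded as null
my $as_null = JSON::XS->new->allow_blessed->encode ([$point]);
# $as_null is [null]

# convert_blessed: the object's TO_JSON method supplies the replacement
my $converted = JSON::XS->new->canonical->convert_blessed->encode ([$point]);
# $converted is [{"x":1,"y":2}]

print "$sorted\n$as_null\n$converted\n";
```
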
ENCODING/CODESET FLAG NOTES
    The interested reader might have seen a number of flags that signify
    encodings or codesets - "utf8", "latin1" and "ascii". There seems to be
    some confusion on what these do, so here is a short comparison:

    "utf8" controls whether the JSON text created by "encode" (and expected
    by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
    control whether "encode" escapes character values outside their
    respective codeset range. None of these flags conflict with each
    other, although some combinations make less sense than others.

    Care has been taken to make all flags symmetrical with respect to
    "encode" and "decode", that is, texts encoded with any combination of
    these flag values will be correctly decoded when the same flags are used
    - in general, if you use different flag settings while encoding vs. when
    decoding you likely have a bug somewhere.

    Below comes a verbose discussion of these flags. Note that a "codeset"
    is simply an abstract set of character-codepoint pairs, while an
    encoding takes those codepoint numbers and *encodes* them, in our case
    into octets. Unicode is (among other things) a codeset, UTF-8 is an
    encoding, and ISO-8859-1 (= latin 1) and ASCII are both codesets *and*
    encodings at the same time, which can be confusing.

    "utf8" flag disabled
        When "utf8" is disabled (the default), then "encode"/"decode"
        generate and expect Unicode strings, that is, characters with high
        ordinal Unicode values (> 255) will be encoded as such characters,
        and likewise such characters are decoded as-is, no changes to them
        will be done, except "(re-)interpreting" them as Unicode codepoints
        or Unicode characters, respectively (to Perl, these are the same
        thing in strings unless you do funny/weird/dumb stuff).

        This is useful when you want to do the encoding yourself (e.g. when
        you want to have UTF-16 encoded JSON texts) or when some other layer
        does the encoding for you (for example, when printing to a terminal
        using a filehandle that transparently encodes to UTF-8 you certainly
        do NOT want to UTF-8 encode your data first and have Perl encode it
        another time).

    "utf8" flag enabled
        If the "utf8"-flag is enabled, "encode"/"decode" will encode all
        characters using the corresponding UTF-8 multi-byte sequence, and
        will expect your input strings to be encoded as UTF-8, that is, no
        "character" of the input string may have any value > 255, as UTF-8
        does not allow that.

        The "utf8" flag therefore switches between two modes: disabled means
        you will get a Unicode string in Perl, enabled means you get a
        UTF-8 encoded octet/binary string in Perl.

    "latin1" or "ascii" flags enabled
        With "latin1" (or "ascii") enabled, "encode" will escape characters
        with ordinal values > 255 (> 127 with "ascii") and encode the
        remaining characters as specified by the "utf8" flag.

        If "utf8" is disabled, then the result is also correctly encoded in
        those character sets (as both are proper subsets of Unicode, meaning
        that a Unicode string with all character values < 256 is the same
        thing as an ISO-8859-1 string, and a Unicode string with all
        character values < 128 is the same thing as an ASCII string in
        Perl).

        If "utf8" is enabled, you still get a correct UTF-8-encoded string,
        regardless of these flags, just some more characters will be escaped
        using "\uXXXX" than before.

        Note that ISO-8859-1-*encoded* strings are not compatible with UTF-8
        encoding, while ASCII-encoded strings are. That is because the
        ISO-8859-1 encoding is NOT a subset of UTF-8 (despite the ISO-8859-1
        *codeset* being a subset of Unicode), while ASCII is.

        Surprisingly, "decode" will ignore these flags and so treat all
        input values as governed by the "utf8" flag. If it is disabled, this
        allows you to decode ISO-8859-1- and ASCII-encoded strings, as both
        are strict subsets of Unicode. If it is enabled, you can correctly
        decode UTF-8 encoded strings.

        So neither "latin1" nor "ascii" is incompatible with the "utf8"
        flag - they only govern when the JSON output engine escapes a
        character or not.

        The main use for "latin1" is to relatively efficiently store binary
        data as JSON, at the expense of breaking compatibility with most
        JSON decoders.

        The main use for "ascii" is to force the output to not contain
        characters with values > 127, which means you can interpret the
        resulting string as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about
        any character set and 8-bit-encoding, and still get the same data
        structure back. This is useful when your channel for JSON transfer
        is not 8-bit clean or the encoding might be mangled in between (e.g.
        in mail), and works because ASCII is a proper subset of most 8-bit
        and multibyte encodings in use in the world.

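    To make the difference between the flags concrete, here is a small
    sketch that encodes the same two-character string with each setting.
    The sample characters are chosen for this example only:

```perl
use strict;
use warnings;
use JSON::XS;

# U+00E9 (e with acute) fits into latin1, U+20AC (euro sign) does not
my $data = ["\x{e9}\x{20ac}"];

my $unicode = JSON::XS->new->encode ($data);          # Unicode string, 6 characters
my $utf8    = JSON::XS->new->utf8->encode ($data);    # UTF-8 octets, 9 bytes
my $ascii   = JSON::XS->new->ascii->encode ($data);   # ["\u00e9\u20ac"]
my $latin1  = JSON::XS->new->latin1->encode ($data);  # chr 0xE9 kept, euro escaped

# decoding with the same flags restores the original string
my $back = JSON::XS->new->utf8->decode ($utf8);

printf "unicode: %d characters, utf8: %d octets\n",
    length $unicode, length $utf8;
print "ascii: $ascii\n";
```

    Note that only the "utf8" variant is an octet string that is safe to
    write to a binary filehandle as-is. The others still need an encoding
    layer if they contain characters above 127 ("ascii" output never does).
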
JSON and ECMAscript
    JSON syntax is based on how literals are represented in javascript (the
    not-standardised predecessor of ECMAscript) which is presumably why it
    is called "JavaScript Object Notation".

    However, JSON is not a subset (and also not a superset of course) of
    ECMAscript (the standard) or javascript (whatever browsers actually
    implement).

    If you want to use javascript's "eval" function to "parse" JSON, you
    might run into parse errors for valid JSON texts, or the resulting data
    structure might not be queryable:

    One of the problems is that U+2028 and U+2029 are valid characters
    inside JSON strings, but are not allowed in ECMAscript string literals,
    so the following Perl fragment will not output something that can be
    guaranteed to be parsable by javascript's "eval":

        use JSON::XS;

        print encode_json [chr 0x2028];

    The right fix for this is to use a proper JSON parser in your javascript
    programs, and not rely on "eval" (see for example Douglas Crockford's
    json2.js parser).

    If this is not an option, you can, as a stop-gap measure, simply encode
    to ASCII-only JSON:

        use JSON::XS;

        print JSON::XS->new->ascii->encode ([chr 0x2028]);

    Note that this will enlarge the resulting JSON text quite a bit if you
    have many non-ASCII characters. You might be tempted to run some regexes
    to only escape U+2028 and U+2029, e.g.:

        # DO NOT USE THIS!
        my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
        $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
        $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
        print $json;

    Note that *this is a bad idea*: the above only works for U+2028 and
    U+2029 and thus only for fully ECMAscript-compliant parsers. Many
    existing javascript implementations, however, have issues with other
    characters as well - using "eval" naively simply *will* cause problems.

    Another problem is that some javascript implementations reserve some
    property names for their own purposes (which probably makes them
    non-ECMAscript-compliant). For example, Iceweasel reserves the
    "__proto__" property name for its own purposes.

    If that is a problem, you could try to filter the resulting JSON
    output for these property strings, e.g.:

        $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;

    This works because "__proto__" is not valid outside of strings, so every
    occurrence of ""__proto__"\s*:" must be a string used as property name.

    If you know of other incompatibilities, please let me know.

JSON and YAML
    You often hear that JSON is a subset of YAML. This is, however, a mass
    hysteria(*) and very far from the truth (as of the time of this
    writing), so let me state it clearly: *in general, there is no way to
    configure JSON::XS to output a data structure as valid YAML* that works
    in all cases.

    If you really must use JSON::XS to generate YAML, you should use this
    algorithm (subject to change in future versions):

        my $to_yaml = JSON::XS->new->utf8->space_after (1);
        my $yaml = $to_yaml->encode ($ref) . "\n";

    This will *usually* generate JSON texts that also parse as valid YAML.
    Please note that YAML has hardcoded limits on (simple) object key
    lengths that JSON doesn't have and also has different and incompatible
    Unicode handling, so you should make sure that your hash keys are
    noticeably shorter than the 1024 "stream characters" YAML allows and
    that you do not have characters with codepoint values outside the
    Unicode BMP (Basic Multilingual Plane). YAML also does not allow "\/"
    sequences in strings (which JSON::XS does not *currently* generate, but
    other JSON generators might).

    There might be other incompatibilities that I am not aware of (or the
    YAML specification has been changed yet again - it does so quite often).
    In general you should not try to generate YAML with a JSON generator or
    vice versa, or try to parse JSON with a YAML parser or vice versa:
    chances are high that you will run into severe interoperability problems
    when you least expect it.

    (*) I have been pressured multiple times by Brian Ingerson (one of the
        authors of the YAML specification) to remove this paragraph, despite
        him acknowledging that the actual incompatibilities exist. As I was
        personally bitten by this "JSON is YAML" lie, I refused and said I
        will continue to educate people about these issues, so others do not
        run into the same problem again and again. After this, Brian called
        me a (quote)*complete and worthless idiot*(unquote).

        In my opinion, instead of pressuring and insulting people who
        actually clarify issues with YAML and the wrong statements of some
        of its proponents, I would kindly suggest reading the JSON spec
        (which is not that difficult or long) and finally make YAML
        compatible to it, and educating users about the changes, instead of
        spreading lies about the real compatibility for many *years* and
        trying to silence people who point out that it isn't true.

SPEED
    It seems that JSON::XS is surprisingly fast, as shown in the following
    tables. They have been generated with the help of the "eg/bench" program
    in the JSON::XS distribution, to make it easy to compare on your own
    system.

    First comes a comparison between various modules using a very short
    single-line JSON string (also available at
    <http://dist.schmorp.de/misc/json/short.json>).

        {"method": "handleMessage", "params": ["user1",
        "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
        true, false]}

    It shows the number of encodes/decodes per second (JSON::XS uses the
    functional interface, while JSON::XS/2 uses the OO interface with
    pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink).
    Higher is better:

        module     |     encode |     decode |
        -----------|------------|------------|
        JSON 1.x   |   4990.842 |   4088.813 |
        JSON::DWIW |  51653.990 |  71575.154 |
        JSON::PC   |  65948.176 |  74631.744 |
        JSON::PP   |   8931.652 |   3817.168 |
        JSON::Syck |  24877.248 |  27776.848 |
        JSON::XS   | 388361.481 | 227951.304 |
        JSON::XS/2 | 227951.304 | 218453.333 |
        JSON::XS/3 | 338250.323 | 218453.333 |
        Storable   |  16500.016 | 135300.129 |
        -----------+------------+------------+

    That is, JSON::XS is about five times faster than JSON::DWIW on
    encoding, about three times faster on decoding, and over forty times
    faster than JSON, even with pretty-printing and key sorting. It also
    compares favourably to Storable for small amounts of data.

    Using a longer test string (roughly 18KB, generated from Yahoo! Locals
    search API, available at <http://dist.schmorp.de/misc/json/long.json>):

        module     |     encode |     decode |
        -----------|------------|------------|
        JSON 1.x   |     55.260 |     34.971 |
        JSON::DWIW |    825.228 |   1082.513 |
        JSON::PC   |   3571.444 |   2394.829 |
        JSON::PP   |    210.987 |     32.574 |
        JSON::Syck |    552.551 |    787.544 |
        JSON::XS   |   5780.463 |   4854.519 |
        JSON::XS/2 |   3869.998 |   4798.975 |
        JSON::XS/3 |   5862.880 |   4798.975 |
        Storable   |   4445.002 |   5235.027 |
        -----------+------------+------------+

    Again, JSON::XS leads by far (except for Storable which unsurprisingly
    decodes faster).

    On large strings containing lots of high Unicode characters, some
    modules (such as JSON::PC) seem to decode faster than JSON::XS, but the
    result will be broken due to missing (or wrong) Unicode handling. Others
    refuse to decode or encode properly, so it was impossible to prepare a
    fair comparison table for that case.

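    If you only want a rough number for your own machine and data, a
    sketch along the following lines may be enough. It is not the
    "eg/bench" program, it only measures JSON::XS itself, and it relies on
    the core Benchmark module. The sample text is the short test string
    quoted above:

```perl
use strict;
use warnings;
use JSON::XS;
use Benchmark qw(cmpthese);

# stand-in data; substitute your own JSON text here
my $text = '{"method": "handleMessage", "params": ["user1", '
         . '"we were just talking"], "id": null, '
         . '"array":[1,11,234,-5,1e5,1e7, true, false]}';
my $data = decode_json $text;

# run each sub for at least one CPU second and print the rates
cmpthese (-1, {
    encode => sub { my $out = encode_json $data },
    decode => sub { my $out = decode_json $text },
});
```
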
SECURITY CONSIDERATIONS
    When you are using JSON in a protocol, talking to untrusted potentially
    hostile creatures requires relatively few measures.

    First of all, your JSON decoder should be secure, that is, should not
    have any buffer overflows. Obviously, this module should ensure that and
    I am trying hard to make that true, but you never know.

    Second, you need to avoid resource-starving attacks. That means you
    should limit the size of JSON texts you accept, or make sure that when
    your resources run out, that's just fine (e.g. by using a separate
    process that can crash safely). The size of a JSON text in octets or
    characters is usually a good indication of the size of the resources
    required to decode it into a Perl structure. While JSON::XS can check
    the size of the JSON text, it might be too late when you already have it
    in memory, so you might want to check the size before you accept the
    string.

    Third, JSON::XS recurses using the C stack when decoding objects and
    arrays. The C stack is a limited resource: for instance, on my amd64
    machine with 8MB of stack size I can decode around 180k nested arrays
    but only 14k nested JSON objects (due to perl itself recursing deeply on
    croak to free the temporary). If that is exceeded, the program crashes.
    To be conservative, the default nesting limit is set to 512. If your
    process has a smaller stack, you should adjust this setting accordingly
    with the "max_depth" method.

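    A sketch of how the two limits can be set on a decoder. The limit
    values here are arbitrary and deliberately tiny so that the effect is
    visible; pick ones that fit your protocol:

```perl
use strict;
use warnings;
use JSON::XS;

# the limits are arbitrary example values
my $json = JSON::XS->new->max_depth (3)->max_size (64);

my $ok       = eval { $json->decode ('[[[1]]]') };               # 3 levels: accepted
my $too_deep = eval { $json->decode ('[[[[1]]]]') };             # 4 levels: croaks
my $too_big  = eval { $json->decode ('[' . ' ' x 100 . ']') };   # 102 bytes: croaks

print "deeply nested text rejected\n" unless defined $too_deep;
print "oversized text rejected\n"     unless defined $too_big;
```

    As noted above, "max_size" only helps once the text is already in
    memory, so it complements rather than replaces a size check at the
    point where you read the data.
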
    Something else could bomb you, too, that I forgot to think of. In that
    case, you get to keep the pieces. I am always open for hints, though...

    Also keep in mind that JSON::XS might leak contents of your Perl data
    structures in its error messages, so when you serialise sensitive
    information you might want to make sure that exceptions thrown by
    JSON::XS will not end up in front of untrusted eyes.

    If you are using JSON::XS to return packets for consumption by
    JavaScript scripts in a browser, you should have a look at
    <http://jpsykes.com/47/practical-csrf-and-json-security> to see whether
    you are vulnerable to some common attack vectors (which really are
    browser design bugs, but it is still you who will have to deal with it,
    as major browser developers care only for features, not about getting
    security right).

THREADS
    This module is *not* guaranteed to be thread safe and there are no plans
    to change this until Perl gets thread support (as opposed to the
    horribly slow so-called "threads" which are simply slow and bloated
    process simulations - use fork, it's *much* faster, cheaper, better).

    (It might actually work, but you have been warned).

BUGS
    While the goal of this module is to be correct, that unfortunately does
    not mean it's bug-free, only that I think its design is bug-free. If you
    keep reporting bugs they will be fixed swiftly, though.

    Please refrain from using rt.cpan.org or any other bug reporting
    service. I put the contact address into my modules for a reason.

SEE ALSO
    The json_xs command line utility for quick experiments.

AUTHOR
     Marc Lehmann <schmorp@schmorp.de>
     http://home.schmorp.de/
