/cvs/JSON-XS/XS.pm
Revision: 1.29
Committed: Mon Apr 9 05:09:57 2007 UTC (17 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-1_11
Changes since 1.28: +1 -1 lines
Log Message:
*** empty log message ***

=head1 NAME

JSON::XS - JSON serialising/deserialising, done correctly and fast

=head1 SYNOPSIS

   use JSON::XS;

   # exported functions, they croak on error
   # and expect/generate UTF-8

   $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
   $perl_hash_or_arrayref  = from_json $utf8_encoded_json_text;

   # objToJson and jsonToObj aliases to to_json and from_json
   # are exported for compatibility with the JSON module,
   # but should not be used in new code.

   # OO-interface

   $coder = JSON::XS->new->ascii->pretty->allow_nonref;
   $pretty_printed_unencoded = $coder->encode ($perl_scalar);
   $perl_scalar = $coder->decode ($unicode_json_text);

=head1 DESCRIPTION

This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be I<correct> and its secondary goal is to be
I<fast>. To reach the latter goal it was written in C.

As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them correctly handle all corner cases, and in most cases
their maintainers are unresponsive, gone missing, or not listening to bug
reports for other reasons.

See COMPARISON, below, for a comparison to some other JSON modules.

See MAPPING, below, on how JSON::XS maps perl values to JSON values and
vice versa.

=head2 FEATURES

=over 4

=item * correct unicode handling

This module knows how to handle Unicode, and even documents how and when
it does so.

=item * round-trip integrity

When you serialise a perl data structure using only datatypes supported
by JSON, the deserialised data structure is identical on the Perl level
(e.g. the string "2.0" doesn't suddenly become "2" just because it looks
like a number).

=item * strict checking of JSON correctness

There is no guessing, no generating of illegal JSON texts by default,
and only JSON is accepted as input by default (the latter is a security
feature).

=item * fast

Compared to other JSON modules, this module compares favourably in terms
of speed, too.

=item * simple to use

This module has both a simple functional interface as well as an OO
interface.

=item * reasonably versatile output formats

You can choose between the most compact guaranteed single-line format
possible (nice for simple line-based protocols), a pure-ascii format
(for when your transport is not 8-bit clean, still supports the whole
unicode range), or a pretty-printed format (for when you want to read that
stuff). Or you can combine those features in whatever way you like.

=back

=cut

package JSON::XS;

use strict;

BEGIN {
   our $VERSION = '1.11';
   our @ISA = qw(Exporter);

   # functions exported by default (see FUNCTIONAL INTERFACE)
   our @EXPORT = qw(to_json from_json objToJson jsonToObj);
   require Exporter;

   # load the compiled (XS) part of the module
   require XSLoader;
   XSLoader::load JSON::XS::, $VERSION;
}

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $json_text = to_json $perl_scalar

Converts the given Perl data structure (a simple scalar or a reference to
a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
octets only). Croaks on error.

This function call is functionally identical to:

   $json_text = JSON::XS->new->utf8->encode ($perl_scalar)

except being faster.

=item $perl_scalar = from_json $json_text

The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
parse that as a UTF-8 encoded JSON text, returning the resulting simple
scalar or reference. Croaks on error.

This function call is functionally identical to:

   $perl_scalar = JSON::XS->new->utf8->decode ($json_text)

except being faster.

=back

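The round trip described above can be sketched as follows. This is only an
illustration: it uses the OO calls that the text gives as equivalents of
C<to_json> and C<from_json>, and the variable names and sample data are made
up for the example.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical sample data; any hash or array reference would do.
my $data = { name => "example", version => "2.0", tags => ["a", "b"] };

# Documented above as equivalent to to_json / from_json.
my $coder = JSON::XS->new->utf8;

my $json_text = $coder->encode ($data);       # UTF-8 encoded octets
my $copy      = $coder->decode ($json_text);  # back to a Perl structure

# The string "2.0" is still the string "2.0" after the round trip.
my $version = $copy->{version};
my $tags    = join ",", @{ $copy->{tags} };
print "version: $version\n";
print "tags: $tags\n";
```

Because both calls croak on error, code that handles untrusted input would
normally wrap them in C<eval>.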

=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $json = new JSON::XS

Creates a new JSON::XS object that can be used to de/encode JSON
strings. All boolean flags described below are by default I<disabled>.

The mutators for flags all return the JSON object again and thus calls can
be chained:

   my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
   => {"a": [1, 2]}

=item $json = $json->ascii ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will not
generate characters outside the code range C<0..127> (which is ASCII). Any
unicode characters outside that range will be escaped using either a
single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
as per RFC4627.

If C<$enable> is false, then the C<encode> method will not escape Unicode
characters unless required by the JSON syntax. This results in a faster
and more compact format.

   JSON::XS->new->ascii (1)->encode ([chr 0x10401])
   => ["\ud801\udc01"]

=item $json = $json->utf8 ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will encode
the JSON result into UTF-8, as required by many protocols, while the
C<decode> method expects to be handed a UTF-8-encoded string. Please
note that UTF-8-encoded strings do not contain any characters outside the
range C<0..255>, they are thus useful for bytewise/binary I/O. In future
versions, enabling this option might enable autodetection of the UTF-16
and UTF-32 encoding families, as described in RFC4627.

If C<$enable> is false, then the C<encode> method will return the JSON
string as a (non-encoded) unicode string, and C<decode> thus expects a
unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
to be done yourself, e.g. using the Encode module.

Example, output UTF-16BE-encoded JSON:

   use Encode;
   $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);

Example, decode UTF-32LE-encoded JSON:

   use Encode;
   $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);

=item $json = $json->pretty ([$enable])

This enables (or disables) all of the C<indent>, C<space_before> and
C<space_after> (and in the future possibly more) flags in one call to
generate the most readable (or most compact) form possible.

Example, pretty-print some simple structure:

   my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
   =>
   {
      "a" : [
         1,
         2
      ]
   }

=item $json = $json->indent ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will use a multiline
format as output, putting every array member or object/hash key-value pair
into its own line, indenting them properly.

If C<$enable> is false, no newlines or indenting will be produced, and the
resulting JSON text is guaranteed not to contain any C<newlines>.

This setting has no effect when decoding JSON texts.

=item $json = $json->space_before ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an extra
optional space before the C<:> separating keys from values in JSON objects.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON texts. You will also
most likely combine this setting with C<space_after>.

Example, space_before enabled, space_after and indent disabled:

   {"key" :"value"}

=item $json = $json->space_after ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an extra
optional space after the C<:> separating keys from values in JSON objects
and extra whitespace after the C<,> separating key-value pairs and array
members.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON texts.

Example, space_before and indent disabled, space_after enabled:

   {"key": "value"}

=item $json = $json->canonical ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
by sorting their keys. This adds a comparatively high overhead.

If C<$enable> is false, then the C<encode> method will output key-value
pairs in the order Perl stores them (which will likely change between runs
of the same script).

This option is useful if you want the same data structure to be encoded as
the same JSON text (given the same overall settings). If it is disabled,
the same hash might be encoded differently even if it contains the same data,
as key-value pairs have no inherent ordering in Perl.

This setting has no effect when decoding JSON texts.

=item $json = $json->allow_nonref ([$enable])

If C<$enable> is true (or missing), then the C<encode> method can convert a
non-reference into its corresponding string, number or null JSON value,
which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
values instead of croaking.

If C<$enable> is false, then the C<encode> method will croak if it isn't
passed an arrayref or hashref, as JSON texts must either be an object
or array. Likewise, C<decode> will croak if given something that is not a
JSON object or array.

Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
resulting in an invalid JSON text:

   JSON::XS->new->allow_nonref->encode ("Hello, World!")
   => "Hello, World!"

=item $json = $json->shrink ([$enable])

Perl usually over-allocates memory a bit when allocating space for
strings. This flag optionally resizes strings generated by either
C<encode> or C<decode> to their minimum size possible. This can save
memory when your JSON texts are either very long or you have many
short strings. It will also try to downgrade any strings to octet-form
if possible: perl stores strings internally either in an encoding called
UTF-X or in octet-form. The latter cannot store everything but uses less
space in general (and some buggy Perl or C code might even rely on that
internal representation being used).

The actual definition of what shrink does might change in future versions,
but it will always try to save space at the expense of time.

If C<$enable> is true (or missing), the string returned by C<encode> will
be shrunk-to-fit, while all strings generated by C<decode> will also be
shrunk-to-fit.

If C<$enable> is false, then the normal perl allocation algorithms are used.
If you work with your data, then this is likely to be faster.

In the future, this setting might control other things, such as converting
strings that look like integers or floats into integers or floats
internally (there is no difference on the Perl level), saving space.

=item $json = $json->max_depth ([$maximum_nesting_depth])

Sets the maximum nesting level (default C<512>) accepted while encoding
or decoding. If the JSON text or Perl data structure has an equal or
higher nesting level than this limit, then the encoder and decoder will
stop and croak at that point.

Nesting level is defined by the number of hash- or arrayrefs that the encoder
needs to traverse to reach a given point or the number of C<{> or C<[>
characters without their matching closing parenthesis crossed to reach a
given character in a string.

Setting the maximum depth to one disallows any nesting, which ensures
that the object is only a single hash/object or array.

The argument to C<max_depth> will be rounded up to the next power
of two.

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $json_text = $json->encode ($perl_scalar)

Converts the given Perl data structure (a simple scalar or a reference
to a hash or array) to its JSON representation. Simple scalars will be
converted into JSON string or number sequences, while references to arrays
become JSON arrays and references to hashes become JSON objects. Undefined
Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
nor C<false> values will be generated from simple scalars; see MAPPING,
below, for the references that do produce them.

=item $perl_scalar = $json->decode ($json_text)

The opposite of C<encode>: expects a JSON text and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

JSON numbers and strings become simple Perl scalars. JSON arrays become
Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.

=back

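As a rough sketch of how several of the flags above combine, the following
uses only methods described in this section. The sample data, the variable
names and the depth limit of C<8> are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { b => [1, 2], a => "x" };

# canonical: keys are sorted, so the same data always gives the same text.
my $compact = JSON::XS->new->canonical->encode ($data);
print "$compact\n";   # {"a":"x","b":[1,2]}

# space_after adds whitespace after ":" and ",".
my $spaced = JSON::XS->new->canonical->space_after->encode ($data);
print "$spaced\n";    # {"a": "x", "b": [1, 2]}

# ascii: characters outside 0..127 are written as \u escapes.
my $ascii = JSON::XS->new->ascii->encode ([chr 0x10401]);
print "$ascii\n";     # ["\ud801\udc01"]

# With allow_nonref switched off, a plain scalar is refused (encode croaks).
my $strict   = JSON::XS->new->allow_nonref (0);
my $accepted = eval { $strict->encode ("Hello, World!"); 1 };
print "plain scalar refused\n" unless $accepted;

# max_depth: build a structure nested far deeper than the limit we set.
my $deep = [];
$deep = [$deep] for 1 .. 50;

my $limited = JSON::XS->new->max_depth (8);
my $encoded = eval { $limited->encode ($deep) };
print "structure too deep, refused\n" unless defined $encoded;
```

Since the flag mutators return the object, a configured coder can be built
once and reused for many C<encode> and C<decode> calls.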

=head1 MAPPING

This section describes how JSON::XS maps Perl values to JSON values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.

=head2 JSON -> PERL

=over 4

=item object

A JSON object becomes a reference to a hash in Perl. No ordering of object
keys is preserved (JSON does not preserve object key ordering itself).

=item array

A JSON array becomes a reference to an array in Perl.

=item string

A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
are represented by the same codepoints in the Perl string, so no manual
decoding is necessary.

=item number

A JSON number becomes either an integer or numeric (floating point)
scalar in perl, depending on its range and any fractional parts. On the
Perl level, there is no difference between those as Perl handles all the
conversion details, but an integer may take slightly less memory and might
represent more values exactly than (floating point) numbers.

=item true, false

These JSON atoms become C<1> and C<0>, respectively. Information is lost in
this process. Future versions might represent those values differently,
but they will be guaranteed to act like these integers would normally in
Perl.

=item null

A JSON null atom becomes C<undef> in Perl.

=back

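The mappings above can be checked with a short script such as this one. It is
a sketch only: the input text is invented for the example, and C<true> and
C<false> are left out because the text notes that their representation may
change in future versions.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical input covering object, array, string, number and null.
my $text = '{"list":[1,2.5,"three"],"nothing":null,"name":"caf\u00e9"}';

my $perl = JSON::XS->new->decode ($text);

my $outer_type = ref $perl;              # object -> hash reference
my $inner_type = ref $perl->{list};      # array  -> array reference
my $sum        = $perl->{list}[0] + $perl->{list}[1];   # numbers stay numeric
my $null_state = defined ($perl->{nothing}) ? "defined" : "undef";
my $name_chars = length ($perl->{name}); # counted in characters, not octets

print "outer: $outer_type\n";            # HASH
print "inner: $inner_type\n";            # ARRAY
print "sum: $sum\n";                     # 3.5
print "null: $null_state\n";             # undef
print "name length: $name_chars\n";      # 4
```
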
=head2 PERL -> JSON

The mapping from Perl to JSON is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which JSON type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become JSON objects. As there is no inherent ordering
in hash keys (or JSON objects), they will usually be encoded in a
pseudo-random order that can change between runs of the same program but
stays generally the same within a single run of a program. JSON::XS can
optionally sort the hash keys (determined by the I<canonical> flag), so
the same datastructure will serialise to the same JSON text (given same
settings and version of JSON::XS), but this incurs a runtime overhead
and is only rarely useful, e.g. when you want to compare some JSON text
against another for equality.

=item array references

Perl array references become JSON arrays.

=item other references

Other unblessed references are generally not allowed and will cause an
exception to be thrown, except for references to the integers C<0> and
C<1>, which get turned into C<false> and C<true> atoms in JSON. You can
also use C<JSON::XS::false> and C<JSON::XS::true> to improve readability.

   to_json [\0,JSON::XS::true]      # yields [false,true]

=item blessed objects

Blessed objects are not allowed. JSON::XS currently tries to encode their
underlying representation (hash- or arrayref), but this behaviour might
change in future versions.

=item simple scalars

Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: JSON::XS will encode undefined scalars as
JSON null values, scalars that were last used in a string context
before encoding as JSON strings, and anything else as JSON numbers:

   # dump as number
   to_json [2]                      # yields [2]
   to_json [-3.0e17]                # yields [-3e+17]
   my $value = 5; to_json [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   to_json [$value]                 # yields ["5"]

   # undef becomes null
   to_json [undef]                  # yields [null]

You can force the type to be a string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

Other than the references to C<0> and C<1> described above, you cannot
currently output JSON booleans or force the type in other, less obscure,
ways. Tell me if you need this capability.

=back

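A minimal sketch of forcing the type of a simple scalar, following the
stringify and numify advice above. It builds fresh values instead of modifying
the variables in place, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

my $number = 3.1;   # a variable containing a number
my $digits = "3";   # a variable containing a string

# Interpolating into a string yields a new string value.
my $as_string = $coder->encode (["$number"]);

# Arithmetic yields a new numeric value.
my $as_number = $coder->encode ([$digits + 0]);

# undef always becomes null.
my $as_null = $coder->encode ([undef]);

print "$as_string\n";   # ["3.1"]
print "$as_number\n";   # [3]
print "$as_null\n";     # [null]
```

Converting right at the call site like this keeps the result independent of
how the variable happened to be used earlier in the program.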

=head1 COMPARISON

As already mentioned, this module was created because none of the existing
JSON modules could be made to work correctly. First I will describe the
problems (or pleasures) I encountered with various existing JSON modules,
followed by some benchmark values. JSON::XS was designed not to suffer
from any of these problems or limitations.

=over 4

=item JSON 1.07

Slow (but very portable, as it is written in pure Perl).

Undocumented/buggy Unicode handling (how JSON handles unicode values is
undocumented. One can get far by feeding it unicode strings and doing
en-/decoding oneself, but unicode escapes are not working properly).

No roundtripping (strings get clobbered if they look like numbers, e.g.
the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
decode into the number 2).

=item JSON::PC 0.01

Very fast.

Undocumented/buggy Unicode handling.

No roundtripping.

Has problems handling many Perl values (e.g. regex results and other magic
values will make it croak).

Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
which is not a valid JSON text).

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

=item JSON::Syck 0.21

Very buggy (often crashes).

Very inflexible (no human-readable format supported, format pretty much
undocumented. I need at least a format for easy reading by humans and a
single-line compact format for use in a protocol, and preferably a way to
generate ASCII-only JSON texts).

Completely broken (and confusingly documented) Unicode handling (unicode
escapes are not working properly, you need to set ImplicitUnicode to
I<different> values on en- and decoding to get symmetric behaviour).

No roundtripping (simple cases work, but this depends on whether the scalar
value was used in a numeric context or not).

Dumping hashes may skip hash values depending on iterator state.

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

Does not check input for validity (i.e. will accept non-JSON input and
return "something" instead of raising an exception. This is a security
issue: imagine two banks transferring money between each other using
JSON. One bank might parse a given non-JSON request and deduct money,
while the other might reject the transaction with a syntax error. While a
good protocol will at least recover, that is extra unnecessary work and
the transaction will still not succeed).

=item JSON::DWIW 0.04

Very fast. Very natural. Very nice.

Undocumented unicode handling (but the best of the pack. Unicode escapes
still don't get parsed properly).

Very inflexible.

No roundtripping.

Does not generate valid JSON texts (key strings are often unquoted, empty keys
result in nothing being output).

Does not check input for validity.

=back

=head2 SPEED

It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the C<eg/bench> program
in the JSON::XS distribution, to make it easy to compare on your own
system.

First comes a comparison between various modules using a very short JSON
string:

   {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}

It shows the number of encodes/decodes per second (JSON::XS uses the
functional interface, while JSON::XS/2 uses the OO interface with
pretty-printing and hashkey sorting enabled). Higher is better:

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |  11488.516 |   7823.035 |
   JSON::DWIW |  94708.054 | 129094.260 |
   JSON::PC   |  63884.157 | 128528.212 |
   JSON::Syck |  34898.677 |  42096.911 |
   JSON::XS   | 654027.064 | 396423.669 |
   JSON::XS/2 | 371564.190 | 371725.613 |
   -----------+------------+------------+

That is, JSON::XS is more than six times faster than JSON::DWIW on
encoding, more than three times faster on decoding, and about thirty times
faster than JSON, even with pretty-printing and key sorting.

Using a longer test string (roughly 18KB, generated from Yahoo! Locals
search API (http://nanoref.com/yahooapis/mgPdGg)):

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |    273.023 |     44.674 |
   JSON::DWIW |   1089.383 |   1145.704 |
   JSON::PC   |   3097.419 |   2393.921 |
   JSON::Syck |    514.060 |    843.053 |
   JSON::XS   |   6479.668 |   3636.364 |
   JSON::XS/2 |   3774.221 |   3599.124 |
   -----------+------------+------------+

Again, JSON::XS leads by far.

On large strings containing lots of high unicode characters, some modules
(such as JSON::PC) seem to decode faster than JSON::XS, but the result
will be broken due to missing (or wrong) unicode handling. Others refuse
to decode or encode properly, so it was impossible to prepare a fair
comparison table for that case.


=head1 SECURITY CONSIDERATIONS

When you are using JSON in a protocol, talking to untrusted, potentially
hostile creatures requires relatively few measures.

First of all, your JSON decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard to make that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of JSON texts you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a JSON text in octets or characters is
usually a good indication of the size of the resources required to decode
it into a Perl structure.

Third, JSON::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested JSON objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.

And last but not least, something else could bomb you that I forgot to think
of. In that case, you get to keep the pieces. I am always open for hints,
though...

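The size and nesting measures above might be combined along these lines. This
is a sketch under stated assumptions, not a complete defence: the helper
C<decode_untrusted> and both limits are hypothetical and should be adapted to
the protocol at hand.

```perl
use strict;
use warnings;
use JSON::XS;

# Assumed limits; pick values that suit your own protocol.
my $MAX_OCTETS = 64 * 1024;
my $MAX_DEPTH  = 32;

# Hypothetical helper: returns the decoded structure, or undef if the
# text is too large, nested too deeply, or not valid JSON.
sub decode_untrusted {
   my ($octets) = @_;

   # Refuse oversized input before spending any effort on parsing.
   return undef if length ($octets) > $MAX_OCTETS;

   my $coder = JSON::XS->new->utf8->max_depth ($MAX_DEPTH);

   # decode croaks on error, so trap that and report failure as undef.
   my $result = eval { $coder->decode ($octets) };
   return $result;
}

my $nested = "[" x 1000;   # hostile input: very deep and never closed

my $ok   = decode_untrusted ('{"id":1}');
my $bomb = decode_untrusted ($nested);

print defined ($ok)   ? "first text accepted\n"  : "first text rejected\n";
print defined ($bomb) ? "second text accepted\n" : "second text rejected\n";
```

Returning C<undef> for every failure keeps the caller simple; a real protocol
handler would probably want to log C<$@> as well.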

=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. It is
still relatively early in its development. If you keep reporting bugs they
will be fixed swiftly, though.

=cut

# references to 1 and 0, which are encoded as JSON true and false (see MAPPING)
sub true()  { \1 }
sub false() { \0 }

1;

=head1 AUTHOR

   Marc Lehmann <schmorp@schmorp.de>
   http://home.schmorp.de/

=cut
