/cvs/JSON-XS/XS.pm
Revision: 1.23
Committed: Sun Mar 25 21:19:13 2007 UTC (17 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_8
Changes since 1.22: +53 -7 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 JSON::XS - JSON serialising/deserialising, done correctly and fast
4
5 =head1 SYNOPSIS
6
7 use JSON::XS;
8
9 # exported functions, they croak on error
10 # and expect/generate UTF-8
11
12 $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
13 $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
14
15 # objToJson and jsonToObj aliases to to_json and from_json
16 # are exported for compatibility to the JSON module,
17 # but should not be used in new code.
18
19 # OO-interface
20
21 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
22 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
23 $perl_scalar = $coder->decode ($unicode_json_text);
24
25 =head1 DESCRIPTION
26
27 This module converts Perl data structures to JSON and vice versa. Its
28 primary goal is to be I<correct> and its secondary goal is to be
29 I<fast>. To reach the latter goal it was written in C.
30
31 As this is the n-th-something JSON module on CPAN, what was the reason
32 to write yet another JSON module? While it seems there are many JSON
33 modules, none of them correctly handle all corner cases, and in most cases
34 their maintainers are unresponsive, gone missing, or not listening to bug
35 reports for other reasons.
36
37 See COMPARISON, below, for a comparison to some other JSON modules.
38
39 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
40 vice versa.
41
42 =head2 FEATURES
43
44 =over 4
45
46 =item * correct unicode handling
47
48 This module knows how to handle Unicode, and even documents how and when
49 it does so.
50
51 =item * round-trip integrity
52
53 When you serialise a perl data structure using only datatypes supported
54 by JSON, the deserialised data structure is identical on the Perl level.
55 (e.g. the string "2.0" doesn't suddenly become "2" just because it looks
56 like a number).
57
58 =item * strict checking of JSON correctness
59
60 There is no guessing, no generating of illegal JSON texts by default,
61 and only JSON is accepted as input by default (the latter is a security
62 feature).
63
64 =item * fast
65
66 Compared to other JSON modules, this module compares favourably in terms
67 of speed, too.
68
69 =item * simple to use
70
71 This module has both a simple functional interface as well as an OO
72 interface.
73
74 =item * reasonably versatile output formats
75
76 You can choose between the most compact guaranteed single-line format
77 possible (nice for simple line-based protocols), a pure-ascii format
78 (for when your transport is not 8-bit clean, still supports the whole
79 unicode range), or a pretty-printed format (for when you want to read that
80 stuff). Or you can combine those features in whatever way you like.
81
82 =back
83
84 =cut
85
86 package JSON::XS;
87
88 use strict;
89
90 BEGIN {
91 our $VERSION = '0.8';
92 our @ISA = qw(Exporter);
93
94 our @EXPORT = qw(to_json from_json objToJson jsonToObj);
95 require Exporter;
96
97 require XSLoader;
98 XSLoader::load JSON::XS::, $VERSION;
99 }
100
101 =head1 FUNCTIONAL INTERFACE
102
103 The following convenience functions are provided by this module. They are
104 exported by default:
105
106 =over 4
107
108 =item $json_text = to_json $perl_scalar
109
110 Converts the given Perl data structure (a simple scalar or a reference to
111 a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
112 octets only). Croaks on error.
113
114 This function call is functionally identical to:
115
116 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
117
118 except being faster.
119
120 =item $perl_scalar = from_json $json_text
121
122 The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
123 parse that as a UTF-8 encoded JSON text, returning the resulting simple
124 scalar or reference. Croaks on error.
125
126 This function call is functionally identical to:
127
128 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
129
130 except being faster.
131
132 =back
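
The round trip described above can be sketched as follows. This is a minimal illustration, not part of the module: it uses the OO form C<< JSON::XS->new->utf8 >>, which the text above describes as functionally identical to C<to_json> and C<from_json>, and the variable names and sample data are made up for the example.

```perl
use strict;
use warnings;
use JSON::XS;

# OO equivalent of to_json/from_json: utf8 enabled, everything else default.
my $coder = JSON::XS->new->utf8;

# Sample data (hypothetical); the name contains one non-ASCII character.
my $data = { name => "caf\x{e9}", tags => ["a", "b"] };

my $json_text = $coder->encode ($data);    # UTF-8 encoded octets
my $copy      = $coder->decode ($json_text);

# The encoded text contains octets only (no character above 255) ...
my $octets_only = $json_text !~ /[^\x00-\xff]/ ? 1 : 0;

# ... and decoding restores the original character string.
my $same_name = $copy->{name} eq $data->{name} ? 1 : 0;

# Malformed input croaks; eval turns the exception into an error value.
my $ok    = eval { $coder->decode ('{"unterminated": '); 1 };
my $error = $ok ? "" : $@;
```

Wrapping C<decode> in C<eval> is one way to handle the croak without terminating the program; whether to do so depends on how the caller wants to treat malformed input.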
133
134
135 =head1 OBJECT-ORIENTED INTERFACE
136
137 The object oriented interface lets you configure your own encoding or
138 decoding style, within the limits of supported formats.
139
140 =over 4
141
142 =item $json = new JSON::XS
143
144 Creates a new JSON::XS object that can be used to de/encode JSON
145 strings. All boolean flags described below are by default I<disabled>.
146
147 The mutators for flags all return the JSON object again and thus calls can
148 be chained:
149
150 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
151 => {"a": [1, 2]}
152
153 =item $json = $json->ascii ([$enable])
154
155 If C<$enable> is true (or missing), then the C<encode> method will not
156 generate characters outside the code range C<0..127> (which is ASCII). Any
157 unicode characters outside that range will be escaped using either a
158 single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
159 as per RFC4627.
160
161 If C<$enable> is false, then the C<encode> method will not escape Unicode
162 characters unless required by the JSON syntax. This results in a faster
163 and more compact format.
164
165 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
166 => ["\ud801\udc01"]
167
168 =item $json = $json->utf8 ([$enable])
169
170 If C<$enable> is true (or missing), then the C<encode> method will encode
171 the JSON result into UTF-8, as required by many protocols, while the
172 C<decode> method expects to be handed a UTF-8-encoded string. Please
173 note that UTF-8-encoded strings do not contain any characters outside the
174 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
175 versions, enabling this option might enable autodetection of the UTF-16
176 and UTF-32 encoding families, as described in RFC4627.
177
178 If C<$enable> is false, then the C<encode> method will return the JSON
179 string as a (non-encoded) unicode string, while C<decode> thus expects a
180 unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
181 to be done yourself, e.g. using the Encode module.
182
183 Example, output UTF-16BE-encoded JSON:
184
185 use Encode;
186 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
187
188 Example, decode UTF-32LE-encoded JSON:
189
190 use Encode;
191 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
192
193 =item $json = $json->pretty ([$enable])
194
195 This enables (or disables) all of the C<indent>, C<space_before> and
196 C<space_after> (and in the future possibly more) flags in one call to
197 generate the most readable (or most compact) form possible.
198
199 Example, pretty-print some simple structure:
200
201 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
202 =>
203 {
204 "a" : [
205 1,
206 2
207 ]
208 }
209
210 =item $json = $json->indent ([$enable])
211
212 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
213 format as output, putting every array member or object/hash key-value pair
214 into its own line, indenting them properly.
215
216 If C<$enable> is false, no newlines or indenting will be produced, and the
217 resulting JSON text is guaranteed not to contain any C<newlines>.
218
219 This setting has no effect when decoding JSON texts.
220
221 =item $json = $json->space_before ([$enable])
222
223 If C<$enable> is true (or missing), then the C<encode> method will add an extra
224 optional space before the C<:> separating keys from values in JSON objects.
225
226 If C<$enable> is false, then the C<encode> method will not add any extra
227 space at those places.
228
229 This setting has no effect when decoding JSON texts. You will also
230 most likely combine this setting with C<space_after>.
231
232 Example, space_before enabled, space_after and indent disabled:
233
234 {"key" :"value"}
235
236 =item $json = $json->space_after ([$enable])
237
238 If C<$enable> is true (or missing), then the C<encode> method will add an extra
239 optional space after the C<:> separating keys from values in JSON objects
240 and extra whitespace after the C<,> separating key-value pairs and array
241 members.
242
243 If C<$enable> is false, then the C<encode> method will not add any extra
244 space at those places.
245
246 This setting has no effect when decoding JSON texts.
247
248 Example, space_before and indent disabled, space_after enabled:
249
250 {"key": "value"}
251
252 =item $json = $json->canonical ([$enable])
253
254 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
255 by sorting their keys. This adds a comparatively high overhead.
256
257 If C<$enable> is false, then the C<encode> method will output key-value
258 pairs in the order Perl stores them (which will likely change between runs
259 of the same script).
260
261 This option is useful if you want the same data structure to be encoded as
262 the same JSON text (given the same overall settings). If it is disabled,
263 the same hash might be encoded differently even if it contains the same data,
264 as key-value pairs have no inherent ordering in Perl.
265
266 This setting has no effect when decoding JSON texts.
267
268 =item $json = $json->allow_nonref ([$enable])
269
270 If C<$enable> is true (or missing), then the C<encode> method can convert a
271 non-reference into its corresponding string, number or null JSON value,
272 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
273 values instead of croaking.
274
275 If C<$enable> is false, then the C<encode> method will croak if it isn't
276 passed an arrayref or hashref, as JSON texts must either be an object
277 or array. Likewise, C<decode> will croak if given something that is not a
278 JSON object or array.
279
280 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
281 resulting in an invalid JSON text:
282
283 JSON::XS->new->allow_nonref->encode ("Hello, World!")
284 => "Hello, World!"
285
286 =item $json = $json->shrink ([$enable])
287
288 Perl usually over-allocates memory a bit when allocating space for
289 strings. This flag optionally resizes strings generated by either
290 C<encode> or C<decode> to their minimum size possible. This can save
291 memory when your JSON texts are either very very long or you have many
292 short strings. It will also try to downgrade any strings to octet-form
293 if possible: perl stores strings internally either in an encoding called
294 UTF-X or in octet-form. The latter cannot store everything but uses less
295 space in general.
296
297 If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
298 while all strings generated by C<decode> will also be shrunk-to-fit.
299
300 If C<$enable> is false, then the normal perl allocation algorithms are used.
301 If you work with your data, then this is likely to be faster.
302
303 In the future, this setting might control other things, such as converting
304 strings that look like integers or floats into integers or floats
305 internally (there is no difference on the Perl level), saving space.
306
307 =item $json = $json->max_depth ([$maximum_nesting_depth])
308
309 Sets the maximum nesting level (default C<8192>) accepted while encoding
310 or decoding. If the JSON text or Perl data structure has an equal or
311 higher nesting level than this limit, then the encoder and decoder will
312 stop and croak at that point.
313
314 Nesting level is defined by number of hash- or arrayrefs that the encoder
315 needs to traverse to reach a given point or the number of C<{> or C<[>
316 characters without their matching closing parenthesis crossed to reach a
317 given character in a string.
318
319 Setting the maximum depth to one disallows any nesting, so that ensures
320 that the object is only a single hash/object or array.
321
322 The argument to C<max_depth> will be rounded up to the next nearest power
323 of two.
324
325 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
326
327 =item $json_text = $json->encode ($perl_scalar)
328
329 Converts the given Perl data structure (a simple scalar or a reference
330 to a hash or array) to its JSON representation. Simple scalars will be
331 converted into JSON string or number sequences, while references to arrays
332 become JSON arrays and references to hashes become JSON objects. Undefined
333 Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
334 nor C<false> values will be generated.
335
336 =item $perl_scalar = $json->decode ($json_text)
337
338 The opposite of C<encode>: expects a JSON text and tries to parse it,
339 returning the resulting simple scalar or reference. Croaks on error.
340
341 JSON numbers and strings become simple Perl scalars. JSON arrays become
342 Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
343 C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.
344
345 =back
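
As a rough illustration of how the formatting flags described above combine, the following sketch encodes one small structure with several settings. The data and variable names are made up for the example; C<canonical> is enabled throughout only so that the key order, and therefore the output, is predictable.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { b => [1, 2], a => "x" };    # sample data (hypothetical)

# Most compact form; canonical sorts the hash keys.
my $compact = JSON::XS->new->canonical->encode ($data);
# {"a":"x","b":[1,2]}

# space_after adds a space after each ":" and ",".
my $spaced = JSON::XS->new->canonical->space_after->encode ($data);
# {"a": "x", "b": [1, 2]}

# pretty enables indent, space_before and space_after in one call,
# so the result spans several lines but decodes to the same data.
my $pretty    = JSON::XS->new->canonical->pretty->encode ($data);
my $from_text = JSON::XS->new->decode ($pretty);

# ascii escapes everything outside 0..127, using a surrogate pair
# for characters outside the BMP.
my $escaped = JSON::XS->new->ascii->encode ([chr 0x10401]);
# ["\ud801\udc01"]
```

Because every flag mutator returns the coder object, a configured coder can also be stored once and reused for many C<encode> calls instead of being rebuilt each time.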
346
347
348 =head1 MAPPING
349
350 This section describes how JSON::XS maps Perl values to JSON values and
351 vice versa. These mappings are designed to "do the right thing" in most
352 circumstances automatically, preserving round-tripping characteristics
353 (what you put in comes out as something equivalent).
354
355 For the more enlightened: note that in the following descriptions,
356 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
357 refers to the abstract Perl language itself.
358
359 =head2 JSON -> PERL
360
361 =over 4
362
363 =item object
364
365 A JSON object becomes a reference to a hash in Perl. No ordering of object
366 keys is preserved (JSON does not preserve object key ordering itself).
367
368 =item array
369
370 A JSON array becomes a reference to an array in Perl.
371
372 =item string
373
374 A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
375 are represented by the same codepoints in the Perl string, so no manual
376 decoding is necessary.
377
378 =item number
379
380 A JSON number becomes either an integer or numeric (floating point)
381 scalar in perl, depending on its range and any fractional parts. On the
382 Perl level, there is no difference between those as Perl handles all the
383 conversion details, but an integer may take slightly less memory and might
384 represent more values exactly than (floating point) numbers.
385
386 =item true, false
387
388 These JSON atoms become C<1> and C<0>, respectively. Information is lost in
389 this process. Future versions might represent those values differently,
390 but they will be guaranteed to act like these integers would normally in
391 Perl.
392
393 =item null
394
395 A JSON null atom becomes C<undef> in Perl.
396
397 =back
398
399 =head2 PERL -> JSON
400
401 The mapping from Perl to JSON is slightly more difficult, as Perl is a
402 truly typeless language, so we can only guess which JSON type is meant by
403 a Perl value.
404
405 =over 4
406
407 =item hash references
408
409 Perl hash references become JSON objects. As there is no inherent ordering
410 in hash keys, they will usually be encoded in a pseudo-random order that
411 can change between runs of the same program but stays generally the same
412 within a single run of a program. JSON::XS can optionally sort the hash
413 keys (determined by the I<canonical> flag), so the same datastructure
414 will serialise to the same JSON text (given same settings and version of
415 JSON::XS), but this incurs a runtime overhead.
416
417 =item array references
418
419 Perl array references become JSON arrays.
420
421 =item blessed objects
422
423 Blessed objects are not allowed. JSON::XS currently tries to encode their
424 underlying representation (hash- or arrayref), but this behaviour might
425 change in future versions.
426
427 =item simple scalars
428
429 Simple Perl scalars (any scalar that is not a reference) are the most
430 difficult objects to encode: JSON::XS will encode undefined scalars as
431 JSON null value, scalars that have last been used in a string context
432 before encoding as JSON strings and anything else as number value:
433
434 # dump as number
435 to_json [2] # yields [2]
436 to_json [-3.0e17] # yields [-3e+17]
437 my $value = 5; to_json [$value] # yields [5]
438
439 # used as string, so dump as string
440 print $value;
441 to_json [$value] # yields ["5"]
442
443 # undef becomes null
444 to_json [undef] # yields [null]
445
446 You can force the type to be a string by stringifying it:
447
448 my $x = 3.1; # some variable containing a number
449 "$x"; # stringified
450 $x .= ""; # another, more awkward way to stringify
451 print $x; # perl does it for you, too, quite often
452
453 You can force the type to be a number by numifying it:
454
455 my $x = "3"; # some variable containing a string
456 $x += 0; # numify it, ensuring it will be dumped as a number
457 $x *= 1; # same thing, the choice is yours.
458
459 You can not currently output JSON booleans or force the type in other,
460 less obscure, ways. Tell me if you need this capability.
461
462 =item circular data structures
463
464 Those will be encoded until memory or stackspace runs out.
465
466 =back
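
The string and number forcing rules for simple scalars described above can be checked with a short sketch like the following. It uses the OO interface rather than C<to_json>; the variable names are made up for the example.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

# A number that is explicitly stringified is dumped as a JSON string.
my $n = 3.1;
$n .= "";
my $as_string = $coder->encode ([$n]);    # ["3.1"]

# A string that is explicitly numified is dumped as a JSON number.
my $s = "3";
$s += 0;
my $as_number = $coder->encode ([$s]);    # [3]

# Undefined values become null.
my $as_null = $coder->encode ([undef]);   # [null]
```

Doing the conversion explicitly right before encoding avoids depending on how a value happened to be used earlier in the program.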
467
468
469 =head1 COMPARISON
470
471 As already mentioned, this module was created because none of the existing
472 JSON modules could be made to work correctly. First I will describe the
473 problems (or pleasures) I encountered with various existing JSON modules,
474 followed by some benchmark values. JSON::XS was designed not to suffer
475 from any of these problems or limitations.
476
477 =over 4
478
479 =item JSON 1.07
480
481 Slow (but very portable, as it is written in pure Perl).
482
483 Undocumented/buggy Unicode handling (how JSON handles unicode values is
484 undocumented. One can get far by feeding it unicode strings and doing
485 en-/decoding oneself, but unicode escapes are not working properly).
486
487 No roundtripping (strings get clobbered if they look like numbers, e.g.
488 the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
489 decode into the number 2).
490
491 =item JSON::PC 0.01
492
493 Very fast.
494
495 Undocumented/buggy Unicode handling.
496
497 No roundtripping.
498
499 Has problems handling many Perl values (e.g. regex results and other magic
500 values will make it croak).
501
502 Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
503 which is not a valid JSON text).
504
505 Unmaintained (maintainer unresponsive for many months, bugs are not
506 getting fixed).
507
508 =item JSON::Syck 0.21
509
510 Very buggy (often crashes).
511
512 Very inflexible (no human-readable format supported, format pretty much
513 undocumented. I need at least a format for easy reading by humans and a
514 single-line compact format for use in a protocol, and preferably a way to
515 generate ASCII-only JSON texts).
516
517 Completely broken (and confusingly documented) Unicode handling (unicode
518 escapes are not working properly, you need to set ImplicitUnicode to
519 I<different> values on en- and decoding to get symmetric behaviour).
520
521 No roundtripping (simple cases work, but this depends on whether the scalar
522 value was used in a numeric context or not).
523
524 Dumping hashes may skip hash values depending on iterator state.
525
526 Unmaintained (maintainer unresponsive for many months, bugs are not
527 getting fixed).
528
529 Does not check input for validity (i.e. will accept non-JSON input and
530 return "something" instead of raising an exception. This is a security
531 issue: imagine two banks transferring money between each other using
532 JSON. One bank might parse a given non-JSON request and deduct money,
533 while the other might reject the transaction with a syntax error. While a
534 good protocol will at least recover, that is extra unnecessary work and
535 the transaction will still not succeed).
536
537 =item JSON::DWIW 0.04
538
539 Very fast. Very natural. Very nice.
540
541 Undocumented unicode handling (but the best of the pack. Unicode escapes
542 still don't get parsed properly).
543
544 Very inflexible.
545
546 No roundtripping.
547
548 Does not generate valid JSON texts (key strings are often unquoted, empty keys
549 result in nothing being output).
550
551 Does not check input for validity.
552
553 =back
554
555 =head2 SPEED
556
557 It seems that JSON::XS is surprisingly fast, as shown in the following
558 tables. They have been generated with the help of the C<eg/bench> program
559 in the JSON::XS distribution, to make it easy to compare on your own
560 system.
561
562 First comes a comparison between various modules using a very short JSON
563 string:
564
565 {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}
566
567 It shows the number of encodes/decodes per second (JSON::XS uses the
568 functional interface, while JSON::XS/2 uses the OO interface with
569 pretty-printing and hashkey sorting enabled). Higher is better:
570
571 module | encode | decode |
572 -----------|------------|------------|
573 JSON | 11488.516 | 7823.035 |
574 JSON::DWIW | 94708.054 | 129094.260 |
575 JSON::PC | 63884.157 | 128528.212 |
576 JSON::Syck | 34898.677 | 42096.911 |
577 JSON::XS | 654027.064 | 396423.669 |
578 JSON::XS/2 | 371564.190 | 371725.613 |
579 -----------+------------+------------+
580
581 That is, JSON::XS is more than six times faster than JSON::DWIW on
582 encoding, more than three times faster on decoding, and about thirty times
583 faster than JSON, even with pretty-printing and key sorting.
584
585 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
586 search API, http://nanoref.com/yahooapis/mgPdGg):
587
588 module | encode | decode |
589 -----------|------------|------------|
590 JSON | 273.023 | 44.674 |
591 JSON::DWIW | 1089.383 | 1145.704 |
592 JSON::PC | 3097.419 | 2393.921 |
593 JSON::Syck | 514.060 | 843.053 |
594 JSON::XS | 6479.668 | 3636.364 |
595 JSON::XS/2 | 3774.221 | 3599.124 |
596 -----------+------------+------------+
597
598 Again, JSON::XS leads by far.
599
600 On large strings containing lots of high unicode characters, some modules
601 (such as JSON::PC) seem to decode faster than JSON::XS, but the result
602 will be broken due to missing (or wrong) unicode handling. Others refuse
603 to decode or encode properly, so it was impossible to prepare a fair
604 comparison table for that case.
605
606
607 =head1 SECURITY CONSIDERATIONS
608
609 When you are using JSON in a protocol, talking to untrusted potentially
610 hostile creatures requires relatively few measures.
611
612 First of all, your JSON decoder should be secure, that is, should not have
613 any buffer overflows. Obviously, this module should ensure that and I am
614 trying hard on making that true, but you never know.
615
616 Second, you need to avoid resource-starving attacks. That means you should
617 limit the size of JSON texts you accept, or make sure that when your
618 resources run out, that's just fine (e.g. by using a separate process that
619 can crash safely). The size of a JSON text in octets or characters is
620 usually a good indication of the size of the resources required to decode
621 it into a Perl structure.
622
623 Third, JSON::XS recurses using the C stack when decoding objects and
624 arrays. The C stack is a limited resource: for instance, on my amd64
625 machine with 8MB of stack size I can decode around 180k nested arrays
626 but only 14k nested JSON objects. If that is exceeded, the program
627 crashes. That's why the default nesting limit is set to 8192. If your
628 process has a smaller stack, you should adjust this setting accordingly
629 with the C<max_depth> method.
630
631 And last but least, something else could bomb you that I forgot to think
632 of. In that case, you get to keep the pieces. I am always open for hints,
633 though...
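
The second and third points above (bounding input size and nesting depth) could be combined in a small wrapper such as the one below. This is only a sketch under stated assumptions: the helper C<decode_untrusted>, the constant C<$MAX_BYTES> and both limits are hypothetical choices for illustration and would need tuning for a real protocol.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical limits, chosen only for this example.
my $MAX_BYTES = 64 * 1024;
my $coder     = JSON::XS->new->utf8->max_depth (32);

# Returns ($value, undef) on success or (undef, $error) on failure.
sub decode_untrusted {
    my ($octets) = @_;

    # Reject oversized texts before spending any resources on parsing.
    return (undef, "input too large")
        if length ($octets) > $MAX_BYTES;

    # decode croaks on malformed or too deeply nested input.
    my $value;
    my $ok = eval { $value = $coder->decode ($octets); 1 };
    return (undef, "decode failed: $@") unless $ok;

    return ($value, undef);
}

my ($value, $error) = decode_untrusted ('[1,[2,[3]]]');
my (undef, $deep_error) = decode_untrusted (("[" x 40) . ("]" x 40));
my (undef, $size_error) = decode_untrusted ("x" x ($MAX_BYTES + 1));
```

Checking the length first keeps the cheap test ahead of the expensive one; the nesting limit is then enforced by the coder itself during parsing.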
634
635
636 =head1 BUGS
637
638 While the goal of this module is to be correct, that unfortunately does
639 not mean it's bug-free, only that I think its design is bug-free. It is
640 still relatively early in its development. If you keep reporting bugs they
641 will be fixed swiftly, though.
642
643 =cut
644
645 1;
646
647 =head1 AUTHOR
648
649 Marc Lehmann <schmorp@schmorp.de>
650 http://home.schmorp.de/
651
652 =cut
653