/cvs/JSON-XS/XS.pm
Revision: 1.20
Committed: Sun Mar 25 00:47:42 2007 UTC (17 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_7
Changes since 1.19: +5 -3 lines
Log Message:
*** empty log message ***

File Contents

=head1 NAME

JSON::XS - JSON serialising/deserialising, done correctly and fast

=head1 SYNOPSIS

   use JSON::XS;

   # exported functions, croak on error

   $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
   $perl_hash_or_arrayref  = from_json $utf8_encoded_json_text;

   # oo-interface

   $coder = JSON::XS->new->ascii->pretty->allow_nonref;
   $pretty_printed_unencoded = $coder->encode ($perl_scalar);
   $perl_scalar = $coder->decode ($unicode_json_text);

=head1 DESCRIPTION

This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be I<correct> and its secondary goal is to be
I<fast>. To reach the latter goal it was written in C.

As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them correctly handle all corner cases, and in most cases
their maintainers are unresponsive, gone missing, or not listening to bug
reports for other reasons.

See COMPARISON, below, for a comparison to some other JSON modules.

See MAPPING, below, on how JSON::XS maps perl values to JSON values and
vice versa.

=head2 FEATURES

=over 4

=item * correct handling of Unicode issues

This module knows how to handle Unicode, and even documents how and when
it does so.

=item * round-trip integrity

When you serialise a perl data structure using only datatypes supported
by JSON, the deserialised data structure is identical on the Perl level
(e.g. the string "2.0" doesn't suddenly become "2").

=item * strict checking of JSON correctness

There is no guessing, no generating of illegal JSON texts by default,
and only JSON is accepted as input by default (the latter is a security
feature).

=item * fast

Compared to other JSON modules, this module compares favourably in terms
of speed, too.

=item * simple to use

This module has both a simple functional interface as well as an OO
interface.

=item * reasonably versatile output formats

You can choose between the most compact guaranteed single-line format
possible (nice for simple line-based protocols), a pure-ascii format (for
when your transport is not 8-bit clean), or a pretty-printed format (for
when you want to read that stuff). Or you can combine those features in
whatever way you like.

=back

=cut

package JSON::XS;

use strict;

BEGIN {
   our $VERSION = '0.7';
   our @ISA = qw(Exporter);

   our @EXPORT = qw(to_json from_json);
   require Exporter;

   require XSLoader;
   XSLoader::load JSON::XS::, $VERSION;
}

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $json_text = to_json $perl_scalar

Converts the given Perl data structure (a simple scalar or a reference to
a hash or array) to a UTF-8 encoded, binary string (that is, the string
contains octets only). Croaks on error.

This function call is functionally identical to:

   $json_text = JSON::XS->new->utf8->encode ($perl_scalar)

except being faster.

=item $perl_scalar = from_json $json_text

The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
parse that as a UTF-8 encoded JSON text, returning the resulting simple
scalar or reference. Croaks on error.

This function call is functionally identical to:

   $perl_scalar = JSON::XS->new->utf8->decode ($json_text)

except being faster.

=back
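
Because decoding croaks on error, callers that process untrusted input will usually want to trap the exception with C<eval>. The following is only a minimal sketch: it uses the object-oriented equivalent of C<from_json> described above, and the sample JSON strings are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# The OO equivalent of from_json, as described above.
my $coder = JSON::XS->new->utf8;

# Well-formed input decodes to a Perl data structure.
my $ok = eval { $coder->decode ('{"id":1,"tags":["a","b"]}') };

# Truncated input makes decode croak; eval traps the exception and
# leaves the error message in $@.
my $bad = eval { $coder->decode ('{"id":1,"tags":[') };
my $error = $@;

print "decoded id: $ok->{id}\n";
print "decode failed: $error" unless defined $bad;
```

The exact wording of the error message is not specified here, so code should test whether C<$@> is set rather than match on its text.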

=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $json = new JSON::XS

Creates a new JSON::XS object that can be used to de/encode JSON
strings. All boolean flags described below are by default I<disabled>.

The mutators for flags all return the JSON object again and thus calls can
be chained:

   my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
   => {"a": [1, 2]}

=item $json = $json->ascii ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will not
generate characters outside the code range C<0..127> (which is ASCII). Any
Unicode characters outside that range will be escaped using either a
single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
as per RFC4627.

If C<$enable> is false, then the C<encode> method will not escape Unicode
characters unless required by the JSON syntax. This results in a faster
and more compact format.

   JSON::XS->new->ascii (1)->encode ([chr 0x10401])
   => ["\ud801\udc01"]

=item $json = $json->utf8 ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will encode
the JSON result into UTF-8, as required by many protocols, while the
C<decode> method expects to be handed a UTF-8-encoded string. Please
note that UTF-8-encoded strings do not contain any characters outside the
range C<0..255>, they are thus useful for bytewise/binary I/O. In future
versions, enabling this option might enable autodetection of the UTF-16
and UTF-32 encoding families, as described in RFC4627.

If C<$enable> is false, then the C<encode> method will return the JSON
string as a (non-encoded) Unicode string, while C<decode> thus expects a
Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
to be done yourself, e.g. using the Encode module.

Example, output UTF-16BE-encoded JSON:

   use Encode;
   $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);

Example, decode UTF-32LE-encoded JSON:

   use Encode;
   $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);

=item $json = $json->pretty ([$enable])

This enables (or disables) all of the C<indent>, C<space_before> and
C<space_after> (and in the future possibly more) flags in one call to
generate the most readable (or most compact) form possible.

Example, pretty-print some simple structure:

   my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
   =>
   {
      "a" : [
         1,
         2
      ]
   }

=item $json = $json->indent ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will use a
multiline format as output, putting every array member or object/hash
key-value pair into its own line, indenting them properly.

If C<$enable> is false, no newlines or indenting will be produced, and the
resulting JSON text is guaranteed not to contain any newlines.

This setting has no effect when decoding JSON texts.

=item $json = $json->space_before ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an
extra optional space before the C<:> separating keys from values in JSON
objects.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON texts. You will also
most likely combine this setting with C<space_after>.

Example, space_before enabled, space_after and indent disabled:

   {"key" :"value"}

=item $json = $json->space_after ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an
extra optional space after the C<:> separating keys from values in JSON
objects and extra whitespace after the C<,> separating key-value pairs and
array members.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON texts.

Example, space_before and indent disabled, space_after enabled:

   {"key": "value"}

=item $json = $json->canonical ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will output
JSON objects with their keys sorted. This adds a comparatively high
overhead.

If C<$enable> is false, then the C<encode> method will output key-value
pairs in the order Perl stores them (which will likely change between runs
of the same script).

This option is useful if you want the same data structure to be encoded as
the same JSON text (given the same overall settings). If it is disabled,
the same hash might be encoded differently even if it contains the same
data, as key-value pairs have no inherent ordering in Perl.

This setting has no effect when decoding JSON texts.

=item $json = $json->allow_nonref ([$enable])

If C<$enable> is true (or missing), then the C<encode> method can convert a
non-reference into its corresponding string, number or null JSON value,
which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
values instead of croaking.

If C<$enable> is false, then the C<encode> method will croak if it isn't
passed an arrayref or hashref, as JSON texts must either be an object
or array. Likewise, C<decode> will croak if given something that is not a
JSON object or array.

Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
resulting in an invalid JSON text:

   JSON::XS->new->allow_nonref->encode ("Hello, World!")
   => "Hello, World!"

=item $json = $json->shrink ([$enable])

Perl usually over-allocates memory a bit when allocating space for
strings. This flag optionally resizes strings generated by either
C<encode> or C<decode> to their minimum size possible. This can save
memory when your JSON texts are either very long or you have many
short strings. It will also try to downgrade any strings to octet-form
if possible: perl stores strings internally either in an encoding called
UTF-X or in octet-form. The latter cannot store everything but uses less
space in general.

If C<$enable> is true (or missing), the string returned by C<encode> will
be shrunk-to-fit, while all strings generated by C<decode> will also be
shrunk-to-fit.

If C<$enable> is false, then the normal perl allocation algorithms are used.
If you work with your data, then this is likely to be faster.

In the future, this setting might control other things, such as converting
strings that look like integers or floats into integers or floats
internally (there is no difference on the Perl level), saving space.

=item $json_text = $json->encode ($perl_scalar)

Converts the given Perl data structure (a simple scalar or a reference
to a hash or array) to its JSON representation. Simple scalars will be
converted into JSON string or number sequences, while references to arrays
become JSON arrays and references to hashes become JSON objects. Undefined
Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
nor C<false> values will be generated.

=item $perl_scalar = $json->decode ($json_text)

The opposite of C<encode>: expects a JSON text and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

JSON numbers and strings become simple Perl scalars. JSON arrays become
Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.

=back
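
The flags described above can be combined freely. The following sketch (the sample data is made up for illustration) encodes the same hash with C<canonical> alone, with C<utf8> added, and with C<ascii> added, to show how the returned string differs in each case.

```perl
use strict;
use warnings;
use JSON::XS;

# Sample data; \x{e9} is LATIN SMALL LETTER E WITH ACUTE.
my $data = { name => "caf\x{e9}", id => 7 };

# canonical: keys are sorted, so the output is reproducible.
# Without utf8, the result is a string of Unicode characters.
my $chars = JSON::XS->new->canonical->encode ($data);

# canonical + utf8: the same text, but returned as UTF-8 octets,
# so the non-ASCII character takes two octets instead of one character.
my $octets = JSON::XS->new->canonical->utf8->encode ($data);

# canonical + ascii: non-ASCII characters are escaped as \uXXXX.
my $ascii = JSON::XS->new->canonical->ascii->encode ($data);

printf "%d characters, %d octets\n", length $chars, length $octets;
print "$ascii\n";
```

Only the C<ascii> result is printed directly here, because printing the character string without an output encoding layer would depend on how the filehandle is configured.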

=head1 MAPPING

This section describes how JSON::XS maps Perl values to JSON values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.

=head2 JSON -> PERL

=over 4

=item object

A JSON object becomes a reference to a hash in Perl. No ordering of object
keys is preserved (JSON does not preserve object key ordering itself).

=item array

A JSON array becomes a reference to an array in Perl.

=item string

A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
are represented by the same codepoints in the Perl string, so no manual
decoding is necessary.

=item number

A JSON number becomes either an integer or numeric (floating point)
scalar in perl, depending on its range and any fractional parts. On the
Perl level, there is no difference between those as Perl handles all the
conversion details, but an integer may take slightly less memory and might
represent more values exactly than (floating point) numbers.

=item true, false

These JSON atoms become C<1> and C<0>, respectively. Information is lost in
this process. Future versions might represent those values differently,
but they will be guaranteed to act like these integers would normally in
Perl.

=item null

A JSON null atom becomes C<undef> in Perl.

=back
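
As an illustration of the mapping above, the following sketch decodes a small, made-up JSON text and inspects the resulting Perl values. It relies only on the documented behaviour: C<true> is tested for truth rather than compared against C<1>, since its exact representation might change.

```perl
use strict;
use warnings;
use JSON::XS;

my $text = '{"point":[1,2.5],"label":"origin","note":null,"ok":true}';
my $perl = JSON::XS->new->decode ($text);

my $top_type   = ref $perl;                 # object -> hash reference
my $point_type = ref $perl->{point};        # array  -> array reference
my $sum        = $perl->{point}[0] + $perl->{point}[1];  # numbers act as numbers
my $label      = $perl->{label};            # string -> string scalar
my $note       = defined $perl->{note} ? "defined" : "undef";  # null -> undef
my $ok         = $perl->{ok} ? "true" : "false";               # true is a true value

print "$top_type $point_type $sum $label $note $ok\n";
```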

=head2 PERL -> JSON

The mapping from Perl to JSON is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which JSON type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become JSON objects. As there is no inherent ordering
in hash keys, they will usually be encoded in a pseudo-random order that
can change between runs of the same program but stays generally the same
within a single run of a program. JSON::XS can optionally sort the hash
keys (determined by the I<canonical> flag), so the same data structure
will serialise to the same JSON text (given same settings and version of
JSON::XS), but this incurs a runtime overhead.

=item array references

Perl array references become JSON arrays.

=item blessed objects

Blessed objects are not allowed. JSON::XS currently tries to encode their
underlying representation (hash- or arrayref), but this behaviour might
change in future versions.

=item simple scalars

Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: JSON::XS will encode undefined scalars as
JSON null value, scalars that have last been used in a string context
before encoding as JSON strings and anything else as number value:

   # dump as number
   to_json [2]                      # yields [2]
   to_json [-3.0e17]                # yields [-3e+17]
   my $value = 5; to_json [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   to_json [$value]                 # yields ["5"]

   # undef becomes null
   to_json [undef]                  # yields [null]

You can force the type to be a string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

You cannot currently output JSON booleans or force the type in other,
less obscure, ways. Tell me if you need this capability.

=item circular data structures

Those will be encoded until memory or stackspace runs out.

=back
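
Since circular data structures are encoded until resources run out, a caller that cannot rule them out may want to check before encoding. The following is only a sketch of one possible approach: C<has_cycle> is a hypothetical helper, not part of JSON::XS, and it only follows unblessed hash and array references.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper (not part of JSON::XS): returns 1 if $data
# contains a reference cycle through unblessed hashes or arrays,
# 0 otherwise. $seen tracks the containers on the current path only,
# so a structure that is merely shared (referenced twice) is not
# reported as a cycle.
sub has_cycle {
    my ($data, $seen) = @_;
    $seen ||= {};

    my $type = ref $data;
    return 0 unless $type eq 'HASH' || $type eq 'ARRAY';

    my $addr = refaddr $data;
    return 1 if $seen->{$addr};     # we are already inside this container

    $seen->{$addr} = 1;
    my @children = $type eq 'HASH' ? values %$data : @$data;
    for my $child (@children) {
        return 1 if has_cycle ($child, $seen);
    }
    delete $seen->{$addr};          # done with this container

    return 0;
}

my $tree = { name => "root", kids => [ { name => "leaf" } ] };

my $loop = { name => "loop" };
$loop->{self} = $loop;              # points back at itself

print "tree: ", has_cycle ($tree), "\n";
print "loop: ", has_cycle ($loop), "\n";
```

The check walks the whole structure once, so it adds overhead comparable to an extra traversal before each encode.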

=head1 COMPARISON

As already mentioned, this module was created because none of the existing
JSON modules could be made to work correctly. First I will describe the
problems (or pleasures) I encountered with various existing JSON modules,
followed by some benchmark values. JSON::XS was designed not to suffer
from any of these problems or limitations.

=over 4

=item JSON 1.07

Slow (but very portable, as it is written in pure Perl).

Undocumented/buggy Unicode handling (how JSON handles Unicode values is
undocumented. One can get far by feeding it Unicode strings and doing
en-/decoding oneself, but Unicode escapes are not working properly).

No roundtripping (strings get clobbered if they look like numbers, e.g.
the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
decode into the number 2).

=item JSON::PC 0.01

Very fast.

Undocumented/buggy Unicode handling.

No roundtripping.

Has problems handling many Perl values (e.g. regex results and other magic
values will make it croak).

Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
which is not a valid JSON text).

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

=item JSON::Syck 0.21

Very buggy (often crashes).

Very inflexible (no human-readable format supported, format pretty much
undocumented. I need at least a format for easy reading by humans and a
single-line compact format for use in a protocol, and preferably a way to
generate ASCII-only JSON texts).

Completely broken (and confusingly documented) Unicode handling (Unicode
escapes are not working properly, you need to set ImplicitUnicode to
I<different> values on en- and decoding to get symmetric behaviour).

No roundtripping (simple cases work, but this depends on whether the scalar
value was used in a numeric context or not).

Dumping hashes may skip hash values depending on iterator state.

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

Does not check input for validity (i.e. will accept non-JSON input and
return "something" instead of raising an exception. This is a security
issue: imagine two banks transferring money between each other using
JSON. One bank might parse a given non-JSON request and deduct money,
while the other might reject the transaction with a syntax error. While a
good protocol will at least recover, that is extra unnecessary work and
the transaction will still not succeed).

=item JSON::DWIW 0.04

Very fast. Very natural. Very nice.

Undocumented Unicode handling (but the best of the pack. Unicode escapes
still don't get parsed properly).

Very inflexible.

No roundtripping.

Does not generate valid JSON texts (key strings are often unquoted, empty
keys result in nothing being output).

Does not check input for validity.

=back

=head2 SPEED

It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the C<eg/bench> program
in the JSON::XS distribution, to make it easy to compare on your own
system.

First comes a comparison between various modules using a very short JSON
string:

   {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}

It shows the number of encodes/decodes per second (JSON::XS uses the
functional interface, while JSON::XS/2 uses the OO interface with
pretty-printing and hashkey sorting enabled). Higher is better:

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |  11488.516 |   7823.035 |
   JSON::DWIW |  94708.054 | 129094.260 |
   JSON::PC   |  63884.157 | 128528.212 |
   JSON::Syck |  34898.677 |  42096.911 |
   JSON::XS   | 654027.064 | 396423.669 |
   JSON::XS/2 | 371564.190 | 371725.613 |
   -----------+------------+------------+

That is, JSON::XS is more than six times faster than JSON::DWIW on
encoding, more than three times faster on decoding, and about thirty times
faster than JSON, even with pretty-printing and key sorting.

Using a longer test string (roughly 18KB, generated from Yahoo! Locals
search API (http://nanoref.com/yahooapis/mgPdGg)):

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |    273.023 |     44.674 |
   JSON::DWIW |   1089.383 |   1145.704 |
   JSON::PC   |   3097.419 |   2393.921 |
   JSON::Syck |    514.060 |    843.053 |
   JSON::XS   |   6479.668 |   3636.364 |
   JSON::XS/2 |   3774.221 |   3599.124 |
   -----------+------------+------------+

Again, JSON::XS leads by far.

On large strings containing lots of high Unicode characters, some modules
(such as JSON::PC) seem to decode faster than JSON::XS, but the result
will be broken due to missing (or wrong) Unicode handling. Others refuse
to decode or encode properly, so it was impossible to prepare a fair
comparison table for that case.

=head1 RESOURCE LIMITS

JSON::XS does not impose any limits on the size of JSON texts or Perl
values they represent - if your machine can handle it, JSON::XS will
encode or decode it. Future versions might optionally impose structure
depth and memory use resource limits.

=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. It is
still very young and not well-tested. If you keep reporting bugs they will
be fixed swiftly, though.

=cut

1;

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut