/cvs/JSON-XS/XS.pm
Revision: 1.21
Committed: Sun Mar 25 02:32:40 2007 UTC by root
Branch: MAIN
Changes since 1.20: +12 -8 lines

1 =head1 NAME
2
3 JSON::XS - JSON serialising/deserialising, done correctly and fast
4
5 =head1 SYNOPSIS
6
7 use JSON::XS;
8
9 # exported functions, croak on error
10
11 $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
12 $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
13
14 # objToJson and jsonToObj are exported for JSON
15 # compatibility, but should not be used in new code.
16
17 # oo-interface
18
19 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
20 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
21 $perl_scalar = $coder->decode ($unicode_json_text);
22
23 =head1 DESCRIPTION
24
25 This module converts Perl data structures to JSON and vice versa. Its
26 primary goal is to be I<correct> and its secondary goal is to be
27 I<fast>. To reach the latter goal it was written in C.
28
29 As this is the n-th-something JSON module on CPAN, what was the reason
30 to write yet another JSON module? While it seems there are many JSON
31 modules, none of them correctly handle all corner cases, and in most cases
32 their maintainers are unresponsive, gone missing, or not listening to bug
33 reports for other reasons.
34
35 See COMPARISON, below, for a comparison to some other JSON modules.
36
37 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
38 vice versa.
39
40 =head2 FEATURES
41
42 =over 4
43
44 =item * correct unicode handling
45
46 This module knows how to handle Unicode, and even documents how and when
47 it does so.
48
49 =item * round-trip integrity
50
51 When you serialise a perl data structure using only datatypes supported
52 by JSON, the deserialised data structure is identical on the Perl level.
53 (e.g. the string "2.0" doesn't suddenly become "2" just because it looks
54 like a number).
55
56 =item * strict checking of JSON correctness
57
58 There is no guessing, no generating of illegal JSON texts by default,
59 and only JSON is accepted as input by default (the latter is a security
60 feature).
61
62 =item * fast
63
64 Compared to other JSON modules, this module compares favourably in terms
65 of speed, too.
66
67 =item * simple to use
68
69 This module has both a simple functional interface as well as an OO
70 interface.
71
72 =item * reasonably versatile output formats
73
74 You can choose between the most compact guaranteed single-line format
75 possible (nice for simple line-based protocols), a pure-ascii format
76 (for when your transport is not 8-bit clean, still supports the whole
77 unicode range), or a pretty-printed format (for when you want to read that
78 stuff). Or you can combine those features in whatever way you like.
79
80 =back
81
82 =cut
83
84 package JSON::XS;
85
86 use strict;
87
88 BEGIN {
89 our $VERSION = '0.8';
90 our @ISA = qw(Exporter);
91
92 our @EXPORT = qw(to_json from_json objToJson jsonToObj);
93 require Exporter;
94
95 require XSLoader;
96 XSLoader::load JSON::XS::, $VERSION;
97 }
98
99 =head1 FUNCTIONAL INTERFACE
100
101 The following convenience functions are provided by this module. They are
102 exported by default:
103
104 =over 4
105
106 =item $json_text = to_json $perl_scalar
107
108 Converts the given Perl data structure (a simple scalar or a reference to
109 a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
110 octets only). Croaks on error.
111
112 This function call is functionally identical to:
113
114 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
115
116 except being faster.
117
118 =item $perl_scalar = from_json $json_text
119
120 The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
121 parse that as a UTF-8 encoded JSON text, returning the resulting simple
122 scalar or reference. Croaks on error.
123
124 This function call is functionally identical to:
125
126 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
127
128 except being faster.
129
130 =back
131
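Since both functions croak on error, code that handles untrusted input will usually want to trap the exception. The following is only a minimal sketch, not part of the module: it uses the OO form that is described above as functionally identical to C<from_json>, and the helper name C<safe_decode> is made up for this example.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical helper: returns (value, undef) on success, or
# (undef, error message) if the input is not valid UTF-8 encoded JSON.
sub safe_decode {
    my ($json_text) = @_;
    my $value;
    eval { $value = JSON::XS->new->utf8->decode ($json_text); 1 }
        or return (undef, $@ || "unknown error");
    return ($value, undef);
}

my ($ok,  $err1) = safe_decode ('{"a":[1,2]}');
my ($bad, $err2) = safe_decode ('{"a":[1,2');   # truncated input, croaks inside eval

print "first text decoded fine\n" if defined $ok;
print "second text rejected: $err2\n" if defined $err2;
```

Returning the error message alongside the value keeps the caller free to decide whether a parse failure is fatal.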
132 =head1 OBJECT-ORIENTED INTERFACE
133
134 The object oriented interface lets you configure your own encoding or
135 decoding style, within the limits of supported formats.
136
137 =over 4
138
139 =item $json = new JSON::XS
140
141 Creates a new JSON::XS object that can be used to de/encode JSON
142 strings. All boolean flags described below are by default I<disabled>.
143
144 The mutators for flags all return the JSON object again and thus calls can
145 be chained:
146
147 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
148 => {"a": [1, 2]}
149
150 =item $json = $json->ascii ([$enable])
151
152 If C<$enable> is true (or missing), then the C<encode> method will not
153 generate characters outside the code range C<0..127> (which is ASCII). Any
154 unicode characters outside that range will be escaped using either a
155 single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
156 as per RFC4627.
157
158 If C<$enable> is false, then the C<encode> method will not escape Unicode
159 characters unless required by the JSON syntax. This results in a faster
160 and more compact format.
161
162 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
163 => ["\ud801\udc01"]
164
165 =item $json = $json->utf8 ([$enable])
166
167 If C<$enable> is true (or missing), then the C<encode> method will encode
168 the JSON result into UTF-8, as required by many protocols, while the
169 C<decode> method expects to be handed a UTF-8-encoded string. Please
170 note that UTF-8-encoded strings do not contain any characters outside the
171 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
172 versions, enabling this option might enable autodetection of the UTF-16
173 and UTF-32 encoding families, as described in RFC4627.
174
175 If C<$enable> is false, then the C<encode> method will return the JSON
176 string as a (non-encoded) unicode string, while C<decode> thus expects a
177 unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
178 to be done yourself, e.g. using the Encode module.
179
180 Example, output UTF-16BE-encoded JSON:
181
182 use Encode;
183 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
184
185 Example, decode UTF-32LE-encoded JSON:
186
187 use Encode;
188 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
189
190 =item $json = $json->pretty ([$enable])
191
192 This enables (or disables) all of the C<indent>, C<space_before> and
193 C<space_after> (and in the future possibly more) flags in one call to
194 generate the most readable (or most compact) form possible.
195
196 Example, pretty-print some simple structure:
197
198 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
199 =>
200 {
201 "a" : [
202 1,
203 2
204 ]
205 }
206
207 =item $json = $json->indent ([$enable])
208
209 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
210 format as output, putting every array member or object/hash key-value pair
211 into its own line, indenting them properly.
212
213 If C<$enable> is false, no newlines or indenting will be produced, and the
214 resulting JSON text is guaranteed not to contain any newlines.
215
216 This setting has no effect when decoding JSON texts.
217
218 =item $json = $json->space_before ([$enable])
219
220 If C<$enable> is true (or missing), then the C<encode> method will add an extra
221 optional space before the C<:> separating keys from values in JSON objects.
222
223 If C<$enable> is false, then the C<encode> method will not add any extra
224 space at those places.
225
226 This setting has no effect when decoding JSON texts. You will also
227 most likely combine this setting with C<space_after>.
228
229 Example, space_before enabled, space_after and indent disabled:
230
231 {"key" :"value"}
232
233 =item $json = $json->space_after ([$enable])
234
235 If C<$enable> is true (or missing), then the C<encode> method will add an extra
236 optional space after the C<:> separating keys from values in JSON objects
237 and extra whitespace after the C<,> separating key-value pairs and array
238 members.
239
240 If C<$enable> is false, then the C<encode> method will not add any extra
241 space at those places.
242
243 This setting has no effect when decoding JSON texts.
244
245 Example, space_before and indent disabled, space_after enabled:
246
247 {"key": "value"}
248
249 =item $json = $json->canonical ([$enable])
250
251 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
252 by sorting their keys. This adds a comparatively high overhead.
253
254 If C<$enable> is false, then the C<encode> method will output key-value
255 pairs in the order Perl stores them (which will likely change between runs
256 of the same script).
257
258 This option is useful if you want the same data structure to be encoded as
259 the same JSON text (given the same overall settings). If it is disabled,
260 the same hash might be encoded differently even if it contains the same data,
261 as key-value pairs have no inherent ordering in Perl.
262
263 This setting has no effect when decoding JSON texts.
264
265 =item $json = $json->allow_nonref ([$enable])
266
267 If C<$enable> is true (or missing), then the C<encode> method can convert a
268 non-reference into its corresponding string, number or null JSON value,
269 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
270 values instead of croaking.
271
272 If C<$enable> is false, then the C<encode> method will croak if it isn't
273 passed an arrayref or hashref, as JSON texts must either be an object
274 or array. Likewise, C<decode> will croak if given something that is not a
275 JSON object or array.
276
277 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
278 resulting in an invalid JSON text:
279
280 JSON::XS->new->allow_nonref->encode ("Hello, World!")
281 => "Hello, World!"
282
283 =item $json = $json->shrink ([$enable])
284
285 Perl usually over-allocates memory a bit when allocating space for
286 strings. This flag optionally resizes strings generated by either
287 C<encode> or C<decode> to their minimum size possible. This can save
288 memory when your JSON texts are either very very long or you have many
289 short strings. It will also try to downgrade any strings to octet-form
290 if possible: perl stores strings internally either in an encoding called
291 UTF-X or in octet-form. The latter cannot store everything but uses less
292 space in general.
293
294 If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
295 while all strings generated by C<decode> will also be shrunk-to-fit.
296
297 If C<$enable> is false, then the normal perl allocation algorithms are used.
298 If you work with your data, then this is likely to be faster.
299
300 In the future, this setting might control other things, such as converting
301 strings that look like integers or floats into integers or floats
302 internally (there is no difference on the Perl level), saving space.
303
304 =item $json_text = $json->encode ($perl_scalar)
305
306 Converts the given Perl data structure (a simple scalar or a reference
307 to a hash or array) to its JSON representation. Simple scalars will be
308 converted into JSON string or number sequences, while references to arrays
309 become JSON arrays and references to hashes become JSON objects. Undefined
310 Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
311 nor C<false> values will be generated.
312
313 =item $perl_scalar = $json->decode ($json_text)
314
315 The opposite of C<encode>: expects a JSON text and tries to parse it,
316 returning the resulting simple scalar or reference. Croaks on error.
317
318 JSON numbers and strings become simple Perl scalars. JSON arrays become
319 Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
320 C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.
321
322 =back
323
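As an illustration of the C<canonical> flag described above, the following sketch encodes two hashes that hold the same data but were filled in a different order, and compares the resulting JSON texts. The variable names are made up for the example.

```perl
use strict;
use warnings;
use JSON::XS;

# One coder object, configured once through chained mutators.
my $coder = JSON::XS->new->utf8->canonical;

# Two hashes with the same content, filled in a different order.
my %first;  $first{$_}  = 1 for qw(a b c);
my %second; $second{$_} = 1 for qw(c b a);

my $json_first  = $coder->encode (\%first);
my $json_second = $coder->encode (\%second);

# With canonical enabled the keys are sorted, so both texts are identical.
print $json_first eq $json_second ? "same text\n" : "different text\n";
print "$json_first\n";   # {"a":1,"b":1,"c":1}
```

Without C<canonical> the two texts may or may not match, which is why the flag matters when JSON texts are compared, hashed or cached.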
324 =head1 MAPPING
325
326 This section describes how JSON::XS maps Perl values to JSON values and
327 vice versa. These mappings are designed to "do the right thing" in most
328 circumstances automatically, preserving round-tripping characteristics
329 (what you put in comes out as something equivalent).
330
331 For the more enlightened: note that in the following descriptions,
332 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
333 refers to the abstract Perl language itself.
334
335 =head2 JSON -> PERL
336
337 =over 4
338
339 =item object
340
341 A JSON object becomes a reference to a hash in Perl. No ordering of object
342 keys is preserved (JSON does not preserve object key ordering itself).
343
344 =item array
345
346 A JSON array becomes a reference to an array in Perl.
347
348 =item string
349
350 A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
351 are represented by the same codepoints in the Perl string, so no manual
352 decoding is necessary.
353
354 =item number
355
356 A JSON number becomes either an integer or numeric (floating point)
357 scalar in perl, depending on its range and any fractional parts. On the
358 Perl level, there is no difference between those as Perl handles all the
359 conversion details, but an integer may take slightly less memory and might
360 represent more values exactly than (floating point) numbers.
361
362 =item true, false
363
364 These JSON atoms become C<1> and C<0>, respectively. Information is lost in
365 this process. Future versions might represent those values differently,
366 but they will be guaranteed to act like these integers would normally in
367 Perl.
368
369 =item null
370
371 A JSON null atom becomes C<undef> in Perl.
372
373 =back
374
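The JSON to Perl mappings listed above can be checked with a short script. This is only a sketch: the JSON text and the variable name C<$data> are invented for the example, and booleans are left out because their representation might change, as noted above.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = JSON::XS->new->utf8->decode ('{"list":[1,2.5,"x"],"nothing":null}');

print ref $data, "\n";                              # HASH  (JSON object)
print ref $data->{list}, "\n";                      # ARRAY (JSON array)
print $data->{list}[0] + $data->{list}[1], "\n";    # 3.5, numbers act as numbers
print $data->{list}[2], "\n";                       # x, a plain string scalar
print defined $data->{nothing} ? "defined" : "undef", "\n";   # undef (JSON null)
```

Note that the key C<nothing> still exists in the hash, only its value is undefined.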
375 =head2 PERL -> JSON
376
377 The mapping from Perl to JSON is slightly more difficult, as Perl is a
378 truly typeless language, so we can only guess which JSON type is meant by
379 a Perl value.
380
381 =over 4
382
383 =item hash references
384
385 Perl hash references become JSON objects. As there is no inherent ordering
386 in hash keys, they will usually be encoded in a pseudo-random order that
387 can change between runs of the same program but stays generally the same
388 within a single run of a program. JSON::XS can optionally sort the hash
389 keys (determined by the I<canonical> flag), so the same datastructure
390 will serialise to the same JSON text (given same settings and version of
391 JSON::XS), but this incurs a runtime overhead.
392
393 =item array references
394
395 Perl array references become JSON arrays.
396
397 =item blessed objects
398
399 Blessed objects are not allowed. JSON::XS currently tries to encode their
400 underlying representation (hash- or arrayref), but this behaviour might
401 change in future versions.
402
403 =item simple scalars
404
405 Simple Perl scalars (any scalar that is not a reference) are the most
406 difficult objects to encode: JSON::XS will encode undefined scalars as
407 JSON null value, scalars that have last been used in a string context
408 before encoding as JSON strings and anything else as number value:
409
410 # dump as number
411 to_json [2] # yields [2]
412 to_json [-3.0e17] # yields [-3e+17]
413 my $value = 5; to_json [$value] # yields [5]
414
415 # used as string, so dump as string
416 print $value;
417 to_json [$value] # yields ["5"]
418
419 # undef becomes null
420 to_json [undef] # yields [null]
421
422 You can force the type to be a string by stringifying it:
423
424 my $x = 3.1; # some variable containing a number
425 "$x"; # stringified
426 $x .= ""; # another, more awkward way to stringify
427 print $x; # perl does it for you, too, quite often
428
429 You can force the type to be a number by numifying it:
430
431 my $x = "3"; # some variable containing a string
432 $x += 0; # numify it, ensuring it will be dumped as a number
433 $x *= 1; # same thing, the choice is yours.
434
435 You cannot currently output JSON booleans or force the type in other,
436 less obscure, ways. Tell me if you need this capability.
437
438 =item circular data structures
439
440 Those will be encoded until memory or stackspace runs out.
441
442 =back
443
444 =head1 COMPARISON
445
446 As already mentioned, this module was created because none of the existing
447 JSON modules could be made to work correctly. First I will describe the
448 problems (or pleasures) I encountered with various existing JSON modules,
449 followed by some benchmark values. JSON::XS was designed not to suffer
450 from any of these problems or limitations.
451
452 =over 4
453
454 =item JSON 1.07
455
456 Slow (but very portable, as it is written in pure Perl).
457
458 Undocumented/buggy Unicode handling (how JSON handles unicode values is
459 undocumented. One can get far by feeding it unicode strings and doing
460 en-/decoding oneself, but unicode escapes are not working properly).
461
462 No roundtripping (strings get clobbered if they look like numbers, e.g.
463 the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
464 decode into the number 2).
465
466 =item JSON::PC 0.01
467
468 Very fast.
469
470 Undocumented/buggy Unicode handling.
471
472 No roundtripping.
473
474 Has problems handling many Perl values (e.g. regex results and other magic
475 values will make it croak).
476
477 Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
478 which is not a valid JSON text).
479
480 Unmaintained (maintainer unresponsive for many months, bugs are not
481 getting fixed).
482
483 =item JSON::Syck 0.21
484
485 Very buggy (often crashes).
486
487 Very inflexible (no human-readable format supported, format pretty much
488 undocumented. I need at least a format for easy reading by humans and a
489 single-line compact format for use in a protocol, and preferably a way to
490 generate ASCII-only JSON texts).
491
492 Completely broken (and confusingly documented) Unicode handling (unicode
493 escapes are not working properly, you need to set ImplicitUnicode to
494 I<different> values on en- and decoding to get symmetric behaviour).
495
496 No roundtripping (simple cases work, but this depends on whether the scalar
497 value was used in a numeric context or not).
498
499 Dumping hashes may skip hash values depending on iterator state.
500
501 Unmaintained (maintainer unresponsive for many months, bugs are not
502 getting fixed).
503
504 Does not check input for validity (i.e. will accept non-JSON input and
505 return "something" instead of raising an exception. This is a security
506 issue: imagine two banks transferring money between each other using
507 JSON. One bank might parse a given non-JSON request and deduct money,
508 while the other might reject the transaction with a syntax error. While a
509 good protocol will at least recover, that is extra unnecessary work and
510 the transaction will still not succeed).
511
512 =item JSON::DWIW 0.04
513
514 Very fast. Very natural. Very nice.
515
516 Undocumented unicode handling (but the best of the pack. Unicode escapes
517 still don't get parsed properly).
518
519 Very inflexible.
520
521 No roundtripping.
522
523 Does not generate valid JSON texts (key strings are often unquoted, empty keys
524 result in nothing being output).
525
526 Does not check input for validity.
527
528 =back
529
530 =head2 SPEED
531
532 It seems that JSON::XS is surprisingly fast, as shown in the following
533 tables. They have been generated with the help of the C<eg/bench> program
534 in the JSON::XS distribution, to make it easy to compare on your own
535 system.
536
537 First comes a comparison between various modules using a very short JSON
538 string:
539
540 {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}
541
542 It shows the number of encodes/decodes per second (JSON::XS uses the
543 functional interface, while JSON::XS/2 uses the OO interface with
544 pretty-printing and hashkey sorting enabled). Higher is better:
545
546 module | encode | decode |
547 -----------|------------|------------|
548 JSON | 11488.516 | 7823.035 |
549 JSON::DWIW | 94708.054 | 129094.260 |
550 JSON::PC | 63884.157 | 128528.212 |
551 JSON::Syck | 34898.677 | 42096.911 |
552 JSON::XS | 654027.064 | 396423.669 |
553 JSON::XS/2 | 371564.190 | 371725.613 |
554 -----------+------------+------------+
555
556 That is, JSON::XS is more than six times faster than JSON::DWIW on
557 encoding, more than three times faster on decoding, and about thirty times
558 faster than JSON, even with pretty-printing and key sorting.
559
560 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
561 search API (http://nanoref.com/yahooapis/mgPdGg)):
562
563 module | encode | decode |
564 -----------|------------|------------|
565 JSON | 273.023 | 44.674 |
566 JSON::DWIW | 1089.383 | 1145.704 |
567 JSON::PC | 3097.419 | 2393.921 |
568 JSON::Syck | 514.060 | 843.053 |
569 JSON::XS | 6479.668 | 3636.364 |
570 JSON::XS/2 | 3774.221 | 3599.124 |
571 -----------+------------+------------+
572
573 Again, JSON::XS leads by far.
574
575 On large strings containing lots of high unicode characters, some modules
576 (such as JSON::PC) seem to decode faster than JSON::XS, but the result
577 will be broken due to missing (or wrong) unicode handling. Others refuse
578 to decode or encode properly, so it was impossible to prepare a fair
579 comparison table for that case.
580
581 =head1 RESOURCE LIMITS
582
583 JSON::XS does not impose any limits on the size of JSON texts or Perl
584 values they represent - if your machine can handle it, JSON::XS will
585 encode or decode it. Future versions might optionally impose structure
586 depth and memory use resource limits.
587
588 =head1 BUGS
589
590 While the goal of this module is to be correct, that unfortunately does
591 not mean it's bug-free, only that I think its design is bug-free. It is
592 still very young and not well-tested. If you keep reporting bugs they will
593 be fixed swiftly, though.
594
595 =cut
596
597 1;
598
599 =head1 AUTHOR
600
601 Marc Lehmann <schmorp@schmorp.de>
602 http://home.schmorp.de/
603
604 =cut
605