/cvs/JSON-XS/XS.pm
Revision: 1.77
Committed: Tue Dec 4 10:37:42 2007 UTC (16 years, 5 months ago) by root
Branch: MAIN
CVS Tags: rel-2_0
Changes since 1.76: +16 -0 lines
Log Message:
2.0

File Contents

1 =head1 NAME
2
3 JSON::XS - JSON serialising/deserialising, done correctly and fast
4
5 JSON::XS - a correct and fast JSON serialiser/deserialiser (Japanese translation:
6 http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
7
8 =head1 SYNOPSIS
9
10 use JSON::XS;
11
12 # exported functions, they croak on error
13 # and expect/generate UTF-8
14
15 $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
16 $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
17
18 # OO-interface
19
20 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
21 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
22 $perl_scalar = $coder->decode ($unicode_json_text);
23
24 # Note that JSON version 2.0 and above will automatically use JSON::XS
25 # if available, at virtually no speed overhead either, so you should
26 # be able to just:
27
28 use JSON;
29
30 # and do the same things, except that you have a pure-perl fallback now.
31
32 =head1 DESCRIPTION
33
34 This module converts Perl data structures to JSON and vice versa. Its
35 primary goal is to be I<correct> and its secondary goal is to be
36 I<fast>. To reach the latter goal it was written in C.
37
38 Beginning with version 2.0 of the JSON module, when both JSON and
39 JSON::XS are installed, then JSON will fall back on JSON::XS (this can be
40 overridden) with no overhead due to emulation (by inheriting constructor
41 and methods). If JSON::XS is not available, it will fall back to the
42 compatible JSON::PP module as backend, so using JSON instead of JSON::XS
43 gives you a portable JSON API that can be fast when you need it and doesn't
44 require a C compiler when that is a problem.
45
46 As this is the n-th-something JSON module on CPAN, what was the reason
47 to write yet another JSON module? While it seems there are many JSON
48 modules, none of them correctly handle all corner cases, and in most cases
49 their maintainers are unresponsive, gone missing, or not listening to bug
50 reports for other reasons.
51
52 See COMPARISON, below, for a comparison to some other JSON modules.
53
54 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
55 vice versa.
56
57 =head2 FEATURES
58
59 =over 4
60
61 =item * correct Unicode handling
62
63 This module knows how to handle Unicode, and even documents how and when
64 it does so.
65
66 =item * round-trip integrity
67
68 When you serialise a perl data structure using only datatypes supported
69 by JSON, the deserialised data structure is identical on the Perl level.
70 (e.g. the string "2.0" doesn't suddenly become "2" just because it looks
71 like a number).
72
73 =item * strict checking of JSON correctness
74
75 There is no guessing, no generating of illegal JSON texts by default,
76 and only JSON is accepted as input by default (the latter is a security
77 feature).
78
79 =item * fast
80
81 Compared to other JSON modules, this module compares favourably in terms
82 of speed, too.
83
84 =item * simple to use
85
86 This module has both a simple functional interface as well as an OO
87 interface.
88
89 =item * reasonably versatile output formats
90
91 You can choose between the most compact guaranteed single-line format
92 possible (nice for simple line-based protocols), a pure-ascii format
93 (for when your transport is not 8-bit clean, still supports the whole
94 Unicode range), or a pretty-printed format (for when you want to read that
95 stuff). Or you can combine those features in whatever way you like.
96
97 =back
98
99 =cut
100
101 package JSON::XS;
102
103 use strict;
104
105 our $VERSION = '2.0';
106 our @ISA = qw(Exporter);
107
108 our @EXPORT = qw(to_json from_json);
109
110 use Exporter;
111 use XSLoader;
112
113 =head1 FUNCTIONAL INTERFACE
114
115 The following convenience functions are provided by this module. They are
116 exported by default:
117
118 =over 4
119
120 =item $json_text = to_json $perl_scalar
121
122 Converts the given Perl data structure to a UTF-8 encoded, binary string
123 (that is, the string contains octets only). Croaks on error.
124
125 This function call is functionally identical to:
126
127 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
128
129 except being faster.
130
131 =item $perl_scalar = from_json $json_text
132
133 The opposite of C<to_json>: expects a UTF-8 (binary) string and tries
134 to parse that as a UTF-8 encoded JSON text, returning the resulting
135 reference. Croaks on error.
136
137 This function call is functionally identical to:
138
139 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
140
141 except being faster.
142
143 =item $is_boolean = JSON::XS::is_bool $scalar
144
145 Returns true if the passed scalar represents either JSON::XS::true or
146 JSON::XS::false, two constants that act like C<1> and C<0>, respectively
147 and are used to represent JSON C<true> and C<false> values in Perl.
148
149 See MAPPING, below, for more information on how JSON values are mapped to
150 Perl.
151
152 =back
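As a hedged illustration of the equivalence described above, the following sketch round-trips a structure through the object-oriented form that the two functions are documented to match. It calls only methods named in this document. The fallback to JSON::PP and the variable names C<$impl> and C<$json> are assumptions, made so the snippet also runs where JSON::XS is not installed.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

# Same setting the functional interface is documented to use (utf8 on);
# canonical is added only to make the key order predictable.
my $json = $impl->new->utf8->canonical;

# Encode to UTF-8 octets, then decode those octets back into Perl data.
my $text = $json->encode({ name => "caf\x{e9}", list => [1, 2, 3] });
my $data = $json->decode($text);

printf "%d octets, name has %d characters\n",
    length $text, length $data->{name};
```

The encoded text contains octets only, while the decoded name is again a four-character Perl string.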
153
154
155 =head1 A FEW NOTES ON UNICODE AND PERL
156
157 Since this often leads to confusion, here are a few very clear words on
158 how Unicode works in Perl, modulo bugs.
159
160 =over 4
161
162 =item 1. Perl strings can store characters with ordinal values > 255.
163
164 This enables you to store Unicode characters as single characters in a
165 Perl string - very natural.
166
167 =item 2. Perl does I<not> associate an encoding with your strings.
168
169 Unless you force it to, e.g. when matching it against a regex, or printing
170 the scalar to a file, in which case Perl either interprets your string as
171 locale-encoded text, octets/binary, or as Unicode, depending on various
172 settings. In no case is an encoding stored together with your data, it is
173 I<use> that decides encoding, not any magical metadata.
174
175 =item 3. The internal utf-8 flag has no meaning with regards to the
176 encoding of your string.
177
178 Just ignore that flag unless you debug a Perl bug, a module written in
179 XS or want to dive into the internals of perl. Otherwise it will only
180 confuse you, as, despite the name, it says nothing about how your string
181 is encoded. You can have Unicode strings with that flag set, with that
182 flag clear, and you can have binary data with that flag set and that flag
183 clear. Other possibilities exist, too.
184
185 If you didn't know about that flag, so much the better: pretend it doesn't
186 exist.
187
188 =item 4. A "Unicode String" is simply a string where each character can be
189 validly interpreted as a Unicode codepoint.
190
191 If you have UTF-8 encoded data, it is no longer a Unicode string, but a
192 Unicode string encoded in UTF-8, giving you a binary string.
193
194 =item 5. A string containing "high" (> 255) character values is I<not> a UTF-8 string.
195
196 It's a fact. Learn to live with it.
197
198 =back
199
200 I hope this helps :)
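Points 1, 4 and 5 can be made concrete with a small sketch that uses only the core Encode module. The variable names are illustrative assumptions. It shows that a Unicode string and its UTF-8 encoding are two different strings with different lengths.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Unicode string: one character with an ordinal value above 255.
my $unicode = "\x{263A}";

# Encoding it yields a binary string of three octets (E2 98 BA).
my $octets = encode("UTF-8", $unicode);

# Decoding the octets gives the original Unicode string back.
my $again = decode("UTF-8", $octets);

printf "characters: %d, octets: %d\n", length $unicode, length $octets;
```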
201
202
203 =head1 OBJECT-ORIENTED INTERFACE
204
205 The object oriented interface lets you configure your own encoding or
206 decoding style, within the limits of supported formats.
207
208 =over 4
209
210 =item $json = new JSON::XS
211
212 Creates a new JSON::XS object that can be used to de/encode JSON
213 strings. All boolean flags described below are by default I<disabled>.
214
215 The mutators for flags all return the JSON object again and thus calls can
216 be chained:
217
218 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
219 => {"a": [1, 2]}
220
221 =item $json = $json->ascii ([$enable])
222
223 =item $enabled = $json->get_ascii
224
225 If C<$enable> is true (or missing), then the C<encode> method will not
226 generate characters outside the code range C<0..127> (which is ASCII). Any
227 Unicode characters outside that range will be escaped using either a
228 single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
229 as per RFC4627. The resulting encoded JSON text can be treated as a native
230 Unicode string, an ascii-encoded, latin1-encoded or UTF-8 encoded string,
231 or any other superset of ASCII.
232
233 If C<$enable> is false, then the C<encode> method will not escape Unicode
234 characters unless required by the JSON syntax or other flags. This results
235 in a faster and more compact format.
236
237 The main use for this flag is to produce JSON texts that can be
238 transmitted over a 7-bit channel, as the encoded JSON texts will not
239 contain any 8 bit characters.
240
241 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
242 => ["\ud801\udc01"]
243
244 =item $json = $json->latin1 ([$enable])
245
246 =item $enabled = $json->get_latin1
247
248 If C<$enable> is true (or missing), then the C<encode> method will encode
249 the resulting JSON text as latin1 (or iso-8859-1), escaping any characters
250 outside the code range C<0..255>. The resulting string can be treated as a
251 latin1-encoded JSON text or a native Unicode string. The C<decode> method
252 will not be affected in any way by this flag, as C<decode> by default
253 expects Unicode, which is a strict superset of latin1.
254
255 If C<$enable> is false, then the C<encode> method will not escape Unicode
256 characters unless required by the JSON syntax or other flags.
257
258 The main use for this flag is efficiently encoding binary data as JSON
259 text, as most octets will not be escaped, resulting in a smaller encoded
260 size. The disadvantage is that the resulting JSON text is encoded
261 in latin1 (and must correctly be treated as such when storing and
262 transferring), a rare encoding for JSON. It is therefore most useful when
263 you want to store data structures known to contain binary data efficiently
264 in files or databases, not when talking to other JSON encoders/decoders.
265
266 JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
267 => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
268
269 =item $json = $json->utf8 ([$enable])
270
271 =item $enabled = $json->get_utf8
272
273 If C<$enable> is true (or missing), then the C<encode> method will encode
274 the JSON result into UTF-8, as required by many protocols, while the
275 C<decode> method expects to be handed a UTF-8-encoded string. Please
276 note that UTF-8-encoded strings do not contain any characters outside the
277 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
278 versions, enabling this option might enable autodetection of the UTF-16
279 and UTF-32 encoding families, as described in RFC4627.
280
281 If C<$enable> is false, then the C<encode> method will return the JSON
282 string as a (non-encoded) Unicode string, while C<decode> thus expects a
283 Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
284 to be done yourself, e.g. using the Encode module.
285
286 Example, output UTF-16BE-encoded JSON:
287
288 use Encode;
289 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
290
291 Example, decode UTF-32LE-encoded JSON:
292
293 use Encode;
294 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
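To compare the C<ascii> and C<utf8> flags with the default, the sketch below encodes the same one-character string three ways. It is a minimal illustration under stated assumptions: the JSON::PP fallback and all variable names are not part of the original text.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $smiley = ["\x{263A}"];    # one character above 255

my $chars  = $impl->new->encode($smiley);          # Unicode string
my $octets = $impl->new->utf8->encode($smiley);    # UTF-8 encoded octets
my $ascii  = $impl->new->ascii->encode($smiley);   # pure ASCII, \u escape

# With utf8 enabled, decode expects the octets and returns the character.
my $back = $impl->new->utf8->decode($octets);

printf "%d characters, %d octets, ascii form %s\n",
    length $chars, length $octets, $ascii;
```

The unencoded form is five characters long, the UTF-8 form seven octets, and the ASCII form spells the character as a C<\u> escape.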
295
296 =item $json = $json->pretty ([$enable])
297
298 This enables (or disables) all of the C<indent>, C<space_before> and
299 C<space_after> (and in the future possibly more) flags in one call to
300 generate the most readable (or most compact) form possible.
301
302 Example, pretty-print some simple structure:
303
304 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
305 =>
306 {
307    "a" : [
308       1,
309       2
310    ]
311 }
312
313 =item $json = $json->indent ([$enable])
314
315 =item $enabled = $json->get_indent
316
317 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
318 format as output, putting every array member or object/hash key-value pair
319 into its own line, indenting them properly.
320
321 If C<$enable> is false, no newlines or indenting will be produced, and the
322 resulting JSON text is guaranteed not to contain any C<newlines>.
323
324 This setting has no effect when decoding JSON texts.
325
326 =item $json = $json->space_before ([$enable])
327
328 =item $enabled = $json->get_space_before
329
330 If C<$enable> is true (or missing), then the C<encode> method will add an extra
331 optional space before the C<:> separating keys from values in JSON objects.
332
333 If C<$enable> is false, then the C<encode> method will not add any extra
334 space at those places.
335
336 This setting has no effect when decoding JSON texts. You will also
337 most likely combine this setting with C<space_after>.
338
339 Example, space_before enabled, space_after and indent disabled:
340
341 {"key" :"value"}
342
343 =item $json = $json->space_after ([$enable])
344
345 =item $enabled = $json->get_space_after
346
347 If C<$enable> is true (or missing), then the C<encode> method will add an extra
348 optional space after the C<:> separating keys from values in JSON objects
349 and extra whitespace after the C<,> separating key-value pairs and array
350 members.
351
352 If C<$enable> is false, then the C<encode> method will not add any extra
353 space at those places.
354
355 This setting has no effect when decoding JSON texts.
356
357 Example, space_before and indent disabled, space_after enabled:
358
359 {"key": "value"}
360
361 =item $json = $json->relaxed ([$enable])
362
363 =item $enabled = $json->get_relaxed
364
365 If C<$enable> is true (or missing), then C<decode> will accept some
366 extensions to normal JSON syntax (see below). C<encode> will not be
367 affected in any way. I<Be aware that this option makes you accept invalid
368 JSON texts as if they were valid!> I suggest using this option only to
369 parse application-specific files written by humans (configuration files,
370 resource files etc.).
371
372 If C<$enable> is false (the default), then C<decode> will only accept
373 valid JSON texts.
374
375 Currently accepted extensions are:
376
377 =over 4
378
379 =item * list items can have an end-comma
380
381 JSON I<separates> array elements and key-value pairs with commas. This
382 can be annoying if you write JSON texts manually and want to be able to
383 quickly append elements, so this extension accepts a comma at the end of
384 such items, not just between them:
385
386 [
387    1,
388    2, <- this comma not normally allowed
389 ]
390 {
391    "k1": "v1",
392    "k2": "v2", <- this comma not normally allowed
393 }
394
395 =item * shell-style '#'-comments
396
397 Whenever JSON allows whitespace, shell-style comments are additionally
398 allowed. They are terminated by the first carriage-return or line-feed
399 character, after which more white-space and comments are allowed.
400
401 [
402    1, # this comment not allowed in JSON
403       # neither this one...
404 ]
405
406 =back
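Both extensions can be exercised together. The following sketch decodes a hand-written text containing comments and a trailing comma, first with C<relaxed> enabled and then with a default decoder. The JSON::PP fallback and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $text = <<'EOT';
[
  1, # a shell-style comment
  2, # the comma before this comment is an end-comma
]
EOT

# relaxed accepts both extensions...
my $data = $impl->new->relaxed->decode($text);

# ...while a default decoder rejects the same text with an exception.
my $strict_ok = eval { $impl->new->decode($text); 1 };

printf "relaxed: %d elements, strict: %s\n",
    scalar @$data, ($strict_ok ? "accepted" : "rejected");
```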
407
408 =item $json = $json->canonical ([$enable])
409
410 =item $enabled = $json->get_canonical
411
412 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
413 by sorting their keys. This adds a comparatively high overhead.
414
415 If C<$enable> is false, then the C<encode> method will output key-value
416 pairs in the order Perl stores them (which will likely change between runs
417 of the same script).
418
419 This option is useful if you want the same data structure to be encoded as
420 the same JSON text (given the same overall settings). If it is disabled,
421 the same hash might be encoded differently even if it contains the same data,
422 as key-value pairs have no inherent ordering in Perl.
423
424 This setting has no effect when decoding JSON texts.
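A short sketch of the effect of C<canonical>, assuming the JSON::PP fallback shown in the first lines and illustrative variable names: with the flag enabled, keys are emitted in sorted order regardless of how Perl stores the hash.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my %config = (b => 1, a => 2, c => [3]);

# With canonical enabled the keys come out sorted.
my $text = $impl->new->canonical->encode(\%config);

print "$text\n";    # {"a":2,"b":1,"c":[3]}
```

Stable output of this kind is what makes comparing two JSON texts for equality meaningful.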
425
426 =item $json = $json->allow_nonref ([$enable])
427
428 =item $enabled = $json->get_allow_nonref
429
430 If C<$enable> is true (or missing), then the C<encode> method can convert a
431 non-reference into its corresponding string, number or null JSON value,
432 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
433 values instead of croaking.
434
435 If C<$enable> is false, then the C<encode> method will croak if it isn't
436 passed an arrayref or hashref, as JSON texts must either be an object
437 or array. Likewise, C<decode> will croak if given something that is not a
438 JSON object or array.
439
440 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
441 resulting in an invalid JSON text:
442
443 JSON::XS->new->allow_nonref->encode ("Hello, World!")
444 => "Hello, World!"
445
446 =item $json = $json->allow_blessed ([$enable])
447
448 =item $enabled = $json->get_allow_blessed
449
450 If C<$enable> is true (or missing), then the C<encode> method will not
451 barf when it encounters a blessed reference. Instead, the value of the
452 B<convert_blessed> option will decide whether C<null> (C<convert_blessed>
453 disabled or no C<TO_JSON> method found) or a representation of the
454 object (C<convert_blessed> enabled and C<TO_JSON> method found) is being
455 encoded. Has no effect on C<decode>.
456
457 If C<$enable> is false (the default), then C<encode> will throw an
458 exception when it encounters a blessed object.
459
460 =item $json = $json->convert_blessed ([$enable])
461
462 =item $enabled = $json->get_convert_blessed
463
464 If C<$enable> is true (or missing), then C<encode>, upon encountering a
465 blessed object, will check for the availability of the C<TO_JSON> method
466 on the object's class. If found, it will be called in scalar context
467 and the resulting scalar will be encoded instead of the object. If no
468 C<TO_JSON> method is found, the value of C<allow_blessed> will decide what
469 to do.
470
471 The C<TO_JSON> method may safely call die if it wants. If C<TO_JSON>
472 returns other blessed objects, those will be handled in the same
473 way. C<TO_JSON> must take care of not causing an endless recursion cycle
474 (== crash) in this case. The name of C<TO_JSON> was chosen because other
475 methods called by the Perl core (== not by the user of the object) are
476 usually in upper case letters and to avoid collisions with the C<to_json>
477 function.
478
479 This setting does not yet influence C<decode> in any way, but in the
480 future, global hooks might get installed that influence C<decode> and are
481 enabled by this setting.
482
483 If C<$enable> is false, then the C<allow_blessed> setting will decide what
484 to do when a blessed object is found.
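The interplay of C<allow_blessed> and C<convert_blessed> can be sketched as follows. The C<Point> class is hypothetical and exists only for this illustration, as do the variable names and the JSON::PP fallback.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

{
    # Hypothetical class, used only for this illustration.
    package Point;
    sub new     { my ($pkg, %args) = @_; return bless {%args}, $pkg }
    sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }
}

my $point = Point->new(x => 1, y => 2);

# convert_blessed: TO_JSON is called and its result is encoded instead.
my $converted = $impl->new->canonical->convert_blessed->encode([$point]);

# allow_blessed alone: the object is encoded as null.
my $nulled = $impl->new->allow_blessed->encode([$point]);

# Neither flag: encode throws an exception.
my $plain_ok = eval { $impl->new->encode([$point]); 1 };

print "$converted $nulled ", ($plain_ok ? "encoded" : "refused"), "\n";
```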
485
486 =item $json = $json->filter_json_object ([$coderef->($hashref)])
487
488 When C<$coderef> is specified, it will be called from C<decode> each
489 time it decodes a JSON object. The only argument is a reference to the
490 newly-created hash. If the code reference returns a single scalar (which
491 need not be a reference), this value (i.e. a copy of that scalar to avoid
492 aliasing) is inserted into the deserialised data structure. If it returns
493 an empty list (NOTE: I<not> C<undef>, which is a valid scalar), the
494 original deserialised hash will be inserted. This setting can slow down
495 decoding considerably.
496
497 When C<$coderef> is omitted or undefined, any existing callback will
498 be removed and C<decode> will not change the deserialised hash in any
499 way.
500
501 Example, convert all JSON objects into the integer 5:
502
503 my $js = JSON::XS->new->filter_json_object (sub { 5 });
504 # returns [5]
505 $js->decode ('[{}]');
506 # throws an exception because allow_nonref is not enabled,
507 # so a lone 5 is not allowed.
508 $js->decode ('{"a":1, "b":2}');
509
510 =item $json = $json->filter_json_single_key_object ($key [=> $coderef->($value)])
511
512 Works remotely similar to C<filter_json_object>, but is only called for
513 JSON objects having a single key named C<$key>.
514
515 This C<$coderef> is called before the one specified via
516 C<filter_json_object>, if any. It gets passed the single value in the JSON
517 object. If it returns a single value, it will be inserted into the data
518 structure. If it returns nothing (not even C<undef> but the empty list),
519 the callback from C<filter_json_object> will be called next, as if no
520 single-key callback were specified.
521
522 If C<$coderef> is omitted or undefined, the corresponding callback will be
523 disabled. There can only ever be one callback for a given key.
524
525 As this callback gets called less often than the C<filter_json_object>
526 one, decoding speed will not usually suffer as much. Therefore, single-key
527 objects make excellent targets to serialise Perl objects into, especially
528 as single-key JSON objects are as close to the type-tagged value concept
529 as JSON gets (it's basically an ID/VALUE tuple). Of course, JSON does not
530 support this in any way, so you need to make sure your data never looks
531 like a serialised Perl hash.
532
533 Typical names for the single object key are C<__class_whatever__>, or
534 C<$__dollars_are_rarely_used__$> or C<}ugly_brace_placement>, or even
535 things like C<__class_md5sum(classname)__>, to reduce the risk of clashing
536 with real hashes.
537
538 Example, decode JSON objects of the form C<< { "__widget__" => <id> } >>
539 into the corresponding C<< $WIDGET{<id>} >> object:
540
541 # return whatever is in $WIDGET{5}:
542 JSON::XS
543    ->new
544    ->filter_json_single_key_object (__widget__ => sub {
545       $WIDGET{ $_[0] }
546    })
547    ->decode ('{"__widget__": 5}')
548
549 # this can be used with a TO_JSON method in some "widget" class
550 # for serialisation to json:
551 sub WidgetBase::TO_JSON {
552    my ($self) = @_;
553
554    unless ($self->{id}) {
555       $self->{id} = ..get..some..id..;
556       $WIDGET{$self->{id}} = $self;
557    }
558
559    { __widget__ => $self->{id} }
560 }
561
562 =item $json = $json->shrink ([$enable])
563
564 =item $enabled = $json->get_shrink
565
566 Perl usually over-allocates memory a bit when allocating space for
567 strings. This flag optionally resizes strings generated by either
568 C<encode> or C<decode> to their minimum size possible. This can save
569 memory when your JSON texts are either very very long or you have many
570 short strings. It will also try to downgrade any strings to octet-form
571 if possible: perl stores strings internally either in an encoding called
572 UTF-X or in octet-form. The latter cannot store everything but uses less
573 space in general (and some buggy Perl or C code might even rely on that
574 internal representation being used).
575
576 The actual definition of what shrink does might change in future versions,
577 but it will always try to save space at the expense of time.
578
579 If C<$enable> is true (or missing), the string returned by C<encode> will
580 be shrunk-to-fit, while all strings generated by C<decode> will also be
581 shrunk-to-fit.
582
583 If C<$enable> is false, then the normal perl allocation algorithms are used.
584 If you work with your data, then this is likely to be faster.
585
586 In the future, this setting might control other things, such as converting
587 strings that look like integers or floats into integers or floats
588 internally (there is no difference on the Perl level), saving space.
589
590 =item $json = $json->max_depth ([$maximum_nesting_depth])
591
592 =item $max_depth = $json->get_max_depth
593
594 Sets the maximum nesting level (default C<512>) accepted while encoding
595 or decoding. If the JSON text or Perl data structure has an equal or
596 higher nesting level than this limit, then the encoder and decoder will
597 stop and croak at that point.
598
599 Nesting level is defined by number of hash- or arrayrefs that the encoder
600 needs to traverse to reach a given point or the number of C<{> or C<[>
601 characters without their matching closing parenthesis crossed to reach a
602 given character in a string.
603
604 Setting the maximum depth to one disallows any nesting, so that ensures
605 that the object is only a single hash/object or array.
606
607 The argument to C<max_depth> will be rounded up to the next highest power
608 of two. If no argument is given, the highest possible setting will be
609 used, which is rarely useful.
610
611 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
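A minimal sketch of the nesting limit in action, assuming the JSON::PP fallback and illustrative variable names. The limit of 2 is chosen because it is already a power of two, so no rounding applies.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $json = $impl->new->max_depth(2);

# A single, un-nested array is within the limit.
my $shallow = eval { $json->decode('[1]') };

# Three levels of nesting exceed it, so decode croaks.
my $deep_ok = eval { $json->decode('[[[1]]]'); 1 };

printf "shallow: %s, deep: %s\n",
    (ref $shallow ? "decoded" : "rejected"),
    ($deep_ok     ? "decoded" : "rejected");
```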
612
613 =item $json = $json->max_size ([$maximum_string_size])
614
615 =item $max_size = $json->get_max_size
616
617 Set the maximum length a JSON text may have (in bytes) where decoding is
618 being attempted. The default is C<0>, meaning no limit. When C<decode>
619 is called on a string longer than this number of bytes it will not
620 attempt to decode the string but throw an exception. This setting has no
621 effect on C<encode> (yet).
622
623 The argument to C<max_size> will be rounded up to the next B<highest>
624 power of two (so may be more than requested). If no argument is given, the
625 limit check will be deactivated (same as when C<0> is specified).
626
627 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
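The size limit can be sketched the same way. The JSON::PP fallback and the variable names are assumptions. The limit of 16 is a power of two, and the oversized text is far above it, so rounding does not affect the outcome.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $json = $impl->new->max_size(16);

my $small = '[1,2,3]';                            # 7 bytes
my $large = '[' . join(',', (1) x 100) . ']';     # 201 bytes

# The small text is decoded; the large one is refused before parsing.
my $small_ok = eval { $json->decode($small); 1 };
my $large_ok = eval { $json->decode($large); 1 };

printf "small: %s, large: %s\n",
    ($small_ok ? "decoded" : "rejected"),
    ($large_ok ? "decoded" : "rejected");
```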
628
629 =item $json_text = $json->encode ($perl_scalar)
630
631 Converts the given Perl data structure (a simple scalar or a reference
632 to a hash or array) to its JSON representation. Simple scalars will be
633 converted into JSON string or number sequences, while references to arrays
634 become JSON arrays and references to hashes become JSON objects. Undefined
635 Perl values (e.g. C<undef>) become JSON C<null> values. JSON C<true> and
636 C<false> are generated only for the special values described under MAPPING, below.
637
638 =item $perl_scalar = $json->decode ($json_text)
639
640 The opposite of C<encode>: expects a JSON text and tries to parse it,
641 returning the resulting simple scalar or reference. Croaks on error.
642
643 JSON numbers and strings become simple Perl scalars. JSON arrays become
644 Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
645 C<JSON::XS::true>, C<false> becomes C<JSON::XS::false> and C<null>
646 becomes C<undef> (see MAPPING, below).
646
647 =item ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
648
649 This works like the C<decode> method, but instead of raising an exception
650 when there is trailing garbage after the first JSON object, it will
651 silently stop parsing there and return the number of characters consumed
652 so far.
653
654 This is useful if your JSON texts are not delimited by an outer protocol
655 (which is not the brightest thing to do in the first place) and you need
656 to know where the JSON text ends.
657
658 JSON::XS->new->decode_prefix ("[1] the tail")
659 => ([1], 3)
660
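One possible use of C<decode_prefix>, sketched under stated assumptions: several JSON texts are concatenated in one buffer, and the returned character count is used to strip each text after it has been decoded. The buffer contents, the variable names and the JSON::PP fallback are illustrative only.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $json   = $impl->new;
my $buffer = '[1] [2,3] {"a":4}';
my @docs;

# Decode one JSON text at a time and remove the consumed characters.
while ($buffer =~ /\S/) {
    my ($doc, $consumed) = $json->decode_prefix($buffer);
    push @docs, $doc;
    substr($buffer, 0, $consumed) = '';
}

printf "decoded %d JSON texts\n", scalar @docs;
```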
661 =back
662
663
664 =head1 MAPPING
665
666 This section describes how JSON::XS maps Perl values to JSON values and
667 vice versa. These mappings are designed to "do the right thing" in most
668 circumstances automatically, preserving round-tripping characteristics
669 (what you put in comes out as something equivalent).
670
671 For the more enlightened: note that in the following descriptions,
672 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
673 refers to the abstract Perl language itself.
674
675
676 =head2 JSON -> PERL
677
678 =over 4
679
680 =item object
681
682 A JSON object becomes a reference to a hash in Perl. No ordering of object
683 keys is preserved (JSON does not preserve object key ordering itself).
684
685 =item array
686
687 A JSON array becomes a reference to an array in Perl.
688
689 =item string
690
691 A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
692 are represented by the same codepoints in the Perl string, so no manual
693 decoding is necessary.
694
695 =item number
696
697 A JSON number becomes either an integer, numeric (floating point) or
698 string scalar in perl, depending on its range and any fractional parts. On
699 the Perl level, there is no difference between those as Perl handles all
700 the conversion details, but an integer may take slightly less memory and
701 might represent more values exactly than (floating point) numbers.
702
703 If the number consists of digits only, JSON::XS will try to represent
704 it as an integer value. If that fails, it will try to represent it as
705 a numeric (floating point) value if that is possible without loss of
706 precision. Otherwise it will preserve the number as a string value.
707
708 Numbers containing a fractional or exponential part will always be
709 represented as numeric (floating point) values, possibly at a loss of
710 precision.
711
712 This might create round-tripping problems as numbers might become strings,
713 but as Perl is typeless there is no other way to do it.
714
715 =item true, false
716
717 These JSON atoms become C<JSON::XS::true> and C<JSON::XS::false>,
718 respectively. They are overloaded to act almost exactly like the numbers
719 C<1> and C<0>. You can check whether a scalar is a JSON boolean by using
720 the C<JSON::XS::is_bool> function.
721
722 =item null
723
724 A JSON null atom becomes C<undef> in Perl.
725
726 =back
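The JSON to Perl mapping above can be checked with a short sketch. It decodes one value of each kind and inspects the results. The JSON::PP fallback and variable names are assumptions, and C<is_bool> is looked up on whichever module was loaded.

```perl
use strict;
use warnings;

# Assumption: use JSON::XS when installed, else the API-compatible JSON::PP.
my $impl = eval { require JSON::XS; 'JSON::XS' }
        || do { require JSON::PP; 'JSON::PP' };

my $data = $impl->new->decode('[true, false, null, "2.0", 10, {"k": [1]}]');
my ($t, $f, $null, $str, $num, $obj) = @$data;

# is_bool lives in the loaded module's package (e.g. JSON::XS::is_bool).
my $is_bool = $impl->can('is_bool');

printf "true is a JSON boolean: %s, string kept as: %s\n",
    ($is_bool->($t) ? "yes" : "no"), $str;
```

The booleans behave like C<1> and C<0> in boolean context, C<null> arrives as C<undef>, and the string C<"2.0"> is kept as a string rather than turned into a number.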
727
728
729 =head2 PERL -> JSON
730
731 The mapping from Perl to JSON is slightly more difficult, as Perl is a
732 truly typeless language, so we can only guess which JSON type is meant by
733 a Perl value.
734
735 =over 4
736
737 =item hash references
738
739 Perl hash references become JSON objects. As there is no inherent ordering
740 in hash keys (or JSON objects), they will usually be encoded in a
741 pseudo-random order that can change between runs of the same program but
742 stays generally the same within a single run of a program. JSON::XS can
743 optionally sort the hash keys (determined by the I<canonical> flag), so
744 the same datastructure will serialise to the same JSON text (given same
745 settings and version of JSON::XS), but this incurs a runtime overhead
746 and is only rarely useful, e.g. when you want to compare some JSON text
747 against another for equality.
748
749 =item array references
750
751 Perl array references become JSON arrays.
752
753 =item other references
754
755 Other unblessed references are generally not allowed and will cause an
756 exception to be thrown, except for references to the integers C<0> and
757 C<1>, which get turned into C<false> and C<true> atoms in JSON. You can
758 also use C<JSON::XS::false> and C<JSON::XS::true> to improve readability.
759
760 to_json [\0,JSON::XS::true] # yields [false,true]
761
762 =item JSON::XS::true, JSON::XS::false
763
764 These special values become JSON true and JSON false values,
765 respectively. You can also use C<\1> and C<\0> directly if you want.
766
767 =item blessed objects
768
769 Blessed objects are not allowed by default: C<encode> throws an exception
770 when it encounters one. See the C<allow_blessed> and C<convert_blessed>
771 options, above, for ways to change this behaviour.
772
773 =item simple scalars
774
775 Simple Perl scalars (any scalar that is not a reference) are the most
776 difficult objects to encode: JSON::XS will encode undefined scalars as
777 JSON null value, scalars that have last been used in a string context
778 before encoding as JSON strings and anything else as number value:
779
780 # dump as number
781 to_json [2] # yields [2]
782 to_json [-3.0e17] # yields [-3e+17]
783 my $value = 5; to_json [$value] # yields [5]
784
785 # used as string, so dump as string
786 print $value;
787 to_json [$value] # yields ["5"]
788
789 # undef becomes null
790 to_json [undef] # yields [null]
791
792 You can force the type to be a JSON string by stringifying it:
793
794 my $x = 3.1; # some variable containing a number
795 "$x"; # stringified
796 $x .= ""; # another, more awkward way to stringify
797 print $x; # perl does it for you, too, quite often
798
799 You can force the type to be a JSON number by numifying it:
800
801 my $x = "3"; # some variable containing a string
802 $x += 0; # numify it, ensuring it will be dumped as a number
803 $x *= 1; # same thing, the choice is yours.
804
805 You can not currently force the type in other, less obscure, ways. Tell me
806 if you need this capability.
807
808 =back
809
810
811 =head1 COMPARISON
812
813 As already mentioned, this module was created because none of the existing
814 JSON modules could be made to work correctly. First I will describe the
815 problems (or pleasures) I encountered with various existing JSON modules,
816 followed by some benchmark values. JSON::XS was designed not to suffer
817 from any of these problems or limitations.
818
819 =over 4
820
821 =item JSON 1.07
822
823 Slow (but very portable, as it is written in pure Perl).
824
825 Undocumented/buggy Unicode handling (how JSON handles Unicode values is
826 undocumented. One can get far by feeding it Unicode strings and doing
827 en-/decoding oneself, but Unicode escapes are not working properly).
828
829 No round-tripping (strings get clobbered if they look like numbers, e.g.
830 the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
831 decode into the number 2).
832
833 =item JSON::PC 0.01
834
835 Very fast.
836
837 Undocumented/buggy Unicode handling.
838
839 No round-tripping.
840
841 Has problems handling many Perl values (e.g. regex results and other magic
842 values will make it croak).
843
844 Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>,
845 which is not a valid JSON text).
846
847 Unmaintained (maintainer unresponsive for many months, bugs are not
848 getting fixed).
849
850 =item JSON::Syck 0.21
851
852 Very buggy (often crashes).
853
854 Very inflexible (no human-readable format supported, format pretty much
855 undocumented. I need at least a format for easy reading by humans and a
856 single-line compact format for use in a protocol, and preferably a way to
857 generate ASCII-only JSON texts).
858
859 Completely broken (and confusingly documented) Unicode handling (Unicode
860 escapes are not working properly, you need to set ImplicitUnicode to
861 I<different> values on en- and decoding to get symmetric behaviour).
862
863 No round-tripping (simple cases work, but this depends on whether the scalar
864 value was used in a numeric context or not).
865
866 Dumping hashes may skip hash values depending on iterator state.
867
868 Unmaintained (maintainer unresponsive for many months, bugs are not
869 getting fixed).
870
871 Does not check input for validity (i.e. will accept non-JSON input and
872 return "something" instead of raising an exception. This is a security
873 issue: imagine two banks transferring money between each other using
874 JSON. One bank might parse a given non-JSON request and deduct money,
875 while the other might reject the transaction with a syntax error. While a
876 good protocol will at least recover, that is extra unnecessary work and
877 the transaction will still not succeed).
878
879 =item JSON::DWIW 0.04
880
881 Very fast. Very natural. Very nice.
882
883 Undocumented Unicode handling (but the best of the pack. Unicode escapes
884 still don't get parsed properly).
885
886 Very inflexible.
887
888 No round-tripping.
889
890 Does not generate valid JSON texts (key strings are often unquoted, empty keys
891 result in nothing being output).
892
893 Does not check input for validity.
894
895 =back
896
897
898 =head2 JSON and YAML
899
900 You often hear that JSON is a subset (or a close subset) of YAML. This is,
901 however, a mass hysteria and very far from the truth. In general, there is
902 no way to configure JSON::XS to output a data structure as valid YAML.
903
904 If you really must use JSON::XS to generate YAML, you should use this
905 algorithm (subject to change in future versions):
906
907 my $to_yaml = JSON::XS->new->utf8->space_after (1);
908 my $yaml = $to_yaml->encode ($ref) . "\n";
909
910 This will usually generate JSON texts that also parse as valid
911 YAML. Please note that YAML has hardcoded limits on (simple) object key
912 lengths that JSON doesn't have, so you should make sure that your hash
913 keys are noticeably shorter than the 1024 characters YAML allows.
914
915 There might be other incompatibilities that I am not aware of. In general
916 you should not try to generate YAML with a JSON generator or vice versa,
917 or try to parse JSON with a YAML parser or vice versa: chances are high
918 that you will run into severe interoperability problems.
919
920
921 =head2 SPEED
922
923 It seems that JSON::XS is surprisingly fast, as shown in the following
924 tables. They have been generated with the help of the C<eg/bench> program
925 in the JSON::XS distribution, to make it easy to compare on your own
926 system.
927
928 First comes a comparison between various modules using a very short
929 single-line JSON string:
930
931 {"method": "handleMessage", "params": ["user1", "we were just talking"], \
932 "id": null, "array":[1,11,234,-5,1e5,1e7, true, false]}
933
934 It shows the number of encodes/decodes per second (JSON::XS uses
935 the functional interface, while JSON::XS/2 uses the OO interface
936 with pretty-printing and hashkey sorting enabled, JSON::XS/3 enables
937 shrink). Higher is better:
938
939 module     |     encode |     decode |
940 -----------|------------|------------|
941 JSON 1.x   |   4990.842 |   4088.813 |
942 JSON::DWIW |  51653.990 |  71575.154 |
943 JSON::PC   |  65948.176 |  74631.744 |
944 JSON::PP   |   8931.652 |   3817.168 |
945 JSON::Syck |  24877.248 |  27776.848 |
946 JSON::XS   | 388361.481 | 227951.304 |
947 JSON::XS/2 | 227951.304 | 218453.333 |
948 JSON::XS/3 | 338250.323 | 218453.333 |
949 Storable   |  16500.016 | 135300.129 |
950 -----------+------------+------------+
951
952 That is, JSON::XS is about five times faster than JSON::DWIW on encoding,
953 about three times faster on decoding, and over forty times faster
954 than JSON, even with pretty-printing and key sorting. It also compares
955 favourably to Storable for small amounts of data.
956
957 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
958 search API (http://nanoref.com/yahooapis/mgPdGg)):
959
960 module     |     encode |     decode |
961 -----------|------------|------------|
962 JSON 1.x   |     55.260 |     34.971 |
963 JSON::DWIW |    825.228 |   1082.513 |
964 JSON::PC   |   3571.444 |   2394.829 |
965 JSON::PP   |    210.987 |     32.574 |
966 JSON::Syck |    552.551 |    787.544 |
967 JSON::XS   |   5780.463 |   4854.519 |
968 JSON::XS/2 |   3869.998 |   4798.975 |
969 JSON::XS/3 |   5862.880 |   4798.975 |
970 Storable   |   4445.002 |   5235.027 |
971 -----------+------------+------------+
972
973 Again, JSON::XS leads by far (except for Storable, which unsurprisingly
974 decodes faster).
975
976 On large strings containing lots of high Unicode characters, some modules
977 (such as JSON::PC) seem to decode faster than JSON::XS, but the result
978 will be broken due to missing (or wrong) Unicode handling. Others refuse
979 to decode or encode properly, so it was impossible to prepare a fair
980 comparison table for that case.
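
For a quick measurement on your own system without the full C<eg/bench> program, something along these lines might do. It is a rough sketch and not the script that produced the tables above: it relies on the core C<Benchmark> module, times only JSON::XS through the OO interface, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use JSON::XS;

# the short single-line test string quoted earlier in this section
my $json = '{"method": "handleMessage", "params": ["user1", "we were just talking"], '
         . '"id": null, "array":[1,11,234,-5,1e5,1e7, true, false]}';

my $coder = JSON::XS->new->utf8;
my $perl  = $coder->decode ($json);

# a negative count asks Benchmark to run each sub for at least that many CPU seconds
my $results = timethese (-1, {
   encode => sub { $coder->encode ($perl) },
   decode => sub { $coder->decode ($json) },
});
```

C<timethese> prints one line per entry including the rate per second, which is the figure the tables report. Absolute numbers depend heavily on the machine and the perl build.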
981
982
983 =head1 SECURITY CONSIDERATIONS
984
985 When you are using JSON in a protocol, talking to untrusted, potentially
986 hostile creatures requires relatively few measures.
987
988 First of all, your JSON decoder should be secure, that is, should not have
989 any buffer overflows. Obviously, this module should ensure that and I am
990 trying hard to make that true, but you never know.
991
992 Second, you need to avoid resource-starving attacks. That means you should
993 limit the size of JSON texts you accept, or make sure that when your
994 resources run out, that's just fine (e.g. by using a separate process that
995 can crash safely). The size of a JSON text in octets or characters is
996 usually a good indication of the size of the resources required to decode
997 it into a Perl structure. While JSON::XS can check the size of the JSON
998 text, it might be too late when you already have it in memory, so you
999 might want to check the size before you accept the string.
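
A size check before decoding could look like the following sketch. The limit C<$MAX_OCTETS> and the helper C<decode_limited> are hypothetical names invented for this example, not part of JSON::XS. In a real protocol the check belongs even earlier, while reading from the socket, so that an oversized text never ends up in memory at all.

```perl
use strict;
use warnings;
use JSON::XS;

# hypothetical limit for this example; choose one that fits your protocol
my $MAX_OCTETS = 64 * 1024;

# hypothetical helper: refuse oversized input before handing it to the decoder
sub decode_limited {
   my ($octets) = @_;

   die "JSON text too large\n"
      if length $octets > $MAX_OCTETS;

   JSON::XS->new->utf8->decode ($octets)
}

my $small = decode_limited ('[1,2]');

my $big   = "[" . ("1," x 100_000) . "1]";
my $big_ok = eval { decode_limited ($big); 1 } ? 1 : 0;
my $big_error = $@;
```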
1000
1001 Third, JSON::XS recurses using the C stack when decoding objects and
1002 arrays. The C stack is a limited resource: for instance, on my amd64
1003 machine with 8MB of stack size I can decode around 180k nested arrays but
1004 only 14k nested JSON objects (due to perl itself recursing deeply on croak
1005 to free the temporary). If that is exceeded, the program crashes. To be
1006 conservative, the default nesting limit is set to 512. If your process
1007 has a smaller stack, you should adjust this setting accordingly with the
1008 C<max_depth> method.
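
As an illustration of the C<max_depth> setting, the sketch below lowers the limit and feeds the decoder one text inside the limit and one beyond it. The value 16 and the variable names are arbitrary choices for the example.

```perl
use strict;
use warnings;
use JSON::XS;

# arbitrary limit for illustration; pick one that suits your stack size
my $coder = JSON::XS->new->max_depth (16);

my $shallow = ("[" x 8)  . ("]" x 8);    # 8 nested arrays, within the limit
my $deep    = ("[" x 32) . ("]" x 32);   # 32 nested arrays, beyond the limit

my $shallow_ok = eval { $coder->decode ($shallow); 1 } ? 1 : 0;
my $deep_ok    = eval { $coder->decode ($deep);    1 } ? 1 : 0;
my $error      = $@;

print "shallow accepted: $shallow_ok, deep accepted: $deep_ok\n";
```

Exceeding the limit raises an ordinary exception that C<eval> can catch, unlike running out of C stack, which takes the whole process down.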
1009
1010 And last but least, something else could bomb you that I forgot to think
1011 of. In that case, you get to keep the pieces. I am always open for hints,
1012 though...
1013
1014 If you are using JSON::XS to return packets for consumption
1015 by JavaScript scripts in a browser you should have a look at
1016 L<http://jpsykes.com/47/practical-csrf-and-json-security> to see whether
1017 you are vulnerable to some common attack vectors (which really are browser
1018 design bugs, but it is still you who will have to deal with it, as major
1019 browser developers care only for features, not about doing security
1020 right).
1021
1022
1023 =head1 THREADS
1024
1025 This module is I<not> guaranteed to be thread safe and there are no
1026 plans to change this until Perl gets thread support (as opposed to the
1027 horribly slow so-called "threads" which are simply slow and bloated
1028 process simulations - use fork, it's I<much> faster, cheaper, better).
1029
1030 (It might actually work, but you have been warned).
1031
1032
1033 =head1 BUGS
1034
1035 While the goal of this module is to be correct, that unfortunately does
1036 not mean it's bug-free, only that I think its design is bug-free. It is
1037 still relatively early in its development. If you keep reporting bugs they
1038 will be fixed swiftly, though.
1039
1040 Please refrain from using rt.cpan.org or any other bug reporting
1041 service. I put the contact address into my modules for a reason.
1042
1043 =cut
1044
1045 our $true = do { bless \(my $dummy = 1), "JSON::XS::Boolean" };
1046 our $false = do { bless \(my $dummy = 0), "JSON::XS::Boolean" };
1047
1048 sub true() { $true }
1049 sub false() { $false }
1050
1051 sub is_bool($) {
1052 UNIVERSAL::isa $_[0], "JSON::XS::Boolean"
1053 # or UNIVERSAL::isa $_[0], "JSON::Literal"
1054 }
1055
1056 XSLoader::load "JSON::XS", $VERSION;
1057
1058 package JSON::XS::Boolean;
1059
1060 use overload
1061 "0+" => sub { ${$_[0]} },
1062 "++" => sub { $_[0] = ${$_[0]} + 1 },
1063 "--" => sub { $_[0] = ${$_[0]} - 1 },
1064 fallback => 1;
1065
1066 1;
1067
1068 =head1 AUTHOR
1069
1070 Marc Lehmann <schmorp@schmorp.de>
1071 http://home.schmorp.de/
1072
1073 =cut
1074