/cvs/JSON-XS/XS.pm
Revision: 1.76
Committed: Sun Dec 2 15:34:13 2007 UTC (16 years, 5 months ago) by root
Branch: MAIN
Changes since 1.75: +2 -2 lines

File Contents

# Content
1 =head1 NAME
2
3 JSON::XS - JSON serialising/deserialising, done correctly and fast
4
5 JSON::XS - correct and fast JSON serialiser/deserialiser
6 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
7
8 =head1 SYNOPSIS
9
10 use JSON::XS;
11
12 # exported functions, they croak on error
13 # and expect/generate UTF-8
14
15 $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
16 $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
17
18 # OO-interface
19
20 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
21 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
22 $perl_scalar = $coder->decode ($unicode_json_text);
23
24 =head1 DESCRIPTION
25
26 This module converts Perl data structures to JSON and vice versa. Its
27 primary goal is to be I<correct> and its secondary goal is to be
28 I<fast>. To reach the latter goal it was written in C.
29
30 As this is the n-th-something JSON module on CPAN, what was the reason
31 to write yet another JSON module? While it seems there are many JSON
32 modules, none of them correctly handle all corner cases, and in most cases
33 their maintainers are unresponsive, gone missing, or not listening to bug
34 reports for other reasons.
35
36 See COMPARISON, below, for a comparison to some other JSON modules.
37
38 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
39 vice versa.
40
41 =head2 FEATURES
42
43 =over 4
44
45 =item * correct Unicode handling
46
47 This module knows how to handle Unicode, and even documents how and when
48 it does so.
49
50 =item * round-trip integrity
51
52 When you serialise a perl data structure using only datatypes supported
53 by JSON, the deserialised data structure is identical on the Perl level.
54 (e.g. the string "2.0" doesn't suddenly become "2" just because it looks
55 like a number).
56
57 =item * strict checking of JSON correctness
58
59 There is no guessing, no generating of illegal JSON texts by default,
60 and only JSON is accepted as input by default (the latter is a security
61 feature).
62
63 =item * fast
64
65 Compared to other JSON modules, this module compares favourably in terms
66 of speed, too.
67
68 =item * simple to use
69
70 This module has both a simple functional interface as well as an OO
71 interface.
72
73 =item * reasonably versatile output formats
74
75 You can choose between the most compact guaranteed single-line format
76 possible (nice for simple line-based protocols), a pure-ascii format
77 (for when your transport is not 8-bit clean, still supports the whole
78 Unicode range), or a pretty-printed format (for when you want to read that
79 stuff). Or you can combine those features in whatever way you like.
80
81 =back
82
83 =cut
84
85 package JSON::XS;
86
87 use strict;
88
89 our $VERSION = '2.0';
90 our @ISA = qw(Exporter);
91
92 our @EXPORT = qw(to_json from_json);
93
94 use Exporter;
95 use XSLoader;
96
97 =head1 FUNCTIONAL INTERFACE
98
99 The following convenience functions are provided by this module. They are
100 exported by default:
101
102 =over 4
103
104 =item $json_text = to_json $perl_scalar
105
106 Converts the given Perl data structure to a UTF-8 encoded, binary string
107 (that is, the string contains octets only). Croaks on error.
108
109 This function call is functionally identical to:
110
111 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
112
113 except being faster.
114
115 =item $perl_scalar = from_json $json_text
116
117 The opposite of C<to_json>: expects a UTF-8 (binary) string and tries
118 to parse that as a UTF-8 encoded JSON text, returning the resulting
119 reference. Croaks on error.
120
121 This function call is functionally identical to:
122
123 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
124
125 except being faster.
126
127 =item $is_boolean = JSON::XS::is_bool $scalar
128
129 Returns true if the passed scalar represents either JSON::XS::true or
130 JSON::XS::false, two constants that act like C<1> and C<0>, respectively
131 and are used to represent JSON C<true> and C<false> values in Perl.
132
133 See MAPPING, below, for more information on how JSON values are mapped to
134 Perl.
135
136 =back
137
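As a minimal sketch of the equivalence stated above, the round trip below uses only the object-oriented calls that C<to_json> and C<from_json> are documented to match, rather than the exported functions themselves. The sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# A coder with utf8 enabled: the documented OO equivalent of to_json/from_json.
my $coder = JSON::XS->new->utf8;

# Illustrative data only.
my $data = { name => "caf\x{e9}", list => [1, 2, 3] };

# encode returns a UTF-8 encoded octet string.
my $json_text = $coder->encode ($data);

# decode turns the octets back into an equivalent Perl structure.
my $copy = $coder->decode ($json_text);

print "name survived the round trip\n" if $copy->{name} eq $data->{name};
print "list has ", scalar @{ $copy->{list} }, " elements\n";
```

Both directions croak on error, so real code would normally wrap them in C<eval> when the input is untrusted.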
138
139 =head1 A FEW NOTES ON UNICODE AND PERL
140
141 Since this often leads to confusion, here are a few very clear words on
142 how Unicode works in Perl, modulo bugs.
143
144 =over 4
145
146 =item 1. Perl strings can store characters with ordinal values > 255.
147
148 This enables you to store Unicode characters as single characters in a
149 Perl string - very natural.
150
151 =item 2. Perl does I<not> associate an encoding with your strings.
152
153 Unless you force it to, e.g. when matching it against a regex, or printing
154 the scalar to a file, in which case Perl either interprets your string as
155 locale-encoded text, octets/binary, or as Unicode, depending on various
156 settings. In no case is an encoding stored together with your data, it is
157 I<use> that decides encoding, not any magical metadata.
158
159 =item 3. The internal utf-8 flag has no meaning with regard to the
160 encoding of your string.
161
162 Just ignore that flag unless you debug a Perl bug, a module written in
163 XS or want to dive into the internals of perl. Otherwise it will only
164 confuse you, as, despite the name, it says nothing about how your string
165 is encoded. You can have Unicode strings with that flag set, with that
166 flag clear, and you can have binary data with that flag set and that flag
167 clear. Other possibilities exist, too.
168
169 If you didn't know about that flag, all the better: pretend it doesn't
170 exist.
171
172 =item 4. A "Unicode String" is simply a string where each character can be
173 validly interpreted as a Unicode codepoint.
174
175 If you have UTF-8 encoded data, it is no longer a Unicode string, but a
176 Unicode string encoded in UTF-8, giving you a binary string.
177
178 =item 5. A string containing "high" (> 255) character values is I<not> a UTF-8 string.
179
180 It's a fact. Learn to live with it.
181
182 =back
183
184 I hope this helps :)
185
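Points 4 and 5 can be illustrated with the core Encode module. This is only a sketch: the variable names are invented for the example, and the EURO SIGN is an arbitrary choice of a character with an ordinal value above 255.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Unicode (character) string: one character with ordinal value > 255.
my $unicode = "\x{20ac}";            # EURO SIGN
my $chars   = length $unicode;       # counts characters

# Encoding it as UTF-8 gives a binary string of octets,
# which is no longer a Unicode string.
my $octets  = encode ("UTF-8", $unicode);
my $bytes   = length $octets;        # counts octets

# Decoding the octets recovers the original Unicode string.
my $back    = decode ("UTF-8", $octets);

printf "%d character, %d octets, round trip %s\n",
   $chars, $bytes, ($back eq $unicode ? "ok" : "failed");
```

The character string and its UTF-8 encoded form are different strings with different lengths; which one you hold depends on what you did with the data, not on any flag.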
186
187 =head1 OBJECT-ORIENTED INTERFACE
188
189 The object oriented interface lets you configure your own encoding or
190 decoding style, within the limits of supported formats.
191
192 =over 4
193
194 =item $json = new JSON::XS
195
196 Creates a new JSON::XS object that can be used to de/encode JSON
197 strings. All boolean flags described below are by default I<disabled>.
198
199 The mutators for flags all return the JSON object again and thus calls can
200 be chained:
201
202 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
203 => {"a": [1, 2]}
204
205 =item $json = $json->ascii ([$enable])
206
207 =item $enabled = $json->get_ascii
208
209 If C<$enable> is true (or missing), then the C<encode> method will not
210 generate characters outside the code range C<0..127> (which is ASCII). Any
211 Unicode characters outside that range will be escaped using either a
212 single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
213 as per RFC4627. The resulting encoded JSON text can be treated as a native
214 Unicode string, an ascii-encoded, latin1-encoded or UTF-8 encoded string,
215 or any other superset of ASCII.
216
217 If C<$enable> is false, then the C<encode> method will not escape Unicode
218 characters unless required by the JSON syntax or other flags. This results
219 in a faster and more compact format.
220
221 The main use for this flag is to produce JSON texts that can be
222 transmitted over a 7-bit channel, as the encoded JSON texts will not
223 contain any 8 bit characters.
224
225 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
226 => ["\ud801\udc01"]
227
228 =item $json = $json->latin1 ([$enable])
229
230 =item $enabled = $json->get_latin1
231
232 If C<$enable> is true (or missing), then the C<encode> method will encode
233 the resulting JSON text as latin1 (or iso-8859-1), escaping any characters
234 outside the code range C<0..255>. The resulting string can be treated as a
235 latin1-encoded JSON text or a native Unicode string. The C<decode> method
236 will not be affected in any way by this flag, as C<decode> by default
237 expects Unicode, which is a strict superset of latin1.
238
239 If C<$enable> is false, then the C<encode> method will not escape Unicode
240 characters unless required by the JSON syntax or other flags.
241
242 The main use for this flag is efficiently encoding binary data as JSON
243 text, as most octets will not be escaped, resulting in a smaller encoded
244 size. The disadvantage is that the resulting JSON text is encoded
245 in latin1 (and must correctly be treated as such when storing and
246 transferring), a rare encoding for JSON. It is therefore most useful when
247 you want to store data structures known to contain binary data efficiently
248 in files or databases, not when talking to other JSON encoders/decoders.
249
250 JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
251 => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
252
253 =item $json = $json->utf8 ([$enable])
254
255 =item $enabled = $json->get_utf8
256
257 If C<$enable> is true (or missing), then the C<encode> method will encode
258 the JSON result into UTF-8, as required by many protocols, while the
259 C<decode> method expects to be handed a UTF-8-encoded string. Please
260 note that UTF-8-encoded strings do not contain any characters outside the
261 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
262 versions, enabling this option might enable autodetection of the UTF-16
263 and UTF-32 encoding families, as described in RFC4627.
264
265 If C<$enable> is false, then the C<encode> method will return the JSON
266 string as a (non-encoded) Unicode string, while C<decode> thus expects a
267 Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
268 to be done yourself, e.g. using the Encode module.
269
270 Example, output UTF-16BE-encoded JSON:
271
272 use Encode;
273 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
274
275 Example, decode UTF-32LE-encoded JSON:
276
277 use Encode;
278 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
279
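The difference between the two settings of C<utf8> shows up in the length of the result. The following sketch encodes the same one-element array both ways; the data is arbitrary and the variable names are only illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = ["\x{20ac}"];   # one-element array holding a EURO SIGN

# utf8 disabled: the result is a Unicode string, the euro is one character.
my $as_chars  = JSON::XS->new->encode ($data);

# utf8 enabled: the result is UTF-8 octets, the euro takes three octets.
my $as_octets = JSON::XS->new->utf8->encode ($data);

printf "characters: %d, octets: %d\n",
   length ($as_chars), length ($as_octets);
```

The first form is what you would pass to Encode yourself, the second is ready for bytewise I/O.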
280 =item $json = $json->pretty ([$enable])
281
282 This enables (or disables) all of the C<indent>, C<space_before> and
283 C<space_after> (and in the future possibly more) flags in one call to
284 generate the most readable (or most compact) form possible.
285
286 Example, pretty-print some simple structure:
287
288 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
289 =>
290 {
291    "a" : [
292       1,
293       2
294    ]
295 }
296
297 =item $json = $json->indent ([$enable])
298
299 =item $enabled = $json->get_indent
300
301 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
302 format as output, putting every array member or object/hash key-value pair
303 into its own line, indenting them properly.
304
305 If C<$enable> is false, no newlines or indenting will be produced, and the
306 resulting JSON text is guaranteed not to contain any C<newlines>.
307
308 This setting has no effect when decoding JSON texts.
309
310 =item $json = $json->space_before ([$enable])
311
312 =item $enabled = $json->get_space_before
313
314 If C<$enable> is true (or missing), then the C<encode> method will add an extra
315 optional space before the C<:> separating keys from values in JSON objects.
316
317 If C<$enable> is false, then the C<encode> method will not add any extra
318 space at those places.
319
320 This setting has no effect when decoding JSON texts. You will also
321 most likely combine this setting with C<space_after>.
322
323 Example, space_before enabled, space_after and indent disabled:
324
325 {"key" :"value"}
326
327 =item $json = $json->space_after ([$enable])
328
329 =item $enabled = $json->get_space_after
330
331 If C<$enable> is true (or missing), then the C<encode> method will add an extra
332 optional space after the C<:> separating keys from values in JSON objects
333 and extra whitespace after the C<,> separating key-value pairs and array
334 members.
335
336 If C<$enable> is false, then the C<encode> method will not add any extra
337 space at those places.
338
339 This setting has no effect when decoding JSON texts.
340
341 Example, space_before and indent disabled, space_after enabled:
342
343 {"key": "value"}
344
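The relationship between C<pretty> and the three formatting flags can be checked directly. This is a sketch with made-up data; it relies only on the methods described above.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => "value" };

# Each spacing flag on its own.
my $before = JSON::XS->new->space_before->encode ($data);
my $after  = JSON::XS->new->space_after->encode ($data);

# pretty is documented as enabling indent, space_before and space_after together.
my $pretty   = JSON::XS->new->pretty->encode ($data);
my $combined = JSON::XS->new->indent->space_before->space_after->encode ($data);

print "$before\n$after\n";
print $pretty;
print "pretty matches the three flags combined\n" if $pretty eq $combined;
```

Since none of these flags affects C<decode>, any of the outputs can be read back by a default decoder.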
345 =item $json = $json->relaxed ([$enable])
346
347 =item $enabled = $json->get_relaxed
348
349 If C<$enable> is true (or missing), then C<decode> will accept some
350 extensions to normal JSON syntax (see below). C<encode> will not be
351 affected in any way. I<Be aware that this option makes you accept invalid
352 JSON texts as if they were valid!> I suggest using this option only to
353 parse application-specific files written by humans (configuration files,
354 resource files etc.).
355
356 If C<$enable> is false (the default), then C<decode> will only accept
357 valid JSON texts.
358
359 Currently accepted extensions are:
360
361 =over 4
362
363 =item * list items can have an end-comma
364
365 JSON I<separates> array elements and key-value pairs with commas. This
366 can be annoying if you write JSON texts manually and want to be able to
367 quickly append elements, so this extension accepts a comma at the end of
368 such items not just between them:
369
370 [
371 1,
372 2, <- this comma not normally allowed
373 ]
374 {
375 "k1": "v1",
376 "k2": "v2", <- this comma not normally allowed
377 }
378
379 =item * shell-style '#'-comments
380
381 Whenever JSON allows whitespace, shell-style comments are additionally
382 allowed. They are terminated by the first carriage-return or line-feed
383 character, after which more white-space and comments are allowed.
384
385 [
386 1, # this comment not allowed in JSON
387 # neither this one...
388 ]
389
390 =back
391
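Both extensions can be seen together in a small hand-written configuration text. The sketch below assumes nothing beyond the C<relaxed> flag described above; the key name and host names are invented.

```perl
use strict;
use warnings;
use JSON::XS;

# Hand-written text using both extensions: a '#' comment and trailing commas.
my $text = <<'EOT';
{
   "hosts": [
      "alpha",   # primary
      "beta",
   ],
}
EOT

# Strict decoding (the default) rejects the text ...
my $strict_ok = eval { JSON::XS->new->decode ($text); 1 };

# ... while relaxed decoding accepts it.
my $config = JSON::XS->new->relaxed->decode ($text);

print "strict decode failed as expected\n" unless $strict_ok;
print "hosts: @{ $config->{hosts} }\n";
```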
392 =item $json = $json->canonical ([$enable])
393
394 =item $enabled = $json->get_canonical
395
396 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
397 by sorting their keys. This adds a comparatively high overhead.
398
399 If C<$enable> is false, then the C<encode> method will output key-value
400 pairs in the order Perl stores them (which will likely change between runs
401 of the same script).
402
403 This option is useful if you want the same data structure to be encoded as
404 the same JSON text (given the same overall settings). If it is disabled,
405 the same hash might be encoded differently even if it contains the same data,
406 as key-value pairs have no inherent ordering in Perl.
407
408 This setting has no effect when decoding JSON texts.
409
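A short sketch of the effect, using an arbitrary three-key hash:

```perl
use strict;
use warnings;
use JSON::XS;

my %h = (b => 2, c => 3, a => 1);

# With canonical enabled the keys are always emitted in sorted order,
# regardless of how Perl happens to store the hash.
my $sorted = JSON::XS->new->canonical->encode (\%h);

print "$sorted\n";   # {"a":1,"b":2,"c":3}
```

This makes the output suitable for string comparison or checksumming, at the cost of the sorting overhead mentioned above.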
410 =item $json = $json->allow_nonref ([$enable])
411
412 =item $enabled = $json->get_allow_nonref
413
414 If C<$enable> is true (or missing), then the C<encode> method can convert a
415 non-reference into its corresponding string, number or null JSON value,
416 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
417 values instead of croaking.
418
419 If C<$enable> is false, then the C<encode> method will croak if it isn't
420 passed an arrayref or hashref, as JSON texts must either be an object
421 or array. Likewise, C<decode> will croak if given something that is not a
422 JSON object or array.
423
424 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
425 resulting in an invalid JSON text:
426
427 JSON::XS->new->allow_nonref->encode ("Hello, World!")
428 => "Hello, World!"
429
430 =item $json = $json->allow_blessed ([$enable])
431
432 =item $enabled = $json->get_allow_blessed
433
434 If C<$enable> is true (or missing), then the C<encode> method will not
435 barf when it encounters a blessed reference. Instead, the value of the
436 B<convert_blessed> option will decide whether C<null> (C<convert_blessed>
437 disabled or no C<TO_JSON> method found) or a representation of the
438 object (C<convert_blessed> enabled and C<TO_JSON> method found) is being
439 encoded. Has no effect on C<decode>.
440
441 If C<$enable> is false (the default), then C<encode> will throw an
442 exception when it encounters a blessed object.
443
444 =item $json = $json->convert_blessed ([$enable])
445
446 =item $enabled = $json->get_convert_blessed
447
448 If C<$enable> is true (or missing), then C<encode>, upon encountering a
449 blessed object, will check for the availability of the C<TO_JSON> method
450 on the object's class. If found, it will be called in scalar context
451 and the resulting scalar will be encoded instead of the object. If no
452 C<TO_JSON> method is found, the value of C<allow_blessed> will decide what
453 to do.
454
455 The C<TO_JSON> method may safely call die if it wants. If C<TO_JSON>
456 returns other blessed objects, those will be handled in the same
457 way. C<TO_JSON> must take care of not causing an endless recursion cycle
458 (== crash) in this case. The name of C<TO_JSON> was chosen because other
459 methods called by the Perl core (== not by the user of the object) are
460 usually in upper case letters and to avoid collisions with the C<to_json>
461 function.
462
463 This setting does not yet influence C<decode> in any way, but in the
464 future, global hooks might get installed that influence C<decode> and are
465 enabled by this setting.
466
467 If C<$enable> is false, then the C<allow_blessed> setting will decide what
468 to do when a blessed object is found.
469
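The interplay of C<allow_blessed> and C<convert_blessed> can be sketched with a small class. C<My::Point> is a hypothetical class invented for this example; C<canonical> is enabled only to make the output order predictable.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical class, used only for this illustration.
package My::Point;
sub new     { my ($class, %args) = @_; bless { %args }, $class }
sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }

package main;

my $point = My::Point->new (x => 1, y => 2);

# Default: encode throws an exception for blessed objects.
my $default_ok = eval { JSON::XS->new->encode ([$point]); 1 };

# allow_blessed only: the object is encoded as null.
my $as_null = JSON::XS->new->allow_blessed->encode ([$point]);

# convert_blessed: TO_JSON is called and its result is encoded instead.
my $as_hash = JSON::XS->new->canonical->convert_blessed->encode ([$point]);

print "default encode refused the object\n" unless $default_ok;
print "$as_null\n";   # [null]
print "$as_hash\n";   # [{"x":1,"y":2}]
```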
470 =item $json = $json->filter_json_object ([$coderef->($hashref)])
471
472 When C<$coderef> is specified, it will be called from C<decode> each
473 time it decodes a JSON object. The only argument is a reference to the
474 newly-created hash. If the code reference returns a single scalar (which
475 need not be a reference), this value (i.e. a copy of that scalar to avoid
476 aliasing) is inserted into the deserialised data structure. If it returns
477 an empty list (NOTE: I<not> C<undef>, which is a valid scalar), the
478 original deserialised hash will be inserted. This setting can slow down
479 decoding considerably.
480
481 When C<$coderef> is omitted or undefined, any existing callback will
482 be removed and C<decode> will not change the deserialised hash in any
483 way.
484
485 Example, convert all JSON objects into the integer 5:
486
487 my $js = JSON::XS->new->filter_json_object (sub { 5 });
488 # returns [5]
489 $js->decode ('[{}]');
490 # throws an exception because allow_nonref is not enabled
491 # so a lone 5 is not allowed.
492 $js->decode ('{"a":1, "b":2}');
493
494 =item $json = $json->filter_json_single_key_object ($key [=> $coderef->($value)])
495
496 Works somewhat similarly to C<filter_json_object>, but is only called for
497 JSON objects having a single key named C<$key>.
498
499 This C<$coderef> is called before the one specified via
500 C<filter_json_object>, if any. It gets passed the single value in the JSON
501 object. If it returns a single value, it will be inserted into the data
502 structure. If it returns nothing (not even C<undef> but the empty list),
503 the callback from C<filter_json_object> will be called next, as if no
504 single-key callback were specified.
505
506 If C<$coderef> is omitted or undefined, the corresponding callback will be
507 disabled. There can only ever be one callback for a given key.
508
509 As this callback gets called less often than the C<filter_json_object>
510 one, decoding speed will not usually suffer as much. Therefore, single-key
511 objects make excellent targets to serialise Perl objects into, especially
512 as single-key JSON objects are as close to the type-tagged value concept
513 as JSON gets (it's basically an ID/VALUE tuple). Of course, JSON does not
514 support this in any way, so you need to make sure your data never looks
515 like a serialised Perl hash.
516
517 Typical names for the single object key are C<__class_whatever__>, or
518 C<$__dollars_are_rarely_used__$> or C<}ugly_brace_placement>, or even
519 things like C<__class_md5sum(classname)__>, to reduce the risk of clashing
520 with real hashes.
521
522 Example, decode JSON objects of the form C<< { "__widget__" => <id> } >>
523 into the corresponding C<< $WIDGET{<id>} >> object:
524
525 # return whatever is in $WIDGET{5}:
526 JSON::XS
527 ->new
528 ->filter_json_single_key_object (__widget__ => sub {
529 $WIDGET{ $_[0] }
530 })
531 ->decode ('{"__widget__": 5}')
532
533 # this can be used with a TO_JSON method in some "widget" class
534 # for serialisation to json:
535 sub WidgetBase::TO_JSON {
536 my ($self) = @_;
537
538 unless ($self->{id}) {
539 $self->{id} = ..get..some..id..;
540 $WIDGET{$self->{id}} = $self;
541 }
542
543 { __widget__ => $self->{id} }
544 }
545
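Putting the two halves of the example together gives a complete round trip. The following is a sketch under stated assumptions: C<My::Widget> and the lexical C<%WIDGET> registry are hypothetical stand-ins for the widget class and registry above, and the id is assigned by hand instead of being generated.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical registry and class, for illustration only.
my %WIDGET;

package My::Widget;
sub new     { my ($class, $id) = @_; bless { id => $id }, $class }
sub TO_JSON { my ($self) = @_; return { __widget__ => $self->{id} } }

package main;

my $widget = My::Widget->new (5);
$WIDGET{5} = $widget;

# One coder handles both directions: TO_JSON on encode,
# the single-key filter on decode.
my $coder = JSON::XS->new
   ->convert_blessed
   ->filter_json_single_key_object (__widget__ => sub { $WIDGET{ $_[0] } });

my $text = $coder->encode ([$widget]);
my $back = $coder->decode ($text);

print "$text\n";
print "got the same widget back\n" if $back->[0] == $widget;
```

The decoded array holds a reference to the registered object itself, not a copy of its data, which is usually what you want for objects with identity.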
546 =item $json = $json->shrink ([$enable])
547
548 =item $enabled = $json->get_shrink
549
550 Perl usually over-allocates memory a bit when allocating space for
551 strings. This flag optionally resizes strings generated by either
552 C<encode> or C<decode> to their minimum size possible. This can save
553 memory when your JSON texts are either very very long or you have many
554 short strings. It will also try to downgrade any strings to octet-form
555 if possible: perl stores strings internally either in an encoding called
556 UTF-X or in octet-form. The latter cannot store everything but uses less
557 space in general (and some buggy Perl or C code might even rely on that
558 internal representation being used).
559
560 The actual definition of what shrink does might change in future versions,
561 but it will always try to save space at the expense of time.
562
563 If C<$enable> is true (or missing), the string returned by C<encode> will
564 be shrunk-to-fit, while all strings generated by C<decode> will also be
565 shrunk-to-fit.
566
567 If C<$enable> is false, then the normal perl allocation algorithms are used.
568 If you work with your data, then this is likely to be faster.
569
570 In the future, this setting might control other things, such as converting
571 strings that look like integers or floats into integers or floats
572 internally (there is no difference on the Perl level), saving space.
573
574 =item $json = $json->max_depth ([$maximum_nesting_depth])
575
576 =item $max_depth = $json->get_max_depth
577
578 Sets the maximum nesting level (default C<512>) accepted while encoding
579 or decoding. If the JSON text or Perl data structure has an equal or
580 higher nesting level than this limit, then the encoder and decoder will
581 stop and croak at that point.
582
583 Nesting level is defined by number of hash- or arrayrefs that the encoder
584 needs to traverse to reach a given point or the number of C<{> or C<[>
585 characters without their matching closing parenthesis crossed to reach a
586 given character in a string.
587
588 Setting the maximum depth to one disallows any nesting, so that ensures
589 that the object is only a single hash/object or array.
590
591 The argument to C<max_depth> will be rounded up to the next highest power
592 of two. If no argument is given, the highest possible setting will be
593 used, which is rarely useful.
594
595 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
596
597 =item $json = $json->max_size ([$maximum_string_size])
598
599 =item $max_size = $json->get_max_size
600
601 Set the maximum length a JSON text may have (in bytes) where decoding is
602 being attempted. The default is C<0>, meaning no limit. When C<decode>
603 is called on a string longer than this number of characters it will not
604 attempt to decode the string but throw an exception. This setting has no
605 effect on C<encode> (yet).
606
607 The argument to C<max_size> will be rounded up to the next B<highest>
608 power of two (so may be more than requested). If no argument is given, the
609 limit check will be deactivated (same as when C<0> is specified).
610
611 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
612
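As a sketch of how the two limits behave when decoding, the example below feeds one acceptable text and two deliberately oversized ones to a coder with small limits. The limit values are arbitrary (both are already powers of two), and the inputs exceed them by a wide margin so the outcome does not depend on the exact boundary.

```perl
use strict;
use warnings;
use JSON::XS;

# Limits chosen arbitrarily for the illustration.
my $coder = JSON::XS->new->max_depth (4)->max_size (64);

my $small    = '[[1,2],[3,4]]';                       # shallow and short
my $too_deep = ("[" x 20) . ("]" x 20);               # 20 levels of nesting
my $too_long = "[" . join (",", (1) x 200) . "]";     # far more than 64 bytes

my %accepted;
for my $case ([small => $small], [deep => $too_deep], [long => $too_long]) {
   my ($name, $text) = @$case;

   # decode croaks when a limit is exceeded, so trap the exception.
   $accepted{$name} = eval { $coder->decode ($text); 1 } ? 1 : 0;

   printf "%-5s %s\n", $name, $accepted{$name} ? "accepted" : "rejected";
}
```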
613 =item $json_text = $json->encode ($perl_scalar)
614
615 Converts the given Perl data structure (a simple scalar or a reference
616 to a hash or array) to its JSON representation. Simple scalars will be
617 converted into JSON string or number sequences, while references to arrays
618 become JSON arrays and references to hashes become JSON objects. Undefined
619 Perl values (e.g. C<undef>) become JSON C<null> values. See MAPPING, below,
620 for how JSON C<true> and C<false> values are generated.
621
622 =item $perl_scalar = $json->decode ($json_text)
623
624 The opposite of C<encode>: expects a JSON text and tries to parse it,
625 returning the resulting simple scalar or reference. Croaks on error.
626
627 JSON numbers and strings become simple Perl scalars. JSON arrays become
628 Perl arrayrefs and JSON objects become Perl hashrefs. C<true> and C<false>
629 become C<JSON::XS::true> and C<JSON::XS::false> (which act like C<1> and C<0>), and C<null> becomes C<undef>.
630
631 =item ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
632
633 This works like the C<decode> method, but instead of raising an exception
634 when there is trailing garbage after the first JSON object, it will
635 silently stop parsing there and return the number of characters consumed
636 so far.
637
638 This is useful if your JSON texts are not delimited by an outer protocol
639 (which is not the brightest thing to do in the first place) and you need
640 to know where the JSON text ends.
641
642 JSON::XS->new->decode_prefix ("[1] the tail")
643 => ([1], 3)
644
645 =back
646
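One possible use of C<decode_prefix> is to split a string that contains several JSON texts back to back. This is a minimal sketch: the input string is made up, and it assumes that no whitespace or other separator sits between the texts.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder  = JSON::XS->new;
my $stream = '[1,2]{"a":3}[4]';   # three JSON texts with no delimiter
my @values;

while (length $stream) {
   # decode_prefix returns the decoded value and the number of characters used.
   my ($value, $used) = $coder->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $used, "";   # drop the consumed prefix
}

print scalar (@values), " values decoded\n";
```

Because C<decode_prefix> still croaks on malformed input, a production loop would wrap the call in C<eval> and decide how to resynchronise.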
647
648 =head1 MAPPING
649
650 This section describes how JSON::XS maps Perl values to JSON values and
651 vice versa. These mappings are designed to "do the right thing" in most
652 circumstances automatically, preserving round-tripping characteristics
653 (what you put in comes out as something equivalent).
654
655 For the more enlightened: note that in the following descriptions,
656 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
657 refers to the abstract Perl language itself.
658
659
660 =head2 JSON -> PERL
661
662 =over 4
663
664 =item object
665
666 A JSON object becomes a reference to a hash in Perl. No ordering of object
667 keys is preserved (JSON does not preserve object key ordering itself).
668
669 =item array
670
671 A JSON array becomes a reference to an array in Perl.
672
673 =item string
674
675 A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
676 are represented by the same codepoints in the Perl string, so no manual
677 decoding is necessary.
678
679 =item number
680
681 A JSON number becomes either an integer, numeric (floating point) or
682 string scalar in perl, depending on its range and any fractional parts. On
683 the Perl level, there is no difference between those as Perl handles all
684 the conversion details, but an integer may take slightly less memory and
685 might represent more values exactly than (floating point) numbers.
686
687 If the number consists of digits only, JSON::XS will try to represent
688 it as an integer value. If that fails, it will try to represent it as
689 a numeric (floating point) value if that is possible without loss of
690 precision. Otherwise it will preserve the number as a string value.
691
692 Numbers containing a fractional or exponential part will always be
693 represented as numeric (floating point) values, possibly at a loss of
694 precision.
695
696 This might create round-tripping problems as numbers might become strings,
697 but as Perl is typeless there is no other way to do it.
698
699 =item true, false
700
701 These JSON atoms become C<JSON::XS::true> and C<JSON::XS::false>,
702 respectively. They are overloaded to act almost exactly like the numbers
703 C<1> and C<0>. You can check whether a scalar is a JSON boolean by using
704 the C<JSON::XS::is_bool> function.
705
706 =item null
707
708 A JSON null atom becomes C<undef> in Perl.
709
710 =back
711
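The JSON to Perl mapping for the three atoms can be inspected with C<JSON::XS::is_bool>. The following sketch decodes a small made-up array and classifies each element.

```perl
use strict;
use warnings;
use JSON::XS;

my $values = JSON::XS->new->decode ('[true, false, null, 1]');

for my $v (@$values) {
   if (JSON::XS::is_bool ($v)) {
      # true and false decode to special values that act like 1 and 0
      print "boolean, ", ($v ? "true" : "false"), "\n";
   } elsif (!defined $v) {
      print "null (undef)\n";
   } else {
      print "plain scalar: $v\n";
   }
}
```

Note that the plain number C<1> is not reported as a boolean, which is what makes it possible to tell JSON C<true> apart from the number after decoding.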
712
713 =head2 PERL -> JSON
714
715 The mapping from Perl to JSON is slightly more difficult, as Perl is a
716 truly typeless language, so we can only guess which JSON type is meant by
717 a Perl value.
718
719 =over 4
720
721 =item hash references
722
723 Perl hash references become JSON objects. As there is no inherent ordering
724 in hash keys (or JSON objects), they will usually be encoded in a
725 pseudo-random order that can change between runs of the same program but
726 stays generally the same within a single run of a program. JSON::XS can
727 optionally sort the hash keys (determined by the I<canonical> flag), so
728 the same datastructure will serialise to the same JSON text (given same
729 settings and version of JSON::XS), but this incurs a runtime overhead
730 and is only rarely useful, e.g. when you want to compare some JSON text
731 against another for equality.
732
733 =item array references
734
735 Perl array references become JSON arrays.
736
737 =item other references
738
739 Other unblessed references are generally not allowed and will cause an
740 exception to be thrown, except for references to the integers C<0> and
741 C<1>, which get turned into C<false> and C<true> atoms in JSON. You can
742 also use C<JSON::XS::false> and C<JSON::XS::true> to improve readability.
743
744 to_json [\0,JSON::XS::true] # yields [false,true]
745
746 =item JSON::XS::true, JSON::XS::false
747
748 These special values become JSON true and JSON false values,
749 respectively. You can also use C<\1> and C<\0> directly if you want.
750
751 =item blessed objects
752
753 Blessed objects are not allowed by default: C<encode> throws an exception
754 when it encounters one. See the C<allow_blessed> and C<convert_blessed>
755 methods, above, for ways to deal with them.
756
757 =item simple scalars
758
759 Simple Perl scalars (any scalar that is not a reference) are the most
760 difficult objects to encode: JSON::XS will encode undefined scalars as
761 JSON null value, scalars that have last been used in a string context
762 before encoding as JSON strings and anything else as number value:
763
764 # dump as number
765 to_json [2] # yields [2]
766 to_json [-3.0e17] # yields [-3e+17]
767 my $value = 5; to_json [$value] # yields [5]
768
769 # used as string, so dump as string
770 print $value;
771 to_json [$value] # yields ["5"]
772
773 # undef becomes null
774 to_json [undef] # yields [null]
775
776 You can force the type to be a JSON string by stringifying it:
777
778 my $x = 3.1; # some variable containing a number
779 "$x"; # stringified
780 $x .= ""; # another, more awkward way to stringify
781 print $x; # perl does it for you, too, quite often
782
783 You can force the type to be a JSON number by numifying it:
784
785 my $x = "3"; # some variable containing a string
786 $x += 0; # numify it, ensuring it will be dumped as a number
787 $x *= 1; # same thing, the choice is yours.
788
789 You can not currently force the type in other, less obscure, ways. Tell me
790 if you need this capability.
791
792 =back
793
794
=head1 COMPARISON

As already mentioned, this module was created because none of the existing
JSON modules could be made to work correctly. First I will describe the
problems (or pleasures) I encountered with various existing JSON modules,
followed by some benchmark values. JSON::XS was designed not to suffer
from any of these problems or limitations.

=over 4

=item JSON 1.07

Slow (but very portable, as it is written in pure Perl).

Undocumented/buggy Unicode handling (how JSON handles Unicode values is
undocumented. One can get far by feeding it Unicode strings and doing
en-/decoding oneself, but Unicode escapes are not working properly).

No round-tripping (strings get clobbered if they look like numbers, e.g.
the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
decode into the number 2).

=item JSON::PC 0.01

Very fast.

Undocumented/buggy Unicode handling.

No round-tripping.

Has problems handling many Perl values (e.g. regex results and other magic
values will make it croak).

Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>,
which is not a valid JSON text).

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

=item JSON::Syck 0.21

Very buggy (often crashes).

Very inflexible (no human-readable format supported, format pretty much
undocumented. I need at least a format for easy reading by humans and a
single-line compact format for use in a protocol, and preferably a way to
generate ASCII-only JSON texts).

Completely broken (and confusingly documented) Unicode handling (Unicode
escapes are not working properly, you need to set ImplicitUnicode to
I<different> values on en- and decoding to get symmetric behaviour).

No round-tripping (simple cases work, but this depends on whether the scalar
value was used in a numeric context or not).

Dumping hashes may skip hash values depending on iterator state.

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

Does not check input for validity (i.e. will accept non-JSON input and
return "something" instead of raising an exception. This is a security
issue: imagine two banks transferring money between each other using
JSON. One bank might parse a given non-JSON request and deduct money,
while the other might reject the transaction with a syntax error. While a
good protocol will at least recover, that is extra unnecessary work and
the transaction will still not succeed).

=item JSON::DWIW 0.04

Very fast. Very natural. Very nice.

Undocumented Unicode handling (but the best of the pack. Unicode escapes
still don't get parsed properly).

Very inflexible.

No round-tripping.

Does not generate valid JSON texts (key strings are often unquoted, empty keys
result in nothing being output).

Does not check input for validity.

=back
880
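Input validation, the last point raised against JSON::Syck and JSON::DWIW,
is easy to observe from calling code. The sketch below assumes only that
C<decode> croaks on malformed input, as the SYNOPSIS states for the
exported functions. C<try_decode> is a hypothetical wrapper name, and the
sample texts are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical wrapper: returns (1, $data) on success, (0, $error) on failure.
sub try_decode {
   my ($text) = @_;

   my $data;
   my $ok = eval { $data = JSON::XS->new->decode ($text); 1 };

   return $ok ? (1, $data) : (0, $@);
}

# one well-formed text, one truncated object, one unquoted key
for my $text ('{"amount": 100}', '{"amount": 100', '{amount: 100}') {
   my ($ok, $result) = try_decode ($text);
   print $ok ? "accepted: $text\n" : "rejected: $text\n";
}
```

Only the first text is well-formed JSON. The other two should raise an
exception instead of being turned into "something", which is exactly the
behaviour the bank example above depends on.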
881
=head2 JSON and YAML

You often hear that JSON is a subset (or a close subset) of YAML. This is,
however, mass hysteria and very far from the truth. In general, there is
no way to configure JSON::XS to output a data structure as valid YAML.

If you really must use JSON::XS to generate YAML, you should use this
algorithm (subject to change in future versions):

   my $to_yaml = JSON::XS->new->utf8->space_after (1);
   my $yaml = $to_yaml->encode ($ref) . "\n";

This will usually generate JSON texts that also parse as valid
YAML. Please note that YAML has hardcoded limits on (simple) object key
lengths that JSON doesn't have, so you should make sure that your hash
keys are noticeably shorter than the 1024 characters YAML allows.

There might be other incompatibilities that I am not aware of. In general
you should not try to generate YAML with a JSON generator or vice versa,
or try to parse JSON with a YAML parser or vice versa: chances are high
that you will run into severe interoperability problems.


=head2 SPEED

It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the C<eg/bench> program
in the JSON::XS distribution, to make it easy to compare on your own
system.

First comes a comparison between various modules using a very short
single-line JSON string:

   {"method": "handleMessage", "params": ["user1", "we were just talking"], \
   "id": null, "array":[1,11,234,-5,1e5,1e7, true, false]}

It shows the number of encodes/decodes per second (JSON::XS uses
the functional interface, while JSON::XS/2 uses the OO interface
with pretty-printing and hashkey sorting enabled, JSON::XS/3 enables
shrink). Higher is better:

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON 1.x   |   4990.842 |   4088.813 |
   JSON::DWIW |  51653.990 |  71575.154 |
   JSON::PC   |  65948.176 |  74631.744 |
   JSON::PP   |   8931.652 |   3817.168 |
   JSON::Syck |  24877.248 |  27776.848 |
   JSON::XS   | 388361.481 | 227951.304 |
   JSON::XS/2 | 227951.304 | 218453.333 |
   JSON::XS/3 | 338250.323 | 218453.333 |
   Storable   |  16500.016 | 135300.129 |
   -----------+------------+------------+

That is, JSON::XS is about five times faster than JSON::DWIW on encoding,
about three times faster on decoding, and over forty times faster
than JSON, even with pretty-printing and key sorting. It also compares
favourably to Storable for small amounts of data.

Using a longer test string (roughly 18KB, generated from Yahoo! Locals
search API (http://nanoref.com/yahooapis/mgPdGg)):

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON 1.x   |     55.260 |     34.971 |
   JSON::DWIW |    825.228 |   1082.513 |
   JSON::PC   |   3571.444 |   2394.829 |
   JSON::PP   |    210.987 |     32.574 |
   JSON::Syck |    552.551 |    787.544 |
   JSON::XS   |   5780.463 |   4854.519 |
   JSON::XS/2 |   3869.998 |   4798.975 |
   JSON::XS/3 |   5862.880 |   4798.975 |
   Storable   |   4445.002 |   5235.027 |
   -----------+------------+------------+

Again, JSON::XS leads by far (except for Storable which unsurprisingly
decodes faster).

On large strings containing lots of high Unicode characters, some modules
(such as JSON::PC) seem to decode faster than JSON::XS, but the result
will be broken due to missing (or wrong) Unicode handling. Others refuse
to decode or encode properly, so it was impossible to prepare a fair
comparison table for that case.
965
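For a quick check on your own system without the full C<eg/bench> program,
a rough sketch using the core Benchmark module might look like the
following. It is not the script that produced the tables above: it only
measures JSON::XS through its OO interface on the short test string, and
the labels C<encode>, C<encode_pretty> and C<decode> are made up for this
example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::XS;

# the short single-line test string from above, joined into one line
my $text = '{"method": "handleMessage", "params": ["user1", "we were just talking"], '
         . '"id": null, "array":[1,11,234,-5,1e5,1e7, true, false]}';

my $plain  = JSON::XS->new->utf8;
my $pretty = JSON::XS->new->utf8->pretty->canonical; # pretty-printing + key sorting

my $data = $plain->decode ($text);

# run each case for at least one CPU second and print a rate comparison
cmpthese (-1, {
   encode        => sub { $plain->encode ($data) },
   encode_pretty => sub { $pretty->encode ($data) },
   decode        => sub { $plain->decode ($text) },
});
```

The absolute rates depend heavily on the machine and the perl build, so
only the ratios between the rows are worth comparing.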
966
=head1 SECURITY CONSIDERATIONS

When you are using JSON in a protocol, talking to untrusted, potentially
hostile creatures requires relatively few measures.

First of all, your JSON decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard to make that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of JSON texts you accept, or make sure that when your
resources run out, that is just fine (e.g. by using a separate process that
can crash safely). The size of a JSON text in octets or characters is
usually a good indication of the size of the resources required to decode
it into a Perl structure. While JSON::XS can check the size of the JSON
text, it might be too late when you already have it in memory, so you
might want to check the size before you accept the string.
984
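One way to check the size before the whole text is in memory is to read
it in chunks and give up as soon as a limit is crossed. The sketch below
is one possible approach, not something JSON::XS provides:
C<decode_from_handle> is a hypothetical helper, and the 4096-octet chunk
size and 64KB limit are arbitrary values chosen for the example.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical helper: read at most $max octets from $fh, then decode.
sub decode_from_handle {
   my ($fh, $max) = @_;

   my $buf = "";
   while (1) {
      # append the next chunk to the end of $buf
      my $n = read $fh, $buf, 4096, length $buf;
      die "read error: $!\n" unless defined $n;
      last unless $n;

      # give up before the oversized text is fully in memory
      die "JSON text exceeds $max octets\n" if length $buf > $max;
   }

   return JSON::XS->new->utf8->decode ($buf);
}

# an in-memory filehandle stands in for a socket or pipe
open my $fh, "<", \'{"ok": true}' or die "open: $!";
my $data = decode_from_handle ($fh, 64 * 1024);
print "decoded ", scalar (keys %$data), " key(s)\n";
```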
Third, JSON::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested JSON objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.
993
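A small sketch of lowering the nesting limit with C<max_depth> follows.
The limit of 32 and the helper name C<decode_shallow> are arbitrary
choices for illustration; the only behaviour assumed is that C<decode>
croaks once the nesting limit is exceeded.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical helper: decode with a nesting limit lower than the default.
sub decode_shallow {
   my ($text, $depth) = @_;
   return JSON::XS->new->max_depth ($depth)->decode ($text);
}

my $deep = ("[" x 100) . ("]" x 100); # 100 nested arrays

# the eval turns the croak into a false return value
my $ok = eval { decode_shallow ($deep, 32); 1 };
print $ok ? "decoded\n" : "rejected: $@";
```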
And last but least, something else could bomb you that I forgot to think
of. In that case, you get to keep the pieces. I am always open for hints,
though...

If you are using JSON::XS to return packets for consumption
by JavaScript scripts in a browser, you should have a look at
L<http://jpsykes.com/47/practical-csrf-and-json-security> to see whether
you are vulnerable to some common attack vectors (which really are browser
design bugs, but it is still you who will have to deal with them, as major
browser developers care only for features, not about doing security
right).


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads" which are simply slow and bloated
process simulations - use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).
1015
1016
=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. It is
still relatively early in its development. If you keep reporting bugs they
will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

# shared singleton objects representing the JSON true and false values
our $true  = do { bless \(my $dummy = 1), "JSON::XS::Boolean" };
our $false = do { bless \(my $dummy = 0), "JSON::XS::Boolean" };

sub true()  { $true  }
sub false() { $false }

# true if the argument is one of the JSON boolean objects
sub is_bool($) {
   UNIVERSAL::isa $_[0], "JSON::XS::Boolean"
#      or UNIVERSAL::isa $_[0], "JSON::Literal"
}

XSLoader::load "JSON::XS", $VERSION;

package JSON::XS::Boolean;

# booleans act like 1 and 0 in numeric context; ++ and -- replace the
# variable with a plain number instead of modifying the shared object
use overload
   "0+"     => sub { ${$_[0]} },
   "++"     => sub { $_[0] = ${$_[0]} + 1 },
   "--"     => sub { $_[0] = ${$_[0]} - 1 },
   fallback => 1;

1;

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut
1058