/cvs/JSON-XS/XS.pm
Revision: 1.169
Committed: Thu Nov 15 20:49:12 2018 UTC (5 years, 6 months ago) by root
Branch: MAIN
Changes since 1.168: +49 -43 lines

File Contents

# Content
1 =head1 NAME
2
3 JSON::XS - JSON serialising/deserialising, done correctly and fast
4
5 =encoding utf-8
6
7 JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ
8 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
9
10 =head1 SYNOPSIS
11
12 use JSON::XS;
13
14 # exported functions, they croak on error
15 # and expect/generate UTF-8
16
17 $utf8_encoded_json_text = encode_json $perl_hash_or_arrayref;
18 $perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;
19
20 # OO-interface
21
22 $coder = JSON::XS->new->ascii->pretty->allow_nonref;
23 $pretty_printed_unencoded = $coder->encode ($perl_scalar);
24 $perl_scalar = $coder->decode ($unicode_json_text);
25
26 # Note that JSON version 2.0 and above will automatically use JSON::XS
27 # if available, at virtually no speed overhead either, so you should
28 # be able to just:
29
30 use JSON;
31
32 # and do the same things, except that you have a pure-perl fallback now.
33
34 =head1 DESCRIPTION
35
36 This module converts Perl data structures to JSON and vice versa. Its
37 primary goal is to be I<correct> and its secondary goal is to be
38 I<fast>. To reach the latter goal it was written in C.
39
40 Beginning with version 2.0 of the JSON module, when both JSON and
41 JSON::XS are installed, JSON will use JSON::XS as its backend (this can be
42 overridden) with no overhead due to emulation (by inheriting constructor
43 and methods). If JSON::XS is not available, it will fall back to the
44 compatible JSON::PP module as backend, so using JSON instead of JSON::XS
45 gives you a portable JSON API that can be fast when you need it and
46 doesn't require a C compiler when that is a problem.
47
48 As this is the n-th-something JSON module on CPAN, what was the reason
49 to write yet another JSON module? While it seems there are many JSON
50 modules, none of them correctly handle all corner cases, and in most cases
51 their maintainers are unresponsive, gone missing, or not listening to bug
52 reports for other reasons.
53
54 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
55 vice versa.
56
57 =head2 FEATURES
58
59 =over 4
60
61 =item * correct Unicode handling
62
63 This module knows how to handle Unicode, documents how and when it does
64 so, and even documents what "correct" means.
65
66 =item * round-trip integrity
67
68 When you serialise a perl data structure using only data types supported
69 by JSON and Perl, the deserialised data structure is identical on the Perl
70 level. (e.g. the string "2.0" doesn't suddenly become "2" just because
71 it looks like a number). There I<are> minor exceptions to this, read the
72 MAPPING section below to learn about those.
73
74 =item * strict checking of JSON correctness
75
76 There is no guessing, no generating of illegal JSON texts by default,
77 and only JSON is accepted as input by default (the latter is a security
78 feature).
79
80 =item * fast
81
82 Compared to other JSON modules and other serialisers such as Storable,
83 this module usually compares favourably in terms of speed, too.
84
85 =item * simple to use
86
87 This module has both a simple functional interface as well as an object
88 oriented interface.
89
90 =item * reasonably versatile output formats
91
92 You can choose between the most compact guaranteed-single-line format
93 possible (nice for simple line-based protocols), a pure-ASCII format
94 (for when your transport is not 8-bit clean, still supports the whole
95 Unicode range), or a pretty-printed format (for when you want to read that
96 stuff). Or you can combine those features in whatever way you like.
97
98 =back
99
100 =cut
101
102 package JSON::XS;
103
104 use common::sense;
105
106 our $VERSION = 3.04;
107 our @ISA = qw(Exporter);
108
109 our @EXPORT = qw(encode_json decode_json);
110
111 use Exporter;
112 use XSLoader;
113
114 use Types::Serialiser ();
115
116 =head1 FUNCTIONAL INTERFACE
117
118 The following convenience functions are provided by this module. They are
119 exported by default:
120
121 =over 4
122
123 =item $json_text = encode_json $perl_scalar
124
125 Converts the given Perl data structure to a UTF-8 encoded, binary string
126 (that is, the string contains octets only). Croaks on error.
127
128 This function call is functionally identical to:
129
130 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
131
132 Except being faster.
133
134 =item $perl_scalar = decode_json $json_text
135
136 The opposite of C<encode_json>: expects a UTF-8 (binary) string and tries
137 to parse that as a UTF-8 encoded JSON text, returning the resulting
138 reference. Croaks on error.
139
140 This function call is functionally identical to:
141
142 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
143
144 Except being faster.
145
146 =back
147
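As a minimal sketch of the two exported functions (the data and variable names are illustrative, not part of the module), a round trip through C<encode_json> and C<decode_json> could look like this:

```perl
use strict;
use warnings;
use JSON::XS;

# A small Perl structure using only types JSON can represent.
# "caf\x{e9}" contains one non-ASCII character (U+00E9).
my $data = { name => "caf\x{e9}", tags => ["a", "b"], count => 3 };

# encode_json returns a UTF-8 encoded octet string.
my $json_text = encode_json $data;

# decode_json expects such octets and returns the Perl structure again.
my $copy = decode_json $json_text;

print "round trip ok\n"
    if $copy->{name} eq $data->{name} && $copy->{count} == 3;
```

The encoded text contains U+00E9 as the two octets C<0xC3 0xA9>, while the decoded copy holds it as a single character again.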
148
149 =head1 A FEW NOTES ON UNICODE AND PERL
150
151 Since this often leads to confusion, here are a few very clear words on
152 how Unicode works in Perl, modulo bugs.
153
154 =over 4
155
156 =item 1. Perl strings can store characters with ordinal values > 255.
157
158 This enables you to store Unicode characters as single characters in a
159 Perl string - very natural.
160
161 =item 2. Perl does I<not> associate an encoding with your strings.
162
163 ... until you force it to, e.g. when matching it against a regex, or
164 printing the scalar to a file, in which case Perl either interprets your
165 string as locale-encoded text, octets/binary, or as Unicode, depending
166 on various settings. In no case is an encoding stored together with your
167 data, it is I<use> that decides encoding, not any magical meta data.
168
169 =item 3. The internal utf-8 flag has no meaning with regards to the
170 encoding of your string.
171
172 Just ignore that flag unless you debug a Perl bug, a module written in
173 XS or want to dive into the internals of perl. Otherwise it will only
174 confuse you, as, despite the name, it says nothing about how your string
175 is encoded. You can have Unicode strings with that flag set, with that
176 flag clear, and you can have binary data with that flag set and that flag
177 clear. Other possibilities exist, too.
178
179 If you didn't know about that flag, all the better: just pretend it doesn't
180 exist.
181
182 =item 4. A "Unicode String" is simply a string where each character can be
183 validly interpreted as a Unicode code point.
184
185 If you have UTF-8 encoded data, it is no longer a Unicode string, but a
186 Unicode string encoded in UTF-8, giving you a binary string.
187
188 =item 5. A string containing "high" (> 255) character values is I<not> a UTF-8 string.
189
190 It's a fact. Learn to live with it.
191
192 =back
193
194 I hope this helps :)
195
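The difference between a Unicode string and its UTF-8 encoded form can be made visible by comparing string lengths. This is only a sketch; the variable names are illustrative:

```perl
use strict;
use warnings;
use JSON::XS;

my $euro = "\x{20ac}";   # one character with an ordinal value > 255

# Without ->utf8, encode returns a Unicode (character) string.
my $chars  = JSON::XS->new->encode ([$euro]);

# With ->utf8, encode returns UTF-8 encoded octets (a binary string).
my $octets = JSON::XS->new->utf8->encode ([$euro]);

# The character string has 5 characters: [ " EURO " ]
# The octet string has 7 octets, as the euro sign needs 3 octets in UTF-8.
printf "characters: %d, octets: %d\n", length $chars, length $octets;
```

Both strings describe the same JSON text; only the second one is ready for bytewise I/O.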
196
197 =head1 OBJECT-ORIENTED INTERFACE
198
199 The object oriented interface lets you configure your own encoding or
200 decoding style, within the limits of supported formats.
201
202 =over 4
203
204 =item $json = new JSON::XS
205
206 Creates a new JSON::XS object that can be used to de/encode JSON
207 strings. All boolean flags described below are by default I<disabled>
208 (with the exception of C<allow_nonref>, which defaults to I<enabled> since
209 version C<4.0>).
210
211 The mutators for flags all return the JSON object again and thus calls can
212 be chained:
213
214 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
215 => {"a": [1, 2]}
216
217 =item $json = $json->ascii ([$enable])
218
219 =item $enabled = $json->get_ascii
220
221 If C<$enable> is true (or missing), then the C<encode> method will not
222 generate characters outside the code range C<0..127> (which is ASCII). Any
223 Unicode characters outside that range will be escaped using either a
224 single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
225 as per RFC4627. The resulting encoded JSON text can be treated as a native
226 Unicode string, an ascii-encoded, latin1-encoded or UTF-8 encoded string,
227 or any other superset of ASCII.
228
229 If C<$enable> is false, then the C<encode> method will not escape Unicode
230 characters unless required by the JSON syntax or other flags. This results
231 in a faster and more compact format.
232
233 See also the section I<ENCODING/CODESET FLAG NOTES> later in this
234 document.
235
236 The main use for this flag is to produce JSON texts that can be
237 transmitted over a 7-bit channel, as the encoded JSON texts will not
238 contain any 8 bit characters.
239
240 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
241 => ["\ud801\udc01"]
242
243 =item $json = $json->latin1 ([$enable])
244
245 =item $enabled = $json->get_latin1
246
247 If C<$enable> is true (or missing), then the C<encode> method will encode
248 the resulting JSON text as latin1 (or iso-8859-1), escaping any characters
249 outside the code range C<0..255>. The resulting string can be treated as a
250 latin1-encoded JSON text or a native Unicode string. The C<decode> method
251 will not be affected in any way by this flag, as C<decode> by default
252 expects Unicode, which is a strict superset of latin1.
253
254 If C<$enable> is false, then the C<encode> method will not escape Unicode
255 characters unless required by the JSON syntax or other flags.
256
257 See also the section I<ENCODING/CODESET FLAG NOTES> later in this
258 document.
259
260 The main use for this flag is efficiently encoding binary data as JSON
261 text, as most octets will not be escaped, resulting in a smaller encoded
262 size. The disadvantage is that the resulting JSON text is encoded
263 in latin1 (and must correctly be treated as such when storing and
264 transferring), a rare encoding for JSON. It is therefore most useful when
265 you want to store data structures known to contain binary data efficiently
266 in files or databases, not when talking to other JSON encoders/decoders.
267
268 JSON::XS->new->latin1->encode (["\x{89}\x{abc}"])
269 => ["\x{89}\\u0abc"] # (perl syntax, U+abc escaped, U+89 not)
270
271 =item $json = $json->utf8 ([$enable])
272
273 =item $enabled = $json->get_utf8
274
275 If C<$enable> is true (or missing), then the C<encode> method will encode
276 the JSON result into UTF-8, as required by many protocols, while the
277 C<decode> method expects to be handed a UTF-8-encoded string. Please
278 note that UTF-8-encoded strings do not contain any characters outside the
279 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
280 versions, enabling this option might enable autodetection of the UTF-16
281 and UTF-32 encoding families, as described in RFC4627.
282
283 If C<$enable> is false, then the C<encode> method will return the JSON
284 string as a (non-encoded) Unicode string, while C<decode> thus expects a
285 Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
286 to be done yourself, e.g. using the Encode module.
287
288 See also the section I<ENCODING/CODESET FLAG NOTES> later in this
289 document.
290
291 Example, output UTF-16BE-encoded JSON:
292
293 use Encode;
294 $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
295
296 Example, decode UTF-32LE-encoded JSON:
297
298 use Encode;
299 $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
300
301 =item $json = $json->pretty ([$enable])
302
303 This enables (or disables) all of the C<indent>, C<space_before> and
304 C<space_after> (and in the future possibly more) flags in one call to
305 generate the most readable (or most compact) form possible.
306
307 Example, pretty-print some simple structure:
308
309 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
310 =>
311 {
312    "a" : [
313       1,
314       2
315    ]
316 }
317
318 =item $json = $json->indent ([$enable])
319
320 =item $enabled = $json->get_indent
321
322 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
323 format as output, putting every array member or object/hash key-value pair
324 into its own line, indenting them properly.
325
326 If C<$enable> is false, no newlines or indenting will be produced, and the
327 resulting JSON text is guaranteed not to contain any C<newlines>.
328
329 This setting has no effect when decoding JSON texts.
330
331 =item $json = $json->space_before ([$enable])
332
333 =item $enabled = $json->get_space_before
334
335 If C<$enable> is true (or missing), then the C<encode> method will add an extra
336 optional space before the C<:> separating keys from values in JSON objects.
337
338 If C<$enable> is false, then the C<encode> method will not add any extra
339 space at those places.
340
341 This setting has no effect when decoding JSON texts. You will also
342 most likely combine this setting with C<space_after>.
343
344 Example, space_before enabled, space_after and indent disabled:
345
346 {"key" :"value"}
347
348 =item $json = $json->space_after ([$enable])
349
350 =item $enabled = $json->get_space_after
351
352 If C<$enable> is true (or missing), then the C<encode> method will add an extra
353 optional space after the C<:> separating keys from values in JSON objects
354 and extra whitespace after the C<,> separating key-value pairs and array
355 members.
356
357 If C<$enable> is false, then the C<encode> method will not add any extra
358 space at those places.
359
360 This setting has no effect when decoding JSON texts.
361
362 Example, space_before and indent disabled, space_after enabled:
363
364 {"key": "value"}
365
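The effect of C<space_before> and C<space_after> is easiest to see side by side. A small sketch (the hash is just sample data):

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => "value" };

my $compact = JSON::XS->new->encode ($data);                # {"key":"value"}
my $before  = JSON::XS->new->space_before->encode ($data);  # {"key" :"value"}
my $after   = JSON::XS->new->space_after->encode ($data);   # {"key": "value"}
my $both    = JSON::XS->new->space_before->space_after
                       ->encode ($data);                    # {"key" : "value"}

print "$_\n" for $compact, $before, $after, $both;
```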
366 =item $json = $json->relaxed ([$enable])
367
368 =item $enabled = $json->get_relaxed
369
370 If C<$enable> is true (or missing), then C<decode> will accept some
371 extensions to normal JSON syntax (see below). C<encode> will not be
372 affected in any way. I<Be aware that this option makes you accept invalid
373 JSON texts as if they were valid!> I suggest using this option only to
374 parse application-specific files written by humans (configuration files,
375 resource files etc.).
376
377 If C<$enable> is false (the default), then C<decode> will only accept
378 valid JSON texts.
379
380 Currently accepted extensions are:
381
382 =over 4
383
384 =item * list items can have an end-comma
385
386 JSON I<separates> array elements and key-value pairs with commas. This
387 can be annoying if you write JSON texts manually and want to be able to
388 quickly append elements, so this extension accepts a comma at the end of
389 such items, not just between them:
390
391 [
392 1,
393 2, <- this comma not normally allowed
394 ]
395 {
396 "k1": "v1",
397 "k2": "v2", <- this comma not normally allowed
398 }
399
400 =item * shell-style '#'-comments
401
402 Wherever JSON allows whitespace, shell-style comments are additionally
403 allowed. They are terminated by the first carriage-return or line-feed
404 character, after which more white-space and comments are allowed.
405
406 [
407 1, # this comment not allowed in JSON
408 # neither this one...
409 ]
410
411 =item * literal ASCII TAB characters in strings
412
413 Literal ASCII TAB characters are now allowed in strings (and treated as
414 C<\t>).
415
416 [
417 "Hello\tWorld",
418 "Hello<TAB>World", # literal <TAB> would not normally be allowed
419 ]
420
421 =back
422
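As a sketch of how the C<relaxed> extensions might be used for a hand-written configuration file (the file content and key names are made up for illustration):

```perl
use strict;
use warnings;
use JSON::XS;

# Hand-written text using '#'-comments and end-commas.
my $config_text = <<'EOT';
{
   "name": "demo",   # a shell-style comment
   "ports": [
      8080,
      8081,          # the comma before this comment is an end-comma
   ],
}
EOT

# The relaxed decoder accepts the extensions.
my $config = JSON::XS->new->relaxed->decode ($config_text);
print "ports: @{ $config->{ports} }\n";

# A decoder without relaxed croaks on the same text.
my $strict_ok = eval { JSON::XS->new->decode ($config_text); 1 };
print "strict decoder rejected the text\n" unless $strict_ok;
```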
423 =item $json = $json->canonical ([$enable])
424
425 =item $enabled = $json->get_canonical
426
427 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
428 by sorting their keys. This adds a comparatively high overhead.
429
430 If C<$enable> is false, then the C<encode> method will output key-value
431 pairs in the order Perl stores them (which will likely change between runs
432 of the same script, and can change even within the same run from 5.18
433 onwards).
434
435 This option is useful if you want the same data structure to be encoded as
436 the same JSON text (given the same overall settings). If it is disabled,
437 the same hash might be encoded differently even if it contains the same data,
438 as key-value pairs have no inherent ordering in Perl.
439
440 This setting has no effect when decoding JSON texts.
441
442 This setting has currently no effect on tied hashes.
443
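A short sketch of C<canonical>, using an arbitrary sample hash:

```perl
use strict;
use warnings;
use JSON::XS;

my %hash = (b => 1, a => 2, c => 3);

# With canonical enabled, keys are emitted in sorted order,
# so equal hashes yield equal JSON texts.
my $text = JSON::XS->new->canonical->encode (\%hash);   # {"a":2,"b":1,"c":3}

print "$text\n";
```

This is handy when JSON texts are compared, hashed or stored in version control, at the cost of the sorting overhead mentioned above.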
444 =item $json = $json->allow_nonref ([$enable])
445
446 =item $enabled = $json->get_allow_nonref
447
448 Unlike other boolean options, this option is enabled by default beginning
449 with version C<4.0>. See L<SECURITY CONSIDERATIONS> for the gory details.
450
451 If C<$enable> is true (or missing), then the C<encode> method can convert a
452 non-reference into its corresponding string, number or null JSON value,
453 which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
454 values instead of croaking.
455
456 If C<$enable> is false, then the C<encode> method will croak if it isn't
457 passed an arrayref or hashref, as JSON texts must either be an object
458 or array. Likewise, C<decode> will croak if given something that is not a
459 JSON object or array.
460
461 Example, encode a Perl scalar as a JSON value with C<allow_nonref> disabled,
462 resulting in an error:
463
464 JSON::XS->new->allow_nonref (0)->encode ("Hello, World!")
465 => hash- or arrayref expected...
466
467 =item $json = $json->allow_unknown ([$enable])
468
469 =item $enabled = $json->get_allow_unknown
470
471 If C<$enable> is true (or missing), then C<encode> will I<not> throw an
472 exception when it encounters values it cannot represent in JSON (for
473 example, filehandles) but instead will encode a JSON C<null> value. Note
474 that blessed objects are not included here and are handled separately by
475 C<allow_blessed>.
476
477 If C<$enable> is false (the default), then C<encode> will throw an
478 exception when it encounters anything it cannot encode as JSON.
479
480 This option does not affect C<decode> in any way, and it is recommended to
481 leave it off unless you know your communications partner.
482
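A sketch of C<allow_unknown>, using a code reference as an assumed example of a value that has no JSON representation:

```perl
use strict;
use warnings;
use JSON::XS;

# A code reference cannot be represented in JSON.
my $data = [1, sub { 42 }];

# By default, encode throws an exception for such values.
my $strict = eval { JSON::XS->new->encode ($data) };
print "default encoder failed: $@" unless defined $strict;

# With allow_unknown, a JSON null is written instead.
my $lenient = JSON::XS->new->allow_unknown->encode ($data);   # [1,null]
print "$lenient\n";
```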
483 =item $json = $json->allow_blessed ([$enable])
484
485 =item $enabled = $json->get_allow_blessed
486
487 See L<OBJECT SERIALISATION> for details.
488
489 If C<$enable> is true (or missing), then the C<encode> method will not
490 barf when it encounters a blessed reference that it cannot convert
491 otherwise. Instead, a JSON C<null> value is encoded instead of the object.
492
493 If C<$enable> is false (the default), then C<encode> will throw an
494 exception when it encounters a blessed object that it cannot convert
495 otherwise.
496
497 This setting has no effect on C<decode>.
498
499 =item $json = $json->convert_blessed ([$enable])
500
501 =item $enabled = $json->get_convert_blessed
502
503 See L<OBJECT SERIALISATION> for details.
504
505 If C<$enable> is true (or missing), then C<encode>, upon encountering a
506 blessed object, will check for the availability of the C<TO_JSON> method
507 on the object's class. If found, it will be called in scalar context and
508 the resulting scalar will be encoded instead of the object.
509
510 The C<TO_JSON> method may safely call die if it wants. If C<TO_JSON>
511 returns other blessed objects, those will be handled in the same
512 way. C<TO_JSON> must take care of not causing an endless recursion cycle
513 (== crash) in this case. The name of C<TO_JSON> was chosen because other
514 methods called by the Perl core (== not by the user of the object) are
515 usually in upper case letters and to avoid collisions with any C<to_json>
516 function or method.
517
518 If C<$enable> is false (the default), then C<encode> will not consider
519 this type of conversion.
520
521 This setting has no effect on C<decode>.
522
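The following sketch contrasts C<allow_blessed> and C<convert_blessed>. The C<Point> class is hypothetical and exists only for this example:

```perl
use strict;
use warnings;
use JSON::XS;

package Point;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

# Called by encode when convert_blessed is enabled; returns plain data.
sub TO_JSON {
    my ($self) = @_;
    return { x => $self->{x}, y => $self->{y} };
}

package main;

my $point = Point->new (x => 1, y => 2);

# Neither flag set: encode croaks on the blessed reference.
my $default_ok = eval { JSON::XS->new->encode ([$point]); 1 };

# allow_blessed only: the object is replaced by null.
my $nulled = JSON::XS->new->allow_blessed->encode ([$point]);   # [null]

# convert_blessed: the result of TO_JSON is encoded instead.
my $converted = JSON::XS->new->canonical->convert_blessed
                        ->encode ([$point]);                    # [{"x":1,"y":2}]

print "$nulled\n$converted\n";
```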
523 =item $json = $json->allow_tags ([$enable])
524
525 =item $enabled = $json->get_allow_tags
526
527 See L<OBJECT SERIALISATION> for details.
528
529 If C<$enable> is true (or missing), then C<encode>, upon encountering a
530 blessed object, will check for the availability of the C<FREEZE> method on
531 the object's class. If found, it will be used to serialise the object into
532 a nonstandard tagged JSON value (that JSON decoders cannot decode).
533
534 It also causes C<decode> to parse such tagged JSON values and deserialise
535 them via a call to the C<THAW> method.
536
537 If C<$enable> is false (the default), then C<encode> will not consider
538 this type of conversion, and tagged JSON values will cause a parse error
539 in C<decode>, as if tags were not part of the grammar.
540
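A round-trip sketch with C<allow_tags>. The C<My::Temperature> class is hypothetical, and the calling conventions assumed here (C<FREEZE> receives the object and the string C<"JSON"> and returns a list, C<THAW> is called as a class method with C<"JSON"> followed by that list) are taken from my reading of the module's OBJECT SERIALISATION section, so please check them there:

```perl
use strict;
use warnings;
use JSON::XS;

package My::Temperature;

sub new {
    my ($class, $degrees) = @_;
    return bless { degrees => $degrees }, $class;
}

# Assumed convention: called as $object->FREEZE ("JSON").
sub FREEZE {
    my ($self, $serialiser) = @_;
    return ($self->{degrees});
}

# Assumed convention: called as My::Temperature->THAW ("JSON", @values).
sub THAW {
    my ($class, $serialiser, @values) = @_;
    return $class->new ($values[0]);
}

package main;

my $coder = JSON::XS->new->allow_tags;

# The resulting text is nonstandard, tagged JSON.
my $text = $coder->encode ([My::Temperature->new (21)]);

# The same coder can turn the tagged value back into an object.
my $back = $coder->decode ($text);
print ref $back->[0], " ", $back->[0]{degrees}, "\n";

# A decoder without allow_tags treats the tag as a parse error.
my $plain_ok = eval { JSON::XS->new->decode ($text); 1 };
```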
541 =item $json = $json->filter_json_object ([$coderef->($hashref)])
542
543 When C<$coderef> is specified, it will be called from C<decode> each
544 time it decodes a JSON object. The only argument is a reference to
545 the newly-created hash. If the code reference returns a single scalar
546 (which need not be a reference), this value (or rather a copy of it) is
547 inserted into the deserialised data structure. If it returns an empty
548 list (NOTE: I<not> C<undef>, which is a valid scalar), the original
549 deserialised hash will be inserted. This setting can slow down decoding
550 considerably.
551
552 When C<$coderef> is omitted or undefined, any existing callback will
553 be removed and C<decode> will not change the deserialised hash in any
554 way.
555
556 Example, convert all JSON objects into the integer 5:
557
558 my $js = JSON::XS->new->filter_json_object (sub { 5 });
559 # returns [5]
560 $js->decode ('[{}]');
561 # throws an exception if allow_nonref is disabled,
562 # as a lone 5 is not allowed then.
563 $js->decode ('{"a":1, "b":2}');
564
565 =item $json = $json->filter_json_single_key_object ($key [=> $coderef->($value)])
566
567 Works somewhat similarly to C<filter_json_object>, but is only called for
568 JSON objects having a single key named C<$key>.
569
570 This C<$coderef> is called before the one specified via
571 C<filter_json_object>, if any. It gets passed the single value in the JSON
572 object. If it returns a single value, it will be inserted into the data
573 structure. If it returns nothing (not even C<undef> but the empty list),
574 the callback from C<filter_json_object> will be called next, as if no
575 single-key callback were specified.
576
577 If C<$coderef> is omitted or undefined, the corresponding callback will be
578 disabled. There can only ever be one callback for a given key.
579
580 As this callback gets called less often than the C<filter_json_object>
581 one, decoding speed will not usually suffer as much. Therefore, single-key
582 objects make excellent targets to serialise Perl objects into, especially
583 as single-key JSON objects are as close to the type-tagged value concept
584 as JSON gets (it's basically an ID/VALUE tuple). Of course, JSON does not
585 support this in any way, so you need to make sure your data never looks
586 like a serialised Perl hash.
587
588 Typical names for the single object key are C<__class_whatever__>, or
589 C<$__dollars_are_rarely_used__$> or C<}ugly_brace_placement>, or even
590 things like C<__class_md5sum(classname)__>, to reduce the risk of clashing
591 with real hashes.
592
593 Example, decode JSON objects of the form C<< { "__widget__" => <id> } >>
594 into the corresponding C<< $WIDGET{<id>} >> object:
595
596 # return whatever is in $WIDGET{5}:
597 JSON::XS
598 ->new
599 ->filter_json_single_key_object (__widget__ => sub {
600 $WIDGET{ $_[0] }
601 })
602 ->decode ('{"__widget__": 5}')
603
604 # this can be used with a TO_JSON method in some "widget" class
605 # for serialisation to json:
606 sub WidgetBase::TO_JSON {
607 my ($self) = @_;
608
609 unless ($self->{id}) {
610 $self->{id} = ..get..some..id..;
611 $WIDGET{$self->{id}} = $self;
612 }
613
614 { __widget__ => $self->{id} }
615 }
616
617 =item $json = $json->shrink ([$enable])
618
619 =item $enabled = $json->get_shrink
620
621 Perl usually over-allocates memory a bit when allocating space for
622 strings. This flag optionally resizes strings generated by either
623 C<encode> or C<decode> to their minimum size possible. This can save
624 memory when your JSON texts are either very very long or you have many
625 short strings. It will also try to downgrade any strings to octet-form
626 if possible: perl stores strings internally either in an encoding called
627 UTF-X or in octet-form. The latter cannot store everything but uses less
628 space in general (and some buggy Perl or C code might even rely on that
629 internal representation being used).
630
631 The actual definition of what shrink does might change in future versions,
632 but it will always try to save space at the expense of time.
633
634 If C<$enable> is true (or missing), the string returned by C<encode> will
635 be shrunk-to-fit, while all strings generated by C<decode> will also be
636 shrunk-to-fit.
637
638 If C<$enable> is false, then the normal perl allocation algorithms are used.
639 If you work with your data, then this is likely to be faster.
640
641 In the future, this setting might control other things, such as converting
642 strings that look like integers or floats into integers or floats
643 internally (there is no difference on the Perl level), saving space.
644
645 =item $json = $json->max_depth ([$maximum_nesting_depth])
646
647 =item $max_depth = $json->get_max_depth
648
649 Sets the maximum nesting level (default C<512>) accepted while encoding
650 or decoding. If a higher nesting level is detected in JSON text or a Perl
651 data structure, then the encoder and decoder will stop and croak at that
652 point.
653
654 Nesting level is defined by number of hash- or arrayrefs that the encoder
655 needs to traverse to reach a given point or the number of C<{> or C<[>
656 characters without their matching closing parenthesis crossed to reach a
657 given character in a string.
658
659 Setting the maximum depth to one disallows any nesting, so that ensures
660 that the object is only a single hash/object or array.
661
662 If no argument is given, the highest possible setting will be used, which
663 is rarely useful.
664
665 Note that nesting is implemented by recursion in C. The default value has
666 been chosen to be as large as typical operating systems allow without
667 crashing.
668
669 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
670
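A sketch of how C<max_depth> behaves when set to one, as described above (variable names are illustrative):

```perl
use strict;
use warnings;
use JSON::XS;

my $shallow = JSON::XS->new->max_depth (1);

# A single, non-nested array is within the limit.
my $flat = $shallow->decode ('[1,2,3]');

# One additional nesting level exceeds the limit, so decode croaks.
my $nested = eval { $shallow->decode ('[[1]]') };
print "nested text rejected: $@" unless defined $nested;
```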
671 =item $json = $json->max_size ([$maximum_string_size])
672
673 =item $max_size = $json->get_max_size
674
675 Set the maximum length a JSON text may have (in bytes) where decoding is
676 being attempted. The default is C<0>, meaning no limit. When C<decode>
677 is called on a string that is longer than this many bytes, it will not
678 attempt to decode the string but throw an exception. This setting has no
679 effect on C<encode> (yet).
680
681 If no argument is given, the limit check will be deactivated (same as when
682 C<0> is specified).
683
684 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
685
686 =item $json_text = $json->encode ($perl_scalar)
687
688 Converts the given Perl value or data structure to its JSON
689 representation. Croaks on error.
690
691 =item $perl_scalar = $json->decode ($json_text)
692
693 The opposite of C<encode>: expects a JSON text and tries to parse it,
694 returning the resulting simple scalar or reference. Croaks on error.
695
696 =item ($perl_scalar, $characters) = $json->decode_prefix ($json_text)
697
698 This works like the C<decode> method, but instead of raising an exception
699 when there is trailing garbage after the first JSON object, it will
700 silently stop parsing there and return the number of characters consumed
701 so far.
702
703 This is useful if your JSON texts are not delimited by an outer protocol
704 and you need to know where the JSON text ends.
705
706 JSON::XS->new->decode_prefix ("[1] the tail")
707 => ([1], 3)
708
709 =back
710
711
712 =head1 INCREMENTAL PARSING
713
714 In some cases, there is the need for incremental parsing of JSON
715 texts. While this module always has to keep both JSON text and resulting
716 Perl data structure in memory at one time, it does allow you to parse a
717 JSON stream incrementally. It does so by accumulating text until it has
718 a full JSON object, which it then can decode. This process is similar to
719 using C<decode_prefix> to see if a full JSON object is available, but
720 is much more efficient (and can be implemented with a minimum of method
721 calls).
722
723 JSON::XS will only attempt to parse the JSON text once it is sure it
724 has enough text to get a decisive result, using a very simple but
725 truly incremental parser. This means that it sometimes won't stop as
726 early as the full parser, for example, it doesn't detect mismatched
727 parentheses. The only thing it guarantees is that it starts decoding as
728 soon as a syntactically valid JSON text has been seen. This means you need
729 to set resource limits (e.g. C<max_size>) to ensure the parser will stop
730 parsing in the presence of syntax errors.
731
732 The following methods implement this incremental parser.
733
734 =over 4
735
736 =item [void, scalar or list context] = $json->incr_parse ([$string])
737
738 This is the central parsing function. It can both append new text and
739 extract objects from the stream accumulated so far (both of these
740 functions are optional).
741
742 If C<$string> is given, then this string is appended to the already
743 existing JSON fragment stored in the C<$json> object.
744
745 After that, if the function is called in void context, it will simply
746 return without doing anything further. This can be used to add more text
747 in as many chunks as you want.
748
749 If the method is called in scalar context, then it will try to extract
750 exactly I<one> JSON object. If that is successful, it will return this
751 object, otherwise it will return C<undef>. If there is a parse error,
752 this method will croak just as C<decode> would do (one can then use
753 C<incr_skip> to skip the erroneous part). This is the most common way of
754 using the method.
755
756 And finally, in list context, it will try to extract as many objects
757 from the stream as it can find and return them, or the empty list
758 otherwise. For this to work, there must be no separators (other than
759 whitespace) between the JSON objects or arrays, instead they must be
760 concatenated back-to-back. If an error occurs, an exception will be
761 raised as in the scalar context case. Note that in this case, any
762 previously-parsed JSON texts will be lost.
763
764 Example: Parse some JSON arrays/objects in a given string and return
765 them.
766
767 my @objs = JSON::XS->new->incr_parse ("[5][7][1,2]");
768
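The void and scalar context forms can be combined to feed a stream in arbitrary chunks. In this sketch the chunk boundaries are made up to show that they need not coincide with JSON text boundaries:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Two JSON texts, split at arbitrary positions.
my @chunks = ('[1,', '2]', '{"a"', ':3}');

my @got;
for my $chunk (@chunks) {
    # Void context: only append the chunk to the internal buffer.
    $json->incr_parse ($chunk);

    # Scalar context: extract one complete text at a time, if available.
    while (defined (my $obj = $json->incr_parse)) {
        push @got, $obj;
    }
}

print scalar @got, " JSON texts decoded\n";
```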
769 =item $lvalue_string = $json->incr_text
770
771 This method returns the currently stored JSON fragment as an lvalue, that
772 is, you can manipulate it. This I<only> works when a preceding call to
773 C<incr_parse> in I<scalar context> successfully returned an object. Under
774 all other circumstances you must not call this function (I mean it:
775 although in simple tests it might actually work, it I<will> fail under
776 real world conditions). As a special exception, you can also call this
777 method before having parsed anything.
778
779 That means you can only use this function to look at or manipulate text
780 before or after complete JSON objects, not while the parser is in the
781 middle of parsing a JSON object.
782
783 This function is useful in two cases: a) finding the trailing text after a
784 JSON object or b) parsing multiple JSON objects separated by non-JSON text
785 (such as commas).
786
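Case b) could be sketched as follows, for a hypothetical input in which JSON arrays are separated by commas. C<incr_text> is only touched directly after a successful scalar-context C<incr_parse>, as required above:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Void context: store the whole input, do not parse yet.
$json->incr_parse ('[1],[2], [3]');

my @items;
while (defined (my $obj = $json->incr_parse)) {
    push @items, $obj;

    # We are between JSON texts now, so the buffer may be edited:
    # remove the separating comma, if there is one.
    $json->incr_text =~ s/^\s*,//;
}

print scalar @items, " arrays decoded\n";
```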
787 =item $json->incr_skip
788
789 This will reset the state of the incremental parser and will remove
790 the text parsed so far from the input buffer. This is useful after
791 C<incr_parse> died, in which case the input buffer and incremental parser
792 state are left unchanged, to skip the text parsed so far and to reset the
793 parse state.
794
795 The difference from C<incr_reset> is that only the text up to the point where
796 the parse error occurred is removed.
797
798 =item $json->incr_reset
799
800 This completely resets the incremental parser, that is, after this call,
801 it will be as if the parser had never parsed anything.
802
803 This is useful if you want to repeatedly parse JSON objects and want to
804 ignore any trailing data, which means you have to reset the parser after
805 each successful decode.
806
807 =back
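
As a rough sketch of how C<incr_skip> and C<incr_reset> might be used
(the input strings are made up for illustration, and how much text
C<incr_skip> drops depends on where the parser stopped):

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;
my @good;

# void context: the text is only appended to the buffer
$json->incr_parse ('[1,,2] [3]');

for (;;) {
   # scalar context, so at most one JSON text is extracted per call
   my $obj = eval { $json->incr_parse };

   if ($@) {
      # "[1,,2]" is malformed: drop what was parsed so far and retry
      $json->incr_skip;
      next;
   }

   last unless defined $obj;
   push @good, $obj;
}

# incr_reset instead forgets everything, including trailing garbage
my $first = $json->incr_parse ('{"a":1} trailing garbage');
$json->incr_reset;
my $second = $json->incr_parse ('{"b":2}');
```

Without the C<incr_reset> call, the leftover text would still be in the
buffer when the second text is appended.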
808
809 =head2 LIMITATIONS
810
811 All options that affect decoding are supported, except
812 C<allow_nonref>. The reason for this is that it cannot be made to work
813 sensibly: JSON objects and arrays are self-delimited, i.e. you can
814 concatenate them back to back and still decode them perfectly. This does
815 not hold true for JSON numbers, however.
816
817 For example, is the string C<1> a single JSON number, or is it simply the
818 start of C<12>? Or is C<12> a single JSON number, or the concatenation
819 of C<1> and C<2>? In neither case can you tell, and this is why JSON::XS
820 takes the conservative route and disallows this case.
821
822 =head2 EXAMPLES
823
824 Some examples will make all this clearer. First, a simple example that
825 works similarly to C<decode_prefix>: We want to decode the JSON object at
826 the start of a string and identify the portion after the JSON object:
827
828 my $text = "[1,2,3] hello";
829
830 my $json = new JSON::XS;
831
832 my $obj = $json->incr_parse ($text)
833 or die "expected JSON object or array at beginning of string";
834
835 my $tail = $json->incr_text;
836 # $tail now contains " hello"
837
838 Easy, isn't it?
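
For comparison, here is a minimal sketch of the same task done with
C<decode_prefix>, which returns the decoded value together with the
number of characters it consumed (the variable names are only
illustrative):

```perl
use strict;
use warnings;
use JSON::XS;

my $text = "[1,2,3] hello";

# decode_prefix keeps no state in the object: it hands back the
# value and the offset at which decoding stopped
my ($obj, $consumed) = JSON::XS->new->decode_prefix ($text);

# everything after that offset is the tail, here " hello"
my $tail = substr $text, $consumed;
```

The incremental interface becomes the better fit once the text arrives
in pieces, as C<decode_prefix> needs the complete JSON text in one string.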
839
840 Now for a more complicated example: Imagine a hypothetical protocol where
841 you read some requests from a TCP stream, and each request is a JSON
842 array, without any separation between them (in fact, it is often useful to
843 use newlines as "separators", as these get interpreted as whitespace at
844 the start of the JSON text, which makes it possible to test said protocol
845 with C<telnet>...).
846
847 Here is how you'd do it (it is trivial to write this in an event-based
848 manner):
849
850 my $json = new JSON::XS;
851
852 # read some data from the socket
853 while (sysread $socket, my $buf, 4096) {
854
855 # split and decode as many requests as possible
856 for my $request ($json->incr_parse ($buf)) {
857 # act on the $request
858 }
859 }
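
To try the loop above without a socket, the following sketch feeds the
parser from an array of strings that stands in for successive
C<sysread> results. The chunk contents are made up, and the chunk
boundaries deliberately fall in the middle of requests:

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# stand-in for the data read from the socket, one element per read
my @chunks = ('[1,', "2]\n[3", "]\n" . '{"a":', "4}\n");

my @requests;

for my $buf (@chunks) {
   # list context: returns every request completed by this chunk,
   # or the empty list if the current request is still incomplete
   push @requests, $json->incr_parse ($buf);
}

# @requests now holds three decoded requests
```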
860
861 Another complicated example: Assume you have a string with JSON objects
862 or arrays, all separated by (optional) comma characters (e.g. C<[1],[2],
863 [3]>). To parse them, we have to skip the commas between the JSON texts,
864 and here is where the lvalue-ness of C<incr_text> comes in useful:
865
866 my $text = "[1],[2], [3]";
867 my $json = new JSON::XS;
868
869 # void context, so no parsing done
870 $json->incr_parse ($text);
871
872 # now extract as many objects as possible. note the
873 # use of scalar context so incr_text can be called.
874 while (my $obj = $json->incr_parse) {
875 # do something with $obj
876
877 # now skip the optional comma
878 $json->incr_text =~ s/^ \s* , //x;
879 }
880
881 Now let's go for a very complex example: Assume that you have a gigantic
882 JSON array-of-objects, many gigabytes in size, and you want to parse it,
883 but you cannot load it into memory fully (this has actually happened in
884 the real world :).
885
886 Well, you lost, you have to implement your own JSON parser. But JSON::XS
887 can still help you: You implement a (very simple) array parser and let
888 JSON decode the array elements, which are all full JSON objects on their
889 own (this wouldn't work if the array elements could be JSON numbers, for
890 example):
891
892 my $json = new JSON::XS;
893
894 # open the monster
895 open my $fh, "<", "bigfile.json"
896 or die "bigfile: $!";
897
898 # first parse the initial "["
899 for (;;) {
900 sysread $fh, my $buf, 65536
901 or die "read error: $!";
902 $json->incr_parse ($buf); # void context, so no parsing
903
904 # Exit the loop once we found and removed(!) the initial "[".
905 # In essence, we are (ab-)using the $json object as a simple scalar
906 # we append data to.
907 last if $json->incr_text =~ s/^ \s* \[ //x;
908 }
909
910 # now we have skipped the initial "[", so continue
911 # parsing all the elements.
912 for (;;) {
913 # in this loop we read data until we got a single JSON object
914 for (;;) {
915 if (my $obj = $json->incr_parse) {
916 # do something with $obj
917 last;
918 }
919
920 # add more data
921 sysread $fh, my $buf, 65536
922 or die "read error: $!";
923 $json->incr_parse ($buf); # void context, so no parsing
924 }
925
926 # in this loop we read data until we either found and parsed the
927 # separating "," between elements, or the final "]"
928 for (;;) {
929 # first skip whitespace
930 $json->incr_text =~ s/^\s*//;
931
932 # if we find "]", we are done
933 if ($json->incr_text =~ s/^\]//) {
934 print "finished.\n";
935 exit;
936 }
937
938 # if we find ",", we can continue with the next element
939 if ($json->incr_text =~ s/^,//) {
940 last;
941 }
942
943 # if we find anything else, we have a parse error!
944 if (length $json->incr_text) {
945 die "parse error near ", $json->incr_text;
946 }
947
948 # else add more data
949 sysread $fh, my $buf, 65536 or die "read error: $!";
950 $json->incr_parse ($buf); # void context, so no parsing
951 }
952 } # closes the outer loop over all array elements
953
954 This is a complex example, but most of the complexity comes from the fact
955 that we are trying to be correct (bear with me if I am wrong, I never ran
956 the above example :).
957
958
959
960 =head1 MAPPING
961
962 This section describes how JSON::XS maps Perl values to JSON values and
963 vice versa. These mappings are designed to "do the right thing" in most
964 circumstances automatically, preserving round-tripping characteristics
965 (what you put in comes out as something equivalent).
966
967 For the more enlightened: note that in the following descriptions,
968 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
969 refers to the abstract Perl language itself.
970
971
972 =head2 JSON -> PERL
973
974 =over 4
975
976 =item object
977
978 A JSON object becomes a reference to a hash in Perl. No ordering of object
979 keys is preserved (JSON does not preserve object key ordering itself).
980
981 =item array
982
983 A JSON array becomes a reference to an array in Perl.
984
985 =item string
986
987 A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
988 are represented by the same codepoints in the Perl string, so no manual
989 decoding is necessary.
990
991 =item number
992
993 A JSON number becomes either an integer, numeric (floating point) or
994 string scalar in perl, depending on its range and any fractional parts. On
995 the Perl level, there is no difference between those as Perl handles all
996 the conversion details, but an integer may take slightly less memory and
997 might represent more values exactly than floating point numbers.
998
999 If the number consists of digits only, JSON::XS will try to represent
1000 it as an integer value. If that fails, it will try to represent it as
1001 a numeric (floating point) value if that is possible without loss of
1002 precision. Otherwise it will preserve the number as a string value (in
1003 which case you lose roundtripping ability, as the JSON number will be
1004 re-encoded to a JSON string).
1005
1006 Numbers containing a fractional or exponential part will always be
1007 represented as numeric (floating point) values, possibly at a loss of
1008 precision (in which case you might lose perfect roundtripping ability, but
1009 the JSON number will still be re-encoded as a JSON number).
1010
1011 Note that precision is not accuracy - binary floating point values cannot
1012 represent most decimal fractions exactly, and when converting from and to
1013 floating point, JSON::XS only guarantees precision up to but not including
1014 the least significant bit.
1015
1016 =item true, false
1017
1018 These JSON atoms become C<Types::Serialiser::true> and
1019 C<Types::Serialiser::false>, respectively. They are overloaded to act
1020 almost exactly like the numbers C<1> and C<0>. You can check whether
1021 a scalar is a JSON boolean by using the C<Types::Serialiser::is_bool>
1022 function (after C<use Types::Serialiser>, of course).
1023
1024 =item null
1025
1026 A JSON null atom becomes C<undef> in Perl.
1027
1028 =item shell-style comments (C<< # I<text> >>)
1029
1030 As a nonstandard extension to the JSON syntax that is enabled by the
1031 C<relaxed> setting, shell-style comments are allowed. They can start
1032 anywhere outside strings and go till the end of the line.
1033
1034 =item tagged values (C<< (I<tag>)I<value> >>).
1035
1036 Another nonstandard extension to the JSON syntax, enabled with the
1037 C<allow_tags> setting, are tagged values. In this implementation, the
1038 I<tag> must be a perl package/class name encoded as a JSON string, and the
1039 I<value> must be a JSON array encoding optional constructor arguments.
1040
1041 See L<OBJECT SERIALISATION>, below, for details.
1042
1043 =back
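
The mappings above can be checked with a short sketch like the following
(the JSON text and the variable names are made up for illustration):

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $data = decode_json
   '{"obj":{"k":1},"arr":[1,2],"str":"caf\u00e9","yes":true,"no":false,"nil":null}';

my $obj_type = ref $data->{obj};   # "HASH"
my $arr_type = ref $data->{arr};   # "ARRAY"

# \u00e9 arrives as the single character U+00E9, so the string has
# four characters and needs no manual decoding
my $str_len = length $data->{str};

# booleans are special objects that behave like 1 and 0
my $yes_is_bool = Types::Serialiser::is_bool ($data->{yes});
my $no_is_false = !$data->{no};

# null becomes undef, but the key itself exists
my $nil_is_undef = exists $data->{nil} && !defined $data->{nil};

# shell-style comments are only accepted with the relaxed setting
my $relaxed = JSON::XS->new->relaxed->decode ("[1, # one\n 2]");
```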
1044
1045
1046 =head2 PERL -> JSON
1047
1048 The mapping from Perl to JSON is slightly more difficult, as Perl is a
1049 truly typeless language, so we can only guess which JSON type is meant by
1050 a Perl value.
1051
1052 =over 4
1053
1054 =item hash references
1055
1056 Perl hash references become JSON objects. As there is no inherent
1057 ordering in hash keys (or JSON objects), they will usually be encoded
1058 in a pseudo-random order. JSON::XS can optionally sort the hash keys
1059 (determined by the I<canonical> flag), so the same datastructure will
1060 serialise to the same JSON text (given same settings and version of
1061 JSON::XS), but this incurs a runtime overhead and is only rarely useful,
1062 e.g. when you want to compare some JSON text against another for equality.
1063
1064 =item array references
1065
1066 Perl array references become JSON arrays.
1067
1068 =item other references
1069
1070 Other unblessed references are generally not allowed and will cause an
1071 exception to be thrown, except for references to the integers C<0> and
1072 C<1>, which get turned into C<false> and C<true> atoms in JSON.
1073
1074 Since C<JSON::XS> uses the boolean model from L<Types::Serialiser>, you
1075 can also C<use Types::Serialiser> and then use C<Types::Serialiser::false>
1076 and C<Types::Serialiser::true> to improve readability.
1077
1078 use Types::Serialiser;
1079 encode_json [\0, Types::Serialiser::true] # yields [false,true]
1080
1081 =item Types::Serialiser::true, Types::Serialiser::false
1082
1083 These special values from the L<Types::Serialiser> module become JSON true
1084 and JSON false values, respectively. You can also use C<\1> and C<\0>
1085 directly if you want.
1086
1087 =item blessed objects
1088
1089 Blessed objects are not directly representable in JSON, but C<JSON::XS>
1090 allows various ways of handling objects. See L<OBJECT SERIALISATION>,
1091 below, for details.
1092
1093 =item simple scalars
1094
1095 Simple Perl scalars (any scalar that is not a reference) are the most
1096 difficult objects to encode: JSON::XS will encode undefined scalars as
1097 JSON C<null> values, scalars that have last been used in a string context
1098 before encoding as JSON strings, and anything else as number value:
1099
1100 # dump as number
1101 encode_json [2] # yields [2]
1102 encode_json [-3.0e17] # yields [-3e+17]
1103 my $value = 5; encode_json [$value] # yields [5]
1104
1105 # used as string, so dump as string
1106 print $value;
1107 encode_json [$value] # yields ["5"]
1108
1109 # undef becomes null
1110 encode_json [undef] # yields [null]
1111
1112 You can force the type to be a JSON string by stringifying it:
1113
1114 my $x = 3.1; # some variable containing a number
1115 "$x"; # stringified
1116 $x .= ""; # another, more awkward way to stringify
1117 print $x; # perl does it for you, too, quite often
1118
1119 You can force the type to be a JSON number by numifying it:
1120
1121 my $x = "3"; # some variable containing a string
1122 $x += 0; # numify it, ensuring it will be dumped as a number
1123 $x *= 1; # same thing, the choice is yours.
1124
1125 You cannot currently force the type in other, less obscure, ways. Tell me
1126 if you need this capability (but don't forget to explain why it's needed
1127 :).
1128
1129 Note that numerical precision has the same meaning as under Perl (so
1130 binary to decimal conversion follows the same rules as in Perl, which
1131 can differ from other languages). Also, your perl interpreter might expose
1132 extensions to the floating point numbers of your platform, such as
1133 infinities or NaNs - these cannot be represented in JSON, and it is an
1134 error to pass those in.
1135
1136 =back
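
Putting several of these rules together, a small sketch (the hash keys
and values are invented; C<canonical> is only enabled so that the key
order, and thus the output, is reproducible):

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

my $id    = "42";   # a string, e.g. as read from a file or form
my $count = 7;      # a number

my $text = $coder->encode ({
   id    => $id + 0,    # numified copy: dumped as a number
   count => "$count",   # stringified copy: dumped as a string
   ok    => \1,         # reference to 1: dumped as true
   none  => undef,      # dumped as null
});

# $text is now {"count":"7","id":42,"none":null,"ok":true}
```

Using copies (C<$id + 0>, C<"$count">) at the point of encoding avoids
depending on how the original variables were last used.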
1137
1138 =head2 OBJECT SERIALISATION
1139
1140 As JSON cannot directly represent Perl objects, you have to choose between
1141 a pure JSON representation (without the ability to deserialise the object
1142 automatically again), and a nonstandard extension to the JSON syntax,
1143 tagged values.
1144
1145 =head3 SERIALISATION
1146
1147 What happens when C<JSON::XS> encounters a Perl object depends on the
1148 C<allow_blessed>, C<convert_blessed> and C<allow_tags> settings, which are
1149 used in this order:
1150
1151 =over 4
1152
1153 =item 1. C<allow_tags> is enabled and the object has a C<FREEZE> method.
1154
1155 In this case, C<JSON::XS> uses the L<Types::Serialiser> object
1156 serialisation protocol to create a tagged JSON value, using a nonstandard
1157 extension to the JSON syntax.
1158
1159 This works by invoking the C<FREEZE> method on the object, with the first
1160 argument being the object to serialise, and the second argument being the
1161 constant string C<JSON> to distinguish it from other serialisers.
1162
1163 The C<FREEZE> method can return any number of values (i.e. zero or
1164 more). These values and the package/classname of the object will then be
1165 encoded as a tagged JSON value in the following format:
1166
1167 ("classname")[FREEZE return values...]
1168
1169 e.g.:
1170
1171 ("URI")["http://www.google.com/"]
1172 ("MyDate")[2013,10,29]
1173 ("ImageData::JPEG")["Z3...VlCg=="]
1174
1175 For example, the hypothetical C<My::Object> C<FREEZE> method might use the
1176 object's C<type> and C<id> members to encode the object:
1177
1178 sub My::Object::FREEZE {
1179 my ($self, $serialiser) = @_;
1180
1181 ($self->{type}, $self->{id})
1182 }
1183
1184 =item 2. C<convert_blessed> is enabled and the object has a C<TO_JSON> method.
1185
1186 In this case, the C<TO_JSON> method of the object is invoked in scalar
1187 context. It must return a single scalar that can be directly encoded into
1188 JSON. This scalar replaces the object in the JSON text.
1189
1190 For example, the following C<TO_JSON> method will convert all L<URI>
1191 objects to JSON strings when serialised. The fact that these values
1192 originally were L<URI> objects is lost.
1193
1194 sub URI::TO_JSON {
1195 my ($uri) = @_;
1196 $uri->as_string
1197 }
1198
1199 =item 3. C<allow_blessed> is enabled.
1200
1201 The object will be serialised as a JSON null value.
1202
1203 =item 4. none of the above
1204
1205 If none of the settings are enabled or the respective methods are missing,
1206 C<JSON::XS> throws an exception.
1207
1208 =back
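
The following sketch shows cases 2 to 4 side by side. The classes
C<My::Temperature> and C<My::Opaque> are hypothetical and exist only for
this illustration:

```perl
use strict;
use warnings;
use JSON::XS;

{
   package My::Temperature;   # hypothetical class with a TO_JSON method

   sub new     { my ($class, $celsius) = @_; bless { celsius => $celsius }, $class }
   sub TO_JSON { my ($self) = @_; return +{ celsius => $self->{celsius} } }
}

{
   package My::Opaque;        # hypothetical class without TO_JSON

   sub new { bless {}, shift }
}

my $temp   = My::Temperature->new (21);
my $opaque = My::Opaque->new;

# case 2: TO_JSON supplies a plain value that replaces the object
my $converted = JSON::XS->new->convert_blessed->encode ([$temp]);

# case 3: the object cannot be converted and becomes null
my $nulled = JSON::XS->new->allow_blessed->encode ([$opaque]);

# case 4: no setting enabled, so encode throws an exception
my $ok = eval { JSON::XS->new->encode ([$opaque]); 1 };
```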
1209
1210 =head3 DESERIALISATION
1211
1212 For deserialisation there are only two cases to consider: either
1213 nonstandard tagging was used, in which case C<allow_tags> decides,
1214 or objects cannot automatically be deserialised, in which
1215 case you can use postprocessing or the C<filter_json_object> or
1216 C<filter_json_single_key_object> callbacks to get some real objects out of
1217 your JSON.
1218
1219 This section only considers the tagged value case: If a tagged JSON object
1220 is encountered during decoding and C<allow_tags> is disabled, a parse
1221 error will result (as if tagged values were not part of the grammar).
1222
1223 If C<allow_tags> is enabled, C<JSON::XS> will look up the C<THAW> method
1224 of the package/classname used during serialisation (it will not attempt
1225 to load the package as a Perl module). If there is no such method, the
1226 decoding will fail with an error.
1227
1228 Otherwise, the C<THAW> method is invoked with the classname as first
1229 argument, the constant string C<JSON> as second argument, and all the
1230 values from the JSON array (the values originally returned by the
1231 C<FREEZE> method) as remaining arguments.
1232
1233 The method must then return the object. While technically you can return
1234 any Perl scalar, you might have to enable the C<allow_nonref> setting to
1235 make that work in all cases, so better return an actual blessed reference.
1236
1237 As an example, let's implement a C<THAW> function that regenerates the
1238 C<My::Object> from the C<FREEZE> example earlier:
1239
1240 sub My::Object::THAW {
1241 my ($class, $serialiser, $type, $id) = @_;
1242
1243 $class->new (type => $type, id => $id)
1244 }
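
A complete round trip might look like the following sketch. It uses a
hypothetical C<My::Point> class (not part of any distribution) so that
the example is self-contained:

```perl
use strict;
use warnings;
use JSON::XS;

{
   package My::Point;   # hypothetical class, for illustration only

   sub new {
      my ($class, %arg) = @_;
      bless { %arg }, $class
   }

   # called by the encoder, returns the constructor arguments
   sub FREEZE {
      my ($self, $serialiser) = @_;
      ($self->{x}, $self->{y})
   }

   # called by the decoder with the values FREEZE returned
   sub THAW {
      my ($class, $serialiser, $x, $y) = @_;
      $class->new (x => $x, y => $y)
   }
}

# the same setting is needed for both directions
my $coder = JSON::XS->new->allow_tags;

my $text = $coder->encode ([My::Point->new (x => 1, y => 2)]);
my $copy = $coder->decode ($text)->[0];

# $copy is a new My::Point object with the same x and y members
```

Note that C<$text> is not standard JSON and will be rejected by decoders
that do not support tagged values.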
1245
1246
1247 =head1 ENCODING/CODESET FLAG NOTES
1248
1249 The interested reader might have seen a number of flags that signify
1250 encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be
1251 some confusion on what these do, so here is a short comparison:
1252
1253 C<utf8> controls whether the JSON text created by C<encode> (and expected
1254 by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only
1255 control whether C<encode> escapes character values outside their respective
1256 codeset range. None of these flags conflict with each other, although
1257 some combinations make less sense than others.
1258
1259 Care has been taken to make all flags symmetrical with respect to
1260 C<encode> and C<decode>, that is, texts encoded with any combination of
1261 these flag values will be correctly decoded when the same flags are used
1262 - in general, if you use different flag settings while encoding vs. when
1263 decoding you likely have a bug somewhere.
1264
1265 Below comes a verbose discussion of these flags. Note that a "codeset" is
1266 simply an abstract set of character-codepoint pairs, while an encoding
1267 takes those codepoint numbers and I<encodes> them, in our case into
1268 octets. Unicode is (among other things) a codeset, UTF-8 is an encoding,
1269 and ISO-8859-1 (= latin 1) and ASCII are both codesets I<and> encodings at
1270 the same time, which can be confusing.
1271
1272 =over 4
1273
1274 =item C<utf8> flag disabled
1275
1276 When C<utf8> is disabled (the default), then C<encode>/C<decode> generate
1277 and expect Unicode strings, that is, characters with high ordinal Unicode
1278 values (> 255) will be encoded as such characters, and likewise such
1279 characters are decoded as-is, no changes to them will be done, except
1280 "(re-)interpreting" them as Unicode codepoints or Unicode characters,
1281 respectively (to Perl, these are the same thing in strings unless you do
1282 funny/weird/dumb stuff).
1283
1284 This is useful when you want to do the encoding yourself (e.g. when you
1285 want to have UTF-16 encoded JSON texts) or when some other layer does
1286 the encoding for you (for example, when printing to a terminal using a
1287 filehandle that transparently encodes to UTF-8 you certainly do NOT want
1288 to UTF-8 encode your data first and have Perl encode it another time).
1289
1290 =item C<utf8> flag enabled
1291
1292 If the C<utf8>-flag is enabled, C<encode>/C<decode> will encode all
1293 characters using the corresponding UTF-8 multi-byte sequence, and will
1294 expect your input strings to be encoded as UTF-8, that is, no "character"
1295 of the input string must have any value > 255, as UTF-8 does not allow
1296 that.
1297
1298 The C<utf8> flag therefore switches between two modes: disabled means you
1299 will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
1300 octet/binary string in Perl.
1301
1302 =item C<latin1> or C<ascii> flags enabled
1303
1304 With C<latin1> (or C<ascii>) enabled, C<encode> will escape characters
1305 with ordinal values > 255 (> 127 with C<ascii>) and encode the remaining
1306 characters as specified by the C<utf8> flag.
1307
1308 If C<utf8> is disabled, then the result is also correctly encoded in those
1309 character sets (as both are proper subsets of Unicode, meaning that a
1310 Unicode string with all character values < 256 is the same thing as an
1311 ISO-8859-1 string, and a Unicode string with all character values < 128 is
1312 the same thing as an ASCII string in Perl).
1313
1314 If C<utf8> is enabled, you still get a correct UTF-8-encoded string,
1315 regardless of these flags, just some more characters will be escaped using
1316 C<\uXXXX> than before.
1317
1318 Note that ISO-8859-1-I<encoded> strings are not compatible with UTF-8
1319 encoding, while ASCII-encoded strings are. That is because the ISO-8859-1
1320 encoding is NOT a subset of UTF-8 (despite the ISO-8859-1 I<codeset> being
1321 a subset of Unicode), while ASCII is.
1322
1323 Surprisingly, C<decode> will ignore these flags and so treat all input
1324 values as governed by the C<utf8> flag. If it is disabled, this allows you
1325 to decode ISO-8859-1- and ASCII-encoded strings, as both are strict subsets of
1326 Unicode. If it is enabled, you can correctly decode UTF-8 encoded strings.
1327
1328 So neither C<latin1> nor C<ascii> are incompatible with the C<utf8> flag -
1329 they only govern when the JSON output engine escapes a character or not.
1330
1331 The main use for C<latin1> is to relatively efficiently store binary data
1332 as JSON, at the expense of breaking compatibility with most JSON decoders.
1333
1334 The main use for C<ascii> is to force the output to not contain characters
1335 with values > 127, which means you can interpret the resulting string
1336 as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about any character set and
1337 8-bit-encoding, and still get the same data structure back. This is useful
1338 when your channel for JSON transfer is not 8-bit clean or the encoding
1339 might be mangled in between (e.g. in mail), and works because ASCII is a
1340 proper subset of most 8-bit and multibyte encodings in use in the world.
1341
1342 =back
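
To make the differences concrete, here is a small sketch that encodes
the same two-character string with each flag. The test string is chosen
arbitrarily: one character below 256 (U+00E9) and one above (U+20AC):

```perl
use strict;
use warnings;
use JSON::XS;

my $data = ["\x{e9}\x{20ac}"];

# utf8 disabled: a Unicode string of 6 characters
my $chars  = JSON::XS->new->encode ($data);

# utf8 enabled: 9 octets, as the two characters take 2 + 3 octets
my $octets = JSON::XS->new->utf8->encode ($data);

# ascii: both characters are escaped
my $ascii  = JSON::XS->new->ascii->encode ($data);

# latin1: only the character above 255 is escaped
my $latin1 = JSON::XS->new->latin1->encode ($data);

# decoding with the same utf8 setting restores the original string
my $back = JSON::XS->new->utf8->decode ($octets);
```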
1343
1344
1345 =head2 JSON and ECMAscript
1346
1347 JSON syntax is based on how literals are represented in javascript (the
1348 not-standardised predecessor of ECMAscript) which is presumably why it is
1349 called "JavaScript Object Notation".
1350
1351 However, JSON is not a subset (and also not a superset of course) of
1352 ECMAscript (the standard) or javascript (whatever browsers actually
1353 implement).
1354
1355 If you want to use javascript's C<eval> function to "parse" JSON, you
1356 might run into parse errors for valid JSON texts, or the resulting data
1357 structure might not be queryable:
1358
1359 One of the problems is that U+2028 and U+2029 are valid characters inside
1360 JSON strings, but are not allowed in ECMAscript string literals, so the
1361 following Perl fragment will not output something that can be guaranteed
1362 to be parsable by javascript's C<eval>:
1363
1364 use JSON::XS;
1365
1366 print encode_json [chr 0x2028];
1367
1368 The right fix for this is to use a proper JSON parser in your javascript
1369 programs, and not rely on C<eval> (see for example Douglas Crockford's
1370 F<json2.js> parser).
1371
1372 If this is not an option, you can, as a stop-gap measure, simply encode to
1373 ASCII-only JSON:
1374
1375 use JSON::XS;
1376
1377 print JSON::XS->new->ascii->encode ([chr 0x2028]);
1378
1379 Note that this will enlarge the resulting JSON text quite a bit if you
1380 have many non-ASCII characters. You might be tempted to run some regexes
1381 to only escape U+2028 and U+2029, e.g.:
1382
1383 # DO NOT USE THIS!
1384 my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
1385 $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
1386 $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
1387 print $json;
1388
1389 Note that I<this is a bad idea>: the above only works for U+2028 and
1390 U+2029 and thus only for fully ECMAscript-compliant parsers. Many existing
1391 javascript implementations, however, have issues with other characters as
1392 well - using C<eval> naively simply I<will> cause problems.
1393
1394 Another problem is that some javascript implementations reserve
1395 some property names for their own purposes (which probably makes
1396 them non-ECMAscript-compliant). For example, Iceweasel reserves the
1397 C<__proto__> property name for its own purposes.
1398
1399 If that is a problem, you could try to filter the resulting JSON
1400 output for these property strings, e.g.:
1401
1402 $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1403
1404 This works because C<__proto__> is not valid outside of strings, so every
1405 occurrence of C<"__proto__"\s*:> must be a string used as property name.
1406
1407 If you know of other incompatibilities, please let me know.
1408
1409
1410 =head2 JSON and YAML
1411
1412 You often hear that JSON is a subset of YAML. This is, however, a mass
1413 hysteria(*) and very far from the truth (as of the time of this writing),
1414 so let me state it clearly: I<in general, there is no way to configure
1415 JSON::XS to output a data structure as valid YAML> that works in all
1416 cases.
1417
1418 If you really must use JSON::XS to generate YAML, you should use this
1419 algorithm (subject to change in future versions):
1420
1421 my $to_yaml = JSON::XS->new->utf8->space_after (1);
1422 my $yaml = $to_yaml->encode ($ref) . "\n";
1423
1424 This will I<usually> generate JSON texts that also parse as valid
1425 YAML. Please note that YAML has hardcoded limits on (simple) object key
1426 lengths that JSON doesn't have and also has different and incompatible
1427 unicode character escape syntax, so you should make sure that your hash
1428 keys are noticeably shorter than the 1024 "stream characters" YAML allows
1429 and that you do not have characters with codepoint values outside the
1430 Unicode BMP (Basic Multilingual Plane). YAML also does not allow C<\/>
1431 sequences in strings (which JSON::XS does not I<currently> generate, but
1432 other JSON generators might).
1433
1434 There might be other incompatibilities that I am not aware of (or the YAML
1435 specification has been changed yet again - it does so quite often). In
1436 general you should not try to generate YAML with a JSON generator or vice
1437 versa, or try to parse JSON with a YAML parser or vice versa: chances are
1438 high that you will run into severe interoperability problems when you
1439 least expect it.
1440
1441 =over 4
1442
1443 =item (*)
1444
1445 I have been pressured multiple times by Brian Ingerson (one of the
1446 authors of the YAML specification) to remove this paragraph, despite him
1447 acknowledging that the actual incompatibilities exist. As I was personally
1448 bitten by this "JSON is YAML" lie, I refused and said I will continue to
1449 educate people about these issues, so others do not run into the same
1450 problem again and again. After this, Brian called me a (quote)I<complete
1451 and worthless idiot>(unquote).
1452
1453 In my opinion, instead of pressuring and insulting people who actually
1454 clarify issues with YAML and the wrong statements of some of its
1455 proponents, I would kindly suggest reading the JSON spec (which is not
1456 that difficult or long) and finally make YAML compatible to it, and
1457 educating users about the changes, instead of spreading lies about the
1458 real compatibility for many I<years> and trying to silence people who
1459 point out that it isn't true.
1460
1461 Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even
1462 though the incompatibilities have been documented (and are known to Brian)
1463 for many years and the spec makes explicit claims that YAML is a superset
1464 of JSON. It would be so easy to fix, but apparently, bullying people and
1465 corrupting userdata is so much easier.
1466
1467 =back
1468
1469
1470 =head2 SPEED
1471
1472 It seems that JSON::XS is surprisingly fast, as shown in the following
1473 tables. They have been generated with the help of the C<eg/bench> program
1474 in the JSON::XS distribution, to make it easy to compare on your own
1475 system.
1476
1477 First comes a comparison between various modules using
1478 a very short single-line JSON string (also available at
1479 L<http://dist.schmorp.de/misc/json/short.json>).
1480
1481 {"method": "handleMessage", "params": ["user1",
1482 "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1483 1, 0]}
1484
1485 It shows the number of encodes/decodes per second (JSON::XS uses
1486 the functional interface, while JSON::XS/2 uses the OO interface
1487 with pretty-printing and hashkey sorting enabled, JSON::XS/3 enables
1488 shrink. JSON::DWIW/DS uses the deserialise function, while JSON::DWIW/FJ
1489 uses the from_json method). Higher is better:
1490
1491 module | encode | decode |
1492 --------------|------------|------------|
1493 JSON::DWIW/DS | 86302.551 | 102300.098 |
1494 JSON::DWIW/FJ | 86302.551 | 75983.768 |
1495 JSON::PP | 15827.562 | 6638.658 |
1496 JSON::Syck | 63358.066 | 47662.545 |
1497 JSON::XS | 511500.488 | 511500.488 |
1498 JSON::XS/2 | 291271.111 | 388361.481 |
1499 JSON::XS/3 | 361577.931 | 361577.931 |
1500 Storable | 66788.280 | 265462.278 |
1501 --------------+------------+------------+
1502
1503 That is, JSON::XS is almost six times faster than JSON::DWIW on encoding,
1504 about five times faster on decoding, and over thirty to seventy times
1505 faster than JSON's pure perl implementation. It also compares favourably
1506 to Storable for small amounts of data.
1507
1508 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
1509 search API, L<http://dist.schmorp.de/misc/json/long.json>):
1510
1511 module | encode | decode |
1512 --------------|------------|------------|
1513 JSON::DWIW/DS | 1647.927 | 2673.916 |
1514 JSON::DWIW/FJ | 1630.249 | 2596.128 |
1515 JSON::PP | 400.640 | 62.311 |
1516 JSON::Syck | 1481.040 | 1524.869 |
1517 JSON::XS | 20661.596 | 9541.183 |
1518 JSON::XS/2 | 10683.403 | 9416.938 |
1519 JSON::XS/3 | 20661.596 | 9400.054 |
1520 Storable | 19765.806 | 10000.725 |
1521 --------------+------------+------------+
1522
1523 Again, JSON::XS leads by far (except for Storable which unsurprisingly
1524 decodes a bit faster).
1525
1526 On large strings containing lots of high Unicode characters, some modules
1527 (such as JSON::PC) seem to decode faster than JSON::XS, but the result
1528 will be broken due to missing (or wrong) Unicode handling. Others refuse
1529 to decode or encode properly, so it was impossible to prepare a fair
1530 comparison table for that case.
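
For a quick measurement with your own data, something like the following
sketch can serve as a starting point. It is not the C<eg/bench> program
that produced the tables above, it merely uses the core C<Benchmark>
module on the short test string, and the case names are arbitrary:

```perl
use strict;
use warnings;
use JSON::XS;
use Benchmark qw(cmpthese);

my $text = '{"method": "handleMessage", "params": ["user1", '
         . '"we were just talking"], "id": null, '
         . '"array":[1,11,234,-5,1e5,1e7, 1, 0]}';
my $data = decode_json $text;

# OO interface with pretty-printing and key sorting enabled
my $oo = JSON::XS->new->utf8->pretty->canonical;

# run each case for at least one CPU second, then print a table
cmpthese (-1, {
   encode_functional => sub { encode_json $data },
   decode_functional => sub { decode_json $text },
   encode_oo         => sub { $oo->encode ($data) },
});
```

The absolute numbers depend heavily on the machine and the perl build,
so only the ratios between the rows are meaningful.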
1531
1532
1533 =head1 SECURITY CONSIDERATIONS
1534
1535 When you are using JSON in a protocol, talking to untrusted potentially
1536 hostile creatures requires relatively few measures.
1537
1538 First of all, your JSON decoder should be secure, that is, should not have
1539 any buffer overflows. Obviously, this module should ensure that and I am
1540 trying hard on making that true, but you never know.
1541
1542 Second, you need to avoid resource-starving attacks. That means you should
1543 limit the size of JSON texts you accept, or make sure that when your
1544 resources run out, that's just fine (e.g. by using a separate process that
1545 can crash safely). The size of a JSON text in octets or characters is
1546 usually a good indication of the size of the resources required to decode
1547 it into a Perl structure. While JSON::XS can check the size of the JSON
1548 text, it might be too late when you already have it in memory, so you
1549 might want to check the size before you accept the string.

Third, JSON::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested JSON objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.
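
As an illustration, the following sketch lowers the nesting limit and
shows that a deeply nested text is refused with an exception instead of
being parsed. The limit of 16 is an arbitrary value chosen for this
example, not a recommendation.

```perl
use strict;
use warnings;
use JSON::XS;

# A deliberately small limit for illustration; the default is 512.
my $coder = JSON::XS->new->max_depth (16);

my $shallow = ("[" x 8)  . ("]" x 8);    # 8 nested arrays, well below the limit
my $deep    = ("[" x 32) . ("]" x 32);   # 32 nested arrays, well above it

# decode croaks when the limit is exceeded, so wrap it in eval.
my $ok  = eval { $coder->decode ($shallow); 1 };
my $bad = eval { $coder->decode ($deep);    1 };

print "shallow text decoded\n" if $ok;
print "deep text rejected\n"   unless $bad;
```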

Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open to hints, though...

Also keep in mind that JSON::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by JSON::XS
will not end up in front of untrusted eyes.
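
One way to do that is to catch the exception, log it locally, and hand
only a generic message to the peer. The helper C<encode_reply> and the
shape of the error reply below are hypothetical and only serve as a
sketch.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->utf8;

# encode_reply is a hypothetical helper, not part of JSON::XS.
sub encode_reply {
   my ($data) = @_;

   my $json;
   unless (eval { $json = $coder->encode ($data); 1 }) {
      # The detailed message may quote parts of $data: keep it in our own log.
      warn "JSON encode failed: $@";

      # The peer only gets a fixed, generic error text.
      return '{"error":"internal error"}';
   }

   return $json;
}
```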

If you are using JSON::XS to return packets for consumption
by JavaScript scripts in a browser you should have a look at
L<http://blog.archive.jpsykes.com/47/practical-csrf-and-json-security/> to
see whether you are vulnerable to some common attack vectors (which really
are browser design bugs, but it is still you who will have to deal with
it, as major browser developers care only for features, not about getting
security right).


=head2 "OLD" VS. "NEW" JSON (RFC 4627 VS. RFC 7159)

JSON originally required JSON texts to represent an array or object -
scalar values were explicitly not allowed. This has changed, and versions
of JSON::XS beginning with C<4.0> reflect this by allowing scalar values
by default.

One reason why one might not want this is that this removes a fundamental
property of JSON texts, namely that they are self-delimited and
self-contained, or in other words, you could take any number of "old"
JSON texts and paste them together, and the result would be unambiguously
parseable:

   [1,3]{"k":5}[][null] # four JSON texts, without doubt

By allowing scalars, this property is lost: the following example could
be one JSON text (the number 12) or two JSON texts (the numbers 1 and
2):

   12 # could be 12, or 1 and 2

Another lost property of "old" JSON is that no lookahead is required to
know the end of a JSON text, i.e. the JSON text definitely ended at the
last C<]> or C<}> character, there was no need to read extra characters.

For example, a viable network protocol with "old" JSON was to simply
exchange JSON texts without delimiter. For "new" JSON, you have to use a
suitable delimiter (such as a newline) after every JSON text or ensure you
never encode/decode scalar values.
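
To illustrate the self-delimiting property, here is a small sketch that
splits a run of undelimited arrays and objects into separate texts using
the incremental parser. The input string is simply the "old" JSON example
from above; how the octets reach your program is left out.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

# Four texts pasted together without any delimiter, as they might
# arrive from a socket.
my $stream = '[1,3]{"k":5}[][null]';

# In list context, incr_parse extracts every complete text it can find.
my @texts = $coder->incr_parse ($stream);

printf "%d JSON texts found\n", scalar @texts;
```

This only works because every text is an array or object; with scalar
values in the stream, a delimiter would be needed.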

Most protocols do work by only transferring arrays or objects, and the
easiest way to avoid problems with the "new" JSON definition is to
explicitly disallow scalar values in your encoder and decoder:

   $json_coder = JSON::XS->new->allow_nonref (0);
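
The effect of this setting can be sketched as follows: arrays and objects
pass as before, while top-level scalars make both C<decode> and C<encode>
croak. The variable names are only illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $strict = JSON::XS->new->allow_nonref (0);

# Arrays and objects are accepted as usual...
my $array = $strict->decode ('[1,3]');

# ...but a bare scalar makes decode croak, so eval returns false here.
my $scalar_decoded = eval { $strict->decode ('12'); 1 };

# encode refuses top-level scalars as well.
my $scalar_encoded = eval { $strict->encode (12); 1 };

print "scalars are rejected\n" unless $scalar_decoded || $scalar_encoded;
```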

This is a somewhat unhappy situation, and the blame can fully be put on
JSON's inventor, Douglas Crockford, who unilaterally changed the format
in 2006 without consulting the IETF, forcing the IETF to either fork the
format or go with it (as I was told, the IETF wasn't amused).


=head1 INTEROPERABILITY WITH OTHER MODULES

C<JSON::XS> uses the L<Types::Serialiser> module to provide boolean
constants. That means that the JSON true and false values will be
compatible with true and false values of other modules that do the same,
such as L<JSON::PP> and L<CBOR::XS>.
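
A short sketch of what this means in practice: decoded booleans are
recognised by C<Types::Serialiser::is_bool>, and the constants provided by
L<Types::Serialiser> encode as JSON C<true> and C<false>.

```perl
use strict;
use warnings;
use JSON::XS;
use Types::Serialiser;

my $coder = JSON::XS->new;

# Booleans decoded by JSON::XS are Types::Serialiser booleans...
my $flags = $coder->decode ('[true,false]');
print "decoded value is a boolean\n"
   if Types::Serialiser::is_bool ($flags->[0]);

# ...and the Types::Serialiser constants encode as JSON booleans.
my $json = $coder->encode ([Types::Serialiser::true (), Types::Serialiser::false ()]);
print "$json\n";   # [true,false]
```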


=head1 INTEROPERABILITY WITH OTHER JSON DECODERS

As long as you only serialise data that can be directly expressed in JSON,
C<JSON::XS> is incapable of generating invalid JSON output (modulo bugs,
but C<JSON::XS> has found more bugs in the official JSON testsuite (1)
than the official JSON testsuite has found in C<JSON::XS> (0)).

When you have trouble decoding JSON generated by this module using other
decoders, then it is very likely that you have an encoding mismatch or the
other decoder is broken.

When decoding, C<JSON::XS> is strict by default and will likely catch all
errors. There are currently two settings that change this: C<relaxed>
makes C<JSON::XS> accept (but not generate) some non-standard extensions,
and C<allow_tags> will allow you to encode and decode Perl objects, at the
cost of not outputting valid JSON anymore.
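
As a small sketch of the difference, the text below uses a trailing comma
and a shell-style comment, two of the extensions that C<relaxed> accepts.
The strict coder refuses it, the relaxed one decodes it, and re-encoding
yields standard JSON again.

```perl
use strict;
use warnings;
use JSON::XS;

my $strict  = JSON::XS->new;
my $relaxed = JSON::XS->new->relaxed;

# A trailing comma and a shell-style comment are not valid JSON.
my $text = "[1, 2, # two numbers so far\n 3,]";

# The strict coder croaks, so eval returns false.
my $strict_ok = eval { $strict->decode ($text); 1 };

# The relaxed coder accepts the extensions when decoding...
my $data = $relaxed->decode ($text);

# ...but never generates them when encoding.
my $json = $relaxed->encode ($data);
print "$json\n";   # [1,2,3]
```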

=head2 TAGGED VALUE SYNTAX AND STANDARD JSON EN/DECODERS

When you use C<allow_tags> to use the extended (and also nonstandard and
invalid) JSON syntax for serialised objects, and you still want to decode
the generated text with a standard JSON decoder, you can run a regex
to replace the tagged syntax by standard JSON arrays (it only works for
"normal" package names without commas, newlines or single colons). First,
the readable Perl version:

   # if your FREEZE methods return no values, you need this replace first:
   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;

   # this works for non-empty constructor arg lists:
   $json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

And here is a less readable version that is easy to adapt to other
languages:

   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/[$1,/g;

Here is an ECMAScript version (same regex):

   json = json.replace (/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/g, "[$1,");
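
To see the readable Perl version at work, here is a sketch that applies
both replacements to a small hand-written text. The class names
C<My::Object> and C<Empty> are made up for this example and the input was
not produced by a real encoder run.

```perl
use strict;
use warnings;

# Hand-written sample of the tagged syntax; the class names are made up.
my $json = '{"a":("My::Object")[1,2],"b":("Empty")[]}';

# Objects whose FREEZE method returned no values come first.
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[\s*\]/[$1]/gx;

# Then objects with a non-empty constructor argument list.
$json =~ s/\( \s* (" (?: [^\\":,]+|\\.|::)* ") \s* \) \s* \[/[$1,/gx;

print "$json\n";   # {"a":["My::Object",1,2],"b":["Empty"]}
```

The class name ends up as the first array element, followed by the
constructor arguments.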

Since this syntax converts to standard JSON arrays, it might be hard to
distinguish serialised objects from normal arrays. You can prepend a
"magic number" as first array element to reduce chances of a collision:

   $json =~ s/\(\s*("([^\\":,]+|\\.|::)*")\s*\)\s*\[/["XU1peReLzT4ggEllLanBYq4G9VzliwKF",$1,/g;

And after decoding the JSON text, you could walk the data
structure looking for arrays with a first element of
C<XU1peReLzT4ggEllLanBYq4G9VzliwKF>.
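
Such a walk could look like the following sketch. The helper
C<find_tagged> is hypothetical, and the sample structure stands in for
whatever your standard JSON decoder returned; what you do with the class
name and arguments afterwards is up to you.

```perl
use strict;
use warnings;

my $MAGIC = "XU1peReLzT4ggEllLanBYq4G9VzliwKF";

# find_tagged is a hypothetical helper: it returns one [class, @args]
# entry for every array that starts with the magic string.
sub find_tagged {
   my ($node) = @_;
   my @found;

   if (ref ($node) eq "ARRAY") {
      if (@$node >= 2 && defined $node->[0] && !ref ($node->[0]) && $node->[0] eq $MAGIC) {
         my (undef, $class, @args) = @$node;
         push @found, [$class, @args];
      }

      # Descend into the elements, as arguments may contain objects too.
      push @found, map { find_tagged ($_) } @$node;
   } elsif (ref ($node) eq "HASH") {
      push @found, map { find_tagged ($_) } values %$node;
   }

   return @found;
}

# Stand-in for a decoded structure containing one serialised object.
my $data   = { a => [$MAGIC, "My::Object", 1, 2], b => [3] };
my @tagged = find_tagged ($data);

print "found $_->[0]\n" for @tagged;
```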

The same approach can be used to create the tagged format with another
encoder. First, you create an array with the magic string as first member,
the classname as second, and constructor arguments last, encode it as part
of your JSON structure, and then:

   $json =~ s/\[\s*"XU1peReLzT4ggEllLanBYq4G9VzliwKF"\s*,\s*("([^\\":,]+|\\.|::)*")\s*,/($1)[/g;

Again, this has some limitations - the magic string must not be encoded
with character escapes, and the constructor arguments must be non-empty.


=head1 RFC7159

Since this module was written, Google has written a new JSON RFC, RFC 7159
(and RFC7158). Unfortunately, this RFC breaks compatibility with both the
original JSON specification on www.json.org and RFC4627.

As far as I can see, you can get partial compatibility when parsing by
using C<< ->allow_nonref >>. However, consider the security implications
of doing so.

I haven't decided yet when to break compatibility with RFC4627 by default
(and potentially leave applications insecure) and change the default to
follow RFC7159, but application authors are well advised to call C<<
->allow_nonref(0) >> even if this is the current default, if they cannot
handle non-reference values, in preparation for the day when the default
will change.


=head1 (I-)THREADS

This module is I<not> guaranteed to be ithread (or MULTIPLICITY-) safe
and there are no plans to change this. Note that perl's builtin so-called
threads/ithreads are officially deprecated and should not be used.


=head1 THE PERILS OF SETLOCALE

Sometimes people avoid the Perl locale support and directly call the
system's setlocale function with C<LC_ALL>.

This breaks both perl and modules such as JSON::XS, as stringification of
numbers no longer works correctly (e.g. C<$x = 0.1; print "$x"+1> might
print C<1>, and JSON::XS might output illegal JSON as JSON::XS relies on
perl to stringify numbers).

The solution is simple: don't call C<setlocale>, or use it for only those
categories you need, such as C<LC_MESSAGES> or C<LC_CTYPE>.

If you need C<LC_NUMERIC>, you should enable it only around the code that
actually needs it (avoiding stringification of numbers), and restore it
afterwards.
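
One possible shape for this, as a sketch only: the helper
C<with_numeric_locale> below is hypothetical, and the locale name you pass
in must exist on your system, otherwise C<setlocale> fails and leaves the
current setting in place.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

# with_numeric_locale is a hypothetical helper, not part of any module.
# It runs $code with LC_NUMERIC set to $locale and restores the old value.
sub with_numeric_locale {
   my ($locale, $code) = @_;

   my $saved = setlocale (LC_NUMERIC);   # query the current setting
   setlocale (LC_NUMERIC, $locale);      # returns undef if the locale is unknown

   # Run the code, but make sure we get a chance to restore the locale.
   my @result = eval { $code->() };
   my $error  = $@;

   setlocale (LC_NUMERIC, $saved);       # restore before numbers get stringified elsewhere
   die $error if $error;

   return @result;
}
```

The code reference passed in should do the locale-dependent work (such as
calling a C library that formats numbers) and nothing else, so that
JSON::XS never runs while the numeric locale is changed.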


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

BEGIN {
   # each name is aliased twice: as a scalar ($true) and as a function (true)
   *true    = \$Types::Serialiser::true;
   *true    = \&Types::Serialiser::true;
   *false   = \$Types::Serialiser::false;
   *false   = \&Types::Serialiser::false;
   *is_bool = \&Types::Serialiser::is_bool;

   # make JSON::XS::Boolean an alias for the Types::Serialiser::Boolean package
   *JSON::XS::Boolean:: = *Types::Serialiser::Boolean::;
}

XSLoader::load "JSON::XS", $VERSION;

=head1 SEE ALSO

The F<json_xs> command line utility for quick experiments.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1
