/cvs/JSON-XS/XS.pm
Revision: 1.20
Committed: Sun Mar 25 00:47:42 2007 UTC (17 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_7
Changes since 1.19: +5 -3 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     JSON::XS - JSON serialising/deserialising, done correctly and fast
4    
5     =head1 SYNOPSIS
6    
7     use JSON::XS;
8    
9 root 1.12 # exported functions, croak on error
10    
11     $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
12     $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
13    
14     # oo-interface
15    
16     $coder = JSON::XS->new->ascii->pretty->allow_nonref;
17     $pretty_printed_unencoded = $coder->encode ($perl_scalar);
18     $perl_scalar = $coder->decode ($unicode_json_text);
19    
20 root 1.1 =head1 DESCRIPTION
21    
22 root 1.2 This module converts Perl data structures to JSON and vice versa. Its
23     primary goal is to be I<correct> and its secondary goal is to be
24     I<fast>. To reach the latter goal it was written in C.
25    
26     As this is the n-th-something JSON module on CPAN, what was the reason
27     to write yet another JSON module? While it seems there are many JSON
28     modules, none of them correctly handle all corner cases, and in most cases
29     their maintainers are unresponsive, gone missing, or not listening to bug
30     reports for other reasons.
31    
32     See COMPARISON, below, for a comparison to some other JSON modules.
33    
34 root 1.10 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
35     vice versa.
36    
37 root 1.2 =head2 FEATURES
38    
39 root 1.1 =over 4
40    
41 root 1.2 =item * correct handling of unicode issues
42    
43 root 1.10 This module knows how to handle Unicode, and even documents how and when
44     it does so.
45 root 1.2
46     =item * round-trip integrity
47    
48     When you serialise a perl data structure using only datatypes supported
49     by JSON, the deserialised data structure is identical on the Perl level.
50     (e.g. the string "2.0" doesn't suddenly become "2").
51    
52     =item * strict checking of JSON correctness
53    
54 root 1.16 There is no guessing, no generating of illegal JSON texts by default,
55 root 1.10 and only JSON is accepted as input by default (the latter is a security
56     feature).
57 root 1.2
58     =item * fast
59    
60 root 1.10 Compared to other JSON modules, this module compares favourably in terms
61     of speed, too.
62 root 1.2
63     =item * simple to use
64    
65     This module has both a simple functional interface and an OO
66     interface.
67    
68     =item * reasonably versatile output formats
69    
70 root 1.10 You can choose between the most compact guaranteed single-line format
71     possible (nice for simple line-based protocols), a pure-ascii format (for
72     when your transport is not 8-bit clean), or a pretty-printed format (for
73     when you want to read that stuff). Or you can combine those features in
74 root 1.2 whatever way you like.
75    
76     =back
77    
78 root 1.1 =cut
79    
80     package JSON::XS;
81    
82 root 1.20 use strict;
83    
84 root 1.1 BEGIN {
85 root 1.20 our $VERSION = '0.7';
86     our @ISA = qw(Exporter);
87 root 1.1
88 root 1.20 our @EXPORT = qw(to_json from_json);
89 root 1.1 require Exporter;
90    
91     require XSLoader;
92     XSLoader::load JSON::XS::, $VERSION;
93     }
94    
95 root 1.2 =head1 FUNCTIONAL INTERFACE
96    
97     The following convenience functions are provided by this module. They are
98     exported by default:
99    
100     =over 4
101    
102 root 1.16 =item $json_text = to_json $perl_scalar
103 root 1.2
104     Converts the given Perl data structure (a simple scalar or a reference to
105     a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
106     octets only). Croaks on error.
107    
108 root 1.16 This function call is functionally identical to:
109 root 1.2
110 root 1.16 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
111    
112     except being faster.
113    
114     =item $perl_scalar = from_json $json_text
115 root 1.2
116     The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
117 root 1.16 parse that as a UTF-8 encoded JSON text, returning the resulting simple
118 root 1.2 scalar or reference. Croaks on error.
119    
120 root 1.16 This function call is functionally identical to:
121    
122     $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
123    
124     except being faster.
125 root 1.2
126     =back
127    
128     =head1 OBJECT-ORIENTED INTERFACE
129    
130     The object oriented interface lets you configure your own encoding or
131     decoding style, within the limits of supported formats.
132    
133     =over 4
134    
135     =item $json = new JSON::XS
136    
137     Creates a new JSON::XS object that can be used to de/encode JSON
138     strings. All boolean flags described below are by default I<disabled>.
139 root 1.1
140 root 1.2 The mutators for flags all return the JSON object again and thus calls can
141     be chained:
142    
143 root 1.16 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
144 root 1.3 => {"a": [1, 2]}
145 root 1.2
146 root 1.7 =item $json = $json->ascii ([$enable])
147 root 1.2
148 root 1.16 If C<$enable> is true (or missing), then the C<encode> method will not
149     generate characters outside the code range C<0..127> (which is ASCII). Any
150     unicode characters outside that range will be escaped using either a
151     single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
152     as per RFC4627.
153 root 1.2
154     If C<$enable> is false, then the C<encode> method will not escape Unicode
155 root 1.16 characters unless required by the JSON syntax. This results in a faster
156     and more compact format.
157 root 1.2
158 root 1.16 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
159     => ["\ud801\udc01"]
160 root 1.3
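As a small illustration of the two escape forms described above, the following sketch encodes one BMP character and one character outside the BMP. It assumes an installed JSON::XS that provides the C<ascii> and C<encode> methods documented here; the variable name is only for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# U+00E9 lies inside the BMP and needs a single \uXXXX escape;
# U+10401 lies outside the BMP and needs a surrogate pair.
my $text = JSON::XS->new->ascii->encode (["\x{e9}", chr 0x10401]);

print "$text\n";   # ["\u00e9","\ud801\udc01"]
```

The result contains only characters in the range 0..127, so it can be sent over a transport that is not 8-bit clean.
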
161 root 1.7 =item $json = $json->utf8 ([$enable])
162 root 1.2
163 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will encode
164 root 1.16 the JSON result into UTF-8, as required by many protocols, while the
165 root 1.7 C<decode> method expects to be handed a UTF-8-encoded string. Please
166     note that UTF-8-encoded strings do not contain any characters outside the
167 root 1.16 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
168     versions, enabling this option might enable autodetection of the UTF-16
169     and UTF-32 encoding families, as described in RFC4627.
170 root 1.2
171     If C<$enable> is false, then the C<encode> method will return the JSON
172     string as a (non-encoded) unicode string, while C<decode> thus expects a
173     unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
174     to be done yourself, e.g. using the Encode module.
175    
176 root 1.16 Example, output UTF-16BE-encoded JSON:
177    
178     use Encode;
179     $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
180    
181     Example, decode UTF-32LE-encoded JSON:
182    
183     use Encode;
184     $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
185 root 1.12
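The difference between the two modes can be seen by comparing string lengths. This is a minimal sketch, assuming an installed JSON::XS with the C<utf8>, C<encode> and C<decode> methods documented here; all variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = ["\x{e9}"];    # an array holding the single character U+00E9

# utf8 enabled: the result is a string of octets (UTF-8 encoded)
my $octets = JSON::XS->new->utf8->encode ($data);

# utf8 disabled: the result is an unencoded unicode string
my $chars = JSON::XS->new->encode ($data);

printf "utf8 enabled:  %d\n", length $octets;   # 6: [ " 0xC3 0xA9 " ]
printf "utf8 disabled: %d\n", length $chars;    # 5: [ " U+00E9 " ]

# each coder can decode what it produced itself
my $from_octets = JSON::XS->new->utf8->decode ($octets);
my $from_chars  = JSON::XS->new->decode ($chars);
```

Both decoded structures contain the same one-character string; only the form of the JSON text differs.
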
186 root 1.7 =item $json = $json->pretty ([$enable])
187 root 1.2
188     This enables (or disables) all of the C<indent>, C<space_before> and
189 root 1.3 C<space_after> (and in the future possibly more) flags in one call to
190 root 1.2 generate the most readable (or most compact) form possible.
191    
192 root 1.12 Example, pretty-print some simple structure:
193    
194 root 1.3 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
195     =>
196     {
197        "a" : [
198           1,
199           2
200        ]
201     }
202    
203 root 1.7 =item $json = $json->indent ([$enable])
204 root 1.2
205 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
206 root 1.2 format as output, putting every array member or object/hash key-value pair
207     into its own line, indenting them properly.
208    
209     If C<$enable> is false, no newlines or indenting will be produced, and the
210 root 1.16 resulting JSON text is guaranteed not to contain any C<newlines>.
211 root 1.2
212 root 1.16 This setting has no effect when decoding JSON texts.
213 root 1.2
214 root 1.7 =item $json = $json->space_before ([$enable])
215 root 1.2
216 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
217 root 1.2 optional space before the C<:> separating keys from values in JSON objects.
218    
219     If C<$enable> is false, then the C<encode> method will not add any extra
220     space at those places.
221    
222 root 1.16 This setting has no effect when decoding JSON texts. You will also
223     most likely combine this setting with C<space_after>.
224 root 1.2
225 root 1.12 Example, space_before enabled, space_after and indent disabled:
226    
227     {"key" :"value"}
228    
229 root 1.7 =item $json = $json->space_after ([$enable])
230 root 1.2
231 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
232 root 1.2 optional space after the C<:> separating keys from values in JSON objects
233     and extra whitespace after the C<,> separating key-value pairs and array
234     members.
235    
236     If C<$enable> is false, then the C<encode> method will not add any extra
237     space at those places.
238    
239 root 1.16 This setting has no effect when decoding JSON texts.
240 root 1.2
241 root 1.12 Example, space_before and indent disabled, space_after enabled:
242    
243     {"key": "value"}
244    
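The two spacing flags can be compared side by side. The sketch below assumes an installed JSON::XS with the C<space_before>, C<space_after> and C<encode> methods documented here; the variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => "value" };

my $before = JSON::XS->new->space_before->encode ($data);
my $after  = JSON::XS->new->space_after->encode ($data);
my $both   = JSON::XS->new->space_before->space_after->encode ($data);
my $list   = JSON::XS->new->space_after->encode ([1, 2, 3]);

print "$before\n";   # {"key" :"value"}
print "$after\n";    # {"key": "value"}
print "$both\n";     # {"key" : "value"}
print "$list\n";     # [1, 2, 3]
```

None of these settings adds newlines, so the output stays on a single line.
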
245 root 1.7 =item $json = $json->canonical ([$enable])
246 root 1.2
247 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
248 root 1.2 by sorting their keys. This adds a comparatively high overhead.
249    
250     If C<$enable> is false, then the C<encode> method will output key-value
251     pairs in the order Perl stores them (which will likely change between runs
252     of the same script).
253    
254     This option is useful if you want the same data structure to be encoded as
255 root 1.16 the same JSON text (given the same overall settings). If it is disabled,
256 root 1.2 the same hash might be encoded differently even if it contains the same data,
257     as key-value pairs have no inherent ordering in Perl.
258    
259 root 1.16 This setting has no effect when decoding JSON texts.
260 root 1.2
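A short sketch of the effect, assuming an installed JSON::XS with the C<canonical> and C<encode> methods documented here (variable names are illustrative): two hashes with the same content, built in a different order, serialise to the same text.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

# same key-value pairs, listed in a different order
my $first  = $coder->encode ({ b => 2, a => 1, c => 3 });
my $second = $coder->encode ({ c => 3, a => 1, b => 2 });

print "$first\n";    # {"a":1,"b":2,"c":3}
print "$second\n";   # {"a":1,"b":2,"c":3}
```

This makes the JSON text usable for comparisons or as a cache key, at the cost of sorting the keys on every C<encode> call.
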
261 root 1.7 =item $json = $json->allow_nonref ([$enable])
262 root 1.3
263 root 1.7 If C<$enable> is true (or missing), then the C<encode> method can convert a
264 root 1.3 non-reference into its corresponding string, number or null JSON value,
265     which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
266     values instead of croaking.
267    
268     If C<$enable> is false, then the C<encode> method will croak if it isn't
269 root 1.16 passed an arrayref or hashref, as JSON texts must either be an object
270 root 1.3 or array. Likewise, C<decode> will croak if given something that is not a
271     JSON object or array.
272    
273 root 1.12 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
274     resulting in an invalid JSON text:
275    
276     JSON::XS->new->allow_nonref->encode ("Hello, World!")
277     => "Hello, World!"
278    
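The following sketch contrasts both settings. It sets the flag explicitly in both cases instead of relying on the default, and assumes an installed JSON::XS with the C<allow_nonref> and C<encode> methods documented here; variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $strict  = JSON::XS->new->allow_nonref (0);
my $lenient = JSON::XS->new->allow_nonref (1);

# with the flag disabled, encode croaks on a non-reference; eval catches that
my $strict_result = eval { $strict->encode ("Hello, World!") };
my $strict_error  = $@;

# with the flag enabled, the scalar becomes a bare JSON string
my $lenient_result = $lenient->encode ("Hello, World!");

print "strict coder croaked\n" if $strict_error;
print "$lenient_result\n";   # "Hello, World!" (including the quotes)
```

Wrapping the call in C<eval> is one way to turn the croak into an error value the caller can inspect.
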
279 root 1.7 =item $json = $json->shrink ([$enable])
280    
281     Perl usually over-allocates memory a bit when allocating space for
282     strings. This flag optionally resizes strings generated by either
283     C<encode> or C<decode> to their minimum size possible. This can save
284 root 1.16 memory when your JSON texts are either very very long or you have many
285 root 1.8 short strings. It will also try to downgrade any strings to octet-form
286     if possible: perl stores strings internally either in an encoding called
287     UTF-X or in octet-form. The latter cannot store everything but uses less
288     space in general.
289 root 1.7
290     If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
291     while all strings generated by C<decode> will also be shrunk-to-fit.
292    
293     If C<$enable> is false, then the normal perl allocation algorithms are used.
294     If you work with your data, then this is likely to be faster.
295    
296     In the future, this setting might control other things, such as converting
297     strings that look like integers or floats into integers or floats
298     internally (there is no difference on the Perl level), saving space.
299    
300 root 1.16 =item $json_text = $json->encode ($perl_scalar)
301 root 1.2
302     Converts the given Perl data structure (a simple scalar or a reference
303     to a hash or array) to its JSON representation. Simple scalars will be
304     converted into JSON string or number sequences, while references to arrays
305     become JSON arrays and references to hashes become JSON objects. Undefined
306     Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
307     nor C<false> values will be generated.
308 root 1.1
309 root 1.16 =item $perl_scalar = $json->decode ($json_text)
310 root 1.1
311 root 1.16 The opposite of C<encode>: expects a JSON text and tries to parse it,
312 root 1.2 returning the resulting simple scalar or reference. Croaks on error.
313 root 1.1
314 root 1.2 JSON numbers and strings become simple Perl scalars. JSON arrays become
315     Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
316     C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.
317 root 1.1
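The mapping can be checked with a small round trip. This sketch assumes an installed JSON::XS with the C<encode> and C<decode> methods documented here; it leaves out C<true> and C<false> and only uses values whose handling is described above. Variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

my $perl = $coder->decode ('{"list":[1,2.5,"three"],"nothing":null}');

print ref $perl, "\n";                 # HASH
print ref $perl->{list}, "\n";         # ARRAY
print "null became undef\n"
    if exists $perl->{nothing} && !defined $perl->{nothing};

# a string that looks like a number stays a string when encoded
my $json = $coder->encode (["2.0", 2]);
print "$json\n";                       # ["2.0",2]

my $back = $coder->decode ($json);
print "$back->[0]\n";                  # 2.0
```
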
318     =back
319    
320 root 1.10 =head1 MAPPING
321    
322     This section describes how JSON::XS maps Perl values to JSON values and
323     vice versa. These mappings are designed to "do the right thing" in most
324     circumstances automatically, preserving round-tripping characteristics
325     (what you put in comes out as something equivalent).
326    
327     For the more enlightened: note that in the following descriptions,
328     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
329     refers to the abstract Perl language itself.
330    
331     =head2 JSON -> PERL
332    
333     =over 4
334    
335     =item object
336    
337     A JSON object becomes a reference to a hash in Perl. No ordering of object
338 root 1.14 keys is preserved (JSON does not preserve object key ordering itself).
339 root 1.10
340     =item array
341    
342     A JSON array becomes a reference to an array in Perl.
343    
344     =item string
345    
346     A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
347     are represented by the same codepoints in the Perl string, so no manual
348     decoding is necessary.
349    
350     =item number
351    
352     A JSON number becomes either an integer or numeric (floating point)
353     scalar in perl, depending on its range and any fractional parts. On the
354     Perl level, there is no difference between those as Perl handles all the
355     conversion details, but an integer may take slightly less memory and might
356     represent more values exactly than (floating point) numbers.
357    
358     =item true, false
359    
360     These JSON atoms become C<1> and C<0>, respectively. Information is lost in
361     this process. Future versions might represent those values differently,
362     but they will be guaranteed to act like these integers would normally in
363     Perl.
364    
365     =item null
366    
367     A JSON null atom becomes C<undef> in Perl.
368    
369     =back
370    
371     =head2 PERL -> JSON
372    
373     The mapping from Perl to JSON is slightly more difficult, as Perl is a
374     truly typeless language, so we can only guess which JSON type is meant by
375     a Perl value.
376    
377     =over 4
378    
379     =item hash references
380    
381     Perl hash references become JSON objects. As there is no inherent ordering
382     in hash keys, they will usually be encoded in a pseudo-random order that
383     can change between runs of the same program but stays generally the same
384 root 1.14 within a single run of a program. JSON::XS can optionally sort the hash
385 root 1.10 keys (determined by the I<canonical> flag), so the same datastructure
386     will serialise to the same JSON text (given same settings and version of
387     JSON::XS), but this incurs a runtime overhead.
388    
389     =item array references
390    
391     Perl array references become JSON arrays.
392    
393     =item blessed objects
394    
395     Blessed objects are not allowed. JSON::XS currently tries to encode their
396     underlying representation (hash- or arrayref), but this behaviour might
397     change in future versions.
398    
399     =item simple scalars
400    
401     Simple Perl scalars (any scalar that is not a reference) are the most
402     difficult objects to encode: JSON::XS will encode undefined scalars as
403     JSON null value, scalars that have last been used in a string context
404     before encoding as JSON strings and anything else as number value:
405    
406     # dump as number
407     to_json [2] # yields [2]
408     to_json [-3.0e17] # yields [-3e+17]
409     my $value = 5; to_json [$value] # yields [5]
410    
411     # used as string, so dump as string
412     print $value;
413     to_json [$value] # yields ["5"]
414    
415     # undef becomes null
416     to_json [undef] # yields [null]
417    
418     You can force the type to be a string by stringifying it:
419    
420     my $x = 3.1; # some variable containing a number
421     "$x"; # stringified
422     $x .= ""; # another, more awkward way to stringify
423     print $x; # perl does it for you, too, quite often
424    
425     You can force the type to be a number by numifying it:
426    
427     my $x = "3"; # some variable containing a string
428     $x += 0; # numify it, ensuring it will be dumped as a number
429     $x *= 1; # same thing, the choice is yours.
430    
431     You cannot currently output JSON booleans or force the type in other,
432     less obscure, ways. Tell me if you need this capability.
433    
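Putting the stringify and numify idioms together, a sketch like the following forces the JSON type of each scalar before encoding. It assumes an installed JSON::XS with the C<encode> method documented here; the variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

my $number = "3";     # starts out as a string
$number += 0;         # numified, will be dumped as a number

my $string = 3.1;     # starts out as a number
$string .= "";        # stringified, will be dumped as a string

my $json = $coder->encode ([$number, $string, undef]);

print "$json\n";      # [3,"3.1",null]
```

Doing the conversion immediately before the C<encode> call avoids surprises from earlier string or numeric uses of the same variable.
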
434 root 1.11 =item circular data structures
435    
436     Those will be encoded until memory or stackspace runs out.
437    
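Since circular structures are not detected, a caller may want to check for them before encoding. The following is a sketch of one possible check; C<has_cycle> is a hypothetical helper written for this example and is not part of JSON::XS. It uses only the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper (not part of JSON::XS): returns 1 if $data contains a
# reference that points back to one of its own ancestors, 0 otherwise.
sub has_cycle {
    my ($data, $on_path) = @_;
    $on_path ||= {};

    return 0 unless ref $data;

    my $addr = refaddr $data;
    return 1 if $on_path->{$addr};    # we are already inside this reference

    my $type     = reftype $data;
    my @children = $type eq 'HASH'  ? values %$data
                 : $type eq 'ARRAY' ? @$data
                 :                    ();

    $on_path->{$addr} = 1;            # mark while descending
    my $found = 0;
    for my $child (@children) {
        if (has_cycle ($child, $on_path)) {
            $found = 1;
            last;
        }
    }
    delete $on_path->{$addr};         # unmark when leaving this branch

    return $found;
}

my $tree = { a => [1, 2, { b => 3 }] };

my $loop = {};
$loop->{self} = $loop;                # points back to itself

my $shared = [1];
my $twice  = [$shared, $shared];      # shared, but not circular

print has_cycle ($tree),  "\n";       # 0
print has_cycle ($loop),  "\n";       # 1
print has_cycle ($twice), "\n";       # 0
```

Only references on the current path are tracked, so a structure that merely shares a sub-structure in two places is not reported as circular.
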
438 root 1.10 =back
439    
440 root 1.3 =head1 COMPARISON
441    
442     As already mentioned, this module was created because none of the existing
443     JSON modules could be made to work correctly. First I will describe the
444     problems (or pleasures) I encountered with various existing JSON modules,
445 root 1.4 followed by some benchmark values. JSON::XS was designed not to suffer
446     from any of these problems or limitations.
447 root 1.3
448     =over 4
449    
450 root 1.5 =item JSON 1.07
451 root 1.3
452     Slow (but very portable, as it is written in pure Perl).
453    
454     Undocumented/buggy Unicode handling (how JSON handles unicode values is
455     undocumented. One can get far by feeding it unicode strings and doing
456     en-/decoding oneself, but unicode escapes are not working properly).
457    
458     No roundtripping (strings get clobbered if they look like numbers, e.g.
459     the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
460     decode into the number 2).
461    
462 root 1.5 =item JSON::PC 0.01
463 root 1.3
464     Very fast.
465    
466     Undocumented/buggy Unicode handling.
467    
468     No roundtripping.
469    
470 root 1.4 Has problems handling many Perl values (e.g. regex results and other magic
471     values will make it croak).
472 root 1.3
473     Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>,
474 root 1.16 which is not a valid JSON text).
475 root 1.3
476     Unmaintained (maintainer unresponsive for many months, bugs are not
477     getting fixed).
478    
479 root 1.5 =item JSON::Syck 0.21
480 root 1.3
481     Very buggy (often crashes).
482    
483 root 1.4 Very inflexible (no human-readable format supported, format pretty much
484     undocumented. I need at least a format for easy reading by humans and a
485     single-line compact format for use in a protocol, and preferably a way to
486 root 1.16 generate ASCII-only JSON texts).
487 root 1.3
488     Completely broken (and confusingly documented) Unicode handling (unicode
489     escapes are not working properly, you need to set ImplicitUnicode to
490     I<different> values on en- and decoding to get symmetric behaviour).
491    
492     No roundtripping (simple cases work, but this depends on whether the scalar
493     value was used in a numeric context or not).
494    
495     Dumping hashes may skip hash values depending on iterator state.
496    
497     Unmaintained (maintainer unresponsive for many months, bugs are not
498     getting fixed).
499    
500     Does not check input for validity (i.e. will accept non-JSON input and
501     return "something" instead of raising an exception. This is a security
502     issue: imagine two banks transferring money between each other using
503     JSON. One bank might parse a given non-JSON request and deduct money,
504     while the other might reject the transaction with a syntax error. While a
505     good protocol will at least recover, that is extra unnecessary work and
506     the transaction will still not succeed).
507    
508 root 1.5 =item JSON::DWIW 0.04
509 root 1.3
510     Very fast. Very natural. Very nice.
511    
512     Undocumented unicode handling (but the best of the pack. Unicode escapes
513     still don't get parsed properly).
514    
515     Very inflexible.
516    
517     No roundtripping.
518    
519 root 1.16 Does not generate valid JSON texts (key strings are often unquoted, empty keys
520 root 1.4 result in nothing being output).
521    
522 root 1.3 Does not check input for validity.
523    
524     =back
525    
526     =head2 SPEED
527    
528 root 1.4 It seems that JSON::XS is surprisingly fast, as shown in the following
529     tables. They have been generated with the help of the C<eg/bench> program
530     in the JSON::XS distribution, to make it easy to compare on your own
531     system.
532    
533 root 1.13 First comes a comparison between various modules using a very short JSON
534 root 1.18 string:
535    
536     {"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}
537    
538     It shows the number of encodes/decodes per second (JSON::XS uses the
539     functional interface, while JSON::XS/2 uses the OO interface with
540     pretty-printing and hashkey sorting enabled). Higher is better:
541 root 1.4
542     module | encode | decode |
543     -----------|------------|------------|
544 root 1.18 JSON | 11488.516 | 7823.035 |
545     JSON::DWIW | 94708.054 | 129094.260 |
546     JSON::PC | 63884.157 | 128528.212 |
547     JSON::Syck | 34898.677 | 42096.911 |
548     JSON::XS | 654027.064 | 396423.669 |
549     JSON::XS/2 | 371564.190 | 371725.613 |
550 root 1.4 -----------+------------+------------+
551    
552 root 1.18 That is, JSON::XS is more than six times faster than JSON::DWIW on
553     encoding, more than three times faster on decoding, and about thirty times
554     faster than JSON, even with pretty-printing and key sorting.
555 root 1.4
556 root 1.13 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
557 root 1.4 search API, http://nanoref.com/yahooapis/mgPdGg):
558    
559     module | encode | decode |
560     -----------|------------|------------|
561 root 1.18 JSON | 273.023 | 44.674 |
562     JSON::DWIW | 1089.383 | 1145.704 |
563     JSON::PC | 3097.419 | 2393.921 |
564     JSON::Syck | 514.060 | 843.053 |
565     JSON::XS | 6479.668 | 3636.364 |
566     JSON::XS/2 | 3774.221 | 3599.124 |
567 root 1.4 -----------+------------+------------+
568    
569 root 1.18 Again, JSON::XS leads by far.
570 root 1.4
571 root 1.18 On large strings containing lots of high unicode characters, some modules
572     (such as JSON::PC) seem to decode faster than JSON::XS, but the result
573     will be broken due to missing (or wrong) unicode handling. Others refuse
574     to decode or encode properly, so it was impossible to prepare a fair
575     comparison table for that case.
576 root 1.13
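To get comparable numbers on your own machine without the distribution at hand, a rough measurement can be made with the core Benchmark module. This is only a sketch and not the C<eg/bench> program; it assumes an installed JSON::XS with the C<utf8>, C<encode> and C<decode> methods documented here, and it reuses the short test string from above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::XS;

my $coder = JSON::XS->new->utf8;
my $json  = '{"method": "handleMessage", "params": ["user1", "we were just talking"], "id": null}';
my $perl  = $coder->decode ($json);

# run each operation for at least one CPU second and print a rate table
cmpthese (-1, {
    encode => sub { $coder->encode ($perl) },
    decode => sub { $coder->decode ($json) },
});
```

The rates printed depend entirely on the machine and the Perl build, so they are only meaningful relative to other modules measured the same way.
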
577 root 1.11 =head1 RESOURCE LIMITS
578    
579     JSON::XS does not impose any limits on the size of JSON texts or Perl
580 root 1.12 values they represent - if your machine can handle it, JSON::XS will
581 root 1.11 encode or decode it. Future versions might optionally impose structure
582     depth and memory use resource limits.
583    
584 root 1.4 =head1 BUGS
585    
586     While the goal of this module is to be correct, that unfortunately does
587     not mean it's bug-free, only that I think its design is bug-free. It is
588     still very young and not well-tested. If you keep reporting bugs they will
589     be fixed swiftly, though.
590    
591 root 1.2 =cut
592    
593     1;
594    
595 root 1.1 =head1 AUTHOR
596    
597     Marc Lehmann <schmorp@schmorp.de>
598     http://home.schmorp.de/
599    
600     =cut
601