/cvs/JSON-XS/XS.pm
Revision: 1.17
Committed: Sat Mar 24 19:42:14 2007 UTC (17 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_5
Changes since 1.16: +1 -1 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     JSON::XS - JSON serialising/deserialising, done correctly and fast
4    
5     =head1 SYNOPSIS
6    
7     use JSON::XS;
8    
9 root 1.12 # exported functions, croak on error
10    
11     $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
12     $perl_hash_or_arrayref = from_json $utf8_encoded_json_text;
13    
14     # oo-interface
15    
16     $coder = JSON::XS->new->ascii->pretty->allow_nonref;
17     $pretty_printed_unencoded = $coder->encode ($perl_scalar);
18     $perl_scalar = $coder->decode ($unicode_json_text);
19    
20 root 1.1 =head1 DESCRIPTION
21    
22 root 1.2 This module converts Perl data structures to JSON and vice versa. Its
23     primary goal is to be I<correct> and its secondary goal is to be
24     I<fast>. To reach the latter goal it was written in C.
25    
26     As this is the n-th-something JSON module on CPAN, what was the reason
27     to write yet another JSON module? While it seems there are many JSON
28     modules, none of them correctly handle all corner cases, and in most cases
29     their maintainers are unresponsive, gone missing, or not listening to bug
30     reports for other reasons.
31    
32     See COMPARISON, below, for a comparison to some other JSON modules.
33    
34 root 1.10 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
35     vice versa.
36    
37 root 1.2 =head2 FEATURES
38    
39 root 1.1 =over 4
40    
41 root 1.2 =item * correct handling of unicode issues
42    
43 root 1.10 This module knows how to handle Unicode, and even documents how and when
44     it does so.
45 root 1.2
46     =item * round-trip integrity
47    
48     When you serialise a perl data structure using only datatypes supported
49     by JSON, the deserialised data structure is identical on the Perl level.
50     (e.g. the string "2.0" doesn't suddenly become "2").
51    
52     =item * strict checking of JSON correctness
53    
54 root 1.16 There is no guessing, no generating of illegal JSON texts by default,
55 root 1.10 and only JSON is accepted as input by default (the latter is a security
56     feature).
57 root 1.2
58     =item * fast
59    
60 root 1.10 Compared to other JSON modules, this module compares favourably in terms
61     of speed, too.
62 root 1.2
63     =item * simple to use
64    
65     This module has both a simple functional interface as well as an OO
66     interface.
67    
68     =item * reasonably versatile output formats
69    
70 root 1.10 You can choose between the most compact guaranteed single-line format
71     possible (nice for simple line-based protocols), a pure-ascii format (for
72     when your transport is not 8-bit clean), or a pretty-printed format (for
73     when you want to read that stuff). Or you can combine those features in
74 root 1.2 whatever way you like.
75    
76     =back
77    
78 root 1.1 =cut
79    
80     package JSON::XS;
81    
82     BEGIN {
83 root 1.17 $VERSION = '0.5';
84 root 1.1 @ISA = qw(Exporter);
85    
86 root 1.2 @EXPORT = qw(to_json from_json);
87 root 1.1 require Exporter;
88    
89     require XSLoader;
90     XSLoader::load JSON::XS::, $VERSION;
91     }
92    
93 root 1.2 =head1 FUNCTIONAL INTERFACE
94    
95     The following convenience functions are provided by this module. They are
96     exported by default:
97    
98     =over 4
99    
100 root 1.16 =item $json_text = to_json $perl_scalar
101 root 1.2
102     Converts the given Perl data structure (a simple scalar or a reference to
103     a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
104     octets only). Croaks on error.
105    
106 root 1.16 This function call is functionally identical to:
107 root 1.2
108 root 1.16 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
109    
110     except being faster.
111    
112     =item $perl_scalar = from_json $json_text
113 root 1.2
114     The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
115 root 1.16 parse that as a UTF-8 encoded JSON text, returning the resulting simple
116 root 1.2 scalar or reference. Croaks on error.
117    
118 root 1.16 This function call is functionally identical to:
119    
120     $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
121    
122     except being faster.
123 root 1.2
124     =back
125    
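As a minimal sketch of the round trip described above, the following uses
the object-oriented calls that the text gives as functionally identical to
C<to_json> and C<from_json>. The variable names and sample data are made up
for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# Illustrative data only; any hash- or arrayref of plain values would do.
my $data = { name => "example", list => [1, 2, 3] };

# Equivalent to to_json: yields a UTF-8 encoded octet string.
my $json_text = JSON::XS->new->utf8->encode ($data);

# Equivalent to from_json: parses the octets back into a Perl structure.
my $copy = JSON::XS->new->utf8->decode ($json_text);

print "name: $copy->{name}, items: ", scalar @{ $copy->{list} }, "\n";
```

Both calls croak on error, so callers that cannot afford to die would wrap
them in C<eval>.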
126     =head1 OBJECT-ORIENTED INTERFACE
127    
128     The object oriented interface lets you configure your own encoding or
129     decoding style, within the limits of supported formats.
130    
131     =over 4
132    
133     =item $json = new JSON::XS
134    
135     Creates a new JSON::XS object that can be used to de/encode JSON
136     strings. All boolean flags described below are by default I<disabled>.
137 root 1.1
138 root 1.2 The mutators for flags all return the JSON object again and thus calls can
139     be chained:
140    
141 root 1.16 my $json = JSON::XS->new->utf8->space_after->encode ({a => [1,2]})
142 root 1.3 => {"a": [1, 2]}
143 root 1.2
144 root 1.7 =item $json = $json->ascii ([$enable])
145 root 1.2
146 root 1.16 If C<$enable> is true (or missing), then the C<encode> method will not
147     generate characters outside the code range C<0..127> (which is ASCII). Any
148     unicode characters outside that range will be escaped using either a
149     single \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence,
150     as per RFC4627.
151 root 1.2
152     If C<$enable> is false, then the C<encode> method will not escape Unicode
153 root 1.16 characters unless required by the JSON syntax. This results in a faster
154     and more compact format.
155 root 1.2
156 root 1.16 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
157     => ["\ud801\udc01"]
158 root 1.3
159 root 1.7 =item $json = $json->utf8 ([$enable])
160 root 1.2
161 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will encode
162 root 1.16 the JSON result into UTF-8, as required by many protocols, while the
163 root 1.7 C<decode> method expects to be handed a UTF-8-encoded string. Please
164     note that UTF-8-encoded strings do not contain any characters outside the
165 root 1.16 range C<0..255>, they are thus useful for bytewise/binary I/O. In future
166     versions, enabling this option might enable autodetection of the UTF-16
167     and UTF-32 encoding families, as described in RFC4627.
168 root 1.2
169     If C<$enable> is false, then the C<encode> method will return the JSON
170     string as a (non-encoded) unicode string, while C<decode> thus expects a
171     unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
172     to be done yourself, e.g. using the Encode module.
173    
174 root 1.16 Example, output UTF-16BE-encoded JSON:
175    
176     use Encode;
177     $jsontext = encode "UTF-16BE", JSON::XS->new->encode ($object);
178    
179     Example, decode UTF-32LE-encoded JSON:
180    
181     use Encode;
182     $object = JSON::XS->new->decode (decode "UTF-32LE", $jsontext);
183 root 1.12
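To make the difference between the two settings concrete, here is a small
sketch that encodes the same one-character string with C<utf8> enabled and
disabled and compares the lengths of the results. The sample data is an
assumption chosen for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = ["\x{e9}"];   # one character, U+00E9 (e with acute accent)

# utf8 enabled: the result is an octet string, U+00E9 takes two bytes.
my $octets = JSON::XS->new->utf8->encode ($data);

# utf8 disabled: the result is a character string, U+00E9 is one character.
my $chars = JSON::XS->new->encode ($data);

printf "octets: %d, characters: %d\n", length $octets, length $chars;
```

The octet form is the one to write to a binary filehandle or socket; the
character form still needs an explicit encoding step before output.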
184 root 1.7 =item $json = $json->pretty ([$enable])
185 root 1.2
186     This enables (or disables) all of the C<indent>, C<space_before> and
187 root 1.3 C<space_after> (and in the future possibly more) flags in one call to
188 root 1.2 generate the most readable (or most compact) form possible.
189    
190 root 1.12 Example, pretty-print some simple structure:
191    
192 root 1.3 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
193     =>
194     {
195     "a" : [
196     1,
197     2
198     ]
199     }
200    
201 root 1.7 =item $json = $json->indent ([$enable])
202 root 1.2
203 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
204 root 1.2 format as output, putting every array member or object/hash key-value pair
205     into its own line, indenting them properly.
206    
207     If C<$enable> is false, no newlines or indenting will be produced, and the
208 root 1.16 resulting JSON text is guaranteed not to contain any newlines.
209 root 1.2
210 root 1.16 This setting has no effect when decoding JSON texts.
211 root 1.2
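The single-line guarantee can be checked with a short sketch like the
following, which encodes the same (made-up) structure with C<indent>
disabled and enabled and tests each result for newlines.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { a => [1, 2] };

# indent disabled: everything on one line, suitable for line-based protocols.
my $compact = JSON::XS->new->indent (0)->encode ($data);

# indent enabled: every member goes on its own line.
my $multiline = JSON::XS->new->indent (1)->encode ($data);

print "compact has newline:  ", ($compact   =~ /\n/ ? "yes" : "no"), "\n";
print "indented has newline: ", ($multiline =~ /\n/ ? "yes" : "no"), "\n";
```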
212 root 1.7 =item $json = $json->space_before ([$enable])
213 root 1.2
214 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
215 root 1.2 optional space before the C<:> separating keys from values in JSON objects.
216    
217     If C<$enable> is false, then the C<encode> method will not add any extra
218     space at those places.
219    
220 root 1.16 This setting has no effect when decoding JSON texts. You will also
221     most likely combine this setting with C<space_after>.
222 root 1.2
223 root 1.12 Example, space_before enabled, space_after and indent disabled:
224    
225     {"key" :"value"}
226    
227 root 1.7 =item $json = $json->space_after ([$enable])
228 root 1.2
229 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
230 root 1.2 optional space after the C<:> separating keys from values in JSON objects
231     and extra whitespace after the C<,> separating key-value pairs and array
232     members.
233    
234     If C<$enable> is false, then the C<encode> method will not add any extra
235     space at those places.
236    
237 root 1.16 This setting has no effect when decoding JSON texts.
238 root 1.2
239 root 1.12 Example, space_before and indent disabled, space_after enabled:
240    
241     {"key": "value"}
242    
243 root 1.7 =item $json = $json->canonical ([$enable])
244 root 1.2
245 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
246 root 1.2 by sorting their keys. This adds a comparatively high overhead.
247    
248     If C<$enable> is false, then the C<encode> method will output key-value
249     pairs in the order Perl stores them (which will likely change between runs
250     of the same script).
251    
252     This option is useful if you want the same data structure to be encoded as
253 root 1.16 the same JSON text (given the same overall settings). If it is disabled,
254 root 1.2 the same hash might be encoded differently even if it contains the same data,
255     as key-value pairs have no inherent ordering in Perl.
256    
257 root 1.16 This setting has no effect when decoding JSON texts.
258 root 1.2
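A brief sketch of the effect, using made-up data: with C<canonical>
enabled, two hashes holding the same pairs encode to the same text no
matter how they were built.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

# Same key-value pairs, written in a different order.
my $first  = $coder->encode ({ c => 3, a => 1, b => 2 });
my $second = $coder->encode ({ b => 2, c => 3, a => 1 });

print "$first\n";                         # keys appear sorted
print "identical\n" if $first eq $second;
```

This makes the output usable as a cache key or as input to a checksum, at
the cost of the sorting overhead mentioned above.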
259 root 1.7 =item $json = $json->allow_nonref ([$enable])
260 root 1.3
261 root 1.7 If C<$enable> is true (or missing), then the C<encode> method can convert a
262 root 1.3 non-reference into its corresponding string, number or null JSON value,
263     which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
264     values instead of croaking.
265    
266     If C<$enable> is false, then the C<encode> method will croak if it isn't
267 root 1.16 passed an arrayref or hashref, as JSON texts must either be an object
268 root 1.3 or array. Likewise, C<decode> will croak if given something that is not a
269     JSON object or array.
270    
271 root 1.12 Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
272     resulting in an invalid JSON text:
273    
274     JSON::XS->new->allow_nonref->encode ("Hello, World!")
275     => "Hello, World!"
276    
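The croaking behaviour with the flag disabled can be observed with a sketch
such as this one. It passes C<0> and C<1> explicitly rather than relying on
the default, and traps the exception with C<eval>; the variable names are
illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

# Flag disabled: a plain scalar is not an acceptable top-level value.
my $strict = JSON::XS->new->allow_nonref (0);
my $result = eval { $strict->encode ("Hello, World!") };
print "encode croaked: $@" unless defined $result;

# Flag enabled: the same kind of value is accepted when decoding.
my $lenient = JSON::XS->new->allow_nonref (1);
my $value   = $lenient->decode ('"Hello, World!"');
print "decoded: $value\n";
```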
277 root 1.7 =item $json = $json->shrink ([$enable])
278    
279     Perl usually over-allocates memory a bit when allocating space for
280     strings. This flag optionally resizes strings generated by either
281     C<encode> or C<decode> to their minimum size possible. This can save
282 root 1.16 memory when your JSON texts are either very very long or you have many
283 root 1.8 short strings. It will also try to downgrade any strings to octet-form
284     if possible: perl stores strings internally either in an encoding called
285     UTF-X or in octet-form. The latter cannot store everything but uses less
286     space in general.
287 root 1.7
288     If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
289     while all strings generated by C<decode> will also be shrunk-to-fit.
290    
291     If C<$enable> is false, then the normal perl allocation algorithms are used.
292     If you work with your data, then this is likely to be faster.
293    
294     In the future, this setting might control other things, such as converting
295     strings that look like integers or floats into integers or floats
296     internally (there is no difference on the Perl level), saving space.
297    
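Since C<shrink> only affects how perl stores the strings, the JSON text
itself should not change. The following sketch (sample data assumed)
compares the output with and without the flag; C<canonical> is enabled
only so that both encodings use the same key order.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => "value", list => [1, 2, 3] };

my $plain  = JSON::XS->new->canonical->encode ($data);
my $shrunk = JSON::XS->new->canonical->shrink->encode ($data);

# The two strings compare equal; only their internal allocation may differ.
print "same text\n" if $plain eq $shrunk;
```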
298 root 1.16 =item $json_text = $json->encode ($perl_scalar)
299 root 1.2
300     Converts the given Perl data structure (a simple scalar or a reference
301     to a hash or array) to its JSON representation. Simple scalars will be
302     converted into JSON string or number sequences, while references to arrays
303     become JSON arrays and references to hashes become JSON objects. Undefined
304     Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
305     nor C<false> values will be generated.
306 root 1.1
307 root 1.16 =item $perl_scalar = $json->decode ($json_text)
308 root 1.1
309 root 1.16 The opposite of C<encode>: expects a JSON text and tries to parse it,
310 root 1.2 returning the resulting simple scalar or reference. Croaks on error.
311 root 1.1
312 root 1.2 JSON numbers and strings become simple Perl scalars. JSON arrays become
313     Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
314     C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.
315 root 1.1
316     =back
317    
318 root 1.10 =head1 MAPPING
319    
320     This section describes how JSON::XS maps Perl values to JSON values and
321     vice versa. These mappings are designed to "do the right thing" in most
322     circumstances automatically, preserving round-tripping characteristics
323     (what you put in comes out as something equivalent).
324    
325     For the more enlightened: note that in the following descriptions,
326     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
327     refers to the abstract Perl language itself.
328    
329     =head2 JSON -> PERL
330    
331     =over 4
332    
333     =item object
334    
335     A JSON object becomes a reference to a hash in Perl. No ordering of object
336 root 1.14 keys is preserved (JSON does not preserve object key ordering itself).
337 root 1.10
338     =item array
339    
340     A JSON array becomes a reference to an array in Perl.
341    
342     =item string
343    
344     A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
345     are represented by the same codepoints in the Perl string, so no manual
346     decoding is necessary.
347    
348     =item number
349    
350     A JSON number becomes either an integer or numeric (floating point)
351     scalar in perl, depending on its range and any fractional parts. On the
352     Perl level, there is no difference between those as Perl handles all the
353     conversion details, but an integer may take slightly less memory and might
354     represent more values exactly than (floating point) numbers.
355    
356     =item true, false
357    
358     These JSON atoms become C<1> and C<0>, respectively. Information is lost in
359     this process. Future versions might represent those values differently,
360     but they will be guaranteed to act like these integers would normally in
361     Perl.
362    
363     =item null
364    
365     A JSON null atom becomes C<undef> in Perl.
366    
367     =back
368    
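The mapping above can be inspected with a short sketch like the following,
which decodes a small made-up JSON text and looks at what comes back. It
deliberately leaves out C<true> and C<false>, whose representation may
change as noted above.

```perl
use strict;
use warnings;
use JSON::XS;

my $perl = JSON::XS->new->decode ('{"list":[1,2.5,"three"],"nothing":null}');

print ref ($perl), "\n";            # object -> hash reference
print ref ($perl->{list}), "\n";    # array  -> array reference
print "number: $perl->{list}[1]\n"; # number -> numeric scalar
print "string: $perl->{list}[2]\n"; # string -> string scalar
print "null is ", (defined ($perl->{nothing}) ? "defined" : "undef"), "\n";
```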
369     =head2 PERL -> JSON
370    
371     The mapping from Perl to JSON is slightly more difficult, as Perl is a
372     truly typeless language, so we can only guess which JSON type is meant by
373     a Perl value.
374    
375     =over 4
376    
377     =item hash references
378    
379     Perl hash references become JSON objects. As there is no inherent ordering
380     in hash keys, they will usually be encoded in a pseudo-random order that
381     can change between runs of the same program but stays generally the same
382 root 1.14 within a single run of a program. JSON::XS can optionally sort the hash
383 root 1.10 keys (determined by the I<canonical> flag), so the same datastructure
384     will serialise to the same JSON text (given same settings and version of
385     JSON::XS), but this incurs a runtime overhead.
386    
387     =item array references
388    
389     Perl array references become JSON arrays.
390    
391     =item blessed objects
392    
393     Blessed objects are not allowed. JSON::XS currently tries to encode their
394     underlying representation (hash- or arrayref), but this behaviour might
395     change in future versions.
396    
397     =item simple scalars
398    
399     Simple Perl scalars (any scalar that is not a reference) are the most
400     difficult objects to encode: JSON::XS will encode undefined scalars as
401     JSON null value, scalars that have last been used in a string context
402     before encoding as JSON strings and anything else as number value:
403    
404     # dump as number
405     to_json [2] # yields [2]
406     to_json [-3.0e17] # yields [-3e+17]
407     my $value = 5; to_json [$value] # yields [5]
408    
409     # used as string, so dump as string
410     print $value;
411     to_json [$value] # yields ["5"]
412    
413     # undef becomes null
414     to_json [undef] # yields [null]
415    
416     You can force the type to be a string by stringifying it:
417    
418     my $x = 3.1; # some variable containing a number
419     "$x"; # stringified
420     $x .= ""; # another, more awkward way to stringify
421     print $x; # perl does it for you, too, quite often
422    
423     You can force the type to be a number by numifying it:
424    
425     my $x = "3"; # some variable containing a string
426     $x += 0; # numify it, ensuring it will be dumped as a number
427     $x *= 1; # same thing, the choice is yours.
428    
429     You can not currently output JSON booleans or force the type in other,
430     less obscure, ways. Tell me if you need this capability.
431    
432 root 1.11 =item circular data structures
433    
434     Those will be encoded until memory or stackspace runs out.
435    
436 root 1.10 =back
437    
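Because circular structures are encoded until resources run out, a caller
may want to check for cycles before encoding. The following is a sketch of
one way to do that; C<has_cycle> is a hypothetical helper, not part of
JSON::XS, and it relies only on the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper (not part of JSON::XS): returns 1 if $data contains
# a hash or array reference that is reachable from itself, 0 otherwise.
sub has_cycle {
    my ($data, $seen) = @_;
    $seen ||= {};
    return 0 unless ref $data;

    my $addr = refaddr $data;
    return 1 if $seen->{$addr};     # already on the path from the root

    my $type     = reftype $data;
    my @children = $type eq 'HASH'  ? values %$data
                 : $type eq 'ARRAY' ? @$data
                 :                    ();

    $seen->{$addr} = 1;             # mark while descending
    my $found = 0;
    for my $child (@children) {
        if (has_cycle ($child, $seen)) {
            $found = 1;
            last;
        }
    }
    delete $seen->{$addr};          # unmark, so shared references are fine
    return $found;
}

my $tree = { a => [1, 2, { b => 3 }] };
my $loop = {};
$loop->{self} = $loop;

print "tree: ", (has_cycle ($tree) ? "cyclic" : "ok"), "\n";
print "loop: ", (has_cycle ($loop) ? "cyclic" : "ok"), "\n";
```

Only references on the current path are marked, so a structure that refers
to the same array twice without looping is not reported as cyclic.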
438 root 1.3 =head1 COMPARISON
439    
440     As already mentioned, this module was created because none of the existing
441     JSON modules could be made to work correctly. First I will describe the
442     problems (or pleasures) I encountered with various existing JSON modules,
443 root 1.4 followed by some benchmark values. JSON::XS was designed not to suffer
444     from any of these problems or limitations.
445 root 1.3
446     =over 4
447    
448 root 1.5 =item JSON 1.07
449 root 1.3
450     Slow (but very portable, as it is written in pure Perl).
451    
452     Undocumented/buggy Unicode handling (how JSON handles unicode values is
453     undocumented. One can get far by feeding it unicode strings and doing
454     en-/decoding oneself, but unicode escapes are not working properly).
455    
456     No roundtripping (strings get clobbered if they look like numbers, e.g.
457     the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
458     decode into the number 2).
459    
460 root 1.5 =item JSON::PC 0.01
461 root 1.3
462     Very fast.
463    
464     Undocumented/buggy Unicode handling.
465    
466     No roundtripping.
467    
468 root 1.4 Has problems handling many Perl values (e.g. regex results and other magic
469     values will make it croak).
470 root 1.3
471     Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
472 root 1.16 which is not a valid JSON text).
473 root 1.3
474     Unmaintained (maintainer unresponsive for many months, bugs are not
475     getting fixed).
476    
477 root 1.5 =item JSON::Syck 0.21
478 root 1.3
479     Very buggy (often crashes).
480    
481 root 1.4 Very inflexible (no human-readable format supported, format pretty much
482     undocumented. I need at least a format for easy reading by humans and a
483     single-line compact format for use in a protocol, and preferably a way to
484 root 1.16 generate ASCII-only JSON texts).
485 root 1.3
486     Completely broken (and confusingly documented) Unicode handling (unicode
487     escapes are not working properly, you need to set ImplicitUnicode to
488     I<different> values on en- and decoding to get symmetric behaviour).
489    
490     No roundtripping (simple cases work, but this depends on whether the scalar
491     value was used in a numeric context or not).
492    
493     Dumping hashes may skip hash values depending on iterator state.
494    
495     Unmaintained (maintainer unresponsive for many months, bugs are not
496     getting fixed).
497    
498     Does not check input for validity (i.e. will accept non-JSON input and
499     return "something" instead of raising an exception. This is a security
500     issue: imagine two banks transferring money between each other using
501     JSON. One bank might parse a given non-JSON request and deduct money,
502     while the other might reject the transaction with a syntax error. While a
503     good protocol will at least recover, that is extra unnecessary work and
504     the transaction will still not succeed).
505    
506 root 1.5 =item JSON::DWIW 0.04
507 root 1.3
508     Very fast. Very natural. Very nice.
509    
510     Undocumented unicode handling (but the best of the pack. Unicode escapes
511     still don't get parsed properly).
512    
513     Very inflexible.
514    
515     No roundtripping.
516    
517 root 1.16 Does not generate valid JSON texts (key strings are often unquoted, empty keys
518 root 1.4 result in nothing being output).
519    
520 root 1.3 Does not check input for validity.
521    
522     =back
523    
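For contrast with the input-validation problems listed above, here is a
sketch of how a JSON::XS caller might reject non-JSON input instead of
acting on it. The request texts and the C<amount> field are invented for
illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->utf8;
my @outcomes;

# The first text is valid JSON; the second has an unquoted key and is not.
for my $input ('{"amount": 100}', '{amount: 100}') {
    my $request = eval { $coder->decode ($input) };
    if (defined $request) {
        push @outcomes, "accepted";
        print "accepted: amount = $request->{amount}\n";
    } else {
        push @outcomes, "rejected";
        print "rejected: $@";
    }
}
```

Because C<decode> croaks, a malformed request never reaches the code that
would act on it.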
524     =head2 SPEED
525    
526 root 1.4 It seems that JSON::XS is surprisingly fast, as shown in the following
527     tables. They have been generated with the help of the C<eg/bench> program
528     in the JSON::XS distribution, to make it easy to compare on your own
529     system.
530    
531 root 1.13 First comes a comparison between various modules using a very short JSON
532     string (83 bytes), showing the number of encodes/decodes per second
533     (JSON::XS is the functional interface, while JSON::XS/2 is the OO
534     interface with pretty-printing and hashkey sorting enabled). Higher is
535     better:
536 root 1.4
537     module | encode | decode |
538     -----------|------------|------------|
539     JSON | 14006 | 6820 |
540     JSON::DWIW | 200937 | 120386 |
541     JSON::PC | 85065 | 129366 |
542     JSON::Syck | 59898 | 44232 |
543     JSON::XS | 1171478 | 342435 |
544     JSON::XS/2 | 730760 | 328714 |
545     -----------+------------+------------+
546    
547     That is, JSON::XS encodes almost 6 times faster than JSON::DWIW and about 80 times
548     faster than JSON, and remains more than 3 and 50 times faster, respectively, with pretty-printing and key sorting.
549    
550 root 1.13 Using a longer test string (roughly 18KB, generated from Yahoo! Locals
551 root 1.4 search API (http://nanoref.com/yahooapis/mgPdGg)):
552    
553     module | encode | decode |
554     -----------|------------|------------|
555     JSON | 673 | 38 |
556     JSON::DWIW | 5271 | 770 |
557     JSON::PC | 9901 | 2491 |
558     JSON::Syck | 2360 | 786 |
559     JSON::XS | 37398 | 3202 |
560     JSON::XS/2 | 13765 | 3153 |
561     -----------+------------+------------+
562    
563     Again, JSON::XS leads by far in the encoding case, while still beating
564     every other module in the decoding case.
565    
566 root 1.13 On large strings containing lots of unicode characters, some modules
567     (such as JSON::PC) decode faster than JSON::XS, but the result will be
568     broken due to missing unicode handling. Others refuse to decode or encode
569     properly, so it was impossible to prepare a fair comparison table for that
570     case.
571    
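For a rough comparison on your own system without the C<eg/bench> program,
something along the following lines may serve as a starting point. This is
not the script used for the tables above: it uses the core Benchmark
module, compares only JSON::XS configurations, and runs on generated test
data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::XS;

# Generated sample data: 20 keys, each holding a ten-element array.
my $data = { map { ("key$_" => [1 .. 10]) } 1 .. 20 };

my $plain = JSON::XS->new->utf8;
my $fancy = JSON::XS->new->utf8->pretty->canonical;
my $text  = $plain->encode ($data);

# Run each case for at least one CPU second and print a comparison chart.
cmpthese (-1, {
    'encode'          => sub { $plain->encode ($data) },
    'encode (pretty)' => sub { $fancy->encode ($data) },
    'decode'          => sub { $plain->decode ($text) },
});
```

Other modules can be added as further entries in the hash passed to
C<cmpthese>, provided they are installed.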
572 root 1.11 =head1 RESOURCE LIMITS
573    
574     JSON::XS does not impose any limits on the size of JSON texts or Perl
575 root 1.12 values they represent - if your machine can handle it, JSON::XS will
576 root 1.11 encode or decode it. Future versions might optionally impose structure
577     depth and memory use resource limits.
578    
579 root 1.4 =head1 BUGS
580    
581     While the goal of this module is to be correct, that unfortunately does
582     not mean it is bug-free, only that I think its design is bug-free. It is
583     still very young and not well-tested. If you keep reporting bugs they will
584     be fixed swiftly, though.
585    
586 root 1.2 =cut
587    
588     1;
589    
590 root 1.1 =head1 AUTHOR
591    
592     Marc Lehmann <schmorp@schmorp.de>
593     http://home.schmorp.de/
594    
595     =cut
596