/cvs/JSON-XS/XS.pm
Revision: 1.14
Committed: Fri Mar 23 19:02:02 2007 UTC (17 years, 2 months ago) by root
Branch: MAIN

=head1 NAME

JSON::XS - JSON serialising/deserialising, done correctly and fast

=head1 SYNOPSIS

   use JSON::XS;

   # exported functions, croak on error

   $utf8_encoded_json_text = to_json $perl_hash_or_arrayref;
   $perl_hash_or_arrayref  = from_json $utf8_encoded_json_text;

   # oo-interface

   $coder = JSON::XS->new->ascii->pretty->allow_nonref;
   $pretty_printed_unencoded = $coder->encode ($perl_scalar);
   $perl_scalar = $coder->decode ($unicode_json_text);

=head1 DESCRIPTION

This module converts Perl data structures to JSON and vice versa. Its
primary goal is to be I<correct> and its secondary goal is to be
I<fast>. To reach the latter goal it was written in C.

As this is the n-th-something JSON module on CPAN, what was the reason
to write yet another JSON module? While it seems there are many JSON
modules, none of them correctly handles all corner cases, and in most cases
their maintainers are unresponsive, gone missing, or not listening to bug
reports for other reasons.

See COMPARISON, below, for a comparison to some other JSON modules.

See MAPPING, below, on how JSON::XS maps Perl values to JSON values and
vice versa.

=head2 FEATURES

=over 4

=item * correct handling of unicode issues

This module knows how to handle Unicode, and even documents how and when
it does so.

=item * round-trip integrity

When you serialise a perl data structure using only datatypes supported
by JSON, the deserialised data structure is identical on the Perl level
(e.g. the string "2.0" doesn't suddenly become "2").

=item * strict checking of JSON correctness

There is no guessing, no generating of illegal JSON strings by default,
and only JSON is accepted as input by default (the latter is a security
feature).

=item * fast

Compared to other JSON modules, this module compares favourably in terms
of speed, too.

=item * simple to use

This module has both a simple functional interface as well as an OO
interface.

=item * reasonably versatile output formats

You can choose between the most compact guaranteed single-line format
possible (nice for simple line-based protocols), a pure-ascii format (for
when your transport is not 8-bit clean), or a pretty-printed format (for
when you want to read that stuff). Or you can combine those features in
whatever way you like.

=back

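The round-trip claim can be checked with a few lines. This is only a
sketch: the data is made up, and it uses the OO calls that the functional
interface is documented to be shorthand for.

```perl
use strict;
use warnings;
use JSON::XS;

# OO form; to_json/from_json are described as shortcuts for these calls.
my $coder = JSON::XS->new->utf8;

# Made-up sample data: a string that merely looks like a number.
my $data = { version => "2.0", tags => ["a", "b"] };

my $json = $coder->encode ($data);
my $copy = $coder->decode ($json);

# The string "2.0" comes back as the same string, not as the number 2.
print $copy->{version}, "\n";
```
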
=cut

package JSON::XS;

BEGIN {
   $VERSION = '0.3';
   @ISA = qw(Exporter);

   @EXPORT = qw(to_json from_json);
   require Exporter;

   require XSLoader;
   XSLoader::load JSON::XS::, $VERSION;
}

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $json_string = to_json $perl_scalar

Converts the given Perl data structure (a simple scalar or a reference to
a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
octets only). Croaks on error.

This function call is functionally identical to C<< JSON::XS->new->utf8->encode ($perl_scalar) >>.

=item $perl_scalar = from_json $json_string

The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
parse that as a UTF-8 encoded JSON string, returning the resulting simple
scalar or reference. Croaks on error.

This function call is functionally identical to C<< JSON::XS->new->utf8->decode ($json_string) >>.

=back

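A short sketch of what "octets only" and "croaks on error" mean in
practice. It spells out the OO calls that C<to_json> and C<from_json> are
described as being identical to; the sample string and the truncated input
are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# The two calls that to_json and from_json are documented as equivalent to.
my $coder = JSON::XS->new->utf8;

my $octets = $coder->encode (["caf\x{e9}"]);   # U+00E9 is written as two octets
my $back   = $coder->decode ($octets);

printf "%d octets, first element has %d characters\n",
   length $octets, length $back->[0];

# Errors are reported by croaking, so wrap untrusted input in eval.
my $ok = eval { $coder->decode ('[1, 2'); 1 };
print "parse failed: $@" unless $ok;
```
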
=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $json = new JSON::XS

Creates a new JSON::XS object that can be used to de/encode JSON
strings. All boolean flags described below are by default I<disabled>.

The mutators for flags all return the JSON object again and thus calls can
be chained:

   my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]})
   => {"a": [1, 2]}

=item $json = $json->ascii ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will
not generate characters outside the code range C<0..127>. Any unicode
characters outside that range will be escaped using either a single
\uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence, as per
RFC4627.

If C<$enable> is false, then the C<encode> method will not escape Unicode
characters unless necessary.

   JSON::XS->new->ascii (1)->encode ([chr 0x10401])
   => ["\ud801\udc01"]

=item $json = $json->utf8 ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will encode
the JSON string into UTF-8, as required by many protocols, while the
C<decode> method expects to be handed a UTF-8-encoded string. Please
note that UTF-8-encoded strings do not contain any characters outside the
range C<0..255>, they are thus useful for bytewise/binary I/O.

If C<$enable> is false, then the C<encode> method will return the JSON
string as a (non-encoded) unicode string, while C<decode> thus expects a
unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
to be done yourself, e.g. using the Encode module.

Example, output UTF-16-encoded JSON:

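A minimal sketch of doing the encoding step by hand, assuming the core
Encode module. The C<$object> data is made up for illustration, and the
C<utf8> flag is deliberately left disabled so that C<encode> returns a
character string.

```perl
use strict;
use warnings;
use Encode ();
use JSON::XS;

# Made-up sample data containing non-ASCII characters.
my $object = { greeting => "gr\x{fc}\x{df} dich" };

# utf8 is disabled, so encode returns a Unicode (character) string ...
my $json_text = JSON::XS->new->encode ($object);

# ... which is then encoded to UTF-16 by hand (Encode prepends a BOM).
my $utf16 = Encode::encode ("UTF-16", $json_text);

# Decoding reverses the two steps.
my $copy = JSON::XS->new->decode (Encode::decode ("UTF-16", $utf16));

print length ($utf16), " octets of UTF-16\n";
```
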
=item $json = $json->pretty ([$enable])

This enables (or disables) all of the C<indent>, C<space_before> and
C<space_after> (and in the future possibly more) flags in one call to
generate the most readable (or most compact) form possible.

Example, pretty-print some simple structure:

   my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
   =>
   {
      "a" : [
         1,
         2
      ]
   }

=item $json = $json->indent ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will use a multiline
format as output, putting every array member or object/hash key-value pair
into its own line, indenting them properly.

If C<$enable> is false, no newlines or indenting will be produced, and the
resulting JSON string is guaranteed not to contain any newlines.

This setting has no effect when decoding JSON strings.

=item $json = $json->space_before ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an extra
optional space before the C<:> separating keys from values in JSON objects.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON strings. You will also most
likely combine this setting with C<space_after>.

Example, space_before enabled, space_after and indent disabled:

   {"key" :"value"}

=item $json = $json->space_after ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will add an extra
optional space after the C<:> separating keys from values in JSON objects
and extra whitespace after the C<,> separating key-value pairs and array
members.

If C<$enable> is false, then the C<encode> method will not add any extra
space at those places.

This setting has no effect when decoding JSON strings.

Example, space_before and indent disabled, space_after enabled:

   {"key": "value"}

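The whitespace flags can be compared side by side. This is a sketch with a
made-up one-key hash, so that key ordering cannot affect the output:

```perl
use strict;
use warnings;
use JSON::XS;

my $data = { key => "value" };   # one key only, so ordering is irrelevant

my $compact = JSON::XS->new->encode ($data);
my $before  = JSON::XS->new->space_before->encode ($data);
my $after   = JSON::XS->new->space_after->encode ($data);
my $pretty  = JSON::XS->new->pretty->encode ($data);

print "$compact\n";   # {"key":"value"}
print "$before\n";    # {"key" :"value"}
print "$after\n";     # {"key": "value"}
print $pretty;        # spread over several lines
```
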
=item $json = $json->canonical ([$enable])

If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
by sorting their keys. This adds a comparatively high overhead.

If C<$enable> is false, then the C<encode> method will output key-value
pairs in the order Perl stores them (which will likely change between runs
of the same script).

This option is useful if you want the same data structure to be encoded as
the same JSON string (given the same overall settings). If it is disabled,
the same hash might be encoded differently even if it contains the same data,
as key-value pairs have no inherent ordering in Perl.

This setting has no effect when decoding JSON strings.

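A small sketch of the effect, using two made-up hashes that hold the same
pairs but were filled in a different order:

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new->canonical;

# Same content, different insertion order.
my %first  = (b => 2, a => 1, c => 3);
my %second = (c => 3, a => 1, b => 2);

my $json_first  = $coder->encode (\%first);
my $json_second = $coder->encode (\%second);

print "$json_first\n";   # {"a":1,"b":2,"c":3}
print $json_first eq $json_second ? "identical\n" : "different\n";
```

This is what makes canonical output suitable for comparing or hashing
serialised data, at the cost of the sorting overhead mentioned above.
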
=item $json = $json->allow_nonref ([$enable])

If C<$enable> is true (or missing), then the C<encode> method can convert a
non-reference into its corresponding string, number or null JSON value,
which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
values instead of croaking.

If C<$enable> is false, then the C<encode> method will croak if it isn't
passed an arrayref or hashref, as JSON strings must either be an object
or array. Likewise, C<decode> will croak if given something that is not a
JSON object or array.

Example, encode a Perl scalar as JSON value with enabled C<allow_nonref>,
resulting in an invalid JSON text:

   JSON::XS->new->allow_nonref->encode ("Hello, World!")
   => "Hello, World!"

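The difference between the two settings, sketched with the flag set
explicitly on both coders so the example does not depend on the default:

```perl
use strict;
use warnings;
use JSON::XS;

# The flag is set explicitly on both coders.
my $strict  = JSON::XS->new->allow_nonref (0);
my $relaxed = JSON::XS->new->allow_nonref (1);

# With the flag disabled, a plain scalar makes encode croak.
my $json = eval { $strict->encode ("Hello, World!") };
print "strict coder refused: $@" unless defined $json;

# With the flag enabled, scalars pass in both directions.
print $relaxed->encode ("Hello, World!"), "\n";   # "Hello, World!"
print $relaxed->decode ("42"), "\n";              # 42
```
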
=item $json = $json->shrink ([$enable])

Perl usually over-allocates memory a bit when allocating space for
strings. This flag optionally resizes strings generated by either
C<encode> or C<decode> to their minimum size possible. This can save
memory when your JSON strings are either very very long or you have many
short strings. It will also try to downgrade any strings to octet-form
if possible: perl stores strings internally either in an encoding called
UTF-X or in octet-form. The latter cannot store everything but uses less
space in general.

If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
while all strings generated by C<decode> will also be shrunk-to-fit.

If C<$enable> is false, then the normal perl allocation algorithms are used.
If you work with your data, then this is likely to be faster.

In the future, this setting might control other things, such as converting
strings that look like integers or floats into integers or floats
internally (there is no difference on the Perl level), saving space.

=item $json_string = $json->encode ($perl_scalar)

Converts the given Perl data structure (a simple scalar or a reference
to a hash or array) to its JSON representation. Simple scalars will be
converted into JSON string or number sequences, while references to arrays
become JSON arrays and references to hashes become JSON objects. Undefined
Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
nor C<false> values will be generated.

=item $perl_scalar = $json->decode ($json_string)

The opposite of C<encode>: expects a JSON string and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

JSON numbers and strings become simple Perl scalars. JSON arrays become
Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.

=back

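A sketch of what C<decode> hands back for a small, made-up JSON text. It
only checks that C<true> is a true value, since the exact representation
is described elsewhere in this document as subject to change.

```perl
use strict;
use warnings;
use JSON::XS;

my $data = JSON::XS->new->decode ('{"name":"x","list":[1,2.5,null],"ok":true}');

my $outer = ref $data;            # JSON object -> hash reference
my $inner = ref $data->{list};    # JSON array  -> array reference
my $sum   = $data->{list}[0] + $data->{list}[1];   # numbers behave as numbers

print "$outer / $inner / $sum\n";
print "null became undef\n" unless defined $data->{list}[2];
print "true is a true value\n" if $data->{ok};
```
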
=head1 MAPPING

This section describes how JSON::XS maps Perl values to JSON values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.

=head2 JSON -> PERL

=over 4

=item object

A JSON object becomes a reference to a hash in Perl. No ordering of object
keys is preserved (JSON does not preserve object key ordering itself).

=item array

A JSON array becomes a reference to an array in Perl.

=item string

A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
are represented by the same codepoints in the Perl string, so no manual
decoding is necessary.

=item number

A JSON number becomes either an integer or numeric (floating point)
scalar in perl, depending on its range and any fractional parts. On the
Perl level, there is no difference between those as Perl handles all the
conversion details, but an integer may take slightly less memory and might
represent more values exactly than (floating point) numbers.

=item true, false

These JSON atoms become C<1> and C<0>, respectively. Information is lost in
this process. Future versions might represent those values differently,
but they will be guaranteed to act like these integers would normally in
Perl.

=item null

A JSON null atom becomes C<undef> in Perl.

=back

=head2 PERL -> JSON

The mapping from Perl to JSON is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which JSON type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become JSON objects. As there is no inherent ordering
in hash keys, they will usually be encoded in a pseudo-random order that
can change between runs of the same program but stays generally the same
within a single run of a program. JSON::XS can optionally sort the hash
keys (determined by the I<canonical> flag), so the same data structure
will serialise to the same JSON text (given same settings and version of
JSON::XS), but this incurs a runtime overhead.

=item array references

Perl array references become JSON arrays.

=item blessed objects

Blessed objects are not allowed. JSON::XS currently tries to encode their
underlying representation (hash- or arrayref), but this behaviour might
change in future versions.

=item simple scalars

Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: JSON::XS will encode undefined scalars as
JSON null value, scalars that have last been used in a string context
before encoding as JSON strings and anything else as number value:

   # dump as number
   to_json [2]                      # yields [2]
   to_json [-3.0e17]                # yields [-3e+17]
   my $value = 5; to_json [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   to_json [$value]                 # yields ["5"]

   # undef becomes null
   to_json [undef]                  # yields [null]

You can force the type to be a string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

You cannot currently output JSON booleans or force the type in other,
less obscure, ways. Tell me if you need this capability.

=item circular data structures

These will be encoded until memory or stack space runs out.

=back

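The two forcing idioms, sketched end to end with made-up variables. The
example uses the explicit C<+= 0> and C<.= ""> forms, and the OO C<encode>
call that C<to_json> is documented as wrapping (minus the C<utf8> flag,
which does not matter for this ASCII-only data).

```perl
use strict;
use warnings;
use JSON::XS;

my $coder = JSON::XS->new;

my $count = "3";     # arrived as a string, e.g. from a form field
$count += 0;         # numify
my $as_number = $coder->encode ([$count]);

my $price = 3.1;     # a number
$price .= "";        # stringify
my $as_string = $coder->encode ([$price]);

print "$as_number\n";   # [3]
print "$as_string\n";   # ["3.1"]
```
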
=head1 COMPARISON

As already mentioned, this module was created because none of the existing
JSON modules could be made to work correctly. First I will describe the
problems (or pleasures) I encountered with various existing JSON modules,
followed by some benchmark values. JSON::XS was designed not to suffer
from any of these problems or limitations.

=over 4

=item JSON 1.07

Slow (but very portable, as it is written in pure Perl).

Undocumented/buggy Unicode handling (how JSON handles unicode values is
undocumented. One can get far by feeding it unicode strings and doing
en-/decoding oneself, but unicode escapes are not working properly).

No roundtripping (strings get clobbered if they look like numbers, e.g.
the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
decode into the number 2).

=item JSON::PC 0.01

Very fast.

Undocumented/buggy Unicode handling.

No roundtripping.

Has problems handling many Perl values (e.g. regex results and other magic
values will make it croak).

Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
which is not a valid JSON string).

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

=item JSON::Syck 0.21

Very buggy (often crashes).

Very inflexible (no human-readable format supported, format pretty much
undocumented. I need at least a format for easy reading by humans and a
single-line compact format for use in a protocol, and preferably a way to
generate ASCII-only JSON strings).

Completely broken (and confusingly documented) Unicode handling (unicode
escapes are not working properly, you need to set ImplicitUnicode to
I<different> values on en- and decoding to get symmetric behaviour).

No roundtripping (simple cases work, but this depends on whether the scalar
value was used in a numeric context or not).

Dumping hashes may skip hash values depending on iterator state.

Unmaintained (maintainer unresponsive for many months, bugs are not
getting fixed).

Does not check input for validity (i.e. will accept non-JSON input and
return "something" instead of raising an exception. This is a security
issue: imagine two banks transferring money between each other using
JSON. One bank might parse a given non-JSON request and deduct money,
while the other might reject the transaction with a syntax error. While a
good protocol will at least recover, that is extra unnecessary work and
the transaction will still not succeed).

=item JSON::DWIW 0.04

Very fast. Very natural. Very nice.

Undocumented unicode handling (but the best of the pack. Unicode escapes
still don't get parsed properly).

Very inflexible.

No roundtripping.

Does not generate valid JSON (key strings are often unquoted, empty keys
result in nothing being output).

Does not check input for validity.

=back

=head2 SPEED

It seems that JSON::XS is surprisingly fast, as shown in the following
tables. They have been generated with the help of the C<eg/bench> program
in the JSON::XS distribution, to make it easy to compare on your own
system.

First comes a comparison between various modules using a very short JSON
string (83 bytes), showing the number of encodes/decodes per second
(JSON::XS is the functional interface, while JSON::XS/2 is the OO
interface with pretty-printing and hashkey sorting enabled). Higher is
better:

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |      14006 |       6820 |
   JSON::DWIW |     200937 |     120386 |
   JSON::PC   |      85065 |     129366 |
   JSON::Syck |      59898 |      44232 |
   JSON::XS   |    1171478 |     342435 |
   JSON::XS/2 |     730760 |     328714 |
   -----------+------------+------------+

That is, when encoding, JSON::XS is almost 6 times faster than JSON::DWIW
and about 80 times faster than JSON. Even with pretty-printing and key
sorting enabled, it is still more than 3 times faster than JSON::DWIW.

Using a longer test string (roughly 18KB, generated from Yahoo! Locals
search API (http://nanoref.com/yahooapis/mgPdGg)):

   module     |     encode |     decode |
   -----------|------------|------------|
   JSON       |        673 |         38 |
   JSON::DWIW |       5271 |        770 |
   JSON::PC   |       9901 |       2491 |
   JSON::Syck |       2360 |        786 |
   JSON::XS   |      37398 |       3202 |
   JSON::XS/2 |      13765 |       3153 |
   -----------+------------+------------+

Again, JSON::XS leads by far in the encoding case, while still beating
every other module in the decoding case.

On large strings containing lots of unicode characters, some modules
(such as JSON::PC) decode faster than JSON::XS, but the result will be
broken due to missing unicode handling. Others refuse to decode or encode
properly, so it was impossible to prepare a fair comparison table for that
case.

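For a quick measurement on your own system, something along these lines
may be enough. This is a rough sketch using the core Benchmark module and
a made-up sample structure. It is not the C<eg/bench> program and will not
reproduce the numbers in the tables.

```perl
use strict;
use warnings;
use Benchmark qw(countit);
use JSON::XS;

my $coder = JSON::XS->new->utf8;

# Made-up sample data; substitute something typical for your application.
my $data = { id => 1, tags => ["a", "b"], nested => { ok => 1 } };
my $json = $coder->encode ($data);

# Run each operation for roughly half a CPU second and count iterations.
my $enc = countit (0.5, sub { $coder->encode ($data) });
my $dec = countit (0.5, sub { $coder->decode ($json) });

printf "encode: %d iterations\n", $enc->iters;
printf "decode: %d iterations\n", $dec->iters;
```
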
=head1 RESOURCE LIMITS

JSON::XS does not impose any limits on the size of JSON texts or Perl
values they represent - if your machine can handle it, JSON::XS will
encode or decode it. Future versions might optionally impose structure
depth and memory use resource limits.

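Since no depth limit is imposed and circular data structures are encoded
until memory or stack space runs out, callers that cannot trust their data
may want to check it first. The C<has_cycle> helper below is hypothetical
and not part of JSON::XS. It is a sketch that walks hashes and arrays and
reports whether any reference points back to one of its own ancestors.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical helper, not part of JSON::XS: returns true if $data
# contains a reference that points back to one of its own ancestors.
sub has_cycle {
   my ($data, $ancestors) = @_;
   $ancestors ||= {};

   my $type = reftype $data;   # undef for plain scalars
   return 0 unless defined $type && ($type eq 'HASH' || $type eq 'ARRAY');

   my $addr = refaddr $data;
   return 1 if $ancestors->{$addr};   # already on the current path

   local $ancestors->{$addr} = 1;     # marked while descending, undone on return
   my @children = $type eq 'HASH' ? values %$data : @$data;
   for my $child (@children) {
      return 1 if has_cycle ($child, $ancestors);
   }
   return 0;
}

my $tree = { list => [1, 2, { leaf => undef }] };
my $loop = { name => "loop" };
$loop->{self} = $loop;                # refers back to itself

print has_cycle ($tree) ? "cyclic\n" : "ok\n";
print has_cycle ($loop) ? "cyclic\n" : "ok\n";
```

Only the current path is tracked, so a structure that merely references
the same array twice is not reported as circular.
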
=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. It is
still very young and not well-tested. If you keep reporting bugs they will
be fixed swiftly, though.

=cut

1;

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut