/cvs/JSON-XS/XS.pm
Revision: 1.11
Committed: Fri Mar 23 17:48:59 2007 UTC (17 years, 2 months ago) by root
Branch: MAIN
Changes since 1.10: +11 -0 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     JSON::XS - JSON serialising/deserialising, done correctly and fast
4    
5     =head1 SYNOPSIS
6    
7     use JSON::XS;
8    
9     =head1 DESCRIPTION
10    
11 root 1.2 This module converts Perl data structures to JSON and vice versa. Its
12     primary goal is to be I<correct> and its secondary goal is to be
13     I<fast>. To reach the latter goal it was written in C.
14    
15     As this is the n-th-something JSON module on CPAN, what was the reason
16     to write yet another JSON module? While it seems there are many JSON
17     modules, none of them correctly handle all corner cases, and in most cases
18     their maintainers are unresponsive, gone missing, or not listening to bug
19     reports for other reasons.
20    
21     See COMPARISON, below, for a comparison to some other JSON modules.
22    
23 root 1.10 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
24     vice versa.
25    
26 root 1.2 =head2 FEATURES
27    
28 root 1.1 =over 4
29    
30 root 1.2 =item * correct handling of unicode issues
31    
32 root 1.10 This module knows how to handle Unicode, and even documents how and when
33     it does so.
34 root 1.2
35     =item * round-trip integrity
36    
37     When you serialise a perl data structure using only datatypes supported
38     by JSON, the deserialised data structure is identical on the Perl level.
39     (e.g. the string "2.0" doesn't suddenly become "2").
40    
41     =item * strict checking of JSON correctness
42    
43     There is no guessing, no generating of illegal JSON strings by default,
44 root 1.10 and only JSON is accepted as input by default (the latter is a security
45     feature).
46 root 1.2
47     =item * fast
48    
49 root 1.10 Compared to other JSON modules, this module compares favourably in terms
50     of speed, too.
51 root 1.2
52     =item * simple to use
53    
54     This module has both a simple functional interface as well as an OO
55     interface.
56    
57     =item * reasonably versatile output formats
58    
59 root 1.10 You can choose between the most compact guaranteed single-line format
60     possible (nice for simple line-based protocols), a pure-ascii format (for
61     when your transport is not 8-bit clean), or a pretty-printed format (for
62     when you want to read that stuff). Or you can combine those features in
63 root 1.2 whatever way you like.
64    
65     =back
66    
67 root 1.1 =cut
68    
69     package JSON::XS;
70    
71     BEGIN {
72 root 1.9 $VERSION = '0.3';
73 root 1.1 @ISA = qw(Exporter);
74    
75 root 1.2 @EXPORT = qw(to_json from_json);
76 root 1.1 require Exporter;
77    
78     require XSLoader;
79     XSLoader::load JSON::XS::, $VERSION;
80     }
81    
82 root 1.2 =head1 FUNCTIONAL INTERFACE
83    
84     The following convenience functions are provided by this module. They are
85     exported by default:
86    
87     =over 4
88    
89     =item $json_string = to_json $perl_scalar
90    
91     Converts the given Perl data structure (a simple scalar or a reference to
92     a hash or array) to a UTF-8 encoded, binary string (that is, the string contains
93     octets only). Croaks on error.
94    
95 root 1.10 This function call is functionally identical to C<< JSON::XS->new->utf8->encode ($perl_scalar) >>.
96 root 1.2
97     =item $perl_scalar = from_json $json_string
98    
99     The opposite of C<to_json>: expects a UTF-8 (binary) string and tries to
100     parse that as a UTF-8 encoded JSON string, returning the resulting simple
101     scalar or reference. Croaks on error.
102    
103 root 1.10 This function call is functionally identical to C<< JSON::XS->new->utf8->decode ($json_string) >>.
104 root 1.2
105     =back
106    
107     =head1 OBJECT-ORIENTED INTERFACE
108    
109     The object oriented interface lets you configure your own encoding or
110     decoding style, within the limits of supported formats.
111    
112     =over 4
113    
114     =item $json = new JSON::XS
115    
116     Creates a new JSON::XS object that can be used to de/encode JSON
117     strings. All boolean flags described below are by default I<disabled>.
118 root 1.1
119 root 1.2 The mutators for flags all return the JSON object again and thus calls can
120     be chained:
121    
122 root 1.3 my $json = JSON::XS->new->utf8(1)->space_after(1)->encode ({a => [1,2]})
123     => {"a": [1, 2]}
124 root 1.2
125 root 1.7 =item $json = $json->ascii ([$enable])
126 root 1.2
127 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will
128     not generate characters outside the code range C<0..127>. Any unicode
129     characters outside that range will be escaped using either a single
130     \uXXXX (BMP characters) or a double \uHHHH\uLLLL escape sequence, as per
131     RFC4627.
132 root 1.2
133     If C<$enable> is false, then the C<encode> method will not escape Unicode
134     characters unless necessary.
135    
136 root 1.3 JSON::XS->new->ascii (1)->encode ([chr 0x10401])
137     => ["\ud801\udc01"]
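
The double escape for characters outside the BMP is a UTF-16 surrogate pair. As a minimal sketch of the arithmetic behind the example above (the helper name C<json_unicode_escape> is hypothetical and not part of JSON::XS; it only illustrates the calculation, not how the module implements it):

```perl
use strict;
use warnings;

# Format a code point as one \uXXXX escape (BMP) or as a surrogate
# pair of two escapes (above U+FFFF).
# json_unicode_escape is a hypothetical helper, not a JSON::XS function.
sub json_unicode_escape {
    my ($cp) = @_;
    return sprintf('\u%04x', $cp) if $cp < 0x10000;

    my $v  = $cp - 0x10000;            # 20 significant bits remain
    my $hi = 0xD800 + ($v >> 10);      # high surrogate: upper 10 bits
    my $lo = 0xDC00 + ($v & 0x3FF);    # low surrogate: lower 10 bits
    return sprintf('\u%04x\u%04x', $hi, $lo);
}

print json_unicode_escape(0x10401), "\n";   # prints \ud801\udc01
```

For U+10401 the offset from 0x10000 is 0x401, which splits into an upper part of 1 and a lower part of 1, giving the pair d801/dc01 shown in the example.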
138    
139 root 1.7 =item $json = $json->utf8 ([$enable])
140 root 1.2
141 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will encode
142     the JSON string into UTF-8, as required by many protocols, while the
143     C<decode> method expects to be handed a UTF-8-encoded string. Please
144     note that UTF-8-encoded strings do not contain any characters outside the
145     range C<0..255>, they are thus useful for bytewise/binary I/O.
146 root 1.2
147     If C<$enable> is false, then the C<encode> method will return the JSON
148     string as a (non-encoded) Unicode string, while C<decode> thus expects a
149     Unicode string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs
150     to be done yourself, e.g. using the Encode module.
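
To illustrate the difference between a Unicode string and its UTF-8 encoded form, here is a small sketch that uses only the core Encode and List::Util modules (it does not call JSON::XS; the sample string is an arbitrary assumption):

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use List::Util qw(max);

my $unicode = "\x{20ac}100";               # EURO SIGN followed by "100"
my $octets  = encode('UTF-8', $unicode);   # byte string, safe for binary I/O

# Every element of the encoded string fits in one byte.
my $largest = max(map { ord($_) } split(//, $octets));

printf "%d characters became %d octets (largest value %d)\n",
    length($unicode), length($octets), $largest;

# Decoding the octets restores the original character string.
my $restored = decode('UTF-8', $octets);
print "round trip ok\n" if $restored eq $unicode;
```

With the C<utf8> flag enabled, C<encode> and C<decode> work on strings like C<$octets>; with it disabled, they work on strings like C<$unicode> and the Encode step is up to you.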
151    
152 root 1.7 =item $json = $json->pretty ([$enable])
153 root 1.2
154     This enables (or disables) all of the C<indent>, C<space_before> and
155 root 1.3 C<space_after> (and in the future possibly more) flags in one call to
156 root 1.2 generate the most readable (or most compact) form possible.
157    
158 root 1.3 my $json = JSON::XS->new->pretty(1)->encode ({a => [1,2]})
159     =>
160     {
161     "a" : [
162     1,
163     2
164     ]
165     }
166    
167 root 1.7 =item $json = $json->indent ([$enable])
168 root 1.2
169 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will use a multiline
170 root 1.2 format as output, putting every array member or object/hash key-value pair
171     into its own line, indenting them properly.
172    
173     If C<$enable> is false, no newlines or indenting will be produced, and the
174     resulting JSON string is guaranteed not to contain any newlines.
175    
176     This setting has no effect when decoding JSON strings.
177    
178 root 1.7 =item $json = $json->space_before ([$enable])
179 root 1.2
180 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
181 root 1.2 optional space before the C<:> separating keys from values in JSON objects.
182    
183     If C<$enable> is false, then the C<encode> method will not add any extra
184     space at those places.
185    
186     This setting has no effect when decoding JSON strings. You will also most
187     likely combine this setting with C<space_after>.
188    
189 root 1.7 =item $json = $json->space_after ([$enable])
190 root 1.2
191 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will add an extra
192 root 1.2 optional space after the C<:> separating keys from values in JSON objects
193     and extra whitespace after the C<,> separating key-value pairs and array
194     members.
195    
196     If C<$enable> is false, then the C<encode> method will not add any extra
197     space at those places.
198    
199     This setting has no effect when decoding JSON strings.
200    
201 root 1.7 =item $json = $json->canonical ([$enable])
202 root 1.2
203 root 1.7 If C<$enable> is true (or missing), then the C<encode> method will output JSON objects
204 root 1.2 by sorting their keys. This adds a comparatively high overhead.
205    
206     If C<$enable> is false, then the C<encode> method will output key-value
207     pairs in the order Perl stores them (which will likely change between runs
208     of the same script).
209    
210     This option is useful if you want the same data structure to be encoded as
211     the same JSON string (given the same overall settings). If it is disabled,
212     the same hash might be encoded differently even if it contains the same data,
213     as key-value pairs have no inherent ordering in Perl.
214    
215     This setting has no effect when decoding JSON strings.
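
A short sketch of the C<canonical> flag in use, assuming JSON::XS is installed (the sample data is made up for illustration):

```perl
use strict;
use warnings;
use JSON::XS;

my %data = (b => 1, c => [3, 4], a => 2);

# With canonical enabled, keys are emitted in sorted order, so the
# result does not depend on Perl's internal hash ordering.
my $json = JSON::XS->new->canonical(1);
my $text = $json->encode(\%data);

print "$text\n";   # prints {"a":2,"b":1,"c":[3,4]}
```

This is mainly useful when the JSON text itself is compared, cached or hashed; if only the decoded data matters, leaving the flag off avoids the sorting overhead.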
216    
217 root 1.7 =item $json = $json->allow_nonref ([$enable])
218 root 1.3
219 root 1.7 If C<$enable> is true (or missing), then the C<encode> method can convert a
220 root 1.3 non-reference into its corresponding string, number or null JSON value,
221     which is an extension to RFC4627. Likewise, C<decode> will accept those JSON
222     values instead of croaking.
223    
224     If C<$enable> is false, then the C<encode> method will croak if it isn't
225     passed an arrayref or hashref, as JSON strings must either be an object
226     or array. Likewise, C<decode> will croak if given something that is not a
227     JSON object or array.
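
The following sketch shows both settings side by side, assuming JSON::XS is installed. The flag is set explicitly in each case so the example does not rely on whatever default a given version uses:

```perl
use strict;
use warnings;
use JSON::XS;

my $strict  = JSON::XS->new->allow_nonref(0);
my $relaxed = JSON::XS->new->allow_nonref(1);

# A bare string is neither an array nor a hash reference, so encode croaks.
my $strict_ok = eval { $strict->encode("hello"); 1 };
print "strict encode failed: $@" unless $strict_ok;

# With allow_nonref enabled, the same value becomes a JSON string.
my $text = $relaxed->encode("hello");
print "$text\n";   # prints "hello"

# Decoding accepts the bare JSON string as well.
my $value = $relaxed->decode($text);
```

Wrapping the call in C<eval> is the usual way to catch the croak when input of unknown shape has to be handled gracefully.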
228    
229 root 1.7 =item $json = $json->shrink ([$enable])
230    
231     Perl usually over-allocates memory a bit when allocating space for
232     strings. This flag optionally resizes strings generated by either
233     C<encode> or C<decode> to their minimum size possible. This can save
234     memory when your JSON strings are either very very long or you have many
235 root 1.8 short strings. It will also try to downgrade any strings to octet-form
236     if possible: perl stores strings internally either in an encoding called
237     UTF-X or in octet-form. The latter cannot store everything but uses less
238     space in general.
239 root 1.7
240     If C<$enable> is true (or missing), the string returned by C<encode> will be shrunk-to-fit,
241     while all strings generated by C<decode> will also be shrunk-to-fit.
242    
243     If C<$enable> is false, then the normal perl allocation algorithms are used.
244     If you work with your data, then this is likely to be faster.
245    
246     In the future, this setting might control other things, such as converting
247     strings that look like integers or floats into integers or floats
248     internally (there is no difference on the Perl level), saving space.
249    
250 root 1.2 =item $json_string = $json->encode ($perl_scalar)
251    
252     Converts the given Perl data structure (a simple scalar or a reference
253     to a hash or array) to its JSON representation. Simple scalars will be
254     converted into JSON string or number sequences, while references to arrays
255     become JSON arrays and references to hashes become JSON objects. Undefined
256     Perl values (e.g. C<undef>) become JSON C<null> values. Neither C<true>
257     nor C<false> values will be generated.
258 root 1.1
259 root 1.2 =item $perl_scalar = $json->decode ($json_string)
260 root 1.1
261 root 1.2 The opposite of C<encode>: expects a JSON string and tries to parse it,
262     returning the resulting simple scalar or reference. Croaks on error.
263 root 1.1
264 root 1.2 JSON numbers and strings become simple Perl scalars. JSON arrays become
265     Perl arrayrefs and JSON objects become Perl hashrefs. C<true> becomes
266     C<1>, C<false> becomes C<0> and C<null> becomes C<undef>.
267 root 1.1
268     =back
269    
270 root 1.10 =head1 MAPPING
271    
272     This section describes how JSON::XS maps Perl values to JSON values and
273     vice versa. These mappings are designed to "do the right thing" in most
274     circumstances automatically, preserving round-tripping characteristics
275     (what you put in comes out as something equivalent).
276    
277     For the more enlightened: note that in the following descriptions,
278     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
279     refers to the abstract Perl language itself.
280    
281     =head2 JSON -> PERL
282    
283     =over 4
284    
285     =item object
286    
287     A JSON object becomes a reference to a hash in Perl. No ordering of object
288     keys is preserved.
289    
290     =item array
291    
292     A JSON array becomes a reference to an array in Perl.
293    
294     =item string
295    
296     A JSON string becomes a string scalar in Perl - Unicode codepoints in JSON
297     are represented by the same codepoints in the Perl string, so no manual
298     decoding is necessary.
299    
300     =item number
301    
302     A JSON number becomes either an integer or numeric (floating point)
303     scalar in perl, depending on its range and any fractional parts. On the
304     Perl level, there is no difference between those as Perl handles all the
305     conversion details, but an integer may take slightly less memory and might
306     represent more values exactly than (floating point) numbers.
307    
308     =item true, false
309    
310     These JSON atoms become C<1> and C<0>, respectively. Information is lost in
311     this process. Future versions might represent those values differently,
312     but they will be guaranteed to act like these integers would normally in
313     Perl.
314    
315     =item null
316    
317     A JSON null atom becomes C<undef> in Perl.
318    
319     =back
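
As a sketch of the JSON to Perl mapping described above, assuming JSON::XS is installed (the JSON text is invented for illustration). Booleans are checked by truthiness only, since their exact representation may differ between versions:

```perl
use strict;
use warnings;
use JSON::XS;

my $text = '{"list":[1,2.5,"three"],"flag":true,"off":false,"nothing":null}';
my $data = JSON::XS->new->decode($text);

my $top_kind  = ref($data);             # JSON object -> hash reference
my $list_kind = ref($data->{list});     # JSON array  -> array reference
my $third     = $data->{list}[2];       # JSON string -> Perl string
my $is_null   = !defined($data->{nothing});   # JSON null -> undef

printf "%s %s %s null=%d flag=%s off=%s\n",
    $top_kind, $list_kind, $third, $is_null ? 1 : 0,
    ($data->{flag} ? 'true' : 'false'),
    ($data->{off}  ? 'true' : 'false');
```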
320    
321     =head2 PERL -> JSON
322    
323     The mapping from Perl to JSON is slightly more difficult, as Perl is a
324     truly typeless language, so we can only guess which JSON type is meant by
325     a Perl value.
326    
327     =over 4
328    
329     =item hash references
330    
331     Perl hash references become JSON objects. As there is no inherent ordering
332     in hash keys, they will usually be encoded in a pseudo-random order that
333     can change between runs of the same program but stays generally the same
334     within the single run of a program. JSON::XS can optionally sort the hash
335     keys (determined by the I<canonical> flag), so the same datastructure
336     will serialise to the same JSON text (given same settings and version of
337     JSON::XS), but this incurs a runtime overhead.
338    
339     =item array references
340    
341     Perl array references become JSON arrays.
342    
343     =item blessed objects
344    
345     Blessed objects are not allowed. JSON::XS currently tries to encode their
346     underlying representation (hash- or arrayref), but this behaviour might
347     change in future versions.
348    
349     =item simple scalars
350    
351     Simple Perl scalars (any scalar that is not a reference) are the most
352     difficult objects to encode: JSON::XS will encode undefined scalars as
353     JSON null value, scalars that have last been used in a string context
354     before encoding as JSON strings and anything else as number value:
355    
356     # dump as number
357     to_json [2] # yields [2]
358     to_json [-3.0e17] # yields [-3e+17]
359     my $value = 5; to_json [$value] # yields [5]
360    
361     # used as string, so dump as string
362     print $value;
363     to_json [$value] # yields ["5"]
364    
365     # undef becomes null
366     to_json [undef] # yields [null]
367    
368     You can force the type to be a string by stringifying it:
369    
370     my $x = 3.1; # some variable containing a number
371     "$x"; # stringified
372     $x .= ""; # another, more awkward way to stringify
373     print $x; # perl does it for you, too, quite often
374    
375     You can force the type to be a number by numifying it:
376    
377     my $x = "3"; # some variable containing a string
378     $x += 0; # numify it, ensuring it will be dumped as a number
379     $x *= 1; # same thing, the choice is yours.
380    
381     You can not currently output JSON booleans or force the type in other,
382     less obscure, ways. Tell me if you need this capability.
383    
384 root 1.11 =item circular data structures
385    
386     Those will be encoded until memory or stackspace runs out.
387    
388 root 1.10 =back
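
Because circular structures are not detected by the encoder, it can be worth checking untrusted data before encoding it. The following is a minimal sketch using only the core Scalar::Util module; C<has_cycle> is a hypothetical helper, not part of JSON::XS, and it only follows hash and array references:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Returns true if following hash/array references from $node leads back
# to a container that is still being visited (that is, a cycle).
sub has_cycle {
    my ($node, $active) = @_;
    $active ||= {};

    my $type = reftype($node) // '';
    return 0 unless $type eq 'HASH' || $type eq 'ARRAY';

    my $addr = refaddr($node);
    return 1 if $active->{$addr};        # already on the current path

    $active->{$addr} = 1;
    my @children = $type eq 'HASH' ? values %$node : @$node;
    for my $child (@children) {
        if (has_cycle($child, $active)) {
            delete $active->{$addr};
            return 1;
        }
    }
    delete $active->{$addr};             # shared but acyclic data is fine
    return 0;
}

my $tree = { name => 'root', kids => [ { name => 'leaf' } ] };
my $loop = { name => 'a' };
$loop->{self} = $loop;                   # points back at itself

for my $case ([tree => $tree], [loop => $loop]) {
    my ($label, $data) = @$case;
    printf "%s: %s\n", $label, has_cycle($data) ? 'cycle' : 'no cycle';
}
```

The check itself is recursive, so it does not protect against extremely deep (but acyclic) structures; it only catches references that loop back.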
389    
390 root 1.3 =head1 COMPARISON
391    
392     As already mentioned, this module was created because none of the existing
393     JSON modules could be made to work correctly. First I will describe the
394     problems (or pleasures) I encountered with various existing JSON modules,
395 root 1.4 followed by some benchmark values. JSON::XS was designed not to suffer
396     from any of these problems or limitations.
397 root 1.3
398     =over 4
399    
400 root 1.5 =item JSON 1.07
401 root 1.3
402     Slow (but very portable, as it is written in pure Perl).
403    
404     Undocumented/buggy Unicode handling (how JSON handles unicode values is
405     undocumented. One can get far by feeding it unicode strings and doing
406     en-/decoding oneself, but unicode escapes are not working properly).
407    
408     No roundtripping (strings get clobbered if they look like numbers, e.g.
409     the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
410     decode into the number 2).
411    
412 root 1.5 =item JSON::PC 0.01
413 root 1.3
414     Very fast.
415    
416     Undocumented/buggy Unicode handling.
417    
418     No roundtripping.
419    
420 root 1.4 Has problems handling many Perl values (e.g. regex results and other magic
421     values will make it croak).
422 root 1.3
423     Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
424     which is not a valid JSON string).
425    
426     Unmaintained (maintainer unresponsive for many months, bugs are not
427     getting fixed).
428    
429 root 1.5 =item JSON::Syck 0.21
430 root 1.3
431     Very buggy (often crashes).
432    
433 root 1.4 Very inflexible (no human-readable format supported, format pretty much
434     undocumented. I need at least a format for easy reading by humans and a
435     single-line compact format for use in a protocol, and preferably a way to
436     generate ASCII-only JSON strings).
437 root 1.3
438     Completely broken (and confusingly documented) Unicode handling (unicode
439     escapes are not working properly, you need to set ImplicitUnicode to
440     I<different> values on en- and decoding to get symmetric behaviour).
441    
442     No roundtripping (simple cases work, but this depends on whether the scalar
443     value was used in a numeric context or not).
444    
445     Dumping hashes may skip hash values depending on iterator state.
446    
447     Unmaintained (maintainer unresponsive for many months, bugs are not
448     getting fixed).
449    
450     Does not check input for validity (i.e. will accept non-JSON input and
451     return "something" instead of raising an exception. This is a security
452     issue: imagine two banks transferring money between each other using
453     JSON. One bank might parse a given non-JSON request and deduct money,
454     while the other might reject the transaction with a syntax error. While a
455     good protocol will at least recover, that is extra unnecessary work and
456     the transaction will still not succeed).
457    
458 root 1.5 =item JSON::DWIW 0.04
459 root 1.3
460     Very fast. Very natural. Very nice.
461    
462     Undocumented unicode handling (but the best of the pack. Unicode escapes
463     still don't get parsed properly).
464    
465     Very inflexible.
466    
467     No roundtripping.
468    
469 root 1.4 Does not generate valid JSON (key strings are often unquoted, empty keys
470     result in nothing being output)
471    
472 root 1.3 Does not check input for validity.
473    
474     =back
475    
476     =head2 SPEED
477    
478 root 1.4 It seems that JSON::XS is surprisingly fast, as shown in the following
479     tables. They have been generated with the help of the C<eg/bench> program
480     in the JSON::XS distribution, to make it easy to compare on your own
481     system.
482    
483     First is a comparison between various modules using a very simple JSON
484     string, showing the number of encodes/decodes per second (JSON::XS is
485     the functional interface, while JSON::XS/2 is the OO interface with
486     pretty-printing and hashkey sorting enabled).
487    
488     module | encode | decode |
489     -----------|------------|------------|
490     JSON | 14006 | 6820 |
491     JSON::DWIW | 200937 | 120386 |
492     JSON::PC | 85065 | 129366 |
493     JSON::Syck | 59898 | 44232 |
494     JSON::XS | 1171478 | 342435 |
495     JSON::XS/2 | 730760 | 328714 |
496     -----------+------------+------------+
497    
498     That is, when encoding, JSON::XS is almost 6 times faster than JSON::DWIW and about 80 times faster than JSON; even with
499     pretty-printing and key sorting (JSON::XS/2) it remains more than 3 times and about 50 times faster, respectively.
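
The speed-up factors can be recomputed from the encode column of the table above. A small sketch (the numbers are copied from that table; nothing here is measured anew):

```perl
use strict;
use warnings;

# Encodes per second for the short test string, taken from the table.
my %encode = (
    'JSON'       => 14006,
    'JSON::DWIW' => 200937,
    'JSON::XS'   => 1171478,
    'JSON::XS/2' => 730760,
);

for my $fast ('JSON::XS', 'JSON::XS/2') {
    for my $slow ('JSON::DWIW', 'JSON') {
        printf "%-10s vs %-10s: %.1fx\n",
            $fast, $slow, $encode{$fast} / $encode{$slow};
    }
}
```

The functional interface comes out at roughly 5.8 and 84 times the encode rate of JSON::DWIW and JSON, while the pretty-printing, key-sorting configuration comes out at roughly 3.6 and 52 times.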
500    
501     Using a longer test string (roughly 8KB, generated from Yahoo! Locals
502     search API (http://nanoref.com/yahooapis/mgPdGg)):
503    
504     module | encode | decode |
505     -----------|------------|------------|
506     JSON | 673 | 38 |
507     JSON::DWIW | 5271 | 770 |
508     JSON::PC | 9901 | 2491 |
509     JSON::Syck | 2360 | 786 |
510     JSON::XS | 37398 | 3202 |
511     JSON::XS/2 | 13765 | 3153 |
512     -----------+------------+------------+
513    
514     Again, JSON::XS leads by far in the encoding case, while still beating
515     every other module in the decoding case.
516    
517     Last example is an almost 8MB large hash with many large binary values
518     (PNG files), resulting in a lot of escaping:
519    
520 root 1.11 =head1 RESOURCE LIMITS
521    
522     JSON::XS does not impose any limits on the size of JSON texts or Perl
523     values they represent - if your machine can handle it, JSON::XS will
524     encode or decode it. Future versions might optionally impose structure
525     depth and memory use resource limits.
526    
527 root 1.4 =head1 BUGS
528    
529     While the goal of this module is to be correct, that unfortunately does
530     not mean it's bug-free, only that I think its design is bug-free. It is
531     still very young and not well-tested. If you keep reporting bugs they will
532     be fixed swiftly, though.
533    
534 root 1.2 =cut
535    
536     1;
537    
538 root 1.1 =head1 AUTHOR
539    
540     Marc Lehmann <schmorp@schmorp.de>
541     http://home.schmorp.de/
542    
543     =cut
544