Comparing JSON-XS/README (file contents):
Revision 1.24 by root, Thu Mar 27 06:37:35 2008 UTC vs.
Revision 1.29 by root, Thu Feb 19 01:13:46 2009 UTC

32 primary goal is to be *correct* and its secondary goal is to be *fast*. 32 primary goal is to be *correct* and its secondary goal is to be *fast*.
33 To reach the latter goal it was written in C. 33 To reach the latter goal it was written in C.
34 34
35 Beginning with version 2.0 of the JSON module, when both JSON and 35 Beginning with version 2.0 of the JSON module, when both JSON and
36 JSON::XS are installed, then JSON will fall back on JSON::XS (this can 36 JSON::XS are installed, then JSON will fall back on JSON::XS (this can
37 be overriden) with no overhead due to emulation (by inheritign 37 be overridden) with no overhead due to emulation (by inheriting
38 constructor and methods). If JSON::XS is not available, it will fall 38 constructor and methods). If JSON::XS is not available, it will fall
39 back to the compatible JSON::PP module as backend, so using JSON instead 39 back to the compatible JSON::PP module as backend, so using JSON instead
40 of JSON::XS gives you a portable JSON API that can be fast when you need 40 of JSON::XS gives you a portable JSON API that can be fast when you need
41 and doesn't require a C compiler when that is a problem. 41 and doesn't require a C compiler when that is a problem.
42 42
44 to write yet another JSON module? While it seems there are many JSON 44 to write yet another JSON module? While it seems there are many JSON
45 modules, none of them correctly handle all corner cases, and in most 45 modules, none of them correctly handle all corner cases, and in most
46 cases their maintainers are unresponsive, gone missing, or not listening 46 cases their maintainers are unresponsive, gone missing, or not listening
47 to bug reports for other reasons. 47 to bug reports for other reasons.
48 48
49 See COMPARISON, below, for a comparison to some other JSON modules.
50
51 See MAPPING, below, on how JSON::XS maps perl values to JSON values and 49 See MAPPING, below, on how JSON::XS maps perl values to JSON values and
52 vice versa. 50 vice versa.
53 51
54 FEATURES 52 FEATURES
55 * correct Unicode handling 53 * correct Unicode handling
57 This module knows how to handle Unicode, documents how and when it 55 This module knows how to handle Unicode, documents how and when it
58 does so, and even documents what "correct" means. 56 does so, and even documents what "correct" means.
59 57
60 * round-trip integrity 58 * round-trip integrity
61 59
62 When you serialise a perl data structure using only datatypes 60 When you serialise a perl data structure using only data types
63 supported by JSON, the deserialised data structure is identical on 61 supported by JSON, the deserialised data structure is identical on
64 the Perl level. (e.g. the string "2.0" doesn't suddenly become "2" 62 the Perl level. (e.g. the string "2.0" doesn't suddenly become "2"
65 just because it looks like a number). There minor *are* exceptions 63 just because it looks like a number). There minor *are* exceptions
66 to this, read the MAPPING section below to learn about those. 64 to this, read the MAPPING section below to learn about those.
67 65
78 too. 76 too.
79 77
80 * simple to use 78 * simple to use
81 79
82 This module has both a simple functional interface as well as an 80 This module has both a simple functional interface as well as an
83 objetc oriented interface interface. 81 object oriented interface interface.
84 82
85 * reasonably versatile output formats 83 * reasonably versatile output formats
86 84
87 You can choose between the most compact guaranteed-single-line 85 You can choose between the most compact guaranteed-single-line
88 format possible (nice for simple line-based protocols), a pure-ascii 86 format possible (nice for simple line-based protocols), a pure-ASCII
89 format (for when your transport is not 8-bit clean, still supports 87 format (for when your transport is not 8-bit clean, still supports
90 the whole Unicode range), or a pretty-printed format (for when you 88 the whole Unicode range), or a pretty-printed format (for when you
91 want to read that stuff). Or you can combine those features in 89 want to read that stuff). Or you can combine those features in
92 whatever way you like. 90 whatever way you like.
93 91
101 99
102 This function call is functionally identical to: 100 This function call is functionally identical to:
103 101
104 $json_text = JSON::XS->new->utf8->encode ($perl_scalar) 102 $json_text = JSON::XS->new->utf8->encode ($perl_scalar)
105 103
106 except being faster. 104 Except being faster.
107 105
108 $perl_scalar = decode_json $json_text 106 $perl_scalar = decode_json $json_text
109 The opposite of "encode_json": expects an UTF-8 (binary) string and 107 The opposite of "encode_json": expects an UTF-8 (binary) string and
110 tries to parse that as an UTF-8 encoded JSON text, returning the 108 tries to parse that as an UTF-8 encoded JSON text, returning the
111 resulting reference. Croaks on error. 109 resulting reference. Croaks on error.
112 110
113 This function call is functionally identical to: 111 This function call is functionally identical to:
114 112
115 $perl_scalar = JSON::XS->new->utf8->decode ($json_text) 113 $perl_scalar = JSON::XS->new->utf8->decode ($json_text)
116 114
117 except being faster. 115 Except being faster.
118 116
119 $is_boolean = JSON::XS::is_bool $scalar 117 $is_boolean = JSON::XS::is_bool $scalar
120 Returns true if the passed scalar represents either JSON::XS::true 118 Returns true if the passed scalar represents either JSON::XS::true
121 or JSON::XS::false, two constants that act like 1 and 0, 119 or JSON::XS::false, two constants that act like 1 and 0,
122 respectively and are used to represent JSON "true" and "false" 120 respectively and are used to represent JSON "true" and "false"
152 150
153 If you didn't know about that flag, just the better, pretend it 151 If you didn't know about that flag, just the better, pretend it
154 doesn't exist. 152 doesn't exist.
155 153
156 4. A "Unicode String" is simply a string where each character can be 154 4. A "Unicode String" is simply a string where each character can be
157 validly interpreted as a Unicode codepoint. 155 validly interpreted as a Unicode code point.
158 If you have UTF-8 encoded data, it is no longer a Unicode string, 156 If you have UTF-8 encoded data, it is no longer a Unicode string,
159 but a Unicode string encoded in UTF-8, giving you a binary string. 157 but a Unicode string encoded in UTF-8, giving you a binary string.
160 158
161 5. A string containing "high" (> 255) character values is *not* a UTF-8 159 5. A string containing "high" (> 255) character values is *not* a UTF-8
162 string. 160 string.
397 Example, encode a Perl scalar as JSON value with enabled 395 Example, encode a Perl scalar as JSON value with enabled
398 "allow_nonref", resulting in an invalid JSON text: 396 "allow_nonref", resulting in an invalid JSON text:
399 397
400 JSON::XS->new->allow_nonref->encode ("Hello, World!") 398 JSON::XS->new->allow_nonref->encode ("Hello, World!")
401 => "Hello, World!" 399 => "Hello, World!"
400
401 $json = $json->allow_unknown ([$enable])
402 $enabled = $json->get_allow_unknown
403 If $enable is true (or missing), then "encode" will *not* throw an
404 exception when it encounters values it cannot represent in JSON (for
405 example, filehandles) but instead will encode a JSON "null" value.
406 Note that blessed objects are not included here and are handled
407 separately by c<allow_nonref>.
408
409 If $enable is false (the default), then "encode" will throw an
410 exception when it encounters anything it cannot encode as JSON.
411
412 This option does not affect "decode" in any way, and it is
413 recommended to leave it off unless you know your communications
414 partner.
402 415
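A minimal sketch of the "allow_unknown" behaviour added in revision 1.29, assuming JSON::XS is installed. The code reference is only an illustrative stand-in for a value that JSON cannot represent; the variable names are hypothetical.

```perl
use strict;
use warnings;
use JSON::XS;

# A code reference has no JSON representation.
my $data = [1, sub { 42 }];

# Default (allow_unknown off): encode throws an exception.
my $strict_ok = eval { JSON::XS->new->encode ($data); 1 };

# With allow_unknown enabled, the value is encoded as JSON null instead.
my $lenient = JSON::XS->new->allow_unknown->encode ($data);

print "strict encode ", ($strict_ok ? "succeeded" : "croaked"), "\n";
print "$lenient\n";
```

Because the unknown value silently becomes null, the receiving side cannot tell it apart from a real null, which is one reason to leave the option off by default.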
403 $json = $json->allow_blessed ([$enable]) 416 $json = $json->allow_blessed ([$enable])
404 $enabled = $json->get_allow_blessed 417 $enabled = $json->get_allow_blessed
405 If $enable is true (or missing), then the "encode" method will not 418 If $enable is true (or missing), then the "encode" method will not
406 barf when it encounters a blessed reference. Instead, the value of 419 barf when it encounters a blessed reference. Instead, the value of
541 saving space. 554 saving space.
542 555
543 $json = $json->max_depth ([$maximum_nesting_depth]) 556 $json = $json->max_depth ([$maximum_nesting_depth])
544 $max_depth = $json->get_max_depth 557 $max_depth = $json->get_max_depth
545 Sets the maximum nesting level (default 512) accepted while encoding 558 Sets the maximum nesting level (default 512) accepted while encoding
546 or decoding. If the JSON text or Perl data structure has an equal or 559 or decoding. If a higher nesting level is detected in JSON text or a
547 higher nesting level then this limit, then the encoder and decoder 560 Perl data structure, then the encoder and decoder will stop and
548 will stop and croak at that point. 561 croak at that point.
549 562
550 Nesting level is defined by number of hash- or arrayrefs that the 563 Nesting level is defined by number of hash- or arrayrefs that the
551 encoder needs to traverse to reach a given point or the number of 564 encoder needs to traverse to reach a given point or the number of
552 "{" or "[" characters without their matching closing parenthesis 565 "{" or "[" characters without their matching closing parenthesis
553 crossed to reach a given character in a string. 566 crossed to reach a given character in a string.
554 567
555 Setting the maximum depth to one disallows any nesting, so that 568 Setting the maximum depth to one disallows any nesting, so that
556 ensures that the object is only a single hash/object or array. 569 ensures that the object is only a single hash/object or array.
557 570
558 The argument to "max_depth" will be rounded up to the next highest
559 power of two. If no argument is given, the highest possible setting 571 If no argument is given, the highest possible setting will be used,
560 will be used, which is rarely useful. 572 which is rarely useful.
573
574 Note that nesting is implemented by recursion in C. The default
575 value has been chosen to be as large as typical operating systems
576 allow without crashing.
561 577
562 See SECURITY CONSIDERATIONS, below, for more info on why this is 578 See SECURITY CONSIDERATIONS, below, for more info on why this is
563 useful. 579 useful.
564 580
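A short sketch of "max_depth" as worded in revision 1.29 (only a *higher* nesting level than the limit is rejected), assuming JSON::XS is installed. The limit of 2 is chosen only for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->max_depth (2);

# Nesting level 2 does not exceed the limit, so this decodes.
my $shallow = eval { $json->decode ('[[1]]') };

# Nesting level 3 exceeds the limit, so decode croaks and eval returns undef.
my $deep  = eval { $json->decode ('[[[1]]]') };
my $error = $@;

print "shallow decoded\n" if ref $shallow eq 'ARRAY';
print "deep rejected: $error" unless defined $deep;
```

The same limit applies to "encode", so a Perl structure nested three levels deep would be rejected by this object as well.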
565 $json = $json->max_size ([$maximum_string_size]) 581 $json = $json->max_size ([$maximum_string_size])
566 $max_size = $json->get_max_size 582 $max_size = $json->get_max_size
567 Set the maximum length a JSON text may have (in bytes) where 583 Set the maximum length a JSON text may have (in bytes) where
568 decoding is being attempted. The default is 0, meaning no limit. 584 decoding is being attempted. The default is 0, meaning no limit.
569 When "decode" is called on a string longer then this number of 585 When "decode" is called on a string that is longer then this many
570 characters it will not attempt to decode the string but throw an 586 bytes, it will not attempt to decode the string but throw an
571 exception. This setting has no effect on "encode" (yet). 587 exception. This setting has no effect on "encode" (yet).
572 588
573 The argument to "max_size" will be rounded up to the next highest
574 power of two (so may be more than requested). If no argument is
575 given, the limit check will be deactivated (same as when 0 is 589 If no argument is given, the limit check will be deactivated (same
576 specified). 590 as when 0 is specified).
577 591
578 See SECURITY CONSIDERATIONS, below, for more info on why this is 592 See SECURITY CONSIDERATIONS, below, for more info on why this is
579 useful. 593 useful.
580 594
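A short sketch of "max_size", assuming JSON::XS is installed. The 16-byte limit and the sample texts are illustrative only.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new->max_size (16);

# 7 bytes: below the limit, decoded normally.
my $small = eval { $json->decode ('[1,2,3]') };

# Well over 16 bytes: decode refuses to even attempt parsing and croaks.
my $big_text = '[' . join (',', 1 .. 20) . ']';
my $large    = eval { $json->decode ($big_text) };
my $error    = $@;

print "small decoded, ", scalar @$small, " elements\n";
print "large rejected: $error" unless defined $large;
```

The check is made on the length of the input before any parsing happens, so an oversized text costs almost nothing to reject.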
581 $json_text = $json->encode ($perl_scalar) 595 $json_text = $json->encode ($perl_scalar)
607 621
608 JSON::XS->new->decode_prefix ("[1] the tail") 622 JSON::XS->new->decode_prefix ("[1] the tail")
609 => ([], 3) 623 => ([], 3)
610 624
611INCREMENTAL PARSING 625INCREMENTAL PARSING
612 [This section and the API it details is still EXPERIMENTAL]
613
614 In some cases, there is the need for incremental parsing of JSON texts. 626 In some cases, there is the need for incremental parsing of JSON texts.
615 While this module always has to keep both JSON text and resulting Perl 627 While this module always has to keep both JSON text and resulting Perl
616 data structure in memory at one time, it does allow you to parse a JSON 628 data structure in memory at one time, it does allow you to parse a JSON
617 stream incrementally. It does so by accumulating text until it has a 629 stream incrementally. It does so by accumulating text until it has a
618 full JSON object, which it then can decode. This process is similar to 630 full JSON object, which it then can decode. This process is similar to
619 using "decode_prefix" to see if a full JSON object is available, but is 631 using "decode_prefix" to see if a full JSON object is available, but is
620 much more efficient (JSON::XS will only attempt to parse the JSON text 632 much more efficient (and can be implemented with a minimum of method
621 once it is sure it has enough text to get a decisive result, using a 633 calls).
622 very simple but truly incremental parser).
623 634
624 The following two methods deal with this. 635 JSON::XS will only attempt to parse the JSON text once it is sure it has
636 enough text to get a decisive result, using a very simple but truly
637 incremental parser. This means that it sometimes won't stop as early as
638 the full parser, for example, it doesn't detect parenthesis mismatches.
639 The only thing it guarantees is that it starts decoding as soon as a
640 syntactically valid JSON text has been seen. This means you need to set
641 resource limits (e.g. "max_size") to ensure the parser will stop parsing
642 in the presence of syntax errors.
643
644 The following methods implement this incremental parser.
625 645
626 [void, scalar or list context] = $json->incr_parse ([$string]) 646 [void, scalar or list context] = $json->incr_parse ([$string])
627 This is the central parsing function. It can both append new text 647 This is the central parsing function. It can both append new text
628 and extract objects from the stream accumulated so far (both of 648 and extract objects from the stream accumulated so far (both of
629 these functions are optional). 649 these functions are optional).
664 after a JSON object or b) parsing multiple JSON objects separated by 684 after a JSON object or b) parsing multiple JSON objects separated by
665 non-JSON text (such as commas). 685 non-JSON text (such as commas).
666 686
667 $json->incr_skip 687 $json->incr_skip
668 This will reset the state of the incremental parser and will remove 688 This will reset the state of the incremental parser and will remove
669 the parsed text from the input buffer. This is useful after 689 the parsed text from the input buffer so far. This is useful after
670 "incr_parse" died, in which case the input buffer and incremental 690 "incr_parse" died, in which case the input buffer and incremental
671 parser state is left unchanged, to skip the text parsed so far and 691 parser state is left unchanged, to skip the text parsed so far and
672 to reset the parse state. 692 to reset the parse state.
693
694 The difference to "incr_reset" is that only text until the parse
695 error occurred is removed.
696
697 $json->incr_reset
698 This completely resets the incremental parser, that is, after this
699 call, it will be as if the parser had never parsed anything.
700
701 This is useful if you want to repeatedly parse JSON objects and want
702 to ignore any trailing data, which means you have to reset the
703 parser after each successful decode.
673 704
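The incremental methods above can be sketched as follows, assuming JSON::XS is installed. The input strings are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# Feed one array in two chunks. In scalar context nothing is returned
# until a complete JSON text has been accumulated.
my $first  = $json->incr_parse ('[1, 2');
my $second = $json->incr_parse (', 3]');

# In list context, every complete object in the buffer is extracted.
my @all = $json->incr_parse ('[1] [2] [3]');

# Decode one object, then discard whatever trails it with incr_reset
# so the next parse starts from a clean state.
my $obj = $json->incr_parse ('{"a":1} trailing junk');
$json->incr_reset;
my $next = $json->incr_parse ('[4]');

print "first chunk alone: ", (defined $first ? "object" : "nothing yet"), "\n";
print "after second chunk: @$second\n";
print "list context found ", scalar @all, " objects\n";
print "after reset: @$next\n";
```

"incr_skip" is used the same way as "incr_reset" here, but after "incr_parse" has died, and it only drops the text up to the parse error instead of the whole buffer.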
674 LIMITATIONS 705 LIMITATIONS
675 All options that affect decoding are supported, except "allow_nonref". 706 All options that affect decoding are supported, except "allow_nonref".
676 The reason for this is that it cannot be made to work sensibly: JSON 707 The reason for this is that it cannot be made to work sensibly: JSON
677 objects and arrays are self-delimited, i.e. you can concatenate them 708 objects and arrays are self-delimited, i.e. you can concatenate them
897 an exception to be thrown, except for references to the integers 0 928 an exception to be thrown, except for references to the integers 0
898 and 1, which get turned into "false" and "true" atoms in JSON. You 929 and 1, which get turned into "false" and "true" atoms in JSON. You
899 can also use "JSON::XS::false" and "JSON::XS::true" to improve 930 can also use "JSON::XS::false" and "JSON::XS::true" to improve
900 readability. 931 readability.
901 932
902 encode_json [\0,JSON::XS::true] # yields [false,true] 933 encode_json [\0, JSON::XS::true] # yields [false,true]
903 934
904 JSON::XS::true, JSON::XS::false 935 JSON::XS::true, JSON::XS::false
905 These special values become JSON true and JSON false values, 936 These special values become JSON true and JSON false values,
906 respectively. You can also use "\1" and "\0" directly if you want. 937 respectively. You can also use "\1" and "\0" directly if you want.
907 938
1040 any character set and 8-bit-encoding, and still get the same data 1071 any character set and 8-bit-encoding, and still get the same data
1041 structure back. This is useful when your channel for JSON transfer 1072 structure back. This is useful when your channel for JSON transfer
1042 is not 8-bit clean or the encoding might be mangled in between (e.g. 1073 is not 8-bit clean or the encoding might be mangled in between (e.g.
1043 in mail), and works because ASCII is a proper subset of most 8-bit 1074 in mail), and works because ASCII is a proper subset of most 8-bit
1044 and multibyte encodings in use in the world. 1075 and multibyte encodings in use in the world.
1076
1077 JSON and ECMAscript
1078 JSON syntax is based on how literals are represented in javascript (the
1079 not-standardised predecessor of ECMAscript) which is presumably why it
1080 is called "JavaScript Object Notation".
1081
1082 However, JSON is not a subset (and also not a superset of course) of
1083 ECMAscript (the standard) or javascript (whatever browsers actually
1084 implement).
1085
1086 If you want to use javascript's "eval" function to "parse" JSON, you
1087 might run into parse errors for valid JSON texts, or the resulting data
1088 structure might not be queryable:
1089
1090 One of the problems is that U+2028 and U+2029 are valid characters
1091 inside JSON strings, but are not allowed in ECMAscript string literals,
1092 so the following Perl fragment will not output something that can be
1093 guaranteed to be parsable by javascript's "eval":
1094
1095 use JSON::XS;
1096
1097 print encode_json [chr 0x2028];
1098
1099 The right fix for this is to use a proper JSON parser in your javascript
1100 programs, and not rely on "eval" (see for example Douglas Crockford's
1101 json2.js parser).
1102
1103 If this is not an option, you can, as a stop-gap measure, simply encode
1104 to ASCII-only JSON:
1105
1106 use JSON::XS;
1107
1108 print JSON::XS->new->ascii->encode ([chr 0x2028]);
1109
1110 Note that this will enlarge the resulting JSON text quite a bit if you
1111 have many non-ASCII characters. You might be tempted to run some regexes
1112 to only escape U+2028 and U+2029, e.g.:
1113
1114 # DO NOT USE THIS!
1115 my $json = JSON::XS->new->utf8->encode ([chr 0x2028]);
1116 $json =~ s/\xe2\x80\xa8/\\u2028/g; # escape U+2028
1117 $json =~ s/\xe2\x80\xa9/\\u2029/g; # escape U+2029
1118 print $json;
1119
1120 Note that *this is a bad idea*: the above only works for U+2028 and
1121 U+2029 and thus only for fully ECMAscript-compliant parsers. Many
1122 existing javascript implementations, however, have issues with other
1123 characters as well - using "eval" naively simply *will* cause problems.
1124
1125 Another problem is that some javascript implementations reserve some
1126 property names for their own purposes (which probably makes them
1127 non-ECMAscript-compliant). For example, Iceweasel reserves the
1128 "__proto__" property name for its own purposes.
1129
1130 If that is a problem, you could try to filter the resulting JSON
1131 output for these property strings, e.g.:
1132
1133 $json =~ s/"__proto__"\s*:/"__proto__renamed":/g;
1134
1135 This works because "__proto__" is not valid outside of strings, so every
1136 occurrence of ""__proto__"\s*:" must be a string used as property name.
1137
1138 If you know of other incompatibilities, please let me know.
1045 1139
1046 JSON and YAML 1140 JSON and YAML
1047 You often hear that JSON is a subset of YAML. This is, however, a mass 1141 You often hear that JSON is a subset of YAML. This is, however, a mass
1048 hysteria(*) and very far from the truth (as of the time of this 1142 hysteria(*) and very far from the truth (as of the time of this
1049 writing), so let me state it clearly: *in general, there is no way to 1143 writing), so let me state it clearly: *in general, there is no way to
1097 1191
1098 First comes a comparison between various modules using a very short 1192 First comes a comparison between various modules using a very short
1099 single-line JSON string (also available at 1193 single-line JSON string (also available at
1100 <http://dist.schmorp.de/misc/json/short.json>). 1194 <http://dist.schmorp.de/misc/json/short.json>).
1101 1195
1102 {"method": "handleMessage", "params": ["user1", "we were just talking"], \ 1196 {"method": "handleMessage", "params": ["user1",
1103 "id": null, "array":[1,11,234,-5,1e5,1e7, true, false]} 1197 "we were just talking"], "id": null, "array":[1,11,234,-5,1e5,1e7,
1198 true, false]}
1104 1199
1105 It shows the number of encodes/decodes per second (JSON::XS uses the 1200 It shows the number of encodes/decodes per second (JSON::XS uses the
1106 functional interface, while JSON::XS/2 uses the OO interface with 1201 functional interface, while JSON::XS/2 uses the OO interface with
1107 pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink). 1202 pretty-printing and hashkey sorting enabled, JSON::XS/3 enables shrink).
1108 Higher is better: 1203 Higher is better:
1201 1296
1202 (It might actually work, but you have been warned). 1297 (It might actually work, but you have been warned).
1203 1298
1204BUGS 1299BUGS
1205 While the goal of this module is to be correct, that unfortunately does 1300 While the goal of this module is to be correct, that unfortunately does
1206 not mean it's bug-free, only that I think its design is bug-free. It is 1301 not mean it's bug-free, only that I think its design is bug-free. If you
1207 still relatively early in its development. If you keep reporting bugs
1208 they will be fixed swiftly, though. 1302 keep reporting bugs they will be fixed swiftly, though.
1209 1303
1210 Please refrain from using rt.cpan.org or any other bug reporting 1304 Please refrain from using rt.cpan.org or any other bug reporting
1211 service. I put the contact address into my modules for a reason. 1305 service. I put the contact address into my modules for a reason.
1212 1306
1213SEE ALSO 1307SEE ALSO