/cvs/JSON-XS/XS.pm

Comparing JSON-XS/XS.pm (file contents):
Revision 1.90 by root, Wed Mar 19 22:28:43 2008 UTC vs.
Revision 1.95 by root, Tue Mar 25 16:56:09 2008 UTC

1=head1 NAME
2
1=encoding utf-8 3=encoding utf-8
2
3=head1 NAME
4 4
5JSON::XS - JSON serialising/deserialising, done correctly and fast 5JSON::XS - JSON serialising/deserialising, done correctly and fast
6 6
7JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ 7JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ
8 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html) 8 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
681 => ([], 3) 681 => ([], 3)
682 682
683=back 683=back
684 684
685 685
686=head1 INCREMENTAL PARSING
687
688[This section is still EXPERIMENTAL]
689
690In some cases, there is the need for incremental parsing of JSON
691texts. While this module always has to keep both JSON text and resulting
692Perl data structure in memory at one time, it does allow you to parse a
693JSON stream incrementally. It does so by accumulating text until it has
694a full JSON object, which it then can decode. This process is similar to
695using C<decode_prefix> to see if a full JSON object is available, but is
696much more efficient (JSON::XS will only attempt to parse the JSON text
697once it is sure it has enough text to get a decisive result, using a very
698simple but truly incremental parser).
699
700The following two methods deal with this.
701
702=over 4
703
704=item [void, scalar or list context] = $json->incr_parse ([$string])
705
706This is the central parsing function. It can both append new text and
707extract objects from the stream accumulated so far (both of these
708functions are optional).
709
710If C<$string> is given, then this string is appended to the already
711existing JSON fragment stored in the C<$json> object.
712
713After that, if the function is called in void context, it will simply
714return without doing anything further. This can be used to add more text
715in as many chunks as you want.
716
717If the method is called in scalar context, then it will try to extract
718exactly I<one> JSON object. If that is successful, it will return this
719object, otherwise it will return C<undef>. This is the most common way of
720using the method.
721
722And finally, in list context, it will try to extract as many objects
723from the stream as it can find and return them, or the empty list
724otherwise. For this to work, there must be no separators between the JSON
725objects or arrays, instead they must be concatenated back-to-back.
726
727=item $lvalue_string = $json->incr_text
728
729This method returns the currently stored JSON fragment as an lvalue, that
730is, you can manipulate it. This I<only> works when a preceding call to
731C<incr_parse> in I<scalar context> successfully returned an object. Under
732all other circumstances you must not call this function (I mean it:
733although in simple tests it might actually work, it I<will> fail under
734real world conditions). As a special exception, you can also call this
735method before having parsed anything.
736
737This function is useful in two cases: a) finding the trailing text after a
738JSON object or b) parsing multiple JSON objects separated by non-JSON text
739(such as commas).
740
741=back
742
743=head2 LIMITATIONS
744
745All options that affect decoding are supported, except
746C<allow_nonref>. The reason for this is that it cannot be made to
747work sensibly: JSON objects and arrays are self-delimited, i.e. you can concatenate
748them back to back and still decode them perfectly. This does not hold true
749for JSON numbers, however.
750
751For example, is the string C<1> a single JSON number, or is it simply the
752start of C<12>? Or is C<12> a single JSON number, or the concatenation
753of C<1> and C<2>? In neither case can you tell, and this is why JSON::XS
754takes the conservative route and disallows this case.
755
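The self-delimiting property described in this section can be illustrated with a minimal sketch. It assumes an installed JSON::XS that provides C<incr_parse> as documented here; the input strings and variable names are made up for illustration.

```perl
use strict;
use warnings;
use JSON::XS;

# Arrays and objects are self-delimiting: three texts concatenated
# back-to-back come out as three values in list context.
my @items = JSON::XS->new->incr_parse ('[1][2]{"a":3}');
print scalar @items, " texts decoded\n";

# An incomplete array yields undef in scalar context until the
# closing bracket has been appended.
my $json  = JSON::XS->new;
my $first = $json->incr_parse ('[1,2');   # not complete yet
my $array = $json->incr_parse (',3]');    # now a full array
print "first call returned ", (defined $first ? "a value" : "undef"), "\n";
print "second call returned ", scalar @$array, " elements\n";
```

No comparable boundary exists for bare numbers, which is why the text above rules out C<allow_nonref> for incremental parsing.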
756=head2 EXAMPLES
757
758Some examples will make all this clearer. First, a simple example that
759works similarly to C<decode_prefix>: We want to decode the JSON object at
760the start of a string and identify the portion after the JSON object:
761
762 my $text = "[1,2,3] hello";
763
764 my $json = new JSON::XS;
765
766 my $obj = $json->incr_parse ($text)
767 or die "expected JSON object or array at beginning of string";
768
769 my $tail = $json->incr_text;
770 # $tail now contains " hello"
771
772Easy, isn't it?
773
774Now for a more complicated example: Imagine a hypothetical protocol where
775you read some requests from a TCP stream, and each request is a JSON
776array, without any separation between them (in fact, it is often useful to
777use newlines as "separators", as these get interpreted as whitespace at
778the start of the JSON text, which makes it possible to test said protocol
779with C<telnet>...).
780
781Here is how you'd do it (it is trivial to write this in an event-based
782manner):
783
784 my $json = new JSON::XS;
785
786 # read some data from the socket
787 while (sysread $socket, my $buf, 4096) {
788
789 # split and decode as many requests as possible
790 for my $request ($json->incr_parse ($buf)) {
791 # act on the $request
792 }
793 }
794
795Another complicated example: Assume you have a string with JSON objects
796or arrays, all separated by (optional) comma characters (e.g. C<[1],[2],
797[3]>). To parse them, we have to skip the commas between the JSON texts,
798and here is where the lvalue-ness of C<incr_text> comes in useful:
799
800 my $text = "[1],[2], [3]";
801 my $json = new JSON::XS;
802
803 # void context, so no parsing done
804 $json->incr_parse ($text);
805
806 # now extract as many objects as possible. note the
807 # use of scalar context so incr_text can be called.
808 while (my $obj = $json->incr_parse) {
809 # do something with $obj
810
811 # now skip the optional comma
812 $json->incr_text =~ s/^ \s* , //x;
813 }
814
815Now let's go for a very complex example: Assume that you have a gigantic
816JSON array-of-objects, many gigabytes in size, and you want to parse it,
817but you cannot load it into memory fully (this has actually happened in
818the real world :).
819
820Well, you lost, you have to implement your own JSON parser. But JSON::XS
821can still help you: You implement a (very simple) array parser and let
822JSON decode the array elements, which are all full JSON objects on their
823own (this wouldn't work if the array elements could be JSON numbers, for
824example):
825
826 my $json = new JSON::XS;
827
828 # open the monster
829 open my $fh, "<", "bigfile.json"
830 or die "bigfile: $!";
831
832 # first parse the initial "["
833 for (;;) {
834 sysread $fh, my $buf, 65536
835 or die "read error: $!";
836 $json->incr_parse ($buf); # void context, so no parsing
837
838 # Exit the loop once we found and removed(!) the initial "[".
839 # In essence, we are (ab-)using the $json object as a simple scalar
840 # we append data to.
841 last if $json->incr_text =~ s/^ \s* \[ //x;
842 }
843
844 # now we have skipped the initial "[", so continue
845 # parsing all the elements.
846 for (;;) {
847 # in this loop we read data until we got a single JSON object
848 for (;;) {
849 if (my $obj = $json->incr_parse) {
850 # do something with $obj
851 last;
852 }
853
854 # add more data
855 sysread $fh, my $buf, 65536
856 or die "read error: $!";
857 $json->incr_parse ($buf); # void context, so no parsing
858 }
859
860 # in this loop we read data until we either found and parsed the
861 # separating "," between elements, or the final "]"
862 for (;;) {
863 # first skip whitespace
864 $json->incr_text =~ s/^\s*//;
865
866 # if we find "]", we are done
867 if ($json->incr_text =~ s/^\]//) {
868 print "finished.\n";
869 exit;
870 }
871
872 # if we find ",", we can continue with the next element
873 if ($json->incr_text =~ s/^,//) {
874 last;
875 }
876
877 # if we find anything else, we have a parse error!
878 if (length $json->incr_text) {
879 die "parse error near ", $json->incr_text;
880 }
881
882 # else add more data
883 sysread $fh, my $buf, 65536
884 or die "read error: $!";
885 $json->incr_parse ($buf); # void context, so no parsing
886 } # end of the separator loop
 } # end of the outer per-element loop
887
888This is a complex example, but most of the complexity comes from the fact
889that we are trying to be correct (bear with me if I am wrong, I never ran
890the above example :).
891
892
893
686=head1 MAPPING 894=head1 MAPPING
687 895
688This section describes how JSON::XS maps Perl values to JSON values and 896This section describes how JSON::XS maps Perl values to JSON values and
689vice versa. These mappings are designed to "do the right thing" in most 897vice versa. These mappings are designed to "do the right thing" in most
690circumstances automatically, preserving round-tripping characteristics 898circumstances automatically, preserving round-tripping characteristics
825 my $x = "3"; # some variable containing a string 1033 my $x = "3"; # some variable containing a string
826 $x += 0; # numify it, ensuring it will be dumped as a number 1034 $x += 0; # numify it, ensuring it will be dumped as a number
827 $x *= 1; # same thing, the choice is yours. 1035 $x *= 1; # same thing, the choice is yours.
828 1036
829You can not currently force the type in other, less obscure, ways. Tell me 1037You can not currently force the type in other, less obscure, ways. Tell me
830if you need this capability (but don't forget to explain why its needed 1038if you need this capability (but don't forget to explain why it's needed
831:). 1039:).
832 1040
833=back 1041=back
834 1042
835 1043
837 1045
838The interested reader might have seen a number of flags that signify 1046The interested reader might have seen a number of flags that signify
839encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be 1047encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be
840some confusion on what these do, so here is a short comparison: 1048some confusion on what these do, so here is a short comparison:
841 1049
842C<utf8> controls wether the JSON text created by C<encode> (and expected 1050C<utf8> controls whether the JSON text created by C<encode> (and expected
843by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only 1051by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only
844control wether C<encode> escapes character values outside their respective 1052control whether C<encode> escapes character values outside their respective
845codeset range. Neither of these flags conflict with each other, although 1053codeset range. Neither of these flags conflict with each other, although
846some combinations make less sense than others. 1054some combinations make less sense than others.
847 1055
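As a rough illustration of the difference between these flags, the following sketch encodes the same Perl string three ways. It assumes an installed JSON::XS; the sample string and variable names are chosen only for this example.

```perl
use strict;
use warnings;
use JSON::XS;

# One string with e-acute (U+00E9, inside Latin-1) and the euro
# sign (U+20AC, outside Latin-1).
my $data = ["\x{e9}\x{20ac}"];

# ascii: every character above 127 is written as a \u escape.
my $ascii  = JSON::XS->new->ascii->encode ($data);

# latin1: only characters above 255 are escaped.
my $latin1 = JSON::XS->new->latin1->encode ($data);

# utf8: nothing is escaped, the result is a UTF-8 octet string.
my $utf8   = JSON::XS->new->utf8->encode ($data);

print "ascii:  $ascii\n";
print "latin1 length: ", length $latin1, "\n";
print "utf8 length:   ", length $utf8, "\n";

# Decoding with the same flag gives the original string back.
my $back = JSON::XS->new->utf8->decode ($utf8);
print "round trip ok\n" if $back->[0] eq $data->[0];
```

Only C<utf8> changes the encoding of the resulting text; C<ascii> and C<latin1> merely decide which characters get escaped.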
848Care has been taken to make all flags symmetrical with respect to 1056Care has been taken to make all flags symmetrical with respect to
849C<encode> and C<decode>, that is, texts encoded with any combination of 1057C<encode> and C<decode>, that is, texts encoded with any combination of
925as UTF-8, ISO-8859-1, ASCII, KOI8-R or most about any character set and 1133as UTF-8, ISO-8859-1, ASCII, KOI8-R or most about any character set and
9268-bit-encoding, and still get the same data structure back. This is useful 11348-bit-encoding, and still get the same data structure back. This is useful
927when your channel for JSON transfer is not 8-bit clean or the encoding 1135when your channel for JSON transfer is not 8-bit clean or the encoding
928might be mangled in between (e.g. in mail), and works because ASCII is a 1136might be mangled in between (e.g. in mail), and works because ASCII is a
929proper subset of most 8-bit and multibyte encodings in use in the world. 1137proper subset of most 8-bit and multibyte encodings in use in the world.
930
931=back
932
933
934=head1 COMPARISON
935
936As already mentioned, this module was created because none of the existing
937JSON modules could be made to work correctly. First I will describe the
938problems (or pleasures) I encountered with various existing JSON modules,
939followed by some benchmark values. JSON::XS was designed not to suffer
940from any of these problems or limitations.
941
942=over 4
943
944=item JSON 2.xx
945
946A marvellous piece of engineering, this module either uses JSON::XS
947directly when available (so will be 100% compatible with it, including
948speed), or it uses JSON::PP, which is basically JSON::XS translated to
949Pure Perl, which should be 100% compatible with JSON::XS, just a bit
950slower.
951
952You cannot really lose by using this module, especially as it tries very
953hard to work even with ancient Perl versions, while JSON::XS does not.
954
955=item JSON 1.07
956
957Slow (but very portable, as it is written in pure Perl).
958
959Undocumented/buggy Unicode handling (how JSON handles Unicode values is
960undocumented. One can get far by feeding it Unicode strings and doing
961en-/decoding oneself, but Unicode escapes are not working properly).
962
963No round-tripping (strings get clobbered if they look like numbers, e.g.
964the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
965decode into the number 2).
966
967=item JSON::PC 0.01
968
969Very fast.
970
971Undocumented/buggy Unicode handling.
972
973No round-tripping.
974
975Has problems handling many Perl values (e.g. regex results and other magic
976values will make it croak).
977
978Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
979which is not a valid JSON text).
980
981Unmaintained (maintainer unresponsive for many months, bugs are not
982getting fixed).
983
984=item JSON::Syck 0.21
985
986Very buggy (often crashes).
987
988Very inflexible (no human-readable format supported, format pretty much
989undocumented. I need at least a format for easy reading by humans and a
990single-line compact format for use in a protocol, and preferably a way to
991generate ASCII-only JSON texts).
992
993Completely broken (and confusingly documented) Unicode handling (Unicode
994escapes are not working properly, you need to set ImplicitUnicode to
995I<different> values on en- and decoding to get symmetric behaviour).
996
997No round-tripping (simple cases work, but this depends on whether the scalar
998value was used in a numeric context or not).
999
1000Dumping hashes may skip hash values depending on iterator state.
1001
1002Unmaintained (maintainer unresponsive for many months, bugs are not
1003getting fixed).
1004
1005Does not check input for validity (i.e. will accept non-JSON input and
1006return "something" instead of raising an exception. This is a security
1007issue: imagine two banks transferring money between each other using
1008JSON. One bank might parse a given non-JSON request and deduct money,
1009while the other might reject the transaction with a syntax error. While a
1010good protocol will at least recover, that is extra unnecessary work and
1011the transaction will still not succeed).
1012
1013=item JSON::DWIW 0.04
1014
1015Very fast. Very natural. Very nice.
1016
1017Undocumented Unicode handling (but the best of the pack. Unicode escapes
1018still don't get parsed properly).
1019
1020Very inflexible.
1021
1022No round-tripping.
1023
1024Does not generate valid JSON texts (key strings are often unquoted, empty keys
1025result in nothing being output)
1026
1027Does not check input for validity.
1028 1138
1029=back 1139=back
1030 1140
1031 1141
1032=head2 JSON and YAML 1142=head2 JSON and YAML
1193=head1 THREADS 1303=head1 THREADS
1194 1304
1195This module is I<not> guaranteed to be thread safe and there are no 1305This module is I<not> guaranteed to be thread safe and there are no
1196plans to change this until Perl gets thread support (as opposed to the 1306plans to change this until Perl gets thread support (as opposed to the
1197horribly slow so-called "threads" which are simply slow and bloated 1307horribly slow so-called "threads" which are simply slow and bloated
1198process simulations - use fork, its I<much> faster, cheaper, better). 1308process simulations - use fork, it's I<much> faster, cheaper, better).
1199 1309
1200(It might actually work, but you have been warned). 1310(It might actually work, but you have been warned).
1201 1311
1202 1312
1203=head1 BUGS 1313=head1 BUGS
1204 1314
1205While the goal of this module is to be correct, that unfortunately does 1315While the goal of this module is to be correct, that unfortunately does
1206not mean its bug-free, only that I think its design is bug-free. It is 1316not mean it's bug-free, only that I think its design is bug-free. It is
1207still relatively early in its development. If you keep reporting bugs they 1317still relatively early in its development. If you keep reporting bugs they
1208will be fixed swiftly, though. 1318will be fixed swiftly, though.
1209 1319
1210Please refrain from using rt.cpan.org or any other bug reporting 1320Please refrain from using rt.cpan.org or any other bug reporting
1211service. I put the contact address into my modules for a reason. 1321service. I put the contact address into my modules for a reason.
1233 "--" => sub { $_[0] = ${$_[0]} - 1 }, 1343 "--" => sub { $_[0] = ${$_[0]} - 1 },
1234 fallback => 1; 1344 fallback => 1;
1235 1345
12361; 13461;
1237 1347
1348=head1 SEE ALSO
1349
1350The F<json_xs> command line utility for quick experiments.
1351
1238=head1 AUTHOR 1352=head1 AUTHOR
1239 1353
1240 Marc Lehmann <schmorp@schmorp.de> 1354 Marc Lehmann <schmorp@schmorp.de>
1241 http://home.schmorp.de/ 1355 http://home.schmorp.de/
1242 1356
