/cvs/JSON-XS/XS.pm

Comparing JSON-XS/XS.pm (file contents):
Revision 1.90 by root, Wed Mar 19 22:28:43 2008 UTC vs.
Revision 1.94 by root, Tue Mar 25 07:46:15 2008 UTC

1=head1 NAME
2
1=encoding utf-8 3=encoding utf-8
2
3=head1 NAME
4 4
5JSON::XS - JSON serialising/deserialising, done correctly and fast 5JSON::XS - JSON serialising/deserialising, done correctly and fast
6 6
7JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ 7JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ
8 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html) 8 (http://fleur.hio.jp/perldoc/mix/lib/JSON/XS.html)
681 => ([], 3) 681 => ([], 3)
682 682
683=back 683=back
684 684
685 685
686=head1 INCREMENTAL PARSING
687
688[This section is still EXPERIMENTAL]
689
690In some cases, there is the need for incremental parsing of JSON
691texts. While this module always has to keep both JSON text and resulting
692Perl data structure in memory at one time, it does allow you to parse a
693JSON stream incrementally. It does so by accumulating text until it has
694a full JSON object, which it then can decode. This process is similar to
695using C<decode_prefix> to see if a full JSON object is available, but is
696much more efficient (JSON::XS will only attempt to parse the JSON text
697once it is sure it has enough text to get a decisive result, using a very
698simple but truly incremental parser).
699
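For comparison, here is a rough sketch of the C<decode_prefix> approach mentioned above. The chunk strings and the C<$buffer> and C<@values> variables are made up for illustration. Every failed attempt re-scans the whole buffer from the start, which is the inefficiency the incremental parser avoids.

```perl
use strict;
use warnings;
use JSON::XS;

my $json   = JSON::XS->new;
my $buffer = '';
my @values;

# hypothetical chunks, as they might arrive from a socket
for my $chunk ('[1,', '2,3] tr', 'ailing') {
   $buffer .= $chunk;

   # decode_prefix croaks on incomplete or malformed text, hence the
   # eval; a failed attempt leaves the buffer untouched
   my ($value, $used) = eval { $json->decode_prefix ($buffer) }
      or next;

   push @values, $value;
   substr $buffer, 0, $used, '';   # drop the consumed characters
}

printf "decoded %d value(s), leftover text: '%s'\n", scalar @values, $buffer;
```
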
700The following two methods deal with this.
701
702=over 4
703
704=item [void, scalar or list context] = $json->incr_parse ([$string])
705
706This is the central parsing function. It can both append new text and
707extract objects from the stream accumulated so far (both of these
708functions are optional).
709
710If C<$string> is given, then this string is appended to the already
711existing JSON fragment stored in the C<$json> object.
712
713After that, if the function is called in void context, it will simply
714return without doing anything further. This can be used to add more text
715in as many chunks as you want.
716
717If the method is called in scalar context, then it will try to extract
718exactly I<one> JSON object. If that is successful, it will return this
719object, otherwise it will return C<undef>. This is the most common way of
720using the method.
721
722And finally, in list context, it will try to extract as many objects
723from the stream as it can find and return them, or the empty list
724otherwise. For this to work, there must be no separators between the JSON
725objects or arrays, instead they must be concatenated back-to-back.
726
727=item $lvalue_string = $json->incr_text
728
729This method returns the currently stored JSON fragment as an lvalue, that
730is, you can manipulate it. This I<only> works when a preceding call to
731C<incr_parse> in I<scalar context> successfully returned an object. Under
732all other circumstances you must not call this function (I mean it:
733although in simple tests it might actually work, it I<will> fail under
734real-world conditions). As a special exception, you can also call this
735method before having parsed anything.
736
737This function is useful in two cases: a) finding the trailing text after a
738JSON object or b) parsing multiple JSON objects separated by non-JSON text
739(such as commas).
740
741=back
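
The three calling contexts of C<incr_parse> described above can be sketched as follows. The input strings are arbitrary sample data, and the comments show what each call is expected to yield given the behaviour documented in this section.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# void context: only appends text to the stored fragment, no parsing
$json->incr_parse ('[1,2');
$json->incr_parse (',3]{"a"');

# scalar context: extracts at most one complete object or array
my $first  = $json->incr_parse;   # [1,2,3]
my $second = $json->incr_parse;   # undef, since {"a" is still incomplete

# list context: extracts everything that is complete by now
my @rest = $json->incr_parse (':1}[4]');   # ({ a => 1 }, [4])

printf "first has %d elements, %d more values followed\n",
   scalar @$first, scalar @rest;
```

Scalar context is the only mode after which C<incr_text> may be called, so it is the one to use when separators have to be skipped by hand.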
742
743=head2 LIMITATIONS
744
745All options that affect decoding are supported, except
746C<allow_nonref>. The reason for this is that it cannot be made to
747work sensibly: JSON objects and arrays are self-delimited, i.e. you can concatenate
748them back to back and still decode them perfectly. This does not hold true
749for JSON numbers, however.
750
751For example, is the string C<1> a single JSON number, or is it simply the
752start of C<12>? Or is C<12> a single JSON number, or the concatenation
753of C<1> and C<2>? In neither case can you tell, and this is why JSON::XS
754takes the conservative route and disallows this case.
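
A small sketch of why delimiters matter, using made-up input: when a chunk boundary falls inside a number, the still-open array tells the parser to wait, whereas a bare number would give it nothing to go on.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# The chunk boundary falls in the middle of the number 12. Because the
# enclosing array is not closed yet, the parser knows it has to wait.
my $early = $json->incr_parse ('[1');   # undef, array still open
my $array = $json->incr_parse ('2]');   # [12], not [1] followed by 2

# Arrays and objects are self-delimited, so they can simply be
# concatenated back to back.
my @values = JSON::XS->new->incr_parse ('[1][2]{"n":3}');

print "early result: ", (defined $early ? "defined" : "undef"), "\n";
print "array holds:  $array->[0]\n";
print "values found: ", scalar @values, "\n";
```
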
755
756=head2 EXAMPLES
757
758Some examples will make all this clearer. First, a simple example that
759works similarly to C<decode_prefix>: We want to decode the JSON object at
760the start of a string and identify the portion after the JSON object:
761
762 my $text = "[1,2,3] hello";
763
764 my $json = new JSON::XS;
765
766 my $obj = $json->incr_parse ($text)
767    or die "expected JSON object or array at beginning of string";
768
769 my $tail = $json->incr_text;
770 # $tail now contains " hello"
771
772Easy, isn't it?
773
774Now for a more complicated example: Imagine a hypothetical protocol where
775you read some requests from a TCP stream, and each request is a JSON
776array, without any separation between them (in fact, it is often useful to
777use newlines as "separators", as these get interpreted as whitespace at
778the start of the JSON text, which makes it possible to test said protocol
779with C<telnet>...).
780
781Here is how you'd do it (it is trivial to write this in an event-based
782manner):
783
784 my $json = new JSON::XS;
785
786 # read some data from the socket
787 while (sysread $socket, my $buf, 4096) {
788
789    # split and decode as many requests as possible
790    for my $request ($json->incr_parse ($buf)) {
791       # act on the $request
792    }
793 }
794
795Another complicated example: Assume you have a string with JSON objects
796or arrays, all separated by (optional) comma characters (e.g. C<[1],[2],
797[3]>). To parse them, we have to skip the commas between the JSON texts,
798and here is where the lvalue-ness of C<incr_text> comes in useful:
799
800 my $text = "[1],[2], [3]";
801 my $json = new JSON::XS;
802
803 # void context, so no parsing done
804 $json->incr_parse ($text);
805
806 # now extract as many objects as possible. note the
807 # use of scalar context so incr_text can be called.
808 while (my $obj = $json->incr_parse) {
809    # do something with $obj
810
811    # now skip the optional comma
812    $json->incr_text =~ s/^ \s* , //x;
813 }
814
815Now let's go for a very complex example: Assume that you have a gigantic
816JSON array-of-objects, many gigabytes in size, and you want to parse it,
817but you cannot load it into memory fully (this has actually happened in
818the real world :).
819
820Well, you lost, you have to implement your own JSON parser. But JSON::XS
821can still help you: You implement a (very simple) array parser and let
822JSON decode the array elements, which are all full JSON objects on their
823own (this wouldn't work if the array elements could be JSON numbers, for
824example):
825
826 my $json = new JSON::XS;
827
828 # open the monster
829 open my $fh, "<", "bigfile.json"
830    or die "bigfile: $!";
831
832 # first parse the initial "["
833 for (;;) {
834    sysread $fh, my $buf, 65536
835       or die "read error: $!";
836    $json->incr_parse ($buf); # void context, so no parsing
837
838    # Exit the loop once we found and removed(!) the initial "[".
839    # In essence, we are (ab-)using the $json object as a simple scalar
840    # we append data to.
841    last if $json->incr_text =~ s/^ \s* \[ //x;
842 }
843
844 # now we have skipped the initial "[", so continue
845 # parsing all the elements.
846 for (;;) {
847    # in this loop we read data until we got a single JSON object
848    for (;;) {
849       if (my $obj = $json->incr_parse) {
850          # do something with $obj
851          last;
852       }
853
854       # add more data
855       sysread $fh, my $buf, 65536
856          or die "read error: $!";
857       $json->incr_parse ($buf); # void context, so no parsing
858    }
859
860    # in this loop we read data until we either found and parsed the
861    # separating "," between elements, or the final "]"
862    for (;;) {
863       # first skip whitespace
864       $json->incr_text =~ s/^\s*//;
865
866       # if we find "]", we are done
867       if ($json->incr_text =~ s/^\]//) {
868          print "finished.\n";
869          exit;
870       }
871
872       # if we find ",", we can continue with the next element
873       if ($json->incr_text =~ s/^,//) {
874          last;
875       }
876
877       # if we find anything else, we have a parse error!
878       if (length $json->incr_text) {
879          die "parse error near ", $json->incr_text;
880       }
881
882       # else add more data
883       sysread $fh, my $buf, 65536
884          or die "read error: $!";
885       $json->incr_parse ($buf); # void context, so no parsing
886    }
 } # end of the outer loop over array elements
887
888This is a complex example, but most of the complexity comes from the fact
889that we are trying to be correct (bear with me if I am wrong, I never ran
890the above example :).
891
892
893
686=head1 MAPPING 894=head1 MAPPING
687 895
688This section describes how JSON::XS maps Perl values to JSON values and 896This section describes how JSON::XS maps Perl values to JSON values and
689vice versa. These mappings are designed to "do the right thing" in most 897vice versa. These mappings are designed to "do the right thing" in most
690circumstances automatically, preserving round-tripping characteristics 898circumstances automatically, preserving round-tripping characteristics
825 my $x = "3"; # some variable containing a string 1033 my $x = "3"; # some variable containing a string
826 $x += 0; # numify it, ensuring it will be dumped as a number 1034 $x += 0; # numify it, ensuring it will be dumped as a number
827 $x *= 1; # same thing, the choice is yours. 1035 $x *= 1; # same thing, the choice is yours.
828 1036
829You can not currently force the type in other, less obscure, ways. Tell me 1037You can not currently force the type in other, less obscure, ways. Tell me
830if you need this capability (but don't forget to explain why its needed 1038if you need this capability (but don't forget to explain why it's needed
831:). 1039:).
832 1040
833=back 1041=back
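
The effect of the numification trick shown above can be checked with a short sketch like this one. The variable names are chosen for illustration only.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

my $x = "3";                            # a string that looks like a number
my $as_string = $json->encode ([$x]);   # ["3"]

$x += 0;                                # numify it
my $as_number = $json->encode ([$x]);   # [3]

print "before: $as_string, after: $as_number\n";
```
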
834 1042
835 1043
837 1045
838The interested reader might have seen a number of flags that signify 1046The interested reader might have seen a number of flags that signify
839encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be 1047encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be
840some confusion on what these do, so here is a short comparison: 1048some confusion on what these do, so here is a short comparison:
841 1049
842C<utf8> controls wether the JSON text created by C<encode> (and expected 1050C<utf8> controls whether the JSON text created by C<encode> (and expected
843by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only 1051by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only
844control wether C<encode> escapes character values outside their respective 1052control whether C<encode> escapes character values outside their respective
845codeset range. Neither of these flags conflict with each other, although 1053codeset range. Neither of these flags conflict with each other, although
846some combinations make less sense than others. 1054some combinations make less sense than others.
847 1055
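As a rough illustration of the difference, the following sketch encodes the same string with each flag enabled on its own. The sample string is an assumption picked so that one character (U+00FC) fits into Latin-1 and the other (U+20AC) does not.

```perl
use strict;
use warnings;
use JSON::XS;

# one character inside Latin-1 (U+00FC) and one outside it (U+20AC)
my $data = ["\x{fc}\x{20ac}"];

# utf8: the result is a UTF-8 encoded octet string, nothing is escaped
my $utf8 = JSON::XS->new->utf8->encode ($data);

# latin1: U+00FC is emitted as the single octet 0xFC, U+20AC is escaped
my $latin1 = JSON::XS->new->latin1->encode ($data);

# ascii: every character outside ASCII is escaped
my $ascii = JSON::XS->new->ascii->encode ($data);

printf "utf8:   %d octets\n", length $utf8;
printf "latin1: %d octets\n", length $latin1;
print  "ascii:  $ascii\n";
```
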
848Care has been taken to make all flags symmetrical with respect to 1056Care has been taken to make all flags symmetrical with respect to
849C<encode> and C<decode>, that is, texts encoded with any combination of 1057C<encode> and C<decode>, that is, texts encoded with any combination of
1193=head1 THREADS 1401=head1 THREADS
1194 1402
1195This module is I<not> guaranteed to be thread safe and there are no 1403This module is I<not> guaranteed to be thread safe and there are no
1196plans to change this until Perl gets thread support (as opposed to the 1404plans to change this until Perl gets thread support (as opposed to the
1197horribly slow so-called "threads" which are simply slow and bloated 1405horribly slow so-called "threads" which are simply slow and bloated
1198process simulations - use fork, its I<much> faster, cheaper, better). 1406process simulations - use fork, it's I<much> faster, cheaper, better).
1199 1407
1200(It might actually work, but you have been warned). 1408(It might actually work, but you have been warned).
1201 1409
1202 1410
1203=head1 BUGS 1411=head1 BUGS
1204 1412
1205While the goal of this module is to be correct, that unfortunately does 1413While the goal of this module is to be correct, that unfortunately does
1206not mean its bug-free, only that I think its design is bug-free. It is 1414not mean it's bug-free, only that I think its design is bug-free. It is
1207still relatively early in its development. If you keep reporting bugs they 1415still relatively early in its development. If you keep reporting bugs they
1208will be fixed swiftly, though. 1416will be fixed swiftly, though.
1209 1417
1210Please refrain from using rt.cpan.org or any other bug reporting 1418Please refrain from using rt.cpan.org or any other bug reporting
1211service. I put the contact address into my modules for a reason. 1419service. I put the contact address into my modules for a reason.
1233 "--" => sub { $_[0] = ${$_[0]} - 1 }, 1441 "--" => sub { $_[0] = ${$_[0]} - 1 },
1234 fallback => 1; 1442 fallback => 1;
1235 1443
12361; 14441;
1237 1445
1446=head1 SEE ALSO
1447
1448The F<json_xs> command line utility for quick experiments.
1449
1238=head1 AUTHOR 1450=head1 AUTHOR
1239 1451
1240 Marc Lehmann <schmorp@schmorp.de> 1452 Marc Lehmann <schmorp@schmorp.de>
1241 http://home.schmorp.de/ 1453 http://home.schmorp.de/
1242 1454
