
Comparing JSON-XS/README (file contents):
Revision 1.23 by root, Wed Mar 19 22:31:00 2008 UTC vs.
Revision 1.24 by root, Thu Mar 27 06:37:35 2008 UTC

606 and you need to know where the JSON text ends. 606 and you need to know where the JSON text ends.
607 607
608 JSON::XS->new->decode_prefix ("[1] the tail") 608 JSON::XS->new->decode_prefix ("[1] the tail")
609 => ([1], 3) 609 => ([1], 3)
610 610
611INCREMENTAL PARSING
612 [This section and the API it details is still EXPERIMENTAL]
613
614 In some cases, there is the need for incremental parsing of JSON texts.
615 While this module always has to keep both JSON text and resulting Perl
616 data structure in memory at one time, it does allow you to parse a JSON
617 stream incrementally. It does so by accumulating text until it has a
618 full JSON object, which it then can decode. This process is similar to
619 using "decode_prefix" to see if a full JSON object is available, but is
620 much more efficient (JSON::XS will only attempt to parse the JSON text
621 once it is sure it has enough text to get a decisive result, using a
622 very simple but truly incremental parser).
623
624 The following three methods deal with this.
625
626 [void, scalar or list context] = $json->incr_parse ([$string])
627 This is the central parsing function. It can both append new text
628 and extract objects from the stream accumulated so far (both of
629 these functions are optional).
630
631 If $string is given, then this string is appended to the already
632 existing JSON fragment stored in the $json object.
633
634 After that, if the function is called in void context, it will
635 simply return without doing anything further. This can be used to
636 add more text in as many chunks as you want.
637
638 If the method is called in scalar context, then it will try to
639 extract exactly *one* JSON object. If that is successful, it will
640 return this object, otherwise it will return "undef". If there is a
641 parse error, this method will croak just as "decode" would do (one
642 can then use "incr_skip" to skip the erroneous part). This is the
643 most common way of using the method.
644
645 And finally, in list context, it will try to extract as many objects
646 from the stream as it can find and return them, or the empty list
647 otherwise. For this to work, there must be no separators between the
648 JSON objects or arrays, instead they must be concatenated
649 back-to-back. If an error occurs, an exception will be raised as in
650 the scalar context case. Note that in this case, any
651 previously-parsed JSON texts will be lost.
652
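To illustrate the three calling contexts described above, here is a minimal sketch. The chunk boundaries are arbitrary and chosen only for this example; the behaviour shown is what the description above implies, not a documented test case.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# void context: the text is only appended, nothing is parsed yet
$json->incr_parse ('[1,2');
$json->incr_parse (',3][4');

# scalar context: extract exactly one JSON text, or undef if none is complete
my $first  = $json->incr_parse;   # the complete array [1,2,3]
my $second = $json->incr_parse;   # undef, because "[4" is still incomplete

# list context: append more text, then extract everything that is complete
my @rest = $json->incr_parse (',5]{"a":1}');

printf "first has %d elements, second is %s, rest has %d texts\n",
   scalar @$first, defined $second ? "defined" : "undef", scalar @rest;
```

Note that the incomplete "[4" is not lost: it stays in the buffer and is completed by the text appended in the last call.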
653 $lvalue_string = $json->incr_text
654 This method returns the currently stored JSON fragment as an lvalue,
655 that is, you can manipulate it. This *only* works when a preceding
656 call to "incr_parse" in *scalar context* successfully returned an
657 object. Under all other circumstances you must not call this
658 function (I mean it: although in simple tests it might actually
659 work, it *will* fail under real world conditions). As a special
660 exception, you can also call this method before having parsed
661 anything.
662
663 This function is useful in two cases: a) finding the trailing text
664 after a JSON object or b) parsing multiple JSON objects separated by
665 non-JSON text (such as commas).
666
667 $json->incr_skip
668 This will reset the state of the incremental parser and will remove
669 the parsed text from the input buffer. This is useful after
670 "incr_parse" died, in which case the input buffer and incremental
671 parser state are left unchanged, to skip the text parsed so far and
672 to reset the parse state.
673
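A minimal sketch of recovering from a parse error with "incr_skip", as described above. The malformed input is made up for the example, and the sketch assumes that the broken text is at least bracket-balanced, so the incremental parser can tell where it ends.

```perl
use strict;
use warnings;
use JSON::XS;

my $json = JSON::XS->new;

# the first array is malformed (double comma), the second one is fine
$json->incr_parse ('[1,2,,3][4]');   # void context, so no parsing

my $obj   = eval { $json->incr_parse };   # croaks just as "decode" would
my $error = $@;

unless (defined $obj) {
   print "skipping bad JSON text: $error" if $error;
   $json->incr_skip;           # remove the text parsed so far, reset the state
   $obj = $json->incr_parse;   # try again with whatever is left
}

print "recovered: @$obj\n" if $obj;
```

Without the call to "incr_skip", the second "incr_parse" would see the same unchanged buffer and fail in the same way.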
674 LIMITATIONS
675 All options that affect decoding are supported, except "allow_nonref".
676 The reason for this is that it cannot be made to work sensibly: JSON
677 objects and arrays are self-delimited, i.e. you can concatenate them
678 back to back and still decode them perfectly. This does not hold true
679 for JSON numbers, however.
680
681 For example, is the string 1 a single JSON number, or is it simply the
682 start of 12? Or is 12 a single JSON number, or the concatenation of 1
683 and 2? In neither case can you tell, and this is why JSON::XS takes the
684 conservative route and disallows this case.
685
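The self-delimiting property can be sketched as follows: because arrays and objects carry their own closing bracket, the stream can be cut at any position without changing the result. The stream contents are made up for the example.

```perl
use strict;
use warnings;
use JSON::XS;

my $json   = JSON::XS->new;
my $stream = '[1][23]{"k":[4]}';   # three JSON texts, back to back

# feed the stream one character at a time; in list context each call
# returns whatever became complete with the character just added
my @texts;
push @texts, $json->incr_parse ($_) for split //, $stream;

# the "2" and the "3" arrived in separate calls, but the closing "]"
# makes it unambiguous that they form the single number 23
print scalar @texts, " texts decoded, second one holds $texts[1][0]\n";
```

A bare number at the top level has no such closing delimiter, which is exactly the ambiguity described above.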
686 EXAMPLES
687 Some examples will make all this clearer. First, a simple example that
688 works similarly to "decode_prefix": We want to decode the JSON object at
689 the start of a string and identify the portion after the JSON object:
690
691 my $text = "[1,2,3] hello";
692
693 my $json = new JSON::XS;
694
695 my $obj = $json->incr_parse ($text)
696    or die "expected JSON object or array at beginning of string";
697
698 my $tail = $json->incr_text;
699 # $tail now contains " hello"
700
701 Easy, isn't it?
702
703 Now for a more complicated example: Imagine a hypothetical protocol
704 where you read some requests from a TCP stream, and each request is a
705 JSON array, without any separation between them (in fact, it is often
706 useful to use newlines as "separators", as these get interpreted as
707 whitespace at the start of the JSON text, which makes it possible to
708 test said protocol with "telnet"...).
709
710 Here is how you'd do it (it is trivial to write this in an event-based
711 manner):
712
713 my $json = new JSON::XS;
714
715 # read some data from the socket
716 while (sysread $socket, my $buf, 4096) {
717
718    # split and decode as many requests as possible
719    for my $request ($json->incr_parse ($buf)) {
720       # act on the $request
721    }
722 }
723
724 Another complicated example: Assume you have a string with JSON objects
725 or arrays, all separated by (optional) comma characters (e.g. "[1],[2],
726 [3]"). To parse them, we have to skip the commas between the JSON texts,
727 and here is where the lvalue-ness of "incr_text" comes in useful:
728
729 my $text = "[1],[2], [3]";
730 my $json = new JSON::XS;
731
732 # void context, so no parsing done
733 $json->incr_parse ($text);
734
735 # now extract as many objects as possible. note the
736 # use of scalar context so incr_text can be called.
737 while (my $obj = $json->incr_parse) {
738    # do something with $obj
739
740    # now skip the optional comma
741    $json->incr_text =~ s/^ \s* , //x;
742 }
743
744 Now let's go for a very complex example: Assume that you have a gigantic
745 JSON array-of-objects, many gigabytes in size, and you want to parse it,
746 but you cannot load it into memory fully (this has actually happened in
747 the real world :).
748
749 Well, you lost, you have to implement your own JSON parser. But JSON::XS
750 can still help you: You implement a (very simple) array parser and let
751 JSON decode the array elements, which are all full JSON objects on their
752 own (this wouldn't work if the array elements could be JSON numbers, for
753 example):
754
755 my $json = new JSON::XS;
756
757 # open the monster
758 open my $fh, "<", "bigfile.json"
759    or die "bigfile: $!";
760
761 # first parse the initial "["
762 for (;;) {
763    sysread $fh, my $buf, 65536
764       or die "read error: $!";
765    $json->incr_parse ($buf); # void context, so no parsing
766
767    # Exit the loop once we found and removed(!) the initial "[".
768    # In essence, we are (ab-)using the $json object as a simple scalar
769    # we append data to.
770    last if $json->incr_text =~ s/^ \s* \[ //x;
771 }
772
773 # now we have skipped the initial "[", so continue
774 # parsing all the elements.
775 for (;;) {
776    # in this loop we read data until we got a single JSON object
777    for (;;) {
778       if (my $obj = $json->incr_parse) {
779          # do something with $obj
780          last;
781       }
782
783       # add more data
784       sysread $fh, my $buf, 65536
785          or die "read error: $!";
786       $json->incr_parse ($buf); # void context, so no parsing
787    }
788
789    # in this loop we read data until we either found and parsed the
790    # separating "," between elements, or the final "]"
791    for (;;) {
792       # first skip whitespace
793       $json->incr_text =~ s/^\s*//;
794
795       # if we find "]", we are done
796       if ($json->incr_text =~ s/^\]//) {
797          print "finished.\n";
798          exit;
799       }
800
801       # if we find ",", we can continue with the next element
802       if ($json->incr_text =~ s/^,//) {
803          last;
804       }
805
806       # if we find anything else, we have a parse error!
807       if (length $json->incr_text) {
808          die "parse error near ", $json->incr_text;
809       }
810
811       # else add more data
812       sysread $fh, my $buf, 65536
813          or die "read error: $!";
814       $json->incr_parse ($buf); # void context, so no parsing
815    }
816 }

817 This is a complex example, but most of the complexity comes from the
818 fact that we are trying to be correct (bear with me if I am wrong, I
819 never ran the above example :).
820
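For quick experiments, the same approach can be condensed into a small helper. This is only a sketch under stated assumptions: "each_array_element" is a hypothetical name and not part of JSON::XS, the input is assumed to be a single non-empty, UTF-8 encoded top-level array, and "read" is used instead of "sysread" so that the helper also works on in-memory filehandles.

```perl
use strict;
use warnings;
use JSON::XS;

# Hypothetical helper (not part of JSON::XS): call $cb for every element of
# the single, non-empty top-level JSON array read from $fh. Elements must be
# JSON objects or arrays. $chunk is the read size in bytes.
sub each_array_element {
   my ($fh, $cb, $chunk) = @_;
   $chunk ||= 65536;

   my $json = JSON::XS->new->utf8;   # assumption: the input is UTF-8 encoded

   # append the next chunk to the parser buffer; returns false at end of file
   my $more = sub {
      my $n = read $fh, my $buf, $chunk;
      defined $n or die "read error: $!";
      $json->incr_parse ($buf) if $n;   # void context, so no parsing
      return $n;
   };

   # find and remove the initial "["
   $more->() or die "empty input";
   until ($json->incr_text =~ s/^\s*\[//) {
      $more->() or die "no opening [ found";
   }

   for (;;) {
      # read data until one complete element can be extracted
      my $obj;
      until ($obj = $json->incr_parse) {
         $more->() or die "unexpected end of input inside an element";
      }
      $cb->($obj);

      # read data until we see the separating "," or the final "]"
      for (;;) {
         $json->incr_text =~ s/^\s+//;
         return if $json->incr_text =~ s/^\]//;
         last   if $json->incr_text =~ s/^,//;
         die "parse error near " . $json->incr_text
            if length $json->incr_text;
         $more->() or die "unexpected end of input, missing ]";
      }
   }
}

# usage: an in-memory file and a tiny chunk size, to exercise the refilling
my $doc = '[ {"id":1}, {"id":2,"tags":["a","b"]} ,{"id":3} ]';
open my $fh, "<", \$doc or die "open: $!";

my @ids;
each_array_element ($fh, sub { push @ids, $_[0]{id} }, 3);
print "ids: @ids\n";
```

Unlike the example above, this version treats end of file as an explicit error at each step and returns to the caller instead of calling "exit".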
611MAPPING 821MAPPING
612 This section describes how JSON::XS maps Perl values to JSON values and 822 This section describes how JSON::XS maps Perl values to JSON values and
613 vice versa. These mappings are designed to "do the right thing" in most 823 vice versa. These mappings are designed to "do the right thing" in most
614 circumstances automatically, preserving round-tripping characteristics 824 circumstances automatically, preserving round-tripping characteristics
615 (what you put in comes out as something equivalent). 825 (what you put in comes out as something equivalent).
734 $x += 0; # numify it, ensuring it will be dumped as a number 944 $x += 0; # numify it, ensuring it will be dumped as a number
735 $x *= 1; # same thing, the choice is yours. 945 $x *= 1; # same thing, the choice is yours.
736 946
737 You can not currently force the type in other, less obscure, ways. 947 You can not currently force the type in other, less obscure, ways.
738 Tell me if you need this capability (but don't forget to explain why 948 Tell me if you need this capability (but don't forget to explain why
739 its needed :). 949 it's needed :).
740 950
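A short sketch of the numification trick mentioned above; the variable names are only illustrative.

```perl
use strict;
use warnings;
use JSON::XS;

my $x = "3";                       # a string that merely looks like a number
my $as_string = encode_json [$x];  # dumped as a JSON string

$x += 0;                           # numify it
my $as_number = encode_json [$x];  # now dumped as a JSON number

print "$as_string $as_number\n";   # prints: ["3"] [3]
```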
741ENCODING/CODESET FLAG NOTES 951ENCODING/CODESET FLAG NOTES
742 The interested reader might have seen a number of flags that signify 952 The interested reader might have seen a number of flags that signify
743 encodings or codesets - "utf8", "latin1" and "ascii". There seems to be 953 encodings or codesets - "utf8", "latin1" and "ascii". There seems to be
744 some confusion on what these do, so here is a short comparison: 954 some confusion on what these do, so here is a short comparison:
745 955
746 "utf8" controls wether the JSON text created by "encode" (and expected 956 "utf8" controls whether the JSON text created by "encode" (and expected
747 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only 957 by "decode") is UTF-8 encoded or not, while "latin1" and "ascii" only
748 control wether "encode" escapes character values outside their 958 control whether "encode" escapes character values outside their
749 respective codeset range. Neither of these flags conflict with each 959 respective codeset range. Neither of these flags conflict with each
750 other, although some combinations make less sense than others. 960 other, although some combinations make less sense than others.
751 961
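The difference between the flags can be sketched by encoding the same string four ways. The sample string is an assumption chosen for the example: one character inside the Latin-1 range and one outside it.

```perl
use strict;
use warnings;
use JSON::XS;

# one Latin-1 character (U+00E9) and one outside Latin-1 (U+20AC)
my $data = ["\x{e9}\x{20ac}"];

my $plain  = JSON::XS->new->encode ($data);          # Unicode characters, unescaped
my $utf8   = JSON::XS->new->utf8->encode ($data);    # UTF-8 encoded octets
my $ascii  = JSON::XS->new->ascii->encode ($data);   # everything above 127 escaped
my $latin1 = JSON::XS->new->latin1->encode ($data);  # everything above 255 escaped

printf "%-6s length %2d\n", @$_
   for [plain => length $plain], [utf8 => length $utf8],
       [ascii => length $ascii], [latin1 => length $latin1];

# each flag combination decodes its own output back to the same string
for my $coder (JSON::XS->new, JSON::XS->new->utf8,
               JSON::XS->new->ascii, JSON::XS->new->latin1) {
   my $copy = $coder->decode ($coder->encode ($data));
   die "round trip failed" unless $copy->[0] eq $data->[0];
}
```

The differing lengths show that only "utf8" changes the encoding of the text itself, while "ascii" and "latin1" merely replace characters outside their range with escapes.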
752 Care has been taken to make all flags symmetrical with respect to 962 Care has been taken to make all flags symmetrical with respect to
753 "encode" and "decode", that is, texts encoded with any combination of 963 "encode" and "decode", that is, texts encoded with any combination of
830 any character set and 8-bit-encoding, and still get the same data 1040 any character set and 8-bit-encoding, and still get the same data
831 structure back. This is useful when your channel for JSON transfer 1041 structure back. This is useful when your channel for JSON transfer
832 is not 8-bit clean or the encoding might be mangled in between (e.g. 1042 is not 8-bit clean or the encoding might be mangled in between (e.g.
833 in mail), and works because ASCII is a proper subset of most 8-bit 1043 in mail), and works because ASCII is a proper subset of most 8-bit
834 and multibyte encodings in use in the world. 1044 and multibyte encodings in use in the world.
835
836COMPARISON
837 As already mentioned, this module was created because none of the
838 existing JSON modules could be made to work correctly. First I will
839 describe the problems (or pleasures) I encountered with various existing
840 JSON modules, followed by some benchmark values. JSON::XS was designed
841 not to suffer from any of these problems or limitations.
842
843 JSON 2.xx
844 A marvellous piece of engineering, this module either uses JSON::XS
845 directly when available (so will be 100% compatible with it,
846 including speed), or it uses JSON::PP, which is basically JSON::XS
847 translated to Pure Perl, which should be 100% compatible with
848 JSON::XS, just a bit slower.
849
850 You cannot really lose by using this module, especially as it tries
851 very hard to work even with ancient Perl versions, while JSON::XS
852 does not.
853
854 JSON 1.07
855 Slow (but very portable, as it is written in pure Perl).
856
857 Undocumented/buggy Unicode handling (how JSON handles Unicode values
858 is undocumented. One can get far by feeding it Unicode strings and
859 doing en-/decoding oneself, but Unicode escapes are not working
860 properly).
861
862 No round-tripping (strings get clobbered if they look like numbers,
863 e.g. the string 2.0 will encode to 2.0 instead of "2.0", and that
864 will decode into the number 2).
865
866 JSON::PC 0.01
867 Very fast.
868
869 Undocumented/buggy Unicode handling.
870
871 No round-tripping.
872
873 Has problems handling many Perl values (e.g. regex results and other
874 magic values will make it croak).
875
876 Does not even generate valid JSON ("{1,2}" gets converted to "{1:2}"
877 which is not a valid JSON text).
878
879 Unmaintained (maintainer unresponsive for many months, bugs are not
880 getting fixed).
881
882 JSON::Syck 0.21
883 Very buggy (often crashes).
884
885 Very inflexible (no human-readable format supported, format pretty
886 much undocumented. I need at least a format for easy reading by
887 humans and a single-line compact format for use in a protocol, and
888 preferably a way to generate ASCII-only JSON texts).
889
890 Completely broken (and confusingly documented) Unicode handling
891 (Unicode escapes are not working properly, you need to set
892 ImplicitUnicode to *different* values on en- and decoding to get
893 symmetric behaviour).
894
895 No round-tripping (simple cases work, but this depends on whether
896 the scalar value was used in a numeric context or not).
897
898 Dumping hashes may skip hash values depending on iterator state.
899
900 Unmaintained (maintainer unresponsive for many months, bugs are not
901 getting fixed).
902
903 Does not check input for validity (i.e. will accept non-JSON input
904 and return "something" instead of raising an exception. This is a
905 security issue: imagine two banks transferring money between each
906 other using JSON. One bank might parse a given non-JSON request and
907 deduct money, while the other might reject the transaction with a
908 syntax error. While a good protocol will at least recover, that is
909 extra unnecessary work and the transaction will still not succeed).
910
911 JSON::DWIW 0.04
912 Very fast. Very natural. Very nice.
913
914 Undocumented Unicode handling (but the best of the pack. Unicode
915 escapes still don't get parsed properly).
916
917 Very inflexible.
918
919 No round-tripping.
920
921 Does not generate valid JSON texts (key strings are often unquoted,
922 empty keys result in nothing being output)
923
924 Does not check input for validity.
925 1045
926 JSON and YAML 1046 JSON and YAML
927 You often hear that JSON is a subset of YAML. This is, however, a mass 1047 You often hear that JSON is a subset of YAML. This is, however, a mass
928 hysteria(*) and very far from the truth (as of the time of this 1048 hysteria(*) and very far from the truth (as of the time of this
929 writing), so let me state it clearly: *in general, there is no way to 1049 writing), so let me state it clearly: *in general, there is no way to
1075 1195
1076THREADS 1196THREADS
1077 This module is *not* guaranteed to be thread safe and there are no plans 1197 This module is *not* guaranteed to be thread safe and there are no plans
1078 to change this until Perl gets thread support (as opposed to the 1198 to change this until Perl gets thread support (as opposed to the
1079 horribly slow so-called "threads" which are simply slow and bloated 1199 horribly slow so-called "threads" which are simply slow and bloated
1080 process simulations - use fork, its *much* faster, cheaper, better). 1200 process simulations - use fork, it's *much* faster, cheaper, better).
1081 1201
1082 (It might actually work, but you have been warned). 1202 (It might actually work, but you have been warned).
1083 1203
1084BUGS 1204BUGS
1085 While the goal of this module is to be correct, that unfortunately does 1205 While the goal of this module is to be correct, that unfortunately does
1086 not mean its bug-free, only that I think its design is bug-free. It is 1206 not mean it's bug-free, only that I think its design is bug-free. It is
1087 still relatively early in its development. If you keep reporting bugs 1207 still relatively early in its development. If you keep reporting bugs
1088 they will be fixed swiftly, though. 1208 they will be fixed swiftly, though.
1089 1209
1090 Please refrain from using rt.cpan.org or any other bug reporting 1210 Please refrain from using rt.cpan.org or any other bug reporting
1091 service. I put the contact address into my modules for a reason. 1211 service. I put the contact address into my modules for a reason.
1212
1213SEE ALSO
1214 The json_xs command line utility for quick experiments.
1092 1215
1093AUTHOR 1216AUTHOR
1094 Marc Lehmann <schmorp@schmorp.de> 1217 Marc Lehmann <schmorp@schmorp.de>
1095 http://home.schmorp.de/ 1218 http://home.schmorp.de/
1096 1219
