[ViewVC] Diff of: cvs/JSON-XS/XS.pm

Comparing JSON-XS/XS.pm (file contents):
Revision 1.92 by root, Fri Mar 21 21:47:43 2008 UTC vs.
Revision 1.98 by root, Wed Mar 26 02:36:18 2008 UTC

…		…
681	=> ([], 3)	681	=> ([], 3)
682		682
683	=back	683	=back
684		684
685		685
		686	=head1 INCREMENTAL PARSING
		687
		688	[This section and the API it details are still EXPERIMENTAL]
		689
		690	In some cases, there is the need for incremental parsing of JSON
		691	texts. While this module always has to keep both JSON text and resulting
		692	Perl data structure in memory at one time, it does allow you to parse a
		693	JSON stream incrementally. It does so by accumulating text until it has
		694	a full JSON object, which it then can decode. This process is similar to
		695	using C<decode_prefix> to see if a full JSON object is available, but is
		696	much more efficient (JSON::XS will only attempt to parse the JSON text
		697	once it is sure it has enough text to get a decisive result, using a very
		698	simple but truly incremental parser).
		699
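As a rough illustration of the difference, here is a small sketch (not part of the original documentation). It feeds the same three chunks first to a polling loop built on C<decode_prefix> and then to C<incr_parse>. The chunk contents and variable names are made up for the illustration, and the fallback to JSON::PP assumes that module offers the same methods.

```perl
use strict;
use warnings;

# Prefer JSON::XS; fall back to the core module JSON::PP, which is assumed
# here to offer the same decode_prefix and incr_parse methods.
my $class = eval { require JSON::XS; 'JSON::XS' }
         || do { require JSON::PP; 'JSON::PP' };

my @chunks = ('[1,', '2,', '3]');

# Polling with decode_prefix: every new chunk causes a full parse attempt
# over everything accumulated so far, and incomplete text makes it croak.
my $json     = $class->new;
my $buffer   = "";
my $failures = 0;
my $polled;

for my $chunk (@chunks) {
   $buffer .= $chunk;
   ($polled) = eval { $json->decode_prefix ($buffer) };
   last if $polled;
   $failures++;
}

# Incremental parsing: the object keeps the accumulated text itself and
# returns undef (instead of croaking) while the JSON text is incomplete.
my $incr  = $class->new;
my $calls = 0;
my $parsed;

for my $chunk (@chunks) {
   $calls++;
   $parsed = $incr->incr_parse ($chunk);
   last if $parsed;
}

print "decode_prefix: $failures failed attempts before success\n";
print "incr_parse: got (@$parsed) after $calls calls\n";
```

Both loops end up with the same array; the difference is that the first one has to catch an exception for every incomplete buffer.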
		700	The following methods deal with this.
		701
		702	=over 4
		703
		704	=item [void, scalar or list context] = $json->incr_parse ([$string])
		705
		706	This is the central parsing function. It can both append new text and
		707	extract objects from the stream accumulated so far (both of these
		708	functions are optional).
		709
		710	If C<$string> is given, then this string is appended to the already
		711	existing JSON fragment stored in the C<$json> object.
		712
		713	After that, if the function is called in void context, it will simply
		714	return without doing anything further. This can be used to add more text
		715	in as many chunks as you want.
		716
		717	If the method is called in scalar context, then it will try to extract
		718	exactly I<one> JSON object. If that is successful, it will return this
		719	object, otherwise it will return C<undef>. If there is a parse error,
		720	this method will croak just as C<decode> would do (one can then use
		721	C<incr_skip> to skip the erroneous part). This is the most common way of
		722	using the method.
		723
		724	And finally, in list context, it will try to extract as many objects
		725	from the stream as it can find and return them, or the empty list
		726	otherwise. For this to work, there must be no separators between the JSON
		727	objects or arrays; instead, they must be concatenated back-to-back. If
		728	an error occurs, an exception will be raised as in the scalar context
		729	case. Note that in this case, any previously-parsed JSON texts will be
		730	lost.
		731
		732	=item $lvalue_string = $json->incr_text
		733
		734	This method returns the currently stored JSON fragment as an lvalue, that
		735	is, you can manipulate it. This I<only> works when a preceding call to
		736	C<incr_parse> in I<scalar context> successfully returned an object. Under
		737	all other circumstances you must not call this function (I mean it:
		738	although in simple tests it might actually work, it I<will> fail under
		739	real world conditions). As a special exception, you can also call this
		740	method before having parsed anything.
		741
		742	This function is useful in two cases: a) finding the trailing text after a
		743	JSON object or b) parsing multiple JSON objects separated by non-JSON text
		744	(such as commas).
		745
		746	=item $json->incr_skip
		747
		748	This will reset the state of the incremental parser and will remove the
		749	parsed text from the input buffer. This is useful after C<incr_parse>
		750	died, in which case the input buffer and incremental parser state are left
		751	unchanged, to skip the text parsed so far and to reset the parse state.
		752
		753	=back
		754
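The following sketch (not from the original documentation) walks through the three calling contexts of C<incr_parse>, the lvalue use of C<incr_text>, and recovery with C<incr_skip>. The input strings are invented for the illustration, and the fallback to JSON::PP assumes that module implements the same incremental methods.

```perl
use strict;
use warnings;

# Prefer JSON::XS; fall back to the core module JSON::PP, which is assumed
# here to offer the same incremental methods.
my $class = eval { require JSON::XS; 'JSON::XS' }
         || do { require JSON::PP; 'JSON::PP' };

my $json = $class->new;

# void context: the text is only appended, nothing is parsed
$json->incr_parse ('[1,2]');
$json->incr_parse (' {"a":');

# scalar context: extract at most one JSON text per call
my $first  = $json->incr_parse;            # the complete array
my $second = $json->incr_parse;            # undef, the object is incomplete
my $third  = $json->incr_parse ('3} x');   # the object, now complete

# after a successful scalar-context call, incr_text is safe to use
my $rest = $json->incr_text;               # unparsed text after the object
$json->incr_text = "";                     # lvalue: throw that text away

# list context: extract everything that is complete
my @all = $json->incr_parse ('[3][4] [5]');

# a parse error croaks; incr_skip then drops the text scanned so far
my $bad   = eval { scalar $json->incr_parse ('[1,} [6]') };
my $error = $@;
$json->incr_skip;
my $after = $json->incr_parse;             # the array following the bad text

printf "first=(%s) third.a=%s all=%d after=(%s)\n",
   "@$first", $third->{a}, scalar @all, "@$after";
```

Note the explicit C<scalar> inside the C<eval>: without a scalar or list context the call would merely append the text and never reach the parse error.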
		755	=head2 LIMITATIONS
		756
		757	All options that affect decoding are supported, except
		758	C<allow_nonref>. The reason for this is that it cannot be made to
		759	work sensibly: JSON objects and arrays are self-delimited, i.e. you can concatenate
		760	them back to back and still decode them perfectly. This does not hold true
		761	for JSON numbers, however.
		762
		763	For example, is the string C<1> a single JSON number, or is it simply the
		764	start of C<12>? Or is C<12> a single JSON number, or the concatenation
		765	of C<1> and C<2>? In neither case can you tell, and this is why JSON::XS
		766	takes the conservative route and disallows this case.
		767
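A short sketch of why arrays are self-delimited and numbers are not (again an illustration, not part of the original documentation). It uses the ordinary C<decode> method with C<allow_nonref> to show that a prefix of a number is itself a valid number, while a prefix of an array is never valid. The fallback to JSON::PP is an assumption made so the sketch runs without JSON::XS installed.

```perl
use strict;
use warnings;

# Prefer JSON::XS; fall back to the core module JSON::PP, which is assumed
# here to behave the same way for these calls.
my $class = eval { require JSON::XS; 'JSON::XS' }
         || do { require JSON::PP; 'JSON::PP' };

my $json = $class->new->allow_nonref;

# A truncated array is never a valid JSON text, so a parser always knows
# that it has to wait for more input.
my $prefix_ok = defined eval { $json->decode ('[1,2') };

# A truncated number is still a valid number: after seeing "1" there is
# no way to know whether a "2" is about to follow.
my $one    = $json->decode ('1');
my $twelve = $json->decode ('12');

# Arrays concatenated back to back can be split again without ambiguity.
my @parts = $class->new->incr_parse ('[1][2]');

print "array prefix valid: ", ($prefix_ok ? "yes" : "no"), "\n";
print "'1' decodes to $one, '12' decodes to $twelve\n";
print "'[1][2]' splits into ", scalar @parts, " arrays\n";
```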
		768	=head2 EXAMPLES
		769
		770	Some examples will make all this clearer. First, a simple example that
		771	works similarly to C<decode_prefix>: We want to decode the JSON object at
		772	the start of a string and identify the portion after the JSON object:
		773
		774	   my $text = "[1,2,3] hello";
		775	
		776	   my $json = new JSON::XS;
		777	
		778	   my $obj = $json->incr_parse ($text)
		779	      or die "expected JSON object or array at beginning of string";
		780	
		781	   my $tail = $json->incr_text;
		782	   # $tail now contains " hello"
		783
		784	Easy, isn't it?
		785
		786	Now for a more complicated example: Imagine a hypothetical protocol where
		787	you read some requests from a TCP stream, and each request is a JSON
		788	array, without any separation between them (in fact, it is often useful to
		789	use newlines as "separators", as these get interpreted as whitespace at
		790	the start of the JSON text, which makes it possible to test said protocol
		791	with C<telnet>...).
		792
		793	Here is how you'd do it (it is trivial to write this in an event-based
		794	manner):
		795
		796	   my $json = new JSON::XS;
		797	
		798	   # read some data from the socket
		799	   while (sysread $socket, my $buf, 4096) {
		800	
		801	      # split and decode as many requests as possible
		802	      for my $request ($json->incr_parse ($buf)) {
		803	         # act on the $request
		804	      }
		805	   }
		806
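For what an event-based variant might look like, here is a minimal sketch that can be run without a socket. The C<on_read> callback, the request contents and the simulated reads are all hypothetical; a real event loop would invoke the callback whenever data arrives. The fallback to JSON::PP assumes that module offers the same C<incr_parse> method.

```perl
use strict;
use warnings;

# Prefer JSON::XS; fall back to the core module JSON::PP, which is assumed
# here to offer the same incr_parse method.
my $class = eval { require JSON::XS; 'JSON::XS' }
         || do { require JSON::PP; 'JSON::PP' };

my $json = $class->new;
my @requests;

# Hypothetical read callback: an event loop would call this with each
# chunk of data it receives on the connection.
sub on_read {
   my ($chunk) = @_;

   # list context: collect every request completed by this chunk
   push @requests, $json->incr_parse ($chunk);
}

# Simulate three reads whose boundaries do not match the request
# boundaries; the newlines between requests are treated as whitespace.
on_read ($_) for "[\"ping\"]\n[\"add\",", "1,2]\n", "[\"quit\"]\n";

print scalar @requests, " requests decoded\n";
print "request: @$_\n" for @requests;
```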
		807	Another complicated example: Assume you have a string with JSON objects
		808	or arrays, all separated by (optional) comma characters (e.g. C<[1],[2],
		809	[3]>). To parse them, we have to skip the commas between the JSON texts,
		810	and here is where the lvalue-ness of C<incr_text> comes in useful:
		811
		812	   my $text = "[1],[2], [3]";
		813	   my $json = new JSON::XS;
		814	
		815	   # void context, so no parsing done
		816	   $json->incr_parse ($text);
		817	
		818	   # now extract as many objects as possible. note the
		819	   # use of scalar context so incr_text can be called.
		820	   while (my $obj = $json->incr_parse) {
		821	      # do something with $obj
		822	
		823	      # now skip the optional comma
		824	      $json->incr_text =~ s/^ \s* , //x;
		825	   }
		826
		827	Now let's go for a very complex example: Assume that you have a gigantic
		828	JSON array-of-objects, many gigabytes in size, and you want to parse it,
		829	but you cannot load it into memory fully (this has actually happened in
		830	the real world :).
		831
		832	Well, you lost, you have to implement your own JSON parser. But JSON::XS
		833	can still help you: You implement a (very simple) array parser and let
		834	JSON decode the array elements, which are all full JSON objects on their
		835	own (this wouldn't work if the array elements could be JSON numbers, for
		836	example):
		837
		838	   my $json = new JSON::XS;
		839	
		840	   # open the monster
		841	   open my $fh, "<bigfile.json"
		842	      or die "bigfile: $!";
		843	
		844	   # first parse the initial "["
		845	   for (;;) {
		846	      sysread $fh, my $buf, 65536
		847	         or die "read error: $!";
		848	      $json->incr_parse ($buf); # void context, so no parsing
		849	
		850	      # Exit the loop once we found and removed(!) the initial "[".
		851	      # In essence, we are (ab-)using the $json object as a simple scalar
		852	      # we append data to.
		853	      last if $json->incr_text =~ s/^ \s* \[ //x;
		854	   }
		855	
		856	   # now we have skipped the initial "[", so continue
		857	   # parsing all the elements.
		858	   for (;;) {
		859	      # in this loop we read data until we got a single JSON object
		860	      for (;;) {
		861	         if (my $obj = $json->incr_parse) {
		862	            # do something with $obj
		863	            last;
		864	         }
		865	
		866	         # add more data
		867	         sysread $fh, my $buf, 65536
		868	            or die "read error: $!";
		869	         $json->incr_parse ($buf); # void context, so no parsing
		870	      }
		871	
		872	      # in this loop we read data until we either found and parsed the
		873	      # separating "," between elements, or the final "]"
		874	      for (;;) {
		875	         # first skip whitespace
		876	         $json->incr_text =~ s/^\s*//;
		877	
		878	         # if we find "]", we are done
		879	         if ($json->incr_text =~ s/^\]//) {
		880	            print "finished.\n";
		881	            exit;
		882	         }
		883	
		884	         # if we find ",", we can continue with the next element
		885	         if ($json->incr_text =~ s/^,//) {
		886	            last;
		887	         }
		888	
		889	         # if we find anything else, we have a parse error!
		890	         if (length $json->incr_text) {
		891	            die "parse error near ", $json->incr_text;
		892	         }
		893	
		894	         # else add more data
		895	         sysread $fh, my $buf, 65536
		896	            or die "read error: $!";
		897	         $json->incr_parse ($buf); # void context, so no parsing
		898	      } } # closes the separator loop, then the outer element loop
		899
		900	This is a complex example, but most of the complexity comes from the fact
		901	that we are trying to be correct (bear with me if I am wrong, I never ran
		902	the above example :).
		903
		904
		905
686	=head1 MAPPING	906	=head1 MAPPING
687		907
688	This section describes how JSON::XS maps Perl values to JSON values and	908	This section describes how JSON::XS maps Perl values to JSON values and
689	vice versa. These mappings are designed to "do the right thing" in most	909	vice versa. These mappings are designed to "do the right thing" in most
690	circumstances automatically, preserving round-tripping characteristics	910	circumstances automatically, preserving round-tripping characteristics
…		…
925	as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about any character set and	1145	as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about any character set and
926	8-bit-encoding, and still get the same data structure back. This is useful	1146	8-bit-encoding, and still get the same data structure back. This is useful
927	when your channel for JSON transfer is not 8-bit clean or the encoding	1147	when your channel for JSON transfer is not 8-bit clean or the encoding
928	might be mangled in between (e.g. in mail), and works because ASCII is a	1148	might be mangled in between (e.g. in mail), and works because ASCII is a
929	proper subset of most 8-bit and multibyte encodings in use in the world.	1149	proper subset of most 8-bit and multibyte encodings in use in the world.
930
931	=back
932
933
934	=head1 COMPARISON
935
936	As already mentioned, this module was created because none of the existing
937	JSON modules could be made to work correctly. First I will describe the
938	problems (or pleasures) I encountered with various existing JSON modules,
939	followed by some benchmark values. JSON::XS was designed not to suffer
940	from any of these problems or limitations.
941
942	=over 4
943
944	=item JSON 2.xx
945
946	A marvellous piece of engineering, this module either uses JSON::XS
947	directly when available (so will be 100% compatible with it, including
948	speed), or it uses JSON::PP, which is basically JSON::XS translated to
949	Pure Perl, which should be 100% compatible with JSON::XS, just a bit
950	slower.
951
952	You cannot really lose by using this module, especially as it tries very
953	hard to work even with ancient Perl versions, while JSON::XS does not.
954
955	=item JSON 1.07
956
957	Slow (but very portable, as it is written in pure Perl).
958
959	Undocumented/buggy Unicode handling (how JSON handles Unicode values is
960	undocumented. One can get far by feeding it Unicode strings and doing
961	en-/decoding oneself, but Unicode escapes are not working properly).
962
963	No round-tripping (strings get clobbered if they look like numbers, e.g.
964	the string C<2.0> will encode to C<2.0> instead of C<"2.0">, and that will
965	decode into the number 2).
966
967	=item JSON::PC 0.01
968
969	Very fast.
970
971	Undocumented/buggy Unicode handling.
972
973	No round-tripping.
974
975	Has problems handling many Perl values (e.g. regex results and other magic
976	values will make it croak).
977
978	Does not even generate valid JSON (C<{1,2}> gets converted to C<{1:2}>
979	which is not a valid JSON text).
980
981	Unmaintained (maintainer unresponsive for many months, bugs are not
982	getting fixed).
983
984	=item JSON::Syck 0.21
985
986	Very buggy (often crashes).
987
988	Very inflexible (no human-readable format supported, format pretty much
989	undocumented. I need at least a format for easy reading by humans and a
990	single-line compact format for use in a protocol, and preferably a way to
991	generate ASCII-only JSON texts).
992
993	Completely broken (and confusingly documented) Unicode handling (Unicode
994	escapes are not working properly, you need to set ImplicitUnicode to
995	I<different> values on en- and decoding to get symmetric behaviour).
996
997	No round-tripping (simple cases work, but this depends on whether the scalar
998	value was used in a numeric context or not).
999
1000	Dumping hashes may skip hash values depending on iterator state.
1001
1002	Unmaintained (maintainer unresponsive for many months, bugs are not
1003	getting fixed).
1004
1005	Does not check input for validity (i.e. will accept non-JSON input and
1006	return "something" instead of raising an exception. This is a security
1007	issue: imagine two banks transferring money between each other using
1008	JSON. One bank might parse a given non-JSON request and deduct money,
1009	while the other might reject the transaction with a syntax error. While a
1010	good protocol will at least recover, that is extra unnecessary work and
1011	the transaction will still not succeed).
1012
1013	=item JSON::DWIW 0.04
1014
1015	Very fast. Very natural. Very nice.
1016
1017	Undocumented Unicode handling (but the best of the pack. Unicode escapes
1018	still don't get parsed properly).
1019
1020	Very inflexible.
1021
1022	No round-tripping.
1023
1024	Does not generate valid JSON texts (key strings are often unquoted, empty keys
1025	result in nothing being output)
1026
1027	Does not check input for validity.
1028		1150
1029	=back	1151	=back
1030		1152
1031		1153
1032	=head2 JSON and YAML	1154	=head2 JSON and YAML
…		…
1233	"--" => sub { $_[0] = ${$_[0]} - 1 },	1355	"--" => sub { $_[0] = ${$_[0]} - 1 },
1234	fallback => 1;	1356	fallback => 1;
1235		1357
1236	1;	1358	1;
1237		1359
		1360	=head1 SEE ALSO
		1361
		1362	The F<json_xs> command line utility for quick experiments.
		1363
1238	=head1 AUTHOR	1364	=head1 AUTHOR
1239		1365
1240	Marc Lehmann <schmorp@schmorp.de>	1366	Marc Lehmann <schmorp@schmorp.de>
1241	http://home.schmorp.de/	1367	http://home.schmorp.de/
1242		1368
