[ViewVC] Diff of: cvs/JSON-XS/XS.pm

Comparing JSON-XS/XS.pm (file contents):
Revision 1.83 by root, Sun Jan 20 19:19:07 2008 UTC vs.
Revision 1.85 by root, Wed Mar 19 02:55:23 2008 UTC

…		…
		1	=encoding utf-8
		2
1	=head1 NAME	3	=head1 NAME
2		4
3	JSON::XS - JSON serialising/deserialising, done correctly and fast	5	JSON::XS - JSON serialising/deserialising, done correctly and fast
4		6
5	JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ	7	JSON::XS - 正しくて高速な JSON シリアライザ/デシリアライザ
58		60
59	=over 4	61	=over 4
60		62
61	=item * correct Unicode handling	63	=item * correct Unicode handling
62		64
63	This module knows how to handle Unicode, and even documents how and when	65	This module knows how to handle Unicode, documents how and when it does
64	it does so.	66	so, and even documents what "correct" means.
65		67
66	=item * round-trip integrity	68	=item * round-trip integrity
67		69
68	When you serialise a perl data structure using only datatypes supported	70	When you serialise a perl data structure using only datatypes supported
69	by JSON, the deserialised data structure is identical on the Perl level.	71	by JSON, the deserialised data structure is identical on the Perl level.
70	(e.g. the string "2.0" doesn't suddenly become "2" just because it looks	72	(e.g. the string "2.0" doesn't suddenly become "2" just because it looks
71	like a number).	73	like a number). There I<are> minor exceptions to this; read the MAPPING
		74	section below to learn about those.
72		75
73	=item * strict checking of JSON correctness	76	=item * strict checking of JSON correctness
74		77
75	There is no guessing, no generating of illegal JSON texts by default,	78	There is no guessing, no generating of illegal JSON texts by default,
76	and only JSON is accepted as input by default (the latter is a security	79	and only JSON is accepted as input by default (the latter is a security
77	feature).	80	feature).
78		81
79	=item * fast	82	=item * fast
80		83
81	Compared to other JSON modules, this module compares favourably in terms	84	Compared to other JSON modules and other serialisers such as Storable,
82	of speed, too.	85	this module usually compares favourably in terms of speed, too.
83		86
84	=item * simple to use	87	=item * simple to use
85		88
86	This module has both a simple functional interface as well as an OO	89	This module has both a simple functional interface as well as an
87	interface.	90	object-oriented interface.
88		91
89	=item * reasonably versatile output formats	92	=item * reasonably versatile output formats
90		93
91	You can choose between the most compact guaranteed single-line format	94	You can choose between the most compact guaranteed-single-line format
92	possible (nice for simple line-based protocols), a pure-ascii format	95	possible (nice for simple line-based protocols), a pure-ascii format
93	(for when your transport is not 8-bit clean, still supports the whole	96	(for when your transport is not 8-bit clean, still supports the whole
94	Unicode range), or a pretty-printed format (for when you want to read that	97	Unicode range), or a pretty-printed format (for when you want to read that
95	stuff). Or you can combine those features in whatever way you like.	98	stuff). Or you can combine those features in whatever way you like.
96		99
…		…
174	This enables you to store Unicode characters as single characters in a	177	This enables you to store Unicode characters as single characters in a
175	Perl string - very natural.	178	Perl string - very natural.
176		179
177	=item 2. Perl does I<not> associate an encoding with your strings.	180	=item 2. Perl does I<not> associate an encoding with your strings.
178		181
179	Unless you force it to, e.g. when matching it against a regex, or printing	182	... until you force it to, e.g. when matching it against a regex, or
180	the scalar to a file, in which case Perl either interprets your string as	183	printing the scalar to a file, in which case Perl either interprets your
181	locale-encoded text, octets/binary, or as Unicode, depending on various	184	string as locale-encoded text, octets/binary, or as Unicode, depending
182	settings. In no case is an encoding stored together with your data, it is	185	on various settings. In no case is an encoding stored together with your
183	I<use> that decides encoding, not any magical metadata.	186	data, it is I<use> that decides encoding, not any magical meta data.
184		187
185	=item 3. The internal utf-8 flag has no meaning with regards to the	188	=item 3. The internal utf-8 flag has no meaning with regards to the
186	encoding of your string.	189	encoding of your string.
187		190
188	Just ignore that flag unless you debug a Perl bug, a module written in	191	Just ignore that flag unless you debug a Perl bug, a module written in
…		…
706		709
707	A JSON number becomes either an integer, numeric (floating point) or	710	A JSON number becomes either an integer, numeric (floating point) or
708	string scalar in perl, depending on its range and any fractional parts. On	711	string scalar in perl, depending on its range and any fractional parts. On
709	the Perl level, there is no difference between those as Perl handles all	712	the Perl level, there is no difference between those as Perl handles all
710	the conversion details, but an integer may take slightly less memory and	713	the conversion details, but an integer may take slightly less memory and
711	might represent more values exactly than (floating point) numbers.	714	might represent more values exactly than floating point numbers.
712		715
713	If the number consists of digits only, JSON::XS will try to represent	716	If the number consists of digits only, JSON::XS will try to represent
714	it as an integer value. If that fails, it will try to represent it as	717	it as an integer value. If that fails, it will try to represent it as
715	a numeric (floating point) value if that is possible without loss of	718	a numeric (floating point) value if that is possible without loss of
716	precision. Otherwise it will preserve the number as a string value.	719	precision. Otherwise it will preserve the number as a string value (in
		720	which case you lose roundtripping ability, as the JSON number will be
		721	re-encoded to a JSON string).
717		722
718	Numbers containing a fractional or exponential part will always be	723	Numbers containing a fractional or exponential part will always be
719	represented as numeric (floating point) values, possibly at a loss of	724	represented as numeric (floating point) values, possibly at a loss of
720	precision.	725	precision (in which case you might lose perfect roundtripping ability, but
721		726	the JSON number will still be re-encoded as a JSON number).
722	This might create round-tripping problems as numbers might become strings,
723	but as Perl is typeless there is no other way to do it.
724		727
725	=item true, false	728	=item true, false
726		729
727	These JSON atoms become C<JSON::XS::true> and C<JSON::XS::false>,	730	These JSON atoms become C<JSON::XS::true> and C<JSON::XS::false>,
728	respectively. They are overloaded to act almost exactly like the numbers	731	respectively. They are overloaded to act almost exactly like the numbers
…		…
819	:).	822	:).
820		823
821	=back	824	=back
822		825
823		826
		827	=head1 ENCODING/CODESET FLAG NOTES
		828
		829	The interested reader might have seen a number of flags that signify
		830	encodings or codesets - C<utf8>, C<latin1> and C<ascii>. There seems to be
		831	some confusion on what these do, so here is a short comparison:
		832
		833	C<utf8> controls whether the JSON text created by C<encode> (and expected
		834	by C<decode>) is UTF-8 encoded or not, while C<latin1> and C<ascii> only
		835	control whether C<encode> escapes character values outside their respective
		836	codeset range. Neither of these flags conflict with each other, although
		837	some combinations make less sense than others.
		838
		839	Care has been taken to make all flags symmetrical with respect to
		840	C<encode> and C<decode>, that is, texts encoded with any combination of
		841	these flag values will be correctly decoded when the same flags are used
		842	- in general, if you use different flag settings while encoding vs. when
		843	decoding you likely have a bug somewhere.
		844
		845	Below comes a verbose discussion of these flags. Note that a "codeset" is
		846	simply an abstract set of character-codepoint pairs, while an encoding
		847	takes those codepoint numbers and I<encodes> them, in our case into
		848	octets. Unicode is (among other things) a codeset, UTF-8 is an encoding,
		849	and ISO-8859-1 (= latin 1) and ASCII are both codesets I<and> encodings at
		850	the same time, which can be confusing.
		851
		852	=over 4
		853
		854	=item C<utf8> flag disabled
		855
		856	When C<utf8> is disabled (the default), then C<encode>/C<decode> generate
		857	and expect Unicode strings, that is, characters with high ordinal Unicode
		858	values (> 255) will be encoded as such characters, and likewise such
		859	characters are decoded as-is, no changes to them will be done, except
		860	"(re-)interpreting" them as Unicode codepoints or Unicode characters,
		861	respectively (to Perl, these are the same thing in strings unless you do
		862	funny/weird/dumb stuff).
		863
		864	This is useful when you want to do the encoding yourself (e.g. when you
		865	want to have UTF-16 encoded JSON texts) or when some other layer does
		866	the encoding for you (for example, when printing to a terminal using a
		867	filehandle that transparently encodes to UTF-8 you certainly do NOT want
		868	to UTF-8 encode your data first and have Perl encode it another time).
		869
		870	=item C<utf8> flag enabled
		871
		872	If the C<utf8>-flag is enabled, C<encode>/C<decode> will encode all
		873	characters using the corresponding UTF-8 multi-byte sequence, and will
		874	expect your input strings to be encoded as UTF-8, that is, no "character"
		875	of the input string must have any value > 255, as UTF-8 does not allow
		876	that.
		877
		878	The C<utf8> flag therefore switches between two modes: disabled means you
		879	will get a Unicode string in Perl, enabled means you get a UTF-8 encoded
		880	octet/binary string in Perl.
		881
		882	=item C<latin1> or C<ascii> flags enabled
		883
		884	With C<latin1> (or C<ascii>) enabled, C<encode> will escape characters
		885	with ordinal values > 255 (> 127 with C<ascii>) and encode the remaining
		886	characters as specified by the C<utf8> flag.
		887
		888	If C<utf8> is disabled, then the result is also correctly encoded in those
		889	character sets (as both are proper subsets of Unicode, meaning that a
		890	Unicode string with all character values < 256 is the same thing as an
		891	ISO-8859-1 string, and a Unicode string with all character values < 128 is
		892	the same thing as an ASCII string in Perl).
		893
		894	If C<utf8> is enabled, you still get a correct UTF-8-encoded string,
		895	regardless of these flags, just some more characters will be escaped using
		896	C<\uXXXX> than before.
		897
		898	Note that ISO-8859-1-I<encoded> strings are not compatible with UTF-8
		899	encoding, while ASCII-encoded strings are. That is because the ISO-8859-1
		900	encoding is NOT a subset of UTF-8 (despite the ISO-8859-1 I<codeset> being
		901	a subset of Unicode), while ASCII is.
		902
		903	Surprisingly, C<decode> will ignore these flags and so treat all input
		904	values as governed by the C<utf8> flag. If it is disabled, this allows you
		905	to decode ISO-8859-1- and ASCII-encoded strings, as both are strict subsets of
		906	Unicode. If it is enabled, you can correctly decode UTF-8 encoded strings.
		907
		908	So neither C<latin1> nor C<ascii> is incompatible with the C<utf8> flag -
		909	they only govern when the JSON output engine escapes a character or not.
		910
		911	The main use for C<latin1> is to relatively efficiently store binary data
		912	as JSON, at the expense of breaking compatibility with most JSON decoders.
		913
		914	The main use for C<ascii> is to force the output to not contain characters
		915	with values > 127, which means you can interpret the resulting string
		916	as UTF-8, ISO-8859-1, ASCII, KOI8-R or just about any character set and
		917	8-bit-encoding, and still get the same data structure back. This is useful
		918	when your channel for JSON transfer is not 8-bit clean or the encoding
		919	might be mangled in between (e.g. in mail), and works because ASCII is a
		920	proper subset of most 8-bit and multibyte encodings in use in the world.
		921
		922	=back
		923
		924
824	=head1 COMPARISON	925	=head1 COMPARISON
825		926
826	As already mentioned, this module was created because none of the existing	927	As already mentioned, this module was created because none of the existing
827	JSON modules could be made to work correctly. First I will describe the	928	JSON modules could be made to work correctly. First I will describe the
828	problems (or pleasures) I encountered with various existing JSON modules,	929	problems (or pleasures) I encountered with various existing JSON modules,
829	followed by some benchmark values. JSON::XS was designed not to suffer	930	followed by some benchmark values. JSON::XS was designed not to suffer
830	from any of these problems or limitations.	931	from any of these problems or limitations.
831		932
832	=over 4	933	=over 4
		934
		935	=item JSON 2.xx
		936
		937	A marvellous piece of engineering, this module either uses JSON::XS
		938	directly when available (so will be 100% compatible with it, including
		939	speed), or it uses JSON::PP, which is basically JSON::XS translated to
		940	Pure Perl, which should be 100% compatible with JSON::XS, just a bit
		941	slower.
		942
		943	You cannot really lose by using this module, especially as it tries very
		944	hard to work even with ancient Perl versions, while JSON::XS does not.
833		945
834	=item JSON 1.07	946	=item JSON 1.07
835		947
836	Slow (but very portable, as it is written in pure Perl).	948	Slow (but very portable, as it is written in pure Perl).
837		949
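
The flag behaviour described in the ENCODING/CODESET FLAG NOTES section added in revision 1.85 can be sketched in Perl roughly as follows. This is an illustration only and is not part of either revision. The sample string and variable names are hypothetical, and the fallback to JSON::PP is an assumption made so the sketch also runs where JSON::XS is not installed (the COMPARISON section describes JSON::PP as compatible with JSON::XS).

```perl
use strict;
use warnings;

# Assumption: prefer JSON::XS, fall back to the core JSON::PP module.
my $json_class = eval { require JSON::XS; 1 }
    ? 'JSON::XS'
    : do { require JSON::PP; 'JSON::PP' };

# Hypothetical sample: e-acute (U+00E9, inside latin1) and euro sign (U+20AC).
my $text = "\x{e9}\x{20ac}";
my $data = [$text];

# utf8 disabled (the default): the result is a Unicode (character) string.
my $unicode = $json_class->new->encode($data);

# utf8 enabled: the result is a UTF-8 encoded octet string.
my $utf8 = $json_class->new->utf8->encode($data);

# latin1: characters > 255 are escaped, the rest are left alone.
my $latin1 = $json_class->new->latin1->encode($data);

# ascii: characters > 127 are escaped, so the output is pure ASCII.
my $ascii = $json_class->new->ascii->encode($data);

# Lengths show the difference without printing wide characters.
printf "%-8s length %d\n", 'unicode', length $unicode;
printf "%-8s length %d\n", 'utf8',    length $utf8;
printf "%-8s length %d\n", 'latin1',  length $latin1;
printf "%-8s %s\n",        'ascii',   $ascii;

# Decoding is governed by the utf8 flag only: octets need utf8 enabled,
# while the escaped or character-string forms decode with it disabled.
my $from_utf8   = $json_class->new->utf8->decode($utf8)->[0];
my $from_ascii  = $json_class->new->decode($ascii)->[0];
my $from_latin1 = $json_class->new->decode($latin1)->[0];

print "round trip ok\n"
    if $from_utf8 eq $text && $from_ascii eq $text && $from_latin1 eq $text;
```

Using the same flag settings for C<encode> and C<decode> is the safe default; the decode calls above only differ from the encode side where the text says C<decode> ignores C<latin1> and C<ascii>.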
