[ViewVC] Diff of: cvs/CBOR-XS/XS.pm

Comparing CBOR-XS/XS.pm (file contents):
Revision 1.15 by root, Tue Oct 29 21:13:28 2013 UTC vs.
Revision 1.40 by root, Sun Jan 5 14:24:54 2014 UTC

…		…
26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string	26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27	}	27	}
28		28
29	=head1 DESCRIPTION	29	=head1 DESCRIPTION
30		30
31	WARNING! This module is very new, and not very well tested (that's up to
32	you to do). Furthermore, details of the implementation might change freely
33	before version 1.0. And lastly, the object serialisation protocol depends
34	on a pending IANA assignment, and until that assignment is official, this
35	implementation is not interoperable with other implementations (even
36	future versions of this module) until the assignment is done.
37
38	You are still invited to try out CBOR, and this module.
39
40	This module converts Perl data structures to the Concise Binary Object	31	This module converts Perl data structures to the Concise Binary Object
41	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation	32	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42	format that aims to use a superset of the JSON data model, i.e. when you	33	format that aims to use an (almost) superset of the JSON data model, i.e.
43	can represent something in JSON, you should be able to represent it in	34	when you can represent something useful in JSON, you should be able to
44	CBOR.	35	represent it in CBOR.
45		36
46	In short, CBOR is a faster and very compact binary alternative to JSON,	37	In short, CBOR is a faster and quite compact binary alternative to JSON,
47	with the added ability of supporting serialisation of Perl objects. (JSON	38	with the added ability of supporting serialisation of Perl objects. (JSON
48	often compresses better than CBOR though, so if you plan to compress the	39	often compresses better than CBOR though, so if you plan to compress the
49	data later you might want to compare both formats first).	40	data later and speed is less important you might want to compare both
		41	formats first).
50		42
51	To give you a general idea about speed, with texts in the megabyte range,	43	To give you a general idea about speed, with texts in the megabyte range,
52	C<CBOR::XS> usually encodes roughly twice as fast as L<Storable> or	44	C<CBOR::XS> usually encodes roughly twice as fast as L<Storable> or
53	L<JSON::XS> and decodes about 15%-30% faster than those. The shorter the	45	L<JSON::XS> and decodes about 15%-30% faster than those. The shorter the
54	data, the worse L<Storable> performs in comparison.	46	data, the worse L<Storable> performs in comparison.
55		47
56	As for compactness, C<CBOR::XS> encoded data structures are usually about	48	Regarding compactness, C<CBOR::XS>-encoded data structures are usually
57	20% smaller than the same data encoded as (compact) JSON or L<Storable>.	49	about 20% smaller than the same data encoded as (compact) JSON or
		50	L<Storable>.
		51
		52	In addition to the core CBOR data format, this module implements a
		53	number of extensions, to support cyclic and shared data structures
		54	(see C<allow_sharing> and C<allow_cycles>), string deduplication (see
		55	C<pack_strings>) and scalar references (always enabled).
58		56
59	The primary goal of this module is to be I<correct> and the secondary goal	57	The primary goal of this module is to be I<correct> and the secondary goal
60	is to be I<fast>. To reach the latter goal it was written in C.	58	is to be I<fast>. To reach the latter goal it was written in C.
61		59
62	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and	60	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
…		…
66		64
67	package CBOR::XS;	65	package CBOR::XS;
68		66
69	use common::sense;	67	use common::sense;
70		68
71	our $VERSION = 0.06;	69	our $VERSION = 1.25;
72	our @ISA = qw(Exporter);	70	our @ISA = qw(Exporter);
73		71
74	our @EXPORT = qw(encode_cbor decode_cbor);	72	our @EXPORT = qw(encode_cbor decode_cbor);
75		73
76	use Exporter;	74	use Exporter;
…		…
113	strings. All boolean flags described below are by default I<disabled>.	111	strings. All boolean flags described below are by default I<disabled>.
114		112
115	The mutators for flags all return the CBOR object again and thus calls can	113	The mutators for flags all return the CBOR object again and thus calls can
116	be chained:	114	be chained:
117		115
118	#TODO
119	my $cbor = CBOR::XS->new->encode ({a => [1,2]});	116	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
120		117
121	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])	118	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
122		119
123	=item $max_depth = $cbor->get_max_depth	120	=item $max_depth = $cbor->get_max_depth
…		…
157	If no argument is given, the limit check will be deactivated (same as when	154	If no argument is given, the limit check will be deactivated (same as when
158	C<0> is specified).	155	C<0> is specified).
159		156
160	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.	157	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
161		158
		159	=item $cbor = $cbor->allow_unknown ([$enable])
		160
		161	=item $enabled = $cbor->get_allow_unknown
		162
		163	If C<$enable> is true (or missing), then C<encode> will I<not> throw an
		164	exception when it encounters values it cannot represent in CBOR (for
		165	example, filehandles) but instead will encode a CBOR C<error> value.
		166
		167	If C<$enable> is false (the default), then C<encode> will throw an
		168	exception when it encounters anything it cannot encode as CBOR.
		169
		170	This option does not affect C<decode> in any way, and it is recommended to
		171	leave it off unless you know your communications partner.
		172
		173	=item $cbor = $cbor->allow_sharing ([$enable])
		174
		175	=item $enabled = $cbor->get_allow_sharing
		176
		177	If C<$enable> is true (or missing), then C<encode> will not double-encode
		178	values that have been referenced before (e.g. when the same object, such
		179	as an array, is referenced multiple times), but instead will emit a
		180	reference to the earlier value.
		181
		182	This means that such values will only be encoded once, and will not result
		183	in a deep cloning of the value on decode, in decoders supporting the value
		184	sharing extension. This also makes it possible to encode cyclic data
		185	structures (which need C<allow_cycles> to be enabled to be decoded by this
		186	module).
		187
		188	It is recommended to leave it off unless you know your
		189	communication partner supports the value sharing extensions to CBOR
		190	(L<http://cbor.schmorp.de/value-sharing>), as without decoder support, the
		191	resulting data structure might be unusable.
		192
		193	Detecting shared values incurs a runtime overhead when values are encoded
		194	that have a reference counter larger than one, and might unnecessarily
		195	increase the encoded size, as potentially shared values are encoded as
		196	shareable whether or not they are actually shared.
		197
		198	At the moment, only targets of references can be shared (e.g. scalars,
		199	arrays or hashes pointed to by a reference). Weirder constructs, such as
		200	an array with multiple "copies" of the I<same> string, which are hard but
		201	not impossible to create in Perl, are not supported (this is the same as
		202	with L<Storable>).
		203
		204	If C<$enable> is false (the default), then C<encode> will encode shared
		205	data structures repeatedly, unsharing them in the process. Cyclic data
		206	structures cannot be encoded in this mode.
		207
		208	This option does not affect C<decode> in any way - shared values and
		209	references will always be decoded properly if present.
		210
		211	=item $cbor = $cbor->allow_cycles ([$enable])
		212
		213	=item $enabled = $cbor->get_allow_cycles
		214
		215	If C<$enable> is true (or missing), then C<decode> will happily decode
		216	self-referential (cyclic) data structures. By default these will not be
		217	decoded, as they need manual cleanup to avoid memory leaks, so code that
		218	isn't prepared for this will not leak memory.
		219
		220	If C<$enable> is false (the default), then C<decode> will throw an error
		221	when it encounters a self-referential/cyclic data structure.
		222
		223	This option does not affect C<encode> in any way - shared values and
		224	references will always be encoded properly if present.
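
As a rough illustration of how C<allow_sharing> and C<allow_cycles> interact, the following sketch encodes a self-referential hash and then decodes it twice, once with the default settings and once with C<allow_cycles> enabled. The variable names are invented for this example, and deleting the back-reference is only one possible way to break the cycle again.

```perl
use strict;
use warnings;
use CBOR::XS;

# a hash that contains a reference to itself (hypothetical example data)
my $node = { name => "root" };
$node->{self} = $node;

# encoding a cyclic structure requires allow_sharing
my $cbor = CBOR::XS->new->allow_sharing->encode ($node);

# with default settings, decode refuses cyclic data and throws
my $refused = !eval { CBOR::XS->new->decode ($cbor); 1 };

# with allow_cycles enabled, the cycle is reconstructed
my $copy = CBOR::XS->new->allow_cycles->decode ($cbor);
my $name = $copy->{self}{name};

# break both cycles manually so perl can free the memory
delete $copy->{self};
delete $node->{self};
```

Keeping C<allow_cycles> off unless the calling code is prepared to do this kind of cleanup matches the default described above.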
		225
		226	=item $cbor = $cbor->pack_strings ([$enable])
		227
		228	=item $enabled = $cbor->get_pack_strings
		229
		230	If C<$enable> is true (or missing), then C<encode> will try not to encode
		231	the same string twice, but will instead encode a reference to the
		232	string. Depending on your data format, this can save a lot of space, but
		233	also results in a very large runtime overhead (expect encoding times to be
		234	2-4 times as high as without).
		235
		236	It is recommended to leave it off unless you know your
		237	communications partner supports the stringref extension to CBOR
		238	(L<http://cbor.schmorp.de/stringref>), as without decoder support, the
		239	resulting data structure might not be usable.
		240
		241	If C<$enable> is false (the default), then C<encode> will encode strings
		242	the standard CBOR way.
		243
		244	This option does not affect C<decode> in any way - string references will
		245	always be decoded properly if present.
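
The flags described so far can be combined by chaining their mutators. The following is a minimal sketch, assuming the decoding side is this module as well (so the sharing and stringref tags are understood); the data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

# each mutator returns the coder object, so the calls can be chained
my $coder = CBOR::XS->new
   ->max_depth (64)
   ->allow_sharing
   ->pack_strings;

# hypothetical data: one array referenced twice, one string repeated
my $point = [1, 2];
my $data  = {
   from  => $point,
   to    => $point,
   label => ["same text", "same text"],
};

my $cbor = $coder->encode ($data);
my $copy = $coder->decode ($cbor);

# the get_* accessors report the current settings
my @settings = (
   $coder->get_max_depth,
   $coder->get_allow_sharing,
   $coder->get_pack_strings,
);
```

Whether C<pack_strings> pays off depends on how repetitive the data is, so it is worth comparing the encoded sizes and timings on representative input first.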
		246
		247	=item $cbor = $cbor->validate_utf8 ([$enable])
		248
		249	=item $enabled = $cbor->get_validate_utf8
		250
		251	If C<$enable> is true (or missing), then C<decode> will validate that
		252	elements (text strings) containing UTF-8 data in fact contain valid UTF-8
		253	data (instead of blindly accepting it). This validation obviously takes
		254	extra time during decoding.
		255
		256	The concept of "valid UTF-8" used is perl's concept, which is a superset
		257	of the official UTF-8.
		258
		259	If C<$enable> is false (the default), then C<decode> will blindly accept
		260	UTF-8 data, marking them as valid UTF-8 in the resulting data structure
		261	regardless of whether that's true or not.
		262
		263	Perl isn't too happy about corrupted UTF-8 in strings, but should
		264	generally not crash or do similarly evil things. Extensions might not be
		265	so forgiving, so it's recommended to turn on this setting if you receive
		266	untrusted CBOR.
		267
		268	This option does not affect C<encode> in any way - strings that are
		269	supposedly valid UTF-8 will simply be dumped into the resulting CBOR
		270	string without checking whether that is, in fact, true or not.
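
To make the difference concrete, here is a small sketch that feeds a hand-built CBOR text string containing invalid UTF-8 to two decoders. The two-octet input is constructed by hand for this example (it is not output of this module), and the variable names are arbitrary.

```perl
use strict;
use warnings;
use CBOR::XS;

# hand-built CBOR: 0x61 announces a text string (major type 3) of length 1,
# and the single octet 0x80 is a lone continuation byte, so not valid UTF-8
my $bad = "\x61\x80";

# default settings: the octets are accepted blindly
my $lenient_ok = eval { CBOR::XS->new->decode ($bad); 1 } ? 1 : 0;

# validate_utf8 enabled: decoding throws an exception instead
my $strict_ok = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 } ? 1 : 0;
```

With untrusted input, wrapping C<decode> in C<eval> as shown is the usual way to turn the exception into an error the caller can handle.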
		271
		272	=item $cbor = $cbor->filter ([$cb->($tag, $value)])
		273
		274	=item $cb_or_undef = $cbor->get_filter
		275
		276	Sets or replaces the tagged value decoding filter (when C<$cb> is
		277	specified) or clears the filter (if no argument or C<undef> is provided).
		278
		279	The filter callback is called only during decoding, when a non-enforced
		280	tagged value has been decoded (see L<TAG HANDLING AND EXTENSIONS> for a
		281	list of enforced tags). For specific tags, it's often better to provide a
		282	default converter using the C<%CBOR::XS::FILTER> hash (see below).
		283
		284	The first argument is the numerical tag, the second is the (decoded) value
		285	that has been tagged.
		286
		287	The filter function should return either exactly one value, which will
		288	replace the tagged value in the decoded data structure, or no values,
		289	which will result in default handling, which currently means the decoder
		290	creates a C<CBOR::XS::Tagged> object to hold the tag and the value.
		291
		292	When the filter is cleared (the default state), the default filter
		293	function, C<CBOR::XS::default_filter>, is used. This function simply looks
		294	up the tag in the C<%CBOR::XS::FILTER> hash. If an entry exists it must be
		295	a code reference that is called with tag and value, and is responsible for
		296	decoding the value. If no entry exists, it returns no values.
		297
		298	Example: decode all tags not handled internally into C<CBOR::XS::Tagged>
		299	objects, with no other special handling (useful when working with
		300	potentially "unsafe" CBOR data).
		301
		302	CBOR::XS->new->filter (sub { })->decode ($cbor_data);
		303
		304	Example: provide a global filter for tag 1347375694, converting the value
		305	into some string form.
		306
		307	$CBOR::XS::FILTER{1347375694} = sub {
		308	my ($tag, $value) = @_;
		309
		310	"tag 1347375694 value $value"
		311	};
		312
162	=item $cbor_data = $cbor->encode ($perl_scalar)	313	=item $cbor_data = $cbor->encode ($perl_scalar)
163		314
164	Converts the given Perl data structure (a scalar value) to its CBOR	315	Converts the given Perl data structure (a scalar value) to its CBOR
165	representation.	316	representation.
166		317
…		…
179	and you need to know where the first CBOR string ends and the next one	330	and you need to know where the first CBOR string ends and the next one
180	starts.	331	starts.
181		332
182	CBOR::XS->new->decode_prefix ("......")	333	CBOR::XS->new->decode_prefix ("......")
183	=> ("...", 3)	334	=> ("...", 3)
		335
		336	=back
		337
		338	=head2 INCREMENTAL PARSING
		339
		340	In some cases, there is the need for incremental parsing of CBOR
		341	texts. While this module always has to keep both CBOR text and resulting
		342	Perl data structure in memory at one time, it does allow you to parse a
		343	CBOR stream incrementally. This is similar to using C<decode_prefix> to
		344	see whether a full CBOR object is available, but is much more efficient.
		345
		346	It basically works by parsing as much of a CBOR string as possible - if
		347	the CBOR data is not complete yet, the parser will remember where it was,
		348	to be able to restart when more data has been accumulated. Once enough
		349	data is available to either decode a complete CBOR value or raise an
		350	error, a real decode will be attempted.
		351
		352	A typical use case would be a network protocol that consists of sending
		353	and receiving CBOR-encoded messages. The solution that works with CBOR and
		354	about anything else is by prepending a length to every CBOR value, so the
		355	receiver knows how many octets to read. More compact (and slightly slower)
		356	would be to just send CBOR values back-to-back, as C<CBOR::XS> knows where
		357	a CBOR value ends, and doesn't need an explicit length.
		358
		359	The following methods help with this:
		360
		361	=over 4
		362
		363	=item @decoded = $cbor->incr_parse ($buffer)
		364
		365	This method attempts to decode exactly one CBOR value from the beginning
		366	of the given C<$buffer>. The value is removed from the C<$buffer> on
		367	success. When C<$buffer> doesn't contain a complete value yet, it returns
		368	nothing. Finally, when the C<$buffer> doesn't start with something
		369	that could ever be a valid CBOR value, it raises an exception, just as
		370	C<decode> would. In the latter case the decoder state is undefined and
		371	must be reset before being able to parse further.
		372
		373	This method modifies the C<$buffer> in place. When no CBOR value can be
		374	decoded, the decoder stores the current string offset. On the next call,
		375	it continues decoding at the place where it stopped before. For this to make
		376	sense, the C<$buffer> must begin with the same octets as on previous
		377	unsuccessful calls.
		378
		379	You can call this method in scalar context, in which case it either
		380	returns a decoded value or C<undef>. This makes it impossible to
		381	distinguish between CBOR null values (which decode to C<undef>) and an
		382	unsuccessful decode, which is often acceptable.
		383
		384	=item @decoded = $cbor->incr_parse_multiple ($buffer)
		385
		386	Same as C<incr_parse>, but attempts to decode as many CBOR values as
		387	possible in one go, instead of at most one. Calls to C<incr_parse> and
		388	C<incr_parse_multiple> can be interleaved.
		389
		390	=item $cbor->incr_reset
		391
		392	Resets the incremental decoder. This throws away any saved state, so that
		393	subsequent calls to C<incr_parse> or C<incr_parse_multiple> start to parse
		394	a new CBOR value from the beginning of the C<$buffer> again.
		395
		396	This method can be called at any time, but it I<must> be called if you want
		397	to change your C<$buffer> or there was a decoding error and you want to
		398	reuse the C<$cbor> object for future incremental parsings.
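
The incremental methods above can be exercised with a sketch like the following, which simulates a network read loop by cutting a stream of back-to-back CBOR values into three-octet chunks. The chunk size, the sample messages and the variable names are assumptions made purely for illustration; a real program would append whatever its socket read returns.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# three CBOR values sent back-to-back, without any length prefix
my $stream = join "", map $coder->encode ($_), [1, 2], { k => "v" }, "tail";

my $buffer = "";
my @messages;

# feed the stream in small chunks, as a socket read loop might
while (length $stream) {
   # move the next (up to) three octets from $stream to the end of $buffer
   $buffer .= substr $stream, 0, 3, "";

   # every complete value is removed from $buffer and returned;
   # an incomplete tail stays in $buffer until more data arrives
   push @messages, $coder->incr_parse_multiple ($buffer);
}
```

Because data is only ever appended, C<$buffer> always begins with the same octets as on the previous unsuccessful call, which is the precondition stated above. After a decoding error, C<incr_reset> would have to be called before reusing C<$coder>.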
184		399
185	=back	400	=back
186		401
187		402
188	=head1 MAPPING	403	=head1 MAPPING
…		…
206	CBOR integers become (numeric) perl scalars. On perls without 64 bit	421	CBOR integers become (numeric) perl scalars. On perls without 64 bit
207	support, 64 bit integers will be truncated or otherwise corrupted.	422	support, 64 bit integers will be truncated or otherwise corrupted.
208		423
209	=item byte strings	424	=item byte strings
210		425
211	Byte strings will become octet strings in Perl (the byte values 0..255	426	Byte strings will become octet strings in Perl (the Byte values 0..255
212	will simply become characters of the same value in Perl).	427	will simply become characters of the same value in Perl).
213		428
214	=item UTF-8 strings	429	=item UTF-8 strings
215		430
216	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be	431	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
…		…
234	C<Types::Serialiser::false> and C<Types::Serialiser::error>,	449	C<Types::Serialiser::false> and C<Types::Serialiser::error>,
235	respectively. They are overloaded to act almost exactly like the numbers	450	respectively. They are overloaded to act almost exactly like the numbers
236	C<1> and C<0> (for true and false) or to throw an exception on access (for	451	C<1> and C<0> (for true and false) or to throw an exception on access (for
237	error). See the L<Types::Serialiser> manpage for details.	452	error). See the L<Types::Serialiser> manpage for details.
238		453
239	=item CBOR tag 256 (perl object)	454	=item tagged values
240		455
241	The tag value C<256> (TODO: pending iana registration) will be used
242	to deserialise a Perl object serialised with C<FREEZE>. See L<OBJECT
243	SERIALISATION>, below, for details.
244
245	=item CBOR tag 55799 (magic header)
246
247	The tag 55799 is ignored (this tag implements the magic header).
248
249	=item other CBOR tags
250
251	Tagged items consist of a numeric tag and another CBOR value. Tags not	456	Tagged items consist of a numeric tag and another CBOR value.
252	handled internally are currently converted into a L<CBOR::XS::Tagged>
253	object, which is simply a blessed array reference consisting of the
254	numeric tag value followed by the (decoded) CBOR value.
255		457
256	In the future, support for user-supplied conversions might get added.	458	See L<TAG HANDLING AND EXTENSIONS> and the description of C<< ->filter >>
		459	for details on which tags are handled how.
257		460
258	=item anything else	461	=item anything else
259		462
260	Anything else (e.g. unsupported simple values) will raise a decoding	463	Anything else (e.g. unsupported simple values) will raise a decoding
261	error.	464	error.
…		…
264		467
265		468
266	=head2 PERL -> CBOR	469	=head2 PERL -> CBOR
267		470
268	The mapping from Perl to CBOR is slightly more difficult, as Perl is a	471	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
269	truly typeless language, so we can only guess which CBOR type is meant by	472	typeless language. That means this module can only guess which CBOR type
270	a Perl value.	473	is meant by a perl value.
271		474
272	=over 4	475	=over 4
273		476
274	=item hash references	477	=item hash references
275		478
276	Perl hash references become CBOR maps. As there is no inherent ordering in	479	Perl hash references become CBOR maps. As there is no inherent ordering in
277	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random	480	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
278	order.	481	order. This order can be different each time a hash is encoded.
279		482
280	Currently, tied hashes will use the indefinite-length format, while normal	483	Currently, tied hashes will use the indefinite-length format, while normal
281	hashes will use the fixed-length format.	484	hashes will use the fixed-length format.
282		485
283	=item array references	486	=item array references
284		487
285	Perl array references become fixed-length CBOR arrays.	488	Perl array references become fixed-length CBOR arrays.
286		489
287	=item other references	490	=item other references
288		491
289	Other unblessed references are generally not allowed and will cause an	492	Other unblessed references will be represented using
290	exception to be thrown, except for references to the integers C<0> and	493	the indirection tag extension (tag value C<22098>,
291	C<1>, which get turned into false and true in CBOR.	494	L<http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
		495	to be able to decode these values somehow, by either "doing the right
		496	thing", decoding into a generic tagged object, simply ignoring the tag, or
		497	something else.
292		498
293	=item CBOR::XS::Tagged objects	499	=item CBOR::XS::Tagged objects
294		500
295	Objects of this type must be arrays consisting of a single C<[tag, value]>	501	Objects of this type must be arrays consisting of a single C<[tag, value]>
296	pair. The (numerical) tag will be encoded as a CBOR tag, the value will	502	pair. The (numerical) tag will be encoded as a CBOR tag, the value will
297	be encoded as appropriate for the value. You cna use C<CBOR::XS::tag> to	503	be encoded as appropriate for the value. You must use C<CBOR::XS::tag> to
298	create such objects.	504	create such objects.
299		505
300	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error	506	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
301		507
302	These special values become CBOR true, CBOR false and CBOR undefined	508	These special values become CBOR true, CBOR false and CBOR undefined
…		…
304	if you want.	510	if you want.
305		511
306	=item other blessed objects	512	=item other blessed objects
307		513
308	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See	514	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
309	L<OBJECT SERIALISATION>, below, for details.	515	L<TAG HANDLING AND EXTENSIONS> for specific classes handled by this
		516	module, and L<OBJECT SERIALISATION> for generic object serialisation.
310		517
311	=item simple scalars	518	=item simple scalars
312		519
313	TODO
314	Simple Perl scalars (any scalar that is not a reference) are the most	520	Simple Perl scalars (any scalar that is not a reference) are the most
315	difficult objects to encode: CBOR::XS will encode undefined scalars as	521	difficult objects to encode: CBOR::XS will encode undefined scalars as
316	CBOR null values, scalars that have last been used in a string context	522	CBOR null values, scalars that have last been used in a string context
317	before encoding as CBOR strings, and anything else as number value:	523	before encoding as CBOR strings, and anything else as number value:
318		524
319	# dump as number	525	# dump as number
320	encode_cbor [2] # yields [2]	526	encode_cbor [2] # yields [2]
321	encode_cbor [-3.0e17] # yields [-3e+17]	527	encode_cbor [-3.0e17] # yields [-3e+17]
322	my $value = 5; encode_cbor [$value] # yields [5]	528	my $value = 5; encode_cbor [$value] # yields [5]
323		529
324	# used as string, so dump as string	530	# used as string, so dump as string (either byte or text)
325	print $value;	531	print $value;
326	encode_cbor [$value] # yields ["5"]	532	encode_cbor [$value] # yields ["5"]
327		533
328	# undef becomes null	534	# undef becomes null
329	encode_cbor [undef] # yields [null]	535	encode_cbor [undef] # yields [null]
…		…
332		538
333	my $x = 3.1; # some variable containing a number	539	my $x = 3.1; # some variable containing a number
334	"$x"; # stringified	540	"$x"; # stringified
335	$x .= ""; # another, more awkward way to stringify	541	$x .= ""; # another, more awkward way to stringify
336	print $x; # perl does it for you, too, quite often	542	print $x; # perl does it for you, too, quite often
		543
		544	You can force whether a string is encoded as byte or text string by using
		545	C<utf8::upgrade> and C<utf8::downgrade>:
		546
		547	utf8::upgrade $x; # encode $x as text string
		548	utf8::downgrade $x; # encode $x as byte string
		549
		550	Perl doesn't define what operations up- and downgrade strings, so if the
		551	difference between byte and text is important, you should up- or downgrade
		552	your string as late as possible before encoding.
337		553
338	You can force the type to be a CBOR number by numifying it:	554	You can force the type to be a CBOR number by numifying it:
339		555
340	my $x = "3"; # some variable containing a string	556	my $x = "3"; # some variable containing a string
341	$x += 0; # numify it, ensuring it will be dumped as a number	557	$x += 0; # numify it, ensuring it will be dumped as a number
…		…
354		570
355	=back	571	=back
356		572
357	=head2 OBJECT SERIALISATION	573	=head2 OBJECT SERIALISATION
358		574
		575	This module implements both a CBOR-specific and the generic
		576	L<Types::Serialiser> object serialisation protocol. The following
		577	subsections explain both methods.
		578
		579	=head3 ENCODING
		580
359	This module knows two ways to serialise a Perl object: The CBOR-specific	581	This module knows two ways to serialise a Perl object: The CBOR-specific
360	way, and the generic way.	582	way, and the generic way.
361		583
362	Whenever the encoder encounters a Perl object that it cnanot serialise	584	Whenever the encoder encounters a Perl object that it cannot serialise
363	directly (most of them), it will first look up the C<TO_CBOR> method on	585	directly (most of them), it will first look up the C<TO_CBOR> method on
364	it.	586	it.
365		587
366	If it has a C<TO_CBOR> method, it will call it with the object as only	588	If it has a C<TO_CBOR> method, it will call it with the object as only
367	argument, and expects exactly one return value, which it will then	589	argument, and expects exactly one return value, which it will then
…		…
373		595
374	The C<FREEZE> method can return any number of values (i.e. zero or	596	The C<FREEZE> method can return any number of values (i.e. zero or
375	more). These will be encoded as CBOR perl object, together with the	597	more). These will be encoded as CBOR perl object, together with the
376	classname.	598	classname.
377		599
		600	These methods I<MUST NOT> change the data structure that is being
		601	serialised. Failure to comply with this can result in memory corruption -
		602	and worse.
		603
378	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail	604	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
379	with an error.	605	with an error.
380		606
		607	=head3 DECODING
		608
381	Objects encoded via C<TO_CBOR> cannot be automatically decoded, but	609	Objects encoded via C<TO_CBOR> cannot (normally) be automatically decoded,
382	objects encoded via C<FREEZE> can be decoded using the following protocol:	610	but objects encoded via C<FREEZE> can be decoded using the following
		611	protocol:
383		612
384	When an encoded CBOR perl object is encountered by the decoder, it will	613	When an encoded CBOR perl object is encountered by the decoder, it will
385	look up the C<THAW> method, by using the stored classname, and will fail	614	look up the C<THAW> method, by using the stored classname, and will fail
386	if the method cannot be found.	615	if the method cannot be found.
387		616
388	After the lookup it will call the C<THAW> method with the stored classname	617	After the lookup it will call the C<THAW> method with the stored classname
389	as first argument, the constant string C<CBOR> as second argument, and all	618	as first argument, the constant string C<CBOR> as second argument, and all
390	values returned by C<FREEZE> as remaining arguments.	619	values returned by C<FREEZE> as remaining arguments.
391		620
392	=head4 EXAMPLES	621	=head3 EXAMPLES
393		622
394	Here is an example C<TO_CBOR> method:	623	Here is an example C<TO_CBOR> method:
395		624
396	sub My::Object::TO_CBOR {	625	sub My::Object::TO_CBOR {
397	my ($obj) = @_;	626	my ($obj) = @_;
…		…
408		637
409	sub URI::TO_CBOR {	638	sub URI::TO_CBOR {
410	my ($self) = @_;	639	my ($self) = @_;
411	my $uri = "$self"; # stringify uri	640	my $uri = "$self"; # stringify uri
412	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string	641	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
413	CBOR::XS::tagged 32, "$_[0]"	642	CBOR::XS::tag 32, $uri
414	}	643	}
415		644
416	This will encode URIs as a UTF-8 string with tag 32, which indicates an	645	This will encode URIs as a UTF-8 string with tag 32, which indicates an
417	URI.	646	URI.
418		647
…		…
455	=head1 MAGIC HEADER	684	=head1 MAGIC HEADER
456		685
457	There is no way to distinguish CBOR from other formats	686	There is no way to distinguish CBOR from other formats
458	programmatically. To make it easier to distinguish CBOR from other	687	programmatically. To make it easier to distinguish CBOR from other
459	formats, the CBOR specification has a special "magic string" that can be	688	formats, the CBOR specification has a special "magic string" that can be
460	prepended to any CBOR string without changing it's meaning.	689	prepended to any CBOR string without changing its meaning.
461		690
462	This string is available as C<$CBOR::XS::MAGIC>. This module does not	691	This string is available as C<$CBOR::XS::MAGIC>. This module does not
463	prepend this string tot he CBOR data it generates, but it will ignroe it	692	prepend this string to the CBOR data it generates, but it will ignore it
464	if present, so users can prepend this string as a "file type" indicator as	693	if present, so users can prepend this string as a "file type" indicator as
465	required.	694	required.
466		695
467		696
468	=head1 THE CBOR::XS::Tagged CLASS	697	=head1 THE CBOR::XS::Tagged CLASS
…		…
551	Wrap CBOR data in CBOR:	780	Wrap CBOR data in CBOR:
552		781
553	my $cbor_cbor = encode_cbor	782	my $cbor_cbor = encode_cbor
554	CBOR::XS::tag 24,	783	CBOR::XS::tag 24,
555	encode_cbor [1, 2, 3];	784	encode_cbor [1, 2, 3];
		785
		786	=head1 TAG HANDLING AND EXTENSIONS
		787
		788	This section describes how this module handles specific tagged values
		789	and extensions. If a tag is not mentioned here and no additional filters
		790	are provided for it, then the default handling applies (creating a
		791	CBOR::XS::Tagged object on decoding, and only encoding the tag when
		792	explicitly requested).
		793
		794	Tags not handled specifically are currently converted into a
		795	L<CBOR::XS::Tagged> object, which is simply a blessed array reference
		796	consisting of the numeric tag value followed by the (decoded) CBOR value.
		797
		798	Future versions of this module reserve the right to special case
		799	additional tags (such as base64url).
		800
		801	=head2 ENFORCED TAGS
		802
		803	These tags are always handled when decoding, and their handling cannot be
		804	overridden by the user.
		805
		806	=over 4
		807
		808	=item 26 (perl-object, L<http://cbor.schmorp.de/perl-object>)
		809
		810	These tags are automatically created (and decoded) for serialisable
		811	objects using the C<FREEZE/THAW> methods (the L<Types::Serialiser> object
		812	serialisation protocol). See L<OBJECT SERIALISATION> for details.
		813
		814	=item 28, 29 (shareable, sharedref, L<http://cbor.schmorp.de/value-sharing>)
		815
		816	These tags are automatically decoded when encountered (and they do not
		817	result in a cyclic data structure, see C<allow_cycles>), resulting in
		818	shared values in the decoded object. They are only encoded, however, when
		819	C<allow_sharing> is enabled.
		820
		821	Not all shared values can be successfully decoded: values that reference
		822	themselves will I<currently> decode as C<undef> (this is not the same
		823	as a reference pointing to itself, which will be represented as a value
		824	that contains an indirect reference to itself - these will be decoded
		825	properly).
		826
		827	Note that considerably more shared value data structures can be decoded
		828	than will be encoded - currently, only values pointed to by references
		829	will be shared, others will not. While non-reference shared values can be
		830	generated in Perl with some effort, they were considered too unimportant
		831	to be supported in the encoder. The decoder, however, will decode these
		832	values as shared values.
		833
		834	=item 256, 25 (stringref-namespace, stringref, L <http://cbor.schmorp.de/stringref>)
		835
		836	These tags are automatically decoded when encountered. They are only
		837	encoded, however, when C<pack_strings> is enabled.
		838
		839	=item 22098 (indirection, L<http://cbor.schmorp.de/indirection>)
		840
		841	This tag is automatically generated when a reference are encountered (with
		842	the exception of hash and array refernces). It is converted to a reference
		843	when decoding.
		844
		845	=item 55799 (self-describe CBOR, RFC 7049)
		846
		847	This value is not generated on encoding (unless explicitly requested by
		848	the user), and is simply ignored when decoding.
		849
		850	=back
		851
		852	=head2 NON-ENFORCED TAGS
		853
		854	These tags have default filters provided when decoding. Their handling can
		855	be overriden by changing the C<%CBOR::XS::FILTER> entry for the tag, or by
		856	providing a custom C<filter> callback when decoding.
		857
		858	When they result in decoding into a specific Perl class, the module
		859	usually provides a corresponding C<TO_CBOR> method as well.
		860
		861	When any of these need to load additional modules that are not part of the
		862	perl core distribution (e.g. L<URI>), it is (currently) up to the user to
		863	provide these modules. The decoding usually fails with an exception if the
		864	required module cannot be loaded.
		865
		866	=over 4
		867
		868	=item 0, 1 (date/time string, seconds since the epoch)
		869
		870	These tags are decoded into L<Time::Piece> objects. The corresponding
		871	C<Time::Piece::TO_CBOR> method always encodes into tag 1 values currently.
		872
		873	The L<Time::Piece> API is generally surprisingly bad, and fractional
		874	seconds are only accidentally kept intact, so watch out. On the plus side,
		875	the module comes with perl since 5.10, which has to count for something.
		876
		877	=item 2, 3 (positive/negative bignum)
		878
		879	These tags are decoded into L<Math::BigInt> objects. The corresponding
		880	C<Math::BigInt::TO_CBOR> method encodes "small" bigints into normal CBOR
		881	integers, and others into positive/negative CBOR bignums.
		882
		883	=item 4, 5 (decimal fraction/bigfloat)
		884
		885	Both decimal fractions and bigfloats are decoded into L<Math::BigFloat>
		886	objects. The corresponding C<Math::BigFloat::TO_CBOR> method I<always>
		887	encodes into a decimal fraction.
		888
		889	CBOR cannot represent bigfloats with I<very> large exponents - conversion
		890	of such big float objects is undefined.
		891
		892	Also, NaN and infinities are not encoded properly.
		893
		894	=item 21, 22, 23 (expected later JSON conversion)
		895
		896	CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore these
		897	tags.
		898
		899	=item 32 (URI)
		900
		901	These objects decode into L<URI> objects. The corresponding
		902	C<URI::TO_CBOR> method again results in a CBOR URI value.
		903
		904	=back
		905
		906	=cut
		907
		908	our %FILTER = (
		909	# 0 # rfc4287 datetime, utf-8
		910	# 1 # unix timestamp, any
		911
		912	2 => sub { # pos bigint
		913	require Math::BigInt;
		914	Math::BigInt->new ("0x" . unpack "H*", pop)
		915	},
		916
		917	3 => sub { # neg bigint
		918	require Math::BigInt;
		919	-Math::BigInt->new ("0x" . unpack "H*", pop)
		920	},
		921
		922	4 => sub { # decimal fraction, array
		923	require Math::BigFloat;
		924	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		925	},
		926
		927	5 => sub { # bigfloat, array
		928	require Math::BigFloat;
		929	scalar Math::BigFloat->new ($_[1][1])->blsft ($_[1][0], 2)
		930	},
		931
		932	21 => sub { pop }, # expected conversion to base64url encoding
		933	22 => sub { pop }, # expected conversion to base64 encoding
		934	23 => sub { pop }, # expected conversion to base16 encoding
		935
		936	# 24 # embedded cbor, byte string
		937
		938	32 => sub {
		939	require URI;
		940	URI->new (pop)
		941	},
		942
		943	# 33 # base64url rfc4648, utf-8
		944	# 34 # base64 rfc46484, utf-8
		945	# 35 # regex pcre/ecma262, utf-8
		946	# 36 # mime message rfc2045, utf-8
		947	);
		948
556		949
557	=head1 CBOR and JSON	950	=head1 CBOR and JSON
558		951
559	CBOR is supposed to implement a superset of the JSON data model, and is,	952	CBOR is supposed to implement a superset of the JSON data model, and is,
560	with some coercion, able to represent all JSON texts (something that other	953	with some coercion, able to represent all JSON texts (something that other
…		…
621	properly. Half precision types are accepted, but not encoded.	1014	properly. Half precision types are accepted, but not encoded.
622		1015
623	Strict mode and canonical mode are not implemented.	1016	Strict mode and canonical mode are not implemented.
624		1017
625		1018
		1019	=head1 LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
		1020
		1021	On perls that were built without 64 bit integer support (these are rare
		1022	nowadays, even on 32 bit architectures), support for any kind of 64 bit
		1023	integer in CBOR is very limited - most likely, these 64 bit values will
		1024	be truncated, corrupted, or otherwise not decoded correctly. This also
		1025	includes string, array and map sizes that are stored as 64 bit integers.
		1026
		1027
626	=head1 THREADS	1028	=head1 THREADS
627		1029
628	This module is I<not> guaranteed to be thread safe and there are no	1030	This module is I<not> guaranteed to be thread safe and there are no
629	plans to change this until Perl gets thread support (as opposed to the	1031	plans to change this until Perl gets thread support (as opposed to the
630	horribly slow so-called "threads" which are simply slow and bloated	1032	horribly slow so-called "threads" which are simply slow and bloated
…		…
642	Please refrain from using rt.cpan.org or any other bug reporting	1044	Please refrain from using rt.cpan.org or any other bug reporting
643	service. I put the contact address into my modules for a reason.	1045	service. I put the contact address into my modules for a reason.
644		1046
645	=cut	1047	=cut
646		1048
		1049	our %FILTER = (
		1050	0 => sub { # rfc4287 datetime, utf-8
		1051	require Time::Piece;
		1052	# Time::Piece::Strptime uses the "incredibly flexible date parsing routine"
		1053	# from FreeBSD, which can't parse ISO 8601, RFC3339, RFC4287 or much of anything
		1054	# else either. Whats incredibe over standard strptime totally escapes me.
		1055	# doesn't do fractional times, either. sigh.
		1056	# In fact, it's all a lie, it uses whatever strptime it wants, and of course,
		1057	# they are all incomptible. The openbsd one simply ignores %z (but according to the
		1058	# docs, it would be much more incredibly flexible indeed. If it worked, that is.).
		1059	scalar eval {
		1060	my $s = $_[1];
		1061
		1062	$s =~ s/Z$/+00:00/;
		1063	$s =~ s/(\.[0-9]+)?([+-][0-9][0-9]):([0-9][0-9])$//
		1064	or die;
		1065
		1066	my $b = $1 - ($2 * 60 + $3) * 60; # fractional part + offset. hopefully
		1067	my $d = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");
		1068
		1069	Time::Piece::gmtime ($d->epoch + $b)
		1070	} || die "corrupted CBOR date/time string ($_[0])";
		1071	},
		1072
		1073	1 => sub { # seconds since the epoch, possibly fractional
		1074	require Time::Piece;
		1075	scalar Time::Piece::gmtime (pop)
		1076	},
		1077
		1078	2 => sub { # pos bigint
		1079	require Math::BigInt;
		1080	Math::BigInt->new ("0x" . unpack "H*", pop)
		1081	},
		1082
		1083	3 => sub { # neg bigint
		1084	require Math::BigInt;
		1085	-Math::BigInt->new ("0x" . unpack "H*", pop)
		1086	},
		1087
		1088	4 => sub { # decimal fraction, array
		1089	require Math::BigFloat;
		1090	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		1091	},
		1092
		1093	5 => sub { # bigfloat, array
		1094	require Math::BigFloat;
		1095	scalar Math::BigFloat->new ($_[1][1])->blsft ($_[1][0], 2)
		1096	},
		1097
		1098	21 => sub { pop }, # expected conversion to base64url encoding
		1099	22 => sub { pop }, # expected conversion to base64 encoding
		1100	23 => sub { pop }, # expected conversion to base16 encoding
		1101
		1102	# 24 # embedded cbor, byte string
		1103
		1104	32 => sub {
		1105	require URI;
		1106	URI->new (pop)
		1107	},
		1108
		1109	# 33 # base64url rfc4648, utf-8
		1110	# 34 # base64 rfc46484, utf-8
		1111	# 35 # regex pcre/ecma262, utf-8
		1112	# 36 # mime message rfc2045, utf-8
		1113	);
		1114
		1115	sub CBOR::XS::default_filter {
		1116	&{ $FILTER{$_[0]} or return }
		1117	}
		1118
		1119	sub URI::TO_CBOR {
		1120	my $uri = $_[0]->as_string;
		1121	utf8::upgrade $uri;
		1122	tag 32, $uri
		1123	}
		1124
		1125	sub Math::BigInt::TO_CBOR {
		1126	if ($_[0] >= -2147483648 && $_[0] <= 2147483647) {
		1127	$_[0]->numify
		1128	} else {
		1129	my $hex = substr $_[0]->as_hex, 2;
		1130	$hex = "0$hex" if 1 & length $hex; # sigh
		1131	tag $_[0] >= 0 ? 2 : 3, pack "H*", $hex
		1132	}
		1133	}
		1134
		1135	sub Math::BigFloat::TO_CBOR {
		1136	my ($m, $e) = $_[0]->parts;
		1137	tag 4, [$e->numify, $m]
		1138	}
		1139
		1140	sub Time::Piece::TO_CBOR {
		1141	tag 1, 0 + $_[0]->epoch
		1142	}
		1143
647	XSLoader::load "CBOR::XS", $VERSION;	1144	XSLoader::load "CBOR::XS", $VERSION;
648		1145
649	=head1 SEE ALSO	1146	=head1 SEE ALSO
650		1147
651	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,	1148	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
