[ViewVC] Diff of: cvs/CBOR-XS/XS.pm

Comparing CBOR-XS/XS.pm (file contents):
Revision 1.7 by root, Sun Oct 27 22:35:15 2013 UTC vs.
Revision 1.48 by root, Thu Feb 25 14:22:49 2016 UTC

…		…
26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string	26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27	}	27	}
28		28
29	=head1 DESCRIPTION	29	=head1 DESCRIPTION
30		30
31	WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
32	AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
33	feature-limited, it might already be useful).
34
35	This module converts Perl data structures to the Concise Binary Object	31	This module converts Perl data structures to the Concise Binary Object
36	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation	32	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
37	format that aims to use a superset of the JSON data model, i.e. when you	33	format that aims to use an (almost) superset of the JSON data model, i.e.
38	can represent something in JSON, you should be able to represent it in	34	when you can represent something useful in JSON, you should be able to
39	CBOR.	35	represent it in CBOR.
40		36
41	This makes it a faster and more compact binary alternative to JSON, with	37	In short, CBOR is a faster and quite compact binary alternative to JSON,
42	the added ability of supporting serialising of perl objects.	38	with the added ability of supporting serialisation of Perl objects. (JSON
		39	often compresses better than CBOR though, so if you plan to compress the
		40	data later and speed is less important you might want to compare both
		41	formats first).
		42
		43	To give you a general idea about speed, with texts in the megabyte range,
		44	C<CBOR::XS> usually encodes roughly twice as fast as L<Storable> or
		45	L<JSON::XS> and decodes about 15%-30% faster than those. The shorter the
		46	data, the worse L<Storable> performs in comparison.
		47
		48	Regarding compactness, C<CBOR::XS>-encoded data structures are usually
		49	about 20% smaller than the same data encoded as (compact) JSON or
		50	L<Storable>.
		51
		52	In addition to the core CBOR data format, this module implements a
		53	number of extensions, to support cyclic and shared data structures
		54	(see C<allow_sharing> and C<allow_cycles>), string deduplication (see
		55	C<pack_strings>) and scalar references (always enabled).
43		56
44	The primary goal of this module is to be I<correct> and the secondary goal	57	The primary goal of this module is to be I<correct> and the secondary goal
45	is to be I<fast>. To reach the latter goal it was written in C.	58	is to be I<fast>. To reach the latter goal it was written in C.
46		59
47	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and	60	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
…		…
51		64
52	package CBOR::XS;	65	package CBOR::XS;
53		66
54	use common::sense;	67	use common::sense;
55		68
56	our $VERSION = 0.03;	69	our $VERSION = 1.41;
57	our @ISA = qw(Exporter);	70	our @ISA = qw(Exporter);
58		71
59	our @EXPORT = qw(encode_cbor decode_cbor);	72	our @EXPORT = qw(encode_cbor decode_cbor);
60		73
61	use Exporter;	74	use Exporter;
…		…
98	strings. All boolean flags described below are by default I<disabled>.	111	strings. All boolean flags described below are by default I<disabled>.
99		112
100	The mutators for flags all return the CBOR object again and thus calls can	113	The mutators for flags all return the CBOR object again and thus calls can
101	be chained:	114	be chained:
102		115
103	#TODO
104	my $cbor = CBOR::XS->new->encode ({a => [1,2]});	116	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
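A minimal sketch of chaining, assuming C<CBOR::XS> is installed. The depth limit of C<2> is an arbitrary value chosen for illustration. Because every mutator returns the object, settings can be combined in one expression and read back with the C<get_> accessors:

```perl
use strict;
use warnings;
use CBOR::XS;

# Each mutator returns the object, so calls can be chained.
my $cbor = CBOR::XS->new->max_depth (2)->validate_utf8;

# One level of nesting is well within the limit, so this round-trips.
my $shallow = $cbor->decode ($cbor->encode ([1]));

# Four levels exceed the limit of 2, so decode throws an exception.
my $deep_ok = eval { $cbor->decode (encode_cbor [[[[1]]]]); 1 };

print "max_depth: ", $cbor->get_max_depth, "\n";
print "deeply nested data rejected\n" unless $deep_ok;
```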
105		117
106	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])	118	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
107		119
108	=item $max_depth = $cbor->get_max_depth	120	=item $max_depth = $cbor->get_max_depth
…		…
142	If no argument is given, the limit check will be deactivated (same as when	154	If no argument is given, the limit check will be deactivated (same as when
143	C<0> is specified).	155	C<0> is specified).
144		156
145	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.	157	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
146		158
		159	=item $cbor = $cbor->allow_unknown ([$enable])
		160
		161	=item $enabled = $cbor->get_allow_unknown
		162
		163	If C<$enable> is true (or missing), then C<encode> will I<not> throw an
		164	exception when it encounters values it cannot represent in CBOR (for
		165	example, filehandles) but instead will encode a CBOR C<error> value.
		166
		167	If C<$enable> is false (the default), then C<encode> will throw an
		168	exception when it encounters anything it cannot encode as CBOR.
		169
		170	This option does not affect C<decode> in any way, and it is recommended to
		171	leave it off unless you know your communications partner.
		172
		173	=item $cbor = $cbor->allow_sharing ([$enable])
		174
		175	=item $enabled = $cbor->get_allow_sharing
		176
		177	If C<$enable> is true (or missing), then C<encode> will not double-encode
		178	values that have been referenced before (e.g. when the same object, such
		179	as an array, is referenced multiple times), but instead will emit a
		180	reference to the earlier value.
		181
		182	This means that such values will only be encoded once, and will not result
		183	in a deep cloning of the value on decode, in decoders supporting the value
		184	sharing extension. This also makes it possible to encode cyclic data
		185	structures (which need C<allow_cycles> to be enabled to be decoded by this
		186	module).
		187
		188	It is recommended to leave it off unless you know your
		189	communication partner supports the value sharing extensions to CBOR
		190	(L<http://cbor.schmorp.de/value-sharing>), as without decoder support, the
		191	resulting data structure might be unusable.
		192
		193	Detecting shared values incurs a runtime overhead when values are encoded
		194	that have a reference counter larger than one, and might unnecessarily
		195	increase the encoded size, as potentially shared values are encoded as
		196	shareable whether or not they are actually shared.
		197
		198	At the moment, only targets of references can be shared (e.g. scalars,
		199	arrays or hashes pointed to by a reference). Weirder constructs, such as
		200	an array with multiple "copies" of the I<same> string, which are hard but
		201	not impossible to create in Perl, are not supported (this is the same as
		202	with L<Storable>).
		203
		204	If C<$enable> is false (the default), then C<encode> will encode shared
		205	data structures repeatedly, unsharing them in the process. Cyclic data
		206	structures cannot be encoded in this mode.
		207
		208	This option does not affect C<decode> in any way - shared values and
		209	references will always be decoded properly if present.
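A short sketch of the difference, decoding both outputs with the default decoder. The same array is referenced twice, and only the C<allow_sharing> output decodes back into two references to a single array (the check uses Perl's numeric reference comparison). For small values like this one, the shared encoding is not necessarily shorter:

```perl
use strict;
use warnings;
use CBOR::XS;

my $shared = [1, 2, 3];
my $data   = [$shared, $shared];    # the same array referenced twice

my $plain   = CBOR::XS->new->encode ($data);                  # duplicated
my $sharing = CBOR::XS->new->allow_sharing->encode ($data);   # shared

my $copy = decode_cbor $plain;
my $same = decode_cbor $sharing;

# Without sharing, the decoder builds two independent arrays;
# with sharing, both elements refer to one array again.
print "plain:   ", ($copy->[0] == $copy->[1] ? "shared" : "copied"), "\n";
print "sharing: ", ($same->[0] == $same->[1] ? "shared" : "copied"), "\n";
```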
		210
		211	=item $cbor = $cbor->allow_cycles ([$enable])
		212
		213	=item $enabled = $cbor->get_allow_cycles
		214
		215	If C<$enable> is true (or missing), then C<decode> will happily decode
		216	self-referential (cyclic) data structures. By default these will not be
		217	decoded, as they need manual cleanup to avoid memory leaks, so code that
		218	isn't prepared for this will not leak memory.
		219
		220	If C<$enable> is false (the default), then C<decode> will throw an error
		221	when it encounters a self-referential/cyclic data structure.
		222
		223	FUTURE DIRECTION: the motivation behind this option is to avoid I<real>
		224	cycles - future versions of this module might choose to decode cyclic data
		225	structures using weak references when this option is off, instead of
		226	throwing an error.
		227
		228	This option does not affect C<encode> in any way - shared values and
		229	references will always be encoded properly if present.
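A minimal sketch of a cyclic round trip, following the behaviour described above: C<allow_sharing> is needed to encode the cycle, the default decoder refuses it, and C<allow_cycles> rebuilds it. The cycle then has to be broken by hand:

```perl
use strict;
use warnings;
use CBOR::XS;

# Build a cyclic structure: the array contains a reference to itself.
my $cyclic = [];
push @$cyclic, $cyclic;

# Encoding a cycle requires allow_sharing.
my $bytes = CBOR::XS->new->allow_sharing->encode ($cyclic);

# By default, decode refuses cyclic data and throws an exception.
my $refused = !eval { CBOR::XS->new->decode ($bytes); 1 };

# With allow_cycles, the cycle is rebuilt...
my $decoded  = CBOR::XS->new->allow_cycles->decode ($bytes);
my $is_cycle = $decoded->[0] == $decoded;

# ...and must be broken manually, otherwise the memory is never freed.
@$decoded = ();
@$cyclic  = ();
```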
		230
		231	=item $cbor = $cbor->pack_strings ([$enable])
		232
		233	=item $enabled = $cbor->get_pack_strings
		234
		235	If C<$enable> is true (or missing), then C<encode> will try not to encode
		236	the same string twice, but will instead encode a reference to the earlier
		237	copy. Depending on your data format, this can save a lot of space, but
		238	also results in a very large runtime overhead (expect encoding times to be
		239	2-4 times as high as without).
		240
		241	It is recommended to leave it off unless you know your
		242	communications partner supports the stringref extension to CBOR
		243	(L<http://cbor.schmorp.de/stringref>), as without decoder support, the
		244	resulting data structure might not be usable.
		245
		246	If C<$enable> is false (the default), then C<encode> will encode strings
		247	the standard CBOR way.
		248
		249	This option does not affect C<decode> in any way - string references will
		250	always be decoded properly if present.
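A rough sketch of the space saving. The repeated string and the count of ten are arbitrary, and real savings depend on how repetitive your data is:

```perl
use strict;
use warnings;
use CBOR::XS;

# Ten copies of the same, reasonably long string.
my $data = [("a fairly long repeated string") x 10];

my $plain  = CBOR::XS->new->encode ($data);
my $packed = CBOR::XS->new->pack_strings->encode ($data);

printf "plain: %d bytes, packed: %d bytes\n", length $plain, length $packed;

# Decoding needs no option: string references are always understood.
my $decoded = decode_cbor $packed;
```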
		251
		252	=item $cbor = $cbor->validate_utf8 ([$enable])
		253
		254	=item $enabled = $cbor->get_validate_utf8
		255
		256	If C<$enable> is true (or missing), then C<decode> will validate that
		257	elements (text strings) containing UTF-8 data in fact contain valid UTF-8
		258	data (instead of blindly accepting it). This validation obviously takes
		259	extra time during decoding.
		260
		261	The concept of "valid UTF-8" used is perl's concept, which is a superset
		262	of the official UTF-8.
		263
		264	If C<$enable> is false (the default), then C<decode> will blindly accept
		265	UTF-8 data, marking them as valid UTF-8 in the resulting data structure
		266	regardless of whether that's true or not.
		267
		268	Perl isn't too happy about corrupted UTF-8 in strings, but should
		269	generally not crash or do similarly evil things. Extensions might not be
		270	so forgiving, so it's recommended to turn on this setting if you receive
		271	untrusted CBOR.
		272
		273	This option does not affect C<encode> in any way - strings that are
		274	supposedly valid UTF-8 will simply be dumped into the resulting CBOR
		275	string without checking whether that is, in fact, true or not.
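A small sketch using a hand-built CBOR text string (major type 3) of length one. Its only octet, C<0x80>, is a lone UTF-8 continuation byte and therefore can never be valid UTF-8:

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x61 = text string of length 1, followed by the invalid octet 0x80.
my $bad = "\x61\x80";

# The default decoder accepts it blindly...
my $lenient_ok = eval { CBOR::XS->new->decode ($bad); 1 };

# ...while a validating decoder throws an exception.
my $strict_ok = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 };
my $error = $@;

print "strict decoder said: $error" unless $strict_ok;
```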
		276
		277	=item $cbor = $cbor->filter ([$cb->($tag, $value)])
		278
		279	=item $cb_or_undef = $cbor->get_filter
		280
		281	Sets or replaces the tagged value decoding filter (when C<$cb> is
		282	specified) or clears the filter (if no argument or C<undef> is provided).
		283
		284	The filter callback is called only during decoding, when a non-enforced
		285	tagged value has been decoded (see L<TAG HANDLING AND EXTENSIONS> for a
		286	list of enforced tags). For specific tags, it's often better to provide a
		287	default converter using the C<%CBOR::XS::FILTER> hash (see below).
		288
		289	The first argument is the numerical tag, the second is the (decoded) value
		290	that has been tagged.
		291
		292	The filter function should return either exactly one value, which will
		293	replace the tagged value in the decoded data structure, or no values,
		294	which will result in default handling, which currently means the decoder
		295	creates a C<CBOR::XS::Tagged> object to hold the tag and the value.
		296
		297	When the filter is cleared (the default state), the default filter
		298	function, C<CBOR::XS::default_filter>, is used. This function simply looks
		299	up the tag in the C<%CBOR::XS::FILTER> hash. If an entry exists it must be
		300	a code reference that is called with tag and value, and is responsible for
		301	decoding the value. If no entry exists, it returns no values.
		302
		303	Example: decode all tags not handled internally into C<CBOR::XS::Tagged>
		304	objects, with no other special handling (useful when working with
		305	potentially "unsafe" CBOR data).
		306
		307	CBOR::XS->new->filter (sub { })->decode ($cbor_data);
		308
		309	Example: provide a global filter for tag 1347375694, converting the value
		310	into some string form.
		311
		312	$CBOR::XS::FILTER{1347375694} = sub {
		313	my ($tag, $value) = @_;
		314
		315	"tag 1347375694 value $value"
		316	};
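A sketch of a per-decoder filter, reusing the tag number from the example above. It converts that one tag and returns no values for all others, so any other non-enforced tag still becomes a C<CBOR::XS::Tagged> object:

```perl
use strict;
use warnings;
use CBOR::XS;

my $bytes = encode_cbor (CBOR::XS::tag (1347375694, "hello"));

# Handle one tag; returning no values selects the default handling.
my $decoder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return "tag $tag value $value" if $tag == 1347375694;
   return;
});

my $converted = $decoder->decode ($bytes);

# An empty filter leaves the value as a CBOR::XS::Tagged object.
my $raw = CBOR::XS->new->filter (sub { })->decode ($bytes);
```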
		317
147	=item $cbor_data = $cbor->encode ($perl_scalar)	318	=item $cbor_data = $cbor->encode ($perl_scalar)
148		319
149	Converts the given Perl data structure (a scalar value) to its CBOR	320	Converts the given Perl data structure (a scalar value) to its CBOR
150	representation.	321	representation.
151		322
…		…
164	and you need to know where the first CBOR string ends and the next one	335	and you need to know where the first CBOR string ends and the next one
165	starts.	336	starts.
166		337
167	CBOR::XS->new->decode_prefix ("......")	338	CBOR::XS->new->decode_prefix ("......")
168	=> ("...", 3)	339	=> ("...", 3)
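A sketch of splitting back-to-back CBOR values with C<decode_prefix>, in the spirit of the loop shown in the synopsis:

```perl
use strict;
use warnings;
use CBOR::XS;

# Two CBOR values stored back to back, with no separator.
my $stream = encode_cbor ([1, 2]) . encode_cbor ({ a => 3 });

my $cbor = CBOR::XS->new;
my @values;

while (length $stream) {
   # decode_prefix returns the first value and how many octets it used.
   my ($value, $length) = $cbor->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $length, "";    # remove the decoded value
}
```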
		340
		341	=back
		342
		343	=head2 INCREMENTAL PARSING
		344
		345	In some cases, there is the need for incremental parsing of CBOR
		346	texts. While this module always has to keep both CBOR text and resulting
		347	Perl data structure in memory at one time, it does allow you to parse a
		348	CBOR stream incrementally, in a way similar to using "decode_prefix" to see
		349	if a full CBOR object is available, but much more efficiently.
		350
		351	It basically works by parsing as much of a CBOR string as possible - if
		352	the CBOR data is not complete yet, the parser will remember where it was,
		353	to be able to restart when more data has been accumulated. Once enough
		354	data is available to either decode a complete CBOR value or raise an
		355	error, a real decode will be attempted.
		356
		357	A typical use case would be a network protocol that consists of sending
		358	and receiving CBOR-encoded messages. The solution that works with CBOR and
		359	about anything else is to prepend a length to every CBOR value, so the
		360	receiver knows how many octets to read. More compact (and slightly slower)
		361	would be to just send CBOR values back-to-back, as C<CBOR::XS> knows where
		362	a CBOR value ends, and doesn't need an explicit length.
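A minimal sketch of the length-prefix approach. The helper names C<frame> and C<unframe> are hypothetical, and the 32-bit big-endian length produced by C<pack "N/a*"> is just one possible choice of prefix:

```perl
use strict;
use warnings;
use CBOR::XS;

# Sender (hypothetical helper): length as 32-bit big-endian, then the CBOR octets.
sub frame { pack "N/a*", encode_cbor $_[0] }

# Receiver (hypothetical helper): if a complete frame is buffered, remove it
# from the caller's buffer and return the decoded value, else return nothing.
sub unframe {
   my $buffer = \$_[0];    # alias the caller's buffer so it can be modified
   return unless length $$buffer >= 4;
   my $length = unpack "N", $$buffer;
   return unless length $$buffer >= 4 + $length;
   my $value = decode_cbor (substr ($$buffer, 4, $length));
   substr $$buffer, 0, 4 + $length, "";
   return $value;
}

my $wire    = frame ([1, 2, 3]) . frame ("hi");
my $partial = substr $wire, 0, 5;    # a truncated frame
my @none    = unframe ($partial);    # incomplete, so nothing yet
my $first   = unframe ($wire);
my $second  = unframe ($wire);
```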
		363
		364	The following methods help with this:
		365
		366	=over 4
		367
		368	=item @decoded = $cbor->incr_parse ($buffer)
		369
		370	This method attempts to decode exactly one CBOR value from the beginning
		371	of the given C<$buffer>. The value is removed from the C<$buffer> on
		372	success. When C<$buffer> doesn't contain a complete value yet, it returns
		373	nothing. Finally, when the C<$buffer> doesn't start with something
		374	that could ever be a valid CBOR value, it raises an exception, just as
		375	C<decode> would. In the latter case the decoder state is undefined and
		376	must be reset before being able to parse further.
		377
		378	This method modifies the C<$buffer> in place. When no CBOR value can be
		379	decoded, the decoder stores the current string offset. On the next call,
		380	it continues decoding at the place where it stopped before. For this to make
		381	sense, the C<$buffer> must begin with the same octets as on previous
		382	unsuccessful calls.
		383
		384	You can call this method in scalar context, in which case it either
		385	returns a decoded value or C<undef>. This makes it impossible to
		386	distinguish between CBOR null values (which decode to C<undef>) and an
		387	unsuccessful decode, which is often acceptable.
		388
		389	=item @decoded = $cbor->incr_parse_multiple ($buffer)
		390
		391	Same as C<incr_parse>, but attempts to decode as many CBOR values as
		392	possible in one go, instead of at most one. Calls to C<incr_parse> and
		393	C<incr_parse_multiple> can be interleaved.
		394
		395	=item $cbor->incr_reset
		396
		397	Resets the incremental decoder. This throws away any saved state, so that
		398	subsequent calls to C<incr_parse> or C<incr_parse_multiple> start to parse
		399	a new CBOR value from the beginning of the C<$buffer> again.
		400
		401	This method can be called at any time, but it I<must> be called if you want
		402	to change your C<$buffer> or there was a decoding error and you want to
		403	reuse the C<$cbor> object for future incremental parsings.
169		404
170	=back	405	=back
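A sketch of the back-to-back approach using C<incr_parse_multiple>. Feeding one octet at a time only simulates data trickling in over a network connection:

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor   = CBOR::XS->new;
my $stream = encode_cbor ([1, 2, 3]) . encode_cbor ("hello");

my $buffer = "";
my @received;

# Append one octet at a time, as if it arrived from a socket.
for my $octet (split //, $stream) {
   $buffer .= $octet;
   # Returns every value completed so far and removes it from $buffer.
   push @received, $cbor->incr_parse_multiple ($buffer);
}

print scalar (@received), " values received\n";
```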
171		406
172		407
173	=head1 MAPPING	408	=head1 MAPPING
…		…
191	CBOR integers become (numeric) perl scalars. On perls without 64 bit	426	CBOR integers become (numeric) perl scalars. On perls without 64 bit
192	support, 64 bit integers will be truncated or otherwise corrupted.	427	support, 64 bit integers will be truncated or otherwise corrupted.
193		428
194	=item byte strings	429	=item byte strings
195		430
196	Byte strings will become octet strings in Perl (the byte values 0..255	431	Byte strings will become octet strings in Perl (the byte values 0..255
197	will simply become characters of the same value in Perl).	432	will simply become characters of the same value in Perl).
198		433
199	=item UTF-8 strings	434	=item UTF-8 strings
200		435
201	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be	436	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
…		…
219	C<Types::Serialiser::false> and C<Types::Serialiser::error>,	454	C<Types::Serialiser::false> and C<Types::Serialiser::error>,
220	respectively. They are overloaded to act almost exactly like the numbers	455	respectively. They are overloaded to act almost exactly like the numbers
221	C<1> and C<0> (for true and false) or to throw an exception on access (for	456	C<1> and C<0> (for true and false) or to throw an exception on access (for
222	error). See the L<Types::Serialiser> manpage for details.	457	error). See the L<Types::Serialiser> manpage for details.
223		458
224	=item CBOR tag 256 (perl object)	459	=item tagged values
225		460
226	The tag value C<256> (TODO: pending iana registration) will be used
227	to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
228	SERIALISATION", below, for details.
229
230	=item CBOR tag 55799 (magic header)
231
232	The tag 55799 is ignored (this tag implements the magic header).
233
234	=item other CBOR tags
235
236	Tagged items consist of a numeric tag and another CBOR value. Tags not	461	Tagged items consist of a numeric tag and another CBOR value.
237	handled internally are currently converted into a L<CBOR::XS::Tagged>
238	object, which is simply a blessed array reference consisting of the
239	numeric tag value followed by the (decoded) CBOR value.
240		462
241	In the future, support for user-supplied conversions might get added.	463	See L<TAG HANDLING AND EXTENSIONS> and the description of C<< ->filter >>
		464	for details on which tags are handled how.
242		465
243	=item anything else	466	=item anything else
244		467
245	Anything else (e.g. unsupported simple values) will raise a decoding	468	Anything else (e.g. unsupported simple values) will raise a decoding
246	error.	469	error.
…		…
249		472
250		473
251	=head2 PERL -> CBOR	474	=head2 PERL -> CBOR
252		475
253	The mapping from Perl to CBOR is slightly more difficult, as Perl is a	476	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
254	truly typeless language, so we can only guess which CBOR type is meant by	477	typeless language. That means this module can only guess which CBOR type
255	a Perl value.	478	is meant by a perl value.
256		479
257	=over 4	480	=over 4
258		481
259	=item hash references	482	=item hash references
260		483
261	Perl hash references become CBOR maps. As there is no inherent ordering in	484	Perl hash references become CBOR maps. As there is no inherent ordering in
262	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random	485	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
263	order.	486	order. This order can be different each time a hash is encoded.
264		487
265	Currently, tied hashes will use the indefinite-length format, while normal	488	Currently, tied hashes will use the indefinite-length format, while normal
266	hashes will use the fixed-length format.	489	hashes will use the fixed-length format.
267		490
268	=item array references	491	=item array references
269		492
270	Perl array references become fixed-length CBOR arrays.	493	Perl array references become fixed-length CBOR arrays.
271		494
272	=item other references	495	=item other references
273		496
274	Other unblessed references are generally not allowed and will cause an	497	Other unblessed references will be represented using
275	exception to be thrown, except for references to the integers C<0> and	498	the indirection tag extension (tag value C<22098>,
276	C<1>, which get turned into false and true in CBOR.	499	L<http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
		500	to be able to decode these values somehow, by either "doing the right
		501	thing", decoding into a generic tagged object, simply ignoring the tag, or
		502	something else.
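A sketch of the indirection tag on a reference to a plain string, assuming the decoder turns tag C<22098> back into a scalar reference. The expected octets follow from the CBOR encoding rules: C<22098> is C<0x5652>, which needs a two-byte tag header (C<0xd9>):

```perl
use strict;
use warnings;
use CBOR::XS;

# A reference to a string is neither a hash nor an array reference,
# so it is encoded with the indirection tag 22098.
my $bytes = encode_cbor \"hello";

# Tag header d9 56 52, then byte string header 0x45 (length 5) and "hello".
my $hex = unpack "H*", $bytes;

# Decoding turns the tag back into a scalar reference.
my $ref = decode_cbor $bytes;
```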
277		503
278	=item CBOR::XS::Tagged objects	504	=item CBOR::XS::Tagged objects
279		505
280	Objects of this type must be arrays consisting of a single C<[tag, value]>	506	Objects of this type must be arrays consisting of a single C<[tag, value]>
281	pair. The (numerical) tag will be encoded as a CBOR tag, the value will be	507	pair. The (numerical) tag will be encoded as a CBOR tag, the value will
282	encoded as appropriate for the value.	508	be encoded as appropriate for the value. You must use C<CBOR::XS::tag> to
		509	create such objects.
283		510
284	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error	511	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
285		512
286	These special values become CBOR true, CBOR false and CBOR undefined	513	These special values become CBOR true, CBOR false and CBOR undefined
287	values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly	514	values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
288	if you want.	515	if you want.
289		516
290	=item other blessed objects	517	=item other blessed objects
291		518
292	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See	519	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
293	"OBJECT SERIALISATION", below, for details.	520	L<TAG HANDLING AND EXTENSIONS> for specific classes handled by this
		521	module, and L<OBJECT SERIALISATION> for generic object serialisation.
294		522
295	=item simple scalars	523	=item simple scalars
296		524
297	TODO
298	Simple Perl scalars (any scalar that is not a reference) are the most	525	Simple Perl scalars (any scalar that is not a reference) are the most
299	difficult objects to encode: CBOR::XS will encode undefined scalars as	526	difficult objects to encode: CBOR::XS will encode undefined scalars as
300	CBOR null values, scalars that have last been used in a string context	527	CBOR null values, scalars that have last been used in a string context
301	before encoding as CBOR strings, and anything else as number value:	528	before encoding as CBOR strings, and anything else as number value:
302		529
303	# dump as number	530	# dump as number
304	encode_cbor [2] # yields [2]	531	encode_cbor [2] # yields [2]
305	encode_cbor [-3.0e17] # yields [-3e+17]	532	encode_cbor [-3.0e17] # yields [-3e+17]
306	my $value = 5; encode_cbor [$value] # yields [5]	533	my $value = 5; encode_cbor [$value] # yields [5]
307		534
308	# used as string, so dump as string	535	# used as string, so dump as string (either byte or text)
309	print $value;	536	print $value;
310	encode_cbor [$value] # yields ["5"]	537	encode_cbor [$value] # yields ["5"]
311		538
312	# undef becomes null	539	# undef becomes null
313	encode_cbor [undef] # yields [null]	540	encode_cbor [undef] # yields [null]
…		…
316		543
317	my $x = 3.1; # some variable containing a number	544	my $x = 3.1; # some variable containing a number
318	"$x"; # stringified	545	"$x"; # stringified
319	$x .= ""; # another, more awkward way to stringify	546	$x .= ""; # another, more awkward way to stringify
320	print $x; # perl does it for you, too, quite often	547	print $x; # perl does it for you, too, quite often
		548
		549	You can force whether a string is encoded as byte or text string by using
		550	C<utf8::upgrade> and C<utf8::downgrade>:
		551
		552	utf8::upgrade $x; # encode $x as text string
		553	utf8::downgrade $x; # encode $x as byte string
		554
		555	Perl doesn't define what operations up- and downgrade strings, so if the
		556	difference between byte and text is important, you should up- or downgrade
		557	your string as late as possible before encoding.
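A quick check of the resulting CBOR major types. The expected hex follows from the CBOR header byte: C<0x60> plus the length for text strings, C<0x40> plus the length for byte strings:

```perl
use strict;
use warnings;
use CBOR::XS;

my $x = "abc";

utf8::upgrade $x;                          # mark as characters
my $text = unpack "H*", encode_cbor $x;    # 0x63 = text string, length 3

utf8::downgrade $x;                        # mark as octets
my $bytes = unpack "H*", encode_cbor $x;   # 0x43 = byte string, length 3
```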
321		558
322	You can force the type to be a CBOR number by numifying it:	559	You can force the type to be a CBOR number by numifying it:
323		560
324	my $x = "3"; # some variable containing a string	561	my $x = "3"; # some variable containing a string
325	$x += 0; # numify it, ensuring it will be dumped as a number	562	$x += 0; # numify it, ensuring it will be dumped as a number
…		…
338		575
339	=back	576	=back
340		577
341	=head2 OBJECT SERIALISATION	578	=head2 OBJECT SERIALISATION
342		579
		580	This module implements both a CBOR-specific and the generic
		581	L<Types::Serialiser> object serialisation protocol. The following
		582	subsections explain both methods.
		583
		584	=head3 ENCODING
		585
343	This module knows two ways to serialise a Perl object: The CBOR-specific	586	This module knows two ways to serialise a Perl object: The CBOR-specific
344	way, and the generic way.	587	way, and the generic way.
345		588
346	Whenever the encoder encounters a Perl object that it cnanot serialise	589	Whenever the encoder encounters a Perl object that it cannot serialise
347	directly (most of them), it will first look up the C<TO_CBOR> method on	590	directly (most of them), it will first look up the C<TO_CBOR> method on
348	it.	591	it.
349		592
350	If it has a C<TO_CBOR> method, it will call it with the object as only	593	If it has a C<TO_CBOR> method, it will call it with the object as only
351	argument, and expects exactly one return value, which it will then	594	argument, and expects exactly one return value, which it will then
…		…
357		600
358	The C<FREEZE> method can return any number of values (i.e. zero or	601	The C<FREEZE> method can return any number of values (i.e. zero or
359	more). These will be encoded as CBOR perl object, together with the	602	more). These will be encoded as CBOR perl object, together with the
360	classname.	603	classname.
361		604
		605	These methods I<MUST NOT> change the data structure that is being
		606	serialised. Failure to comply with this can result in memory corruption -
		607	and worse.
		608
362	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail	609	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
363	with an error.	610	with an error.
364		611
		612	=head3 DECODING
		613
365	Objects encoded via C<TO_CBOR> cannot be automatically decoded, but	614	Objects encoded via C<TO_CBOR> cannot (normally) be automatically decoded,
366	objects encoded via C<FREEZE> can be decoded using the following protocol:	615	but objects encoded via C<FREEZE> can be decoded using the following
		616	protocol:
367		617
368	When an encoded CBOR perl object is encountered by the decoder, it will	618	When an encoded CBOR perl object is encountered by the decoder, it will
369	look up the C<THAW> method, by using the stored classname, and will fail	619	look up the C<THAW> method, by using the stored classname, and will fail
370	if the method cannot be found.	620	if the method cannot be found.
371		621
372	After the lookup it will call the C<THAW> method with the stored classname	622	After the lookup it will call the C<THAW> method with the stored classname
373	as first argument, the constant string C<CBOR> as second argument, and all	623	as first argument, the constant string C<CBOR> as second argument, and all
374	values returned by C<FREEZE> as remaining arguments.	624	values returned by C<FREEZE> as remaining arguments.
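A minimal sketch of a round trip through C<FREEZE> and C<THAW>. C<My::Point> is a hypothetical class, and the sketch assumes C<FREEZE> receives the serialiser name (C<CBOR>) as its only argument besides the object, as in the L<Types::Serialiser> protocol:

```perl
use strict;
use warnings;
use CBOR::XS;

{
   # Hypothetical example class implementing the FREEZE/THAW protocol.
   package My::Point;

   sub new { my ($class, $x, $y) = @_; bless { x => $x, y => $y }, $class }

   # Called as $obj->FREEZE ("CBOR"); returns the values to serialise.
   sub FREEZE {
      my ($self, $serialiser) = @_;
      return ($self->{x}, $self->{y});
   }

   # Called as My::Point->THAW ("CBOR", @values_returned_by_FREEZE).
   sub THAW {
      my ($class, $serialiser, $x, $y) = @_;
      return $class->new ($x, $y);
   }
}

my $bytes = encode_cbor (My::Point->new (3, 4));
my $point = decode_cbor $bytes;
```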
375		625
376	=head4 EXAMPLES	626	=head3 EXAMPLES
377		627
378	Here is an example C<TO_CBOR> method:	628	Here is an example C<TO_CBOR> method:
379		629
380	sub My::Object::TO_CBOR {	630	sub My::Object::TO_CBOR {
381	my ($obj) = @_;	631	my ($obj) = @_;
…		…
392		642
393	sub URI::TO_CBOR {	643	sub URI::TO_CBOR {
394	my ($self) = @_;	644	my ($self) = @_;
395	my $uri = "$self"; # stringify uri	645	my $uri = "$self"; # stringify uri
396	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string	646	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
397	CBOR::XS::tagged 32, "$_[0]"	647	CBOR::XS::tag 32, $uri
398	}	648	}
399		649
400	This will encode URIs as a UTF-8 string with tag 32, which indicates a	650	This will encode URIs as a UTF-8 string with tag 32, which indicates a
401	URI.	651	URI.
402		652
…		…
439	=head1 MAGIC HEADER	689	=head1 MAGIC HEADER
440		690
441	There is no way to distinguish CBOR from other formats	691	There is no way to distinguish CBOR from other formats
442	programmatically. To make it easier to distinguish CBOR from other	692	programmatically. To make it easier to distinguish CBOR from other
443	formats, the CBOR specification has a special "magic string" that can be	693	formats, the CBOR specification has a special "magic string" that can be
444	prepended to any CBOR string without changing it's meaning.	694	prepended to any CBOR string without changing its meaning.
445		695
446	This string is available as C<$CBOR::XS::MAGIC>. This module does not	696	This string is available as C<$CBOR::XS::MAGIC>. This module does not
447	prepend this string tot he CBOR data it generates, but it will ignroe it	697	prepend this string to the CBOR data it generates, but it will ignore it
448	if present, so users can prepend this string as a "file type" indicator as	698	if present, so users can prepend this string as a "file type" indicator as
449	required.	699	required.
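A sketch of using the magic header as a "file type" indicator. The check simply compares the leading octets with C<$CBOR::XS::MAGIC>, which is the encoded header of tag C<55799> (C<0xd9f7>):

```perl
use strict;
use warnings;
use CBOR::XS;

# Prepend the magic header to the encoded data.
my $file = $CBOR::XS::MAGIC . encode_cbor ({ answer => 42 });

# A reader can sniff the header...
my $looks_like_cbor =
   substr ($file, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC;

# ...and decode as usual, since the decoder ignores it.
my $data = decode_cbor $file;
```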
		700
		701
		702	=head1 THE CBOR::XS::Tagged CLASS
		703
		704	CBOR has the concept of tagged values - any CBOR value can be tagged with
		705	a numeric 64 bit tag, and tag numbers are centrally administered.
		706
		707	C<CBOR::XS> handles a few tags internally when en- or decoding. You can
		708	also create tags yourself by encoding C<CBOR::XS::Tagged> objects, and the
		709	decoder will create C<CBOR::XS::Tagged> objects itself when it hits an
		710	unknown tag.
		711
		712	These objects are simply blessed array references - the first member of
		713	the array being the numerical tag, the second being the value.
		714
		715	You can interact with C<CBOR::XS::Tagged> objects in the following ways:
		716
		717	=over 4
		718
		719	=item $tagged = CBOR::XS::tag $tag, $value
		720
		721	This function(!) creates a new C<CBOR::XS::Tagged> object using the given
		722	C<$tag> (0..2**64-1) to tag the given C<$value> (which can be any Perl
		723	value that can be encoded in CBOR, including serialisable Perl objects and
		724	C<CBOR::XS::Tagged> objects).
		725
		726	=item $tagged->[0]
		727
		728	=item $tagged->[0] = $new_tag
		729
		730	=item $tag = $tagged->tag
		731
		732	=item $new_tag = $tagged->tag ($new_tag)
		733
		734	Access/mutate the tag.
		735
		736	=item $tagged->[1]
		737
		738	=item $tagged->[1] = $new_value
		739
		740	=item $value = $tagged->value
		741
		742	=item $new_value = $tagged->value ($new_value)
		743
		744	Access/mutate the tagged value.
		745
		746	=back
		747
		748	=cut
		749
		750	sub tag($$) {
		751	bless [@_], CBOR::XS::Tagged::;
		752	}
		753
		754	sub CBOR::XS::Tagged::tag {
		755	$_[0][0] = $_[1] if $#_;
		756	$_[0][0]
		757	}
		758
		759	sub CBOR::XS::Tagged::value {
		760	$_[0][1] = $_[1] if $#_;
		761	$_[0][1]
		762	}
		763
		764	=head2 EXAMPLES
		765
		766	Here are some examples of C<CBOR::XS::Tagged> uses to tag objects.
		767
		768	You can look up CBOR tag values and meanings in the IANA registry at
		769	L<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
		770
		771	Prepend a magic header (C<$CBOR::XS::MAGIC>):
		772
		773	my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
		774	# same as:
		775	my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
		776
		777	Serialise some URIs and a regex in an array:
		778
		779	my $cbor = encode_cbor [
		780	(CBOR::XS::tag 32, "http://www.nethype.de/"),
		781	(CBOR::XS::tag 32, "http://software.schmorp.de/"),
		782	(CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
		783	];
		784
		785	Wrap CBOR data in CBOR:
		786
		787	my $cbor_cbor = encode_cbor
		788	CBOR::XS::tag 24,
		789	encode_cbor [1, 2, 3];
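To make the byte-level effect of these examples concrete, here is a sketch of how a tag number is written as a CBOR item head (major type 6, as specified in RFC 7049). It is applied to the tag 24 example, using the four bytes that encode C<[1, 2, 3]>. The helper C<cbor_tag_head> is hypothetical, not a CBOR::XS function:

```perl
use strict;
use warnings;

# Hypothetical helper: encode the head of a CBOR tag (major type 6).
# Tags below 24 fit into the initial byte; larger ones are followed by
# 1, 2, 4 or 8 bytes in network byte order.
sub cbor_tag_head {
   my ($tag) = @_;
   my $mt = 6 << 5;                                     # major type 6 -> 0xc0
   return chr ($mt | $tag)                   if $tag < 24;
   return chr ($mt | 24) . pack ("C", $tag)  if $tag < 0x100;
   return chr ($mt | 25) . pack ("n", $tag)  if $tag < 0x10000;
   return chr ($mt | 26) . pack ("N", $tag)  if $tag < 2**32;
   return chr ($mt | 27) . pack ("Q>", $tag);           # needs a 64 bit perl
}

# Tag 24 (embedded CBOR) around a byte string holding array(3): 1, 2, 3.
my $inner = "\x83\x01\x02\x03";
my $outer = cbor_tag_head (24) . chr (0x40 | length ($inner)) . $inner;
printf "%s\n", unpack ("H*", $outer);                   # d8184483010203
```

The same helper yields C<\xd9\xd9\xf7> for tag 55799, which is the magic header from the first example.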
		790
		791	=head1 TAG HANDLING AND EXTENSIONS
		792
		793	This section describes how this module handles specific tagged values
		794	and extensions. If a tag is not mentioned here and no additional filters
		795	are provided for it, then the default handling applies (creating a
		796	CBOR::XS::Tagged object on decoding, and only encoding the tag when
		797	explicitly requested).
		798
		799	Tags not handled specifically are currently converted into a
		800	L<CBOR::XS::Tagged> object, which is simply a blessed array reference
		801	consisting of the numeric tag value followed by the (decoded) CBOR value.
		802
		803	Future versions of this module reserve the right to special case
		804	additional tags (such as base64url).
		805
		806	=head2 ENFORCED TAGS
		807
		808	These tags are always handled when decoding, and their handling cannot be
		809	overridden by the user.
		810
		811	=over 4
		812
		813	=item 26 (perl-object, L<http://cbor.schmorp.de/perl-object>)
		814
		815	These tags are automatically created (and decoded) for serialisable
		816	objects using the C<FREEZE/THAW> methods (the L<Types::Serialiser> object
		817	serialisation protocol). See L<OBJECT SERIALISATION> for details.
		818
		819	=item 28, 29 (shareable, sharedref, L<http://cbor.schmorp.de/value-sharing>)
		820
		821	These tags are automatically decoded when encountered (as long as they do not
		822	result in a cyclic data structure, see C<allow_cycles>), resulting in
		823	shared values in the decoded object. They are only encoded, however, when
		824	C<allow_sharing> is enabled.
		825
		826	Not all shared values can be successfully decoded: values that reference
		827	themselves will I<currently> decode as C<undef> (this is not the same
		828	as a reference pointing to itself, which will be represented as a value
		829	that contains an indirect reference to itself - these will be decoded
		830	properly).
		831
		832	Note that considerably more shared value data structures can be decoded
		833	than will be encoded - currently, only values pointed to by references
		834	will be shared, others will not. While non-reference shared values can be
		835	generated in Perl with some effort, they were considered too unimportant
		836	to be supported in the encoder. The decoder, however, will decode these
		837	values as shared values.
		838
		839	=item 256, 25 (stringref-namespace, stringref, L<http://cbor.schmorp.de/stringref>)
		840
		841	These tags are automatically decoded when encountered. They are only
		842	encoded, however, when C<pack_strings> is enabled.
		843
		844	=item 22098 (indirection, L<http://cbor.schmorp.de/indirection>)
		845
		846	This tag is automatically generated when a reference is encountered (with
		847	the exception of hash and array references). It is converted to a reference
		848	when decoding.
		849
		850	=item 55799 (self-describe CBOR, RFC 7049)
		851
		852	This value is not generated on encoding (unless explicitly requested by
		853	the user), and is simply ignored when decoding.
		854
		855	=back
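The sharing described for tags 28 and 29 concerns Perl structures like the one below, where two slots hold references to the same array. This plain-Perl sketch does not involve CBOR::XS. It only illustrates the identity that C<allow_sharing> is meant to preserve. As far as the description above goes, without that option the array is encoded once per reference, so the decoded result behaves like the naive copy shown here:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $shared = [1, 2, 3];
my $data   = [$shared, $shared];   # two references to one array

# Same address: both slots refer to a single underlying array.
my $is_shared = refaddr ($data->[0]) == refaddr ($data->[1]);
printf "%s\n", $is_shared ? "shared" : "copies";       # shared

# A naive deep copy, roughly what a round trip without sharing yields:
# two distinct arrays with equal contents.
my $copy = [map { [@$_] } @$data];
my $copy_shared = refaddr ($copy->[0]) == refaddr ($copy->[1]);
printf "%s\n", $copy_shared ? "shared" : "copies";     # copies

# Changes made through one slot show up in the other only when shared.
push @{ $data->[0] }, 4;
printf "%d %d\n", scalar @{ $data->[1] }, scalar @{ $copy->[1] };   # 4 3
```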
		856
		857	=head2 NON-ENFORCED TAGS
		858
		859	These tags have default filters provided when decoding. Their handling can
		860	be overridden by changing the C<%CBOR::XS::FILTER> entry for the tag, or by
		861	providing a custom C<filter> callback when decoding.
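The lookup done by C<CBOR::XS::default_filter> (defined at the end of this file) is a plain hash dispatch on the tag number. This sketch models it with a stand-in C<%FILTER> hash in package C<main>, not C<%CBOR::XS::FILTER> itself, and with an illustrative handler. It shows how replacing an entry, here via C<local>, changes the result:

```perl
use strict;
use warnings;

# Stand-in for %CBOR::XS::FILTER: tag number => callback ($tag, $value).
our %FILTER = (
   32 => sub { "URI<$_[1]>" },    # illustrative handler, not the real one
);

# Same shape as CBOR::XS::default_filter: call the registered callback
# with the original ($tag, $value) arguments, or return the empty list
# when no callback is registered for the tag.
sub default_filter {
   &{ $FILTER{$_[0]} or return }
}

printf "%s\n", default_filter (32, "http://example.org/");    # URI<http://example.org/>

{
   # Replace the handler for tag 32 for the rest of this block only.
   local $FILTER{32} = sub { uc $_[1] };
   printf "%s\n", default_filter (32, "http://example.org/"); # HTTP://EXAMPLE.ORG/
}

my @unhandled = default_filter (99, "x");
printf "%d\n", scalar @unhandled;                              # 0
```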
		862
		863	When they result in decoding into a specific Perl class, the module
		864	usually provides a corresponding C<TO_CBOR> method as well.
		865
		866	When any of these need to load additional modules that are not part of the
		867	perl core distribution (e.g. L<URI>), it is (currently) up to the user to
		868	provide these modules. The decoding usually fails with an exception if the
		869	required module cannot be loaded.
		870
		871	=over 4
		872
		873	=item 0, 1 (date/time string, seconds since the epoch)
		874
		875	These tags are decoded into L<Time::Piece> objects. The corresponding
		876	C<Time::Piece::TO_CBOR> method always encodes into tag 1 values currently.
		877
		878	The L<Time::Piece> API is generally surprisingly bad, and fractional
		879	seconds are only accidentally kept intact, so watch out. On the plus side,
		880	the module comes with perl since 5.10, which has to count for something.
		881
		882	=item 2, 3 (positive/negative bignum)
		883
		884	These tags are decoded into L<Math::BigInt> objects. The corresponding
		885	C<Math::BigInt::TO_CBOR> method encodes bigints that fit into a signed 32 bit
		886	integer as normal CBOR integers, and all others as positive/negative CBOR bignums.
		887
		888	=item 4, 5 (decimal fraction/bigfloat)
		889
		890	Both decimal fractions and bigfloats are decoded into L<Math::BigFloat>
		891	objects. The corresponding C<Math::BigFloat::TO_CBOR> method I<always>
		892	encodes into a decimal fraction.
		893
		894	CBOR cannot represent bigfloats with I<very> large exponents - converting
		895	such C<Math::BigFloat> objects is undefined.
		896
		897	Also, NaN and infinities are not encoded properly.
		898
		899	=item 21, 22, 23 (expected later JSON conversion)
		900
		901	CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore these
		902	tags.
		903
		904	=item 32 (URI)
		905
		906	These objects decode into L<URI> objects. The corresponding
		907	C<URI::TO_CBOR> method again results in a CBOR URI value.
		908
		909	=back
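Given the L<Time::Piece> caveats above, here is an alternative sketch for turning a tag 0 date/time string into epoch seconds with the core L<Time::Local> module. This is not what this module does. It only accepts the common C<YYYY-MM-DDTHH:MM:SS[.frac]> form followed by C<Z> or a C<+HH:MM>/C<-HH:MM> offset. The helper name C<rfc3339_to_epoch> is hypothetical:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert an RFC 3339 date/time string (tag 0) to
# seconds since the epoch, keeping fractional seconds; undef on failure.
sub rfc3339_to_epoch {
   my ($s) = @_;

   my ($y, $mo, $d, $h, $mi, $sec, $frac, $tz) = $s =~ /
      ^ (\d{4}) - (\d\d) - (\d\d) [Tt] (\d\d) : (\d\d) : (\d\d)
      (\.\d+)? ([Zz] | [+-]\d\d:\d\d) \z
   /x or return undef;

   # Read the wall-clock fields as if they were UTC (months are 0-based).
   my $epoch = timegm ($sec, $mi, $h, $d, $mo - 1, $y);
   $epoch += $frac if defined $frac;

   # Local time = UTC + offset, so subtract the offset. The sign
   # applies to the hours *and* the minutes of the offset.
   if ($tz =~ /^([+-])(\d\d):(\d\d)\z/) {
      my $offset = ($2 * 60 + $3) * 60;
      $epoch -= $1 eq "-" ? -$offset : $offset;
   }

   return $epoch;
}

printf "%s\n", rfc3339_to_epoch ("1970-01-01T01:00:00+01:00");   # 0
printf "%s\n", rfc3339_to_epoch ("1970-01-02T18:30:00-05:30");   # 172800
```

Note that the sign is applied to the whole C<HH:MM> value. For C<-05:30>, the offset is 5.5 hours, not 5 hours minus 30 minutes.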
		910
		911	=cut
		912
		913	our %FILTER = (
		914	# 0 # rfc4287 datetime, utf-8
		915	# 1 # unix timestamp, any
		916
		917	2 => sub { # pos bigint
		918	require Math::BigInt;
		919	Math::BigInt->new ("0x" . unpack "H*", pop)
		920	},
		921
		922	3 => sub { # neg bigint
		923	require Math::BigInt;
		924	-1 - Math::BigInt->new ("0x" . unpack "H*", pop) # tag 3 means -1 - n
		925	},
		926
		927	4 => sub { # decimal fraction, array
		928	require Math::BigFloat;
		929	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		930	},
		931
		932	5 => sub { # bigfloat, array
		933	require Math::BigFloat;
		934	scalar Math::BigFloat->new ($_[1][1])->blsft ($_[1][0], 2)
		935	},
		936
		937	21 => sub { pop }, # expected conversion to base64url encoding
		938	22 => sub { pop }, # expected conversion to base64 encoding
		939	23 => sub { pop }, # expected conversion to base16 encoding
		940
		941	# 24 # embedded cbor, byte string
		942
		943	32 => sub {
		944	require URI;
		945	URI->new (pop)
		946	},
		947
		948	# 33 # base64url rfc4648, utf-8
		949	# 34 # base64 rfc4648, utf-8
		950	# 35 # regex pcre/ecma262, utf-8
		951	# 36 # mime message rfc2045, utf-8
		952	);
450		953
451		954
452	=head1 CBOR and JSON	955	=head1 CBOR and JSON
453		956
454	CBOR is supposed to implement a superset of the JSON data model, and is,	957	CBOR is supposed to implement a superset of the JSON data model, and is,
…		…
516	properly. Half precision types are accepted, but not encoded.	1019	properly. Half precision types are accepted, but not encoded.
517		1020
518	Strict mode and canonical mode are not implemented.	1021	Strict mode and canonical mode are not implemented.
519		1022
520		1023
		1024	=head1 LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
		1025
		1026	On perls that were built without 64 bit integer support (these are rare
		1027	nowadays, even on 32 bit architectures, as all major Perl distributions
		1028	are built with 64 bit integer support), support for any kind of 64 bit
		1029	integer in CBOR is very limited - most likely, these 64 bit values will
		1030	be truncated, corrupted, or otherwise not decoded correctly. This also
		1031	includes string, array and map sizes that are stored as 64 bit integers.
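To check whether a given perl is affected, you can look at the size of its native integer type with the core L<Config> module. This is a generic Perl check, not an API of CBOR::XS:

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of perl's native integer type (IV):
# 8 means 64 bit integers are available, 4 means 32 bit only.
my $has_64bit_ints = $Config{ivsize} >= 8;

printf "%s\n", $has_64bit_ints
   ? "this perl has 64 bit integer support"
   : "no 64 bit integers: large CBOR integers may not decode correctly";
```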
		1032
		1033
521	=head1 THREADS	1034	=head1 THREADS
522		1035
523	This module is I<not> guaranteed to be thread safe and there are no	1036	This module is I<not> guaranteed to be thread safe and there are no
524	plans to change this until Perl gets thread support (as opposed to the	1037	plans to change this until Perl gets thread support (as opposed to the
525	horribly slow so-called "threads" which are simply slow and bloated	1038	horribly slow so-called "threads" which are simply slow and bloated
…		…
537	Please refrain from using rt.cpan.org or any other bug reporting	1050	Please refrain from using rt.cpan.org or any other bug reporting
538	service. I put the contact address into my modules for a reason.	1051	service. I put the contact address into my modules for a reason.
539		1052
540	=cut	1053	=cut
541		1054
		1055	our %FILTER = (
		1056	0 => sub { # rfc4287 datetime, utf-8
		1057	require Time::Piece;
		1058	# Time::Piece::Strptime uses the "incredibly flexible date parsing routine"
		1059	# from FreeBSD, which can't parse ISO 8601, RFC3339, RFC4287 or much of anything
		1060	# else either. What is so incredible about it compared to standard strptime escapes me.
		1061	# It doesn't do fractional times, either. Sigh.
		1062	# In fact, it's all a lie, it uses whatever strptime it wants, and of course,
		1063	# they are all incompatible. The openbsd one simply ignores %z (but according to the
		1064	# docs, it would be much more incredibly flexible indeed. If it worked, that is.).
		1065	scalar eval {
		1066	my $s = $_[1];
		1067
		1068	$s =~ s/Z$/+00:00/;
		1069	$s =~ s/(\.[0-9]+)?([+-])([0-9][0-9]):([0-9][0-9])$//
		1070	or die;
		1071
		1072	my $b = ($1 || 0) - ($2 eq "-" ? -1 : 1) * ($3 * 60 + $4) * 60; # fraction minus signed offset
		1073	my $d = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");
		1074
		1075	Time::Piece::gmtime ($d->epoch + $b)
		1076	} || die "corrupted CBOR date/time string ($_[1])";
		1077	},
		1078
		1079	1 => sub { # seconds since the epoch, possibly fractional
		1080	require Time::Piece;
		1081	scalar Time::Piece::gmtime (pop)
		1082	},
		1083
		1084	2 => sub { # pos bigint
		1085	require Math::BigInt;
		1086	Math::BigInt->new ("0x" . unpack "H*", pop)
		1087	},
		1088
		1089	3 => sub { # neg bigint
		1090	require Math::BigInt;
		1091	-1 - Math::BigInt->new ("0x" . unpack "H*", pop) # tag 3 means -1 - n
		1092	},
		1093
		1094	4 => sub { # decimal fraction, array
		1095	require Math::BigFloat;
		1096	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		1097	},
		1098
		1099	5 => sub { # bigfloat, array
		1100	require Math::BigFloat;
		1101	scalar Math::BigFloat->new ($_[1][1])->blsft ($_[1][0], 2)
		1102	},
		1103
		1104	21 => sub { pop }, # expected conversion to base64url encoding
		1105	22 => sub { pop }, # expected conversion to base64 encoding
		1106	23 => sub { pop }, # expected conversion to base16 encoding
		1107
		1108	# 24 # embedded cbor, byte string
		1109
		1110	32 => sub {
		1111	require URI;
		1112	URI->new (pop)
		1113	},
		1114
		1115	# 33 # base64url rfc4648, utf-8
		1116	# 34 # base64 rfc4648, utf-8
		1117	# 35 # regex pcre/ecma262, utf-8
		1118	# 36 # mime message rfc2045, utf-8
		1119	);
		1120
		1121	sub CBOR::XS::default_filter {
		1122	&{ $FILTER{$_[0]} or return }
		1123	}
		1124
		1125	sub URI::TO_CBOR {
		1126	my $uri = $_[0]->as_string;
		1127	utf8::upgrade $uri;
		1128	tag 32, $uri
		1129	}
		1130
		1131	sub Math::BigInt::TO_CBOR {
		1132	if ($_[0] >= -2147483648 && $_[0] <= 2147483647) {
		1133	$_[0]->numify
		1134	} else {
		1135	my ($tag, $n) = $_[0] >= 0 ? (2, $_[0]) : (3, -1 - $_[0]); # tag 3 encodes -1 - n
		1136	my $hex = substr $n->as_hex, 2; $hex = "0$hex" if 1 & length $hex; # sigh
		1137	tag $tag, pack "H*", $hex
		1138	}
		1139	}
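For comparison, RFC 7049 defines a bignum as a byte string holding an unsigned big-endian integer I<n>, where tag 2 stands for I<n> and tag 3 stands for C<-1 - n>. Here is a sketch of that mapping using the core L<Math::BigInt> module. The helper names are hypothetical, and the code illustrates the wire format rather than reproducing this module's implementation:

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: turn a bignum tag (2 or 3) and its byte string
# into a Math::BigInt. The bytes hold an unsigned big-endian integer n.
sub decode_bignum {
   my ($tag, $bytes) = @_;
   my $n = Math::BigInt->new ("0x" . (unpack ("H*", $bytes) || "0"));
   return $tag == 2 ? $n : -1 - $n;           # tag 3 means -1 - n
}

# Hypothetical helper: turn a Math::BigInt into (tag, byte string).
sub encode_bignum {
   my ($x) = @_;
   my ($tag, $n) = $x >= 0 ? (2, $x->copy) : (3, -1 - $x);
   my $hex = substr ($n->as_hex, 2);          # strip the "0x" prefix
   $hex = "0$hex" if length ($hex) % 2;       # pack "H*" needs whole bytes
   return ($tag, pack ("H*", $hex));
}

my ($tag, $bytes) = encode_bignum (Math::BigInt->new ("-18446744073709551617"));
printf "%d %s\n", $tag, unpack ("H*", $bytes);  # 3 010000000000000000

my $back = decode_bignum ($tag, $bytes);
printf "%s\n", $back;                           # -18446744073709551617
```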
		1140
		1141	sub Math::BigFloat::TO_CBOR {
		1142	my ($m, $e) = $_[0]->parts;
		1143	tag 4, [$e->numify, $m]
		1144	}
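A decimal fraction (tag 4) is the array C<[exponent, mantissa]> with the value C<mantissa * 10**exponent>. Here is a sketch of both directions with the core L<Math::BigFloat> module, following the same idea as the tag 4 filter and C<Math::BigFloat::TO_CBOR>. The helper names are hypothetical:

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: decimal fraction payload [exponent, mantissa]
# to Math::BigFloat, i.e. mantissa * 10**exponent.
sub from_decimal_fraction {
   my ($e, $m) = @{ $_[0] };
   return Math::BigFloat->new ("${m}E$e");
}

# Hypothetical helper: Math::BigFloat to [exponent, mantissa].
# numify is used here for readable output; it loses precision for
# mantissas that do not fit into a native number.
sub to_decimal_fraction {
   my ($m, $e) = $_[0]->parts;
   return [$e->numify, $m->numify];
}

my $x = from_decimal_fraction ([-2, 27315]);
print "$x\n";                                   # 273.15

my $pair = to_decimal_fraction (Math::BigFloat->new ("273.15"));
print "@$pair\n";                               # -2 27315
```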
		1145
		1146	sub Time::Piece::TO_CBOR {
		1147	tag 1, 0 + $_[0]->epoch
		1148	}
		1149
542	XSLoader::load "CBOR::XS", $VERSION;	1150	XSLoader::load "CBOR::XS", $VERSION;
543		1151
544	=head1 SEE ALSO	1152	=head1 SEE ALSO
545		1153
546	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,	1154	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
