[ViewVC] Diff of: cvs/CBOR-XS/XS.pm

Comparing CBOR-XS/XS.pm (file contents):
Revision 1.17 by root, Wed Oct 30 10:11:04 2013 UTC vs.
Revision 1.82 by root, Sat May 8 07:08:12 2021 UTC

…		…
26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string	26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27	}	27	}
28		28
29	=head1 DESCRIPTION	29	=head1 DESCRIPTION
30		30
31	WARNING! This module is very new, and not very well tested (that's up to
32	you to do). Furthermore, details of the implementation might change freely
33	before version 1.0. And lastly, the object serialisation protocol depends
34	on a pending IANA assignment, and until that assignment is official, this
35	implementation is not interoperable with other implementations (even
36	future versions of this module) until the assignment is done.
37
38	You are still invited to try out CBOR, and this module.
39
40	This module converts Perl data structures to the Concise Binary Object	31	This module converts Perl data structures to the Concise Binary Object
41	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation	32	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42	format that aims to use a superset of the JSON data model, i.e. when you	33	format that aims to use an (almost) superset of the JSON data model, i.e.
43	can represent something in JSON, you should be able to represent it in	34	when you can represent something useful in JSON, you should be able to
44	CBOR.	35	represent it in CBOR.
45		36
46	In short, CBOR is a faster and very compact binary alternative to JSON,	37	In short, CBOR is a faster and quite compact binary alternative to JSON,
47	with the added ability of supporting serialisation of Perl objects. (JSON	38	with the added ability of supporting serialisation of Perl objects. (JSON
48	often compresses better than CBOR though, so if you plan to compress the	39	often compresses better than CBOR though, so if you plan to compress the
49	data later you might want to compare both formats first).	40	data later and speed is less important, you might want to compare both
		41	formats first).
		42
		43	The primary goal of this module is to be I<correct> and the secondary goal
		44	is to be I<fast>. To reach the latter goal it was written in C.
50		45
51	To give you a general idea about speed, with texts in the megabyte range,	46	To give you a general idea about speed, with texts in the megabyte range,
52	C<CBOR::XS> usually encodes roughly twice as fast as L<Storable> or	47	C<CBOR::XS> usually encodes roughly twice as fast as L<Storable> or
53	L<JSON::XS> and decodes about 15%-30% faster than those. The shorter the	48	L<JSON::XS> and decodes about 15%-30% faster than those. The shorter the
54	data, the worse L<Storable> performs in comparison.	49	data, the worse L<Storable> performs in comparison.
55		50
56	As for compactness, C<CBOR::XS> encoded data structures are usually about	51	Regarding compactness, C<CBOR::XS>-encoded data structures are usually
57	20% smaller than the same data encoded as (compact) JSON or L<Storable>.	52	about 20% smaller than the same data encoded as (compact) JSON or
		53	L<Storable>.
58		54
59	The primary goal of this module is to be I<correct> and the secondary goal	55	In addition to the core CBOR data format, this module implements a
60	is to be I<fast>. To reach the latter goal it was written in C.	56	number of extensions, to support cyclic and shared data structures
		57	(see C<allow_sharing> and C<allow_cycles>), string deduplication (see
		58	C<pack_strings>) and scalar references (always enabled).
61		59
62	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and	60	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
63	vice versa.	61	vice versa.
64		62
65	=cut	63	=cut
66		64
67	package CBOR::XS;	65	package CBOR::XS;
68		66
69	use common::sense;	67	use common::sense;
70		68
71	our $VERSION = 0.08;	69	our $VERSION = 1.83;
72	our @ISA = qw(Exporter);	70	our @ISA = qw(Exporter);
73		71
74	our @EXPORT = qw(encode_cbor decode_cbor);	72	our @EXPORT = qw(encode_cbor decode_cbor);
75		73
76	use Exporter;	74	use Exporter;
…		…
113	strings. All boolean flags described below are by default I<disabled>.	111	strings. All boolean flags described below are by default I<disabled>.
114		112
115	The mutators for flags all return the CBOR object again and thus calls can	113	The mutators for flags all return the CBOR object again and thus calls can
116	be chained:	114	be chained:
117		115
118	#TODO
119	my $cbor = CBOR::XS->new->encode ({a => [1,2]});	116	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
		117
		118	=item $cbor = new_safe CBOR::XS
		119
		120	Create a new, safe/secure CBOR::XS object. This is similar to C<new>,
		121	but configures the coder object to be safe to use with untrusted
		122	data. Currently, this is equivalent to:
		123
		124	my $cbor = CBOR::XS
		125	->new
		126	->validate_utf8
		127	->forbid_objects
		128	->filter (\&CBOR::XS::safe_filter)
		129	->max_size (1e8);
		130
		131	But is more future proof (it is better to crash because of a change than
		132	to be exploited in other ways).
		133
		134	=cut
		135
		136	sub new_safe {
		137	CBOR::XS
		138	->new
		139	->validate_utf8
		140	->forbid_objects
		141	->filter (\&CBOR::XS::safe_filter)
		142	->max_size (1e8)
		143	}
120		144
121	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])	145	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
122		146
123	=item $max_depth = $cbor->get_max_depth	147	=item $max_depth = $cbor->get_max_depth
124		148
…		…
140		164
141	Note that nesting is implemented by recursion in C. The default value has	165	Note that nesting is implemented by recursion in C. The default value has
142	been chosen to be as large as typical operating systems allow without	166	been chosen to be as large as typical operating systems allow without
143	crashing.	167	crashing.
144		168
145	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.	169	See L<SECURITY CONSIDERATIONS>, below, for more info on why this is useful.
146		170
147	=item $cbor = $cbor->max_size ([$maximum_string_size])	171	=item $cbor = $cbor->max_size ([$maximum_string_size])
148		172
149	=item $max_size = $cbor->get_max_size	173	=item $max_size = $cbor->get_max_size
150		174
…		…
155	effect on C<encode> (yet).	179	effect on C<encode> (yet).
156		180
157	If no argument is given, the limit check will be deactivated (same as when	181	If no argument is given, the limit check will be deactivated (same as when
158	C<0> is specified).	182	C<0> is specified).
159		183
160	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.	184	See L<SECURITY CONSIDERATIONS>, below, for more info on why this is useful.
		185
		186	=item $cbor = $cbor->allow_unknown ([$enable])
		187
		188	=item $enabled = $cbor->get_allow_unknown
		189
		190	If C<$enable> is true (or missing), then C<encode> will I<not> throw an
		191	exception when it encounters values it cannot represent in CBOR (for
		192	example, filehandles) but instead will encode a CBOR C<error> value.
		193
		194	If C<$enable> is false (the default), then C<encode> will throw an
		195	exception when it encounters anything it cannot encode as CBOR.
		196
		197	This option does not affect C<decode> in any way, and it is recommended to
		198	leave it off unless you know your communications partner.
		199
		200	=item $cbor = $cbor->allow_sharing ([$enable])
		201
		202	=item $enabled = $cbor->get_allow_sharing
		203
		204	If C<$enable> is true (or missing), then C<encode> will not double-encode
		205	values that have been referenced before (e.g. when the same object, such
		206	as an array, is referenced multiple times), but instead will emit a
		207	reference to the earlier value.
		208
		209	This means that such values will only be encoded once, and will not result
		210	in a deep cloning of the value on decode, in decoders supporting the value
		211	sharing extension. This also makes it possible to encode cyclic data
		212	structures (which need C<allow_cycles> to be enabled to be decoded by this
		213	module).
		214
		215	It is recommended to leave it off unless you know your
		216	communication partner supports the value sharing extensions to CBOR
		217	(L<http://cbor.schmorp.de/value-sharing>), as without decoder support, the
		218	resulting data structure might be unusable.
		219
		220	Detecting shared values incurs a runtime overhead when values are encoded
		221	that have a reference counter larger than one, and might unnecessarily
		222	increase the encoded size, as potentially shared values are encoded as
		223	shareable whether or not they are actually shared.
		224
		225	At the moment, only targets of references can be shared (e.g. scalars,
		226	arrays or hashes pointed to by a reference). Weirder constructs, such as
		227	an array with multiple "copies" of the I<same> string, which are hard but
		228	not impossible to create in Perl, are not supported (this is the same as
		229	with L<Storable>).
		230
		231	If C<$enable> is false (the default), then C<encode> will encode shared
		232	data structures repeatedly, unsharing them in the process. Cyclic data
		233	structures cannot be encoded in this mode.
		234
		235	This option does not affect C<decode> in any way - shared values and
		236	references will always be decoded properly if present.
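In plain Perl, a "shared" value is the same array or hash reached through more than one reference. The following sketch uses the core L<Scalar::Util> functions C<refaddr> and C<reftype> to count how often each container is reached while walking a structure. Containers counted more than once are the ones C<allow_sharing> would encode once and then refer back to. C<count_refs> is a hypothetical helper for illustration only. According to the text above, the module itself finds candidates through reference counts, not like this.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Count how many times each array/hash is reached. After the first visit,
# a container is not descended into again, so cyclic structures terminate.
sub count_refs {
    my ($data, $seen) = @_;
    $seen //= {};
    my $type = reftype($data) // return $seen;    # plain scalar: nothing to do
    return $seen unless $type eq "ARRAY" || $type eq "HASH";
    return $seen if $seen->{ refaddr $data }++;   # already visited
    count_refs($_, $seen) for $type eq "ARRAY" ? @$data : values %$data;
    return $seen;
}

my $shared = [1, 2];
my $doc    = { a => $shared, b => $shared, c => [3] };

my $counts = count_refs($doc);
# $counts->{ refaddr $shared } is 2: with allow_sharing, that array would
# be encoded once and referenced the second time.
```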
		237
		238	=item $cbor = $cbor->allow_cycles ([$enable])
		239
		240	=item $enabled = $cbor->get_allow_cycles
		241
		242	If C<$enable> is true (or missing), then C<decode> will happily decode
		243	self-referential (cyclic) data structures. By default these will not be
		244	decoded, as they need manual cleanup to avoid memory leaks, so code that
		245	isn't prepared for this will not leak memory.
		246
		247	If C<$enable> is false (the default), then C<decode> will throw an error
		248	when it encounters a self-referential/cyclic data structure.
		249
		250	FUTURE DIRECTION: the motivation behind this option is to avoid I<real>
		251	cycles - future versions of this module might choose to decode cyclic data
		252	structures using weak references when this option is off, instead of
		253	throwing an error.
		254
		255	This option does not affect C<encode> in any way - shared values and
		256	references will always be encoded properly if present.
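The "manual cleanup" mentioned above usually means breaking each cycle yourself once the data is no longer needed, or weakening one link in it. C<weaken> from the core L<Scalar::Util> module does this, and it is the mechanism the FUTURE DIRECTION note alludes to. The sketch below uses a hand-built self-referential hash in place of decoded data. C<Hypothetical::Node> is a made-up class whose destructor records that the memory was actually released:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

{
    package Hypothetical::Node;
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $destroyed++ }    # runs when the last strong reference goes
}

{
    my $node = Hypothetical::Node->new;
    $node->{self} = $node;          # cycle: the hash refers to itself
    weaken($node->{self});          # this link no longer keeps $node alive
}

print "$destroyed\n";               # prints "1"
```

Without the C<weaken> call, the hash would keep itself alive and C<$destroyed> would still be C<0> at that point.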
		257
		258	=item $cbor = $cbor->forbid_objects ([$enable])
		259
		260	=item $enabled = $cbor->get_forbid_objects
		261
		262	Disables the use of the object serialiser protocol.
		263
		264	If C<$enable> is true (or missing), then C<encode> will throw an
		265	exception when it encounters perl objects that would be encoded using the
		266	perl-object tag (26). When C<decode> encounters such tags, it will fall
		267	back to the general filter/tagged logic as if this were an unknown tag (by
		268	default resulting in a C<CBOR::XS::Tagged> object).
		269
		270	If C<$enable> is false (the default), then C<encode> will use the
		271	L<Types::Serialiser> object serialisation protocol to serialise objects
		272	into perl-object tags, and C<decode> will do the same to decode such tags.
		273
		274	See L<SECURITY CONSIDERATIONS>, below, for more info on why forbidding this
		275	protocol can be useful.
		276
		277	=item $cbor = $cbor->pack_strings ([$enable])
		278
		279	=item $enabled = $cbor->get_pack_strings
		280
		281	If C<$enable> is true (or missing), then C<encode> will try not to encode
		282	the same string twice, but will encode a reference to the string
		283	instead. Depending on your data format, this can save a lot of space, but
		284	also results in a very large runtime overhead (expect encoding times to be
		285	2-4 times as high as without).
		286
		287	It is recommended to leave it off unless you know your
		288	communications partner supports the stringref extension to CBOR
		289	(L<http://cbor.schmorp.de/stringref>), as without decoder support, the
		290	resulting data structure might not be usable.
		291
		292	If C<$enable> is false (the default), then C<encode> will encode strings
		293	the standard CBOR way.
		294
		295	This option does not affect C<decode> in any way - string references will
		296	always be decoded properly if present.
		297
		298	=item $cbor = $cbor->text_keys ([$enable])
		299
		300	=item $enabled = $cbor->get_text_keys
		301
		302	If C<$enable> is true (or missing), then C<encode> will encode all
		303	perl hash keys as CBOR text strings/UTF-8 strings, upgrading them as needed.
		304
		305	If C<$enable> is false (the default), then C<encode> will encode hash keys
		306	normally - upgraded perl strings (strings internally encoded as UTF-8) as
		307	CBOR text strings, and downgraded perl strings as CBOR byte strings.
		308
		309	This option does not affect C<decode> in any way.
		310
		311	This option is useful for interoperability with CBOR decoders that don't
		312	treat byte strings as a form of text. It is especially useful as Perl
		313	gives very little control over hash keys.
		314
		315	Enabling this option can be slow, as all downgraded hash keys that are
		316	encoded need to be scanned and converted to UTF-8.
		317
		318	=item $cbor = $cbor->text_strings ([$enable])
		319
		320	=item $enabled = $cbor->get_text_strings
		321
		322	This option works similarly to C<text_keys>, above, but applies to all strings
		323	(including hash keys), so C<text_keys> has no further effect after
		324	enabling C<text_strings>.
		325
		326	If C<$enable> is true (or missing), then C<encode> will encode all perl
		327	strings as CBOR text strings/UTF-8 strings, upgrading them as needed.
		328
		329	If C<$enable> is false (the default), then C<encode> will encode strings
		330	normally (but see C<text_keys>) - upgraded perl strings (strings
		331	internally encoded as UTF-8) as CBOR text strings, and downgraded perl
		332	strings as CBOR byte strings.
		333
		334	This option does not affect C<decode> in any way.
		335
		336	This option has advantages and disadvantages similar to those of C<text_keys>. In
		337	addition, this option effectively removes the ability to automatically
		338	encode byte strings, which might break some C<FREEZE> and C<TO_CBOR>
		339	methods that rely on this.
		340
		341	A workaround is to use explicit type casts, which are unaffected by this option.
		342
		343	=item $cbor = $cbor->validate_utf8 ([$enable])
		344
		345	=item $enabled = $cbor->get_validate_utf8
		346
		347	If C<$enable> is true (or missing), then C<decode> will validate that
		348	elements (text strings) containing UTF-8 data in fact contain valid UTF-8
		349	data (instead of blindly accepting it). This validation obviously takes
		350	extra time during decoding.
		351
		352	The concept of "valid UTF-8" used is perl's concept, which is a superset
		353	of the official UTF-8.
		354
		355	If C<$enable> is false (the default), then C<decode> will blindly accept
		356	UTF-8 data, marking them as valid UTF-8 in the resulting data structure
		357	regardless of whether that's true or not.
		358
		359	Perl isn't too happy about corrupted UTF-8 in strings, but should
		360	generally not crash or do similarly evil things. Extensions might not be
		361	so forgiving, so it's recommended to turn on this setting if you receive
		362	untrusted CBOR.
		363
		364	This option does not affect C<encode> in any way - strings that are
		365	supposedly valid UTF-8 will simply be dumped into the resulting CBOR
		366	string without checking whether that is, in fact, true or not.
		367
		368	=item $cbor = $cbor->filter ([$cb->($tag, $value)])
		369
		370	=item $cb_or_undef = $cbor->get_filter
		371
		372	Sets or replaces the tagged value decoding filter (when C<$cb> is
		373	specified) or clears the filter (if no argument or C<undef> is provided).
		374
		375	The filter callback is called only during decoding, when a non-enforced
		376	tagged value has been decoded (see L<TAG HANDLING AND EXTENSIONS> for a
		377	list of enforced tags). For specific tags, it's often better to provide a
		378	default converter using the C<%CBOR::XS::FILTER> hash (see below).
		379
		380	The first argument is the numerical tag, the second is the (decoded) value
		381	that has been tagged.
		382
		383	The filter function should return either exactly one value, which will
		384	replace the tagged value in the decoded data structure, or no values,
		385	which will result in default handling, which currently means the decoder
		386	creates a C<CBOR::XS::Tagged> object to hold the tag and the value.
		387
		388	When the filter is cleared (the default state), the default filter
		389	function, C<CBOR::XS::default_filter>, is used. This function simply
		390	looks up the tag in the C<%CBOR::XS::FILTER> hash. If an entry exists
		391	it must be a code reference that is called with tag and value, and is
		392	responsible for decoding the value. If no entry exists, it returns no
		393	values. C<CBOR::XS> provides a number of default filter functions already,
		394	and the C<%CBOR::XS::FILTER> hash can be freely extended with more.
		395
		396	C<CBOR::XS> additionally provides an alternative filter function that is
		397	supposed to be safe to use with untrusted data (which the default filter
		398	might not), called C<CBOR::XS::safe_filter>, which works the same as
		399	the C<default_filter> but uses the C<%CBOR::XS::SAFE_FILTER> variable
		400	instead. It is prepopulated with the tag decoding functions that are
		401	deemed safe (basically the same as C<%CBOR::XS::FILTER> without all
		402	the bignum tags), and can be extended by user code as well, although,
		403	obviously, one should be very careful about adding decoding functions
		404	here, since the expectation is that they are safe to use on untrusted
		405	data, after all.
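The following minimal pure-Perl sketch illustrates the lookup performed by C<CBOR::XS::default_filter> (and, with a different hash, C<CBOR::XS::safe_filter>). C<%MY_FILTER> and C<my_default_filter> are hypothetical stand-ins, not the module's actual code. What matters is the return convention described above: exactly one value replaces the tagged item, and an empty list requests default handling.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %CBOR::XS::FILTER: tag number => decoder callback.
my %MY_FILTER = (
    1347375694 => sub {
        my ($tag, $value) = @_;
        return "tag $tag value $value";
    },
);

# Hypothetical stand-in for the default filter: if a callback exists for the
# tag, it decodes the value; otherwise an empty list is returned, which a
# decoder treats as "use default handling".
sub my_default_filter {
    my ($tag, $value) = @_;
    my $cb = $MY_FILTER{$tag} or return;
    return $cb->($tag, $value);
}

my @known   = my_default_filter(1347375694, "x");   # one replacement value
my @unknown = my_default_filter(42, "x");           # empty list
```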
		406
		407	Example: decode all tags not handled internally into C<CBOR::XS::Tagged>
		408	objects, with no other special handling (useful when working with
		409	potentially "unsafe" CBOR data).
		410
		411	CBOR::XS->new->filter (sub { })->decode ($cbor_data);
		412
		413	Example: provide a global filter for tag 1347375694, converting the value
		414	into some string form.
		415
		416	$CBOR::XS::FILTER{1347375694} = sub {
		417	my ($tag, $value) = @_;
		418
		419	"tag 1347375694 value $value"
		420	};
		421
		422	Example: provide your own filter function that looks up tags in your own
		423	hash:
		424
		425	my %my_filter = (
		426	998347484 => sub {
		427	my ($tag, $value) = @_;
		428
		429	"tag 998347484 value $value"
		430	},
		431	);
		432
		433	my $coder = CBOR::XS->new->filter (sub {
		434	&{ $my_filter{$_[0]} or return }
		435	});
		436
		437
		438	Example: use the safe filter function (see L<SECURITY CONSIDERATIONS> for
		439	more considerations on security).
		440
		441	CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);
161		442
162	=item $cbor_data = $cbor->encode ($perl_scalar)	443	=item $cbor_data = $cbor->encode ($perl_scalar)
163		444
164	Converts the given Perl data structure (a scalar value) to its CBOR	445	Converts the given Perl data structure (a scalar value) to its CBOR
165	representation.	446	representation.
…		…
175	when there is trailing garbage after the CBOR string, it will silently	456	when there is trailing garbage after the CBOR string, it will silently
176	stop parsing there and return the number of characters consumed so far.	457	stop parsing there and return the number of characters consumed so far.
177		458
178	This is useful if your CBOR texts are not delimited by an outer protocol	459	This is useful if your CBOR texts are not delimited by an outer protocol
179	and you need to know where the first CBOR string ends and the next one	460	and you need to know where the first CBOR string ends and the next one
180	starts.	461	starts - CBOR strings are self-delimited, so it is possible to concatenate
		462	CBOR strings without any delimiters or size fields and recover their data.
181		463
182	CBOR::XS->new->decode_prefix ("......")	464	CBOR::XS->new->decode_prefix ("......")
183	=> ("...", 3)	465	=> ("...", 3)
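The CBOR encoding itself makes it possible to split concatenated values without delimiters. The initial byte of every item holds its major type and either a small argument or the number of argument bytes that follow. Strings, arrays, maps and tags also declare how much data belongs to them. The sketch below computes the size of one item for a subset of CBOR and uses it to split a stream. It handles definite-length items only and rejects indefinite-length items. C<read_head> and C<cbor_item_size> are hypothetical helpers written for illustration, not part of C<CBOR::XS>. The byte layouts follow the CBOR specification (RFC 8949).

```perl
use strict;
use warnings;

# Decode the argument of the CBOR head at offset $pos.
# Returns (argument, length of the head in bytes).
sub read_head {
    my ($buf, $pos) = @_;
    die "truncated\n" if $pos >= length $buf;
    my $info = ord(substr $buf, $pos, 1) & 0x1f;
    return ($info, 1) if $info < 24;
    my %extra = (24 => 1, 25 => 2, 26 => 4, 27 => 8);
    my $n = $extra{$info} or die "indefinite or reserved length: not handled here\n";
    die "truncated\n" if $pos + 1 + $n > length $buf;
    my $arg = 0;
    $arg = $arg * 256 + $_ for unpack "C*", substr($buf, $pos + 1, $n);
    return ($arg, 1 + $n);
}

# Size in bytes of the complete CBOR item starting at offset $pos.
sub cbor_item_size {
    my ($buf, $pos) = @_;
    $pos //= 0;
    my ($arg, $len) = read_head($buf, $pos);
    my $major = ord(substr $buf, $pos, 1) >> 5;

    if ($major == 2 || $major == 3) {          # byte/text string: payload follows
        $len += $arg;
    } elsif ($major == 4 || $major == 5) {     # array: $arg items, map: 2 * $arg
        for (1 .. $arg * ($major == 5 ? 2 : 1)) {
            $len += cbor_item_size($buf, $pos + $len);
        }
    } elsif ($major == 6) {                    # tag: exactly one item follows
        $len += cbor_item_size($buf, $pos + $len);
    }
    # major types 0, 1 (integers) and 7 (simple values, floats) are the head only
    die "truncated\n" if $pos + $len > length $buf;
    return $len;
}

# Two values back to back: the integer 1 and the text string "abc".
my $stream = "\x01\x63abc";
my @items;
while (length $stream) {
    my $size = cbor_item_size($stream);
    push @items, substr $stream, 0, $size, "";   # remove the item from the stream
}
```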
		466
		467	=back
		468
		469	=head2 INCREMENTAL PARSING
		470
		471	In some cases, there is the need for incremental parsing of CBOR
		472	texts. While this module always has to keep both CBOR text and resulting
		473	Perl data structure in memory at one time, it does allow you to parse a
		474	CBOR stream incrementally, in a way similar to using C<decode_prefix> to see
		475	if a full CBOR object is available, but much more efficiently.
		476
		477	It basically works by parsing as much of a CBOR string as possible - if
		478	the CBOR data is not complete yet, the parser will remember where it was,
		479	to be able to restart when more data has been accumulated. Once enough
		480	data is available to either decode a complete CBOR value or raise an
		481	error, a real decode will be attempted.
		482
		483	A typical use case would be a network protocol that consists of sending
		484	and receiving CBOR-encoded messages. The solution that works with CBOR and
		485	about anything else is to prepend a length to every CBOR value, so the
		486	receiver knows how many octets to read. More compact (and slightly slower)
		487	would be to just send CBOR values back-to-back, as C<CBOR::XS> knows where
		488	a CBOR value ends, and doesn't need an explicit length.
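The length-prefix approach needs nothing CBOR-specific. The following minimal sketch uses Perl's built-in C<pack>/C<unpack> with a 4-byte big-endian length (the C<N> template). The payloads are arbitrary octet strings standing in for CBOR-encoded messages, and C<frame> and C<unframe_all> are hypothetical helper names. Incomplete data stays in the buffer until more arrives, which is similar in spirit to what C<incr_parse> does with back-to-back values.

```perl
use strict;
use warnings;

# Prepend a 32-bit big-endian length to a message.
sub frame { return pack "N/a*", $_[0] }

# Remove and return all complete messages at the front of $$bufref,
# leaving any incomplete trailing data in the buffer.
sub unframe_all {
    my ($bufref) = @_;
    my @messages;
    while (length $$bufref >= 4) {
        my $len = unpack "N", $$bufref;
        last if length $$bufref < 4 + $len;   # message not complete yet
        push @messages, substr $$bufref, 4, $len;
        substr $$bufref, 0, 4 + $len, "";     # consume header and payload
    }
    return @messages;
}

my $wire    = frame("\x01") . frame("\x63abc");
my $partial = substr $wire, 0, length($wire) - 1;   # last byte not received yet

my @first = unframe_all(\$partial);   # only the first message is complete
$partial .= "c";                      # the missing byte arrives
my @second = unframe_all(\$partial);
```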
		489
		490	The following methods help with this:
		491
		492	=over 4
		493
		494	=item @decoded = $cbor->incr_parse ($buffer)
		495
		496	This method attempts to decode exactly one CBOR value from the beginning
		497	of the given C<$buffer>. The value is removed from the C<$buffer> on
		498	success. When C<$buffer> doesn't contain a complete value yet, it returns
		499	nothing. Finally, when the C<$buffer> doesn't start with something
		500	that could ever be a valid CBOR value, it raises an exception, just as
		501	C<decode> would. In the latter case the decoder state is undefined and
		502	must be reset before being able to parse further.
		503
		504	This method modifies the C<$buffer> in place. When no CBOR value can be
		505	decoded, the decoder stores the current string offset. On the next call,
		506	it continues decoding at the place where it stopped before. For this to make
		507	sense, the C<$buffer> must begin with the same octets as on previous
		508	unsuccessful calls.
		509
		510	You can call this method in scalar context, in which case it either
		511	returns a decoded value or C<undef>. This makes it impossible to
		512	distinguish between CBOR null values (which decode to C<undef>) and an
		513	unsuccessful decode, which is often acceptable.
		514
		515	=item @decoded = $cbor->incr_parse_multiple ($buffer)
		516
		517	Same as C<incr_parse>, but attempts to decode as many CBOR values as
		518	possible in one go, instead of at most one. Calls to C<incr_parse> and
		519	C<incr_parse_multiple> can be interleaved.
		520
		521	=item $cbor->incr_reset
		522
		523	Resets the incremental decoder. This throws away any saved state, so that
		524	subsequent calls to C<incr_parse> or C<incr_parse_multiple> start to parse
		525	a new CBOR value from the beginning of the C<$buffer> again.
		526
		527	This method can be called at any time, but it I<must> be called if you want
		528	to change your C<$buffer> or there was a decoding error and you want to
		529	reuse the C<$cbor> object for future incremental parsings.
184		530
185	=back	531	=back
186		532
187		533
188	=head1 MAPPING	534	=head1 MAPPING
…		…
206	CBOR integers become (numeric) perl scalars. On perls without 64 bit	552	CBOR integers become (numeric) perl scalars. On perls without 64 bit
207	support, 64 bit integers will be truncated or otherwise corrupted.	553	support, 64 bit integers will be truncated or otherwise corrupted.
208		554
209	=item byte strings	555	=item byte strings
210		556
211	Byte strings will become octet strings in Perl (the byte values 0..255	557	Byte strings will become octet strings in Perl (the byte values 0..255
212	will simply become characters of the same value in Perl).	558	will simply become characters of the same value in Perl).
213		559
214	=item UTF-8 strings	560	=item UTF-8 strings
215		561
216	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be	562	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
…		…
234	C<Types::Serialiser::false> and C<Types::Serialiser::error>,	580	C<Types::Serialiser::false> and C<Types::Serialiser::error>,
235	respectively. They are overloaded to act almost exactly like the numbers	581	respectively. They are overloaded to act almost exactly like the numbers
236	C<1> and C<0> (for true and false) or to throw an exception on access (for	582	C<1> and C<0> (for true and false) or to throw an exception on access (for
237	error). See the L<Types::Serialiser> manpage for details.	583	error). See the L<Types::Serialiser> manpage for details.
238		584
239	=item CBOR tag 256 (perl object)	585	=item tagged values
240		586
241	The tag value C<256> (TODO: pending iana registration) will be used
242	to deserialise a Perl object serialised with C<FREEZE>. See L<OBJECT
243	SERIALISATION>, below, for details.
244
245	=item CBOR tag 55799 (magic header)
246
247	The tag 55799 is ignored (this tag implements the magic header).
248
249	=item other CBOR tags
250
251	Tagged items consist of a numeric tag and another CBOR value. Tags not	587	Tagged items consist of a numeric tag and another CBOR value.
252	handled internally are currently converted into a L<CBOR::XS::Tagged>
253	object, which is simply a blessed array reference consisting of the
254	numeric tag value followed by the (decoded) CBOR value.
255		588
256	In the future, support for user-supplied conversions might get added.	589	See L<TAG HANDLING AND EXTENSIONS> and the description of C<< ->filter >>
		590	for details on which tags are handled how.
257		591
258	=item anything else	592	=item anything else
259		593
260	Anything else (e.g. unsupported simple values) will raise a decoding	594	Anything else (e.g. unsupported simple values) will raise a decoding
261	error.	595	error.
…		…
264		598
265		599
266	=head2 PERL -> CBOR	600	=head2 PERL -> CBOR
267		601
268	The mapping from Perl to CBOR is slightly more difficult, as Perl is a	602	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
269	truly typeless language, so we can only guess which CBOR type is meant by	603	typeless language. That means this module can only guess which CBOR type
270	a Perl value.	604	is meant by a perl value.
271		605
272	=over 4	606	=over 4
273		607
274	=item hash references	608	=item hash references
275		609
276	Perl hash references become CBOR maps. As there is no inherent ordering in	610	Perl hash references become CBOR maps. As there is no inherent ordering in
277	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random	611	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
278	order.	612	order. This order can be different each time a hash is encoded.
279		613
280	Currently, tied hashes will use the indefinite-length format, while normal	614	Currently, tied hashes will use the indefinite-length format, while normal
281	hashes will use the fixed-length format.	615	hashes will use the fixed-length format.
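Here are the bytes for the map C<< {"a" => 1} >> in both forms. They were assembled by hand according to the CBOR specification (RFC 8949), not captured from C<CBOR::XS>. A fixed-length map announces its number of pairs in its first byte. An indefinite-length map is opened with C<0xbf> and closed with a C<0xff> "break" byte, so an encoder can start writing entries before it knows how many there will be, as with a tied hash.

```perl
use strict;
use warnings;

# Text string "a": major type 3 (0x60) | length 1, followed by the byte 'a'.
my $key   = "\x61a";
# Unsigned integer 1: major type 0, value stored directly in the head.
my $value = "\x01";

# Fixed-length map: major type 5 (0xa0) | number of pairs (1).
my $fixed = chr(0xa0 | 1) . $key . $value;

# Indefinite-length map: 0xbf opens it, 0xff ("break") terminates it.
my $indefinite = "\xbf" . $key . $value . "\xff";

printf "%s\n%s\n", unpack("H*", $fixed), unpack("H*", $indefinite);
# prints:
# a1616101
# bf616101ff
```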
282		616
283	=item array references	617	=item array references
284		618
285	Perl array references become fixed-length CBOR arrays.	619	Perl array references become fixed-length CBOR arrays.
286		620
287	=item other references	621	=item other references
288		622
289	Other unblessed references are generally not allowed and will cause an	623	Other unblessed references will be represented using
290	exception to be thrown, except for references to the integers C<0> and	624	the indirection tag extension (tag value C<22098>,
291	C<1>, which get turned into false and true in CBOR.	625	L<http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
		626	to be able to decode these values somehow, by either "doing the right
		627	thing", decoding into a generic tagged object, simply ignoring the tag, or
		628	something else.
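A CBOR tag is a head with major type 6 followed by the tagged item, so the tag value C<22098> needs a two-byte argument. The sketch below hand-encodes a reference to the integer C<1>. It assumes the referenced value simply follows the tag. The C<cbor_head> helper is hypothetical, and the bytes come from the CBOR head rules, not from C<CBOR::XS> output.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a CBOR head (major type + argument) in the
# shortest form. Arguments of 2**32 and above are not handled in this sketch.
sub cbor_head {
    my ($major, $arg) = @_;
    my $mt = $major << 5;
    return chr($mt | $arg)                 if $arg < 24;
    return chr($mt | 24) . pack("C", $arg) if $arg < 0x100;
    return chr($mt | 25) . pack("n", $arg) if $arg < 0x10000;
    return chr($mt | 26) . pack("N", $arg) if $arg < 2**32;
    die "argument too large for this sketch\n";
}

# Tag 22098 (indirection) followed by the unsigned integer 1.
my $ref_to_one = cbor_head(6, 22098) . cbor_head(0, 1);
print unpack("H*", $ref_to_one), "\n";   # prints "d9565201"
```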
292		629
293	=item CBOR::XS::Tagged objects	630	=item CBOR::XS::Tagged objects
294		631
295	Objects of this type must be arrays consisting of a single C<[tag, value]>	632	Objects of this type must be arrays consisting of a single C<[tag, value]>
296	pair. The (numerical) tag will be encoded as a CBOR tag, the value will	633	pair. The (numerical) tag will be encoded as a CBOR tag, the value will
297	be encoded as appropriate for the value. You cna use C<CBOR::XS::tag> to	634	be encoded as appropriate for the value. You must use C<CBOR::XS::tag> to
298	create such objects.	635	create such objects.
299		636
300	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error	637	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
301		638
302	These special values become CBOR true, CBOR false and CBOR undefined	639	These special values become CBOR true, CBOR false and CBOR undefined
…		…
304	if you want.	641	if you want.
305		642
306	=item other blessed objects	643	=item other blessed objects
307		644
308	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See	645	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
309	L<OBJECT SERIALISATION>, below, for details.	646	L<TAG HANDLING AND EXTENSIONS> for specific classes handled by this
		647	module, and L<OBJECT SERIALISATION> for generic object serialisation.
310		648
311	=item simple scalars	649	=item simple scalars
312		650
313	TODO
314	Simple Perl scalars (any scalar that is not a reference) are the most	651	Simple Perl scalars (any scalar that is not a reference) are the most
315	difficult objects to encode: CBOR::XS will encode undefined scalars as	652	difficult objects to encode: CBOR::XS will encode undefined scalars as
316	CBOR null values, scalars that have last been used in a string context	653	CBOR null values, scalars that have last been used in a string context
317	before encoding as CBOR strings, and anything else as number value:	654	before encoding as CBOR strings, and anything else as number value:
318		655
319	# dump as number	656	# dump as number
320	encode_cbor [2] # yields [2]	657	encode_cbor [2] # yields [2]
321	encode_cbor [-3.0e17] # yields [-3e+17]	658	encode_cbor [-3.0e17] # yields [-3e+17]
322	my $value = 5; encode_cbor [$value] # yields [5]	659	my $value = 5; encode_cbor [$value] # yields [5]
323		660
324	# used as string, so dump as string	661	# used as string, so dump as string (either byte or text)
325	print $value;	662	print $value;
326	encode_cbor [$value] # yields ["5"]	663	encode_cbor [$value] # yields ["5"]
327		664
328	# undef becomes null	665	# undef becomes null
329	encode_cbor [undef] # yields [null]	666	encode_cbor [undef] # yields [null]
…		…
332		669
333	my $x = 3.1; # some variable containing a number	670	my $x = 3.1; # some variable containing a number
334	"$x"; # stringified	671	"$x"; # stringified
335	$x .= ""; # another, more awkward way to stringify	672	$x .= ""; # another, more awkward way to stringify
336	print $x; # perl does it for you, too, quite often	673	print $x; # perl does it for you, too, quite often
		674
		675	You can force whether a string is encoded as byte or text string by using
		676	C<utf8::upgrade> and C<utf8::downgrade> (if C<text_strings> is disabled).
		677
		678	utf8::upgrade $x; # encode $x as text string
		679	utf8::downgrade $x; # encode $x as byte string
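The upgrade and downgrade calls only change how Perl stores the string internally, and that is the property the byte-versus-text decision depends on. The small core-Perl sketch below needs no C<CBOR::XS>. It observes the flag with C<utf8::is_utf8> and checks that the string's value stays the same:

```perl
use strict;
use warnings;

my $x    = "caf\x{e9}";              # all code points are below 256
my $copy = $x;

utf8::upgrade($x);                   # stored as UTF-8: would become a text string
my $upgraded = utf8::is_utf8($x) ? 1 : 0;

utf8::downgrade($x);                 # stored as bytes: would become a byte string
my $downgraded = utf8::is_utf8($x) ? 1 : 0;

# Neither call changes the Perl-level value of the string.
my $same = $x eq $copy ? 1 : 0;
print "$upgraded $downgraded $same\n";   # prints "1 0 1"
```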
		680
		681	More options are available, see L<TYPE CASTS>, below, and the C<text_keys>
		682	and C<text_strings> options.
		683
		684	Perl doesn't define what operations up- and downgrade strings, so if the
		685	difference between byte and text is important, you should up- or downgrade
		686	your string as late as possible before encoding. You can also force the
		687	use of CBOR text strings by using C<text_keys> or C<text_strings>.
337		688
338	You can force the type to be a CBOR number by numifying it:	689	You can force the type to be a CBOR number by numifying it:
339		690
340	my $x = "3"; # some variable containing a string	691	my $x = "3"; # some variable containing a string
341	$x += 0; # numify it, ensuring it will be dumped as a number	692	$x += 0; # numify it, ensuring it will be dumped as a number
…		…
352	represent numerical values are supported, but might suffer loss of	703	represent numerical values are supported, but might suffer loss of
353	precision.	704	precision.
354		705
355	=back	706	=back
356		707
		708	=head2 TYPE CASTS
		709
		710	B<EXPERIMENTAL>: As an experimental extension, C<CBOR::XS> allows you to
		711	force specific CBOR types to be used when encoding. That allows you to
		712	encode types not normally accessible (e.g. half floats) as well as force
		713	string types even when C<text_strings> is in effect.
		714
		715	Type forcing is done by calling a special "cast" function which keeps a
		716	copy of the value and returns a new value that can be handed over to any
		717	CBOR encoder function.
		718
		719	The following casts are currently available (all of which are unary
		720	operators, that is, have a prototype of C<$>):
		721
		722	=over
		723
		724	=item CBOR::XS::as_int $value
		725
		726	Forces the value to be encoded as some form of (basic, not bignum) integer
		727	type.
		728
		729	=item CBOR::XS::as_text $value
		730
		731	Forces the value to be encoded as a (UTF-8) text string.
		732
		733	=item CBOR::XS::as_bytes $value
		734
		735	Forces the value to be encoded as a (binary) string value.
		736
		737	Example: encode a perl string as binary even though C<text_strings> is in
		738	effect.
		739
		740	CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
		741
		742	=item CBOR::XS::as_bool $value
		743
		744	Converts a Perl boolean (which can be any kind of scalar) into a CBOR
		745	boolean. Strictly the same, but shorter to write, than:
		746
		747	$value ? Types::Serialiser::true : Types::Serialiser::false
		748
		749	=item CBOR::XS::as_float16 $value
		750
		751	Forces half-float (IEEE 754 binary16) encoding of the given value.
		752
		753	=item CBOR::XS::as_float32 $value
		754
		755	Forces single-float (IEEE 754 binary32) encoding of the given value.
		756
		757	=item CBOR::XS::as_float64 $value
		758
		759	Forces double-float (IEEE 754 binary64) encoding of the given value.
		760
		761	=item CBOR::XS::as_cbor $cbor_text
		762
		763	Not a type cast per se, this function forces the argument to be encoded
		764	as-is. This can be used to embed pre-encoded CBOR data.
		765
		766	Note that no checking on the validity of the C<$cbor_text> is done - it's
		767	the caller's responsibility to correctly encode values.
		768
		769	=item CBOR::XS::as_map [key => value...]
		770
		771	Treat the array reference as key value pairs and output a CBOR map. This
		772	allows you to generate CBOR maps with arbitrary key types (or, if you
		773	don't care about semantics, duplicate keys or pairs in a custom order),
		774	which is otherwise hard to do with Perl.
		775
		776	The single argument must be an array reference with an even number of
		777	elements.
		778
		779	Note that only the reference to the array is copied, the array itself is
		780	not. Modifications done to the array before calling an encoding function
		781	will be reflected in the encoded output.
		782
		783	Example: encode a CBOR map with a string and an integer as keys.
		784
		785	encode_cbor CBOR::XS::as_map [string => "value", 5 => "value"]
		786
		787	=back
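
You can illustrate the point that C<as_map> only copies the array reference without loading CBOR::XS. The sketch below uses a hypothetical C<my_as_map> function and a hypothetical C<My::MapCast> class. Both are modelled on the C<CBOR::XS::as_map> source in this revision, with the same even-element check and the same C<[$ref, 7, undef]> layout. This is not the module's own code.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-in for CBOR::XS::as_map, modelled on the source in this
# revision: it stores the array *reference* and a type code, it does not copy.
sub my_as_map ($) {
   ref ($_[0]) eq "ARRAY"
      and $#{ $_[0] } & 1    # last index odd <=> even number of elements
      or Carp::croak ("my_as_map needs an array reference with an even number of elements");

   bless [$_[0], 7, undef], "My::MapCast"   # 7 is the type code as_map uses above
}

my @pairs = (string => "value", 5 => "value");
my $cast  = my_as_map \@pairs;

# Changing the array after the cast (but before encoding) is visible through
# the cast, because only the reference was stored.
push @pairs, 6 => "late";
my $seen = @{ $cast->[0] };   # 6 elements now, not 4

# An odd number of elements is rejected up front.
my $odd_ok = eval { my_as_map [1, 2, 3]; 1 };
```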
		788
		789	=cut
		790
		791	sub CBOR::XS::as_cbor ($) { bless [$_[0], 0, undef], CBOR::XS::Tagged:: }
		792	sub CBOR::XS::as_int ($) { bless [$_[0], 1, undef], CBOR::XS::Tagged:: }
		793	sub CBOR::XS::as_bytes ($) { bless [$_[0], 2, undef], CBOR::XS::Tagged:: }
		794	sub CBOR::XS::as_text ($) { bless [$_[0], 3, undef], CBOR::XS::Tagged:: }
		795	sub CBOR::XS::as_float16 ($) { bless [$_[0], 4, undef], CBOR::XS::Tagged:: }
		796	sub CBOR::XS::as_float32 ($) { bless [$_[0], 5, undef], CBOR::XS::Tagged:: }
		797	sub CBOR::XS::as_float64 ($) { bless [$_[0], 6, undef], CBOR::XS::Tagged:: }
		798
		799	sub CBOR::XS::as_bool ($) { $_[0] ? $Types::Serialiser::true : $Types::Serialiser::false }
		800
		801	sub CBOR::XS::as_map ($) {
		802	ARRAY:: eq ref $_[0]
		803	and $#{ $_[0] } & 1
		804	or do { require Carp; Carp::croak ("CBOR::XS::as_map only accepts array references with an even number of elements, caught") };
		805
		806	bless [$_[0], 7, undef], CBOR::XS::Tagged::
		807	}
		808
357	=head2 OBJECT SERIALISATION	809	=head2 OBJECT SERIALISATION
		810
		811	This module implements both a CBOR-specific and the generic
		812	L<Types::Serialiser> object serialisation protocol. The following
		813	subsections explain both methods.
		814
		815	=head3 ENCODING
358		816
359	This module knows two ways to serialise a Perl object: The CBOR-specific	817	This module knows two ways to serialise a Perl object: The CBOR-specific
360	way, and the generic way.	818	way, and the generic way.
361		819
362	Whenever the encoder encounters a Perl object that it cnanot serialise	820	Whenever the encoder encounters a Perl object that it cannot serialise
363	directly (most of them), it will first look up the C<TO_CBOR> method on	821	directly (most of them), it will first look up the C<TO_CBOR> method on
364	it.	822	it.
365		823
366	If it has a C<TO_CBOR> method, it will call it with the object as its only	824	If it has a C<TO_CBOR> method, it will call it with the object as its only
367	argument, and expects exactly one return value, which it will then	825	argument, and expects exactly one return value, which it will then
…		…
373		831
374	The C<FREEZE> method can return any number of values (i.e. zero or	832	The C<FREEZE> method can return any number of values (i.e. zero or
375	more). These will be encoded as a CBOR perl object, together with the	833	more). These will be encoded as a CBOR perl object, together with the
376	classname.	834	classname.
377		835
		836	These methods I<MUST NOT> change the data structure that is being
		837	serialised. Failure to comply with this can result in memory corruption -
		838	and worse.
		839
378	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail	840	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
379	with an error.	841	with an error.
380		842
		843	=head3 DECODING
		844
381	Objects encoded via C<TO_CBOR> cannot be automatically decoded, but	845	Objects encoded via C<TO_CBOR> cannot (normally) be automatically decoded,
382	objects encoded via C<FREEZE> can be decoded using the following protocol:	846	but objects encoded via C<FREEZE> can be decoded using the following
		847	protocol:
383		848
384	When an encoded CBOR perl object is encountered by the decoder, it will	849	When an encoded CBOR perl object is encountered by the decoder, it will
385	look up the C<THAW> method, by using the stored classname, and will fail	850	look up the C<THAW> method, by using the stored classname, and will fail
386	if the method cannot be found.	851	if the method cannot be found.
387		852
388	After the lookup it will call the C<THAW> method with the stored classname	853	After the lookup it will call the C<THAW> method with the stored classname
389	as first argument, the constant string C<CBOR> as second argument, and all	854	as first argument, the constant string C<CBOR> as second argument, and all
390	values returned by C<FREEZE> as remaining arguments.	855	values returned by C<FREEZE> as remaining arguments.
391		856
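You can simulate the calling convention in plain Perl, without CBOR::XS doing the actual encoding. In this sketch C<My::Point> is a hypothetical class. Passing the serialiser name (C<CBOR>) as the second argument to C<FREEZE> is an assumption based on the L<Types::Serialiser> protocol. The decoder side follows the C<THAW> lookup described above.

```perl
use strict;
use warnings;

# Hypothetical class following the FREEZE/THAW protocol.
package My::Point;

sub new {
   my ($class, %args) = @_;
   my $self = { x => $args{x}, y => $args{y} };
   bless $self, $class
}

# FREEZE may return any number of values; here two plain numbers.
sub FREEZE {
   my ($self, $serialiser) = @_;   # $serialiser assumed to be e.g. "CBOR"
   ($self->{x}, $self->{y})
}

# THAW receives the class name, the serialiser name and the FREEZE values.
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   $class->new (x => $x, y => $y)
}

package main;

# Simulate an encoder/decoder pair: remember the class name plus the FREEZE
# values, then later look up THAW through that stored class name.
my $p      = My::Point->new (x => 3, y => 4);
my $class  = ref $p;
my @frozen = $p->FREEZE ("CBOR");

my $thaw = $class->can ("THAW") or die "$class has no THAW method";
my $copy = $thaw->($class, "CBOR", @frozen);
```
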
392	=head4 EXAMPLES	857	=head3 EXAMPLES
393		858
394	Here is an example C<TO_CBOR> method:	859	Here is an example C<TO_CBOR> method:
395		860
396	sub My::Object::TO_CBOR {	861	sub My::Object::TO_CBOR {
397	my ($obj) = @_;	862	my ($obj) = @_;
…		…
408		873
409	sub URI::TO_CBOR {	874	sub URI::TO_CBOR {
410	my ($self) = @_;	875	my ($self) = @_;
411	my $uri = "$self"; # stringify uri	876	my $uri = "$self"; # stringify uri
412	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string	877	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
413	CBOR::XS::tagged 32, "$_[0]"	878	CBOR::XS::tag 32, $uri
414	}	879	}
415		880
416	This will encode URIs as a UTF-8 string with tag 32, which indicates a	881	This will encode URIs as a UTF-8 string with tag 32, which indicates a
417	URI.	882	URI.
418		883
…		…
429	"$self" # encode url string	894	"$self" # encode url string
430	}	895	}
431		896
432	sub URI::THAW {	897	sub URI::THAW {
433	my ($class, $serialiser, $uri) = @_;	898	my ($class, $serialiser, $uri) = @_;
434
435	$class->new ($uri)	899	$class->new ($uri)
436	}	900	}
437		901
438	Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For	902	Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
439	example, a C<FREEZE> method that returns "type", "id" and "variant" values	903	example, a C<FREEZE> method that returns "type", "id" and "variant" values
…		…
455	=head1 MAGIC HEADER	919	=head1 MAGIC HEADER
456		920
457	There is no way to distinguish CBOR from other formats	921	There is no way to distinguish CBOR from other formats
458	programmatically. To make it easier to distinguish CBOR from other	922	programmatically. To make it easier to distinguish CBOR from other
459	formats, the CBOR specification has a special "magic string" that can be	923	formats, the CBOR specification has a special "magic string" that can be
460	prepended to any CBOR string without changing it's meaning.	924	prepended to any CBOR string without changing its meaning.
461		925
462	This string is available as C<$CBOR::XS::MAGIC>. This module does not	926	This string is available as C<$CBOR::XS::MAGIC>. This module does not
463	prepend this string tot he CBOR data it generates, but it will ignroe it	927	prepend this string to the CBOR data it generates, but it will ignore it
464	if present, so users can prepend this string as a "file type" indicator as	928	if present, so users can prepend this string as a "file type" indicator as
465	required.	929	required.
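
The magic string is simply the self-describe tag 55799 encoded as a CBOR tag head, so you can build and strip it by hand. Below is a minimal sketch with hypothetical C<add_magic> and C<strip_magic> helpers. It assumes that C<$CBOR::XS::MAGIC> holds the same three bytes.

```perl
use strict;
use warnings;

# Self-describe CBOR tag 55799: major type 6 (0xc0) with additional info 25
# (a 2-byte argument follows) gives 0xd9, then 55799 big-endian (0xd9f7).
my $MAGIC = pack "C n", 0xc0 | 25, 55799;   # "\xd9\xd9\xf7"

# Prepend the magic string to an already-encoded CBOR text.
sub add_magic { $MAGIC . $_[0] }

# Remove a leading magic string, if present.
sub strip_magic {
   my ($text) = @_;
   substr ($text, 0, length $MAGIC) eq $MAGIC
      ? substr ($text, length $MAGIC)
      : $text
}

my $cbor = "\x83\x01\x02\x03";   # CBOR encoding of [1, 2, 3]
my $file = add_magic ($cbor);    # "file type" marked version
```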
466		930
467		931
468	=head1 THE CBOR::XS::Tagged CLASS	932	=head1 THE CBOR::XS::Tagged CLASS
…		…
551	Wrap CBOR data in CBOR:	1015	Wrap CBOR data in CBOR:
552		1016
553	my $cbor_cbor = encode_cbor	1017	my $cbor_cbor = encode_cbor
554	CBOR::XS::tag 24,	1018	CBOR::XS::tag 24,
555	encode_cbor [1, 2, 3];	1019	encode_cbor [1, 2, 3];
		1020
		1021	=head1 TAG HANDLING AND EXTENSIONS
		1022
		1023	This section describes how this module handles specific tagged values
		1024	and extensions. If a tag is not mentioned here and no additional filters
		1025	are provided for it, then the default handling applies (creating a
		1026	CBOR::XS::Tagged object on decoding, and only encoding the tag when
		1027	explicitly requested).
		1028
		1029	Tags not handled specifically are currently converted into a
		1030	L<CBOR::XS::Tagged> object, which is simply a blessed array reference
		1031	consisting of the numeric tag value followed by the (decoded) CBOR value.
		1032
		1033	Future versions of this module reserve the right to special case
		1034	additional tags (such as base64url).
		1035
		1036	=head2 ENFORCED TAGS
		1037
		1038	These tags are always handled when decoding, and their handling cannot be
		1039	overridden by the user.
		1040
		1041	=over 4
		1042
		1043	=item 26 (perl-object, L<http://cbor.schmorp.de/perl-object>)
		1044
		1045	These tags are automatically created (and decoded) for serialisable
		1046	objects using the C<FREEZE/THAW> methods (the L<Types::Serialiser> object
		1047	serialisation protocol). See L<OBJECT SERIALISATION> for details.
		1048
		1049	=item 28, 29 (shareable, sharedref, L<http://cbor.schmorp.de/value-sharing>)
		1050
		1051	These tags are automatically decoded when encountered (and they do not
		1052	result in a cyclic data structure, see C<allow_cycles>), resulting in
		1053	shared values in the decoded object. They are only encoded, however, when
		1054	C<allow_sharing> is enabled.
		1055
		1056	Not all shared values can be successfully decoded: values that reference
		1057	themselves will I<currently> decode as C<undef> (this is not the same
		1058	as a reference pointing to itself, which will be represented as a value
		1059	that contains an indirect reference to itself - these will be decoded
		1060	properly).
		1061
		1062	Note that considerably more shared value data structures can be decoded
		1063	than will be encoded - currently, only values pointed to by references
		1064	will be shared, others will not. While non-reference shared values can be
		1065	generated in Perl with some effort, they were considered too unimportant
		1066	to be supported in the encoder. The decoder, however, will decode these
		1067	values as shared values.
		1068
		1069	=item 256, 25 (stringref-namespace, stringref, L<http://cbor.schmorp.de/stringref>)
		1070
		1071	These tags are automatically decoded when encountered. They are only
		1072	encoded, however, when C<pack_strings> is enabled.
		1073
		1074	=item 22098 (indirection, L<http://cbor.schmorp.de/indirection>)
		1075
		1076	This tag is automatically generated when a reference is encountered (with
		1077	the exception of hash and array references). It is converted to a reference
		1078	when decoding.
		1079
		1080	=item 55799 (self-describe CBOR, RFC 7049)
		1081
		1082	This value is not generated on encoding (unless explicitly requested by
		1083	the user), and is simply ignored when decoding.
		1084
		1085	=back
		1086
		1087	=head2 NON-ENFORCED TAGS
		1088
		1089	These tags have default filters provided when decoding. Their handling can
		1090	be overridden by changing the C<%CBOR::XS::FILTER> entry for the tag, or by
		1091	providing a custom C<filter> callback when decoding.
		1092
		1093	When they result in decoding into a specific Perl class, the module
		1094	usually provides a corresponding C<TO_CBOR> method as well.
		1095
		1096	When any of these need to load additional modules that are not part of the
		1097	perl core distribution (e.g. L<URI>), it is (currently) up to the user to
		1098	provide these modules. The decoding usually fails with an exception if the
		1099	required module cannot be loaded.
		1100
		1101	=over 4
		1102
		1103	=item 0, 1 (date/time string, seconds since the epoch)
		1104
		1105	These tags are decoded into L<Time::Piece> objects. The corresponding
		1106	C<Time::Piece::TO_CBOR> method always encodes into tag 1 values currently.
		1107
		1108	The L<Time::Piece> API is generally surprisingly bad, and fractional
		1109	seconds are only accidentally kept intact, so watch out. On the plus side,
		1110	the module comes with perl since 5.10, which has to count for something.
		1111
		1112	=item 2, 3 (positive/negative bignum)
		1113
		1114	These tags are decoded into L<Math::BigInt> objects. The corresponding
		1115	C<Math::BigInt::TO_CBOR> method encodes "small" bigints into normal CBOR
		1116	integers, and others into positive/negative CBOR bignums.
		1117
		1118	=item 4, 5, 264, 265 (decimal fraction/bigfloat)
		1119
		1120	Both decimal fractions and bigfloats are decoded into L<Math::BigFloat>
		1121	objects. The corresponding C<Math::BigFloat::TO_CBOR> method I<always>
		1122	encodes into a decimal fraction (either tag 4 or 264).
		1123
		1124	NaN and infinities are not encoded properly, as they cannot be represented
		1125	as CBOR decimal fractions.
		1126
		1127	See L<BIGNUM SECURITY CONSIDERATIONS> for more info.
		1128
		1129	=item 30 (rational numbers)
		1130
		1131	These tags are decoded into L<Math::BigRat> objects. The corresponding
		1132	C<Math::BigRat::TO_CBOR> method encodes rational numbers with denominator
		1133	C<1> via their numerator only, i.e., they become normal integers or
		1134	C<bignums>.
		1135
		1136	See L<BIGNUM SECURITY CONSIDERATIONS> for more info.
		1137
		1138	=item 21, 22, 23 (expected later JSON conversion)
		1139
		1140	CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore these
		1141	tags.
		1142
		1143	=item 32 (URI)
		1144
		1145	These objects decode into L<URI> objects. The corresponding
		1146	C<URI::TO_CBOR> method again results in a CBOR URI value.
		1147
		1148	=back
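
To see that tags 0 and 1 describe the same kind of value, the sketch below decodes the RFC 7049 example timestamp both ways with L<Time::Piece>. It is a simplified version of the tag 0 filter that handles only the UTC C<Z> suffix, with no numeric offsets or fractional seconds. It is not the module's filter itself.

```perl
use strict;
use warnings;
use Time::Piece ();   # core since perl 5.10; empty list keeps CORE gmtime/localtime

# Tag 0 payload (date/time string) and tag 1 payload (epoch seconds) for the
# same instant, as in the RFC 7049 examples.
my $tag0 = "2013-03-21T20:04:00Z";
my $tag1 = 1363896240;

# Simplified tag 0 handling: only a trailing "Z" (UTC) is supported here.
(my $s = $tag0) =~ s/Z\z// or die "expected a UTC timestamp";
my $t0 = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");

# Tag 1 handling: seconds since the epoch, interpreted as UTC.
my $t1 = Time::Piece::gmtime ($tag1);

printf "%s %s\n", $t0->epoch, $t1->datetime;
```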
		1149
		1150	=cut
556		1151
557	=head1 CBOR and JSON	1152	=head1 CBOR and JSON
558		1153
559	CBOR is supposed to implement a superset of the JSON data model, and is,	1154	CBOR is supposed to implement a superset of the JSON data model, and is,
560	with some coercion, able to represent all JSON texts (something that other	1155	with some coercion, able to represent all JSON texts (something that other
…		…
569	CBOR intact.	1164	CBOR intact.
570		1165
571		1166
572	=head1 SECURITY CONSIDERATIONS	1167	=head1 SECURITY CONSIDERATIONS
573		1168
574	When you are using CBOR in a protocol, talking to untrusted potentially	1169	Tl;dr... if you want to decode or encode CBOR from untrusted sources, you
575	hostile creatures requires relatively few measures.	1170	should start with a coder object created via C<new_safe> (which implements
		1171	the mitigations explained below):
576		1172
		1173	my $coder = CBOR::XS->new_safe;
		1174
		1175	my $data = $coder->decode ($cbor_text);
		1176	my $cbor = $coder->encode ($data);
		1177
		1178	Longer version: When you are using CBOR in a protocol, talking to
		1179	untrusted, potentially hostile creatures requires some thought:
		1180
		1181	=over 4
		1182
		1183	=item Security of the CBOR decoder itself
		1184
577	First of all, your CBOR decoder should be secure, that is, should not have	1185	First and foremost, your CBOR decoder should be secure, that is, should
		1186	not have any buffer overflows or similar bugs that could potentially be
578	any buffer overflows. Obviously, this module should ensure that and I am	1187	exploited. Obviously, this module should ensure that and I am trying hard
579	trying hard on making that true, but you never know.	1188	on making that true, but you never know.
580		1189
		1190	=item CBOR::XS can invoke almost arbitrary callbacks during decoding
		1191
		1192	CBOR::XS supports object serialisation - decoding CBOR can cause calls
		1193	to I<any> C<THAW> method in I<any> package that exists in your process
		1194	(that is, CBOR::XS will not try to load modules, but any existing C<THAW>
		1195	method or function can be called, so they all have to be secure).
		1196
		1197	Less obviously, it will also invoke C<TO_CBOR> and C<FREEZE> methods -
		1198	even if all your C<THAW> methods are secure, encoding data structures from
		1199	untrusted sources can invoke those and trigger bugs in those.
		1200
		1201	So, if you are not sure about the security of all the modules you
		1202	have loaded (you shouldn't), you should disable this part using
		1203	C<forbid_objects> or using C<new_safe>.
		1204
		1205	=item CBOR can be extended with tags that call library code
		1206
		1207	CBOR can be extended with tags, and C<CBOR::XS> has a registry of
		1208	conversion functions for many existing tags that can be extended via
		1209	third-party modules (see the C<filter> method).
		1210
		1211	If you don't trust these, you should configure the "safe" filter function,
		1212	C<CBOR::XS::safe_filter> (C<new_safe> does this), which by default only
		1213	includes conversion functions that are considered "safe" by the author
		1214	(but again, they can be extended by third party modules).
		1215
		1216	Depending on your level of paranoia, you can use the "safe" filter:
		1217
		1218	$cbor->filter (\&CBOR::XS::safe_filter);
		1219
		1220	... your own filter...
		1221
		1222	$cbor->filter (sub { ... do your stuffs here ... });
		1223
		1224	... or even no filter at all, disabling all tag decoding:
		1225
		1226	$cbor->filter (sub { });
		1227
		1228	This is never a problem for encoding, as the tag mechanism only exists in
		1229	CBOR texts.
		1230
		1231	=item Resource-starving attacks: object memory usage
		1232
581	Second, you need to avoid resource-starving attacks. That means you should	1233	You need to avoid resource-starving attacks. That means you should limit
582	limit the size of CBOR data you accept, or make sure that when your	1234	the size of CBOR data you accept, or make sure that when your resources
583	resources run out, that's just fine (e.g. by using a separate process that	1235	run out, that's just fine (e.g. by using a separate process that can
584	can crash safely). The size of a CBOR string in octets is usually a good	1236	crash safely). The size of a CBOR string in octets is usually a good
585	indication of the size of the resources required to decode it into a Perl	1237	indication of the size of the resources required to decode it into a Perl
586	structure. While CBOR::XS can check the size of the CBOR text, it might be	1238	structure. While CBOR::XS can check the size of the CBOR text (using
587	too late when you already have it in memory, so you might want to check	1239	C<max_size> - done by C<new_safe>), it might be too late when you already
588	the size before you accept the string.	1240	have it in memory, so you might want to check the size before you accept
		1241	the string.
589		1242
		1243	As for encoding, it is possible to construct data structures that are
		1244	relatively small but result in large CBOR texts (for example by having an
		1245	array full of references to the same big data structure, which will all be
		1246	deep-cloned during encoding by default). This is rarely an actual issue
		1247	(and the worst case is still just running out of memory), but you can
		1248	reduce this risk by using C<allow_sharing>.
		1249
		1250	=item Resource-starving attacks: stack overflows
		1251
590	Third, CBOR::XS recurses using the C stack when decoding objects and	1252	CBOR::XS recurses using the C stack when decoding objects and arrays. The
591	arrays. The C stack is a limited resource: for instance, on my amd64	1253	C stack is a limited resource: for instance, on my amd64 machine with 8MB
592	machine with 8MB of stack size I can decode around 180k nested arrays but	1254	of stack size I can decode around 180k nested arrays but only 14k nested
593	only 14k nested CBOR objects (due to perl itself recursing deeply on croak	1255	CBOR objects (due to perl itself recursing deeply on croak to free the
594	to free the temporary). If that is exceeded, the program crashes. To be	1256	temporary). If that is exceeded, the program crashes. To be conservative,
595	conservative, the default nesting limit is set to 512. If your process	1257	the default nesting limit is set to 512. If your process has a smaller
596	has a smaller stack, you should adjust this setting accordingly with the	1258	stack, you should adjust this setting accordingly with the C<max_depth>
597	C<max_depth> method.	1259	method.
		1260
		1261	=item Resource-starving attacks: CPU en-/decoding complexity
		1262
		1263	CBOR::XS will use the L<Math::BigInt>, L<Math::BigFloat> and
		1264	L<Math::BigRat> libraries to represent bignums when encoding and decoding. These can be
		1265	very slow (as in, centuries of CPU time) and can even crash your program
		1266	(and are generally not very trustworthy). See the next section on bignum
		1267	security for details.
		1268
		1269	=item Data breaches: leaking information in error messages
		1270
		1271	CBOR::XS might leak contents of your Perl data structures in its error
		1272	messages, so when you serialise sensitive information you might want to
		1273	make sure that exceptions thrown by CBOR::XS will not end up in front of
		1274	untrusted eyes.
		1275
		1276	=item Something else...
598		1277
599	Something else could bomb you, too, that I forgot to think of. In that	1278	Something else could bomb you, too, that I forgot to think of. In that
600	case, you get to keep the pieces. I am always open for hints, though...	1279	case, you get to keep the pieces. I am always open for hints, though...
601		1280
602	Also keep in mind that CBOR::XS might leak contents of your Perl data	1281	=back
603	structures in its error messages, so when you serialise sensitive	1282
604	information you might want to make sure that exceptions thrown by CBOR::XS	1283
605	will not end up in front of untrusted eyes.	1284	=head1 BIGNUM SECURITY CONSIDERATIONS
		1285
		1286	CBOR::XS provides a C<TO_CBOR> method for both L<Math::BigInt> and
		1287	L<Math::BigFloat> that tries to encode the number in the simplest possible
		1288	way, that is, either a CBOR integer, a CBOR bignum (tags 2 and 3), a decimal
		1289	fraction (tag 4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers
		1290	(L<Math::BigRat>, tag 30) can also contain bignums as members.
		1291
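For the integer case, a bignum (tag 2) payload is just the magnitude as big-endian bytes. This sketch mirrors the hex round trip used by C<Math::BigInt::TO_CBOR> and the tag 2 filter, but handles only non-negative values. The helper names are hypothetical.

```perl
use strict;
use warnings;
use Math::BigInt;

# Encode a non-negative Math::BigInt as big-endian bytes (tag 2 payload).
sub bigint_to_tag2_bytes {
   my ($n) = @_;
   die "sketch only handles non-negative values" if $n < 0;
   my $hex = substr ($n->as_hex, 2);     # strip the leading "0x"
   $hex = "0$hex" if 1 & length $hex;    # pack "H*" needs whole bytes
   pack "H*", $hex
}

# Decode big-endian bytes back into a Math::BigInt, as the tag 2 filter does.
sub tag2_bytes_to_bigint {
   Math::BigInt->new ("0x" . unpack ("H*", $_[0]))
}

my $n     = Math::BigInt->new (2)->bpow (64);   # one past the 64 bit unsigned range
my $bytes = bigint_to_tag2_bytes ($n);
my $back  = tag2_bytes_to_bigint ($bytes);
```
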
		1292	CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
		1293	bigfloats (tags 5 and 265), but it will never generate these on its own.
		1294
		1295	Using the built-in L<Math::BigInt::Calc> support, encoding and decoding
		1296	decimal fractions is generally fast. Decoding bigints can be slow for very
		1297	big numbers (tens of thousands of digits, something that could potentially
		1298	be caught by limiting the size of CBOR texts), and decoding bigfloats or
		1299	arbitrary-exponent bigfloats can be I<extremely> slow (minutes, decades)
		1300	for large exponents (roughly 40 bit and longer).
		1301
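Decimal fractions are cheap because they only require splitting and rejoining a mantissa and a decimal exponent. The sketch below shows a round trip of the tag 4 payload (C<[exponent, mantissa]>) using L<Math::BigFloat> directly, mirroring C<Math::BigFloat::TO_CBOR> and the tag 4 filter. It assumes that C<parts> returns the mantissa and exponent as integers, as current Math::BigFloat documents.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Tag 4 payload is [exponent, mantissa]; value = mantissa * 10 ** exponent.
my $x = Math::BigFloat->new ("273.15");

# Encoding direction: split into mantissa and decimal exponent.
my ($m, $e) = $x->parts;          # 27315 and -2
my $payload = [$e->numify, $m];

# Decoding direction: rebuild from a "mantissaEexponent" string.
my $back = Math::BigFloat->new ($payload->[1] . "E" . $payload->[0]);
```
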
		1302	Additionally, L<Math::BigInt> can take advantage of other bignum
		1303	libraries, such as L<Math::GMP>, which cannot handle big floats with large
		1304	exponents, and might simply abort or crash your program, due to their code
		1305	quality.
		1306
		1307	This can be a concern if you want to parse untrusted CBOR. If it is, you
		1308	might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
		1309	types. You should also disable types 5 and 265, as these can be slow even
		1310	without bigints.
		1311
		1312	Disabling bigints will also partially or fully disable types that rely on
		1313	them, e.g. rational numbers that use bignums.
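
One way to disable only the bignum tags while keeping the other default conversions is to wrap an existing filter. The C<make_restricted_filter> helper below is hypothetical. It relies only on the convention visible in C<default_filter> and C<safe_filter> below, where returning an empty list means "no conversion", so the value stays a C<CBOR::XS::Tagged> object.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a filter callback so that the listed tags are
# never converted (the wrapper returns an empty list for them).
sub make_restricted_filter {
   my ($inner, @blocked) = @_;
   my %blocked = map { $_ => 1 } @blocked;

   sub {
      my ($tag) = @_;
      return if $blocked{$tag};   # empty list: leave the tag unconverted
      $inner->(@_)
   }
}

# Stand-in inner filter for demonstration: claims to handle every tag.
my $inner  = sub { "handled tag $_[0]" };
my $filter = make_restricted_filter ($inner, 2, 3, 5, 265);
```

With CBOR::XS, this might be installed as C<< $cbor->filter (make_restricted_filter (\&CBOR::XS::default_filter, 2, 3, 5, 265)) >>. Rational numbers with bignum members will then decode only partially.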
		1314
606		1315
607	=head1 CBOR IMPLEMENTATION NOTES	1316	=head1 CBOR IMPLEMENTATION NOTES
608		1317
609	This section contains some random implementation notes. They do not	1318	This section contains some random implementation notes. They do not
610	describe guaranteed behaviour, but merely behaviour as-is implemented	1319	describe guaranteed behaviour, but merely behaviour as-is implemented
…		…
619	Only the double data type is supported for NV data types - when Perl uses	1328	Only the double data type is supported for NV data types - when Perl uses
620	long double to represent floating point values, they might not be encoded	1329	long double to represent floating point values, they might not be encoded
621	properly. Half precision types are accepted, but not encoded.	1330	properly. Half precision types are accepted, but not encoded.
622		1331
623	Strict mode and canonical mode are not implemented.	1332	Strict mode and canonical mode are not implemented.
		1333
		1334
		1335	=head1 LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
		1336
		1337	On perls that were built without 64 bit integer support (these are rare
		1338	nowadays, even on 32 bit architectures, as all major Perl distributions
		1339	are built with 64 bit integer support), support for any kind of 64 bit
		1340	value in CBOR is very limited - most likely, these 64 bit values will
		1341	be truncated, corrupted, or otherwise not decoded correctly. This also
		1342	includes string, float, array and map sizes that are stored as 64 bit
		1343	integers.
624		1344
625		1345
626	=head1 THREADS	1346	=head1 THREADS
627		1347
628	This module is I<not> guaranteed to be thread safe and there are no	1348	This module is I<not> guaranteed to be thread safe and there are no
…		…
642	Please refrain from using rt.cpan.org or any other bug reporting	1362	Please refrain from using rt.cpan.org or any other bug reporting
643	service. I put the contact address into my modules for a reason.	1363	service. I put the contact address into my modules for a reason.
644		1364
645	=cut	1365	=cut
646		1366
		1367	# clumsy and slow hv_store-in-hash helper function
		1368	sub _hv_store {
		1369	$_[0]{$_[1]} = $_[2];
		1370	}
		1371
		1372	our %FILTER = (
		1373	0 => sub { # rfc4287 datetime, utf-8
		1374	require Time::Piece;
		1375	# Time::Piece::Strptime uses the "incredibly flexible date parsing routine"
		1376	# from FreeBSD, which can't parse ISO 8601, RFC3339, RFC4287 or much of anything
		1377	# else either. What's so incredible about it compared to standard strptime totally escapes me.
		1378	# doesn't do fractional times, either. sigh.
		1379	# In fact, it's all a lie, it uses whatever strptime it wants, and of course,
		1380	# they are all incompatible. The openbsd one simply ignores %z (but according to the
		1381	# docs, it would be much more incredibly flexible indeed. If it worked, that is.).
		1382	scalar eval {
		1383	my $s = $_[1];
		1384
		1385	$s =~ s/Z$/+00:00/;
		1386	$s =~ s/(\.[0-9]+)?([+-][0-9][0-9]):([0-9][0-9])$//
		1387	or die;
		1388
		1389	my $b = $1 - ($2 * 60 + $3) * 60; # fractional part + offset. hopefully
		1390	my $d = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");
		1391
		1392	Time::Piece::gmtime ($d->epoch + $b)
		1393	} || die "corrupted CBOR date/time string ($_[1])";
		1394	},
		1395
		1396	1 => sub { # seconds since the epoch, possibly fractional
		1397	require Time::Piece;
		1398	scalar Time::Piece::gmtime (pop)
		1399	},
		1400
		1401	2 => sub { # pos bigint
		1402	require Math::BigInt;
		1403	Math::BigInt->new ("0x" . unpack "H*", pop)
		1404	},
		1405
		1406	3 => sub { # neg bigint
		1407	require Math::BigInt;
		1408	-Math::BigInt->new ("0x" . unpack "H*", pop)
		1409	},
		1410
		1411	4 => sub { # decimal fraction, array
		1412	require Math::BigFloat;
		1413	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		1414	},
		1415
		1416	264 => sub { # decimal fraction with arbitrary exponent
		1417	require Math::BigFloat;
		1418	Math::BigFloat->new ($_[1][1] . "E" . $_[1][0])
		1419	},
		1420
		1421	5 => sub { # bigfloat, array
		1422	require Math::BigFloat;
		1423	scalar Math::BigFloat->new ($_[1][1]) * Math::BigFloat->new (2)->bpow ($_[1][0])
		1424	},
		1425
		1426	265 => sub { # bigfloat with arbitrary exponent
		1427	require Math::BigFloat;
		1428	scalar Math::BigFloat->new ($_[1][1]) * Math::BigFloat->new (2)->bpow ($_[1][0])
		1429	},
		1430
		1431	30 => sub { # rational number
		1432	require Math::BigRat;
		1433	Math::BigRat->new ("$_[1][0]/$_[1][1]") # separate parameters only work in recent versions
		1434	},
		1435
		1436	21 => sub { pop }, # expected conversion to base64url encoding
		1437	22 => sub { pop }, # expected conversion to base64 encoding
		1438	23 => sub { pop }, # expected conversion to base16 encoding
		1439
		1440	# 24 # embedded cbor, byte string
		1441
		1442	32 => sub {
		1443	require URI;
		1444	URI->new (pop)
		1445	},
		1446
		1447	# 33 # base64url rfc4648, utf-8
		1448	# 34 # base64 rfc4648, utf-8
		1449	# 35 # regex pcre/ecma262, utf-8
		1450	# 36 # mime message rfc2045, utf-8
		1451	);
		1452
		1453	sub default_filter {
		1454	&{ $FILTER{$_[0]} or return }
		1455	}
		1456
		1457	our %SAFE_FILTER = map { $_ => $FILTER{$_} } 0, 1, 21, 22, 23, 32;
		1458
		1459	sub safe_filter {
		1460	&{ $SAFE_FILTER{$_[0]} or return }
		1461	}
		1462
		1463	sub URI::TO_CBOR {
		1464	my $uri = $_[0]->as_string;
		1465	utf8::upgrade $uri;
		1466	tag 32, $uri
		1467	}
		1468
		1469	sub Math::BigInt::TO_CBOR {
		1470	if (-2147483648 <= $_[0] && $_[0] <= 2147483647) {
		1471	$_[0]->numify
		1472	} else {
		1473	my $hex = substr $_[0]->as_hex, 2;
		1474	$hex = "0$hex" if 1 & length $hex; # sigh
		1475	tag $_[0] >= 0 ? 2 : 3, pack "H*", $hex
		1476	}
		1477	}
		1478
		1479	sub Math::BigFloat::TO_CBOR {
		1480	my ($m, $e) = $_[0]->parts;
		1481
		1482	-9223372036854775808 <= $e && $e <= 18446744073709551615
		1483	? tag 4, [$e->numify, $m]
		1484	: tag 264, [$e, $m]
		1485	}
		1486
		1487	sub Math::BigRat::TO_CBOR {
		1488	my ($n, $d) = $_[0]->parts;
		1489
		1490	# older versions of BigRat need *1, as they do not always return numbers
		1491
		1492	$d*1 == 1
		1493	? $n*1
		1494	: tag 30, [$n*1, $d*1]
		1495	}
		1496
		1497	sub Time::Piece::TO_CBOR {
		1498	tag 1, 0 + $_[0]->epoch
		1499	}
		1500
647	XSLoader::load "CBOR::XS", $VERSION;	1501	XSLoader::load "CBOR::XS", $VERSION;
648		1502
649	=head1 SEE ALSO	1503	=head1 SEE ALSO
650		1504
651	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,	1505	The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
