[ViewVC] Diff of: cvs/CBOR-XS/README

Comparing CBOR-XS/README (file contents):
Revision 1.1 by root, Fri Oct 25 23:09:45 2013 UTC vs.
Revision 1.18 by root, Wed Dec 7 14:14:30 2016 UTC

		1	NAME
		2	CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
		3
		4	SYNOPSIS
		5	use CBOR::XS;
		6
		7	$binary_cbor_data = encode_cbor $perl_value;
		8	$perl_value = decode_cbor $binary_cbor_data;
		9
		10	# OO-interface
		11
		12	$coder = CBOR::XS->new;
		13	$binary_cbor_data = $coder->encode ($perl_value);
		14	$perl_value = $coder->decode ($binary_cbor_data);
		15
		16	# prefix decoding
		17
		18	my $many_cbor_strings = ...;
		19	while (length $many_cbor_strings) {
		20	my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
		21	# data was decoded
		22	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
		23	}
		24
		25	DESCRIPTION
		26	This module converts Perl data structures to the Concise Binary Object
		27	Representation (CBOR) and vice versa. CBOR is a fast binary
		28	serialisation format that aims to use an (almost) superset of the JSON
		29	data model, i.e. when you can represent something useful in JSON, you
		30	should be able to represent it in CBOR.
		31
		32	In short, CBOR is a faster and quite compact binary alternative to JSON,
		33	with the added ability of supporting serialisation of Perl objects.
		34	(JSON often compresses better than CBOR though, so if you plan to
		35	compress the data later and speed is less important you might want to
		36	compare both formats first).
		37
		38	To give you a general idea about speed, with texts in the megabyte
		39	range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
		40	JSON::XS and decodes about 15%-30% faster than those. The shorter the
		41	data, the worse Storable performs in comparison.
		42
		43	Regarding compactness, "CBOR::XS"-encoded data structures are usually
		44	about 20% smaller than the same data encoded as (compact) JSON or
		45	Storable.
		46
		47	In addition to the core CBOR data format, this module implements a
		48	number of extensions, to support cyclic and shared data structures (see
		49	"allow_sharing" and "allow_cycles"), string deduplication (see
		50	"pack_strings") and scalar references (always enabled).
		51
		52	The primary goal of this module is to be correct and the secondary
		53	goal is to be fast. To reach the latter goal it was written in C.
		54
		55	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
		56	vice versa.
		57
		58	FUNCTIONAL INTERFACE
		59	The following convenience functions are provided by this module. They are
		60	exported by default:
		61
		62	$cbor_data = encode_cbor $perl_scalar
		63	Converts the given Perl data structure to CBOR representation.
		64	Croaks on error.
		65
		66	$perl_scalar = decode_cbor $cbor_data
		67	The opposite of "encode_cbor": expects a valid CBOR string to parse,
		68	returning the resulting perl scalar. Croaks on error.
		69
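As a minimal sketch of the two exported functions, the following round-trips a small structure and shows one way to trap the exception raised on bad input. The sample data and the truncation used to provoke an error are illustrative assumptions, not something the module prescribes.

```perl
use strict;
use warnings;
use CBOR::XS;

# Illustrative data; any CBOR-representable Perl structure would do.
my $data = { name => "example", list => [1, 2, 3] };

my $cbor = encode_cbor $data;   # binary CBOR string
my $copy = decode_cbor $cbor;   # equivalent Perl structure

# Both functions croak on error, so guard untrusted input with eval.
# Cutting off the last octet leaves an incomplete CBOR value.
my $truncated  = substr $cbor, 0, length ($cbor) - 1;
my $decoded_ok = eval { decode_cbor $truncated; 1 };
warn "decode failed: $@" unless $decoded_ok;
```
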
		70	OBJECT-ORIENTED INTERFACE
		71	The object oriented interface lets you configure your own encoding or
		72	decoding style, within the limits of supported formats.
		73
		74	$cbor = new CBOR::XS
		75	Creates a new CBOR::XS object that can be used to de/encode CBOR
		76	strings. All boolean flags described below are by default
		77	disabled.
		78
		79	The mutators for flags all return the CBOR object again and thus
		80	calls can be chained:
		81
		82	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
		83
		84	$cbor = new_safe CBOR::XS
		85	Create a new, safe/secure CBOR::XS object. This is similar to "new",
		86	but configures the coder object to be safe to use with untrusted
		87	data. Currently, this is equivalent to:
		88
		89	my $cbor = CBOR::XS
		90	->new
		91	->forbid_objects
		92	->filter (\&CBOR::XS::safe_filter)
		93	->max_size (1e8);
		94
		95	But is more future proof (it is better to crash because of a change
		96	than to be exploited in other ways).
		97
		98	$cbor = $cbor->max_depth ([$maximum_nesting_depth])
		99	$max_depth = $cbor->get_max_depth
		100	Sets the maximum nesting level (default 512) accepted while encoding
		101	or decoding. If a higher nesting level is detected in CBOR data or a
		102	Perl data structure, then the encoder and decoder will stop and
		103	croak at that point.
		104
		105	Nesting level is defined by number of hash- or arrayrefs that the
		106	encoder needs to traverse to reach a given point or the number of
		107	"{" or "[" characters without their matching closing parenthesis
		108	crossed to reach a given character in a string.
		109
		110	Setting the maximum depth to one disallows any nesting, which
		111	ensures that the object is only a single hash/object or array.
		112
		113	If no argument is given, the highest possible setting will be used,
		114	which is rarely useful.
		115
		116	Note that nesting is implemented by recursion in C. The default
		117	value has been chosen to be as large as typical operating systems
		118	allow without crashing.
		119
		120	See "SECURITY CONSIDERATIONS", below, for more info on why this is
		121	useful.
		122
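The nesting limit can be sketched as follows. The limit of 2 and the nested arrays are arbitrary choices for illustration; the only calls used are "max_depth", "encode" and "decode" as described above.

```perl
use strict;
use warnings;
use CBOR::XS;

# Accept at most two levels of array/hash nesting.
my $coder = CBOR::XS->new->max_depth (2);

my $shallow = $coder->encode ([[1]]);                 # two levels: accepted
my $deep_ok = eval { $coder->encode ([[[1]]]); 1 };   # three levels: croaks

# The same limit applies when decoding data produced elsewhere.
my $deep_cbor = CBOR::XS->new->encode ([[[1]]]);
my $decode_ok = eval { $coder->decode ($deep_cbor); 1 };
```
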
		123	$cbor = $cbor->max_size ([$maximum_string_size])
		124	$max_size = $cbor->get_max_size
		125	Set the maximum length a CBOR string may have (in bytes) where
		126	decoding is being attempted. The default is 0, meaning no limit.
		127	When "decode" is called on a string that is longer than this many
		128	bytes, it will not attempt to decode the string but throw an
		129	exception. This setting has no effect on "encode" (yet).
		130
		131	If no argument is given, the limit check will be deactivated (same
		132	as when 0 is specified).
		133
		134	See "SECURITY CONSIDERATIONS", below, for more info on why this is
		135	useful.
		136
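A small sketch of the size limit, assuming an arbitrary limit of 64 octets and a payload chosen to exceed it:

```perl
use strict;
use warnings;
use CBOR::XS;

my $payload = CBOR::XS->new->encode ("x" x 100);   # a little over 100 octets

# Refuse to decode inputs longer than 64 octets.
my $limited = CBOR::XS->new->max_size (64);

my $accepted = eval { $limited->decode ($payload); 1 };
my $reason   = $@;   # exception text when the input was refused

# A coder without a limit (the default, 0) decodes the same data.
my $value = CBOR::XS->new->decode ($payload);
```
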
		137	$cbor = $cbor->allow_unknown ([$enable])
		138	$enabled = $cbor->get_allow_unknown
		139	If $enable is true (or missing), then "encode" will not throw an
		140	exception when it encounters values it cannot represent in CBOR (for
		141	example, filehandles) but instead will encode a CBOR "error" value.
		142
		143	If $enable is false (the default), then "encode" will throw an
		144	exception when it encounters anything it cannot encode as CBOR.
		145
		146	This option does not affect "decode" in any way, and it is
		147	recommended to leave it off unless you know your communications
		148	partner.
		149
		150	$cbor = $cbor->allow_sharing ([$enable])
		151	$enabled = $cbor->get_allow_sharing
		152	If $enable is true (or missing), then "encode" will not
		153	double-encode values that have been referenced before (e.g. when the
		154	same object, such as an array, is referenced multiple times), but
		155	instead will emit a reference to the earlier value.
		156
		157	This means that such values will only be encoded once, and will not
		158	result in a deep cloning of the value on decode, in decoders
		159	supporting the value sharing extension. This also makes it possible
		160	to encode cyclic data structures (which need "allow_cycles" to be
		161	enabled to be decoded by this module).
		162
		163	It is recommended to leave it off unless you know your communication
		164	partner supports the value sharing extensions to CBOR
		165	(<http://cbor.schmorp.de/value-sharing>), as without decoder
		166	support, the resulting data structure might be unusable.
		167
		168	Detecting shared values incurs a runtime overhead when values are
		169	encoded that have a reference counter larger than one, and might
		170	unnecessarily increase the encoded size, as potentially shared
		171	values are encoded as shareable whether or not they are actually
		172	shared.
		173
		174	At the moment, only targets of references can be shared (e.g.
		175	scalars, arrays or hashes pointed to by a reference). Weirder
		176	constructs, such as an array with multiple "copies" of the same
		177	string, which are hard but not impossible to create in Perl, are not
		178	supported (this is the same as with Storable).
		179
		180	If $enable is false (the default), then "encode" will encode shared
		181	data structures repeatedly, unsharing them in the process. Cyclic
		182	data structures cannot be encoded in this mode.
		183
		184	This option does not affect "decode" in any way - shared values and
		185	references will always be decoded properly if present.
		186
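To illustrate the effect of "allow_sharing", the sketch below encodes a structure that references one array twice and then compares the decoded results. The data and variable names are made up for the example; "refaddr" comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use CBOR::XS;
use Scalar::Util qw(refaddr);

my $shared = [1, 2, 3];
my $data   = { first => $shared, second => $shared };

# With sharing, the array is emitted once plus a reference to it.
my $with    = CBOR::XS->new->allow_sharing->encode ($data);
# Without sharing (the default), the array is encoded twice.
my $without = CBOR::XS->new->encode ($data);

# Decoding needs no flag; shared values are restored when present.
my $linked = CBOR::XS->new->decode ($with);
my $cloned = CBOR::XS->new->decode ($without);

my $still_shared = refaddr ($linked->{first}) == refaddr ($linked->{second});
my $now_separate = refaddr ($cloned->{first}) != refaddr ($cloned->{second});
```

Note that for data this small the shared encoding is not necessarily shorter, since the sharing tags themselves take up space.
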
		187	$cbor = $cbor->allow_cycles ([$enable])
		188	$enabled = $cbor->get_allow_cycles
		189	If $enable is true (or missing), then "decode" will happily decode
		190	self-referential (cyclic) data structures. By default these will not
		191	be decoded, as they need manual cleanup to avoid memory leaks, so
		192	code that isn't prepared for this will not leak memory.
		193
		194	If $enable is false (the default), then "decode" will throw an error
		195	when it encounters a self-referential/cyclic data structure.
		196
		197	FUTURE DIRECTION: the motivation behind this option is to avoid
		198	real cycles - future versions of this module might choose to decode
		199	cyclic data structures using weak references when this option is
		200	off, instead of throwing an error.
		201
		202	This option does not affect "encode" in any way - shared values and
		203	references will always be encoded properly if present.
		204
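The following sketch shows a cyclic structure going through both options: "allow_sharing" on the encoding side and "allow_cycles" on the decoding side. The hash and its keys are illustrative only, and the explicit "delete" calls are one possible way of doing the manual cleanup mentioned above.

```perl
use strict;
use warnings;
use CBOR::XS;

my $node = { name => "loop" };
$node->{self} = $node;                       # cyclic structure

# Encoding a cycle requires allow_sharing.
my $cbor = CBOR::XS->new->allow_sharing->encode ($node);

# A default decoder refuses the cycle ...
my $refused = !eval { CBOR::XS->new->decode ($cbor); 1 };

# ... while allow_cycles accepts it.
my $copy = CBOR::XS->new->allow_cycles->decode ($cbor);
my $name = $copy->{self}{self}{name};

# Manual cleanup: break both cycles so perl can free the hashes.
delete $copy->{self};
delete $node->{self};
```
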
		205	$cbor = $cbor->forbid_objects ([$enable])
		206	$enabled = $cbor->get_forbid_objects
		207	Disables the use of the object serialiser protocol.
		208
		209	If $enable is true (or missing), then "encode" will throw an
		210	exception when it encounters perl objects that would be encoded
		211	using the perl-object tag (26). When "decode" encounters such tags,
		212	it will fall back to the general filter/tagged logic as if this were
		213	an unknown tag (by default resulting in a "CBOR::XS::Tagged"
		214	object).
		215
		216	If $enable is false (the default), then "encode" will use the
		217	Types::Serialiser object serialisation protocol to serialise objects
		218	into perl-object tags, and "decode" will do the same to decode such
		219	tags.
		220
		221	See "SECURITY CONSIDERATIONS", below, for more info on why
		222	forbidding this protocol can be useful.
		223
		224	$cbor = $cbor->pack_strings ([$enable])
		225	$enabled = $cbor->get_pack_strings
		226	If $enable is true (or missing), then "encode" will try not to
		227	encode the same string twice, but will encode a reference to the
		228	string instead. Depending on your data format, this can save a
		229	lot of space, but also results in a very large runtime overhead
		230	(expect encoding times to be 2-4 times as high as without).
		231
		232	It is recommended to leave it off unless you know your
		233	communications partner supports the stringref extension to CBOR
		234	(<http://cbor.schmorp.de/stringref>), as without decoder support,
		235	the resulting data structure might not be usable.
		236
		237	If $enable is false (the default), then "encode" will encode strings
		238	the standard CBOR way.
		239
		240	This option does not affect "decode" in any way - string references
		241	will always be decoded properly if present.
		242
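A rough way to see whether "pack_strings" pays off is to encode the same data with and without it and compare sizes. The records below are invented for the example and deliberately repeat the same keys and values many times; real data may behave quite differently.

```perl
use strict;
use warnings;
use CBOR::XS;

# Illustrative records that repeat the same strings many times.
my @records = map { +{ sensor => "thermometer", unit => "celsius", id => $_ } } 1 .. 100;

my $plain  = CBOR::XS->new->encode (\@records);
my $packed = CBOR::XS->new->pack_strings->encode (\@records);

printf "plain: %d octets, packed: %d octets\n", length ($plain), length ($packed);

# Decoding needs no flag; string references are resolved when present.
my $restored = CBOR::XS->new->decode ($packed);
```
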
		243	$cbor = $cbor->text_keys ([$enable])
		244	$enabled = $cbor->get_text_keys
		245	If $enable is true (or missing), then "encode" will encode all perl
		246	hash keys as CBOR text strings/UTF-8 strings, upgrading them as
		247	needed.
		248
		249	If $enable is false (the default), then "encode" will encode hash
		250	keys normally - upgraded perl strings (strings internally encoded as
		251	UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
		252	byte strings.
		253
		254	This option does not affect "decode" in any way.
		255
		256	This option is useful for interoperability with CBOR decoders that
		257	don't treat byte strings as a form of text. It is especially useful
		258	as Perl gives very little control over hash keys.
		259
		260	Enabling this option can be slow, as all downgraded hash keys that
		261	are encoded need to be scanned and converted to UTF-8.
		262
		263	$cbor = $cbor->text_strings ([$enable])
		264	$enabled = $cbor->get_text_strings
		265	This option works similarly to "text_keys", above, but works on all
		266	strings (including hash keys), so "text_keys" has no further effect
		267	after enabling "text_strings".
		268
		269	If $enable is true (or missing), then "encode" will encode all perl
		270	strings as CBOR text strings/UTF-8 strings, upgrading them as
		271	needed.
		272
		273	If $enable is false (the default), then "encode" will encode strings
		274	normally (but see "text_keys") - upgraded perl strings (strings
		275	internally encoded as UTF-8) as CBOR text strings, and downgraded
		276	perl strings as CBOR byte strings.
		277
		278	This option does not affect "decode" in any way.
		279
		280	This option has similar advantages and disadvantages as "text_keys".
		281	In addition, this option effectively removes the ability to encode
		282	byte strings, which might break some "FREEZE" and "TO_CBOR" methods
		283	that rely on this, such as bignum encoding, so this option is mainly
		284	useful for very simple data.
		285
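The difference between byte and text strings can be observed in the first octet of the encoded data, whose top three bits hold the CBOR major type (2 for byte strings, 3 for text strings). The helper "major_type" below is a hypothetical function written for this sketch, not part of the module.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: extract the major type from the initial octet.
sub major_type { ord (substr $_[0], 0, 1) >> 5 }

my $string = "abc";
utf8::downgrade ($string);   # make sure perl holds it as a downgraded string

my $as_bytes = CBOR::XS->new->encode ($string);
my $as_text  = CBOR::XS->new->text_strings->encode ($string);

printf "default: major type %d, text_strings: major type %d\n",
   major_type ($as_bytes), major_type ($as_text);
```
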
		286	$cbor = $cbor->validate_utf8 ([$enable])
		287	$enabled = $cbor->get_validate_utf8
		288	If $enable is true (or missing), then "decode" will validate that
		289	elements (text strings) containing UTF-8 data in fact contain valid
		290	UTF-8 data (instead of blindly accepting it). This validation
		291	obviously takes extra time during decoding.
		292
		293	The concept of "valid UTF-8" used is perl's concept, which is a
		294	superset of the official UTF-8.
		295
		296	If $enable is false (the default), then "decode" will blindly accept
		297	UTF-8 data, marking them as valid UTF-8 in the resulting data
		298	structure regardless of whether that's true or not.
		299
		300	Perl isn't too happy about corrupted UTF-8 in strings, but should
		301	generally not crash or do similarly evil things. Extensions might be
		302	not so forgiving, so it's recommended to turn on this setting if you
		303	receive untrusted CBOR.
		304
		305	This option does not affect "encode" in any way - strings that are
		306	supposedly valid UTF-8 will simply be dumped into the resulting CBOR
		307	string without checking whether that is, in fact, true or not.
		308
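As a sketch of what "validate_utf8" guards against, the example below hand-builds a CBOR text string whose payload is not well-formed UTF-8 and feeds it to a validating decoder. The three octets are an assumption made for the example, not data produced by this module.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hand-built CBOR: a text string header (0x62, length 2) followed by
# the octets C3 28, which are not well-formed UTF-8.
my $bad = "\x62\xc3\x28";

# A validating decoder rejects the string with an exception.
my $strict   = CBOR::XS->new->validate_utf8;
my $accepted = eval { $strict->decode ($bad); 1 };
print "rejected: $@" unless $accepted;
```
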
		309	$cbor = $cbor->filter ([$cb->($tag, $value)])
		310	$cb_or_undef = $cbor->get_filter
		311	Sets or replaces the tagged value decoding filter (when $cb is
		312	specified) or clears the filter (if no argument or "undef" is
		313	provided).
		314
		315	The filter callback is called only during decoding, when a
		316	non-enforced tagged value has been decoded (see "TAG HANDLING AND
		317	EXTENSIONS" for a list of enforced tags). For specific tags, it's
		318	often better to provide a default converter using the
		319	%CBOR::XS::FILTER hash (see below).
		320
		321	The first argument is the numerical tag, the second is the (decoded)
		322	value that has been tagged.
		323
		324	The filter function should return either exactly one value, which
		325	will replace the tagged value in the decoded data structure, or no
		326	values, which will result in default handling, which currently means
		327	the decoder creates a "CBOR::XS::Tagged" object to hold the tag and
		328	the value.
		329
		330	When the filter is cleared (the default state), the default filter
		331	function, "CBOR::XS::default_filter", is used. This function simply
		332	looks up the tag in the %CBOR::XS::FILTER hash. If an entry exists
		333	it must be a code reference that is called with tag and value, and
		334	is responsible for decoding the value. If no entry exists, it
		335	returns no values. "CBOR::XS" provides a number of default filter
		336	functions already, and the %CBOR::XS::FILTER hash can be freely
		337	extended with more.
		338
		339	"CBOR::XS" additionally provides an alternative filter function that
		340	is supposed to be safe to use with untrusted data (which the default
		341	filter might not), called "CBOR::XS::safe_filter", which works the
		342	same as the "default_filter" but uses the %CBOR::XS::SAFE_FILTER
		343	variable instead. It is prepopulated with the tag decoding functions
		344	that are deemed safe (basically the same as %CBOR::XS::FILTER
		345	without all the bignum tags), and can be extended by user code as
		346	well, although, obviously, one should be very careful about adding
		347	decoding functions here, since the expectation is that they are safe
		348	to use on untrusted data, after all.
		349
		350	Example: decode all tags not handled internally into
		351	"CBOR::XS::Tagged" objects, with no other special handling (useful
		352	when working with potentially "unsafe" CBOR data).
		353
		354	CBOR::XS->new->filter (sub { })->decode ($cbor_data);
		355
		356	Example: provide a global filter for tag 1347375694, converting the
		357	value into some string form.
		358
		359	$CBOR::XS::FILTER{1347375694} = sub {
		360	my ($tag, $value) = @_;
		361
		362	"tag 1347375694 value $value"
		363	};
		364
		365	Example: provide your own filter function that looks up tags in your
		366	own hash:
		367
		368	my %my_filter = (
		369	998347484 => sub {
		370	my ($tag, $value) = @_;
		371
		372	"tag 998347484 value $value"
		373	},
		374	);
		375
		376	my $coder = CBOR::XS->new->filter (sub {
		377	&{ $my_filter{$_[0]} or return }
		378	});
		379
		380	Example: use the safe filter function (see "SECURITY CONSIDERATIONS"
		381	for more considerations on security).
		382
		383	CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);
		384
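The filter mechanism can be tried end to end with a value tagged by "CBOR::XS::tag". This is only a sketch: the tag number is the arbitrary one used in the examples of this section, and the payload string is made up.

```perl
use strict;
use warnings;
use CBOR::XS;

# 998347484 is an arbitrary tag number chosen for illustration.
my $cbor = encode_cbor (CBOR::XS::tag (998347484, "payload"));

# Default handling: an unknown tag decodes to a CBOR::XS::Tagged object.
my $tagged = decode_cbor ($cbor);

# Custom filter: convert this one tag, leave all others to default handling.
my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return unless $tag == 998347484;   # no values => default handling
   "tag $tag value $value"
});
my $converted = $coder->decode ($cbor);
```
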
		385	$cbor_data = $cbor->encode ($perl_scalar)
		386	Converts the given Perl data structure (a scalar value) to its CBOR
		387	representation.
		388
		389	$perl_scalar = $cbor->decode ($cbor_data)
		390	The opposite of "encode": expects CBOR data and tries to parse it,
		391	returning the resulting simple scalar or reference. Croaks on error.
		392
		393	($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
		394	This works like the "decode" method, but instead of raising an
		395	exception when there is trailing garbage after the CBOR string, it
		396	will silently stop parsing there and return the number of characters
		397	consumed so far.
		398
		399	This is useful if your CBOR texts are not delimited by an outer
		400	protocol and you need to know where the first CBOR string ends and
		401	the next one starts.
		402
		403	CBOR::XS->new->decode_prefix ("......")
		404	=> ("...", 3)
		405
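A runnable sketch of prefix decoding, assuming three arbitrary values that have been encoded back-to-back into one string:

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Three values encoded back-to-back with no delimiter in between.
my $stream = "";
$stream .= $coder->encode ($_) for "a", [1, 2], { k => "v" };

my @values;
while (length $stream) {
   my ($value, $octets) = $coder->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $octets, "";   # drop the octets just consumed
}
```
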
		406	INCREMENTAL PARSING
		407	In some cases, there is the need for incremental parsing of CBOR data.
		408	While this module always has to keep both CBOR text and resulting Perl
		409	data structure in memory at one time, it does allow you to parse a CBOR
		410	stream incrementally, in a way similar to using "decode_prefix" to see if
		411	a full CBOR object is available, but much more efficiently.
		412
		413	It basically works by parsing as much of a CBOR string as possible - if
		414	the CBOR data is not complete yet, the parser will remember where it
		415	was, to be able to restart when more data has been accumulated. Once
		416	enough data is available to either decode a complete CBOR value or raise
		417	an error, a real decode will be attempted.
		418
		419	A typical use case would be a network protocol that consists of sending
		420	and receiving CBOR-encoded messages. The solution that works with CBOR
		421	and about anything else is by prepending a length to every CBOR value,
		422	so the receiver knows how many octets to read. More compact (and
		423	slightly slower) would be to just send CBOR values back-to-back, as
		424	"CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
		425	length.
		426
		427	The following methods help with this:
		428
		429	@decoded = $cbor->incr_parse ($buffer)
		430	This method attempts to decode exactly one CBOR value from the
		431	beginning of the given $buffer. The value is removed from the
		432	$buffer on success. When $buffer doesn't contain a complete value
		433	yet, it returns nothing. Finally, when the $buffer doesn't start
		434	with something that could ever be a valid CBOR value, it raises an
		435	exception, just as "decode" would. In the latter case the decoder
		436	state is undefined and must be reset before being able to parse
		437	further.
		438
		439	This method modifies the $buffer in place. When no CBOR value can be
		440	decoded, the decoder stores the current string offset. On the next
		441	call, it continues decoding at the place where it stopped before. For
		442	this to make sense, the $buffer must begin with the same octets as
		443	on previous unsuccessful calls.
		444
		445	You can call this method in scalar context, in which case it either
		446	returns a decoded value or "undef". This makes it impossible to
		447	distinguish between CBOR null values (which decode to "undef") and
		448	an unsuccessful decode, which is often acceptable.
		449
		450	@decoded = $cbor->incr_parse_multiple ($buffer)
		451	Same as "incr_parse", but attempts to decode as many CBOR values as
		452	possible in one go, instead of at most one. Calls to "incr_parse"
		453	and "incr_parse_multiple" can be interleaved.
		454
		455	$cbor->incr_reset
		456	Resets the incremental decoder. This throws away any saved state, so
		457	that subsequent calls to "incr_parse" or "incr_parse_multiple" start
		458	to parse a new CBOR value from the beginning of the $buffer again.
		459
		460	This method can be called at any time, but it must be called if
		461	you want to change your $buffer or there was a decoding error and
		462	you want to reuse the $cbor object for future incremental parsings.
		463
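The incremental interface might be used roughly as follows. The sketch imitates a network stream by cutting two encoded messages into 3-octet pieces; the chunk size, the messages and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Two messages sent back-to-back, then cut into 3-octet pieces to
# imitate data trickling in from a socket.
my $wire   = $coder->encode ([1, 2, 3]) . $coder->encode ({ ok => 1 });
my @chunks = unpack "(a3)*", $wire;

my $buffer = "";
my @messages;

for my $chunk (@chunks) {
   $buffer .= $chunk;
   # Returns every complete value and removes it from $buffer;
   # returns nothing while the next value is still incomplete.
   push @messages, $coder->incr_parse_multiple ($buffer);
}
```

After a decoding error, "incr_reset" would have to be called before reusing $coder for further incremental parsing.
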
		464	MAPPING
		465	This section describes how CBOR::XS maps Perl values to CBOR values and
		466	vice versa. These mappings are designed to "do the right thing" in most
		467	circumstances automatically, preserving round-tripping characteristics
		468	(what you put in comes out as something equivalent).
		469
		470	For the more enlightened: note that in the following descriptions,
		471	lowercase perl refers to the Perl interpreter, while uppercase Perl
		472	refers to the abstract Perl language itself.
		473
		474	CBOR -> PERL
		475	integers
		476	CBOR integers become (numeric) perl scalars. On perls without 64 bit
		477	support, 64 bit integers will be truncated or otherwise corrupted.
		478
		479	byte strings
		480	Byte strings will become octet strings in Perl (the Byte values
		481	0..255 will simply become characters of the same value in Perl).
		482
		483	UTF-8 strings
		484	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
		485	decoded into proper Unicode code points. At the moment, the validity
		486	of the UTF-8 octets will not be validated - corrupt input will
		487	result in corrupted Perl strings.
		488
		489	arrays, maps
		490	CBOR arrays and CBOR maps will be converted into references to a
		491	Perl array or hash, respectively. The keys of the map will be
		492	stringified during this process.
		493
		494	null
		495	CBOR null becomes "undef" in Perl.
		496
		497	true, false, undefined
		498	These CBOR values become "Types::Serialiser::true",
		499	"Types::Serialiser::false" and "Types::Serialiser::error",
		500	respectively. They are overloaded to act almost exactly like the
		501	numbers 1 and 0 (for true and false) or to throw an exception on
		502	access (for error). See the Types::Serialiser manpage for details.
		503
		504	tagged values
		505	Tagged items consist of a numeric tag and another CBOR value.
		506
		507	See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
		508	for details on which tags are handled how.
		509
		510	anything else
		511	Anything else (e.g. unsupported simple values) will raise a decoding
		512	error.
		513
		514	PERL -> CBOR
		515	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
		516	typeless language. That means this module can only guess which CBOR type
		517	is meant by a perl value.
		518
		519	hash references
		520	Perl hash references become CBOR maps. As there is no inherent
		521	ordering in hash keys (or CBOR maps), they will usually be encoded
		522	in a pseudo-random order. This order can be different each time a
		523	hash is encoded.
		524
		525	Currently, tied hashes will use the indefinite-length format, while
		526	normal hashes will use the fixed-length format.
		527
		528	array references
		529	Perl array references become fixed-length CBOR arrays.
		530
		531	other references
		532	Other unblessed references will be represented using the indirection
		533	tag extension (tag value 22098,
		534	<http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
		535	to be able to decode these values somehow, by either "doing the
		536	right thing", decoding into a generic tagged object, simply ignoring
		537	the tag, or something else.
		538
		539	CBOR::XS::Tagged objects
		540	Objects of this type must be arrays consisting of a single "[tag,
		541	value]" pair. The (numerical) tag will be encoded as a CBOR tag, the
		542	value will be encoded as appropriate for the value. You must use
		543	"CBOR::XS::tag" to create such objects.
		544
		545	Types::Serialiser::true, Types::Serialiser::false,
		546	Types::Serialiser::error
		547	These special values become CBOR true, CBOR false and CBOR undefined
		548	values, respectively. You can also use "\1", "\0" and "\undef"
		549	directly if you want.
		550
		551	other blessed objects
		552	Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
		553	"TAG HANDLING AND EXTENSIONS" for specific classes handled by this
		554	module, and "OBJECT SERIALISATION" for generic object serialisation.
		555
		556	simple scalars
		557	Simple Perl scalars (any scalar that is not a reference) are the
		558	most difficult objects to encode: CBOR::XS will encode undefined
		559	scalars as CBOR null values, scalars that have last been used in a
		560	string context before encoding as CBOR strings, and anything else as
		561	number value:
		562
		563	# dump as number
		564	encode_cbor [2] # yields [2]
		565	encode_cbor [-3.0e17] # yields [-3e+17]
		566	my $value = 5; encode_cbor [$value] # yields [5]
		567
		568	# used as string, so dump as string (either byte or text)
		569	print $value;
		570	encode_cbor [$value] # yields ["5"]
		571
		572	# undef becomes null
		573	encode_cbor [undef] # yields [null]
		574
		575	You can force the type to be a CBOR string by stringifying it:
		576
		577	my $x = 3.1; # some variable containing a number
		578	"$x"; # stringified
		579	$x .= ""; # another, more awkward way to stringify
		580	print $x; # perl does it for you, too, quite often
		581
		582	You can force whether a string is encoded as byte or text string by
		583	using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
		584	disabled):
		585
		586	utf8::upgrade $x; # encode $x as text string
		587	utf8::downgrade $x; # encode $x as byte string
		588
		589	Perl doesn't define what operations up- and downgrade strings, so if
		590	the difference between byte and text is important, you should up- or
		591	downgrade your string as late as possible before encoding. You can
		592	also force the use of CBOR text strings by using "text_keys" or
		593	"text_strings".
		594
		595	You can force the type to be a CBOR number by numifying it:
		596
		597	my $x = "3"; # some variable containing a string
		598	$x += 0; # numify it, ensuring it will be dumped as a number
		599	$x *= 1; # same thing, the choice is yours.
		600
		601	You can not currently force the type in other, less obscure, ways.
		602	Tell me if you need this capability (but don't forget to explain why
		603	it's needed :).
		604
		605	Perl values that seem to be integers generally use the shortest
		606	possible representation. Floating-point values will use either the
		607	IEEE single format if possible without loss of precision, otherwise
		608	the IEEE double format will be used. Perls that use formats other
		609	than IEEE double to represent numerical values are supported, but
		610	might suffer loss of precision.
		611
		612	OBJECT SERIALISATION
		613	This module implements both a CBOR-specific and the generic
		614	Types::Serialiser object serialisation protocol. The following
		615	subsections explain both methods.
		616
		617	ENCODING
		618	This module knows two ways to serialise a Perl object: The CBOR-specific
		619	way, and the generic way.
		620
		621	Whenever the encoder encounters a Perl object that it cannot serialise
		622	directly (most of them), it will first look up the "TO_CBOR" method on
		623	it.
		624
		625	If it has a "TO_CBOR" method, it will call it with the object as only
		626	argument, and expects exactly one return value, which it will then
		627	substitute and encode it in the place of the object.
		628
		629	Otherwise, it will look up the "FREEZE" method. If it exists, it will
		630	call it with the object as first argument, and the constant string
		631	"CBOR" as the second argument, to distinguish it from other serialisers.
		632
		633	The "FREEZE" method can return any number of values (i.e. zero or more).
		634	These will be encoded as CBOR perl object, together with the classname.
		635
		636	These methods MUST NOT change the data structure that is being
		637	serialised. Failure to comply with this can result in memory corruption -
		638	and worse.
		639
		640	If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
		641	with an error.
		642
		643	DECODING
		644	Objects encoded via "TO_CBOR" cannot (normally) be automatically
		645	decoded, but objects encoded via "FREEZE" can be decoded using the
		646	following protocol:
		647
		648	When an encoded CBOR perl object is encountered by the decoder, it will
		649	look up the "THAW" method, by using the stored classname, and will fail
		650	if the method cannot be found.
		651
		652	After the lookup it will call the "THAW" method with the stored
		653	classname as first argument, the constant string "CBOR" as second
		654	argument, and all values returned by "FREEZE" as remaining arguments.
		655
		656	EXAMPLES
		657	Here is an example "TO_CBOR" method:
		658
		659	sub My::Object::TO_CBOR {
		660	my ($obj) = @_;
		661
		662	["this is a serialised My::Object object", $obj->{id}]
		663	}
		664
		665	When a "My::Object" is encoded to CBOR, it will instead encode a simple
		666	array with two members: a string, and the "object id". Decoding this
		667	CBOR string will yield a normal perl array reference in place of the
		668	object.
		669
		670	A more useful and practical example would be a serialisation method for
		671	the URI module. CBOR has a custom tag value for URIs, namely 32:
		672
		673	sub URI::TO_CBOR {
		674	my ($self) = @_;
		675	my $uri = "$self"; # stringify uri
		676	utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
		677	CBOR::XS::tag 32, $uri
		678	}
		679
		680	This will encode URIs as a UTF-8 string with tag 32, which indicates an
		681	URI.
		682
		683	Decoding such an URI will not (currently) give you an URI object, but
		684	instead a CBOR::XS::Tagged object with tag number 32 and the string -
		685	exactly what was returned by "TO_CBOR".
		686
		687	To serialise an object so it can automatically be deserialised, you need
		688	to use "FREEZE" and "THAW". To take the URI module as example, this
		689	would be a possible implementation:
		690
		691	sub URI::FREEZE {
		692	my ($self, $serialiser) = @_;
		693	"$self" # encode url string
		694	}
		695
		696	sub URI::THAW {
		697	my ($class, $serialiser, $uri) = @_;
		698	$class->new ($uri)
		699	}
		700
		701	Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
		702	example, a "FREEZE" method that returns "type", "id" and "variant"
		703	values would cause an invocation of "THAW" with 5 arguments:
		704
		705	sub My::Object::FREEZE {
		706	my ($self, $serialiser) = @_;
		707
		708	($self->{type}, $self->{id}, $self->{variant})
		709	}
		710
		711	sub My::Object::THAW {
		712	my ($class, $serialiser, $type, $id, $variant) = @_;
		713
		714	$class->new (type => $type, id => $id, variant => $variant)
		715	}
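
The two methods above can be exercised with a simple round trip. This is
only a sketch: the constructor "new" and the field values are
assumptions added to make it self-contained.

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Object;

# hypothetical constructor used by THAW below
sub new {
   my ($class, %arg) = @_;
   bless { %arg }, $class
}

sub FREEZE {
   my ($self, $serialiser) = @_;

   ($self->{type}, $self->{id}, $self->{variant})
}

sub THAW {
   # $serialiser is the constant string "CBOR" here
   my ($class, $serialiser, $type, $id, $variant) = @_;

   $class->new (type => $type, id => $id, variant => $variant)
}

package main;

my $obj  = My::Object->new (type => "widget", id => 7, variant => "blue");
my $copy = decode_cbor (encode_cbor ($obj));

# unlike with TO_CBOR, decoding gives back an object of the original class
print ref ($copy), " $copy->{type} $copy->{id} $copy->{variant}\n";
```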
		716
		717	MAGIC HEADER
		718	There is no way to distinguish CBOR from other formats programmatically.
		719	To make it easier to distinguish CBOR from other formats, the CBOR
		720	specification has a special "magic string" that can be prepended to any
		721	CBOR string without changing its meaning.
		722
		723	This string is available as $CBOR::XS::MAGIC. This module does not
		724	prepend this string to the CBOR data it generates, but it will ignore it
		725	if present, so users can prepend this string as a "file type" indicator
		726	as required.
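
A short sketch of using the magic string as a "file type" indicator
follows. The check with "index" is just one possible way to test for
the prefix, and its absence does not prove that the data is not CBOR:

```perl
use strict;
use warnings;
use CBOR::XS;

# prepend the magic string as a "file type" indicator
my $cbor = $CBOR::XS::MAGIC . encode_cbor ([1, 2, 3]);

# a reader can cheaply test for the indicator before decoding
my $has_magic = index ($cbor, $CBOR::XS::MAGIC) == 0;

# the decoder ignores the magic string, so it need not be stripped
my $data = decode_cbor ($cbor);

print $has_magic ? "looks like CBOR\n" : "no magic header\n";
print scalar @$data, " elements\n";
```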
		727
		728	THE CBOR::XS::Tagged CLASS
		729	CBOR has the concept of tagged values - any CBOR value can be tagged
		730	with a 64 bit tag number, and tag numbers are centrally administered.
		731
		732	"CBOR::XS" handles a few tags internally when en- or decoding. You can
		733	also create tags yourself by encoding "CBOR::XS::Tagged" objects, and
		734	the decoder will create "CBOR::XS::Tagged" objects itself when it hits
		735	an unknown tag.
		736
		737	These objects are simply blessed array references - the first member of
		738	the array being the numerical tag, the second being the value.
		739
		740	You can interact with "CBOR::XS::Tagged" objects in the following ways:
		741
		742	$tagged = CBOR::XS::tag $tag, $value
		743	This function(!) creates a new "CBOR::XS::Tagged" object using the
		744	given $tag (0..2**64-1) to tag the given $value (which can be any
		745	Perl value that can be encoded in CBOR, including serialisable Perl
		746	objects and "CBOR::XS::Tagged" objects).
		747
		748	$tagged->[0]
		749	$tagged->[0] = $new_tag
		750	$tag = $tagged->tag
		751	$new_tag = $tagged->tag ($new_tag)
		752	Access/mutate the tag.
		753
		754	$tagged->[1]
		755	$tagged->[1] = $new_value
		756	$value = $tagged->value
		757	$new_value = $tagged->value ($new_value)
		758	Access/mutate the tagged value.
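
The accessors listed above might be used like this. The tag number
1234567 is an arbitrary value picked for this sketch, on the assumption
that the decoder has no special handling for it:

```perl
use strict;
use warnings;
use CBOR::XS;

# 1234567 is an arbitrary tag number picked for this sketch
my $tagged = CBOR::XS::tag (1234567, "hello");

# a tag without special handling decodes into a CBOR::XS::Tagged object
my $decoded = decode_cbor (encode_cbor ($tagged));

print ref ($decoded), "\n";     # CBOR::XS::Tagged
print $decoded->tag, "\n";      # 1234567
print $decoded->value, "\n";    # hello

# array access and the accessor methods operate on the same data
$decoded->[1] = "world";
print $decoded->value, "\n";    # world
```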
		759
		760	EXAMPLES
		761	Here are some examples of "CBOR::XS::Tagged" uses to tag objects.
		762
		763	You can look up CBOR tag values and meanings in the IANA registry at
		764	<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
		765
		766	Prepend a magic header ($CBOR::XS::MAGIC):
		767
		768	my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
		769	# same as:
		770	my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
		771
		772	Serialise some URIs and a regex in an array:
		773
		774	my $cbor = encode_cbor [
		775	(CBOR::XS::tag 32, "http://www.nethype.de/"),
		776	(CBOR::XS::tag 32, "http://software.schmorp.de/"),
		777	(CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
		778	];
		779
		780	Wrap CBOR data in CBOR:
		781
		782	my $cbor_cbor = encode_cbor
		783	CBOR::XS::tag 24,
		784	encode_cbor [1, 2, 3];
		785
		786	TAG HANDLING AND EXTENSIONS
		787	This section describes how this module handles specific tagged values
		788	and extensions. If a tag is not mentioned here and no additional filters
		789	are provided for it, then the default handling applies (creating a
		790	CBOR::XS::Tagged object on decoding, and only encoding the tag when
		791	explicitly requested).
		792
		793	Tags not handled specifically are currently converted into a
		794	CBOR::XS::Tagged object, which is simply a blessed array reference
		795	consisting of the numeric tag value followed by the (decoded) CBOR
		796	value.
		797
		798	Future versions of this module reserve the right to special case
		799	additional tags (such as base64url).
		800
		801	ENFORCED TAGS
		802	These tags are always handled when decoding, and their handling cannot
		803	be overridden by the user.
		804
		805	26 (perl-object, <http://cbor.schmorp.de/perl-object>)
		806	These tags are automatically created (and decoded) for serialisable
		807	objects using the "FREEZE/THAW" methods (the Types::Serialiser object
		808	serialisation protocol). See "OBJECT SERIALISATION" for details.
		809
		810	28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
		811	These tags are automatically decoded when encountered (and they do
		812	not result in a cyclic data structure, see "allow_cycles"),
		813	resulting in shared values in the decoded object. They are only
		814	encoded, however, when "allow_sharing" is enabled.
		815
		816	Not all shared values can be successfully decoded: values that
		817	reference themselves will currently decode as "undef" (this is not
		818	the same as a reference pointing to itself, which will be
		819	represented as a value that contains an indirect reference to itself
		820	- these will be decoded properly).
		821
		822	Note that considerably more shared value data structures can be
		823	decoded than will be encoded - currently, only values pointed to by
		824	references will be shared, others will not. While non-reference
		825	shared values can be generated in Perl with some effort, they were
		826	considered too unimportant to be supported in the encoder. The
		827	decoder, however, will decode these values as shared values.
		828
		829	256, 25 (stringref-namespace, stringref,
		830	<http://cbor.schmorp.de/stringref>)
		831	These tags are automatically decoded when encountered. They are only
		832	encoded, however, when "pack_strings" is enabled.
		833
		834	22098 (indirection, <http://cbor.schmorp.de/indirection>)
		835	This tag is automatically generated when a reference is encountered
		836	(with the exception of hash and array references). It is converted
		837	to a reference when decoding.
		838
		839	55799 (self-describe CBOR, RFC 7049)
		840	This value is not generated on encoding (unless explicitly requested
		841	by the user), and is simply ignored when decoding.
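
To illustrate the value-sharing tags (28 and 29) from the list above,
this sketch encodes the same structure with and without
"allow_sharing". The variable names are made up for the example:

```perl
use strict;
use warnings;
use CBOR::XS;

my $inner = [1, 2, 3];
my $outer = [$inner, $inner];   # two references to the same array

# default coder: the inner array is simply encoded twice
my $plain  = CBOR::XS->new;
my $copied = $plain->decode ($plain->encode ($outer));

# with allow_sharing the encoder tags the repeated value
my $sharing = CBOR::XS->new->allow_sharing;
my $shared  = $sharing->decode ($sharing->encode ($outer));

push @{ $copied->[0] }, 4;
push @{ $shared->[0] }, 4;

print scalar @{ $copied->[1] }, "\n";   # 3, the two copies are independent
print scalar @{ $shared->[1] }, "\n";   # 4, both slots refer to one array
```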
		842
		843	NON-ENFORCED TAGS
		844	These tags have default filters provided when decoding. Their handling
		845	can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
		846	or by providing a custom "filter" callback when decoding.
		847
		848	When they result in decoding into a specific Perl class, the module
		849	usually provides a corresponding "TO_CBOR" method as well.
		850
		851	When any of these need to load additional modules that are not part of
		852	the perl core distribution (e.g. URI), it is (currently) up to the user
		853	to provide these modules. The decoding usually fails with an exception
		854	if the required module cannot be loaded.
		855
		856	0, 1 (date/time string, seconds since the epoch)
		857	These tags are decoded into Time::Piece objects. The corresponding
		858	"Time::Piece::TO_CBOR" method always encodes into tag 1 values
		859	currently.
		860
		861	The Time::Piece API is generally surprisingly bad, and fractional
		862	seconds are only accidentally kept intact, so watch out. On the plus
		863	side, the module comes with perl since 5.10, which has to count for
		864	something.
		865
		866	2, 3 (positive/negative bignum)
		867	These tags are decoded into Math::BigInt objects. The corresponding
		868	"Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
		869	CBOR integers, and others into positive/negative CBOR bignums.
		870
		871	4, 5, 264, 265 (decimal fraction/bigfloat)
		872	Both decimal fractions and bigfloats are decoded into Math::BigFloat
		873	objects. The corresponding "Math::BigFloat::TO_CBOR" method always
		874	encodes into a decimal fraction (either tag 4 or 264).
		875
		876	NaN and infinities are not encoded properly, as they cannot be
		877	represented in CBOR.
		878
		879	See "BIGNUM SECURITY CONSIDERATIONS" for more info.
		880
		881	30 (rational numbers)
		882	These tags are decoded into Math::BigRat objects. The corresponding
		883	"Math::BigRat::TO_CBOR" method encodes rational numbers with
		884	denominator 1 via their numerator only, i.e., they become normal
		885	integers or "bignums".
		886
		887	See "BIGNUM SECURITY CONSIDERATIONS" for more info.
		888
		889	21, 22, 23 (expected later JSON conversion)
		890	CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
		891	these tags.
		892
		893	32 (URI)
		894	These objects decode into URI objects. The corresponding
		895	"URI::TO_CBOR" method again results in a CBOR URI value.
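
Overriding one of these default filters with a custom "filter" callback
could look like the following sketch. The policy (keep tag 32 values as
plain strings) is purely hypothetical, and the sketch assumes that
"CBOR::XS::default_filter", which is not described in this section,
accepts the same two arguments as the callback:

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical policy: keep tag 32 values as plain strings instead of
# URI objects, and leave every other tag to the default filter
my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;

   return $value if $tag == 32;

   # assumption: same calling convention as this callback
   CBOR::XS::default_filter ($tag, $value)
});

my $cbor = encode_cbor (CBOR::XS::tag (32, "http://software.schmorp.de/"));
my $uri  = $coder->decode ($cbor);

print ref ($uri) ? "object\n" : "plain string: $uri\n";
```

Because the callback returns exactly one value for tag 32, that value
replaces the whole tagged value, and the URI module is never loaded.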
		896
		897	CBOR and JSON
		898	CBOR is supposed to implement a superset of the JSON data model, and is,
		899	with some coercion, able to represent all JSON texts (something that
		900	other "binary JSON" formats such as BSON generally do not support).
		901
		902	CBOR implements some extra hints and support for JSON interoperability,
		903	and the spec offers further guidance for conversion between CBOR and
		904	JSON. None of this is currently implemented in CBOR::XS, and the
		905	guidelines in the spec do not result in correct round-tripping of data.
		906	If JSON interoperability is improved in the future, then the goal will
		907	be to ensure that decoded JSON data will round-trip encoding and
		908	decoding to CBOR intact.
		909
		910	SECURITY CONSIDERATIONS
		911	Tl;dr... if you want to decode or encode CBOR from untrusted sources,
		912	you should start with a coder object created via "new_safe":
		913
		914	my $coder = CBOR::XS->new_safe;
		915
		916	my $data = $coder->decode ($cbor_text);
		917	my $cbor = $coder->encode ($data);
		918
		919	Longer version: When you are using CBOR in a protocol, talking to
		920	untrusted potentially hostile creatures requires some thought:
		921
		922	Security of the CBOR decoder itself
		923	First and foremost, your CBOR decoder should be secure, that is,
		924	should not have any buffer overflows or similar bugs that could
		925	potentially be exploited. Obviously, this module should ensure that
		926	and I am trying hard to make that true, but you never know.
		927
		928	CBOR::XS can invoke almost arbitrary callbacks during decoding
		929	CBOR::XS supports object serialisation - decoding CBOR can cause
		930	calls to any "THAW" method in any package that exists in your
		931	process (that is, CBOR::XS will not try to load modules, but any
		932	existing "THAW" method or function can be called, so they all have
		933	to be secure).
		934
		935	Less obviously, it will also invoke "TO_CBOR" and "FREEZE" methods -
		936	even if all your "THAW" methods are secure, encoding data structures
		937	from untrusted sources can invoke those and trigger bugs in those.
		938
		939	So, if you are not sure about the security of all the modules you
		940	have loaded (you shouldn't), you should disable this part using
		941	"forbid_objects".
		942
		943	CBOR can be extended with tags that call library code
		944	CBOR can be extended with tags, and "CBOR::XS" has a registry of
		945	conversion functions for many existing tags that can be extended via
		946	third-party modules (see the "filter" method).
		947
		948	If you don't trust these, you should configure the "safe" filter
		949	function, "CBOR::XS::safe_filter", which by default only includes
		950	conversion functions that are considered "safe" by the author (but
		951	again, they can be extended by third party modules).
		952
		953	Depending on your level of paranoia, you can use the "safe" filter:
		954
		955	$cbor->filter (\&CBOR::XS::safe_filter);
		956
		957	... your own filter...
		958
		959	$cbor->filter (sub { ... do your stuff here ... });
		960
		961	... or even no filter at all, disabling all tag decoding:
		962
		963	$cbor->filter (sub { });
		964
		965	This is never a problem for encoding, as the tag mechanism only
		966	exists in CBOR texts.
		967
		968	Resource-starving attacks: object memory usage
		969	You need to avoid resource-starving attacks. That means you should
		970	limit the size of CBOR data you accept, or make sure that when your
		971	resources run out, that's just fine (e.g. by using a separate
		972	process that can crash safely). The size of a CBOR string in octets
		973	is usually a good indication of the size of the resources required
		974	to decode it into a Perl structure. While CBOR::XS can check the
		975	size of the CBOR text (using "max_size"), it might be too late when
		976	you already have it in memory, so you might want to check the size
		977	before you accept the string.
		978
		979	As for encoding, it is possible to construct data structures that
		980	are relatively small but result in large CBOR texts (for example by
		981	having an array full of references to the same big data structure,
		982	which will all be deep-cloned during encoding by default). This is
		983	rarely an actual issue (and the worst case is still just running out
		984	of memory), but you can reduce this risk by using "allow_sharing".
		985
		986	Resource-starving attacks: stack overflows
		987	CBOR::XS recurses using the C stack when decoding objects and
		988	arrays. The C stack is a limited resource: for instance, on my amd64
		989	machine with 8MB of stack size I can decode around 180k nested
		990	arrays but only 14k nested CBOR objects (due to perl itself
		991	recursing deeply on croak to free the temporary). If that is
		992	exceeded, the program crashes. To be conservative, the default
		993	nesting limit is set to 512. If your process has a smaller stack,
		994	you should adjust this setting accordingly with the "max_depth"
		995	method.
		996
		997	Resource-starving attacks: CPU en-/decoding complexity
		998	CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
		999	libraries to represent and encode/decode bignums. These can be very
		1000	slow (as in, centuries of CPU time) and can even crash your program
		1001	(and are generally not very trustworthy). See the next section for
		1002	details.
		1003
		1004	Data breaches: leaking information in error messages
		1005	CBOR::XS might leak contents of your Perl data structures in its
		1006	error messages, so when you serialise sensitive information you
		1007	might want to make sure that exceptions thrown by CBOR::XS will not
		1008	end up in front of untrusted eyes.
		1009
		1010	Something else...
		1011	Something else could bomb you, too, that I forgot to think of. In
		1012	that case, you get to keep the pieces. I am always open for hints,
		1013	though...
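
Several of the points above (size limits, nesting limits, forbidden
objects and not leaking error messages) can be combined in one place.
The following is only a sketch: the helper "decode_untrusted" and the
concrete limits are assumptions chosen for illustration, not
recommendations.

```perl
use strict;
use warnings;
use CBOR::XS;

# arbitrary limit for this sketch, tune it to your protocol
my $MAX_OCTETS = 64 * 1024;

my $coder = CBOR::XS->new_safe
   ->max_size ($MAX_OCTETS)
   ->max_depth (32);

# hypothetical helper: returns (1, $value) on success and (0, undef) on
# any failure, so that a decoded undef is not mistaken for an error
sub decode_untrusted {
   my ($octets) = @_;

   # reject oversized input before doing any work
   return (0, undef) if length ($octets) > $MAX_OCTETS;

   my $data;
   my $ok = eval { $data = $coder->decode ($octets); 1 };

   # $@ may contain parts of the data, so log it privately if at all
   $ok ? (1, $data) : (0, undef)
}

my ($ok, $data) = decode_untrusted (encode_cbor ({ answer => 42 }));
print $ok ? "decoded, answer is $data->{answer}\n" : "rejected\n";
```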
		1014
		1015	BIGNUM SECURITY CONSIDERATIONS
		1016	CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
		1017	Math::BigFloat that tries to encode the number in the simplest possible
		1018	way, that is, either a CBOR integer, a CBOR bigint/decimal fraction (tag
		1019	4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers
		1020	(Math::BigRat, tag 30) can also contain bignums as members.
		1021
		1022	CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
		1023	bigfloats (tags 5 and 265), but it will never generate these on its own.
		1024
		1025	Using the built-in Math::BigInt::Calc support, encoding and decoding
		1026	decimal fractions is generally fast. Decoding bigints can be slow for
		1027	very big numbers (tens of thousands of digits, something that could
		1028	potentially be caught by limiting the size of CBOR texts), and decoding
		1029	bigfloats or arbitrary-exponent bigfloats can be extremely slow
		1030	(minutes, decades) for large exponents (roughly 40 bit and longer).
		1031
		1032	Additionally, Math::BigInt can take advantage of other bignum libraries,
		1033	such as Math::GMP, which cannot handle big floats with large exponents,
		1034	and might simply abort or crash your program, due to their code quality.
		1035
		1036	This can be a concern if you want to parse untrusted CBOR. If it is, you
		1037	might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
		1038	types. You should also disable types 5 and 265, as these can be slow
		1039	even without bigints.
		1040
		1041	Disabling bigints will also partially or fully disable types that rely
		1042	on them, e.g. rational numbers that use bignums.
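
One way to disable the decoding of these types is a filter that leaves
the affected tags undecoded. This is a sketch under two assumptions:
that returning no values from the filter keeps the value as an inert
"CBOR::XS::Tagged" object, and that "CBOR::XS::default_filter" accepts
the same two arguments as the filter callback.

```perl
use strict;
use warnings;
use CBOR::XS;

# tags whose decoding is discussed above: bigints and bigfloats
my %skip = map { ($_ => 1) } 2, 3, 5, 265;

my $coder = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;

   # returning no values keeps the value as a CBOR::XS::Tagged object
   return if $skip{$tag};

   # assumption: same calling convention as this callback
   CBOR::XS::default_filter ($tag, $value)
});

# tag 2 wrapping a 17 octet byte string, i.e. the bignum 2**128
my $cbor    = encode_cbor (CBOR::XS::tag (2, "\x01" . ("\x00" x 16)));
my $decoded = $coder->decode ($cbor);

print ref ($decoded), "\n";   # CBOR::XS::Tagged
```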
		1043
		1044	CBOR IMPLEMENTATION NOTES
		1045	This section contains some random implementation notes. They do not
		1046	describe guaranteed behaviour, but merely behaviour as-is implemented
		1047	right now.
		1048
		1049	64 bit integers are only properly decoded when Perl was built with 64
		1050	bit support.
		1051
		1052	Strings and arrays are encoded with a definite length. Hashes as well,
		1053	unless they are tied (or otherwise magical).
		1054
		1055	Only the double data type is supported for NV data types - when Perl
		1056	uses long double to represent floating point values, they might not be
		1057	encoded properly. Half precision types are accepted, but not encoded.
		1058
		1059	Strict mode and canonical mode are not implemented.
		1060
		1061	LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
		1062	On perls that were built without 64 bit integer support (these are rare
		1063	nowadays, even on 32 bit architectures, as all major Perl distributions
		1064	are built with 64 bit integer support), support for any kind of 64 bit
		1065	integer in CBOR is very limited - most likely, these 64 bit values will
		1066	be truncated, corrupted, or otherwise not decoded correctly. This also
		1067	includes string, array and map sizes that are stored as 64 bit integers.
		1068
		1069	THREADS
		1070	This module is not guaranteed to be thread safe and there are no plans
		1071	to change this until Perl gets thread support (as opposed to the
		1072	horribly slow so-called "threads" which are simply slow and bloated
		1073	process simulations - use fork, it's much faster, cheaper, better).
		1074
		1075	(It might actually work, but you have been warned).
		1076
		1077	BUGS
		1078	While the goal of this module is to be correct, that unfortunately does
		1079	not mean it's bug-free, only that I think its design is bug-free. If you
		1080	keep reporting bugs they will be fixed swiftly, though.
		1081
		1082	Please refrain from using rt.cpan.org or any other bug reporting
		1083	service. I put the contact address into my modules for a reason.
		1084
		1085	SEE ALSO
		1086	The JSON and JSON::XS modules that do similar, but human-readable,
		1087	serialisation.
		1088
		1089	The Types::Serialiser module provides the data model for true, false and
		1090	error values.
		1091
		1092	AUTHOR
		1093	Marc Lehmann <schmorp@schmorp.de>
		1094	http://home.schmorp.de/
		1095
