=head1 NAME

CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)

=encoding utf-8

=head1 SYNOPSIS

 use CBOR::XS;

 $binary_cbor_data = encode_cbor $perl_value;
 $perl_value       = decode_cbor $binary_cbor_data;

 # OO-interface

 $coder = CBOR::XS->new;
 $binary_cbor_data = $coder->encode ($perl_value);
 $perl_value       = $coder->decode ($binary_cbor_data);

 # prefix decoding

 my $many_cbor_strings = ...;
 while (length $many_cbor_strings) {
    my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
    # data was decoded
    substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
 }

=head1 DESCRIPTION

WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
feature-limited, it might already be useful).

This module converts Perl data structures to the Concise Binary Object
Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
format that aims to use a superset of the JSON data model, i.e. when you
can represent something in JSON, you should be able to represent it in
CBOR.

This makes it a faster and more compact binary alternative to JSON, with
the added ability of supporting serialising of perl objects.

The primary goal of this module is to be I<correct> and the secondary goal
is to be I<fast>. To reach the latter goal it was written in C.

See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
vice versa.

=cut

package CBOR::XS;

use common::sense;

our $VERSION = 0.03;
our @ISA = qw(Exporter);

our @EXPORT = qw(encode_cbor decode_cbor);

use Exporter;
use XSLoader;

use Types::Serialiser;

our $MAGIC = "\xd9\xd9\xf7";

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $cbor_data = encode_cbor $perl_scalar

Converts the given Perl data structure to CBOR representation. Croaks on
error.

=item $perl_scalar = decode_cbor $cbor_data

The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
returning the resulting perl scalar. Croaks on error.

=back
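
A minimal round-trip sketch using the two exported functions (the sample
data is made up for illustration):

   use CBOR::XS;

   my $data = { name => "example", list => [1, 2, 3] };

   my $cbor = encode_cbor ($data);   # binary string, not human-readable
   my $copy = decode_cbor ($cbor);   # an equivalent Perl structure again

   print "items: @{ $copy->{list} }\n";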


=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $cbor = new CBOR::XS

Creates a new CBOR::XS object that can be used to de/encode CBOR
strings. All boolean flags described below are by default I<disabled>.

The mutators all return the CBOR object again and thus calls can be
chained:

   my $cbor_data = CBOR::XS->new->max_depth (64)->encode ({a => [1,2]});

=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])

=item $max_depth = $cbor->get_max_depth

Sets the maximum nesting level (default C<512>) accepted while encoding
or decoding. If a higher nesting level is detected in CBOR data or a Perl
data structure, then the encoder and decoder will stop and croak at that
point.

Nesting level is defined by the number of hash- or arrayrefs that the
encoder needs to traverse to reach a given point, or by the number of CBOR
arrays or maps that are still open (not yet completed) at a given point in
the CBOR data.

Setting the maximum depth to one disallows any nesting, which ensures that
the data is only a single map or array.

If no argument is given, the highest possible setting will be used, which
is rarely useful.

Note that nesting is implemented by recursion in C. The default value has
been chosen to be as large as typical operating systems allow without
crashing.

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor = $cbor->max_size ([$maximum_string_size])

=item $max_size = $cbor->get_max_size

Sets the maximum length a CBOR string may have (in bytes) when decoding
is attempted. The default is C<0>, meaning no limit. When C<decode>
is called on a string that is longer than this many bytes, it will not
attempt to decode the string but throw an exception. This setting has no
effect on C<encode> (yet).

If no argument is given, the limit check will be deactivated (same as when
C<0> is specified).

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor_data = $cbor->encode ($perl_scalar)

Converts the given Perl data structure (a scalar value) to its CBOR
representation.

=item $perl_scalar = $cbor->decode ($cbor_data)

The opposite of C<encode>: expects CBOR data and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

=item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)

This works like the C<decode> method, but instead of raising an exception
when there is trailing garbage after the CBOR string, it will silently
stop parsing there and return the number of octets consumed so far.

This is useful if your CBOR strings are not delimited by an outer protocol
and you need to know where the first CBOR string ends and the next one
starts.

   CBOR::XS->new->decode_prefix ("......")
   => ("...", 3)

=back
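
The limits described above might be combined as follows. This is only a
sketch: the concrete limits (1024 bytes, 4 levels) are arbitrary values
picked for illustration, and C<eval> is simply used to catch the
exception thrown by the coder:

   use CBOR::XS;

   # max_size and max_depth return the coder object, so they can be chained
   my $coder = CBOR::XS->new->max_size (1024)->max_depth (4);

   # ten nested arrays, far deeper than the limit of 4
   my $deep = [];
   $deep = [$deep] for 1 .. 10;

   my $encoded = eval { $coder->encode ($deep) };
   print "encode refused: $@" unless defined $encoded;

   # roughly 2000 bytes of input, more than max_size allows
   my $big = encode_cbor ("x" x 2000);

   my $ok = eval { $coder->decode ($big); 1 };
   print "decode refused: $@" unless $ok;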


=head1 MAPPING

This section describes how CBOR::XS maps Perl values to CBOR values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.


=head2 CBOR -> PERL

=over 4

=item integers

CBOR integers become (numeric) perl scalars. On perls without 64 bit
support, 64 bit integers will be truncated or otherwise corrupted.

=item byte strings

Byte strings will become octet strings in Perl (the byte values 0..255
will simply become characters of the same value in Perl).

=item UTF-8 strings

UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
decoded into proper Unicode code points. At the moment, the validity of
the UTF-8 octets will not be validated - corrupt input will result in
corrupted Perl strings.

=item arrays, maps

CBOR arrays and CBOR maps will be converted into references to a Perl
array or hash, respectively. The keys of the map will be stringified
during this process.

=item null

CBOR null becomes C<undef> in Perl.

=item true, false, undefined

These CBOR values become C<Types::Serialiser::true>,
C<Types::Serialiser::false> and C<Types::Serialiser::error>,
respectively. They are overloaded to act almost exactly like the numbers
C<1> and C<0> (for true and false) or to throw an exception on access (for
error). See the L<Types::Serialiser> manpage for details.

=item CBOR tag 256 (perl object)

The tag value C<256> (TODO: pending IANA registration) will be used
to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
SERIALISATION", below, for details.

=item CBOR tag 55799 (magic header)

The tag 55799 is ignored (this tag implements the magic header).

=item other CBOR tags

Tagged items consist of a numeric tag and another CBOR value. Tags not
handled internally are currently converted into a L<CBOR::XS::Tagged>
object, which is simply a blessed array reference consisting of the
numeric tag value followed by the (decoded) CBOR value.

In the future, support for user-supplied conversions might get added.

=item anything else

Anything else (e.g. unsupported simple values) will raise a decoding
error.

=back
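
The following sketch illustrates a few of these mappings by decoding
hand-assembled CBOR octets. The tag number C<1000> is an arbitrary value
chosen for illustration and assumed not to be handled internally:

   use CBOR::XS;

   # 0x83 = array of three items: true (0xf5), false (0xf4), null (0xf6)
   my $values = decode_cbor ("\x83\xf5\xf4\xf6");

   print "first is true\n"   if $values->[0];
   print "second is false\n" unless $values->[1];
   print "third is undef\n"  unless defined $values->[2];

   # 0xd9 0x03 0xe8 = tag 1000, followed by the integer 10 (0x0a)
   my $tagged = decode_cbor ("\xd9\x03\xe8\x0a");

   # a blessed array reference holding the tag and the decoded value
   my ($tag, $value) = @$tagged;
   printf "%s: tag %d, value %d\n", ref $tagged, $tag, $value;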


=head2 PERL -> CBOR

The mapping from Perl to CBOR is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which CBOR type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become CBOR maps. As there is no inherent ordering in
hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
order.

Currently, tied hashes will use the indefinite-length format, while normal
hashes will use the fixed-length format.

=item array references

Perl array references become fixed-length CBOR arrays.

=item other references

Other unblessed references are generally not allowed and will cause an
exception to be thrown, except for references to the integers C<0> and
C<1>, which get turned into false and true in CBOR.

=item CBOR::XS::Tagged objects

Objects of this type must be arrays consisting of a single C<[tag, value]>
pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
encoded as appropriate for the value.

=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error

These special values become CBOR true, CBOR false and CBOR undefined
values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
if you want.

=item other blessed objects

Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
"OBJECT SERIALISATION", below, for details.

=item simple scalars

TODO
Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: CBOR::XS will encode undefined scalars as
CBOR null values, scalars that have last been used in a string context
before encoding as CBOR strings, and anything else as number value:

   # dump as number
   encode_cbor [2]                      # yields [2]
   encode_cbor [-3.0e17]                # yields [-3e+17]
   my $value = 5; encode_cbor [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   encode_cbor [$value]                 # yields ["5"]

   # undef becomes null
   encode_cbor [undef]                  # yields [null]

You can force the type to be a CBOR string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a CBOR number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

You can not currently force the type in other, less obscure, ways. Tell me
if you need this capability (but don't forget to explain why it's needed
:).

Perl values that seem to be integers generally use the shortest possible
representation. Floating-point values will use the IEEE single format if
that is possible without loss of precision, otherwise the IEEE double
format. Perls that use formats other than IEEE double to represent
numerical values are supported, but might suffer loss of precision.

=back
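
To check which CBOR type the encoder picked, the result can be inspected
as a hex string. A small sketch (the C<hex_cbor> helper is hypothetical
and not part of this module):

   use CBOR::XS;
   use Types::Serialiser;

   # hypothetical helper: encode a value and return the octets as hex digits
   sub hex_cbor { unpack "H*", encode_cbor ($_[0]) }

   # an array of three small integers
   print "ints:    ", hex_cbor ([1, 2, 3]), "\n";

   # true, false and null
   my $simple = [Types::Serialiser::true, Types::Serialiser::false, undef];
   print "simple:  ", hex_cbor ($simple), "\n";

   # an integer that no longer fits into a single octet
   print "integer: ", hex_cbor (1000), "\n";

Strings are left out of this sketch on purpose, as their encoding depends
on how the scalar was last used.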

=head2 OBJECT SERIALISATION

This module knows two ways to serialise a Perl object: the CBOR-specific
way, and the generic way.

Whenever the encoder encounters a Perl object that it cannot serialise
directly (most of them), it will first look up the C<TO_CBOR> method on
it.

If it has a C<TO_CBOR> method, it will call it with the object as only
argument, and expects exactly one return value, which it will then
encode in place of the object.

Otherwise, it will look up the C<FREEZE> method. If it exists, it will
call it with the object as first argument, and the constant string C<CBOR>
as the second argument, to distinguish it from other serialisers.

The C<FREEZE> method can return any number of values (i.e. zero or
more). These will be encoded as a CBOR perl object, together with the
classname.

If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
with an error.

Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
objects encoded via C<FREEZE> can be decoded using the following protocol:

When an encoded CBOR perl object is encountered by the decoder, it will
look up the C<THAW> method, by using the stored classname, and will fail
if the method cannot be found.

After the lookup it will call the C<THAW> method with the stored classname
as first argument, the constant string C<CBOR> as second argument, and all
values returned by C<FREEZE> as remaining arguments.

=head3 EXAMPLES

Here is an example C<TO_CBOR> method:

   sub My::Object::TO_CBOR {
      my ($obj) = @_;

      ["this is a serialised My::Object object", $obj->{id}]
   }

When a C<My::Object> is encoded to CBOR, it will instead encode a simple
array with two members: a string, and the "object id". Decoding this CBOR
string will yield a normal perl array reference in place of the object.

A more useful and practical example would be a serialisation method for
the URI module. CBOR has a custom tag value for URIs, namely 32:

  sub URI::TO_CBOR {
     my ($self) = @_;
     my $uri = "$self"; # stringify uri
     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
     CBOR::XS::tagged 32, $uri # tag the upgraded string, not the object
  }

This will encode URIs as a UTF-8 string with tag 32, which indicates a
URI.

Decoding such a URI will not (currently) give you a URI object, but
instead a CBOR::XS::Tagged object with tag number 32 and the string -
exactly what was returned by C<TO_CBOR>.

To serialise an object so it can automatically be deserialised, you need
to use C<FREEZE> and C<THAW>. To take the URI module as example, this
would be a possible implementation:

   sub URI::FREEZE {
      my ($self, $serialiser) = @_;
      "$self" # encode url string
   }

   sub URI::THAW {
      my ($class, $serialiser, $uri) = @_;

      $class->new ($uri)
   }

Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
example, a C<FREEZE> method that returns "type", "id" and "variant" values
would cause an invocation of C<THAW> with 5 arguments:

   sub My::Object::FREEZE {
      my ($self, $serialiser) = @_;

      ($self->{type}, $self->{id}, $self->{variant})
   }

   sub My::Object::THAW {
      my ($class, $serialiser, $type, $id, $variant) = @_;

      $class->new (type => $type, id => $id, variant => $variant)
   }
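
Putting the pieces together, a complete round trip might look like the
following sketch. The C<My::Point> class and its C<x> and C<y> fields are
made up for illustration:

   package My::Point;

   sub new {
      my ($class, %arg) = @_;
      bless { %arg }, $class
   }

   # called by the encoder, returns the values to store
   sub FREEZE {
      my ($self, $serialiser) = @_;
      ($self->{x}, $self->{y})
   }

   # called by the decoder with the stored classname and values
   sub THAW {
      my ($class, $serialiser, $x, $y) = @_;
      $class->new (x => $x, y => $y)
   }

   package main;

   use CBOR::XS;

   my $point = My::Point->new (x => 1, y => 2);
   my $copy  = decode_cbor (encode_cbor ($point));

   printf "%s at (%d, %d)\n", ref $copy, $copy->{x}, $copy->{y};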


=head1 MAGIC HEADER

There is no way to distinguish CBOR from other formats
programmatically. To make it easier to distinguish CBOR from other
formats, the CBOR specification has a special "magic string" that can be
prepended to any CBOR string without changing its meaning.

This string is available as C<$CBOR::XS::MAGIC>. This module does not
prepend this string to the CBOR data it generates, but it will ignore it
if present, so users can prepend this string as a "file type" indicator as
required.
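
A short sketch of how the magic string could be used (variable names are
chosen for illustration only):

   use CBOR::XS;

   my $cbor       = encode_cbor ([1, 2, 3]);
   my $with_magic = $CBOR::XS::MAGIC . $cbor;

   # a simple "file type" check: does the data start with the magic string?
   my $magic_len = length $CBOR::XS::MAGIC;
   my $has_magic = $CBOR::XS::MAGIC eq substr $with_magic, 0, $magic_len;

   # the decoder ignores the magic header, so both decode to the same value
   my $plain    = decode_cbor ($cbor);
   my $prefixed = decode_cbor ($with_magic);

   print "same data\n" if "@$plain" eq "@$prefixed";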


=head1 CBOR and JSON

CBOR is supposed to implement a superset of the JSON data model, and is,
with some coercion, able to represent all JSON texts (something that other
"binary JSON" formats such as BSON generally do not support).

CBOR implements some extra hints and support for JSON interoperability,
and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the guidelines
in the spec do not result in correct round-tripping of data. If JSON
interoperability is improved in the future, then the goal will be to
ensure that decoded JSON data will round-trip encoding and decoding to
CBOR intact.


=head1 SECURITY CONSIDERATIONS

When you are using CBOR in a protocol, talking to untrusted potentially
hostile creatures requires relatively few measures.

First of all, your CBOR decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard on making that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a CBOR string in octets is usually a good
indication of the size of the resources required to decode it into a Perl
structure. While CBOR::XS can check the size of the CBOR data, it might be
too late when you already have it in memory, so you might want to check
the size before you accept the string.

Third, CBOR::XS recurses using the C stack when decoding maps and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested CBOR maps (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.

Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open for hints, though...

Also keep in mind that CBOR::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by CBOR::XS
will not end up in front of untrusted eyes.
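
The measures above could be wrapped in a small helper. This is a sketch
under assumptions, not a recommended implementation: C<decode_untrusted>
is a hypothetical name, and the limits of 64 KiB and 32 levels are
arbitrary:

   use CBOR::XS;

   my $MAX_OCTETS = 64 * 1024;   # assumed limit, tune for your protocol

   # hypothetical helper: returns (1, $value) on success, an empty list on failure
   sub decode_untrusted {
      my ($octets) = @_;

      # reject oversized input before doing any work on it
      return if length ($octets) > $MAX_OCTETS;

      my $coder = CBOR::XS->new->max_size ($MAX_OCTETS)->max_depth (32);

      my $value = eval { $coder->decode ($octets) };
      if ($@) {
         # keep the detailed message local, it might contain data
         warn "CBOR decode failed: $@";
         return;
      }

      (1, $value)
   }

   my ($ok, $value) = decode_untrusted (encode_cbor ([1, 2]));
   print "decoded ", scalar @$value, " items\n" if $ok;

Returning a success flag next to the value keeps a decoded CBOR null
(C<undef>) distinguishable from a failure.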

=head1 CBOR IMPLEMENTATION NOTES

This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely behaviour as implemented
right now.

64 bit integers are only properly decoded when Perl was built with 64 bit
support.

Strings and arrays are encoded with a definite length. Hashes as well,
unless they are tied (or otherwise magical).

Only the double data type is supported for NV data types - when Perl uses
long double to represent floating point values, they might not be encoded
properly. Half precision types are accepted, but not encoded.

Strict mode and canonical mode are not implemented.


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads" which are simply slow and bloated
process simulations - use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

XSLoader::load "CBOR::XS", $VERSION;

=head1 SEE ALSO

The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
serialisation.

The L<Types::Serialiser> module provides the data model for true, false
and error values.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1

Revision:	1.7
Committed:	Sun Oct 27 22:35:15 2013 UTC (10 years, 6 months ago) by root
Branch:	MAIN
Changes since 1.6:	+105 -16 lines
Log Message:	* empty log message *
#	Content
1	=head1 NAME
2
3	CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4
5	=encoding utf-8
6
7	=head1 SYNOPSIS
8
9	use CBOR::XS;
10
11	$binary_cbor_data = encode_cbor $perl_value;
12	$perl_value = decode_cbor $binary_cbor_data;
13
14	# OO-interface
15
16	$coder = CBOR::XS->new;
17	$binary_cbor_data = $coder->encode ($perl_value);
18	$perl_value = $coder->decode ($binary_cbor_data);
19
20	# prefix decoding
21
22	my $many_cbor_strings = ...;
23	while (length $many_cbor_strings) {
24	my ($data, $length) = $cbor->decode_prefix ($many_cbor_strings);
25	# data was decoded
26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27	}
28
29	=head1 DESCRIPTION
30
31	WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
32	AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
33	feature-limited, it might already be useful).
34
35	This module converts Perl data structures to the Concise Binary Object
36	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
37	format that aims to use a superset of the JSON data model, i.e. when you
38	can represent something in JSON, you should be able to represent it in
39	CBOR.
40
41	This makes it a faster and more compact binary alternative to JSON, with
42	the added ability of supporting serialising of perl objects.
43
44	The primary goal of this module is to be I<correct> and the secondary goal
45	is to be I<fast>. To reach the latter goal it was written in C.
46
47	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
48	vice versa.
49
50	=cut
51
52	package CBOR::XS;
53
54	use common::sense;
55
56	our $VERSION = 0.03;
57	our @ISA = qw(Exporter);
58
59	our @EXPORT = qw(encode_cbor decode_cbor);
60
61	use Exporter;
62	use XSLoader;
63
64	use Types::Serialiser;
65
66	our $MAGIC = "\xd9\xd9\xf7";
67
68	=head1 FUNCTIONAL INTERFACE
69
70	The following convenience methods are provided by this module. They are
71	exported by default:
72
73	=over 4
74
75	=item $cbor_data = encode_cbor $perl_scalar
76
77	Converts the given Perl data structure to CBOR representation. Croaks on
78	error.
79
80	=item $perl_scalar = decode_cbor $cbor_data
81
82	The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
83	returning the resulting perl scalar. Croaks on error.
84
85	=back
86
87
88	=head1 OBJECT-ORIENTED INTERFACE
89
90	The object oriented interface lets you configure your own encoding or
91	decoding style, within the limits of supported formats.
92
93	=over 4
94
95	=item $cbor = new CBOR::XS
96
97	Creates a new CBOR::XS object that can be used to de/encode CBOR
98	strings. All boolean flags described below are by default I<disabled>.
99
100	The mutators for flags all return the CBOR object again and thus calls can
101	be chained:
102
103	#TODO
104	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
105
106	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
107
108	=item $max_depth = $cbor->get_max_depth
109
110	Sets the maximum nesting level (default C<512>) accepted while encoding
111	or decoding. If a higher nesting level is detected in CBOR data or a Perl
112	data structure, then the encoder and decoder will stop and croak at that
113	point.
114
115	Nesting level is defined by number of hash- or arrayrefs that the encoder
116	needs to traverse to reach a given point or the number of C<{> or C<[>
117	characters without their matching closing parenthesis crossed to reach a
118	given character in a string.
119
120	Setting the maximum depth to one disallows any nesting, so that ensures
121	that the object is only a single hash/object or array.
122
123	If no argument is given, the highest possible setting will be used, which
124	is rarely useful.
125
126	Note that nesting is implemented by recursion in C. The default value has
127	been chosen to be as large as typical operating systems allow without
128	crashing.
129
130	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
131
132	=item $cbor = $cbor->max_size ([$maximum_string_size])
133
134	=item $max_size = $cbor->get_max_size
135
136	Set the maximum length a CBOR string may have (in bytes) where decoding
137	is being attempted. The default is C<0>, meaning no limit. When C<decode>
138	is called on a string that is longer then this many bytes, it will not
139	attempt to decode the string but throw an exception. This setting has no
140	effect on C<encode> (yet).
141
142	If no argument is given, the limit check will be deactivated (same as when
143	C<0> is specified).
144
145	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
146
147	=item $cbor_data = $cbor->encode ($perl_scalar)
148
149	Converts the given Perl data structure (a scalar value) to its CBOR
150	representation.
151
152	=item $perl_scalar = $cbor->decode ($cbor_data)
153
154	The opposite of C<encode>: expects CBOR data and tries to parse it,
155	returning the resulting simple scalar or reference. Croaks on error.
156
157	=item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
158
159	This works like the C<decode> method, but instead of raising an exception
160	when there is trailing garbage after the CBOR string, it will silently
161	stop parsing there and return the number of characters consumed so far.
162
163	This is useful if your CBOR texts are not delimited by an outer protocol
164	and you need to know where the first CBOR string ends amd the next one
165	starts.
166
167	CBOR::XS->new->decode_prefix ("......")
168	=> ("...", 3)
169
170	=back
171
172
173	=head1 MAPPING
174
175	This section describes how CBOR::XS maps Perl values to CBOR values and
176	vice versa. These mappings are designed to "do the right thing" in most
177	circumstances automatically, preserving round-tripping characteristics
178	(what you put in comes out as something equivalent).
179
180	For the more enlightened: note that in the following descriptions,
181	lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
182	refers to the abstract Perl language itself.
183
184
185	=head2 CBOR -> PERL
186
187	=over 4
188
189	=item integers
190
191	CBOR integers become (numeric) perl scalars. On perls without 64 bit
192	support, 64 bit integers will be truncated or otherwise corrupted.
193
194	=item byte strings
195
196	Byte strings will become octet strings in Perl (the byte values 0..255
197	will simply become characters of the same value in Perl).
198
199	=item UTF-8 strings
200
201	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
202	decoded into proper Unicode code points. At the moment, the validity of
203	the UTF-8 octets will not be validated - corrupt input will result in
204	corrupted Perl strings.
205
206	=item arrays, maps
207
208	CBOR arrays and CBOR maps will be converted into references to a Perl
209	array or hash, respectively. The keys of the map will be stringified
210	during this process.
211
212	=item null
213
214	CBOR null becomes C<undef> in Perl.
215
216	=item true, false, undefined
217
218	These CBOR values become C<Types:Serialiser::true>,
219	C<Types:Serialiser::false> and C<Types::Serialiser::error>,
220	respectively. They are overloaded to act almost exactly like the numbers
221	C<1> and C<0> (for true and false) or to throw an exception on access (for
222	error). See the L<Types::Serialiser> manpage for details.
223
224	=item CBOR tag 256 (perl object)
225
226	The tag value C<256> (TODO: pending iana registration) will be used
227	to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
228	SERIALISATION", below, for details.
229
230	=item CBOR tag 55799 (magic header)
231
232	The tag 55799 is ignored (this tag implements the magic header).
233
234	=item other CBOR tags
235
236	Tagged items consists of a numeric tag and another CBOR value. Tags not
237	handled internally are currently converted into a L<CBOR::XS::Tagged>
238	object, which is simply a blessed array reference consisting of the
239	numeric tag value followed by the (decoded) CBOR value.
240
241	In the future, support for user-supplied conversions might get added.
242
243	=item anything else
244
245	Anything else (e.g. unsupported simple values) will raise a decoding
246	error.
247
248	=back
249
250
251	=head2 PERL -> CBOR
252
253	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
254	truly typeless language, so we can only guess which CBOR type is meant by
255	a Perl value.
256
257	=over 4
258
259	=item hash references
260
261	Perl hash references become CBOR maps. As there is no inherent ordering in
262	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
263	order.
264
265	Currently, tied hashes will use the indefinite-length format, while normal
266	hashes will use the fixed-length format.
267
268	=item array references
269
270	Perl array references become fixed-length CBOR arrays.
271
272	=item other references
273
274	Other unblessed references are generally not allowed and will cause an
275	exception to be thrown, except for references to the integers C<0> and
276	C<1>, which get turned into false and true in CBOR.
277
278	=item CBOR::XS::Tagged objects
279
280	Objects of this type must be arrays consisting of a single C<[tag, value]>
281	pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
282	encoded as appropriate for the value.
283
284	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
285
286	These special values become CBOR true, CBOR false and CBOR undefined
287	values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
288	if you want.
289
290	=item other blessed objects
291
292	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
293	"OBJECT SERIALISATION", below, for details.
294
295	=item simple scalars
296
297	TODO
298	Simple Perl scalars (any scalar that is not a reference) are the most
299	difficult objects to encode: CBOR::XS will encode undefined scalars as
300	CBOR null values, scalars that have last been used in a string context
301	before encoding as CBOR strings, and anything else as number value:
302
303	# dump as number
304	encode_cbor [2] # yields [2]
305	encode_cbor [-3.0e17] # yields [-3e+17]
306	my $value = 5; encode_cbor [$value] # yields [5]
307
308	# used as string, so dump as string
309	print $value;
310	encode_cbor [$value] # yields ["5"]
311
312	# undef becomes null
313	encode_cbor [undef] # yields [null]
314
315	You can force the type to be a CBOR string by stringifying it:
316
317	my $x = 3.1; # some variable containing a number
318	"$x"; # stringified
319	$x .= ""; # another, more awkward way to stringify
320	print $x; # perl does it for you, too, quite often
321
322	You can force the type to be a CBOR number by numifying it:
323
324	my $x = "3"; # some variable containing a string
325	$x += 0; # numify it, ensuring it will be dumped as a number
326	$x *= 1; # same thing, the choice is yours.
327
328	You can not currently force the type in other, less obscure, ways. Tell me
329	if you need this capability (but don't forget to explain why it's needed
330	:).
331
332	Perl values that seem to be integers generally use the shortest possible
333	representation. Floating-point values will use either the IEEE single
334	format if possible without loss of precision, otherwise the IEEE double
335	format will be used. Perls that use formats other than IEEE double to
336	represent numerical values are supported, but might suffer loss of
337	precision.
338
339	=back
340
341	=head2 OBJECT SERIALISATION
342
343	This module knows two way to serialise a Perl object: The CBOR-specific
344	way, and the generic way.
345
346	Whenever the encoder encounters a Perl object that it cnanot serialise
347	directly (most of them), it will first look up the C<TO_CBOR> method on
348	it.
349
350	If it has a C<TO_CBOR> method, it will call it with the object as only
351	argument, and expects exactly one return value, which it will then
352	substitute and encode it in the place of the object.
353
354	Otherwise, it will look up the C<FREEZE> method. If it exists, it will
355	call it with the object as first argument, and the constant string C<CBOR>
356	as the second argument, to distinguish it from other serialisers.
357
358	The C<FREEZE> method can return any number of values (i.e. zero or
359	more). These will be encoded as CBOR perl object, together with the
360	classname.
361
362	If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
363	with an error.
364
365	Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
366	objects encoded via C<FREEZE> can be decoded using the following protocol:
367
368	When an encoded CBOR perl object is encountered by the decoder, it will
369	look up the C<THAW> method, by using the stored classname, and will fail
370	if the method cannot be found.
371
372	After the lookup it will call the C<THAW> method with the stored classname
373	as first argument, the constant string C<CBOR> as second argument, and all
374	values returned by C<FREEZE> as remaining arguments.

=head4 EXAMPLES

Here is an example C<TO_CBOR> method:

   sub My::Object::TO_CBOR {
      my ($obj) = @_;

      ["this is a serialised My::Object object", $obj->{id}]
   }

When a C<My::Object> is encoded to CBOR, it will instead encode a simple
array with two members: a string, and the "object id". Decoding this CBOR
string will yield a normal perl array reference in place of the object.

A more useful and practical example would be a serialisation method for
the URI module. CBOR has a custom tag value for URIs, namely 32:

   sub URI::TO_CBOR {
      my ($self) = @_;
      my $uri = "$self"; # stringify uri
      utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
      CBOR::XS::tagged 32, $uri # tag the upgraded string, not the object
   }

This will encode URIs as a UTF-8 string with tag 32, which indicates a
URI.

Decoding such a URI will not (currently) give you a URI object, but
instead a CBOR::XS::Tagged object with tag number 32 and the string -
exactly what was returned by C<TO_CBOR>.

To serialise an object so it can automatically be deserialised, you need
to use C<FREEZE> and C<THAW>. To take the URI module as example, this
would be a possible implementation:

   sub URI::FREEZE {
      my ($self, $serialiser) = @_;
      "$self" # encode uri string
   }

   sub URI::THAW {
      my ($class, $serialiser, $uri) = @_;

      $class->new ($uri)
   }

Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
example, a C<FREEZE> method that returns "type", "id" and "variant" values
would cause an invocation of C<THAW> with 5 arguments:

   sub My::Object::FREEZE {
      my ($self, $serialiser) = @_;

      ($self->{type}, $self->{id}, $self->{variant})
   }

   sub My::Object::THAW {
      my ($class, $serialiser, $type, $id, $variant) = @_;

      $class->new (type => $type, id => $id, variant => $variant)
   }
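
The following is a minimal sketch of a complete round trip through
C<FREEZE> and C<THAW>, using only the functional interface. The class
C<My::Point> and its C<x> and C<y> fields are made up for this
illustration and are not part of this module:

```perl
use strict;
use warnings;
use CBOR::XS;

# My::Point is a hypothetical class used only for this illustration.
package My::Point {
    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    # Called by the encoder with the object and the string "CBOR".
    sub FREEZE {
        my ($self, $serialiser) = @_;
        return ($self->{x}, $self->{y});
    }

    # Called by the decoder with the classname, the string "CBOR"
    # and the values that FREEZE returned.
    sub THAW {
        my ($class, $serialiser, $x, $y) = @_;
        return $class->new (x => $x, y => $y);
    }
}

my $cbor    = encode_cbor ([My::Point->new (x => 1, y => 2)]);
my $decoded = decode_cbor ($cbor);
my $copy    = $decoded->[0];

printf "%s at (%d, %d)\n", ref $copy, $copy->{x}, $copy->{y};
```

Note that C<THAW> is looked up by classname, so the class must already be
loaded in the decoding process.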


=head1 MAGIC HEADER

There is no reliable way to distinguish CBOR from other formats
programmatically. To make this easier, the CBOR specification has a
special "magic string" that can be prepended to any CBOR string without
changing its meaning.

This string is available as C<$CBOR::XS::MAGIC>. This module does not
prepend this string to the CBOR data it generates, but it will ignore it
if present, so users can prepend this string as a "file type" indicator as
required.
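
As a rough sketch of how this could be used, the writer prepends the magic
string itself and the reader tests for it before decoding. The hash and
its C<answer> key are placeholders for this example:

```perl
use strict;
use warnings;
use CBOR::XS;

my $payload = encode_cbor ({ answer => 42 });

# Writer side: mark the data as CBOR by prepending the magic string.
my $file_data = $CBOR::XS::MAGIC . $payload;

# Reader side: a cheap "file type" check before decoding. The decoder
# ignores the magic string, so it does not need to be stripped first.
my $value;
if (substr ($file_data, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC) {
    $value = decode_cbor ($file_data);
    print "answer: $value->{answer}\n";
} else {
    warn "data does not start with the CBOR magic string\n";
}
```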


=head1 CBOR and JSON

CBOR is supposed to implement a superset of the JSON data model, and is,
with some coercion, able to represent all JSON texts (something that other
"binary JSON" formats such as BSON generally do not support).

CBOR implements some extra hints and support for JSON interoperability,
and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the guidelines
in the spec do not result in correct round-tripping of data. If JSON
interoperability is improved in the future, then the goal will be to
ensure that decoded JSON data will round-trip encoding and decoding to
CBOR intact.


=head1 SECURITY CONSIDERATIONS

When you are using CBOR in a protocol, talking to untrusted, potentially
hostile creatures requires relatively few measures.

First of all, your CBOR decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard to make that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a CBOR string in octets is usually a good
indication of the size of the resources required to decode it into a Perl
structure. While CBOR::XS can check the size of the CBOR data, it might be
too late when you already have it in memory, so you might want to check
the size before you accept the string.
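
One way to check the size before accepting the data is to have the
protocol announce the length up front. The sketch below assumes a made-up
framing (a 4-octet big-endian length followed by the CBOR data) and an
arbitrary limit. Neither is part of CBOR or of this module:

```perl
use strict;
use warnings;
use CBOR::XS;

my $MAX_CBOR_OCTETS = 64 * 1024; # hypothetical limit for this sketch

# Reads one message framed as: 4-octet big-endian length, then CBOR data.
# The framing is an assumption of this example, not part of CBOR.
sub read_message {
    my ($fh) = @_;

    (read ($fh, my $header, 4) // 0) == 4
        or die "short read on length header\n";
    my $length = unpack "N", $header;

    # Refuse before reading (and allocating) the payload itself.
    die "message of $length octets exceeds limit\n"
        if $length > $MAX_CBOR_OCTETS;

    (read ($fh, my $data, $length) // 0) == $length
        or die "short read on payload\n";

    return decode_cbor ($data);
}

# Usage, with an in-memory filehandle standing in for a socket.
my $payload = encode_cbor ([1, 2, 3]);
my $wire    = pack ("N", length $payload) . $payload;

open my $fh, "<", \$wire or die "open: $!";
binmode $fh;

my $value = read_message ($fh);
print scalar (@$value), " elements\n";
```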

Third, CBOR::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested CBOR objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.
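
A minimal sketch of lowering the limit follows. The values 16 and 32 are
arbitrary, and the example assumes that C<max_depth> returns the coder
object so that it can be chained after C<new>:

```perl
use strict;
use warnings;
use CBOR::XS;

# A coder with a tighter nesting limit than the default of 512.
my $coder = CBOR::XS->new->max_depth (16);

# Build an array nested more than 16 levels deep: [[[ ... ]]]
my $deep = [];
$deep = [$deep] for 1 .. 32;

# The default limit accepts this structure, so encode it with the
# functional interface, then decode it with the stricter coder.
my $cbor = encode_cbor ($deep);

my $value = eval { $coder->decode ($cbor) };
print defined $value ? "decoded\n" : "rejected: $@";
```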

Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open to hints, though...

Also keep in mind that CBOR::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by CBOR::XS
will not end up in front of untrusted eyes.
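
One possible way to keep such messages private is to catch the exception,
log it locally and rethrow a generic error. In this sketch,
C<encode_for_client> and the class C<My::Opaque> are hypothetical names,
and C<warn> stands in for whatever private logging you use:

```perl
use strict;
use warnings;
use CBOR::XS;

# Encodes $value. On failure, the detailed error is logged privately and
# a generic message, which contains no data, is thrown instead.
sub encode_for_client {
    my ($value) = @_;

    my $cbor = eval { encode_cbor ($value) };
    return $cbor if defined $cbor;

    warn "internal: CBOR encoding failed: $@"; # goes to the local log only
    die "unable to serialise response\n";
}

# An object with neither TO_CBOR nor FREEZE cannot be encoded.
my $opaque = bless { password => "secret" }, "My::Opaque";

my $ok = eval { encode_for_client ($opaque); 1 };
print "client sees: $@" unless $ok;
```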

=head1 CBOR IMPLEMENTATION NOTES

This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely behaviour as implemented
right now.

64 bit integers are only properly decoded when Perl was built with 64 bit
support.

Strings and arrays are encoded with a definite length. Hashes as well,
unless they are tied (or otherwise magical).
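
To look at this yourself, you can dump the encoded octets as hex, as in
the small sketch below. Per RFC 7049, a definite-length array of three
items starts with the octet 0x83, while an indefinite-length array would
start with 0x9f and end with 0xff:

```perl
use strict;
use warnings;
use CBOR::XS;

# Hex dumps of two encodings, for inspecting the initial octets.
my $array_hex  = unpack "H*", encode_cbor ([1, 2, 3]);
my $string_hex = unpack "H*", encode_cbor ("abc");

print "array:  $array_hex\n";
print "string: $string_hex\n";
```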

Only the double data type is supported for NV data types - when Perl uses
long double to represent floating point values, they might not be encoded
properly. Half precision types are accepted, but not encoded.

Strict mode and canonical mode are not implemented.


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads" which are simply slow and bloated
process simulations - use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

XSLoader::load "CBOR::XS", $VERSION;

=head1 SEE ALSO

The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
serialisation.

The L<Types::Serialiser> module provides the data model for true, false
and error values.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1