
=head1 NAME

CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)

=encoding utf-8

=head1 SYNOPSIS

 use CBOR::XS;

 $binary_cbor_data = encode_cbor $perl_value;
 $perl_value       = decode_cbor $binary_cbor_data;

 # OO-interface

 $coder = CBOR::XS->new;
 $binary_cbor_data = $coder->encode ($perl_value);
 $perl_value       = $coder->decode ($binary_cbor_data);

 # prefix decoding

 my $many_cbor_strings = ...;
 while (length $many_cbor_strings) {
    my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
    # data was decoded
    substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
 }

=head1 DESCRIPTION

WARNING! This module is very new, and not very well tested (that's up to
you to do). Furthermore, details of the implementation might change freely
before version 1.0. And lastly, the object serialisation protocol depends
on a pending IANA assignment; until that assignment is official, this
implementation is not interoperable with other implementations (not even
with future versions of this module).

You are still invited to try out CBOR, and this module.

This module converts Perl data structures to the Concise Binary Object
Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
format that aims to use a superset of the JSON data model, i.e. when you
can represent something in JSON, you should be able to represent it in
CBOR.

In short, CBOR is a faster and very compact binary alternative to JSON,
with the added ability of supporting serialisation of Perl objects. (JSON
often compresses better than CBOR though, so if you plan to compress the
data later you might want to compare both formats first).

The primary goal of this module is to be I<correct> and the secondary goal
is to be I<fast>. To reach the latter goal it was written in C.

See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
vice versa.

=cut

package CBOR::XS;

use common::sense;

our $VERSION = 0.05;
our @ISA = qw(Exporter);

our @EXPORT = qw(encode_cbor decode_cbor);

use Exporter;
use XSLoader;

use Types::Serialiser;

our $MAGIC = "\xd9\xd9\xf7";

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $cbor_data = encode_cbor $perl_scalar

Converts the given Perl data structure to CBOR representation. Croaks on
error.

=item $perl_scalar = decode_cbor $cbor_data

The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
returning the resulting perl scalar. Croaks on error.

=back


=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $cbor = new CBOR::XS

Creates a new CBOR::XS object that can be used to de/encode CBOR
strings. All boolean flags described below are by default I<disabled>.

The mutators for flags all return the CBOR object again and thus calls can
be chained:

   my $cbor_data = CBOR::XS->new->max_depth (32)->encode ({a => [1,2]});

=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])

=item $max_depth = $cbor->get_max_depth

Sets the maximum nesting level (default C<512>) accepted while encoding
or decoding. If a higher nesting level is detected in CBOR data or a Perl
data structure, then the encoder and decoder will stop and croak at that
point.

Nesting level is defined by the number of hash or array references that
the encoder needs to traverse to reach a given point, or the number of
CBOR arrays and maps the decoder needs to descend into to reach a given
data item.

Setting the maximum depth to one disallows any nesting, which ensures
that the data is at most a single hash or array whose elements are not
themselves hashes or arrays.

If no argument is given, the highest possible setting will be used, which
is rarely useful.

Note that nesting is implemented by recursion in C. The default value has
been chosen to be as large as typical operating systems allow without
crashing.

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor = $cbor->max_size ([$maximum_string_size])

=item $max_size = $cbor->get_max_size

Sets the maximum length (in octets) of a CBOR string that C<decode> will
attempt to parse. The default is C<0>, meaning no limit. When C<decode>
is called on a string that is longer than this many octets, it will not
attempt to decode the string but throw an exception. This setting has no
effect on C<encode> (yet).

If no argument is given, the limit check will be deactivated (same as when
C<0> is specified).

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor_data = $cbor->encode ($perl_scalar)

Converts the given Perl data structure (a scalar value) to its CBOR
representation.

=item $perl_scalar = $cbor->decode ($cbor_data)

The opposite of C<encode>: expects CBOR data and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

=item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)

This works like the C<decode> method, but instead of raising an exception
when there is trailing garbage after the CBOR string, it will silently
stop parsing there and return the number of octets consumed so far.

This is useful if your CBOR texts are not delimited by an outer protocol
and you need to know where the first CBOR string ends and the next one
starts.

   CBOR::XS->new->decode_prefix ("\x01\x02")
   => (1, 1)

=back
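
To illustrate what the octet count returned by C<decode_prefix> is for,
here is a pure-Perl sketch that does not use CBOR::XS at all. The helper
C<cbor_item_length> is hypothetical and understands only a small subset of
CBOR (integers and definite-length byte and text strings); it merely
stands in for C<decode_prefix> so that the splitting loop from the
SYNOPSIS can be run on its own.

```perl

   use strict;
   use warnings;

   # unpack formats for the 1-, 2-, 4- and 8-octet arguments that follow an
   # initial byte whose low 5 bits are 24, 25, 26 or 27 ("Q>" needs a perl
   # with 64 bit integers)
   my %ARG_FORMAT = (24 => "C", 25 => "n", 26 => "N", 27 => "Q>");

   # Hypothetical helper (not part of CBOR::XS): returns the number of octets
   # taken up by the first CBOR data item in $buf, or undef if $buf does not
   # contain the complete item yet. Only integers (major types 0 and 1) and
   # definite-length byte and text strings (major types 2 and 3) are handled.
   sub cbor_item_length {
      my ($buf) = @_;
      return undef unless length $buf;

      my $initial = ord substr ($buf, 0, 1);
      my $major   = $initial >> 5;     # top 3 bits: major type
      my $info    = $initial & 0x1f;   # low 5 bits: additional information

      my ($arg, $head_len);
      if ($info < 24) {
         ($arg, $head_len) = ($info, 1);   # argument stored in the initial byte
      } elsif (my $fmt = $ARG_FORMAT{$info}) {
         my $width = 1 << ($info - 24);   # 1, 2, 4 or 8 octets
         return undef if length ($buf) < 1 + $width;
         $arg      = unpack ($fmt, substr ($buf, 1, $width));
         $head_len = 1 + $width;
      } else {
         die "additional information $info is not handled by this sketch\n";
      }

      # integers consist of the head only
      return $head_len if $major <= 1;

      # strings: the head is followed by $arg octets of payload
      if ($major == 2 || $major == 3) {
         return length ($buf) >= $head_len + $arg ? $head_len + $arg : undef;
      }

      die "major type $major is not handled by this sketch\n";
   }

   # three items back to back: the integers 1 and 100, and the text "abc"
   my $stream = "\x01\x18\x64\x63abc";
   my @items;

   while (length $stream) {
      defined (my $len = cbor_item_length ($stream))
         or last;                                  # incomplete: wait for more data
      push @items, substr ($stream, 0, $len, "");   # cut the item off the front
   }

```

With the real module you would call C<decode_prefix> instead and use its
second return value in the same way.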


=head1 MAPPING

This section describes how CBOR::XS maps Perl values to CBOR values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.


=head2 CBOR -> PERL

=over 4

=item integers

CBOR integers become (numeric) perl scalars. On perls without 64 bit
support, 64 bit integers will be truncated or otherwise corrupted.

=item byte strings

Byte strings will become octet strings in Perl (the byte values 0..255
will simply become characters of the same value in Perl).

=item UTF-8 strings

UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
decoded into proper Unicode code points. At the moment, the UTF-8 octets
are not validated: corrupt input will result in corrupted Perl strings.

=item arrays, maps

CBOR arrays and CBOR maps will be converted into references to a Perl
array or hash, respectively. The keys of the map will be stringified
during this process.

=item null

CBOR null becomes C<undef> in Perl.

=item true, false, undefined

These CBOR values become C<Types::Serialiser::true>,
C<Types::Serialiser::false> and C<Types::Serialiser::error>,
respectively. They are overloaded to act almost exactly like the numbers
C<1> and C<0> (for true and false) or to throw an exception on access (for
error). See the L<Types::Serialiser> manpage for details.

=item CBOR tag 256 (perl object)

The tag value C<256> (TODO: pending IANA registration) will be used
to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
SERIALISATION", below, for details.

=item CBOR tag 55799 (magic header)

The tag 55799 is ignored (this tag implements the magic header).

=item other CBOR tags

Tagged items consist of a numeric tag and another CBOR value. Tags not
handled internally are currently converted into a L<CBOR::XS::Tagged>
object, which is simply a blessed array reference consisting of the
numeric tag value followed by the (decoded) CBOR value.

In the future, support for user-supplied conversions might get added.

=item anything else

Anything else (e.g. unsupported simple values) will raise a decoding
error.

=back
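
As user-supplied tag conversions are not available yet, tagged values can
be post-processed after decoding. The sketch below is a hypothetical
helper, C<convert_tags>, that walks a decoded structure and replaces
C<CBOR::XS::Tagged> objects whose tag has a handler. It relies only on the
layout described above (a blessed array reference holding the tag and the
value); to stay self-contained it builds such an object by hand instead of
decoding real CBOR data, and the tag 32 handler is made up.

```perl

   use strict;
   use warnings;

   # Hypothetical helper: returns a copy of $data in which every
   # CBOR::XS::Tagged object whose tag has an entry in %$handlers is replaced
   # by whatever that handler returns for its value. Other tags are kept.
   sub convert_tags {
      my ($data, $handlers) = @_;
      my $type = ref $data;

      if ($type eq "CBOR::XS::Tagged") {
         my ($tag, $value) = @$data;
         $value = convert_tags ($value, $handlers);   # tagged values may nest

         return $handlers->{$tag}
              ? $handlers->{$tag}->($value)
              : bless ([$tag, $value], $type);
      }

      return [ map { convert_tags ($_, $handlers) } @$data ]
         if $type eq "ARRAY";

      return +{ map { ($_, convert_tags ($data->{$_}, $handlers)) } keys %$data }
         if $type eq "HASH";

      return $data;   # plain scalars and other objects pass through unchanged
   }

   # stand-in for decoded data: a tag 32 (URI) value next to a plain number
   my $decoded = [ bless ([32, "http://a.b/"], "CBOR::XS::Tagged"), 5 ];

   my $converted = convert_tags ($decoded, {
      32 => sub { "URI: $_[0]" },   # hypothetical handler for tag 32
   });

```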


=head2 PERL -> CBOR

The mapping from Perl to CBOR is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which CBOR type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become CBOR maps. As there is no inherent ordering in
hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
order.

Currently, tied hashes will use the indefinite-length format, while normal
hashes will use the fixed-length format.

=item array references

Perl array references become fixed-length CBOR arrays.

=item other references

Other unblessed references are generally not allowed and will cause an
exception to be thrown, except for references to the integers C<0> and
C<1>, which get turned into false and true in CBOR.

=item CBOR::XS::Tagged objects

Objects of this type must be arrays consisting of a single C<[tag, value]>
pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
encoded as appropriate for the value.

=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error

These special values become CBOR true, CBOR false and CBOR undefined
values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
if you want.

=item other blessed objects

Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
"OBJECT SERIALISATION", below, for details.

=item simple scalars

Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: CBOR::XS will encode undefined scalars as
CBOR null values, scalars that have last been used in a string context
before encoding as CBOR strings, and anything else as a number:

   # dump as number
   encode_cbor [2]                      # yields [2]
   encode_cbor [-3.0e17]                # yields [-3e+17]
   my $value = 5; encode_cbor [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   encode_cbor [$value]                 # yields ["5"]

   # undef becomes null
   encode_cbor [undef]                  # yields [null]

You can force the type to be a CBOR string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a CBOR number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

You can not currently force the type in other, less obscure, ways. Tell me
if you need this capability (but don't forget to explain why it's needed
:).

Perl values that seem to be integers generally use the shortest possible
representation. Floating-point values will use either the IEEE single
format if possible without loss of precision, otherwise the IEEE double
format will be used. Perls that use formats other than IEEE double to
represent numerical values are supported, but might suffer loss of
precision.

=back
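
The integer and floating point rules above can be sketched in pure Perl.
The helpers C<cbor_head>, C<cbor_int> and C<cbor_float> are hypothetical
and only illustrate the CBOR wire format from RFC 7049; they are not how
CBOR::XS is implemented, and the eight-octet integer case needs a perl
with 64 bit integer support.

```perl

   use strict;
   use warnings;

   # CBOR head: major type in the top 3 bits, argument in the shortest of
   # the possible forms (in the initial byte, or 1, 2, 4 or 8 extra octets)
   sub cbor_head {
      my ($major, $n) = @_;
      my $mt = $major << 5;

      return pack "C",   $mt | $n      if $n < 24;
      return pack "CC",  $mt | 24, $n  if $n <= 0xff;
      return pack "Cn",  $mt | 25, $n  if $n <= 0xffff;
      return pack "CN",  $mt | 26, $n  if $n <= 0xffffffff;
      return pack "CQ>", $mt | 27, $n;   # needs a 64 bit perl
   }

   # integers: major type 0 for n >= 0, major type 1 (storing -1 - n) otherwise
   sub cbor_int {
      my ($i) = @_;
      return $i >= 0 ? cbor_head (0, $i) : cbor_head (1, -1 - $i);
   }

   # floats: IEEE single (0xfa) if the value survives the round trip
   # unchanged, IEEE double (0xfb) otherwise
   sub cbor_float {
      my ($x) = @_;
      my $single = pack "f>", $x;
      return unpack ("f>", $single) == $x
         ? "\xfa" . $single
         : "\xfb" . pack ("d>", $x);
   }

```

For example, C<1.5> fits into a single float exactly, while C<0.1> does
not and therefore needs a double.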

=head2 OBJECT SERIALISATION

This module knows two ways to serialise a Perl object: The CBOR-specific
way, and the generic way.

Whenever the encoder encounters a Perl object that it cannot serialise
directly (most of them), it will first look up the C<TO_CBOR> method on
it.

If it has a C<TO_CBOR> method, it will call it with the object as the only
argument, and expects exactly one return value, which it will then encode
in place of the object.

Otherwise, it will look up the C<FREEZE> method. If it exists, it will
call it with the object as first argument, and the constant string C<CBOR>
as the second argument, to distinguish it from other serialisers.

The C<FREEZE> method can return any number of values (i.e. zero or
more). These will be encoded as a CBOR perl object, together with the
classname.

If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
with an error.

Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
objects encoded via C<FREEZE> can be decoded using the following protocol:

When an encoded CBOR perl object is encountered by the decoder, it will
look up the C<THAW> method, by using the stored classname, and will fail
if the method cannot be found.

After the lookup it will call the C<THAW> method with the stored classname
as first argument, the constant string C<CBOR> as second argument, and all
values returned by C<FREEZE> as remaining arguments.
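
The lookup order described above can be modelled in a few lines of Perl.
This is a hypothetical sketch: C<describe_object_encoding> and
C<thaw_perl_object> are made-up names, and instead of producing CBOR the
model returns a hash describing what the encoder would do. C<My::Point>
is an example class written for this sketch only.

```perl

   use strict;
   use warnings;
   use Scalar::Util qw(blessed);

   # Hypothetical model of the encoder's lookup order. Instead of CBOR octets
   # it returns a hash describing what would be encoded.
   sub describe_object_encoding {
      my ($obj) = @_;
      my $class = blessed $obj or die "not a blessed object\n";

      if (my $to_cbor = $obj->can ("TO_CBOR")) {
         my @result = $to_cbor->($obj);         # object is the only argument
         die "TO_CBOR must return exactly one value\n" unless @result == 1;
         return { kind => "substitute", value => $result[0] };
      }

      if (my $freeze = $obj->can ("FREEZE")) {
         my @values = $freeze->($obj, "CBOR");   # serialiser name second
         return { kind => "perl_object", class => $class, values => \@values };
      }

      die "cannot encode $class: it has neither TO_CBOR nor FREEZE\n";
   }

   # Hypothetical model of the decoder side for a serialised perl object.
   sub thaw_perl_object {
      my ($class, @values) = @_;
      my $thaw = $class->can ("THAW") or die "$class has no THAW method\n";
      return $thaw->($class, "CBOR", @values);
   }

   # example class following the FREEZE/THAW protocol
   package My::Point;

   sub new    { my ($class, %args) = @_; return bless { %args }, $class }
   sub FREEZE { my ($self, $serialiser) = @_; return ($self->{x}, $self->{y}) }
   sub THAW   { my ($class, $serialiser, $x, $y) = @_; return $class->new (x => $x, y => $y) }

   package main;

   my $encoded = describe_object_encoding (My::Point->new (x => 1, y => 2));
   my $point   = thaw_perl_object ($encoded->{class}, @{ $encoded->{values} });

```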

=head4 EXAMPLES

Here is an example C<TO_CBOR> method:

   sub My::Object::TO_CBOR {
      my ($obj) = @_;

      ["this is a serialised My::Object object", $obj->{id}]
   }

When a C<My::Object> is encoded to CBOR, it will instead encode a simple
array with two members: a string, and the "object id". Decoding this CBOR
string will yield a normal perl array reference in place of the object.

A more useful and practical example would be a serialisation method for
the URI module. CBOR has a custom tag value for URIs, namely 32:

   sub URI::TO_CBOR {
      my ($self) = @_;
      my $uri = "$self"; # stringify uri
      utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
      CBOR::XS::tagged 32, $uri
   }

This will encode URIs as a UTF-8 string with tag 32, which indicates a
URI.
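
For reference, here is what such a tagged URI looks like on the wire,
assembled by hand in pure Perl. The helpers C<small_head> and C<uri_item>
are hypothetical and handle only arguments below 256; C<http://a.b/> is a
made-up URI.

```perl

   use strict;
   use warnings;

   # CBOR head for major type $major with an argument below 256
   sub small_head {
      my ($major, $n) = @_;
      die "argument $n is too large for this sketch\n" if $n > 0xff;
      return $n < 24 ? pack ("C", $major << 5 | $n)
                     : pack ("CC", $major << 5 | 24, $n);
   }

   # tag 32 (major type 6) followed by a text string (major type 3)
   sub uri_item {
      my ($uri) = @_;
      my $octets = $uri;
      utf8::encode ($octets);   # text strings carry UTF-8 octets
      return small_head (6, 32) . small_head (3, length $octets) . $octets;
   }

   my $item = uri_item ("http://a.b/");
   print unpack ("H*", $item), "\n";   # prints d8206b687474703a2f2f612e622f

```

The leading C<d8 20> is the tag, C<6b> announces a text string of 11
octets, and the rest is the URI itself.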

Decoding such a URI will not (currently) give you a URI object, but
instead a CBOR::XS::Tagged object with tag number 32 and the string,
exactly what was returned by C<TO_CBOR>.

To serialise an object so it can automatically be deserialised, you need
to use C<FREEZE> and C<THAW>. To take the URI module as example, this
would be a possible implementation:

   sub URI::FREEZE {
      my ($self, $serialiser) = @_;
      "$self" # encode url string
   }

   sub URI::THAW {
      my ($class, $serialiser, $uri) = @_;

      $class->new ($uri)
   }

Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
example, a C<FREEZE> method that returns "type", "id" and "variant" values
would cause an invocation of C<THAW> with 5 arguments:

   sub My::Object::FREEZE {
      my ($self, $serialiser) = @_;

      ($self->{type}, $self->{id}, $self->{variant})
   }

   sub My::Object::THAW {
      my ($class, $serialiser, $type, $id, $variant) = @_;

      $class->new (type => $type, id => $id, variant => $variant)
   }


=head1 MAGIC HEADER

There is no reliable way to distinguish CBOR from other formats
programmatically. To make it easier to distinguish CBOR from other
formats, the CBOR specification has a special "magic string" that can be
prepended to any CBOR string without changing its meaning.

This string is available as C<$CBOR::XS::MAGIC>. This module does not
prepend this string to the CBOR data it generates, but it will ignore it
if present, so users can prepend this string as a "file type" indicator as
required.
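
A minimal sketch of using the magic header as a file type indicator
follows. It copies the value of C<$CBOR::XS::MAGIC> into a local variable
so that it runs without the module; C<add_magic> and C<has_cbor_magic> are
hypothetical helper names. Note that a match only means the data starts
with the magic tag, not that the rest is valid CBOR.

```perl

   use strict;
   use warnings;

   # same three octets as $CBOR::XS::MAGIC: initial byte 0xd9 (major type 6,
   # two-octet argument) followed by 0xd9f7, i.e. tag 55799
   my $MAGIC = "\xd9\xd9\xf7";

   # hypothetical helpers for using the magic header as a file type indicator
   sub add_magic {
      my ($cbor) = @_;
      return $MAGIC . $cbor;
   }

   sub has_cbor_magic {
      my ($data) = @_;
      return substr ($data, 0, length $MAGIC) eq $MAGIC;
   }

```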


=head1 CBOR and JSON

CBOR is supposed to implement a superset of the JSON data model, and is,
with some coercion, able to represent all JSON texts (something that other
"binary JSON" formats such as BSON generally do not support).

CBOR implements some extra hints and support for JSON interoperability,
and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the guidelines
in the spec do not result in correct round-tripping of data. If JSON
interoperability is improved in the future, then the goal will be to
ensure that decoded JSON data will round-trip encoding and decoding to
CBOR intact.


=head1 SECURITY CONSIDERATIONS

When you are using CBOR in a protocol, talking to untrusted, potentially
hostile creatures requires relatively few measures.

First of all, your CBOR decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard on making that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a CBOR string in octets is usually a good
indication of the size of the resources required to decode it into a Perl
structure. While CBOR::XS can check the size of the CBOR text (see
C<max_size>), it might be too late when you already have it in memory, so
you might want to check the size before you accept the string.
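
One way to check the size before accepting the data is sketched below in
pure Perl. It assumes a hypothetical framing in which each CBOR message is
preceded by its length as a 4-octet big-endian integer (CBOR itself does
not define such framing), and rejects a message whose announced length
exceeds a limit before reading its body into memory. The names
C<read_message> and C<$MAX_MESSAGE> are made up for this example.

```perl

   use strict;
   use warnings;

   my $MAX_MESSAGE = 64 * 1024;   # hypothetical limit, tune for your protocol

   # Reads one length-prefixed message from $fh; returns undef at clean EOF.
   sub read_message {
      my ($fh) = @_;

      my $got = read ($fh, my $prefix, 4);
      die "read error: $!\n" unless defined $got;
      return undef if $got == 0;
      die "truncated length prefix\n" if $got < 4;

      my $len = unpack "N", $prefix;
      die "message of $len octets exceeds limit of $MAX_MESSAGE\n"
         if $len > $MAX_MESSAGE;

      $got = read ($fh, my $body, $len);
      die "truncated message\n" unless defined $got && $got == $len;

      return $body;   # only now would this be handed to the CBOR decoder
   }

```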

Third, CBOR::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested CBOR objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.
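
Before encoding a structure of unknown shape, its nesting depth can be
measured roughly in the terms used by C<max_depth> (unblessed hash and
array references traversed). The recursive helper C<exceeds_depth> below
is a hypothetical pure-Perl sketch; the exact off-by-one behaviour of the
module might differ, and since the helper recurses itself, perl may warn
about deep recursion for large limits.

```perl

   use strict;
   use warnings;

   # Hypothetical helper: true if $data nests more than $max unblessed hash
   # or array references deep. It stops descending as soon as the limit is
   # exceeded, so it also terminates on cyclic structures.
   sub exceeds_depth {
      my ($data, $max, $depth) = @_;
      $depth //= 0;

      my $type = ref $data;   # "" for non-references, class name for objects
      return 0 unless $type eq "ARRAY" || $type eq "HASH";

      return 1 if ++$depth > $max;

      for my $child ($type eq "ARRAY" ? @$data : values %$data) {
         return 1 if exceeds_depth ($child, $max, $depth);
      }

      return 0;
   }

```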

Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open for hints, though...

Also keep in mind that CBOR::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by CBOR::XS
will not end up in front of untrusted eyes.

=head1 CBOR IMPLEMENTATION NOTES

This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely behaviour as currently
implemented.

64 bit integers are only properly decoded when Perl was built with 64 bit
support.

Strings and arrays are encoded with a definite length. Hashes as well,
unless they are tied (or otherwise magical).

Only the double data type is supported for NV data types - when Perl uses
long double to represent floating point values, they might not be encoded
properly. Half precision types are accepted, but not encoded.

Strict mode and canonical mode are not implemented.


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads" which are simply slow and bloated
process simulations - use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

XSLoader::load "CBOR::XS", $VERSION;

=head1 SEE ALSO

The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
serialisation.

The L<Types::Serialiser> module provides the data model for true, false
and error values.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1

Revision:	1.10
Committed:	Mon Oct 28 22:03:20 2013 UTC (10 years, 6 months ago) by root
Branch:	MAIN
Changes since 1.9:	+3 -1 lines
Log Message:	* empty log message *
#	Content
1	=head1 NAME
2
3	CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4
5	=encoding utf-8
6
7	=head1 SYNOPSIS
8
9	use CBOR::XS;
10
11	$binary_cbor_data = encode_cbor $perl_value;
12	$perl_value = decode_cbor $binary_cbor_data;
13
14	# OO-interface
15
16	$coder = CBOR::XS->new;
17	$binary_cbor_data = $coder->encode ($perl_value);
18	$perl_value = $coder->decode ($binary_cbor_data);
19
20	# prefix decoding
21
22	my $many_cbor_strings = ...;
23	while (length $many_cbor_strings) {
24	my ($data, $length) = $cbor->decode_prefix ($many_cbor_strings);
25	# data was decoded
26	substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27	}
28
29	=head1 DESCRIPTION
30
31	WARNING! This module is very new, and not very well tested (that's up to
32	you to do). Furthermore, details of the implementation might change freely
33	before version 1.0. And lastly, the object serialisation protocol depends
34	on a pending IANA assignment, and until that assignment is official, this
35	implementation is not interoperable with other implementations (even
36	future versions of this module) until the assignment is done.
37
38	You are still invited to try out CBOR, and this module.
39
40	This module converts Perl data structures to the Concise Binary Object
41	Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42	format that aims to use a superset of the JSON data model, i.e. when you
43	can represent something in JSON, you should be able to represent it in
44	CBOR.
45
46	In short, CBOR is a faster and very compact binary alternative to JSON,
47	with the added ability of supporting serialisation of Perl objects. (JSON
48	often compresses better than CBOR though, so if you plan to compress the
49	data later you might want to compare both formats first).
50
51	The primary goal of this module is to be I<correct> and the secondary goal
52	is to be I<fast>. To reach the latter goal it was written in C.
53
54	See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
55	vice versa.
56
57	=cut
58
59	package CBOR::XS;
60
61	use common::sense;
62
63	our $VERSION = 0.05;
64	our @ISA = qw(Exporter);
65
66	our @EXPORT = qw(encode_cbor decode_cbor);
67
68	use Exporter;
69	use XSLoader;
70
71	use Types::Serialiser;
72
73	our $MAGIC = "\xd9\xd9\xf7";
74
75	=head1 FUNCTIONAL INTERFACE
76
77	The following convenience methods are provided by this module. They are
78	exported by default:
79
80	=over 4
81
82	=item $cbor_data = encode_cbor $perl_scalar
83
84	Converts the given Perl data structure to CBOR representation. Croaks on
85	error.
86
87	=item $perl_scalar = decode_cbor $cbor_data
88
89	The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
90	returning the resulting perl scalar. Croaks on error.
91
92	=back
93
94
95	=head1 OBJECT-ORIENTED INTERFACE
96
97	The object oriented interface lets you configure your own encoding or
98	decoding style, within the limits of supported formats.
99
100	=over 4
101
102	=item $cbor = new CBOR::XS
103
104	Creates a new CBOR::XS object that can be used to de/encode CBOR
105	strings. All boolean flags described below are by default I<disabled>.
106
107	The mutators for flags all return the CBOR object again and thus calls can
108	be chained:
109
110	#TODO
111	my $cbor = CBOR::XS->new->encode ({a => [1,2]});
112
113	=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
114
115	=item $max_depth = $cbor->get_max_depth
116
117	Sets the maximum nesting level (default C<512>) accepted while encoding
118	or decoding. If a higher nesting level is detected in CBOR data or a Perl
119	data structure, then the encoder and decoder will stop and croak at that
120	point.
121
122	Nesting level is defined by number of hash- or arrayrefs that the encoder
123	needs to traverse to reach a given point or the number of C<{> or C<[>
124	characters without their matching closing parenthesis crossed to reach a
125	given character in a string.
126
127	Setting the maximum depth to one disallows any nesting, so that ensures
128	that the object is only a single hash/object or array.
129
130	If no argument is given, the highest possible setting will be used, which
131	is rarely useful.
132
133	Note that nesting is implemented by recursion in C. The default value has
134	been chosen to be as large as typical operating systems allow without
135	crashing.
136
137	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
138
139	=item $cbor = $cbor->max_size ([$maximum_string_size])
140
141	=item $max_size = $cbor->get_max_size
142
143	Set the maximum length a CBOR string may have (in bytes) where decoding
144	is being attempted. The default is C<0>, meaning no limit. When C<decode>
145	is called on a string that is longer then this many bytes, it will not
146	attempt to decode the string but throw an exception. This setting has no
147	effect on C<encode> (yet).
148
149	If no argument is given, the limit check will be deactivated (same as when
150	C<0> is specified).
151
152	See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
153
154	=item $cbor_data = $cbor->encode ($perl_scalar)
155
156	Converts the given Perl data structure (a scalar value) to its CBOR
157	representation.
158
159	=item $perl_scalar = $cbor->decode ($cbor_data)
160
161	The opposite of C<encode>: expects CBOR data and tries to parse it,
162	returning the resulting simple scalar or reference. Croaks on error.
163
164	=item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
165
166	This works like the C<decode> method, but instead of raising an exception
167	when there is trailing garbage after the CBOR string, it will silently
168	stop parsing there and return the number of characters consumed so far.
169
170	This is useful if your CBOR texts are not delimited by an outer protocol
171	and you need to know where the first CBOR string ends amd the next one
172	starts.
173
174	CBOR::XS->new->decode_prefix ("......")
175	=> ("...", 3)
176
177	=back
178
179
180	=head1 MAPPING
181
182	This section describes how CBOR::XS maps Perl values to CBOR values and
183	vice versa. These mappings are designed to "do the right thing" in most
184	circumstances automatically, preserving round-tripping characteristics
185	(what you put in comes out as something equivalent).
186
187	For the more enlightened: note that in the following descriptions,
188	lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
189	refers to the abstract Perl language itself.
190
191
192	=head2 CBOR -> PERL
193
194	=over 4
195
196	=item integers
197
198	CBOR integers become (numeric) perl scalars. On perls without 64 bit
199	support, 64 bit integers will be truncated or otherwise corrupted.
200
201	=item byte strings
202
203	Byte strings will become octet strings in Perl (the byte values 0..255
204	will simply become characters of the same value in Perl).
205
206	=item UTF-8 strings
207
208	UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
209	decoded into proper Unicode code points. At the moment, the validity of
210	the UTF-8 octets will not be validated - corrupt input will result in
211	corrupted Perl strings.
212
213	=item arrays, maps
214
215	CBOR arrays and CBOR maps will be converted into references to a Perl
216	array or hash, respectively. The keys of the map will be stringified
217	during this process.
218
219	=item null
220
221	CBOR null becomes C<undef> in Perl.
222
223	=item true, false, undefined
224
225	These CBOR values become C<Types:Serialiser::true>,
226	C<Types:Serialiser::false> and C<Types::Serialiser::error>,
227	respectively. They are overloaded to act almost exactly like the numbers
228	C<1> and C<0> (for true and false) or to throw an exception on access (for
229	error). See the L<Types::Serialiser> manpage for details.
230
231	=item CBOR tag 256 (perl object)
232
233	The tag value C<256> (TODO: pending iana registration) will be used
234	to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
235	SERIALISATION", below, for details.
236
237	=item CBOR tag 55799 (magic header)
238
239	The tag 55799 is ignored (this tag implements the magic header).
240
241	=item other CBOR tags
242
243	Tagged items consists of a numeric tag and another CBOR value. Tags not
244	handled internally are currently converted into a L<CBOR::XS::Tagged>
245	object, which is simply a blessed array reference consisting of the
246	numeric tag value followed by the (decoded) CBOR value.
247
248	In the future, support for user-supplied conversions might get added.
249
250	=item anything else
251
252	Anything else (e.g. unsupported simple values) will raise a decoding
253	error.
254
255	=back
256
257
258	=head2 PERL -> CBOR
259
260	The mapping from Perl to CBOR is slightly more difficult, as Perl is a
261	truly typeless language, so we can only guess which CBOR type is meant by
262	a Perl value.
263
264	=over 4
265
266	=item hash references
267
268	Perl hash references become CBOR maps. As there is no inherent ordering in
269	hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
270	order.
271
272	Currently, tied hashes will use the indefinite-length format, while normal
273	hashes will use the fixed-length format.
274
275	=item array references
276
277	Perl array references become fixed-length CBOR arrays.
278
279	=item other references
280
281	Other unblessed references are generally not allowed and will cause an
282	exception to be thrown, except for references to the integers C<0> and
283	C<1>, which get turned into false and true in CBOR.
284
285	=item CBOR::XS::Tagged objects
286
287	Objects of this type must be arrays consisting of a single C<[tag, value]>
288	pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
289	encoded as appropriate for the value.
290
291	=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
292
293	These special values become CBOR true, CBOR false and CBOR undefined
294	values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
295	if you want.
296
297	=item other blessed objects
298
299	Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
300	"OBJECT SERIALISATION", below, for details.
301
302	=item simple scalars
303
304	TODO
305	Simple Perl scalars (any scalar that is not a reference) are the most
306	difficult objects to encode: CBOR::XS will encode undefined scalars as
307	CBOR null values, scalars that have last been used in a string context
308	before encoding as CBOR strings, and anything else as number value:
309
310	# dump as number
311	encode_cbor [2] # yields [2]
312	encode_cbor [-3.0e17] # yields [-3e+17]
313	my $value = 5; encode_cbor [$value] # yields [5]
314
315	# used as string, so dump as string
316	print $value;
317	encode_cbor [$value] # yields ["5"]
318
319	# undef becomes null
320	encode_cbor [undef] # yields [null]
321
322	You can force the type to be a CBOR string by stringifying it:
323
324	my $x = 3.1; # some variable containing a number
325	"$x"; # stringified
326	$x .= ""; # another, more awkward way to stringify
327	print $x; # perl does it for you, too, quite often
328
329	You can force the type to be a CBOR number by numifying it:
330
331	my $x = "3"; # some variable containing a string
332	$x += 0; # numify it, ensuring it will be dumped as a number
333	$x *= 1; # same thing, the choice is yours.
334
335	You can not currently force the type in other, less obscure, ways. Tell me
336	if you need this capability (but don't forget to explain why it's needed
337	:).
338
339	Perl values that seem to be integers generally use the shortest possible
340	representation. Floating-point values will use either the IEEE single
341	format if possible without loss of precision, otherwise the IEEE double
342	format will be used. Perls that use formats other than IEEE double to
343	represent numerical values are supported, but might suffer loss of
344	precision.
345
346	=back
347
348	=head2 OBJECT SERIALISATION
349
350	This module knows two way to serialise a Perl object: The CBOR-specific
351	way, and the generic way.
352
353	Whenever the encoder encounters a Perl object that it cnanot serialise
354	directly (most of them), it will first look up the C<TO_CBOR> method on
355	it.
356
357	If it has a C<TO_CBOR> method, it will call it with the object as only
358	argument, and expects exactly one return value, which it will then
359	substitute and encode it in the place of the object.
360
361	Otherwise, it will look up the C<FREEZE> method. If it exists, it will
362	call it with the object as first argument, and the constant string C<CBOR>
363	as the second argument, to distinguish it from other serialisers.
364
The C<FREEZE> method can return any number of values (i.e. zero or
more). These will be encoded as a CBOR perl object, together with the
classname.

If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
with an error.

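The encoder-side lookup order described above can be modelled in a few
lines of plain Perl. This sketch only mirrors the documented protocol and
is not how CBOR::XS implements it: the sub C<serialise_object> and the
hash it returns are hypothetical, and a real encoder would go on to encode
the result as CBOR.

   use strict;
   use warnings;

   # Hypothetical model of the encoder's method lookup for a blessed $obj.
   sub serialise_object {
      my ($obj) = @_;

      # 1. TO_CBOR: exactly one value replaces the object
      if (my $to_cbor = $obj->can ("TO_CBOR")) {
         return { replacement => scalar $to_cbor->($obj) };
      }

      # 2. FREEZE: any number of values, stored along with the classname
      if (my $freeze = $obj->can ("FREEZE")) {
         return { class => ref ($obj), frozen => [ $freeze->($obj, "CBOR") ] };
      }

      # 3. neither method exists: encoding fails
      die "cannot encode object of class " . ref ($obj) . "\n";
   }
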
Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
objects encoded via C<FREEZE> can be decoded using the following protocol:

When an encoded CBOR perl object is encountered by the decoder, it will
look up the C<THAW> method, by using the stored classname, and will fail
if the method cannot be found.

After the lookup it will call the C<THAW> method with the stored classname
as first argument, the constant string C<CBOR> as second argument, and all
values returned by C<FREEZE> as remaining arguments.

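The decoding side can be modelled the same way. In this sketch the sub
C<thaw_object> and the class C<My::Point> are hypothetical, and the
stored classname and values are passed in directly instead of being
parsed from CBOR data; it shows that C<THAW> receives the values in the
order C<FREEZE> returned them.

   use strict;
   use warnings;

   # Hypothetical model of the decoder side: rebuild an object from the
   # stored classname and the values FREEZE originally returned.
   sub thaw_object {
      my ($classname, @values) = @_;

      my $thaw = $classname->can ("THAW")
         or die "cannot decode perl object: $classname has no THAW method\n";

      return $thaw->($classname, "CBOR", @values);
   }

   # hypothetical example class implementing both halves of the protocol
   package My::Point;

   sub new {
      my ($class, %arg) = @_;
      return bless { %arg }, $class;
   }

   sub FREEZE {
      my ($self, $serialiser) = @_;
      return ($self->{x}, $self->{y});
   }

   sub THAW {
      my ($class, $serialiser, $x, $y) = @_;
      return $class->new (x => $x, y => $y);
   }

   package main;

   my $p    = My::Point->new (x => 1, y => 2);
   my $copy = thaw_object (ref ($p), $p->FREEZE ("CBOR"));

Here C<$copy> is a new C<My::Point> object with the same C<x> and C<y>
as C<$p>.
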
=head4 EXAMPLES

Here is an example C<TO_CBOR> method:

   sub My::Object::TO_CBOR {
      my ($obj) = @_;

      ["this is a serialised My::Object object", $obj->{id}]
   }

When a C<My::Object> is encoded to CBOR, it will instead encode a simple
array with two members: a string, and the "object id". Decoding this CBOR
string will yield a normal perl array reference in place of the object.

A more useful and practical example would be a serialisation method for
the URI module. CBOR has a custom tag value for URIs, namely 32:

   sub URI::TO_CBOR {
      my ($self) = @_;
      my $uri = "$self"; # stringify uri
      utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
      CBOR::XS::tagged 32, $uri # tag the upgraded string, not $self again
   }

This will encode URIs as a UTF-8 string with tag 32, which indicates a
URI.

Decoding such a URI will not (currently) give you a URI object, but
instead a CBOR::XS::Tagged object with tag number 32 and the string,
exactly what was returned by C<TO_CBOR>.

To serialise an object so it can automatically be deserialised, you need
to use C<FREEZE> and C<THAW>. To take the URI module as example, this
would be a possible implementation:

   sub URI::FREEZE {
      my ($self, $serialiser) = @_;
      "$self" # encode url string
   }

   sub URI::THAW {
      my ($class, $serialiser, $uri) = @_;

      $class->new ($uri)
   }

Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
example, a C<FREEZE> method that returns "type", "id" and "variant" values
would cause an invocation of C<THAW> with 5 arguments:

   sub My::Object::FREEZE {
      my ($self, $serialiser) = @_;

      ($self->{type}, $self->{id}, $self->{variant})
   }

   sub My::Object::THAW {
      my ($class, $serialiser, $type, $id, $variant) = @_;

      $class->new (type => $type, id => $id, variant => $variant)
   }


=head1 MAGIC HEADER

There is no reliable way to distinguish CBOR from other formats
programmatically. To make it easier to distinguish CBOR from other
formats, the CBOR specification has a special "magic string" that can be
prepended to any CBOR string without changing its meaning.

This string is available as C<$CBOR::XS::MAGIC>. This module does not
prepend this string to the CBOR data it generates, but it will ignore it
if present, so users can prepend this string as a "file type" indicator as
required.


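RFC 7049 (section 2.4.5, "Self-Describe CBOR") defines this magic string
as tag 55799, which encodes as the three octets C<0xd9 0xd9 0xf7>. The
sketch below adds and detects it by hand; the helper names are
hypothetical, and it hard-codes these octets on the assumption that they
match C<$CBOR::XS::MAGIC>, so it runs without loading the module.

   use strict;
   use warnings;

   # tag 55799 ("self-describe CBOR", RFC 7049 section 2.4.5);
   # assumed to be the same octets as $CBOR::XS::MAGIC
   my $MAGIC = "\xd9\xd9\xf7";

   # hypothetical helpers for using the magic string as a file type marker
   sub add_magic {
      my ($cbor) = @_;
      return $MAGIC . $cbor;
   }

   sub has_magic {
      my ($data) = @_;
      return substr ($data, 0, length $MAGIC) eq $MAGIC;
   }

   sub strip_magic {
      my ($data) = @_;
      return has_magic ($data) ? substr ($data, length $MAGIC) : $data;
   }
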
=head1 CBOR and JSON

CBOR is supposed to implement a superset of the JSON data model, and is,
with some coercion, able to represent all JSON texts (something that other
"binary JSON" formats such as BSON generally do not support).

CBOR implements some extra hints and support for JSON interoperability,
and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the
guidelines in the spec do not result in correct round-tripping of data.
If JSON interoperability is improved in the future, then the goal will be
to ensure that decoded JSON data will round-trip encoding and decoding to
CBOR intact.


=head1 SECURITY CONSIDERATIONS

When you are using CBOR in a protocol, talking to untrusted, potentially
hostile creatures requires relatively few measures.

First of all, your CBOR decoder should be secure, that is, it should not
have any buffer overflows. Obviously, this module should ensure that and
I am trying hard to make that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a CBOR string in octets is usually a good
indication of the size of the resources required to decode it into a Perl
structure. While CBOR::XS can check the size of the CBOR text, it might be
too late when you already have it in memory, so you might want to check
the size before you accept the string.

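One way to check the size before accepting the string is to enforce the
limit while reading, so oversized input is rejected before it is fully
buffered or decoded. In this minimal sketch C<read_limited> is a
hypothetical helper, and the framing (the peer sends one CBOR string and
then closes the stream) is an assumption; the returned octets would then
be passed to C<decode_cbor>.

   use strict;
   use warnings;

   # Hypothetical helper: read everything from $fh, but die as soon as
   # more than $max octets have arrived (at most $max + 1 are buffered).
   sub read_limited {
      my ($fh, $max) = @_;
      my $buf = "";

      while (1) {
         my $want = $max + 1 - length ($buf);
         $want = 65536 if $want > 65536;           # read in bounded chunks

         my $got = read $fh, $buf, $want, length ($buf);
         die "read error: $!\n" unless defined $got;
         last unless $got;                          # end of stream

         die "input exceeds $max octets\n" if length ($buf) > $max;
      }

      return $buf;
   }
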
Third, CBOR::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested CBOR objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.

Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open to hints, though...

Also keep in mind that CBOR::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by CBOR::XS
will not end up in front of untrusted eyes.

=head1 CBOR IMPLEMENTATION NOTES

This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely the behaviour as currently
implemented.

64 bit integers are only properly decoded when Perl was built with 64 bit
support.

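Whether the running perl has such support can be checked with the core
L<Config> module. A minimal sketch, in which the sub name is made up and
an C<ivsize> of at least 8 octets is taken to mean "64 bit support":

   use strict;
   use warnings;
   use Config;

   # true if perl's native integers (IVs) are at least 64 bits wide
   sub has_64bit_ints {
      return $Config{ivsize} >= 8;
   }

   warn "64 bit CBOR integers might not decode properly on this perl\n"
      unless has_64bit_ints ();
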
Strings and arrays are encoded with a definite length. Hashes as well,
unless they are tied (or otherwise magical).

Only the double data type is supported for NV data types: when Perl uses
long double to represent floating point values, they might not be encoded
properly. Half precision values are accepted when decoding, but are never
generated by the encoder.

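For reference, decoding half precision follows the C reference code in
RFC 7049, Appendix D. The pure-Perl transliteration below only
illustrates the format and is not the module's implementation;
C<decode_half> is a hypothetical name, and it expects the two big-endian
payload octets that follow the initial octet C<0xf9>.

   use strict;
   use warnings;

   # Decode an IEEE 754 half-precision value from its two big-endian
   # payload octets, following RFC 7049 Appendix D.
   sub decode_half {
      my ($bytes) = @_;
      my $half = unpack "n", $bytes;
      my $exp  = ($half >> 10) & 0x1f;
      my $mant = $half & 0x3ff;
      my $val;

      if ($exp == 0) {
         $val = $mant * 2 ** -24;                   # zero or subnormal
      } elsif ($exp != 31) {
         $val = ($mant + 1024) * 2 ** ($exp - 25);  # normal number
      } else {
         my $inf = 9**9**9;
         $val = $mant == 0 ? $inf : $inf - $inf;    # infinity or NaN
      }

      return $half & 0x8000 ? -$val : $val;         # apply sign bit
   }

For instance, the payload C<0x3c 0x00> decodes to 1, and C<0x7b 0xff> to
65504, the largest finite half precision value.
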
Strict mode and canonical mode are not implemented.


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads", which are simply bloated process
simulations; use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

XSLoader::load "CBOR::XS", $VERSION;

=head1 SEE ALSO

The L<JSON> and L<JSON::XS> modules, which do similar, but
human-readable, serialisation.

The L<Types::Serialiser> module provides the data model for true, false
and error values.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1
