/cvs/CBOR-XS/XS.pm
Revision: 1.10
Committed: Mon Oct 28 22:03:20 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
Changes since 1.9: +3 -1 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4    
5     =encoding utf-8
6    
7     =head1 SYNOPSIS
8    
9     use CBOR::XS;
10    
11     $binary_cbor_data = encode_cbor $perl_value;
12     $perl_value = decode_cbor $binary_cbor_data;
13    
14     # OO-interface
15    
16     $coder = CBOR::XS->new;
17 root 1.6 $binary_cbor_data = $coder->encode ($perl_value);
18     $perl_value = $coder->decode ($binary_cbor_data);
19    
20     # prefix decoding
21    
22     my $many_cbor_strings = ...;
23     while (length $many_cbor_strings) {
24     my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
25     # data was decoded
26     substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27     }
28 root 1.1
29     =head1 DESCRIPTION
30    
31 root 1.9 WARNING! This module is very new, and not very well tested (that's up to
32     you to do). Furthermore, details of the implementation might change freely
33     before version 1.0. Lastly, the object serialisation protocol depends
34     on a pending IANA assignment; until that assignment is official, this
35     implementation is not interoperable with other implementations (or even
36     with future versions of this module).
37    
38     You are still invited to try out CBOR, and this module.
39 root 1.5
40     This module converts Perl data structures to the Concise Binary Object
41     Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42     format that aims to use a superset of the JSON data model, i.e. when you
43     can represent something in JSON, you should be able to represent it in
44     CBOR.
45 root 1.1
46 root 1.9 In short, CBOR is a faster and very compact binary alternative to JSON,
47 root 1.10 with the added ability of supporting serialisation of Perl objects. (JSON
48     often compresses better than CBOR though, so if you plan to compress the
49     data later you might want to compare both formats first).
50 root 1.5
51     The primary goal of this module is to be I<correct> and the secondary goal
52     is to be I<fast>. To reach the latter goal it was written in C.
53 root 1.1
54     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
55     vice versa.
56    
57     =cut
58    
59     package CBOR::XS;
60    
61     use common::sense;
62    
63 root 1.9 our $VERSION = 0.05;
64 root 1.1 our @ISA = qw(Exporter);
65    
66     our @EXPORT = qw(encode_cbor decode_cbor);
67    
68     use Exporter;
69     use XSLoader;
70    
71 root 1.6 use Types::Serialiser;
72    
73 root 1.3 our $MAGIC = "\xd9\xd9\xf7";
74    
75 root 1.1 =head1 FUNCTIONAL INTERFACE
76    
77     The following convenience methods are provided by this module. They are
78     exported by default:
79    
80     =over 4
81    
82     =item $cbor_data = encode_cbor $perl_scalar
83    
84     Converts the given Perl data structure to CBOR representation. Croaks on
85     error.
86    
87     =item $perl_scalar = decode_cbor $cbor_data
88    
89     The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
90     returning the resulting perl scalar. Croaks on error.
91    
92     =back
93    
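As a minimal sketch of the functional interface (the data structure and
the truncated input are made up for illustration), a round trip with
error handling might look like this:

```perl
use strict;
use warnings;
use CBOR::XS;

# encode a nested Perl structure and decode it again
my $data = { name => "example", list => [1, 2, 3] };
my $cbor = encode_cbor ($data);
my $copy = decode_cbor ($cbor);

print "round trip ok\n"
   if $copy->{name} eq "example" && $copy->{list}[2] == 3;

# both functions croak on error, so wrap untrusted input in eval;
# "\x82\x01" announces a two-element array but contains only one element
my $ok = eval { decode_cbor ("\x82\x01"); 1 };
print "decoding failed: $@" unless $ok;
```
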
94    
95     =head1 OBJECT-ORIENTED INTERFACE
96    
97     The object oriented interface lets you configure your own encoding or
98     decoding style, within the limits of supported formats.
99    
100     =over 4
101    
102     =item $cbor = new CBOR::XS
103    
104     Creates a new CBOR::XS object that can be used to de/encode CBOR
105     strings. All boolean flags described below are by default I<disabled>.
106    
107     The mutators for flags all return the CBOR object again and thus calls can
108     be chained:
109    
110     #TODO
111     my $cbor = CBOR::XS->new->encode ({a => [1,2]});
112    
113     =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
114    
115     =item $max_depth = $cbor->get_max_depth
116    
117     Sets the maximum nesting level (default C<512>) accepted while encoding
118     or decoding. If a higher nesting level is detected in CBOR data or a Perl
119     data structure, then the encoder and decoder will stop and croak at that
120     point.
121    
122     Nesting level is defined by the number of hash- or arrayrefs that the
123     encoder needs to traverse to reach a given point, or by the number of
124     CBOR arrays or maps that the decoder has entered but not yet finished
125     at a given point in the CBOR data.
126    
127     Setting the maximum depth to one disallows any nesting, so that ensures
128     that the object is only a single hash/object or array.
129    
130     If no argument is given, the highest possible setting will be used, which
131     is rarely useful.
132    
133     Note that nesting is implemented by recursion in C. The default value has
134     been chosen to be as large as typical operating systems allow without
135     crashing.
136    
137     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
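
A small sketch of the nesting limit in action, assuming an arbitrary limit
of 2 that was chosen only for illustration:

```perl
use strict;
use warnings;
use CBOR::XS;

# mutators return the coder object, so construction and configuration chain
my $cbor = CBOR::XS->new->max_depth (2);

# two nested arrays are within the limit
my $shallow_ok = eval { $cbor->encode ([[1]]); 1 };

# three nested arrays exceed it, so encode croaks
my $deep_ok = eval { $cbor->encode ([[[1]]]); 1 };

printf "limit %d: shallow %s, deep %s\n",
   $cbor->get_max_depth,
   $shallow_ok ? "ok" : "rejected",
   $deep_ok    ? "ok" : "rejected";
```
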
138    
139     =item $cbor = $cbor->max_size ([$maximum_string_size])
140    
141     =item $max_size = $cbor->get_max_size
142    
143     Set the maximum length a CBOR string may have (in bytes) where decoding
144     is being attempted. The default is C<0>, meaning no limit. When C<decode>
145     is called on a string that is longer than this many bytes, it will not
146     attempt to decode the string but throw an exception. This setting has no
147     effect on C<encode> (yet).
148    
149     If no argument is given, the limit check will be deactivated (same as when
150     C<0> is specified).
151    
152     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
153    
154     =item $cbor_data = $cbor->encode ($perl_scalar)
155    
156     Converts the given Perl data structure (a scalar value) to its CBOR
157     representation.
158    
159     =item $perl_scalar = $cbor->decode ($cbor_data)
160    
161     The opposite of C<encode>: expects CBOR data and tries to parse it,
162     returning the resulting simple scalar or reference. Croaks on error.
163    
164     =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
165    
166     This works like the C<decode> method, but instead of raising an exception
167     when there is trailing garbage after the CBOR string, it will silently
168     stop parsing there and return the number of octets consumed so far.
169    
170     This is useful if your CBOR strings are not delimited by an outer protocol
171     and you need to know where the first CBOR string ends and the next one
172     starts.
173    
174     CBOR::XS->new->decode_prefix ("......")
175     => ("...", 3)
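
To make the placeholder above concrete, here is a sketch that concatenates
three encoded values without any framing and splits them apart again. The
values themselves are arbitrary:

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor = CBOR::XS->new;

# three CBOR strings back to back, with no delimiter between them
my $stream = $cbor->encode ([1, 2])
           . $cbor->encode ({ a => 1 })
           . $cbor->encode ("end");

my @values;
while (length $stream) {
   # decode one value and learn how many octets it occupied
   my ($value, $octets) = $cbor->decode_prefix ($stream);
   push @values, $value;
   substr $stream, 0, $octets, ""; # remove the decoded CBOR string
}

print "decoded ", scalar @values, " values\n";
```
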
176    
177     =back
178    
179    
180     =head1 MAPPING
181    
182     This section describes how CBOR::XS maps Perl values to CBOR values and
183     vice versa. These mappings are designed to "do the right thing" in most
184     circumstances automatically, preserving round-tripping characteristics
185     (what you put in comes out as something equivalent).
186    
187     For the more enlightened: note that in the following descriptions,
188     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
189     refers to the abstract Perl language itself.
190    
191    
192     =head2 CBOR -> PERL
193    
194     =over 4
195    
196 root 1.4 =item integers
197    
198     CBOR integers become (numeric) perl scalars. On perls without 64 bit
199     support, 64 bit integers will be truncated or otherwise corrupted.
200    
201     =item byte strings
202    
203     Byte strings will become octet strings in Perl (the byte values 0..255
204     will simply become characters of the same value in Perl).
205    
206     =item UTF-8 strings
207    
208     UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
209     decoded into proper Unicode code points. At the moment, the UTF-8
210     octets are not validated - corrupt input will result in corrupted Perl
211     strings.
212    
213     =item arrays, maps
214    
215     CBOR arrays and CBOR maps will be converted into references to a Perl
216     array or hash, respectively. The keys of the map will be stringified
217     during this process.
218    
219 root 1.6 =item null
220    
221     CBOR null becomes C<undef> in Perl.
222    
223     =item true, false, undefined
224 root 1.1
225 root 1.6 These CBOR values become C<Types::Serialiser::true>,
226     C<Types::Serialiser::false> and C<Types::Serialiser::error>,
227 root 1.1 respectively. They are overloaded to act almost exactly like the numbers
228 root 1.6 C<1> and C<0> (for true and false) or to throw an exception on access (for
229     error). See the L<Types::Serialiser> manpage for details.
230    
231     =item CBOR tag 256 (perl object)
232    
233 root 1.7 The tag value C<256> (TODO: pending iana registration) will be used
234     to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
235     SERIALISATION", below, for details.
236 root 1.1
237 root 1.6 =item CBOR tag 55799 (magic header)
238 root 1.4
239 root 1.6 The tag 55799 is ignored (this tag implements the magic header).
240 root 1.1
241 root 1.6 =item other CBOR tags
242 root 1.4
243 root 1.6 Tagged items consist of a numeric tag and another CBOR value. Tags not
244     handled internally are currently converted into a L<CBOR::XS::Tagged>
245     object, which is simply a blessed array reference consisting of the
246     numeric tag value followed by the (decoded) CBOR value.
247 root 1.4
248 root 1.6 In the future, support for user-supplied conversions might get added.
249 root 1.4
250     =item anything else
251    
252     Anything else (e.g. unsupported simple values) will raise a decoding
253     error.
254 root 1.1
255     =back
256    
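The following sketch decodes a few hand-assembled CBOR byte sequences to
illustrate the mappings above. The byte values follow RFC 7049; tag number
1234 is an arbitrary choice that is assumed not to be handled internally:

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x83 = array of three items: true (0xf5), false (0xf4), null (0xf6)
my $simple = decode_cbor ("\x83\xf5\xf4\xf6");
print "true is true\n"   if $simple->[0];
print "false is false\n" unless $simple->[1];
print "null is undef\n"  unless defined $simple->[2];

# 0xa1 = map with one pair: integer key 1, integer value 2;
# the key is stringified because Perl hash keys are strings
my $map = decode_cbor ("\xa1\x01\x02");
print "map value: $map->{1}\n";

# 0xd9 0x04 0xd2 = tag 1234, 0x63 = text string of three octets
my $tagged = decode_cbor ("\xd9\x04\xd2\x63" . "foo");
print ref ($tagged), ": tag $tagged->[0], value $tagged->[1]\n";
```
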
257    
258     =head2 PERL -> CBOR
259    
260     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
261     truly typeless language, so we can only guess which CBOR type is meant by
262     a Perl value.
263    
264     =over 4
265    
266     =item hash references
267    
268 root 1.4 Perl hash references become CBOR maps. As there is no inherent ordering in
269     hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
270     order.
271    
272     Currently, tied hashes will use the indefinite-length format, while normal
273     hashes will use the fixed-length format.
274 root 1.1
275     =item array references
276    
277 root 1.4 Perl array references become fixed-length CBOR arrays.
278 root 1.1
279     =item other references
280    
281     Other unblessed references are generally not allowed and will cause an
282     exception to be thrown, except for references to the integers C<0> and
283 root 1.4 C<1>, which get turned into false and true in CBOR.
284    
285     =item CBOR::XS::Tagged objects
286    
287     Objects of this type must be arrays consisting of a single C<[tag, value]>
288     pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
289     encoded as appropriate for the value.
290 root 1.1
291 root 1.6 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
292 root 1.1
293 root 1.6 These special values become CBOR true, CBOR false and CBOR undefined
294     values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
295     if you want.
296 root 1.1
297 root 1.7 =item other blessed objects
298 root 1.1
299 root 1.7 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
300     "OBJECT SERIALISATION", below, for details.
301 root 1.1
302     =item simple scalars
303    
304     TODO
305     Simple Perl scalars (any scalar that is not a reference) are the most
306     difficult objects to encode: CBOR::XS will encode undefined scalars as
307 root 1.4 CBOR null values, scalars that have last been used in a string context
308 root 1.1 before encoding as CBOR strings, and anything else as number value:
309    
310     # dump as number
311     encode_cbor [2] # yields [2]
312     encode_cbor [-3.0e17] # yields [-3e+17]
313     my $value = 5; encode_cbor [$value] # yields [5]
314    
315     # used as string, so dump as string
316     print $value;
317     encode_cbor [$value] # yields ["5"]
318    
319     # undef becomes null
320     encode_cbor [undef] # yields [null]
321    
322     You can force the type to be a CBOR string by stringifying it:
323    
324     my $x = 3.1; # some variable containing a number
325     "$x"; # stringified
326     $x .= ""; # another, more awkward way to stringify
327     print $x; # perl does it for you, too, quite often
328    
329     You can force the type to be a CBOR number by numifying it:
330    
331     my $x = "3"; # some variable containing a string
332     $x += 0; # numify it, ensuring it will be dumped as a number
333     $x *= 1; # same thing, the choice is yours.
334    
335     You can not currently force the type in other, less obscure, ways. Tell me
336     if you need this capability (but don't forget to explain why it's needed
337     :).
338    
339 root 1.4 Perl values that seem to be integers generally use the shortest possible
340     representation. Floating-point values will use either the IEEE single
341     format if possible without loss of precision, otherwise the IEEE double
342     format will be used. Perls that use formats other than IEEE double to
343     represent numerical values are supported, but might suffer loss of
344     precision.
345 root 1.1
346     =back
347    
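One way to check which CBOR type the encoder chose is to inspect the major
type, which RFC 7049 stores in the top three bits of the first octet (0 is
an unsigned integer, 2 a byte string, 3 a text string). The following is a
sketch only; the helper C<major_type> is hypothetical and not part of this
module:

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical helper: major type of the first item in some CBOR data
sub major_type { ord (substr ($_[0], 0, 1)) >> 5 }

my $number = "3";
$number += 0;    # numify, so it is dumped as a number

my $string = 3;
$string .= "";   # stringify, so it is dumped as a string

printf "number has major type %d\n", major_type (encode_cbor ($number));
printf "string has major type %d\n", major_type (encode_cbor ($string));

# containers and undef, shown as hex
printf "%s\n", unpack "H*", encode_cbor ([1, 2]);   # 820102
printf "%s\n", unpack "H*", encode_cbor (undef);    # f6
```
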
348 root 1.7 =head2 OBJECT SERIALISATION
349    
350     This module knows two ways to serialise a Perl object: The CBOR-specific
351     way, and the generic way.
352    
353     Whenever the encoder encounters a Perl object that it cannot serialise
354     directly (most of them), it will first look up the C<TO_CBOR> method on
355     it.
356    
357     If it has a C<TO_CBOR> method, it will call it with the object as only
358     argument, and expects exactly one return value, which it will then
359     encode in place of the object.
360    
361     Otherwise, it will look up the C<FREEZE> method. If it exists, it will
362     call it with the object as first argument, and the constant string C<CBOR>
363     as the second argument, to distinguish it from other serialisers.
364    
365     The C<FREEZE> method can return any number of values (i.e. zero or
366     more). These will be encoded as a CBOR perl object, together with the
367     classname.
368    
369     If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
370     with an error.
371    
372     Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
373     objects encoded via C<FREEZE> can be decoded using the following protocol:
374    
375     When an encoded CBOR perl object is encountered by the decoder, it will
376     look up the C<THAW> method, by using the stored classname, and will fail
377     if the method cannot be found.
378    
379     After the lookup it will call the C<THAW> method with the stored classname
380     as first argument, the constant string C<CBOR> as second argument, and all
381     values returned by C<FREEZE> as remaining arguments.
382    
383     =head4 EXAMPLES
384    
385     Here is an example C<TO_CBOR> method:
386    
387     sub My::Object::TO_CBOR {
388     my ($obj) = @_;
389    
390     ["this is a serialised My::Object object", $obj->{id}]
391     }
392    
393     When a C<My::Object> is encoded to CBOR, it will instead encode a simple
394     array with two members: a string, and the "object id". Decoding this CBOR
395     string will yield a normal perl array reference in place of the object.
396    
397     A more useful and practical example would be a serialisation method for
398     the URI module. CBOR has a custom tag value for URIs, namely 32:
399    
400     sub URI::TO_CBOR {
401     my ($self) = @_;
402     my $uri = "$self"; # stringify uri
403     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
404     CBOR::XS::tagged 32, $uri
405     }
406    
407     This will encode URIs as a UTF-8 string with tag 32, which indicates a
408     URI.
409    
410     Decoding such a URI will not (currently) give you a URI object, but
411     instead a CBOR::XS::Tagged object with tag number 32 and the string -
412     exactly what was returned by C<TO_CBOR>.
413    
414     To serialise an object so it can automatically be deserialised, you need
415     to use C<FREEZE> and C<THAW>. To take the URI module as example, this
416     would be a possible implementation:
417    
418     sub URI::FREEZE {
419     my ($self, $serialiser) = @_;
420     "$self" # encode url string
421     }
422    
423     sub URI::THAW {
424     my ($class, $serialiser, $uri) = @_;
425    
426     $class->new ($uri)
427     }
428    
429     Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
430     example, a C<FREEZE> method that returns "type", "id" and "variant" values
431     would cause an invocation of C<THAW> with 5 arguments:
432    
433     sub My::Object::FREEZE {
434     my ($self, $serialiser) = @_;
435    
436     ($self->{type}, $self->{id}, $self->{variant})
437     }
438    
439     sub My::Object::THAW {
440     my ($class, $serialiser, $type, $id, $variant) = @_;
441    
442     $class->new (type => $type, id => $id, variant => $variant)
443     }
444    
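Putting the protocol together, the following sketch round-trips an object
through C<FREEZE> and C<THAW>. The class C<My::Point> and its fields are
hypothetical and exist only for this example:

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Point;

sub new {
   my ($class, %args) = @_;
   bless { %args }, $class
}

# called by the encoder as $obj->FREEZE ("CBOR")
sub FREEZE {
   my ($self, $serialiser) = @_;
   ($self->{x}, $self->{y})
}

# called by the decoder as My::Point->THAW ("CBOR", $x, $y)
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   $class->new (x => $x, y => $y)
}

package main;

my $cbor = encode_cbor (My::Point->new (x => 1, y => 2));
my $copy = decode_cbor ($cbor);

printf "%s at (%d, %d)\n", ref ($copy), $copy->{x}, $copy->{y};
```

Note that this only works when the decoding process has the class loaded,
as the decoder looks up C<THAW> via the stored classname.
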
445 root 1.1
446 root 1.7 =head1 MAGIC HEADER
447 root 1.3
448     There is no way to distinguish CBOR from other formats
449     programmatically. To make it easier to distinguish CBOR from other
450     formats, the CBOR specification has a special "magic string" that can be
451     prepended to any CBOR string without changing its meaning.
452    
453     This string is available as C<$CBOR::XS::MAGIC>. This module does not
454     prepend this string to the CBOR data it generates, but it will ignore it
455     if present, so users can prepend this string as a "file type" indicator as
456     required.
457    
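A short sketch of using the magic string as a file type indicator; the
encoded array is arbitrary example data:

```perl
use strict;
use warnings;
use CBOR::XS;

# prepend the magic string as a "file type" indicator
my $file = $CBOR::XS::MAGIC . encode_cbor ([1, 2, 3]);

# cheap format check: does the data start with the magic string?
my $has_magic = index ($file, $CBOR::XS::MAGIC) == 0;

# the decoder ignores the magic tag, so no stripping is needed
my $data = decode_cbor ($file);

print "magic present, decoded ", scalar @$data, " elements\n" if $has_magic;
```
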
458    
459 root 1.7 =head1 CBOR and JSON
460 root 1.1
461 root 1.4 CBOR is supposed to implement a superset of the JSON data model, and is,
462     with some coercion, able to represent all JSON texts (something that other
463     "binary JSON" formats such as BSON generally do not support).
464    
465     CBOR implements some extra hints and support for JSON interoperability,
466     and the spec offers further guidance for conversion between CBOR and
467     JSON. None of this is currently implemented in CBOR::XS, and the guidelines
468     in the spec do not result in correct round-tripping of data. If JSON
469     interoperability is improved in the future, then the goal will be to
470     ensure that decoded JSON data will round-trip encoding and decoding to
471     CBOR intact.
472 root 1.1
473    
474     =head1 SECURITY CONSIDERATIONS
475    
476     When you are using CBOR in a protocol, talking to untrusted potentially
477     hostile creatures requires relatively few measures.
478    
479     First of all, your CBOR decoder should be secure, that is, should not have
480     any buffer overflows. Obviously, this module should ensure that and I am
481     trying hard on making that true, but you never know.
482    
483     Second, you need to avoid resource-starving attacks. That means you should
484     limit the size of CBOR data you accept, or make sure that when your
485     resources run out, that's just fine (e.g. by using a separate process that
486     can crash safely). The size of a CBOR string in octets is usually a good
487     indication of the size of the resources required to decode it into a Perl
488     structure. While CBOR::XS can check the size of the CBOR text, it might be
489     too late when you already have it in memory, so you might want to check
490     the size before you accept the string.
491    
492     Third, CBOR::XS recurses using the C stack when decoding objects and
493     arrays. The C stack is a limited resource: for instance, on my amd64
494     machine with 8MB of stack size I can decode around 180k nested arrays but
495     only 14k nested CBOR objects (due to perl itself recursing deeply on croak
496     to free the temporary). If that is exceeded, the program crashes. To be
497     conservative, the default nesting limit is set to 512. If your process
498     has a smaller stack, you should adjust this setting accordingly with the
499     C<max_depth> method.
500    
501     Something else could bomb you, too, that I forgot to think of. In that
502     case, you get to keep the pieces. I am always open for hints, though...
503    
504     Also keep in mind that CBOR::XS might leak contents of your Perl data
505     structures in its error messages, so when you serialise sensitive
506     information you might want to make sure that exceptions thrown by CBOR::XS
507     will not end up in front of untrusted eyes.
508    
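The measures above can be combined into a small wrapper. The following is
only a sketch: the helper name C<decode_untrusted> and the limits of 64 KiB
and 32 nesting levels are assumptions chosen for illustration, not
recommendations:

```perl
use strict;
use warnings;
use CBOR::XS;

# one shared decoder with conservative limits
my $decoder = CBOR::XS->new->max_size (65536)->max_depth (32);

# hypothetical helper: returns (1, $value) on success, (0) on any failure
sub decode_untrusted {
   my ($octets) = @_;

   my $value;
   my $ok = eval { $value = $decoder->decode ($octets); 1 };

   # log locally only: the message might leak contents of the data,
   # so it should not be forwarded to the peer
   warn "CBOR decoding failed: $@" unless $ok;

   $ok ? (1, $value) : (0)
}

my ($ok1, $value) = decode_untrusted (encode_cbor ({ user => "alice" }));
my ($ok2)         = decode_untrusted ("\x81" x 100);    # 100 nested arrays
my ($ok3)         = decode_untrusted ("x" x 100_000);   # larger than max_size
```

Checking the size inside the decoder does not help when reading the data
already exhausted memory, so a real protocol would also limit how much it
reads from the peer in the first place.
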
509     =head1 CBOR IMPLEMENTATION NOTES
510    
511     This section contains some random implementation notes. They do not
512     describe guaranteed behaviour, but merely behaviour as implemented
513     right now.
514    
515     64 bit integers are only properly decoded when Perl was built with 64 bit
516     support.
517    
518     Strings and arrays are encoded with a definite length. Hashes as well,
519     unless they are tied (or otherwise magical).
520    
521     Only the double data type is supported for NV data types - when Perl uses
522     long double to represent floating point values, they might not be encoded
523     properly. Half precision types are accepted, but not encoded.
524    
525     Strict mode and canonical mode are not implemented.
526    
527    
528     =head1 THREADS
529    
530     This module is I<not> guaranteed to be thread safe and there are no
531     plans to change this until Perl gets thread support (as opposed to the
532     horribly slow so-called "threads" which are simply slow and bloated
533     process simulations - use fork, it's I<much> faster, cheaper, better).
534    
535     (It might actually work, but you have been warned).
536    
537    
538     =head1 BUGS
539    
540     While the goal of this module is to be correct, that unfortunately does
541     not mean it's bug-free, only that I think its design is bug-free. If you
542     keep reporting bugs they will be fixed swiftly, though.
543    
544     Please refrain from using rt.cpan.org or any other bug reporting
545     service. I put the contact address into my modules for a reason.
546    
547     =cut
548    
549     XSLoader::load "CBOR::XS", $VERSION;
550    
551     =head1 SEE ALSO
552    
553     The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
554     serialisation.
555    
556 root 1.6 The L<Types::Serialiser> module provides the data model for true, false
557     and error values.
558    
559 root 1.1 =head1 AUTHOR
560    
561     Marc Lehmann <schmorp@schmorp.de>
562     http://home.schmorp.de/
563    
564     =cut
565    
566 root 1.6 1
567