/cvs/CBOR-XS/XS.pm
Revision: 1.9
Committed: Mon Oct 28 21:28:14 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
CVS Tags: rel-0_05
Changes since 1.8: +11 -6 lines
Log Message:
0.05

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4    
5     =encoding utf-8
6    
7     =head1 SYNOPSIS
8    
9     use CBOR::XS;
10    
11     $binary_cbor_data = encode_cbor $perl_value;
12     $perl_value = decode_cbor $binary_cbor_data;
13    
14     # OO-interface
15    
16     $coder = CBOR::XS->new;
17 root 1.6 $binary_cbor_data = $coder->encode ($perl_value);
18     $perl_value = $coder->decode ($binary_cbor_data);
19    
20     # prefix decoding
21    
22     my $many_cbor_strings = ...;
23     while (length $many_cbor_strings) {
24     my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
25     # data was decoded
26     substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27     }
28 root 1.1
29     =head1 DESCRIPTION
30    
31 root 1.9 WARNING! This module is very new, and not very well tested (that's up to
32     you to do). Furthermore, details of the implementation might change freely
33     before version 1.0. And lastly, the object serialisation protocol depends
34     on a pending IANA assignment, and until that assignment is official, this
35     implementation is not interoperable with other implementations (or even
36     with future versions of this module).
37    
38     You are still invited to try out CBOR, and this module.
39 root 1.5
40     This module converts Perl data structures to the Concise Binary Object
41     Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42     format that aims to use a superset of the JSON data model, i.e. when you
43     can represent something in JSON, you should be able to represent it in
44     CBOR.
45 root 1.1
46 root 1.9 In short, CBOR is a faster and very compact binary alternative to JSON,
47     with the added ability of supporting serialisation of Perl objects.
48 root 1.5
49     The primary goal of this module is to be I<correct> and the secondary goal
50     is to be I<fast>. To reach the latter goal it was written in C.
51 root 1.1
52     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
53     vice versa.
54    
55     =cut
56    
57     package CBOR::XS;
58    
59     use common::sense;
60    
61 root 1.9 our $VERSION = 0.05;
62 root 1.1 our @ISA = qw(Exporter);
63    
64     our @EXPORT = qw(encode_cbor decode_cbor);
65    
66     use Exporter;
67     use XSLoader;
68    
69 root 1.6 use Types::Serialiser;
70    
71 root 1.3 our $MAGIC = "\xd9\xd9\xf7";
72    
73 root 1.1 =head1 FUNCTIONAL INTERFACE
74    
75     The following convenience functions are provided by this module. They are
76     exported by default:
77    
78     =over 4
79    
80     =item $cbor_data = encode_cbor $perl_scalar
81    
82     Converts the given Perl data structure to CBOR representation. Croaks on
83     error.
84    
85     =item $perl_scalar = decode_cbor $cbor_data
86    
87     The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
88     returning the resulting perl scalar. Croaks on error.
89    
90     =back
91    
92    
93     =head1 OBJECT-ORIENTED INTERFACE
94    
95     The object oriented interface lets you configure your own encoding or
96     decoding style, within the limits of supported formats.
97    
98     =over 4
99    
100     =item $cbor = new CBOR::XS
101    
102     Creates a new CBOR::XS object that can be used to de/encode CBOR
103     strings. All boolean flags described below are by default I<disabled>.
104    
105     The mutators for flags all return the CBOR object again and thus calls can
106     be chained:
107    
108     #TODO
109     my $cbor_data = CBOR::XS->new->max_depth (64)->encode ({a => [1,2]});
110    
111     =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
112    
113     =item $max_depth = $cbor->get_max_depth
114    
115     Sets the maximum nesting level (default C<512>) accepted while encoding
116     or decoding. If a higher nesting level is detected in CBOR data or a Perl
117     data structure, then the encoder and decoder will stop and croak at that
118     point.
119    
120     Nesting level is defined by the number of hash- or arrayrefs that the encoder
121     needs to traverse to reach a given point, or by the number of CBOR arrays
122     or maps that are still open when the decoder reaches a given point in the
123     CBOR data.
124    
125     Setting the maximum depth to one disallows any nesting, which ensures
126     that the data is only a single map or array.
127    
128     If no argument is given, the highest possible setting will be used, which
129     is rarely useful.
130    
131     Note that nesting is implemented by recursion in C. The default value has
132     been chosen to be as large as typical operating systems allow without
133     crashing.
134    
135     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
136    
137     =item $cbor = $cbor->max_size ([$maximum_string_size])
138    
139     =item $max_size = $cbor->get_max_size
140    
141     Set the maximum length a CBOR string may have (in bytes) where decoding
142     is being attempted. The default is C<0>, meaning no limit. When C<decode>
143     is called on a string that is longer than this many bytes, it will not
144     attempt to decode the string but throw an exception. This setting has no
145     effect on C<encode> (yet).
146    
147     If no argument is given, the limit check will be deactivated (same as when
148     C<0> is specified).
149    
150     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
151    
152     =item $cbor_data = $cbor->encode ($perl_scalar)
153    
154     Converts the given Perl data structure (a scalar value) to its CBOR
155     representation.
156    
157     =item $perl_scalar = $cbor->decode ($cbor_data)
158    
159     The opposite of C<encode>: expects CBOR data and tries to parse it,
160     returning the resulting simple scalar or reference. Croaks on error.
161    
162     =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
163    
164     This works like the C<decode> method, but instead of raising an exception
165     when there is trailing garbage after the CBOR string, it will silently
166     stop parsing there and return the number of octets consumed so far.
167    
168     This is useful if your CBOR strings are not delimited by an outer protocol
169     and you need to know where the first CBOR string ends and the next one
170     starts.
171    
172     CBOR::XS->new->decode_prefix ("......")
173     => ("...", 3)
174    
175     =back
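
To illustrate what prefix decoding involves, here is a minimal pure-Perl sketch. It handles only CBOR unsigned integers (major type 0) with up to four payload octets, and the function name C<decode_uint_prefix> is hypothetical: it is not part of CBOR::XS, whose C<decode_prefix> handles all CBOR types.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CBOR::XS: decode one CBOR unsigned
# integer (major type 0) from the start of $buf and return the value
# together with the number of octets consumed.
sub decode_uint_prefix {
    my ($buf) = @_;

    die "empty input\n" unless length $buf;

    my $initial = ord (substr ($buf, 0, 1));
    my $major   = $initial >> 5;     # top three bits: major type
    my $info    = $initial & 0x1f;   # low five bits: additional information

    die "not an unsigned integer\n" unless $major == 0;

    return ($info, 1) if $info < 24;   # value stored in the initial octet

    # additional information 24, 25, 26: 1, 2 or 4 payload octets follow
    my %layout = (24 => [1, "C"], 25 => [2, "n"], 26 => [4, "N"]);
    my $entry  = $layout{$info} or die "unsupported integer width\n";
    my ($size, $template) = @$entry;

    die "truncated input\n" if length ($buf) < 1 + $size;

    return (unpack ($template, substr ($buf, 1, $size)), 1 + $size);
}

# three concatenated items: 10, 500 and 100000
my $stream = "\x0a" . "\x19\x01\xf4" . "\x1a\x00\x01\x86\xa0";
my @values;

while (length $stream) {
    my ($value, $octets) = decode_uint_prefix ($stream);
    push @values, $value;
    substr $stream, 0, $octets, "";   # remove the decoded item
}

print "@values\n";   # prints "10 500 100000"
```

The loop has the same shape as the one in the SYNOPSIS: decode one item, then cut the consumed octets off the front of the buffer.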
176    
177    
178     =head1 MAPPING
179    
180     This section describes how CBOR::XS maps Perl values to CBOR values and
181     vice versa. These mappings are designed to "do the right thing" in most
182     circumstances automatically, preserving round-tripping characteristics
183     (what you put in comes out as something equivalent).
184    
185     For the more enlightened: note that in the following descriptions,
186     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
187     refers to the abstract Perl language itself.
188    
189    
190     =head2 CBOR -> PERL
191    
192     =over 4
193    
194 root 1.4 =item integers
195    
196     CBOR integers become (numeric) perl scalars. On perls without 64 bit
197     support, 64 bit integers will be truncated or otherwise corrupted.
198    
199     =item byte strings
200    
201     Byte strings will become octet strings in Perl (the byte values 0..255
202     will simply become characters of the same value in Perl).
203    
204     =item UTF-8 strings
205    
206     UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
207     decoded into proper Unicode code points. At the moment, the validity of
208     the UTF-8 octets is not checked - corrupt input will result in
209     corrupted Perl strings.
210    
211     =item arrays, maps
212    
213     CBOR arrays and CBOR maps will be converted into references to a Perl
214     array or hash, respectively. The keys of the map will be stringified
215     during this process.
216    
217 root 1.6 =item null
218    
219     CBOR null becomes C<undef> in Perl.
220    
221     =item true, false, undefined
222 root 1.1
223 root 1.6 These CBOR values become C<Types::Serialiser::true>,
224     C<Types::Serialiser::false> and C<Types::Serialiser::error>,
225 root 1.1 respectively. They are overloaded to act almost exactly like the numbers
226 root 1.6 C<1> and C<0> (for true and false) or to throw an exception on access (for
227     error). See the L<Types::Serialiser> manpage for details.
228    
229     =item CBOR tag 256 (perl object)
230    
231 root 1.7 The tag value C<256> (TODO: pending IANA registration) will be used
232     to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
233     SERIALISATION", below, for details.
234 root 1.1
235 root 1.6 =item CBOR tag 55799 (magic header)
236 root 1.4
237 root 1.6 The tag 55799 is ignored (this tag implements the magic header).
238 root 1.1
239 root 1.6 =item other CBOR tags
240 root 1.4
241 root 1.6 Tagged items consist of a numeric tag and another CBOR value. Tags not
242     handled internally are currently converted into a L<CBOR::XS::Tagged>
243     object, which is simply a blessed array reference consisting of the
244     numeric tag value followed by the (decoded) CBOR value.
245 root 1.4
246 root 1.6 In the future, support for user-supplied conversions might get added.
247 root 1.4
248     =item anything else
249    
250     Anything else (e.g. unsupported simple values) will raise a decoding
251     error.
252 root 1.1
253     =back
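
The difference between the byte string and UTF-8 string mappings above can be seen with core Perl alone. This is only an illustration of the two resulting Perl strings, using C<utf8::decode> as a stand-in for what the decoder does; it does not call CBOR::XS.

```perl
use strict;
use warnings;

# "caf" followed by the two-octet UTF-8 encoding of U+00E9:
# five octets, four characters
my $octets = "caf\xc3\xa9";

# byte string: each octet becomes one character of the same value
my $as_bytes = $octets;

# UTF-8 string: the octets are decoded into Unicode code points
my $as_text = $octets;
utf8::decode ($as_text);

printf "%d %d U+%04X\n",
    length ($as_bytes), length ($as_text), ord (substr ($as_text, -1));
# prints "5 4 U+00E9"
```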
254    
255    
256     =head2 PERL -> CBOR
257    
258     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
259     truly typeless language, so we can only guess which CBOR type is meant by
260     a Perl value.
261    
262     =over 4
263    
264     =item hash references
265    
266 root 1.4 Perl hash references become CBOR maps. As there is no inherent ordering in
267     hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
268     order.
269    
270     Currently, tied hashes will use the indefinite-length format, while normal
271     hashes will use the fixed-length format.
272 root 1.1
273     =item array references
274    
275 root 1.4 Perl array references become fixed-length CBOR arrays.
276 root 1.1
277     =item other references
278    
279     Other unblessed references are generally not allowed and will cause an
280     exception to be thrown, except for references to the integers C<0> and
281 root 1.4 C<1>, which get turned into false and true in CBOR.
282    
283     =item CBOR::XS::Tagged objects
284    
285     Objects of this type must be arrays consisting of a single C<[tag, value]>
286     pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
287     encoded as appropriate for the value.
288 root 1.1
289 root 1.6 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
290 root 1.1
291 root 1.6 These special values become CBOR true, CBOR false and CBOR undefined
292     values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
293     if you want.
294 root 1.1
295 root 1.7 =item other blessed objects
296 root 1.1
297 root 1.7 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
298     "OBJECT SERIALISATION", below, for details.
299 root 1.1
300     =item simple scalars
301    
302     TODO
303     Simple Perl scalars (any scalar that is not a reference) are the most
304     difficult objects to encode: CBOR::XS will encode undefined scalars as
305 root 1.4 CBOR null values, scalars that have last been used in a string context
306 root 1.1 before encoding as CBOR strings, and anything else as a number value:
307    
308     # dump as number
309     encode_cbor [2] # yields [2]
310     encode_cbor [-3.0e17] # yields [-3e+17]
311     my $value = 5; encode_cbor [$value] # yields [5]
312    
313     # used as string, so dump as string
314     print $value;
315     encode_cbor [$value] # yields ["5"]
316    
317     # undef becomes null
318     encode_cbor [undef] # yields [null]
319    
320     You can force the type to be a CBOR string by stringifying it:
321    
322     my $x = 3.1; # some variable containing a number
323     "$x"; # stringified
324     $x .= ""; # another, more awkward way to stringify
325     print $x; # perl does it for you, too, quite often
326    
327     You can force the type to be a CBOR number by numifying it:
328    
329     my $x = "3"; # some variable containing a string
330     $x += 0; # numify it, ensuring it will be dumped as a number
331     $x *= 1; # same thing, the choice is yours.
332    
333     You cannot currently force the type in other, less obscure, ways. Tell me
334     if you need this capability (but don't forget to explain why it's needed
335     :).
336    
337 root 1.4 Perl values that seem to be integers generally use the shortest possible
338     representation. Floating-point values will use the IEEE single format
339     if that is possible without loss of precision, and the IEEE double
340     format otherwise. Perls that use formats other than IEEE double to
341     represent numerical values are supported, but might suffer loss of
342     precision.
343 root 1.1
344     =back
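
The "shortest possible representation" rule for integers, and the single versus double choice for floating-point values, can be sketched in pure Perl as follows. Both helper names are hypothetical and the code is not taken from CBOR::XS; C<encode_uint> only covers non-negative integers below 2**32.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative integer below 2**32 as a
# CBOR unsigned integer (major type 0) using the shortest form.
sub encode_uint {
    my ($n) = @_;

    return pack ("C",  $n)     if $n < 24;        # value in the initial octet
    return pack ("CC", 24, $n) if $n < 0x100;     # one payload octet
    return pack ("Cn", 25, $n) if $n < 0x10000;   # two payload octets
    return pack ("CN", 26, $n);                   # four payload octets
}

# Hypothetical helper: true if $x survives a round trip through the
# platform's single-precision float, i.e. nothing is lost by using it.
sub fits_single {
    my ($x) = @_;
    return unpack ("f", pack ("f", $x)) == $x;
}

for my $n (23, 24, 500, 100000) {
    printf "%6d -> %s\n", $n, unpack ("H*", encode_uint ($n));
}

for my $x (0.5, 0.1) {
    printf "%s -> %s\n", $x, fits_single ($x) ? "single" : "double";
}
```

With these inputs, 23 still fits into the initial octet, while 24 needs one payload octet. Likewise, 0.5 is exactly representable in single precision, while 0.1 is not.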
345    
346 root 1.7 =head2 OBJECT SERIALISATION
347    
348     This module knows two ways to serialise a Perl object: The CBOR-specific
349     way, and the generic way.
350    
351     Whenever the encoder encounters a Perl object that it cannot serialise
352     directly (most of them), it will first look up the C<TO_CBOR> method on
353     it.
354    
355     If it has a C<TO_CBOR> method, it will call it with the object as only
356     argument, and expects exactly one return value, which it will then
357     encode in place of the object.
358    
359     Otherwise, it will look up the C<FREEZE> method. If it exists, it will
360     call it with the object as first argument, and the constant string C<CBOR>
361     as the second argument, to distinguish it from other serialisers.
362    
363     The C<FREEZE> method can return any number of values (i.e. zero or
364     more). These will be encoded as a CBOR perl object, together with the
365     classname.
366    
367     If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
368     with an error.
369    
370     Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
371     objects encoded via C<FREEZE> can be decoded using the following protocol:
372    
373     When an encoded CBOR perl object is encountered by the decoder, it will
374     look up the C<THAW> method, by using the stored classname, and will fail
375     if the method cannot be found.
376    
377     After the lookup it will call the C<THAW> method with the stored classname
378     as first argument, the constant string C<CBOR> as second argument, and all
379     values returned by C<FREEZE> as remaining arguments.
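
The lookup order described above can be modelled in plain Perl. The following is a sketch of the protocol only, under the assumption that a hash stands in for the encoded data; C<serialise_object>, C<deserialise_object> and C<My::Point> are hypothetical names, and none of this is CBOR::XS code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the encoder's decision: returns a description
# of what would be encoded in place of $obj.
sub serialise_object {
    my ($obj) = @_;

    if (my $to_cbor = $obj->can ("TO_CBOR")) {
        # CBOR-specific way: exactly one replacement value
        return +{ replacement => scalar $obj->$to_cbor () };
    }

    if (my $freeze = $obj->can ("FREEZE")) {
        # generic way: class name plus any number of values
        return +{ class => ref ($obj), values => [$obj->$freeze ("CBOR")] };
    }

    die "object of class " . ref ($obj) . " has neither TO_CBOR nor FREEZE\n";
}

# Counterpart for the decoder: look up THAW via the stored class name.
sub deserialise_object {
    my ($frozen) = @_;

    my $class = $frozen->{class};
    my $thaw  = $class->can ("THAW")
        or die "class $class has no THAW method\n";

    return $class->$thaw ("CBOR", @{ $frozen->{values} });
}

package My::Point {
    sub new    { my ($class, %arg) = @_; bless { %arg }, $class }
    sub FREEZE { my ($self, $serialiser) = @_; ($self->{x}, $self->{y}) }
    sub THAW   { my ($class, $serialiser, $x, $y) = @_; $class->new (x => $x, y => $y) }
}

my $frozen = serialise_object (My::Point->new (x => 1, y => 2));
my $copy   = deserialise_object ($frozen);

print ref ($copy), " $copy->{x} $copy->{y}\n";   # prints "My::Point 1 2"
```

Note that C<THAW> is called as a class method: at that point no object exists yet, only the stored class name and the values returned by C<FREEZE>.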
380    
381     =head4 EXAMPLES
382    
383     Here is an example C<TO_CBOR> method:
384    
385     sub My::Object::TO_CBOR {
386     my ($obj) = @_;
387    
388     ["this is a serialised My::Object object", $obj->{id}]
389     }
390    
391     When a C<My::Object> is encoded to CBOR, it will instead encode a simple
392     array with two members: a string, and the "object id". Decoding this CBOR
393     string will yield a normal perl array reference in place of the object.
394    
395     A more useful and practical example would be a serialisation method for
396     the URI module. CBOR has a custom tag value for URIs, namely 32:
397    
398     sub URI::TO_CBOR {
399     my ($self) = @_;
400     my $uri = "$self"; # stringify uri
401     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
402     CBOR::XS::tagged 32, $uri # encode the upgraded string with tag 32
403     }
404    
405     This will encode URIs as a UTF-8 string with tag 32, which indicates a
406     URI.
407    
408     Decoding such a URI will not (currently) give you a URI object, but
409     instead a CBOR::XS::Tagged object with tag number 32 and the string -
410     exactly what was returned by C<TO_CBOR>.
411    
412     To serialise an object so it can automatically be deserialised, you need
413     to use C<FREEZE> and C<THAW>. To take the URI module as example, this
414     would be a possible implementation:
415    
416     sub URI::FREEZE {
417     my ($self, $serialiser) = @_;
418     "$self" # encode url string
419     }
420    
421     sub URI::THAW {
422     my ($class, $serialiser, $uri) = @_;
423    
424     $class->new ($uri)
425     }
426    
427     Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
428     example, a C<FREEZE> method that returns "type", "id" and "variant" values
429     would cause an invocation of C<THAW> with 5 arguments:
430    
431     sub My::Object::FREEZE {
432     my ($self, $serialiser) = @_;
433    
434     ($self->{type}, $self->{id}, $self->{variant})
435     }
436    
437     sub My::Object::THAW {
438     my ($class, $serialiser, $type, $id, $variant) = @_;
439    
440     $class->new (type => $type, id => $id, variant => $variant)
441     }
442    
443 root 1.1
444 root 1.7 =head1 MAGIC HEADER
445 root 1.3
446     There is no reliable way to tell CBOR from other formats just by looking
447     at the data. To make such detection easier, the CBOR specification
448     defines a special "magic string" that can be
449     prepended to any CBOR string without changing its meaning.
450    
451     This string is available as C<$CBOR::XS::MAGIC>. This module does not
452     prepend this string to the CBOR data it generates, but it will ignore it
453     if present, so users can prepend this string as a "file type" indicator as
454     required.
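
As a sketch of how the magic string might be used as a file type marker, the following pure-Perl snippet takes the three octets that the source above assigns to C<$CBOR::XS::MAGIC> and shows that they form CBOR tag 55799. The helpers C<add_magic>, C<has_magic> and C<strip_magic> are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# same octets as $CBOR::XS::MAGIC in the source above
my $MAGIC = "\xd9\xd9\xf7";

# 0xd9 is major type 6 (tag) with a two-octet tag number following
my ($initial, $tag) = unpack "C n", $MAGIC;
printf "major type %d, tag %d\n", $initial >> 5, $tag;
# prints "major type 6, tag 55799"

# Hypothetical helpers for using the magic string as a file type marker.
sub add_magic   { $MAGIC . $_[0] }
sub has_magic   { index ($_[0], $MAGIC) == 0 }
sub strip_magic { has_magic ($_[0]) ? substr ($_[0], length $MAGIC) : $_[0] }

my $cbor_data = "\x83\x01\x02\x03";   # the CBOR array [1, 2, 3]
my $file_data = add_magic ($cbor_data);

print has_magic ($file_data) ? "looks like CBOR\n" : "unknown format\n";
```

Since the decoder ignores the tag, stripping it is only needed when the data is handed to a tool that does not understand it.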
455    
456    
457 root 1.7 =head1 CBOR and JSON
458 root 1.1
459 root 1.4 CBOR is supposed to implement a superset of the JSON data model, and is,
460     with some coercion, able to represent all JSON texts (something that other
461     "binary JSON" formats such as BSON generally do not support).
462    
463     CBOR implements some extra hints and support for JSON interoperability,
464     and the spec offers further guidance for conversion between CBOR and
465     JSON. None of this is currently implemented in CBOR::XS, and the guidelines
466     in the spec do not result in correct round-tripping of data. If JSON
467     interoperability is improved in the future, then the goal will be to
468     ensure that decoded JSON data will round-trip encoding and decoding to
469     CBOR intact.
470 root 1.1
471    
472     =head1 SECURITY CONSIDERATIONS
473    
474     When you are using CBOR in a protocol, talking to untrusted potentially
475     hostile creatures requires relatively few measures.
476    
477     First of all, your CBOR decoder should be secure, that is, should not have
478     any buffer overflows. Obviously, this module should ensure that and I am
479     trying hard on making that true, but you never know.
480    
481     Second, you need to avoid resource-starving attacks. That means you should
482     limit the size of CBOR data you accept, or make sure that when your
483     resources run out, that's just fine (e.g. by using a separate process that
484     can crash safely). The size of a CBOR string in octets is usually a good
485     indication of the size of the resources required to decode it into a Perl
486     structure. While CBOR::XS can check the size of the CBOR string, it might be
487     too late when you already have it in memory, so you might want to check
488     the size before you accept the string.
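
One way to check the size before the whole string is in memory is to stop reading as soon as the limit is exceeded. The following is a minimal sketch using only core Perl; C<read_limited> is a hypothetical helper, and the in-memory filehandle merely stands in for a socket or file.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from $fh, but give up as soon as
# more than $limit octets have arrived.
sub read_limited {
    my ($fh, $limit) = @_;

    my $buf = "";

    # ask for one octet more than allowed, so oversized input is noticed
    while ((my $want = $limit + 1 - length ($buf)) > 0) {
        my $got = read $fh, $buf, $want, length ($buf);
        die "read error: $!\n" unless defined $got;
        last unless $got;   # end of input
    }

    die "input exceeds $limit octets\n" if length ($buf) > $limit;

    return $buf;
}

my $message = "\x83\x01\x02\x03";   # the CBOR array [1, 2, 3]
open my $fh, "<", \$message or die "open: $!\n";

my $cbor_data = read_limited ($fh, 1024);
printf "accepted %d octets\n", length ($cbor_data);
```

At most C<$limit + 1> octets are ever buffered, no matter how much data the peer tries to send.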
489    
490     Third, CBOR::XS recurses using the C stack when decoding maps and
491     arrays. The C stack is a limited resource: for instance, on my amd64
492     machine with 8MB of stack size I can decode around 180k nested arrays but
493     only 14k nested CBOR maps (due to perl itself recursing deeply on croak
494     to free the temporary). If that is exceeded, the program crashes. To be
495     conservative, the default nesting limit is set to 512. If your process
496     has a smaller stack, you should adjust this setting accordingly with the
497     C<max_depth> method.
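
For the encoding direction, the nesting level of a Perl data structure can be measured before it is handed to the encoder. This is a sketch with a hypothetical helper name, C<nesting_depth>. It counts only unblessed hash and array references, following the definition given for C<max_depth>, and does not handle cyclic structures.

```perl
use strict;
use warnings;

# Hypothetical helper: nesting level of a Perl data structure, counting
# the hash and array references that enclose the deepest value. It uses an
# explicit work list instead of recursion, so deep input cannot exhaust
# the stack of the check itself.
sub nesting_depth {
    my ($data) = @_;

    my $max  = 0;
    my @todo = ([$data, 0]);

    while (my $item = pop @todo) {
        my ($value, $depth) = @$item;
        my $type = ref $value;

        next unless $type eq "ARRAY" || $type eq "HASH";

        $depth++;
        $max = $depth if $depth > $max;

        my @children = $type eq "ARRAY" ? @$value : values %$value;
        push @todo, map { [$_, $depth] } @children;
    }

    return $max;
}

print nesting_depth ({ a => [1, { b => 2 }] }), "\n";   # prints "3"

# a structure nested deeper than the default limit of 512
my $deep = 0;
$deep = [$deep] for 1 .. 1000;

print nesting_depth ($deep) > 512 ? "too deep\n" : "ok\n";   # prints "too deep"
```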
498    
499     Something else could bomb you, too, that I forgot to think of. In that
500     case, you get to keep the pieces. I am always open for hints, though...
501    
502     Also keep in mind that CBOR::XS might leak contents of your Perl data
503     structures in its error messages, so when you serialise sensitive
504     information you might want to make sure that exceptions thrown by CBOR::XS
505     will not end up in front of untrusted eyes.
506    
507     =head1 CBOR IMPLEMENTATION NOTES
508    
509     This section contains some random implementation notes. They do not
510     describe guaranteed behaviour, but merely behaviour as implemented
511     right now.
512    
513     64 bit integers are only properly decoded when Perl was built with 64 bit
514     support.
515    
516     Strings and arrays are encoded with a definite length. Hashes as well,
517     unless they are tied (or otherwise magical).
518    
519     Only the double data type is supported for NV data types - when Perl uses
520     long double to represent floating point values, they might not be encoded
521     properly. Half precision types are accepted, but not encoded.
522    
523     Strict mode and canonical mode are not implemented.
524    
525    
526     =head1 THREADS
527    
528     This module is I<not> guaranteed to be thread safe and there are no
529     plans to change this until Perl gets thread support (as opposed to the
530     horribly slow so-called "threads" which are simply slow and bloated
531     process simulations - use fork, it's I<much> faster, cheaper, better).
532    
533     (It might actually work, but you have been warned).
534    
535    
536     =head1 BUGS
537    
538     While the goal of this module is to be correct, that unfortunately does
539     not mean it's bug-free, only that I think its design is bug-free. If you
540     keep reporting bugs they will be fixed swiftly, though.
541    
542     Please refrain from using rt.cpan.org or any other bug reporting
543     service. I put the contact address into my modules for a reason.
544    
545     =cut
546    
547     XSLoader::load "CBOR::XS", $VERSION;
548    
549     =head1 SEE ALSO
550    
551     The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
552     serialisation.
553    
554 root 1.6 The L<Types::Serialiser> module provides the data model for true, false
555     and error values.
556    
557 root 1.1 =head1 AUTHOR
558    
559     Marc Lehmann <schmorp@schmorp.de>
560     http://home.schmorp.de/
561    
562     =cut
563    
564 root 1.6 1
565