/cvs/CBOR-XS/XS.pm
Revision: 1.7
Committed: Sun Oct 27 22:35:15 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
Changes since 1.6: +105 -16 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4    
5     =encoding utf-8
6    
7     =head1 SYNOPSIS
8    
9     use CBOR::XS;
10    
11     $binary_cbor_data = encode_cbor $perl_value;
12     $perl_value = decode_cbor $binary_cbor_data;
13    
14     # OO-interface
15    
16     $coder = CBOR::XS->new;
17 root 1.6 $binary_cbor_data = $coder->encode ($perl_value);
18     $perl_value = $coder->decode ($binary_cbor_data);
19    
20     # prefix decoding
21    
22     my $many_cbor_strings = ...;
23     while (length $many_cbor_strings) {
24     my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
25     # data was decoded
26     substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27     }
28 root 1.1
29     =head1 DESCRIPTION
30    
31 root 1.5 WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
32     AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
33     feature-limited, it might already be useful).
34    
35     This module converts Perl data structures to the Concise Binary Object
36     Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
37     format that aims to use a superset of the JSON data model, i.e. when you
38     can represent something in JSON, you should be able to represent it in
39     CBOR.
40 root 1.1
41 root 1.6 This makes it a faster and more compact binary alternative to JSON, with
42     the added ability to serialise Perl objects.
43 root 1.5
44     The primary goal of this module is to be I<correct> and the secondary goal
45     is to be I<fast>. To reach the latter goal it was written in C.
46 root 1.1
47     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
48     vice versa.
49    
50     =cut
51    
52     package CBOR::XS;
53    
54     use common::sense;
55    
56 root 1.5 our $VERSION = 0.03;
57 root 1.1 our @ISA = qw(Exporter);
58    
59     our @EXPORT = qw(encode_cbor decode_cbor);
60    
61     use Exporter;
62     use XSLoader;
63    
64 root 1.6 use Types::Serialiser;
65    
66 root 1.3 our $MAGIC = "\xd9\xd9\xf7";
67    
68 root 1.1 =head1 FUNCTIONAL INTERFACE
69    
70     The following convenience methods are provided by this module. They are
71     exported by default:
72    
73     =over 4
74    
75     =item $cbor_data = encode_cbor $perl_scalar
76    
77     Converts the given Perl data structure to CBOR representation. Croaks on
78     error.
79    
80     =item $perl_scalar = decode_cbor $cbor_data
81    
82     The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
83     returning the resulting perl scalar. Croaks on error.
84    
85     =back
86    
87    
88     =head1 OBJECT-ORIENTED INTERFACE
89    
90     The object oriented interface lets you configure your own encoding or
91     decoding style, within the limits of supported formats.
92    
93     =over 4
94    
95     =item $cbor = new CBOR::XS
96    
97     Creates a new CBOR::XS object that can be used to de/encode CBOR
98     strings. All boolean flags described below are by default I<disabled>.
99    
100     The mutators for flags all return the CBOR object again and thus calls can
101     be chained:
102    
103     # chain a flag mutator with encode
104     my $cbor_data = CBOR::XS->new->max_depth (16)->encode ({a => [1,2]});
105    
106     =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
107    
108     =item $max_depth = $cbor->get_max_depth
109    
110     Sets the maximum nesting level (default C<512>) accepted while encoding
111     or decoding. If a higher nesting level is detected in CBOR data or a Perl
112     data structure, then the encoder and decoder will stop and croak at that
113     point.
114    
115     Nesting level is defined by the number of hash- or arrayrefs that the
116     encoder needs to traverse to reach a given point, or the number of
117     enclosing CBOR arrays or maps that the decoder has entered when it
118     reaches a given item in the CBOR data.
119    
120     Setting the maximum depth to one disallows any nesting, so that ensures
121     that the object is only a single hash/object or array.
122    
123     If no argument is given, the highest possible setting will be used, which
124     is rarely useful.
125    
126     Note that nesting is implemented by recursion in C. The default value has
127     been chosen to be as large as typical operating systems allow without
128     crashing.
129    
130     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
131    
132     =item $cbor = $cbor->max_size ([$maximum_string_size])
133    
134     =item $max_size = $cbor->get_max_size
135    
136     Set the maximum length a CBOR string may have (in bytes) where decoding
137     is being attempted. The default is C<0>, meaning no limit. When C<decode>
138     is called on a string that is longer than this many bytes, it will not
139     attempt to decode the string but throw an exception. This setting has no
140     effect on C<encode> (yet).
141    
142     If no argument is given, the limit check will be deactivated (same as when
143     C<0> is specified).
144    
145     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
146    
147     =item $cbor_data = $cbor->encode ($perl_scalar)
148    
149     Converts the given Perl data structure (a scalar value) to its CBOR
150     representation.
151    
152     =item $perl_scalar = $cbor->decode ($cbor_data)
153    
154     The opposite of C<encode>: expects CBOR data and tries to parse it,
155     returning the resulting simple scalar or reference. Croaks on error.
156    
157     =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
158    
159     This works like the C<decode> method, but instead of raising an exception
160     when there is trailing garbage after the CBOR string, it will silently
161     stop parsing there and return the number of characters consumed so far.
162    
163     This is useful if your CBOR texts are not delimited by an outer protocol
164     and you need to know where the first CBOR string ends and the next one
165     starts.
166    
167     CBOR::XS->new->decode_prefix ("......")
168     => ("...", 3)
169    
170     =back
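The nesting level that C<max_depth> limits can be illustrated for Perl data with a small recursive walk. This is only a sketch of the definition given above, not the module's C implementation; C<nesting_depth> is a hypothetical helper, and it treats blessed objects as plain leaves.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CBOR::XS): returns the deepest
# nesting level of unblessed hash/array references in a Perl value.
sub nesting_depth {
    my ($value) = @_;

    my @children;
    if (ref ($value) eq 'ARRAY') {
        @children = @$value;
    } elsif (ref ($value) eq 'HASH') {
        @children = values %$value;
    } else {
        return 0;    # plain scalars (and anything else) add no level
    }

    my $deepest = 0;
    for my $child (@children) {
        my $depth = nesting_depth ($child);
        $deepest = $depth if $depth > $deepest;
    }

    return 1 + $deepest;    # this container plus its deepest child
}

printf "%d\n", nesting_depth (5);                # 0
printf "%d\n", nesting_depth ([1, 2]);           # 1
printf "%d\n", nesting_depth ({a => [1, 2]});    # 2
```

Under this definition, a maximum depth of one would accept C<[1, 2]> but reject C<< {a => [1, 2]} >>, matching the "single hash or array" rule described for C<max_depth>.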
171    
172    
173     =head1 MAPPING
174    
175     This section describes how CBOR::XS maps Perl values to CBOR values and
176     vice versa. These mappings are designed to "do the right thing" in most
177     circumstances automatically, preserving round-tripping characteristics
178     (what you put in comes out as something equivalent).
179    
180     For the more enlightened: note that in the following descriptions,
181     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
182     refers to the abstract Perl language itself.
183    
184    
185     =head2 CBOR -> PERL
186    
187     =over 4
188    
189 root 1.4 =item integers
190    
191     CBOR integers become (numeric) perl scalars. On perls without 64 bit
192     support, 64 bit integers will be truncated or otherwise corrupted.
193    
194     =item byte strings
195    
196     Byte strings will become octet strings in Perl (the byte values 0..255
197     will simply become characters of the same value in Perl).
198    
199     =item UTF-8 strings
200    
201     UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
202     decoded into proper Unicode code points. At the moment, the UTF-8
203     octets are not validated - corrupt input will result in
204     corrupted Perl strings.
205    
206     =item arrays, maps
207    
208     CBOR arrays and CBOR maps will be converted into references to a Perl
209     array or hash, respectively. The keys of the map will be stringified
210     during this process.
211    
212 root 1.6 =item null
213    
214     CBOR null becomes C<undef> in Perl.
215    
216     =item true, false, undefined
217 root 1.1
218 root 1.6 These CBOR values become C<Types::Serialiser::true>,
219     C<Types::Serialiser::false> and C<Types::Serialiser::error>,
220 root 1.1 respectively. They are overloaded to act almost exactly like the numbers
221 root 1.6 C<1> and C<0> (for true and false) or to throw an exception on access (for
222     error). See the L<Types::Serialiser> manpage for details.
223    
224     =item CBOR tag 256 (perl object)
225    
226 root 1.7 The tag value C<256> (TODO: pending iana registration) will be used
227     to deserialise a Perl object serialised with C<FREEZE>. See "OBJECT
228     SERIALISATION", below, for details.
229 root 1.1
230 root 1.6 =item CBOR tag 55799 (magic header)
231 root 1.4
232 root 1.6 The tag 55799 is ignored (this tag implements the magic header).
233 root 1.1
234 root 1.6 =item other CBOR tags
235 root 1.4
236 root 1.6 Tagged items consist of a numeric tag and another CBOR value. Tags not
237     handled internally are currently converted into a L<CBOR::XS::Tagged>
238     object, which is simply a blessed array reference consisting of the
239     numeric tag value followed by the (decoded) CBOR value.
240 root 1.4
241 root 1.6 In the future, support for user-supplied conversions might get added.
242 root 1.4
243     =item anything else
244    
245     Anything else (e.g. unsupported simple values) will raise a decoding
246     error.
247 root 1.1
248     =back
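Since a tag that is not handled internally arrives as a blessed array reference of the form C<[tag, value]>, calling code can unwrap it by index. The sketch below builds such an object by hand, so it runs without CBOR::XS installed; C<unwrap_tagged> and the example URL are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Stand-in for what the decoder is described to return for an unhandled
# tag: a blessed array reference holding the tag and the decoded value.
my $decoded = bless [32, "http://example.org/"], "CBOR::XS::Tagged";

# Hypothetical post-processing step: unwrap the tags we understand and
# pass everything else through unchanged.
sub unwrap_tagged {
    my ($item) = @_;

    return $item unless ref ($item) eq "CBOR::XS::Tagged";

    my ($tag, $value) = @$item;

    return $tag == 32
        ? "URI: $value"    # tag 32 marks a URI
        : $item;           # unknown tag: leave the object alone
}

print unwrap_tagged ($decoded), "\n";
```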
249    
250    
251     =head2 PERL -> CBOR
252    
253     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
254     truly typeless language, so we can only guess which CBOR type is meant by
255     a Perl value.
256    
257     =over 4
258    
259     =item hash references
260    
261 root 1.4 Perl hash references become CBOR maps. As there is no inherent ordering in
262     hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
263     order.
264    
265     Currently, tied hashes will use the indefinite-length format, while normal
266     hashes will use the fixed-length format.
267 root 1.1
268     =item array references
269    
270 root 1.4 Perl array references become fixed-length CBOR arrays.
271 root 1.1
272     =item other references
273    
274     Other unblessed references are generally not allowed and will cause an
275     exception to be thrown, except for references to the integers C<0> and
276 root 1.4 C<1>, which get turned into false and true in CBOR.
277    
278     =item CBOR::XS::Tagged objects
279    
280     Objects of this type must be arrays consisting of a single C<[tag, value]>
281     pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
282     encoded as appropriate for the value.
283 root 1.1
284 root 1.6 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
285 root 1.1
286 root 1.6 These special values become CBOR true, CBOR false and CBOR undefined
287     values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
288     if you want.
289 root 1.1
290 root 1.7 =item other blessed objects
291 root 1.1
292 root 1.7 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
293     "OBJECT SERIALISATION", below, for details.
294 root 1.1
295     =item simple scalars
296    
297     TODO
298     Simple Perl scalars (any scalar that is not a reference) are the most
299     difficult objects to encode: CBOR::XS will encode undefined scalars as
300 root 1.4 CBOR null values, scalars that have last been used in a string context
301 root 1.1 before encoding as CBOR strings, and anything else as number value:
302    
303     # dump as number
304     encode_cbor [2] # yields [2]
305     encode_cbor [-3.0e17] # yields [-3e+17]
306     my $value = 5; encode_cbor [$value] # yields [5]
307    
308     # used as string, so dump as string
309     print $value;
310     encode_cbor [$value] # yields ["5"]
311    
312     # undef becomes null
313     encode_cbor [undef] # yields [null]
314    
315     You can force the type to be a CBOR string by stringifying it:
316    
317     my $x = 3.1; # some variable containing a number
318     "$x"; # stringified
319     $x .= ""; # another, more awkward way to stringify
320     print $x; # perl does it for you, too, quite often
321    
322     You can force the type to be a CBOR number by numifying it:
323    
324     my $x = "3"; # some variable containing a string
325     $x += 0; # numify it, ensuring it will be dumped as a number
326     $x *= 1; # same thing, the choice is yours.
327    
328     You can not currently force the type in other, less obscure, ways. Tell me
329     if you need this capability (but don't forget to explain why it's needed
330     :).
331    
332 root 1.4 Perl values that seem to be integers generally use the shortest possible
333     representation. Floating-point values will use either the IEEE single
334     format if possible without loss of precision, otherwise the IEEE double
335     format will be used. Perls that use formats other than IEEE double to
336     represent numerical values are supported, but might suffer loss of
337     precision.
338 root 1.1
339     =back
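To make "shortest possible representation" concrete, here is a pure-Perl sketch of the unsigned integer encoding defined by RFC 7049, together with a check for whether a float survives IEEE single precision. Both C<encode_uint> and C<fits_single> are hypothetical helpers written for this illustration; they are not the module's code, and the 8-byte case assumes a perl with 64 bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative integer as a CBOR unsigned
# integer (major type 0) using the shortest form RFC 7049 allows.
sub encode_uint {
    my ($n) = @_;

    return pack "C",   $n       if $n < 24;       # value fits the initial byte
    return pack "CC",  0x18, $n if $n < 2**8;     # 1-byte argument
    return pack "Cn",  0x19, $n if $n < 2**16;    # 2-byte argument
    return pack "CN",  0x1a, $n if $n < 2**32;    # 4-byte argument
    return pack "CQ>", 0x1b, $n;                  # 8-byte argument
}

# Hypothetical helper: true if $x is unchanged by a round trip through
# the native single-precision format, i.e. no precision would be lost.
sub fits_single {
    my ($x) = @_;

    return unpack ("f", pack ("f", $x)) == $x;
}

printf "%-6d -> %s\n", $_, unpack ("H*", encode_uint ($_))
    for 5, 24, 500, 65536;

printf "%s -> %s\n", $_, fits_single ($_) ? "single" : "double"
    for 0.5, 0.1;
```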
340    
341 root 1.7 =head2 OBJECT SERIALISATION
342    
343     This module knows two ways to serialise a Perl object: The CBOR-specific
344     way, and the generic way.
345    
346     Whenever the encoder encounters a Perl object that it cannot serialise
347     directly (most of them), it will first look up the C<TO_CBOR> method on
348     it.
349    
350     If it has a C<TO_CBOR> method, it will call it with the object as only
351     argument, and expects exactly one return value, which it will then
352     encode in place of the object.
353    
354     Otherwise, it will look up the C<FREEZE> method. If it exists, it will
355     call it with the object as first argument, and the constant string C<CBOR>
356     as the second argument, to distinguish it from other serialisers.
357    
358     The C<FREEZE> method can return any number of values (i.e. zero or
359     more). These will be encoded as a CBOR perl object, together with the
360     classname.
361    
362     If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
363     with an error.
364    
365     Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
366     objects encoded via C<FREEZE> can be decoded using the following protocol:
367    
368     When an encoded CBOR perl object is encountered by the decoder, it will
369     look up the C<THAW> method, by using the stored classname, and will fail
370     if the method cannot be found.
371    
372     After the lookup it will call the C<THAW> method with the stored classname
373     as first argument, the constant string C<CBOR> as second argument, and all
374     values returned by C<FREEZE> as remaining arguments.
375    
376     =head4 EXAMPLES
377    
378     Here is an example C<TO_CBOR> method:
379    
380     sub My::Object::TO_CBOR {
381     my ($obj) = @_;
382    
383     ["this is a serialised My::Object object", $obj->{id}]
384     }
385    
386     When a C<My::Object> is encoded to CBOR, it will instead encode a simple
387     array with two members: a string, and the "object id". Decoding this CBOR
388     string will yield a normal perl array reference in place of the object.
389    
390     A more useful and practical example would be a serialisation method for
391     the URI module. CBOR has a custom tag value for URIs, namely 32:
392    
393     sub URI::TO_CBOR {
394     my ($self) = @_;
395     my $uri = "$self"; # stringify uri
396     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
397     CBOR::XS::tagged 32, $uri
398     }
399    
400     This will encode URIs as a UTF-8 string with tag 32, which indicates a
401     URI.
402    
403     Decoding such a URI will not (currently) give you a URI object, but
404     instead a CBOR::XS::Tagged object with tag number 32 and the string -
405     exactly what was returned by C<TO_CBOR>.
406    
407     To serialise an object so it can automatically be deserialised, you need
408     to use C<FREEZE> and C<THAW>. To take the URI module as example, this
409     would be a possible implementation:
410    
411     sub URI::FREEZE {
412     my ($self, $serialiser) = @_;
413     "$self" # encode url string
414     }
415    
416     sub URI::THAW {
417     my ($class, $serialiser, $uri) = @_;
418    
419     $class->new ($uri)
420     }
421    
422     Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
423     example, a C<FREEZE> method that returns "type", "id" and "variant" values
424     would cause an invocation of C<THAW> with 5 arguments:
425    
426     sub My::Object::FREEZE {
427     my ($self, $serialiser) = @_;
428    
429     ($self->{type}, $self->{id}, $self->{variant})
430     }
431    
432     sub My::Object::THAW {
433     my ($class, $serialiser, $type, $id, $variant) = @_;
434    
435     $class->new (type => $type, id => $id, variant => $variant)
436     }
437    
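The calling convention described in this section can be exercised by hand, without the encoder or decoder. The following sketch plays both roles for a hypothetical C<My::Point> class: it calls C<FREEZE> the way the encoder is described to, then looks up and calls C<THAW> the way the decoder is described to. It shows the protocol only, not the CBOR bytes that would be produced in between.

```perl
use strict;
use warnings;

package My::Point;    # hypothetical example class

sub new {
    my ($class, %args) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

# Called by the encoder as $obj->FREEZE ("CBOR").
sub FREEZE {
    my ($self, $serialiser) = @_;
    return ($self->{x}, $self->{y});
}

# Called by the decoder as $class->THAW ("CBOR", @frozen_values).
sub THAW {
    my ($class, $serialiser, $x, $y) = @_;
    return $class->new (x => $x, y => $y);
}

package main;

my $point = My::Point->new (x => 3, y => 4);

# Encoder side: remember the class name and the values FREEZE returns.
my $class  = ref ($point);
my @frozen = $point->FREEZE ("CBOR");

# Decoder side: find THAW via the stored class name, fail if missing.
my $thaw = $class->can ("THAW")
    or die "$class has no THAW method";
my $copy = $class->$thaw ("CBOR", @frozen);

printf "%s (%d, %d)\n", ref ($copy), $copy->{x}, $copy->{y};
```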
438 root 1.1
439 root 1.7 =head1 MAGIC HEADER
440 root 1.3
441     There is no way to distinguish CBOR from other formats
442     programmatically. To make it easier to distinguish CBOR from other
443     formats, the CBOR specification has a special "magic string" that can be
444     prepended to any CBOR string without changing its meaning.
445    
446     This string is available as C<$CBOR::XS::MAGIC>. This module does not
447     prepend this string to the CBOR data it generates, but it will ignore it
448     if present, so users can prepend this string as a "file type" indicator as
449     required.
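Prepending and detecting the magic string needs nothing beyond string operations. In the sketch below, the three octets are the value this module assigns to C<$CBOR::XS::MAGIC>, repeated locally so the example runs without the module loaded; C<add_magic> and C<has_magic> are hypothetical helper names.

```perl
use strict;
use warnings;

# Same three octets as $CBOR::XS::MAGIC (CBOR tag 55799).
my $MAGIC = "\xd9\xd9\xf7";

# Hypothetical helpers for writing and sniffing the "file type" marker.
sub add_magic { return $MAGIC . $_[0] }
sub has_magic { return substr ($_[0], 0, length ($MAGIC)) eq $MAGIC }

# "\x83\x01\x02\x03" is the CBOR encoding of the array [1, 2, 3].
my $payload = "\x83\x01\x02\x03";
my $marked  = add_magic ($payload);

printf "payload: %s\n", has_magic ($payload) ? "magic" : "no magic";
printf "marked:  %s\n", has_magic ($marked)  ? "magic" : "no magic";
```

Because the decoder ignores the tag, a marked string should decode to the same value as the unmarked one.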
450    
451    
452 root 1.7 =head1 CBOR and JSON
453 root 1.1
454 root 1.4 CBOR is supposed to implement a superset of the JSON data model, and is,
455     with some coercion, able to represent all JSON texts (something that other
456     "binary JSON" formats such as BSON generally do not support).
457    
458     CBOR implements some extra hints and support for JSON interoperability,
459     and the spec offers further guidance for conversion between CBOR and
460     JSON. None of this is currently implemented in CBOR::XS, and the guidelines
461     in the spec do not result in correct round-tripping of data. If JSON
462     interoperability is improved in the future, then the goal will be to
463     ensure that decoded JSON data will round-trip encoding and decoding to
464     CBOR intact.
465 root 1.1
466    
467     =head1 SECURITY CONSIDERATIONS
468    
469     When you are using CBOR in a protocol, talking to untrusted potentially
470     hostile creatures requires relatively few measures.
471    
472     First of all, your CBOR decoder should be secure, that is, should not have
473     any buffer overflows. Obviously, this module should ensure that and I am
474     trying hard on making that true, but you never know.
475    
476     Second, you need to avoid resource-starving attacks. That means you should
477     limit the size of CBOR data you accept, or make sure that when your
478     resources run out, that's just fine (e.g. by using a separate process that
479     can crash safely). The size of a CBOR string in octets is usually a good
480     indication of the size of the resources required to decode it into a Perl
481     structure. While CBOR::XS can check the size of the CBOR text, it might be
482     too late when you already have it in memory, so you might want to check
483     the size before you accept the string.
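One way to check the size before accepting the string is to cap the amount read from the input handle in the first place. The following is a minimal sketch under that assumption; C<read_limited> is a hypothetical helper, the 4096-octet chunk size is arbitrary, and an in-memory filehandle stands in for a socket or file.

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh until EOF, but give up as soon as
# more than $max_octets have arrived, so at most one extra chunk is
# ever held in memory.
sub read_limited {
    my ($fh, $max_octets) = @_;

    my $data = "";
    while (length ($data) <= $max_octets) {
        my $got = read $fh, my $chunk, 4096;
        die "read error: $!\n" unless defined $got;
        last unless $got;    # EOF
        $data .= $chunk;
    }

    die "CBOR data exceeds $max_octets octets\n"
        if length ($data) > $max_octets;

    return $data;
}

my $input = "\x83\x01\x02\x03";    # CBOR array [1, 2, 3]
open my $fh, "<", \$input or die "cannot open in-memory handle: $!";
binmode $fh;

my $cbor_data = read_limited ($fh, 1024);
printf "accepted %d octets\n", length ($cbor_data);
```

The accepted string can then be passed to C<decode_cbor> or C<< $cbor->decode >>; C<max_size> remains useful as a second check inside the decoder.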
484    
485     Third, CBOR::XS recurses using the C stack when decoding objects and
486     arrays. The C stack is a limited resource: for instance, on my amd64
487     machine with 8MB of stack size I can decode around 180k nested arrays but
488     only 14k nested CBOR objects (due to perl itself recursing deeply on croak
489     to free the temporary). If that is exceeded, the program crashes. To be
490     conservative, the default nesting limit is set to 512. If your process
491     has a smaller stack, you should adjust this setting accordingly with the
492     C<max_depth> method.
493    
494     Something else could bomb you, too, that I forgot to think of. In that
495     case, you get to keep the pieces. I am always open for hints, though...
496    
497     Also keep in mind that CBOR::XS might leak contents of your Perl data
498     structures in its error messages, so when you serialise sensitive
499     information you might want to make sure that exceptions thrown by CBOR::XS
500     will not end up in front of untrusted eyes.
501    
502     =head1 CBOR IMPLEMENTATION NOTES
503    
504     This section contains some random implementation notes. They do not
505     describe guaranteed behaviour, but merely the behaviour as implemented
506     right now.
507    
508     64 bit integers are only properly decoded when Perl was built with 64 bit
509     support.
510    
511     Strings and arrays are encoded with a definite length. Hashes as well,
512     unless they are tied (or otherwise magical).
513    
514     Only the double data type is supported for NV data types - when Perl uses
515     long double to represent floating point values, they might not be encoded
516     properly. Half precision types are accepted, but not encoded.
517    
518     Strict mode and canonical mode are not implemented.
519    
520    
521     =head1 THREADS
522    
523     This module is I<not> guaranteed to be thread safe and there are no
524     plans to change this until Perl gets thread support (as opposed to the
525     horribly slow so-called "threads" which are simply slow and bloated
526     process simulations - use fork, it's I<much> faster, cheaper, better).
527    
528     (It might actually work, but you have been warned).
529    
530    
531     =head1 BUGS
532    
533     While the goal of this module is to be correct, that unfortunately does
534     not mean it's bug-free, only that I think its design is bug-free. If you
535     keep reporting bugs they will be fixed swiftly, though.
536    
537     Please refrain from using rt.cpan.org or any other bug reporting
538     service. I put the contact address into my modules for a reason.
539    
540     =cut
541    
542     XSLoader::load "CBOR::XS", $VERSION;
543    
544     =head1 SEE ALSO
545    
546     The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
547     serialisation.
548    
549 root 1.6 The L<Types::Serialiser> module provides the data model for true, false
550     and error values.
551    
552 root 1.1 =head1 AUTHOR
553    
554     Marc Lehmann <schmorp@schmorp.de>
555     http://home.schmorp.de/
556    
557     =cut
558    
559 root 1.6 1
560