/cvs/CBOR-XS/XS.pm
Revision: 1.13
Committed: Tue Oct 29 15:56:31 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
CVS Tags: rel-0_06
Changes since 1.12: +31 -3 lines
Log Message:
0.06

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4    
5     =encoding utf-8
6    
7     =head1 SYNOPSIS
8    
9     use CBOR::XS;
10    
11     $binary_cbor_data = encode_cbor $perl_value;
12     $perl_value = decode_cbor $binary_cbor_data;
13    
14     # OO-interface
15    
16     $coder = CBOR::XS->new;
17 root 1.6 $binary_cbor_data = $coder->encode ($perl_value);
18     $perl_value = $coder->decode ($binary_cbor_data);
19    
20     # prefix decoding
21    
22     my $many_cbor_strings = ...;
23     while (length $many_cbor_strings) {
24     my ($data, $length) = $cbor->decode_prefix ($many_cbor_strings);
25     # data was decoded
26     substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27     }
28 root 1.1
29     =head1 DESCRIPTION
30    
31 root 1.9 WARNING! This module is very new, and not very well tested (that's up to
32     you to do). Furthermore, details of the implementation might change freely
33     before version 1.0. And lastly, the object serialisation protocol depends
34     on a pending IANA assignment, and until that assignment is official, this
35     implementation is not interoperable with other implementations (or even
36     future versions of this module).
37    
38     You are still invited to try out CBOR, and this module.
39 root 1.5
40     This module converts Perl data structures to the Concise Binary Object
41     Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42     format that aims to use a superset of the JSON data model, i.e. when you
43     can represent something in JSON, you should be able to represent it in
44     CBOR.
45 root 1.1
46 root 1.9 In short, CBOR is a faster and very compact binary alternative to JSON,
47 root 1.10 with the added ability of supporting serialisation of Perl objects. (JSON
48     often compresses better than CBOR though, so if you plan to compress the
49     data later you might want to compare both formats first).
50 root 1.5
51     The primary goal of this module is to be I<correct> and the secondary goal
52     is to be I<fast>. To reach the latter goal it was written in C.
53 root 1.1
54     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
55     vice versa.
56    
57     =cut
58    
59     package CBOR::XS;
60    
61     use common::sense;
62    
63 root 1.13 our $VERSION = 0.06;
64 root 1.1 our @ISA = qw(Exporter);
65    
66     our @EXPORT = qw(encode_cbor decode_cbor);
67    
68     use Exporter;
69     use XSLoader;
70    
71 root 1.6 use Types::Serialiser;
72    
73 root 1.3 our $MAGIC = "\xd9\xd9\xf7";
74    
75 root 1.1 =head1 FUNCTIONAL INTERFACE
76    
77     The following convenience functions are provided by this module. They are
78     exported by default:
79    
80     =over 4
81    
82     =item $cbor_data = encode_cbor $perl_scalar
83    
84     Converts the given Perl data structure to CBOR representation. Croaks on
85     error.
86    
87     =item $perl_scalar = decode_cbor $cbor_data
88    
89     The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
90     returning the resulting perl scalar. Croaks on error.
91    
92     =back
93    
94    
95     =head1 OBJECT-ORIENTED INTERFACE
96    
97     The object oriented interface lets you configure your own encoding or
98     decoding style, within the limits of supported formats.
99    
100     =over 4
101    
102     =item $cbor = new CBOR::XS
103    
104     Creates a new CBOR::XS object that can be used to de/encode CBOR
105     strings. All boolean flags described below are by default I<disabled>.
106    
107     The mutators for flags all return the CBOR object again and thus calls can
108     be chained:
109    
110     #TODO
111     my $cbor_data = CBOR::XS->new->max_depth (64)->encode ({a => [1,2]});
112    
113     =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
114    
115     =item $max_depth = $cbor->get_max_depth
116    
117     Sets the maximum nesting level (default C<512>) accepted while encoding
118     or decoding. If a higher nesting level is detected in CBOR data or a Perl
119     data structure, then the encoder and decoder will stop and croak at that
120     point.
121    
122     Nesting level is defined by the number of hash- or arrayrefs that the
123     encoder needs to traverse to reach a given point, or by the number of
124     CBOR arrays or maps that the decoder has entered, and not yet left, when
125     it reaches a given item in the CBOR data.
126    
127     Setting the maximum depth to one disallows any nesting, which ensures
128     that the value is only a single hash/map or array.
129    
130     If no argument is given, the highest possible setting will be used, which
131     is rarely useful.
132    
133     Note that nesting is implemented by recursion in C. The default value has
134     been chosen to be as large as typical operating systems allow without
135     crashing.
136    
137     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
138    
139     =item $cbor = $cbor->max_size ([$maximum_string_size])
140    
141     =item $max_size = $cbor->get_max_size
142    
143     Set the maximum length a CBOR string may have (in bytes) where decoding
144     is being attempted. The default is C<0>, meaning no limit. When C<decode>
145     is called on a string that is longer than this many bytes, it will not
146     attempt to decode the string but throw an exception. This setting has no
147     effect on C<encode> (yet).
148    
149     If no argument is given, the limit check will be deactivated (same as when
150     C<0> is specified).
151    
152     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
153    
154     =item $cbor_data = $cbor->encode ($perl_scalar)
155    
156     Converts the given Perl data structure (a scalar value) to its CBOR
157     representation.
158    
159     =item $perl_scalar = $cbor->decode ($cbor_data)
160    
161     The opposite of C<encode>: expects CBOR data and tries to parse it,
162     returning the resulting simple scalar or reference. Croaks on error.
163    
164     =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
165    
166     This works like the C<decode> method, but instead of raising an exception
167     when there is trailing garbage after the CBOR string, it will silently
168     stop parsing there and return the number of octets consumed so far.
169    
170     This is useful if your CBOR strings are not delimited by an outer protocol
171     and you need to know where the first CBOR string ends and the next one
172     starts.
173    
174     CBOR::XS->new->decode_prefix ("......")
175     => ("...", 3)
176    
177     =back
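
As a hedged illustration of the methods above, the following sketch chains the two mutators and then uses C<decode_prefix> to split a buffer that holds several concatenated CBOR strings. The limits (16 levels, 4096 bytes) and the sample values are arbitrary assumptions made for this example, not recommendations.

```perl
use strict;
use warnings;
use CBOR::XS;

# Mutators return the coder object, so they can be chained.
my $coder = CBOR::XS->new->max_depth (16)->max_size (4096);

# Build a buffer of three CBOR strings with no delimiter between them.
my $stream = join "", map { $coder->encode ($_) } [1, 2], "three", { four => 4 };

my @values;
while (length $stream) {
    # decode_prefix returns the decoded value and the number of octets used.
    my ($value, $octets) = $coder->decode_prefix ($stream);
    push @values, $value;
    substr $stream, 0, $octets, "";   # drop the part that was just decoded
}

printf "decoded %d values\n", scalar @values;
```

Because the octet count is returned alongside the value, no outer framing is needed to find where one CBOR string ends and the next begins.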
178    
179    
180     =head1 MAPPING
181    
182     This section describes how CBOR::XS maps Perl values to CBOR values and
183     vice versa. These mappings are designed to "do the right thing" in most
184     circumstances automatically, preserving round-tripping characteristics
185     (what you put in comes out as something equivalent).
186    
187     For the more enlightened: note that in the following descriptions,
188     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
189     refers to the abstract Perl language itself.
190    
191    
192     =head2 CBOR -> PERL
193    
194     =over 4
195    
196 root 1.4 =item integers
197    
198     CBOR integers become (numeric) perl scalars. On perls without 64 bit
199     support, 64 bit integers will be truncated or otherwise corrupted.
200    
201     =item byte strings
202    
203     Byte strings will become octet strings in Perl (the byte values 0..255
204     will simply become characters of the same value in Perl).
205    
206     =item UTF-8 strings
207    
208     UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
209     decoded into proper Unicode code points. At the moment, the validity of
210     the UTF-8 octets is not checked - corrupt input will result in
211     corrupted Perl strings.
212    
213     =item arrays, maps
214    
215     CBOR arrays and CBOR maps will be converted into references to a Perl
216     array or hash, respectively. The keys of the map will be stringified
217     during this process.
218    
219 root 1.6 =item null
220    
221     CBOR null becomes C<undef> in Perl.
222    
223     =item true, false, undefined
224 root 1.1
225 root 1.6 These CBOR values become C<Types::Serialiser::true>,
226     C<Types::Serialiser::false> and C<Types::Serialiser::error>,
227 root 1.1 respectively. They are overloaded to act almost exactly like the numbers
228 root 1.6 C<1> and C<0> (for true and false) or to throw an exception on access (for
229     error). See the L<Types::Serialiser> manpage for details.
230    
231     =item CBOR tag 256 (perl object)
232    
233 root 1.7 The tag value C<256> (TODO: pending iana registration) will be used
234 root 1.11 to deserialise a Perl object serialised with C<FREEZE>. See L<OBJECT
235     SERIALISATION>, below, for details.
236 root 1.1
237 root 1.6 =item CBOR tag 55799 (magic header)
238 root 1.4
239 root 1.6 The tag 55799 is ignored (this tag implements the magic header).
240 root 1.1
241 root 1.6 =item other CBOR tags
242 root 1.4
243 root 1.6 Tagged items consist of a numeric tag and another CBOR value. Tags not
244     handled internally are currently converted into a L<CBOR::XS::Tagged>
245     object, which is simply a blessed array reference consisting of the
246     numeric tag value followed by the (decoded) CBOR value.
247 root 1.4
248 root 1.6 In the future, support for user-supplied conversions might get added.
249 root 1.4
250     =item anything else
251    
252     Anything else (e.g. unsupported simple values) will raise a decoding
253     error.
254 root 1.1
255     =back
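
The following sketch shows a few of the decoding rules above on hand-written CBOR octets. The octet values follow RFC 7049; the tag number 1234 is an arbitrary choice for illustration, picked only because it is not one of the tags this module handles internally.

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

# CBOR simple values: 0xf5 is true, 0xf4 is false, 0xf6 is null.
my $true  = decode_cbor "\xf5";
my $false = decode_cbor "\xf4";
my $null  = decode_cbor "\xf6";

print "true is a boolean object\n" if Types::Serialiser::is_bool ($true);
print "false is false\n"           unless $false;
print "null is undef\n"            unless defined $null;

# Tag 1234 (0xd9 0x04 0xd2) wrapping the integer 1 (0x01).
my $tagged = decode_cbor "\xd9\x04\xd2\x01";
printf "%s: tag %d, value %d\n", ref ($tagged), $tagged->tag, $tagged->value;
```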
256    
257    
258     =head2 PERL -> CBOR
259    
260     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
261     truly typeless language, so we can only guess which CBOR type is meant by
262     a Perl value.
263    
264     =over 4
265    
266     =item hash references
267    
268 root 1.4 Perl hash references become CBOR maps. As there is no inherent ordering in
269     hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
270     order.
271    
272     Currently, tied hashes will use the indefinite-length format, while normal
273     hashes will use the fixed-length format.
274 root 1.1
275     =item array references
276    
277 root 1.4 Perl array references become fixed-length CBOR arrays.
278 root 1.1
279     =item other references
280    
281     Other unblessed references are generally not allowed and will cause an
282     exception to be thrown, except for references to the integers C<0> and
283 root 1.4 C<1>, which get turned into false and true in CBOR.
284    
285     =item CBOR::XS::Tagged objects
286    
287     Objects of this type must be arrays consisting of a single C<[tag, value]>
288 root 1.13 pair. The (numerical) tag will be encoded as a CBOR tag, the value will
289     be encoded as appropriate for the value. You can use C<CBOR::XS::tag> to
290     create such objects.
291 root 1.1
292 root 1.6 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
293 root 1.1
294 root 1.6 These special values become CBOR true, CBOR false and CBOR undefined
295     values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
296     if you want.
297 root 1.1
298 root 1.7 =item other blessed objects
299 root 1.1
300 root 1.7 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
301 root 1.11 L<OBJECT SERIALISATION>, below, for details.
302 root 1.1
303     =item simple scalars
304    
305     TODO
306     Simple Perl scalars (any scalar that is not a reference) are the most
307     difficult objects to encode: CBOR::XS will encode undefined scalars as
308 root 1.4 CBOR null values, scalars that have last been used in a string context
309 root 1.1 before encoding as CBOR strings, and anything else as number value:
310    
311     # dump as number
312     encode_cbor [2] # yields [2]
313     encode_cbor [-3.0e17] # yields [-3e+17]
314     my $value = 5; encode_cbor [$value] # yields [5]
315    
316     # used as string, so dump as string
317     print $value;
318     encode_cbor [$value] # yields ["5"]
319    
320     # undef becomes null
321     encode_cbor [undef] # yields [null]
322    
323     You can force the type to be a CBOR string by stringifying it:
324    
325     my $x = 3.1; # some variable containing a number
326     "$x"; # stringified
327     $x .= ""; # another, more awkward way to stringify
328     print $x; # perl does it for you, too, quite often
329    
330     You can force the type to be a CBOR number by numifying it:
331    
332     my $x = "3"; # some variable containing a string
333     $x += 0; # numify it, ensuring it will be dumped as a number
334     $x *= 1; # same thing, the choice is yours.
335    
336     You can not currently force the type in other, less obscure, ways. Tell me
337     if you need this capability (but don't forget to explain why it's needed
338     :).
339    
340 root 1.4 Perl values that seem to be integers generally use the shortest possible
341     representation. Floating-point values will use either the IEEE single
342     format if possible without loss of precision, otherwise the IEEE double
343     format will be used. Perls that use formats other than IEEE double to
344     represent numerical values are supported, but might suffer loss of
345     precision.
346 root 1.1
347     =back
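
To make "shortest possible representation" concrete, here is a pure-Perl sketch of how RFC 7049 sizes a non-negative integer (major type 0). The helper C<shortest_uint> is hypothetical and is not part of this module; it only mirrors the rule from the specification, and its last branch assumes a perl built with 64 bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CBOR::XS: encode a non-negative integer
# using the shortest form that RFC 7049 defines for major type 0.
sub shortest_uint {
    my ($n) = @_;
    return pack "C",   $n       if $n < 24;      # value fits in the initial byte
    return pack "CC",  0x18, $n if $n < 2**8;    # one-byte argument
    return pack "Cn",  0x19, $n if $n < 2**16;   # two-byte big-endian argument
    return pack "CN",  0x1a, $n if $n < 2**32;   # four-byte big-endian argument
    return pack "CQ>", 0x1b, $n;                 # eight-byte argument (64 bit perl)
}

# Print each sample value next to the hex dump of its encoding.
printf "%-8d => %s\n", $_, unpack ("H*", shortest_uint ($_))
    for 0, 23, 24, 1000, 1_000_000;
```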
348    
349 root 1.7 =head2 OBJECT SERIALISATION
350    
351     This module knows two ways to serialise a Perl object: The CBOR-specific
352     way, and the generic way.
353    
354     Whenever the encoder encounters a Perl object that it cannot serialise
355     directly (most of them), it will first look up the C<TO_CBOR> method on
356     it.
357    
358     If it has a C<TO_CBOR> method, it will call it with the object as only
359     argument, and expects exactly one return value, which it will then
360     encode in place of the object.
361    
362     Otherwise, it will look up the C<FREEZE> method. If it exists, it will
363     call it with the object as first argument, and the constant string C<CBOR>
364     as the second argument, to distinguish it from other serialisers.
365    
366     The C<FREEZE> method can return any number of values (i.e. zero or
367     more). These will be encoded as a CBOR perl object, together with the
368     classname.
369    
370     If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
371     with an error.
372    
373     Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
374     objects encoded via C<FREEZE> can be decoded using the following protocol:
375    
376     When an encoded CBOR perl object is encountered by the decoder, it will
377     look up the C<THAW> method, by using the stored classname, and will fail
378     if the method cannot be found.
379    
380     After the lookup it will call the C<THAW> method with the stored classname
381     as first argument, the constant string C<CBOR> as second argument, and all
382     values returned by C<FREEZE> as remaining arguments.
383    
384     =head4 EXAMPLES
385    
386     Here is an example C<TO_CBOR> method:
387    
388     sub My::Object::TO_CBOR {
389     my ($obj) = @_;
390    
391     ["this is a serialised My::Object object", $obj->{id}]
392     }
393    
394     When a C<My::Object> is encoded to CBOR, it will instead encode a simple
395     array with two members: a string, and the "object id". Decoding this CBOR
396     string will yield a normal perl array reference in place of the object.
397    
398     A more useful and practical example would be a serialisation method for
399     the URI module. CBOR has a custom tag value for URIs, namely 32:
400    
401     sub URI::TO_CBOR {
402     my ($self) = @_;
403     my $uri = "$self"; # stringify uri
404     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
405     CBOR::XS::tag 32, $uri
406     }
407    
408     This will encode URIs as a UTF-8 string with tag 32, which indicates a
409     URI.
410    
411     Decoding such a URI will not (currently) give you a URI object, but
412     instead a CBOR::XS::Tagged object with tag number 32 and the string -
413     exactly what was returned by C<TO_CBOR>.
414    
415     To serialise an object so it can automatically be deserialised, you need
416     to use C<FREEZE> and C<THAW>. To take the URI module as example, this
417     would be a possible implementation:
418    
419     sub URI::FREEZE {
420     my ($self, $serialiser) = @_;
421     "$self" # encode url string
422     }
423    
424     sub URI::THAW {
425     my ($class, $serialiser, $uri) = @_;
426    
427     $class->new ($uri)
428     }
429    
430     Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
431     example, a C<FREEZE> method that returns "type", "id" and "variant" values
432     would cause an invocation of C<THAW> with 5 arguments:
433    
434     sub My::Object::FREEZE {
435     my ($self, $serialiser) = @_;
436    
437     ($self->{type}, $self->{id}, $self->{variant})
438     }
439    
440     sub My::Object::THAW {
441     my ($class, $serialiser, $type, $id, $variant) = @_;
442    
443     $class->new (type => $type, id => $id, variant => $variant)
444     }
445    
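
The calling convention described above can also be exercised by hand, without involving the encoder at all. The sketch below uses a hypothetical C<My::Point> class and simply performs the two calls that the text says the encoder and decoder make; it is an illustration of the protocol, not of the module's internals.

```perl
use strict;
use warnings;

package My::Point;   # hypothetical example class

sub new {
    my ($class, %args) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

# Called by the encoder as $obj->FREEZE ("CBOR"); returns the values to store.
sub FREEZE {
    my ($self, $serialiser) = @_;
    return ($self->{x}, $self->{y});
}

# Called by the decoder as $class->THAW ("CBOR", @stored_values).
sub THAW {
    my ($class, $serialiser, $x, $y) = @_;
    return $class->new (x => $x, y => $y);
}

package main;

my $point = My::Point->new (x => 3, y => 4);

# What the encoder records: the class name plus whatever FREEZE returned.
my ($classname, @frozen) = (ref ($point), $point->FREEZE ("CBOR"));

# What the decoder does with that record.
my $copy = $classname->THAW ("CBOR", @frozen);

printf "%s %d,%d\n", ref ($copy), $copy->{x}, $copy->{y};
```

Testing a class this way is a cheap check that C<FREEZE> and C<THAW> agree on the number and order of values before any CBOR data is produced.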
446 root 1.1
447 root 1.7 =head1 MAGIC HEADER
448 root 1.3
449     There is no way to distinguish CBOR from other formats
450     programmatically. To make it easier to distinguish CBOR from other
451     formats, the CBOR specification has a special "magic string" that can be
452     prepended to any CBOR string without changing its meaning.
453    
454     This string is available as C<$CBOR::XS::MAGIC>. This module does not
455     prepend this string to the CBOR data it generates, but it will ignore it
456     if present, so users can prepend this string as a "file type" indicator as
457     required.
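
A minimal sketch of using the magic string as a "file type" indicator follows. It is pure Perl and copies the three octets of C<$CBOR::XS::MAGIC> into a local variable so it runs on its own; the helper name C<looks_like_cbor> is hypothetical.

```perl
use strict;
use warnings;

# Same three octets as $CBOR::XS::MAGIC (the encoding of tag 55799).
my $MAGIC = "\xd9\xd9\xf7";

# Hypothetical helper: true if the buffer starts with the CBOR magic string.
sub looks_like_cbor {
    my ($octets) = @_;
    return substr ($octets, 0, length $MAGIC) eq $MAGIC;
}

my $payload = "\x83\x01\x02\x03";   # CBOR encoding of [1, 2, 3]
my $file    = $MAGIC . $payload;    # prepend the indicator

for my $buffer ($file, $payload) {
    my $verdict = looks_like_cbor ($buffer) ? "has magic header" : "no magic header";
    print "$verdict\n";
}
```

Note that the absence of the header says nothing: data without it may still be valid CBOR.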
458    
459    
460 root 1.12 =head1 THE CBOR::XS::Tagged CLASS
461    
462     CBOR has the concept of tagged values - any CBOR value can be tagged with
463     a 64 bit tag number; these tag numbers are centrally administered.
464    
465     C<CBOR::XS> handles a few tags internally when en- or decoding. You can
466     also create tags yourself by encoding C<CBOR::XS::Tagged> objects, and the
467     decoder will create C<CBOR::XS::Tagged> objects itself when it hits an
468     unknown tag.
469    
470     These objects are simply blessed array references - the first member of
471     the array being the numerical tag, the second being the value.
472    
473     You can interact with C<CBOR::XS::Tagged> objects in the following ways:
474    
475     =over 4
476    
477     =item $tagged = CBOR::XS::tag $tag, $value
478    
479     This function(!) creates a new C<CBOR::XS::Tagged> object using the given
480     C<$tag> (0..2**64-1) to tag the given C<$value> (which can be any Perl
481     value that can be encoded in CBOR, including serialisable Perl objects and
482     C<CBOR::XS::Tagged> objects).
483    
484     =item $tagged->[0]
485    
486     =item $tagged->[0] = $new_tag
487    
488     =item $tag = $tagged->tag
489    
490     =item $new_tag = $tagged->tag ($new_tag)
491    
492     Access/mutate the tag.
493    
494     =item $tagged->[1]
495    
496     =item $tagged->[1] = $new_value
497    
498     =item $value = $tagged->value
499    
500     =item $new_value = $tagged->value ($new_value)
501    
502     Access/mutate the tagged value.
503    
504     =back
505    
506     =cut
507    
508     sub tag($$) {
509     bless [@_], CBOR::XS::Tagged::;
510     }
511    
512     sub CBOR::XS::Tagged::tag {
513     $_[0][0] = $_[1] if $#_;
514     $_[0][0]
515     }
516    
517     sub CBOR::XS::Tagged::value {
518     $_[0][1] = $_[1] if $#_;
519     $_[0][1]
520     }
521    
522 root 1.13 =head2 EXAMPLES
523    
524     Here are some examples of C<CBOR::XS::Tagged> uses to tag objects.
525    
526     You can look up CBOR tag values and meanings in the IANA registry at
527     L<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
528    
529     Prepend a magic header (C<$CBOR::XS::MAGIC>):
530    
531     my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
532     # same as:
533     my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
534    
535     Serialise some URIs and a regex in an array:
536    
537     my $cbor = encode_cbor [
538     (CBOR::XS::tag 32, "http://www.nethype.de/"),
539     (CBOR::XS::tag 32, "http://software.schmorp.de/"),
540     (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
541     ];
542    
543     Wrap CBOR data in CBOR:
544    
545     my $cbor_cbor = encode_cbor
546     CBOR::XS::tag 24,
547     encode_cbor [1, 2, 3];
548    
549 root 1.7 =head1 CBOR and JSON
550 root 1.1
551 root 1.4 CBOR is supposed to implement a superset of the JSON data model, and is,
552     with some coercion, able to represent all JSON texts (something that other
553     "binary JSON" formats such as BSON generally do not support).
554    
555     CBOR implements some extra hints and support for JSON interoperability,
556     and the spec offers further guidance for conversion between CBOR and
557     JSON. None of this is currently implemented in CBOR::XS, and the guidelines
558     in the spec do not result in correct round-tripping of data. If JSON
559     interoperability is improved in the future, then the goal will be to
560     ensure that decoded JSON data will round-trip encoding and decoding to
561     CBOR intact.
562 root 1.1
563    
564     =head1 SECURITY CONSIDERATIONS
565    
566     When you are using CBOR in a protocol, talking to untrusted potentially
567     hostile creatures requires relatively few measures.
568    
569     First of all, your CBOR decoder should be secure, that is, should not have
570     any buffer overflows. Obviously, this module should ensure that and I am
571     trying hard on making that true, but you never know.
572    
573     Second, you need to avoid resource-starving attacks. That means you should
574     limit the size of CBOR data you accept, or make sure that when your
575     resources run out, that's just fine (e.g. by using a separate process that
576     can crash safely). The size of a CBOR string in octets is usually a good
577     indication of the size of the resources required to decode it into a Perl
578     structure. While CBOR::XS can check the size of the CBOR string, it might be
579     too late when you already have it in memory, so you might want to check
580     the size before you accept the string.
581    
582     Third, CBOR::XS recurses using the C stack when decoding maps and
583     arrays. The C stack is a limited resource: for instance, on my amd64
584     machine with 8MB of stack size I can decode around 180k nested arrays but
585     only 14k nested CBOR maps (due to perl itself recursing deeply on croak
586     to free the temporary). If that is exceeded, the program crashes. To be
587     conservative, the default nesting limit is set to 512. If your process
588     has a smaller stack, you should adjust this setting accordingly with the
589     C<max_depth> method.
590    
591     Something else could bomb you, too, that I forgot to think of. In that
592     case, you get to keep the pieces. I am always open for hints, though...
593    
594     Also keep in mind that CBOR::XS might leak contents of your Perl data
595     structures in its error messages, so when you serialise sensitive
596     information you might want to make sure that exceptions thrown by CBOR::XS
597     will not end up in front of untrusted eyes.
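
Taken together, the points above suggest a small defensive wrapper around C<decode>. The sketch below is one possible shape, not a vetted security measure: the limits (32 levels, 64 KiB) and the helper name C<decode_untrusted> are assumptions made for this example.

```perl
use strict;
use warnings;
use CBOR::XS;

# Limits are assumptions for this sketch; tune them to your protocol.
my $MAX_OCTETS = 64 * 1024;
my $decoder    = CBOR::XS->new->max_depth (32)->max_size ($MAX_OCTETS);

# Hypothetical wrapper: returns (1, $value) on success and (0) on failure.
sub decode_untrusted {
    my ($octets) = @_;

    # Cheap length check before any decoding work is done.
    return (0) if length $octets > $MAX_OCTETS;

    my $value;
    my $ok = eval { $value = $decoder->decode ($octets); 1 };

    # Keep $@ in your own logs only; it may quote parts of the data.
    warn "CBOR decode failed: $@" unless $ok;

    return $ok ? (1, $value) : (0);
}

my ($ok, $value) = decode_untrusted (encode_cbor [1, 2, 3]);
my $report = $ok ? "decoded " . scalar (@$value) . " items" : "rejected";
print "$report\n";
```

The success flag is returned separately from the value because C<undef> is itself a valid decoding result (CBOR null).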
598    
599     =head1 CBOR IMPLEMENTATION NOTES
600    
601     This section contains some random implementation notes. They do not
602     describe guaranteed behaviour, but merely behaviour as-is implemented
603     right now.
604    
605     64 bit integers are only properly decoded when Perl was built with 64 bit
606     support.
607    
608     Strings and arrays are encoded with a definite length. Hashes as well,
609     unless they are tied (or otherwise magical).
610    
611     Only the double data type is supported for NV data types - when Perl uses
612     long double to represent floating point values, they might not be encoded
613     properly. Half precision types are accepted, but not encoded.
614    
615     Strict mode and canonical mode are not implemented.
616    
617    
618     =head1 THREADS
619    
620     This module is I<not> guaranteed to be thread safe and there are no
621     plans to change this until Perl gets thread support (as opposed to the
622     horribly slow so-called "threads" which are simply slow and bloated
623     process simulations - use fork, it's I<much> faster, cheaper, better).
624    
625     (It might actually work, but you have been warned).
626    
627    
628     =head1 BUGS
629    
630     While the goal of this module is to be correct, that unfortunately does
631     not mean it's bug-free, only that I think its design is bug-free. If you
632     keep reporting bugs they will be fixed swiftly, though.
633    
634     Please refrain from using rt.cpan.org or any other bug reporting
635     service. I put the contact address into my modules for a reason.
636    
637     =cut
638    
639     XSLoader::load "CBOR::XS", $VERSION;
640    
641     =head1 SEE ALSO
642    
643     The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
644     serialisation.
645    
646 root 1.6 The L<Types::Serialiser> module provides the data model for true, false
647     and error values.
648    
649 root 1.1 =head1 AUTHOR
650    
651     Marc Lehmann <schmorp@schmorp.de>
652     http://home.schmorp.de/
653    
654     =cut
655    
656 root 1.6 1
657