/cvs/CBOR-XS/XS.pm
Revision: 1.14
Committed: Tue Oct 29 20:59:16 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
Changes since 1.13: +5 -0 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4    
5     =encoding utf-8
6    
7     =head1 SYNOPSIS
8    
9     use CBOR::XS;
10    
11     $binary_cbor_data = encode_cbor $perl_value;
12     $perl_value = decode_cbor $binary_cbor_data;
13    
14     # OO-interface
15    
16     $coder = CBOR::XS->new;
17 root 1.6 $binary_cbor_data = $coder->encode ($perl_value);
18     $perl_value = $coder->decode ($binary_cbor_data);
19    
20     # prefix decoding
21    
22     my $many_cbor_strings = ...;
23     while (length $many_cbor_strings) {
24     my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
25     # data was decoded
26     substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27     }
28 root 1.1
29     =head1 DESCRIPTION
30    
31 root 1.9 WARNING! This module is very new, and not very well tested (that's up to
32     you to do). Furthermore, details of the implementation might change freely
33     before version 1.0. And lastly, the object serialisation protocol depends
34     on a pending IANA assignment, and until that assignment is official, this
35     implementation is not interoperable with other implementations (even
36     future versions of this module).
37    
38     You are still invited to try out CBOR, and this module.
39 root 1.5
40     This module converts Perl data structures to the Concise Binary Object
41     Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42     format that aims to use a superset of the JSON data model, i.e. when you
43     can represent something in JSON, you should be able to represent it in
44     CBOR.
45 root 1.1
46 root 1.9 In short, CBOR is a faster and very compact binary alternative to JSON,
47 root 1.10 with the added ability of supporting serialisation of Perl objects. (JSON
48     often compresses better than CBOR though, so if you plan to compress the
49     data later you might want to compare both formats first).
50 root 1.5
51 root 1.14 To give you a general idea, with texts in the megabyte range, C<CBOR::XS>
52     usually encodes roughly twice as fast as L<Storable> or L<JSON::XS> and
53     decodes about 15%-30% faster than those. The shorter the data, the worse
54     L<Storable> performs in comparison.
55    
56 root 1.5 The primary goal of this module is to be I<correct> and the secondary goal
57     is to be I<fast>. To reach the latter goal it was written in C.
58 root 1.1
59     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
60     vice versa.
61    
62     =cut
63    
64     package CBOR::XS;
65    
66     use common::sense;
67    
68 root 1.13 our $VERSION = 0.06;
69 root 1.1 our @ISA = qw(Exporter);
70    
71     our @EXPORT = qw(encode_cbor decode_cbor);
72    
73     use Exporter;
74     use XSLoader;
75    
76 root 1.6 use Types::Serialiser;
77    
78 root 1.3 our $MAGIC = "\xd9\xd9\xf7";
79    
80 root 1.1 =head1 FUNCTIONAL INTERFACE
81    
82     The following convenience methods are provided by this module. They are
83     exported by default:
84    
85     =over 4
86    
87     =item $cbor_data = encode_cbor $perl_scalar
88    
89     Converts the given Perl data structure to CBOR representation. Croaks on
90     error.
91    
92     =item $perl_scalar = decode_cbor $cbor_data
93    
94     The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
95     returning the resulting perl scalar. Croaks on error.
96    
97     =back
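As a minimal sketch of the two exported functions, the following round-trips a small structure and catches the exception raised for malformed input. The sample data and the truncated input string are assumptions made for this illustration only.

```perl
use strict;
use warnings;
use CBOR::XS;

# Round-trip a small Perl structure through CBOR.
my $data = { name => "example", list => [1, 2, 3] };
my $cbor = encode_cbor($data);
my $copy = decode_cbor($cbor);

# Truncated input: an array header announcing two elements, followed by
# only one. Both functions croak on error, so wrap the call in eval.
my $bad    = "\x82\x01";
my $result = eval { decode_cbor($bad) };
my $error  = $@;

print "decoding failed: $error" if $error;
```

Because both functions croak, code that handles data from outside the program will usually want the C<eval> wrapper shown here.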
98    
99    
100     =head1 OBJECT-ORIENTED INTERFACE
101    
102     The object oriented interface lets you configure your own encoding or
103     decoding style, within the limits of supported formats.
104    
105     =over 4
106    
107     =item $cbor = new CBOR::XS
108    
109     Creates a new CBOR::XS object that can be used to de/encode CBOR
110     strings. All boolean flags described below are by default I<disabled>.
111    
112     The mutators for flags all return the CBOR object again and thus calls can
113     be chained:
114    
115     #TODO
116     my $cbor = CBOR::XS->new->encode ({a => [1,2]});
117    
118     =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
119    
120     =item $max_depth = $cbor->get_max_depth
121    
122     Sets the maximum nesting level (default C<512>) accepted while encoding
123     or decoding. If a higher nesting level is detected in CBOR data or a Perl
124     data structure, then the encoder and decoder will stop and croak at that
125     point.
126    
127     Nesting level is defined by the number of hash- or arrayrefs that the
128     encoder needs to traverse to reach a given point, or by the number of
129     CBOR maps or arrays that are still open when a given item is reached
130     in the encoded data.
131    
132     Setting the maximum depth to one disallows any nesting, so that ensures
133     that the object is only a single hash/object or array.
134    
135     If no argument is given, the highest possible setting will be used, which
136     is rarely useful.
137    
138     Note that nesting is implemented by recursion in C. The default value has
139     been chosen to be as large as typical operating systems allow without
140     crashing.
141    
142     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
143    
144     =item $cbor = $cbor->max_size ([$maximum_string_size])
145    
146     =item $max_size = $cbor->get_max_size
147    
148     Set the maximum length a CBOR string may have (in bytes) where decoding
149     is being attempted. The default is C<0>, meaning no limit. When C<decode>
150     is called on a string that is longer than this many bytes, it will not
151     attempt to decode the string but throw an exception. This setting has no
152     effect on C<encode> (yet).
153    
154     If no argument is given, the limit check will be deactivated (same as when
155     C<0> is specified).
156    
157     See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
158    
159     =item $cbor_data = $cbor->encode ($perl_scalar)
160    
161     Converts the given Perl data structure (a scalar value) to its CBOR
162     representation.
163    
164     =item $perl_scalar = $cbor->decode ($cbor_data)
165    
166     The opposite of C<encode>: expects CBOR data and tries to parse it,
167     returning the resulting simple scalar or reference. Croaks on error.
168    
169     =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
170    
171     This works like the C<decode> method, but instead of raising an exception
172     when there is trailing garbage after the CBOR string, it will silently
173     stop parsing there and return the number of characters consumed so far.
174    
175     This is useful if your CBOR texts are not delimited by an outer protocol
176     and you need to know where the first CBOR string ends and the next one
177     starts.
178    
179     CBOR::XS->new->decode_prefix ("......")
180     => ("...", 3)
181    
182     =back
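The C<decode_prefix> behaviour described above can be sketched as a loop over several CBOR values that were concatenated without any framing. The sample values and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Three CBOR values concatenated without an outer protocol.
my @items  = ([1, 2], "hello", { ok => 1 });
my $stream = join "", map { $coder->encode($_) } @items;

my @values;
while (length $stream) {
    # Returns the decoded value and the number of octets it occupied.
    my ($value, $octets) = $coder->decode_prefix($stream);
    push @values, $value;
    substr $stream, 0, $octets, "";   # drop the octets just consumed
}

print scalar(@values), " values decoded\n";
```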
183    
184    
185     =head1 MAPPING
186    
187     This section describes how CBOR::XS maps Perl values to CBOR values and
188     vice versa. These mappings are designed to "do the right thing" in most
189     circumstances automatically, preserving round-tripping characteristics
190     (what you put in comes out as something equivalent).
191    
192     For the more enlightened: note that in the following descriptions,
193     lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
194     refers to the abstract Perl language itself.
195    
196    
197     =head2 CBOR -> PERL
198    
199     =over 4
200    
201 root 1.4 =item integers
202    
203     CBOR integers become (numeric) perl scalars. On perls without 64 bit
204     support, 64 bit integers will be truncated or otherwise corrupted.
205    
206     =item byte strings
207    
208     Byte strings will become octet strings in Perl (the byte values 0..255
209     will simply become characters of the same value in Perl).
210    
211     =item UTF-8 strings
212    
213     UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
214     decoded into proper Unicode code points. At the moment, the validity of
215     the UTF-8 octets will not be validated - corrupt input will result in
216     corrupted Perl strings.
217    
218     =item arrays, maps
219    
220     CBOR arrays and CBOR maps will be converted into references to a Perl
221     array or hash, respectively. The keys of the map will be stringified
222     during this process.
223    
224 root 1.6 =item null
225    
226     CBOR null becomes C<undef> in Perl.
227    
228     =item true, false, undefined
229 root 1.1
230 root 1.6 These CBOR values become C<Types::Serialiser::true>,
231     C<Types::Serialiser::false> and C<Types::Serialiser::error>,
232 root 1.1 respectively. They are overloaded to act almost exactly like the numbers
233 root 1.6 C<1> and C<0> (for true and false) or to throw an exception on access (for
234     error). See the L<Types::Serialiser> manpage for details.
235    
236     =item CBOR tag 256 (perl object)
237    
238 root 1.7 The tag value C<256> (TODO: pending iana registration) will be used
239 root 1.11 to deserialise a Perl object serialised with C<FREEZE>. See L<OBJECT
240     SERIALISATION>, below, for details.
241 root 1.1
242 root 1.6 =item CBOR tag 55799 (magic header)
243 root 1.4
244 root 1.6 The tag 55799 is ignored (this tag implements the magic header).
245 root 1.1
246 root 1.6 =item other CBOR tags
247 root 1.4
248 root 1.6 Tagged items consist of a numeric tag and another CBOR value. Tags not
249     handled internally are currently converted into a L<CBOR::XS::Tagged>
250     object, which is simply a blessed array reference consisting of the
251     numeric tag value followed by the (decoded) CBOR value.
252 root 1.4
253 root 1.6 In the future, support for user-supplied conversions might get added.
254 root 1.4
255     =item anything else
256    
257     Anything else (e.g. unsupported simple values) will raise a decoding
258     error.
259 root 1.1
260     =back
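A few of the decoding rules above can be observed directly. This is only a sketch: the single-octet inputs are the CBOR encodings of true, false and null, and the tag number 1234567 is an arbitrary value picked for illustration on the assumption that the decoder does not handle it internally.

```perl
use strict;
use warnings;
use CBOR::XS;

my $true  = decode_cbor("\xf5");   # CBOR true
my $false = decode_cbor("\xf4");   # CBOR false
my $null  = decode_cbor("\xf6");   # CBOR null

print "true is truthy\n"    if $true;
print "false is falsy\n"    unless $false;
print "null became undef\n" unless defined $null;

# A tag that is not handled internally comes back as a
# CBOR::XS::Tagged object: a blessed [tag, value] array reference.
my $tagged = decode_cbor(encode_cbor(CBOR::XS::tag(1234567, "payload")));
printf "%s: tag %d, value %s\n", ref($tagged), $tagged->[0], $tagged->[1];
```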
261    
262    
263     =head2 PERL -> CBOR
264    
265     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
266     truly typeless language, so we can only guess which CBOR type is meant by
267     a Perl value.
268    
269     =over 4
270    
271     =item hash references
272    
273 root 1.4 Perl hash references become CBOR maps. As there is no inherent ordering in
274     hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
275     order.
276    
277     Currently, tied hashes will use the indefinite-length format, while normal
278     hashes will use the fixed-length format.
279 root 1.1
280     =item array references
281    
282 root 1.4 Perl array references become fixed-length CBOR arrays.
283 root 1.1
284     =item other references
285    
286     Other unblessed references are generally not allowed and will cause an
287     exception to be thrown, except for references to the integers C<0> and
288 root 1.4 C<1>, which get turned into false and true in CBOR.
289    
290     =item CBOR::XS::Tagged objects
291    
292     Objects of this type must be arrays consisting of a single C<[tag, value]>
293 root 1.13 pair. The (numerical) tag will be encoded as a CBOR tag, the value will
294     be encoded as appropriate for the value. You can use C<CBOR::XS::tag> to
295     create such objects.
296 root 1.1
297 root 1.6 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
298 root 1.1
299 root 1.6 These special values become CBOR true, CBOR false and CBOR undefined
300     values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
301     if you want.
302 root 1.1
303 root 1.7 =item other blessed objects
304 root 1.1
305 root 1.7 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
306 root 1.11 L<OBJECT SERIALISATION>, below, for details.
307 root 1.1
308     =item simple scalars
309    
310     TODO
311     Simple Perl scalars (any scalar that is not a reference) are the most
312     difficult objects to encode: CBOR::XS will encode undefined scalars as
313 root 1.4 CBOR null values, scalars that have last been used in a string context
314 root 1.1 before encoding as CBOR strings, and anything else as number value:
315    
316     # dump as number
317     encode_cbor [2] # yields [2]
318     encode_cbor [-3.0e17] # yields [-3e+17]
319     my $value = 5; encode_cbor [$value] # yields [5]
320    
321     # used as string, so dump as string
322     print $value;
323     encode_cbor [$value] # yields ["5"]
324    
325     # undef becomes null
326     encode_cbor [undef] # yields [null]
327    
328     You can force the type to be a CBOR string by stringifying it:
329    
330     my $x = 3.1; # some variable containing a number
331     "$x"; # stringified
332     $x .= ""; # another, more awkward way to stringify
333     print $x; # perl does it for you, too, quite often
334    
335     You can force the type to be a CBOR number by numifying it:
336    
337     my $x = "3"; # some variable containing a string
338     $x += 0; # numify it, ensuring it will be dumped as a number
339     $x *= 1; # same thing, the choice is yours.
340    
341     You can not currently force the type in other, less obscure, ways. Tell me
342     if you need this capability (but don't forget to explain why it's needed
343     :).
344    
345 root 1.4 Perl values that seem to be integers generally use the shortest possible
346     representation. Floating-point values will use either the IEEE single
347     format if possible without loss of precision, otherwise the IEEE double
348     format will be used. Perls that use formats other than IEEE double to
349     represent numerical values are supported, but might suffer loss of
350     precision.
351 root 1.1
352     =back
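To see which CBOR type a scalar ends up as, it can help to look at the encoded octets. The sketch below uses a hypothetical helper, C<cbor_hex>, that is not part of this module, and it assumes that Perl strings without the UTF-8 flag are emitted as CBOR byte strings.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper for this example: hex dump of the encoded form.
sub cbor_hex { unpack "H*", encode_cbor($_[0]) }

my $number = 5;
my $string = "5";

my $forced_number = "3";
$forced_number += 0;        # numified, so it is dumped as a number

my $forced_string = 3;
$forced_string .= "";       # stringified, so it is dumped as a string

print cbor_hex($number),        "\n";   # 05   (unsigned integer 5)
print cbor_hex($string),        "\n";   # 4135 (one-octet string "5")
print cbor_hex($forced_number), "\n";   # 03
print cbor_hex($forced_string), "\n";   # 4133
print cbor_hex(undef),          "\n";   # f6   (null)
```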
353    
354 root 1.7 =head2 OBJECT SERIALISATION
355    
356     This module knows two ways to serialise a Perl object: The CBOR-specific
357     way, and the generic way.
358    
359     Whenever the encoder encounters a Perl object that it cannot serialise
360     directly (most of them), it will first look up the C<TO_CBOR> method on
361     it.
362    
363     If it has a C<TO_CBOR> method, it will call it with the object as only
364     argument, and expects exactly one return value, which it will then
365     encode in place of the object.
366    
367     Otherwise, it will look up the C<FREEZE> method. If it exists, it will
368     call it with the object as first argument, and the constant string C<CBOR>
369     as the second argument, to distinguish it from other serialisers.
370    
371     The C<FREEZE> method can return any number of values (i.e. zero or
372     more). These will be encoded as a CBOR perl object, together with the
373     classname.
374    
375     If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
376     with an error.
377    
378     Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
379     objects encoded via C<FREEZE> can be decoded using the following protocol:
380    
381     When an encoded CBOR perl object is encountered by the decoder, it will
382     look up the C<THAW> method, by using the stored classname, and will fail
383     if the method cannot be found.
384    
385     After the lookup it will call the C<THAW> method with the stored classname
386     as first argument, the constant string C<CBOR> as second argument, and all
387     values returned by C<FREEZE> as remaining arguments.
388    
389     =head4 EXAMPLES
390    
391     Here is an example C<TO_CBOR> method:
392    
393     sub My::Object::TO_CBOR {
394     my ($obj) = @_;
395    
396     ["this is a serialised My::Object object", $obj->{id}]
397     }
398    
399     When a C<My::Object> is encoded to CBOR, it will instead encode a simple
400     array with two members: a string, and the "object id". Decoding this CBOR
401     string will yield a normal perl array reference in place of the object.
402    
403     A more useful and practical example would be a serialisation method for
404     the URI module. CBOR has a custom tag value for URIs, namely 32:
405    
406     sub URI::TO_CBOR {
407     my ($self) = @_;
408     my $uri = "$self"; # stringify uri
409     utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
410     CBOR::XS::tag 32, $uri
411     }
412    
413     This will encode URIs as a UTF-8 string with tag 32, which indicates an
414     URI.
415    
416     Decoding such an URI will not (currently) give you an URI object, but
417     instead a CBOR::XS::Tagged object with tag number 32 and the string -
418     exactly what was returned by C<TO_CBOR>.
419    
420     To serialise an object so it can automatically be deserialised, you need
421     to use C<FREEZE> and C<THAW>. To take the URI module as example, this
422     would be a possible implementation:
423    
424     sub URI::FREEZE {
425     my ($self, $serialiser) = @_;
426     "$self" # encode url string
427     }
428    
429     sub URI::THAW {
430     my ($class, $serialiser, $uri) = @_;
431    
432     $class->new ($uri)
433     }
434    
435     Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
436     example, a C<FREEZE> method that returns "type", "id" and "variant" values
437     would cause an invocation of C<THAW> with 5 arguments:
438    
439     sub My::Object::FREEZE {
440     my ($self, $serialiser) = @_;
441    
442     ($self->{type}, $self->{id}, $self->{variant})
443     }
444    
445     sub My::Object::THAW {
446     my ($class, $serialiser, $type, $id, $variant) = @_;
447    
448     $class->new (type => $type, id => $id, variant => $variant)
449     }
450    
451 root 1.1
452 root 1.7 =head1 MAGIC HEADER
453 root 1.3
454     There is no way to distinguish CBOR from other formats
455     programmatically. To make it easier to distinguish CBOR from other
456     formats, the CBOR specification has a special "magic string" that can be
457     prepended to any CBOR string without changing its meaning.
458    
459     This string is available as C<$CBOR::XS::MAGIC>. This module does not
460     prepend this string to the CBOR data it generates, but it will ignore it
461     if present, so users can prepend this string as a "file type" indicator as
462     required.
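One possible use of the magic string as a "file type" indicator is sketched below. The helper C<looks_like_cbor> is hypothetical and not part of this module. It only checks the prefix and says nothing about whether the rest of the data is valid CBOR.

```perl
use strict;
use warnings;
use CBOR::XS;

my $magic = $CBOR::XS::MAGIC;

# Data as it might be written to a file: magic string, then CBOR.
my $file_contents = $magic . encode_cbor([1, 2, 3]);

# Hypothetical helper: true when the data starts with the magic string.
sub looks_like_cbor {
    my ($data) = @_;
    return substr($data, 0, length $magic) eq $magic;
}

# The decoder ignores the magic header, so it need not be stripped.
my $decoded = looks_like_cbor($file_contents)
    ? decode_cbor($file_contents)
    : undef;

print "decoded: @$decoded\n" if $decoded;
```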
463    
464    
465 root 1.12 =head1 THE CBOR::XS::Tagged CLASS
466    
467     CBOR has the concept of tagged values - any CBOR value can be tagged with
468     a numeric 64 bit tag number; these tag numbers are centrally administered.
469    
470     C<CBOR::XS> handles a few tags internally when en- or decoding. You can
471     also create tags yourself by encoding C<CBOR::XS::Tagged> objects, and the
472     decoder will create C<CBOR::XS::Tagged> objects itself when it hits an
473     unknown tag.
474    
475     These objects are simply blessed array references - the first member of
476     the array being the numerical tag, the second being the value.
477    
478     You can interact with C<CBOR::XS::Tagged> objects in the following ways:
479    
480     =over 4
481    
482     =item $tagged = CBOR::XS::tag $tag, $value
483    
484     This function(!) creates a new C<CBOR::XS::Tagged> object using the given
485     C<$tag> (0..2**64-1) to tag the given C<$value> (which can be any Perl
486     value that can be encoded in CBOR, including serialisable Perl objects and
487     C<CBOR::XS::Tagged> objects).
488    
489     =item $tagged->[0]
490    
491     =item $tagged->[0] = $new_tag
492    
493     =item $tag = $tagged->tag
494    
495     =item $new_tag = $tagged->tag ($new_tag)
496    
497     Access/mutate the tag.
498    
499     =item $tagged->[1]
500    
501     =item $tagged->[1] = $new_value
502    
503     =item $value = $tagged->value
504    
505     =item $new_value = $tagged->value ($new_value)
506    
507     Access/mutate the tagged value.
508    
509     =back
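The accessors listed above can be exercised as in this short sketch. The tag numbers and values are just sample data.

```perl
use strict;
use warnings;
use CBOR::XS;

my $tagged = CBOR::XS::tag(32, "http://software.schmorp.de/");

my $tag   = $tagged->tag;      # same as $tagged->[0]
my $value = $tagged->value;    # same as $tagged->[1]

# Both accessors also act as mutators when given an argument.
$tagged->tag(24);
$tagged->value(encode_cbor([1, 2, 3]));

printf "tag was %d, is now %d\n", $tag, $tagged->[0];
```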
510    
511     =cut
512    
513     sub tag($$) {
514     bless [@_], CBOR::XS::Tagged::;
515     }
516    
517     sub CBOR::XS::Tagged::tag {
518     $_[0][0] = $_[1] if $#_;
519     $_[0][0]
520     }
521    
522     sub CBOR::XS::Tagged::value {
523     $_[0][1] = $_[1] if $#_;
524     $_[0][1]
525     }
526    
527 root 1.13 =head2 EXAMPLES
528    
529     Here are some examples of C<CBOR::XS::Tagged> uses to tag objects.
530    
531     You can look up CBOR tag values and meanings in the IANA registry at
532     L<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
533    
534     Prepend a magic header (C<$CBOR::XS::MAGIC>):
535    
536     my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
537     # same as:
538     my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
539    
540     Serialise some URIs and a regex in an array:
541    
542     my $cbor = encode_cbor [
543     (CBOR::XS::tag 32, "http://www.nethype.de/"),
544     (CBOR::XS::tag 32, "http://software.schmorp.de/"),
545     (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
546     ];
547    
548     Wrap CBOR data in CBOR:
549    
550     my $cbor_cbor = encode_cbor
551     CBOR::XS::tag 24,
552     encode_cbor [1, 2, 3];
553    
554 root 1.7 =head1 CBOR and JSON
555 root 1.1
556 root 1.4 CBOR is supposed to implement a superset of the JSON data model, and is,
557     with some coercion, able to represent all JSON texts (something that other
558     "binary JSON" formats such as BSON generally do not support).
559    
560     CBOR implements some extra hints and support for JSON interoperability,
561     and the spec offers further guidance for conversion between CBOR and
562     JSON. None of this is currently implemented in CBOR::XS, and the guidelines
563     in the spec do not result in correct round-tripping of data. If JSON
564     interoperability is improved in the future, then the goal will be to
565     ensure that decoded JSON data will round-trip encoding and decoding to
566     CBOR intact.
567 root 1.1
568    
569     =head1 SECURITY CONSIDERATIONS
570    
571     When you are using CBOR in a protocol, talking to untrusted potentially
572     hostile creatures requires relatively few measures.
573    
574     First of all, your CBOR decoder should be secure, that is, should not have
575     any buffer overflows. Obviously, this module should ensure that and I am
576     trying hard on making that true, but you never know.
577    
578     Second, you need to avoid resource-starving attacks. That means you should
579     limit the size of CBOR data you accept, or make sure that when your
580     resources run out, that's just fine (e.g. by using a separate process that
581     can crash safely). The size of a CBOR string in octets is usually a good
582     indication of the size of the resources required to decode it into a Perl
583     structure. While CBOR::XS can check the size of the CBOR text, it might be
584     too late when you already have it in memory, so you might want to check
585     the size before you accept the string.
586    
587     Third, CBOR::XS recurses using the C stack when decoding objects and
588     arrays. The C stack is a limited resource: for instance, on my amd64
589     machine with 8MB of stack size I can decode around 180k nested arrays but
590     only 14k nested CBOR objects (due to perl itself recursing deeply on croak
591     to free the temporary). If that is exceeded, the program crashes. To be
592     conservative, the default nesting limit is set to 512. If your process
593     has a smaller stack, you should adjust this setting accordingly with the
594     C<max_depth> method.
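The size and nesting limits discussed above could be applied to a decoder as in the following sketch. The concrete limits (1024 octets, depth 2) are arbitrary values chosen to keep the example small, not recommendations.

```perl
use strict;
use warnings;
use CBOR::XS;

# The limits below are arbitrary values chosen for illustration.
my $coder = CBOR::XS->new->max_size(1024)->max_depth(2);

my $shallow = encode_cbor([[1]]);        # two levels of nesting
my $deep    = encode_cbor([[[1]]]);      # three levels of nesting
my $large   = encode_cbor("x" x 2048);   # more than 1024 octets

# decode croaks when a limit is exceeded, so each call is wrapped in eval.
my $ok        = eval { $coder->decode($shallow) };
my $too_deep  = eval { $coder->decode($deep);  1 };
my $too_large = eval { $coder->decode($large); 1 };

print "shallow data accepted\n" if $ok;
print "deep data rejected\n"    unless $too_deep;
print "large data rejected\n"   unless $too_large;
```

As the text notes, C<max_size> only helps once the data is already in memory, so a check on the transport side is still worth having.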
595    
596     Something else could bomb you, too, that I forgot to think of. In that
597     case, you get to keep the pieces. I am always open for hints, though...
598    
599     Also keep in mind that CBOR::XS might leak contents of your Perl data
600     structures in its error messages, so when you serialise sensitive
601     information you might want to make sure that exceptions thrown by CBOR::XS
602     will not end up in front of untrusted eyes.
603    
604     =head1 CBOR IMPLEMENTATION NOTES
605    
606     This section contains some random implementation notes. They do not
607     describe guaranteed behaviour, but merely the behaviour as implemented
608     right now.
609    
610     64 bit integers are only properly decoded when Perl was built with 64 bit
611     support.
612    
613     Strings and arrays are encoded with a definite length. Hashes as well,
614     unless they are tied (or otherwise magical).
615    
616     Only the double data type is supported for NV data types - when Perl uses
617     long double to represent floating point values, they might not be encoded
618     properly. Half precision types are accepted, but not encoded.
619    
620     Strict mode and canonical mode are not implemented.
621    
622    
623     =head1 THREADS
624    
625     This module is I<not> guaranteed to be thread safe and there are no
626     plans to change this until Perl gets thread support (as opposed to the
627     horribly slow so-called "threads" which are simply slow and bloated
628     process simulations - use fork, it's I<much> faster, cheaper, better).
629    
630     (It might actually work, but you have been warned).
631    
632    
633     =head1 BUGS
634    
635     While the goal of this module is to be correct, that unfortunately does
636     not mean it's bug-free, only that I think its design is bug-free. If you
637     keep reporting bugs they will be fixed swiftly, though.
638    
639     Please refrain from using rt.cpan.org or any other bug reporting
640     service. I put the contact address into my modules for a reason.
641    
642     =cut
643    
644     XSLoader::load "CBOR::XS", $VERSION;
645    
646     =head1 SEE ALSO
647    
648     The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
649     serialisation.
650    
651 root 1.6 The L<Types::Serialiser> module provides the data model for true, false
652     and error values.
653    
654 root 1.1 =head1 AUTHOR
655    
656     Marc Lehmann <schmorp@schmorp.de>
657     http://home.schmorp.de/
658    
659     =cut
660    
661 root 1.6 1
662