/cvs/CBOR-XS/XS.pm
Revision: 1.14
Committed: Tue Oct 29 20:59:16 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
Changes since 1.13: +5 -0 lines

1 =head1 NAME
2
3 CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
4
5 =encoding utf-8
6
7 =head1 SYNOPSIS
8
9 use CBOR::XS;
10
11 $binary_cbor_data = encode_cbor $perl_value;
12 $perl_value = decode_cbor $binary_cbor_data;
13
14 # OO-interface
15
16 $coder = CBOR::XS->new;
17 $binary_cbor_data = $coder->encode ($perl_value);
18 $perl_value = $coder->decode ($binary_cbor_data);
19
20 # prefix decoding
21
22 my $many_cbor_strings = ...;
23 while (length $many_cbor_strings) {
24 my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
25 # data was decoded
26 substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
27 }
28
29 =head1 DESCRIPTION
30
31 WARNING! This module is very new, and not very well tested (that's up to
32 you to do). Furthermore, details of the implementation might change freely
33 before version 1.0. And lastly, the object serialisation protocol depends
34 on a pending IANA assignment, and until that assignment is official, this
35 implementation is not interoperable with other implementations (or even
36 with future versions of this module).
37
38 You are still invited to try out CBOR, and this module.
39
40 This module converts Perl data structures to the Concise Binary Object
41 Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
42 format that aims to use a superset of the JSON data model, i.e. when you
43 can represent something in JSON, you should be able to represent it in
44 CBOR.
45
46 In short, CBOR is a faster and very compact binary alternative to JSON,
47 with the added ability of supporting serialisation of Perl objects. (JSON
48 often compresses better than CBOR though, so if you plan to compress the
49 data later you might want to compare both formats first).
50
51 To give you a general idea, with texts in the megabyte range, C<CBOR::XS>
52 usually encodes roughly twice as fast as L<Storable> or L<JSON::XS> and
53 decodes about 15%-30% faster than those. The shorter the data, the worse
54 L<Storable> performs in comparison.
55
56 The primary goal of this module is to be I<correct> and the secondary goal
57 is to be I<fast>. To reach the latter goal it was written in C.
58
59 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
60 vice versa.
61
62 =cut
63
64 package CBOR::XS;
65
66 use common::sense;
67
68 our $VERSION = 0.06;
69 our @ISA = qw(Exporter);
70
71 our @EXPORT = qw(encode_cbor decode_cbor);
72
73 use Exporter;
74 use XSLoader;
75
76 use Types::Serialiser;
77
78 our $MAGIC = "\xd9\xd9\xf7";
79
80 =head1 FUNCTIONAL INTERFACE
81
82 The following convenience functions are provided by this module. They are
83 exported by default:
84
85 =over 4
86
87 =item $cbor_data = encode_cbor $perl_scalar
88
89 Converts the given Perl data structure to CBOR representation. Croaks on
90 error.
91
92 =item $perl_scalar = decode_cbor $cbor_data
93
94 The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
95 returning the resulting perl scalar. Croaks on error.
96
97 =back
98
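As a rough illustration of the two functions above, the following sketch
round-trips a small array and prints the resulting CBOR octets in hex. The
variable names are made up for this example; the octets in the comments
follow from how RFC 7049 encodes small integers and definite-length arrays.

```perl
use strict;
use warnings;
use CBOR::XS;

# encode an array of three small integers
my $cbor_data = encode_cbor [1, 2, 3];

# 0x83 = array of length 3, followed by the integers 1, 2 and 3
my $hex = unpack "H*", $cbor_data;
print "$hex\n";                           # prints 83010203

# decoding gives back an equivalent array reference
my $perl_value = decode_cbor $cbor_data;
print join (",", @$perl_value), "\n";     # prints 1,2,3
```

Both functions croak on error, so untrusted input is best decoded inside an
C<eval> block.
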
99
100 =head1 OBJECT-ORIENTED INTERFACE
101
102 The object oriented interface lets you configure your own encoding or
103 decoding style, within the limits of supported formats.
104
105 =over 4
106
107 =item $cbor = new CBOR::XS
108
109 Creates a new CBOR::XS object that can be used to de/encode CBOR
110 strings. All boolean flags described below are by default I<disabled>.
111
112 The mutators for flags all return the CBOR object again and thus calls can
113 be chained:
114
115 #TODO
116 my $cbor_data = CBOR::XS->new->max_depth (64)->encode ({a => [1,2]});
117
118 =item $cbor = $cbor->max_depth ([$maximum_nesting_depth])
119
120 =item $max_depth = $cbor->get_max_depth
121
122 Sets the maximum nesting level (default C<512>) accepted while encoding
123 or decoding. If a higher nesting level is detected in CBOR data or a Perl
124 data structure, then the encoder and decoder will stop and croak at that
125 point.
126
127 Nesting level is defined by the number of hash- or arrayrefs that the encoder
128 needs to traverse to reach a given point or, when decoding, the number of
129 CBOR maps or arrays that have been opened but not yet closed at a given
130 point in the CBOR data.
131
132 Setting the maximum depth to one disallows any nesting, which ensures
133 that the value is only a single hash/map or array.
134
135 If no argument is given, the highest possible setting will be used, which
136 is rarely useful.
137
138 Note that nesting is implemented by recursion in C. The default value has
139 been chosen to be as large as typical operating systems allow without
140 crashing.
141
142 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
143
144 =item $cbor = $cbor->max_size ([$maximum_string_size])
145
146 =item $max_size = $cbor->get_max_size
147
148 Set the maximum length a CBOR string may have (in bytes) where decoding
149 is being attempted. The default is C<0>, meaning no limit. When C<decode>
150 is called on a string that is longer than this many bytes, it will not
151 attempt to decode the string but throw an exception. This setting has no
152 effect on C<encode> (yet).
153
154 If no argument is given, the limit check will be deactivated (same as when
155 C<0> is specified).
156
157 See SECURITY CONSIDERATIONS, below, for more info on why this is useful.
158
159 =item $cbor_data = $cbor->encode ($perl_scalar)
160
161 Converts the given Perl data structure (a scalar value) to its CBOR
162 representation.
163
164 =item $perl_scalar = $cbor->decode ($cbor_data)
165
166 The opposite of C<encode>: expects CBOR data and tries to parse it,
167 returning the resulting simple scalar or reference. Croaks on error.
168
169 =item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
170
171 This works like the C<decode> method, but instead of raising an exception
172 when there is trailing garbage after the CBOR string, it will silently
173 stop parsing there and return the number of characters consumed so far.
174
175 This is useful if your CBOR texts are not delimited by an outer protocol
176 and you need to know where the first CBOR string ends and the next one
177 starts.
178
179 CBOR::XS->new->decode_prefix ("......")
180 => ("...", 3)
181
182 =back
183
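To make the C<decode_prefix> description above more concrete, here is a
minimal sketch that concatenates a few CBOR values without any framing and
then splits them apart again. The sample data and variable names are
assumptions made for this example only.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# three CBOR values back to back, with nothing in between
my $stream = "";
$stream .= $coder->encode ($_) for [1, 2], {a => 1}, [3];

my @decoded;
while (length $stream) {
   # returns the decoded value and the number of octets it occupied
   my ($value, $octets) = $coder->decode_prefix ($stream);
   push @decoded, $value;
   substr $stream, 0, $octets, "";   # drop the octets just consumed
}

print scalar @decoded, " values decoded\n";
```

A truncated value at the end of the buffer still makes C<decode_prefix>
croak, so a real reader would have to catch that and wait for more data.
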
184
185 =head1 MAPPING
186
187 This section describes how CBOR::XS maps Perl values to CBOR values and
188 vice versa. These mappings are designed to "do the right thing" in most
189 circumstances automatically, preserving round-tripping characteristics
190 (what you put in comes out as something equivalent).
191
192 For the more enlightened: note that in the following descriptions,
193 lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
194 refers to the abstract Perl language itself.
195
196
197 =head2 CBOR -> PERL
198
199 =over 4
200
201 =item integers
202
203 CBOR integers become (numeric) perl scalars. On perls without 64 bit
204 support, 64 bit integers will be truncated or otherwise corrupted.
205
206 =item byte strings
207
208 Byte strings will become octet strings in Perl (the byte values 0..255
209 will simply become characters of the same value in Perl).
210
211 =item UTF-8 strings
212
213 UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
214 decoded into proper Unicode code points. At the moment, the validity of
215 the UTF-8 octets will not be validated - corrupt input will result in
216 corrupted Perl strings.
217
218 =item arrays, maps
219
220 CBOR arrays and CBOR maps will be converted into references to a Perl
221 array or hash, respectively. The keys of the map will be stringified
222 during this process.
223
224 =item null
225
226 CBOR null becomes C<undef> in Perl.
227
228 =item true, false, undefined
229
230 These CBOR values become C<Types::Serialiser::true>,
231 C<Types::Serialiser::false> and C<Types::Serialiser::error>,
232 respectively. They are overloaded to act almost exactly like the numbers
233 C<1> and C<0> (for true and false) or to throw an exception on access (for
234 error). See the L<Types::Serialiser> manpage for details.
235
236 =item CBOR tag 256 (perl object)
237
238 The tag value C<256> (TODO: pending iana registration) will be used
239 to deserialise a Perl object serialised with C<FREEZE>. See L<OBJECT
240 SERIALISATION>, below, for details.
241
242 =item CBOR tag 55799 (magic header)
243
244 The tag 55799 is ignored (this tag implements the magic header).
245
246 =item other CBOR tags
247
248 Tagged items consist of a numeric tag and another CBOR value. Tags not
249 handled internally are currently converted into a L<CBOR::XS::Tagged>
250 object, which is simply a blessed array reference consisting of the
251 numeric tag value followed by the (decoded) CBOR value.
252
253 In the future, support for user-supplied conversions might get added.
254
255 =item anything else
256
257 Anything else (e.g. unsupported simple values) will raise a decoding
258 error.
259
260 =back
261
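The decoding rules above can be tried out with a few hand-written octet
strings. This is only a sketch: the tag number 1234567 is an arbitrary value
picked for the example on the assumption that the module does not handle it
internally, and the C<is_true>/C<is_false> helpers come from
L<Types::Serialiser>.

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

# an array holding CBOR true (0xf5), false (0xf4) and null (0xf6)
my $values = decode_cbor "\x83\xf5\xf4\xf6";

print "first is true\n"   if Types::Serialiser::is_true  ($values->[0]);
print "second is false\n" if Types::Serialiser::is_false ($values->[1]);
print "third is undef\n"  unless defined $values->[2];

# a value wrapped in a tag the decoder has no special handling for
my $tagged = decode_cbor (encode_cbor (CBOR::XS::tag (1234567, 5)));
my $class  = ref $tagged;

print "$class: tag $tagged->[0], value $tagged->[1]\n";
```
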
262
263 =head2 PERL -> CBOR
264
265 The mapping from Perl to CBOR is slightly more difficult, as Perl is a
266 truly typeless language, so we can only guess which CBOR type is meant by
267 a Perl value.
268
269 =over 4
270
271 =item hash references
272
273 Perl hash references become CBOR maps. As there is no inherent ordering in
274 hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
275 order.
276
277 Currently, tied hashes will use the indefinite-length format, while normal
278 hashes will use the fixed-length format.
279
280 =item array references
281
282 Perl array references become fixed-length CBOR arrays.
283
284 =item other references
285
286 Other unblessed references are generally not allowed and will cause an
287 exception to be thrown, except for references to the integers C<0> and
288 C<1>, which get turned into false and true in CBOR.
289
290 =item CBOR::XS::Tagged objects
291
292 Objects of this type must be arrays consisting of a single C<[tag, value]>
293 pair. The (numerical) tag will be encoded as a CBOR tag, the value will
294 be encoded as appropriate for the value. You can use C<CBOR::XS::tag> to
295 create such objects.
296
297 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error
298
299 These special values become CBOR true, CBOR false and CBOR undefined
300 values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
301 if you want.
302
303 =item other blessed objects
304
305 Other blessed objects are serialised via C<TO_CBOR> or C<FREEZE>. See
306 L<OBJECT SERIALISATION>, below, for details.
307
308 =item simple scalars
309
310 TODO
311 Simple Perl scalars (any scalar that is not a reference) are the most
312 difficult objects to encode: CBOR::XS will encode undefined scalars as
313 CBOR null values, scalars that have last been used in a string context
314 before encoding as CBOR strings, and anything else as number value:
315
316 # dump as number
317 encode_cbor [2] # yields [2]
318 encode_cbor [-3.0e17] # yields [-3e+17]
319 my $value = 5; encode_cbor [$value] # yields [5]
320
321 # used as string, so dump as string
322 print $value;
323 encode_cbor [$value] # yields ["5"]
324
325 # undef becomes null
326 encode_cbor [undef] # yields [null]
327
328 You can force the type to be a CBOR string by stringifying it:
329
330 my $x = 3.1; # some variable containing a number
331 "$x"; # stringified
332 $x .= ""; # another, more awkward way to stringify
333 print $x; # perl does it for you, too, quite often
334
335 You can force the type to be a CBOR number by numifying it:
336
337 my $x = "3"; # some variable containing a string
338 $x += 0; # numify it, ensuring it will be dumped as a number
339 $x *= 1; # same thing, the choice is yours.
340
341 You can not currently force the type in other, less obscure, ways. Tell me
342 if you need this capability (but don't forget to explain why it's needed
343 :).
344
345 Perl values that seem to be integers generally use the shortest possible
346 representation. Floating-point values will use either the IEEE single
347 format if possible without loss of precision, otherwise the IEEE double
348 format will be used. Perls that use formats other than IEEE double to
349 represent numerical values are supported, but might suffer loss of
350 precision.
351
352 =back
353
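Since the encoder has to guess the CBOR type of a simple scalar, it can help
to look at the octets it actually produces. The following sketch does that
for a number, a byte string, a UTF-8 flagged string and a numified string.
The helper C<$hex> and the variable names are made up for this example, and
the octets in the comments assume the behaviour described above.

```perl
use strict;
use warnings;
use CBOR::XS;

# helper for this example: hex dump of the CBOR encoding of one value
my $hex = sub { unpack "H*", encode_cbor ($_[0]) };

my $number = 5;           # never used as a string
my $string = "5";         # a plain Perl string without the UTF-8 flag

my $text = "5";
utf8::upgrade ($text);    # sets the UTF-8 flag

my $numified = "5";
$numified += 0;           # numify, as described above

my $hex_number   = $hex->($number);     # 05   - integer 5
my $hex_string   = $hex->($string);     # 4135 - byte string of length 1
my $hex_text     = $hex->($text);       # 6135 - UTF-8 string of length 1
my $hex_numified = $hex->($numified);   # 05   - integer 5 again

print "$hex_number $hex_string $hex_text $hex_numified\n";
```
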
354 =head2 OBJECT SERIALISATION
355
356 This module knows two ways to serialise a Perl object: The CBOR-specific
357 way, and the generic way.
358
359 Whenever the encoder encounters a Perl object that it cannot serialise
360 directly (most of them), it will first look up the C<TO_CBOR> method on
361 it.
362
363 If it has a C<TO_CBOR> method, it will call it with the object as only
364 argument, and expects exactly one return value, which it will then
365 substitute and encode in place of the object.
366
367 Otherwise, it will look up the C<FREEZE> method. If it exists, it will
368 call it with the object as first argument, and the constant string C<CBOR>
369 as the second argument, to distinguish it from other serialisers.
370
371 The C<FREEZE> method can return any number of values (i.e. zero or
372 more). These will be encoded as CBOR perl object, together with the
373 classname.
374
375 If an object supports neither C<TO_CBOR> nor C<FREEZE>, encoding will fail
376 with an error.
377
378 Objects encoded via C<TO_CBOR> cannot be automatically decoded, but
379 objects encoded via C<FREEZE> can be decoded using the following protocol:
380
381 When an encoded CBOR perl object is encountered by the decoder, it will
382 look up the C<THAW> method, by using the stored classname, and will fail
383 if the method cannot be found.
384
385 After the lookup it will call the C<THAW> method with the stored classname
386 as first argument, the constant string C<CBOR> as second argument, and all
387 values returned by C<FREEZE> as remaining arguments.
388
389 =head4 EXAMPLES
390
391 Here is an example C<TO_CBOR> method:
392
393 sub My::Object::TO_CBOR {
394 my ($obj) = @_;
395
396 ["this is a serialised My::Object object", $obj->{id}]
397 }
398
399 When a C<My::Object> is encoded to CBOR, it will instead encode a simple
400 array with two members: a string, and the "object id". Decoding this CBOR
401 string will yield a normal perl array reference in place of the object.
402
403 A more useful and practical example would be a serialisation method for
404 the URI module. CBOR has a custom tag value for URIs, namely 32:
405
406 sub URI::TO_CBOR {
407 my ($self) = @_;
408 my $uri = "$self"; # stringify uri
409 utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
410 CBOR::XS::tag 32, $uri
411 }
412
413 This will encode URIs as a UTF-8 string with tag 32, which indicates a
414 URI.
415
416 Decoding such a URI will not (currently) give you a URI object, but
417 instead a CBOR::XS::Tagged object with tag number 32 and the string -
418 exactly what was returned by C<TO_CBOR>.
419
420 To serialise an object so it can automatically be deserialised, you need
421 to use C<FREEZE> and C<THAW>. To take the URI module as example, this
422 would be a possible implementation:
423
424 sub URI::FREEZE {
425 my ($self, $serialiser) = @_;
426 "$self" # encode url string
427 }
428
429 sub URI::THAW {
430 my ($class, $serialiser, $uri) = @_;
431
432 $class->new ($uri)
433 }
434
435 Unlike C<TO_CBOR>, multiple values can be returned by C<FREEZE>. For
436 example, a C<FREEZE> method that returns "type", "id" and "variant" values
437 would cause an invocation of C<THAW> with 5 arguments:
438
439 sub My::Object::FREEZE {
440 my ($self, $serialiser) = @_;
441
442 ($self->{type}, $self->{id}, $self->{variant})
443 }
444
445 sub My::Object::THAW {
446 my ($class, $serialiser, $type, $id, $variant) = @_;
447
448 $class->new (type => $type, id => $id, variant => $variant)
449 }
450
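Putting the C<FREEZE>/C<THAW> protocol together, a complete round trip might
look like the following sketch. C<My::Point> is a hypothetical class that
exists only for this example; the method signatures follow the protocol
described above.

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Point;   # hypothetical class, only for this example

sub new {
   my ($class, %args) = @_;
   bless { x => $args{x}, y => $args{y} }, $class
}

# called by the encoder with the object and the string "CBOR"
sub FREEZE {
   my ($self, $serialiser) = @_;
   ($self->{x}, $self->{y})
}

# called by the decoder with the class name, "CBOR" and the frozen values
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   $class->new (x => $x, y => $y)
}

package main;

my $point = My::Point->new (x => 1, y => 2);
my $copy  = decode_cbor (encode_cbor ($point));
my $class = ref $copy;

print "$class at ($copy->{x}, $copy->{y})\n";
```

Note that the decoder looks up C<THAW> by the stored class name, so the class
has to be loaded in the decoding process as well.
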
451
452 =head1 MAGIC HEADER
453
454 There is no way to distinguish CBOR from other formats
455 programmatically. To make it easier to distinguish CBOR from other
456 formats, the CBOR specification has a special "magic string" that can be
457 prepended to any CBOR string without changing its meaning.
458
459 This string is available as C<$CBOR::XS::MAGIC>. This module does not
460 prepend this string to the CBOR data it generates, but it will ignore it
461 if present, so users can prepend this string as a "file type" indicator as
462 required.
463
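One possible way to use the magic string as a "file type" indicator is
sketched below. The variable names are assumptions for this example; only
C<$CBOR::XS::MAGIC>, C<encode_cbor> and C<decode_cbor> come from this module.

```perl
use strict;
use warnings;
use CBOR::XS;

# writer side: prepend the magic string by hand
my $file_contents = $CBOR::XS::MAGIC . encode_cbor [1, 2, 3];

# reader side: a cheap check before handing the data to the decoder
my $looks_like_cbor =
   substr ($file_contents, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC;

# the decoder ignores the magic tag, so it does not need to be stripped
my $data = decode_cbor $file_contents;

print "decoded ", scalar @$data, " elements\n" if $looks_like_cbor;
```
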
464
465 =head1 THE CBOR::XS::Tagged CLASS
466
467 CBOR has the concept of tagged values - any CBOR value can be tagged with
468 a 64 bit number (the tag). Tag numbers are centrally administered.
469
470 C<CBOR::XS> handles a few tags internally when en- or decoding. You can
471 also create tags yourself by encoding C<CBOR::XS::Tagged> objects, and the
472 decoder will create C<CBOR::XS::Tagged> objects itself when it hits an
473 unknown tag.
474
475 These objects are simply blessed array references - the first member of
476 the array being the numerical tag, the second being the value.
477
478 You can interact with C<CBOR::XS::Tagged> objects in the following ways:
479
480 =over 4
481
482 =item $tagged = CBOR::XS::tag $tag, $value
483
484 This function(!) creates a new C<CBOR::XS::Tagged> object using the given
485 C<$tag> (0..2**64-1) to tag the given C<$value> (which can be any Perl
486 value that can be encoded in CBOR, including serialisable Perl objects and
487 C<CBOR::XS::Tagged> objects).
488
489 =item $tagged->[0]
490
491 =item $tagged->[0] = $new_tag
492
493 =item $tag = $tagged->tag
494
495 =item $new_tag = $tagged->tag ($new_tag)
496
497 Access/mutate the tag.
498
499 =item $tagged->[1]
500
501 =item $tagged->[1] = $new_value
502
503 =item $value = $tagged->value
504
505 =item $new_value = $tagged->value ($new_value)
506
507 Access/mutate the tagged value.
508
509 =back
510
511 =cut
512
513 sub tag($$) {
514 bless [@_], CBOR::XS::Tagged::;
515 }
516
517 sub CBOR::XS::Tagged::tag {
518 $_[0][0] = $_[1] if $#_;
519 $_[0][0]
520 }
521
522 sub CBOR::XS::Tagged::value {
523 $_[0][1] = $_[1] if $#_;
524 $_[0][1]
525 }
526
527 =head2 EXAMPLES
528
529 Here are some examples of C<CBOR::XS::Tagged> uses to tag objects.
530
531 You can look up CBOR tag values and meanings in the IANA registry at
532 L<http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.
533
534 Prepend a magic header (C<$CBOR::XS::MAGIC>):
535
536 my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
537 # same as:
538 my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
539
540 Serialise some URIs and a regex in an array:
541
542 my $cbor = encode_cbor [
543 (CBOR::XS::tag 32, "http://www.nethype.de/"),
544 (CBOR::XS::tag 32, "http://software.schmorp.de/"),
545 (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
546 ];
547
548 Wrap CBOR data in CBOR:
549
550 my $cbor_cbor = encode_cbor
551 CBOR::XS::tag 24,
552 encode_cbor [1, 2, 3];
553
554 =head1 CBOR and JSON
555
556 CBOR is supposed to implement a superset of the JSON data model, and is,
557 with some coercion, able to represent all JSON texts (something that other
558 "binary JSON" formats such as BSON generally do not support).
559
560 CBOR implements some extra hints and support for JSON interoperability,
561 and the spec offers further guidance for conversion between CBOR and
562 JSON. None of this is currently implemented in CBOR::XS, and the guidelines
563 in the spec do not result in correct round-tripping of data. If JSON
564 interoperability is improved in the future, then the goal will be to
565 ensure that decoded JSON data will round-trip encoding and decoding to
566 CBOR intact.
567
568
569 =head1 SECURITY CONSIDERATIONS
570
571 When you are using CBOR in a protocol, talking to untrusted potentially
572 hostile creatures requires relatively few measures.
573
574 First of all, your CBOR decoder should be secure, that is, should not have
575 any buffer overflows. Obviously, this module should ensure that and I am
576 trying hard on making that true, but you never know.
577
578 Second, you need to avoid resource-starving attacks. That means you should
579 limit the size of CBOR data you accept, or make sure that when your
580 resources run out, that's just fine (e.g. by using a separate process that
581 can crash safely). The size of a CBOR string in octets is usually a good
582 indication of the size of the resources required to decode it into a Perl
583 structure. While CBOR::XS can check the size of the CBOR text, it might be
584 too late when you already have it in memory, so you might want to check
585 the size before you accept the string.
586
587 Third, CBOR::XS recurses using the C stack when decoding objects and
588 arrays. The C stack is a limited resource: for instance, on my amd64
589 machine with 8MB of stack size I can decode around 180k nested arrays but
590 only 14k nested CBOR objects (due to perl itself recursing deeply on croak
591 to free the temporary). If that is exceeded, the program crashes. To be
592 conservative, the default nesting limit is set to 512. If your process
593 has a smaller stack, you should adjust this setting accordingly with the
594 C<max_depth> method.
595
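The size and nesting limits discussed above could be applied roughly as
follows. This is a sketch only: the limits of 1024 octets and 3 levels are
arbitrary values chosen for the example, not recommendations.

```perl
use strict;
use warnings;
use CBOR::XS;

# limits chosen arbitrarily for this example
my $coder = CBOR::XS->new->max_size (1024)->max_depth (3);

# [[1]]: two levels of nesting, within the limit
my $ok = eval { $coder->decode ("\x81\x81\x01") };

# [[[[[1]]]]]: five levels of nesting, beyond the limit, so decode croaks
my $too_deep    = eval { $coder->decode ("\x81\x81\x81\x81\x81\x01") };
my $depth_error = $@;

# 2000 octets of input exceed max_size, so decode croaks as well
my $too_big    = eval { $coder->decode ("\x00" x 2000) };
my $size_error = $@;

print "nesting rejected\n"  if $depth_error;
print "oversize rejected\n" if $size_error;
```

Wrapping C<decode> in C<eval> also keeps the error message, which might
contain parts of the data, under the control of the caller.
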
596 Something else could bomb you, too, that I forgot to think of. In that
597 case, you get to keep the pieces. I am always open for hints, though...
598
599 Also keep in mind that CBOR::XS might leak contents of your Perl data
600 structures in its error messages, so when you serialise sensitive
601 information you might want to make sure that exceptions thrown by CBOR::XS
602 will not end up in front of untrusted eyes.
603
604 =head1 CBOR IMPLEMENTATION NOTES
605
606 This section contains some random implementation notes. They do not
607 describe guaranteed behaviour, but merely behaviour as-is implemented
608 right now.
609
610 64 bit integers are only properly decoded when Perl was built with 64 bit
611 support.
612
613 Strings and arrays are encoded with a definite length. Hashes as well,
614 unless they are tied (or otherwise magical).
615
616 Only the double data type is supported for NV data types - when Perl uses
617 long double to represent floating point values, they might not be encoded
618 properly. Half precision types are accepted, but not encoded.
619
620 Strict mode and canonical mode are not implemented.
621
622
623 =head1 THREADS
624
625 This module is I<not> guaranteed to be thread safe and there are no
626 plans to change this until Perl gets thread support (as opposed to the
627 horribly slow so-called "threads" which are simply slow and bloated
628 process simulations - use fork, it's I<much> faster, cheaper, better).
629
630 (It might actually work, but you have been warned).
631
632
633 =head1 BUGS
634
635 While the goal of this module is to be correct, that unfortunately does
636 not mean it's bug-free, only that I think its design is bug-free. If you
637 keep reporting bugs they will be fixed swiftly, though.
638
639 Please refrain from using rt.cpan.org or any other bug reporting
640 service. I put the contact address into my modules for a reason.
641
642 =cut
643
644 XSLoader::load "CBOR::XS", $VERSION;
645
646 =head1 SEE ALSO
647
648 The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
649 serialisation.
650
651 The L<Types::Serialiser> module provides the data model for true, false
652 and error values.
653
654 =head1 AUTHOR
655
656 Marc Lehmann <schmorp@schmorp.de>
657 http://home.schmorp.de/
658
659 =cut
660
661 1
662