/cvs/CBOR-XS/README
Revision: 1.18
Committed: Wed Dec 7 14:14:30 2016 UTC (7 years, 5 months ago) by root
Branch: MAIN
CVS Tags: rel-1_71, rel-1_7, rel-1_6
Changes since 1.17: +175 -38 lines
Log Message:
1.6

File Contents

# User Rev Content
1 root 1.2 NAME
2     CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)
3    
4     SYNOPSIS
    use CBOR::XS;

    $binary_cbor_data = encode_cbor $perl_value;
    $perl_value = decode_cbor $binary_cbor_data;

    # OO-interface

    $coder = CBOR::XS->new;
    $binary_cbor_data = $coder->encode ($perl_value);
    $perl_value = $coder->decode ($binary_cbor_data);

    # prefix decoding

    my $many_cbor_strings = ...;
    while (length $many_cbor_strings) {
        my ($data, $length) = $coder->decode_prefix ($many_cbor_strings);
        # data was decoded
        substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
    }
24 root 1.2
25     DESCRIPTION
26 root 1.4 This module converts Perl data structures to the Concise Binary Object
27     Representation (CBOR) and vice versa. CBOR is a fast binary
28 root 1.10 serialisation format that aims to use an (almost) superset of the JSON
29     data model, i.e. when you can represent something useful in JSON, you
30     should be able to represent it in CBOR.
31 root 1.4
32 root 1.10 In short, CBOR is a faster and quite compact binary alternative to JSON,
33 root 1.6 with the added ability of supporting serialisation of Perl objects.
34 root 1.7 (JSON often compresses better than CBOR though, so if you plan to
35 root 1.10 compress the data later and speed is less important you might want to
36     compare both formats first).
37 root 1.4
38 root 1.8 To give you a general idea about speed, with texts in the megabyte
39     range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
40     JSON::XS and decodes about 15%-30% faster than those. The shorter the
41     data, the worse Storable performs in comparison.
42    
43 root 1.10 Regarding compactness, "CBOR::XS"-encoded data structures are usually
44     about 20% smaller than the same data encoded as (compact) JSON or
45     Storable.
46 root 1.8
47 root 1.9 In addition to the core CBOR data format, this module implements a
48 root 1.10 number of extensions, to support cyclic and shared data structures (see
49 root 1.11 "allow_sharing" and "allow_cycles"), string deduplication (see
50     "pack_strings") and scalar references (always enabled).
51 root 1.9
52 root 1.4 The primary goal of this module is to be *correct* and the secondary
53     goal is to be *fast*. To reach the latter goal it was written in C.
54 root 1.2
55     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
56     vice versa.
57    
58     FUNCTIONAL INTERFACE
59     The following convenience methods are provided by this module. They are
60     exported by default:
61    
62     $cbor_data = encode_cbor $perl_scalar
63     Converts the given Perl data structure to CBOR representation.
64     Croaks on error.
65    
66     $perl_scalar = decode_cbor $cbor_data
67     The opposite of "encode_cbor": expects a valid CBOR string to parse,
68     returning the resulting perl scalar. Croaks on error.
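
As a minimal sketch of the functional interface, the following round-trips a small structure and shows how to catch the exception on bad input. The variable names and the truncated test input are illustrative only.

```perl
use strict;
use warnings;
use CBOR::XS;

# Encode a nested Perl structure and decode it again.
my $original  = { name => "example", list => [1, 2, 3] };
my $cbor_data = encode_cbor $original;
my $copy      = decode_cbor $cbor_data;

print "list has ", scalar @{ $copy->{list} }, " elements\n";

# Both functions croak on error, so wrap untrusted input in eval.
# "\x83\x01" announces an array of three elements but contains only one.
my $ok = eval { decode_cbor "\x83\x01"; 1 };
print "decoding failed: $@" unless $ok;
```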
69    
70     OBJECT-ORIENTED INTERFACE
71     The object oriented interface lets you configure your own encoding or
72     decoding style, within the limits of supported formats.
73    
74     $cbor = new CBOR::XS
75     Creates a new CBOR::XS object that can be used to de/encode CBOR
76     strings. All boolean flags described below are by default
77     *disabled*.
78    
79     The mutators for flags all return the CBOR object again and thus
80     calls can be chained:
81    
82 root 1.9 my $cbor = CBOR::XS->new->encode ({a => [1,2]});
83 root 1.2
84 root 1.18 $cbor = new_safe CBOR::XS
85     Create a new, safe/secure CBOR::XS object. This is similar to "new",
86     but configures the coder object to be safe to use with untrusted
87     data. Currently, this is equivalent to:
88    
89     my $cbor = CBOR::XS
90     ->new
91     ->forbid_objects
92     ->filter (\&CBOR::XS::safe_filter)
93     ->max_size (1e8);
94    
However, "new_safe" is more future-proof, as it can follow later
changes to what is considered safe (it is better to crash because of
a change than to be exploited in other ways).
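
A short sketch of how "new_safe" might be used for input from an untrusted source. The $untrusted value is a stand-in created locally for illustration; in practice it would come from the network or a file.

```perl
use strict;
use warnings;
use CBOR::XS;

my $safe = CBOR::XS->new_safe;

# Stand-in for data received from an untrusted peer.
my $untrusted = encode_cbor { user => "alice", roles => ["admin"] };

# decode croaks on error, so guard it with eval.
my $data = eval { $safe->decode ($untrusted) };
if (defined $data) {
    print "user: $data->{user}\n";
} else {
    warn "rejected input: $@";
}
```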
97    
98 root 1.2 $cbor = $cbor->max_depth ([$maximum_nesting_depth])
99     $max_depth = $cbor->get_max_depth
100     Sets the maximum nesting level (default 512) accepted while encoding
101     or decoding. If a higher nesting level is detected in CBOR data or a
102     Perl data structure, then the encoder and decoder will stop and
103     croak at that point.
104    
Nesting level is defined by the number of hash- or arrayrefs that
the encoder needs to traverse to reach a given point or, when
decoding, by the number of CBOR arrays or maps that are still open
at a given point in the CBOR data.

Setting the maximum depth to one disallows any nesting, which
ensures that the value is at most a single hash or array.
112    
113     If no argument is given, the highest possible setting will be used,
114     which is rarely useful.
115    
116     Note that nesting is implemented by recursion in C. The default
117     value has been chosen to be as large as typical operating systems
118     allow without crashing.
119    
120 root 1.18 See "SECURITY CONSIDERATIONS", below, for more info on why this is
121 root 1.2 useful.
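
The following sketch shows the effect of a small "max_depth" on both encoding and decoding. The limit of 2 and the test structures are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->max_depth (2);

my $shallow = [1, [2]];         # two levels of nesting
my $deep    = [[[[[1]]]]];      # five levels of nesting

# Encoding stays within the limit for $shallow but croaks for $deep.
my $shallow_ok = eval { $coder->encode ($shallow); 1 };
my $deep_ok    = eval { $coder->encode ($deep); 1 };

# The same limit applies when decoding data produced elsewhere.
my $deep_cbor = encode_cbor $deep;
my $decode_ok = eval { $coder->decode ($deep_cbor); 1 };

print "shallow: ", ($shallow_ok ? "ok" : "rejected"), "\n";
print "deep:    ", ($deep_ok    ? "ok" : "rejected"), "\n";
print "decode:  ", ($decode_ok  ? "ok" : "rejected"), "\n";
```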
122    
123     $cbor = $cbor->max_size ([$maximum_string_size])
124     $max_size = $cbor->get_max_size
Set the maximum length a CBOR string may have (in bytes) when
decoding is attempted. The default is 0, meaning no limit.
When "decode" is called on a string that is longer than this many
bytes, it will not attempt to decode the string but throw an
exception. This setting has no effect on "encode" (yet).
130    
131     If no argument is given, the limit check will be deactivated (same
132     as when 0 is specified).
133    
134 root 1.18 See "SECURITY CONSIDERATIONS", below, for more info on why this is
135 root 1.2 useful.
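
A minimal sketch of "max_size" in action. The limit of 16 bytes is deliberately tiny to make the effect visible; a real limit would depend on the application.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->max_size (16);

my $small = encode_cbor [1, 2, 3];        # a handful of bytes
my $large = encode_cbor ["x" x 1000];     # far more than 16 bytes

my $small_ok = eval { $coder->decode ($small); 1 };
my $large_ok = eval { $coder->decode ($large); 1 };

print "small input: ", ($small_ok ? "decoded" : "rejected"), "\n";
print "large input: ", ($large_ok ? "decoded" : "rejected"), "\n";
```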
136    
137 root 1.9 $cbor = $cbor->allow_unknown ([$enable])
138     $enabled = $cbor->get_allow_unknown
139     If $enable is true (or missing), then "encode" will *not* throw an
140     exception when it encounters values it cannot represent in CBOR (for
141     example, filehandles) but instead will encode a CBOR "error" value.
142    
143     If $enable is false (the default), then "encode" will throw an
144     exception when it encounters anything it cannot encode as CBOR.
145    
146     This option does not affect "decode" in any way, and it is
147     recommended to leave it off unless you know your communications
148     partner.
149    
150     $cbor = $cbor->allow_sharing ([$enable])
151     $enabled = $cbor->get_allow_sharing
152     If $enable is true (or missing), then "encode" will not
153     double-encode values that have been referenced before (e.g. when the
154     same object, such as an array, is referenced multiple times), but
155     instead will emit a reference to the earlier value.
156    
157     This means that such values will only be encoded once, and will not
158     result in a deep cloning of the value on decode, in decoders
159 root 1.10 supporting the value sharing extension. This also makes it possible
160 root 1.18 to encode cyclic data structures (which need "allow_cycles" to be
161 root 1.11 enabled to be decoded by this module).
162 root 1.9
163     It is recommended to leave it off unless you know your communication
164     partner supports the value sharing extensions to CBOR
165 root 1.10 (<http://cbor.schmorp.de/value-sharing>), as without decoder
166     support, the resulting data structure might be unusable.
167 root 1.9
Detecting shared values incurs a runtime overhead when values are
encoded that have a reference counter larger than one, and might
unnecessarily increase the encoded size, as potentially shared
values are encoded as shareable whether or not they are actually
shared.
173    
174     At the moment, only targets of references can be shared (e.g.
175     scalars, arrays or hashes pointed to by a reference). Weirder
176     constructs, such as an array with multiple "copies" of the *same*
177     string, which are hard but not impossible to create in Perl, are not
178 root 1.10 supported (this is the same as with Storable).
179 root 1.9
180 root 1.10 If $enable is false (the default), then "encode" will encode shared
181     data structures repeatedly, unsharing them in the process. Cyclic
182     data structures cannot be encoded in this mode.
183 root 1.9
184     This option does not affect "decode" in any way - shared values and
185     references will always be decoded properly if present.
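
To illustrate the difference, the sketch below encodes an array that references the same inner array twice, once with and once without "allow_sharing". The variable names are illustrative.

```perl
use strict;
use warnings;
use CBOR::XS;

my $shared = [1, 2, 3];
my $data   = [$shared, $shared];   # the same array, referenced twice

my $plain_copy  = decode_cbor (CBOR::XS->new->encode ($data));
my $shared_copy = decode_cbor (CBOR::XS->new->allow_sharing->encode ($data));

# Without sharing, the two elements were encoded separately and are
# independent after decoding.
$plain_copy->[0][0] = 99;

# With sharing, both elements refer to one array again, so a change
# made through one is visible through the other.
$shared_copy->[0][0] = 99;

print "plain:  $plain_copy->[1][0]\n";
print "shared: $shared_copy->[1][0]\n";
```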
186    
187 root 1.11 $cbor = $cbor->allow_cycles ([$enable])
188     $enabled = $cbor->get_allow_cycles
189     If $enable is true (or missing), then "decode" will happily decode
190     self-referential (cyclic) data structures. By default these will not
191     be decoded, as they need manual cleanup to avoid memory leaks, so
192     code that isn't prepared for this will not leak memory.
193    
194     If $enable is false (the default), then "decode" will throw an error
195     when it encounters a self-referential/cyclic data structure.
196    
FUTURE DIRECTION: the motivation behind this option is to avoid
*real* cycles - future versions of this module might choose to decode
cyclic data structures using weak references when this option is
off, instead of throwing an error.
201    
202 root 1.11 This option does not affect "encode" in any way - shared values and
203 root 1.14 references will always be encoded properly if present.
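
A sketch of a cyclic structure making the round trip: "allow_sharing" is needed on the encoding side and "allow_cycles" on the decoding side. The hash and its keys are made up for illustration, and the cycles are broken manually at the end to avoid leaking memory.

```perl
use strict;
use warnings;
use CBOR::XS;

my $node = { name => "root" };
$node->{self} = $node;                    # self-referential structure

my $cbor_data = CBOR::XS->new->allow_sharing->encode ($node);

# A decoder without allow_cycles refuses the data.
my $strict_ok = eval { CBOR::XS->new->decode ($cbor_data); 1 };

# A decoder with allow_cycles restores the cycle.
my $copy      = CBOR::XS->new->allow_cycles->decode ($cbor_data);
my $is_cyclic = $copy->{self} == $copy;   # compares reference addresses

print "strict decode: ", ($strict_ok ? "ok" : "rejected"), "\n";
print "cycle restored\n" if $is_cyclic;

# Break the cycles so perl can free both structures.
delete $copy->{self};
delete $node->{self};
```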
204 root 1.11
205 root 1.18 $cbor = $cbor->forbid_objects ([$enable])
206     $enabled = $cbor->get_forbid_objects
207     Disables the use of the object serialiser protocol.
208    
If $enable is true (or missing), then "encode" will throw an
exception when it encounters perl objects that would be encoded
using the perl-object tag (26). When "decode" encounters such tags,
it will fall back to the general filter/tagged logic as if this were
an unknown tag (by default resulting in a "CBOR::XS::Tagged"
object).
215    
216     If $enable is false (the default), then "encode" will use the
217     Types::Serialiser object serialisation protocol to serialise objects
218     into perl-object tags, and "decode" will do the same to decode such
219     tags.
220    
221     See "SECURITY CONSIDERATIONS", below, for more info on why
222     forbidding this protocol can be useful.
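
The sketch below contrasts a default coder with one that has "forbid_objects" enabled. "My::Point" is a hypothetical class written only for this example; its "FREEZE" and "THAW" methods follow the protocol described under "OBJECT SERIALISATION".

```perl
use strict;
use warnings;
use CBOR::XS;

{
    package My::Point;   # hypothetical example class

    sub new    { my ($class, %args) = @_; bless {%args}, $class }
    sub FREEZE { my ($self, $serialiser) = @_; ($self->{x}, $self->{y}) }
    sub THAW   { my ($class, $serialiser, $x, $y) = @_; $class->new (x => $x, y => $y) }
}

my $point   = My::Point->new (x => 1, y => 2);
my $default = CBOR::XS->new;
my $strict  = CBOR::XS->new->forbid_objects;

# The default coder serialises and restores the object.
my $cbor_data = $default->encode ($point);
my $copy      = $default->decode ($cbor_data);

# The strict coder refuses to encode it ...
my $encode_ok = eval { $strict->encode ($point); 1 };

# ... and decodes the perl-object tag like any unknown tag.
my $tagged = $strict->decode ($cbor_data);

print "default decode: ", ref $copy, "\n";
print "strict encode:  ", ($encode_ok ? "ok" : "rejected"), "\n";
print "strict decode:  ", ref $tagged, "\n";
```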
223    
224 root 1.10 $cbor = $cbor->pack_strings ([$enable])
225     $enabled = $cbor->get_pack_strings
If $enable is true (or missing), then "encode" will try not to
encode the same string twice, but will encode a reference to the
earlier string instead. Depending on your data format, this can save
a lot of space, but also results in a very large runtime overhead
(expect encoding times to be 2-4 times as high as without).
231    
232     It is recommended to leave it off unless you know your
233     communications partner supports the stringref extension to CBOR
234 root 1.10 (<http://cbor.schmorp.de/stringref>), as without decoder support,
235     the resulting data structure might not be usable.
236 root 1.9
237 root 1.10 If $enable is false (the default), then "encode" will encode strings
238     the standard CBOR way.
239 root 1.9
240     This option does not affect "decode" in any way - string references
241     will always be decoded properly if present.
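
As a rough illustration of the space savings, the following sketch encodes a list of records that repeat the same strings. The record layout is invented for the example, and the actual savings depend entirely on the data.

```perl
use strict;
use warnings;
use CBOR::XS;

# 100 records that repeat the same keys and the same status string.
my @records = map { +{ status => "status-unchanged", id => $_ } } 1 .. 100;

my $plain  = CBOR::XS->new->encode (\@records);
my $packed = CBOR::XS->new->pack_strings->encode (\@records);

printf "plain: %d bytes, packed: %d bytes\n", length $plain, length $packed;

# Decoding needs no special option.
my $copy = decode_cbor $packed;
print "last status: $copy->[99]{status}\n";
```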
242    
243 root 1.17 $cbor = $cbor->text_keys ([$enable])
244     $enabled = $cbor->get_text_keys
If $enable is true (or missing), then "encode" will encode all perl
hash keys as CBOR text strings/UTF-8 strings, upgrading them as
needed.
248    
249     If $enable is false (the default), then "encode" will encode hash
250     keys normally - upgraded perl strings (strings internally encoded as
251     UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
252     byte strings.
253    
254     This option does not affect "decode" in any way.
255    
256     This option is useful for interoperability with CBOR decoders that
257     don't treat byte strings as a form of text. It is especially useful
258     as Perl gives very little control over hash keys.
259    
260     Enabling this option can be slow, as all downgraded hash keys that
261     are encoded need to be scanned and converted to UTF-8.
262    
263     $cbor = $cbor->text_strings ([$enable])
264     $enabled = $cbor->get_text_strings
This option works similarly to "text_keys", above, but works on all
strings (including hash keys), so "text_keys" has no further effect
after enabling "text_strings".

If $enable is true (or missing), then "encode" will encode all perl
strings as CBOR text strings/UTF-8 strings, upgrading them as
needed.
272    
273     If $enable is false (the default), then "encode" will encode strings
274     normally (but see "text_keys") - upgraded perl strings (strings
275     internally encoded as UTF-8) as CBOR text strings, and downgraded
276     perl strings as CBOR byte strings.
277    
278     This option does not affect "decode" in any way.
279    
280     This option has similar advantages and disadvantages as "text_keys".
281     In addition, this option effectively removes the ability to encode
282     byte strings, which might break some "FREEZE" and "TO_CBOR" methods
283     that rely on this, such as bignum encoding, so this option is mainly
284     useful for very simple data.
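
The effect of "text_keys" and "text_strings" can be observed by looking at the CBOR major type of the encoded string (2 for byte strings, 3 for text strings). The helper "major_type" is a hypothetical function written for this sketch, not part of the module.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: CBOR major type (top three bits) of the byte
# at the given offset.
sub major_type { ord (substr $_[0], $_[1], 1) >> 5 }

my $hash = { key => 1 };

# Byte 0 is the map header, byte 1 starts the (only) key.
my $default_map = CBOR::XS->new->encode ($hash);
my $text_map    = CBOR::XS->new->text_keys->encode ($hash);

printf "default key:   major type %d\n", major_type ($default_map, 1);  # 2 = byte string
printf "text_keys key: major type %d\n", major_type ($text_map, 1);     # 3 = text string

# text_strings applies the same treatment to all strings.
my $default_str = CBOR::XS->new->encode ("abc");
my $text_str    = CBOR::XS->new->text_strings->encode ("abc");

printf "default string:      major type %d\n", major_type ($default_str, 0);  # 2
printf "text_strings string: major type %d\n", major_type ($text_str, 0);     # 3
```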
285    
286 root 1.12 $cbor = $cbor->validate_utf8 ([$enable])
287     $enabled = $cbor->get_validate_utf8
288     If $enable is true (or missing), then "decode" will validate that
289     elements (text strings) containing UTF-8 data in fact contain valid
290     UTF-8 data (instead of blindly accepting it). This validation
291     obviously takes extra time during decoding.
292    
293     The concept of "valid UTF-8" used is perl's concept, which is a
294     superset of the official UTF-8.
295    
296     If $enable is false (the default), then "decode" will blindly accept
297     UTF-8 data, marking them as valid UTF-8 in the resulting data
298 root 1.17 structure regardless of whether that's true or not.
299 root 1.12
Perl isn't too happy about corrupted UTF-8 in strings, but should
generally not crash or do similarly evil things. Extensions might
not be so forgiving, so it's recommended to turn on this setting if
you receive untrusted CBOR.
304    
305     This option does not affect "encode" in any way - strings that are
306     supposedly valid UTF-8 will simply be dumped into the resulting CBOR
307     string without checking whether that is, in fact, true or not.
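
A small sketch of the difference, using a hand-built CBOR text string whose payload is not valid UTF-8. The input bytes are constructed purely for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x62 is the header of a two-byte text string; the two payload
# bytes that follow are not valid UTF-8.
my $bad = "\x62\xff\xfe";

my $lenient_ok = eval { CBOR::XS->new->decode ($bad); 1 };
my $strict_ok  = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 };

print "without validate_utf8: ", ($lenient_ok ? "accepted" : "rejected"), "\n";
print "with validate_utf8:    ", ($strict_ok  ? "accepted" : "rejected"), "\n";
```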
308    
309 root 1.9 $cbor = $cbor->filter ([$cb->($tag, $value)])
310     $cb_or_undef = $cbor->get_filter
311     Sets or replaces the tagged value decoding filter (when $cb is
312     specified) or clears the filter (if no argument or "undef" is
313     provided).
314    
315     The filter callback is called only during decoding, when a
316     non-enforced tagged value has been decoded (see "TAG HANDLING AND
317     EXTENSIONS" for a list of enforced tags). For specific tags, it's
318     often better to provide a default converter using the
319     %CBOR::XS::FILTER hash (see below).
320    
321     The first argument is the numerical tag, the second is the (decoded)
322     value that has been tagged.
323    
324     The filter function should return either exactly one value, which
325     will replace the tagged value in the decoded data structure, or no
326     values, which will result in default handling, which currently means
327     the decoder creates a "CBOR::XS::Tagged" object to hold the tag and
328     the value.
329    
When the filter is cleared (the default state), the default filter
function, "CBOR::XS::default_filter", is used. This function simply
looks up the tag in the %CBOR::XS::FILTER hash. If an entry exists
it must be a code reference that is called with tag and value, and
is responsible for decoding the value. If no entry exists, it
returns no values. "CBOR::XS" provides a number of default filter
functions already, and the %CBOR::XS::FILTER hash can be freely
extended with more.

"CBOR::XS" additionally provides an alternative filter function that
is supposed to be safe to use with untrusted data (which the default
filter might not be), called "CBOR::XS::safe_filter", which works the
same as the "default_filter" but uses the %CBOR::XS::SAFE_FILTER
variable instead. It is prepopulated with the tag decoding functions
that are deemed safe (basically the same as %CBOR::XS::FILTER
without all the bignum tags), and can be extended by user code as
well, although, obviously, one should be very careful about adding
decoding functions here, since the expectation is that they are safe
to use on untrusted data, after all.
349 root 1.9
350     Example: decode all tags not handled internally into
351 root 1.10 "CBOR::XS::Tagged" objects, with no other special handling (useful
352 root 1.9 when working with potentially "unsafe" CBOR data).
353    
354     CBOR::XS->new->filter (sub { })->decode ($cbor_data);
355    
356     Example: provide a global filter for tag 1347375694, converting the
357     value into some string form.
358    
    $CBOR::XS::FILTER{1347375694} = sub {
        my ($tag, $value) = @_;

        "tag 1347375694 value $value"
    };
364    
365 root 1.18 Example: provide your own filter function that looks up tags in your
366     own hash:
367    
    my %my_filter = (
        998347484 => sub {
            my ($tag, $value) = @_;

            "tag 998347484 value $value"
        },
    );

    my $coder = CBOR::XS->new->filter (sub {
        &{ $my_filter{$_[0]} or return }
    });
379    
380     Example: use the safe filter function (see "SECURITY CONSIDERATIONS"
381     for more considerations on security).
382    
383     CBOR::XS->new->filter (\&CBOR::XS::safe_filter)->decode ($cbor_data);
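
For a complete round trip, the sketch below creates a tagged value with "CBOR::XS::tag", then decodes it once with a custom filter and once with an empty filter. The tag number is reused from the example above and has no special meaning here.

```perl
use strict;
use warnings;
use CBOR::XS;

my $TAG = 998347484;   # illustrative tag number

my $cbor_data = encode_cbor [ CBOR::XS::tag ($TAG, "payload"), "plain" ];

# A filter that handles one tag and leaves everything else to the
# default handling by returning no values.
my $coder = CBOR::XS->new->filter (sub {
    my ($tag, $value) = @_;

    return "tag $tag value $value" if $tag == $TAG;
    return;
});

my $filtered = $coder->decode ($cbor_data);

# An empty filter turns every non-enforced tag into a
# CBOR::XS::Tagged object, an array of the form [tag, value].
my $raw = CBOR::XS->new->filter (sub { })->decode ($cbor_data);

print "filtered: $filtered->[0]\n";
print "raw:      ", ref $raw->[0], " with tag $raw->[0][0]\n";
```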
384    
385 root 1.2 $cbor_data = $cbor->encode ($perl_scalar)
386     Converts the given Perl data structure (a scalar value) to its CBOR
387     representation.
388    
389     $perl_scalar = $cbor->decode ($cbor_data)
390     The opposite of "encode": expects CBOR data and tries to parse it,
391     returning the resulting simple scalar or reference. Croaks on error.
392    
393     ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)
394     This works like the "decode" method, but instead of raising an
395     exception when there is trailing garbage after the CBOR string, it
396     will silently stop parsing there and return the number of characters
397     consumed so far.
398    
This is useful if your CBOR strings are not delimited by an outer
protocol and you need to know where the first CBOR string ends and
the next one starts.
402    
403     CBOR::XS->new->decode_prefix ("......")
404     => ("...", 3)
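
A runnable sketch of the loop from the synopsis, splitting a string of concatenated CBOR values. The three test values are arbitrary.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Build a string of three CBOR values, back to back.
my $stream = "";
$stream .= $coder->encode ($_) for [1, 2], { a => 1 }, "three";

my @values;
while (length $stream) {
    my ($value, $length) = $coder->decode_prefix ($stream);
    push @values, $value;
    substr $stream, 0, $length, "";   # remove the decoded CBOR value
}

print "decoded ", scalar @values, " values\n";
```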
405    
406 root 1.13 INCREMENTAL PARSING
In some cases, there is the need for incremental parsing of CBOR data.
While this module always has to keep both the CBOR data and the
resulting Perl data structure in memory at one time, it does allow you
to parse a CBOR stream incrementally. This works similarly to using
"decode_prefix" to see if a full CBOR object is available, but is much
more efficient.

It basically works by parsing as much of a CBOR string as possible - if
the CBOR data is not complete yet, the parser will remember where it
was, to be able to restart when more data has been accumulated. Once
enough data is available to either decode a complete CBOR value or raise
an error, a real decode will be attempted.
418    
419     A typical use case would be a network protocol that consists of sending
420     and receiving CBOR-encoded messages. The solution that works with CBOR
421     and about anything else is by prepending a length to every CBOR value,
422     so the receiver knows how many octets to read. More compact (and
423     slightly slower) would be to just send CBOR values back-to-back, as
424     "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
425     length.
426    
427     The following methods help with this:
428    
429     @decoded = $cbor->incr_parse ($buffer)
430     This method attempts to decode exactly one CBOR value from the
431     beginning of the given $buffer. The value is removed from the
432     $buffer on success. When $buffer doesn't contain a complete value
433     yet, it returns nothing. Finally, when the $buffer doesn't start
434     with something that could ever be a valid CBOR value, it raises an
435     exception, just as "decode" would. In the latter case the decoder
436     state is undefined and must be reset before being able to parse
437     further.
438    
This method modifies the $buffer in place. When no CBOR value can be
decoded, the decoder stores the current string offset. On the next
call, it continues decoding at the place where it stopped before.
For this to make sense, the $buffer must begin with the same octets
as on previous unsuccessful calls.
444    
445     You can call this method in scalar context, in which case it either
446     returns a decoded value or "undef". This makes it impossible to
447     distinguish between CBOR null values (which decode to "undef") and
448     an unsuccessful decode, which is often acceptable.
449    
450     @decoded = $cbor->incr_parse_multiple ($buffer)
451     Same as "incr_parse", but attempts to decode as many CBOR values as
452     possible in one go, instead of at most one. Calls to "incr_parse"
453     and "incr_parse_multiple" can be interleaved.
454    
455     $cbor->incr_reset
456     Resets the incremental decoder. This throws away any saved state, so
457     that subsequent calls to "incr_parse" or "incr_parse_multiple" start
458     to parse a new CBOR value from the beginning of the $buffer again.
459    
460 root 1.18 This method can be called at any time, but it *must* be called if
461     you want to change your $buffer or there was a decoding error and
462     you want to reuse the $cbor object for future incremental parsings.
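
The methods above could be combined roughly as follows. In this sketch the "network" is simulated by cutting a string into three-octet chunks; a real program would append whatever its read call returns to $buffer instead.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# Two CBOR values sent back-to-back, without any length prefix.
my $wire = encode_cbor ([1, 2, 3]) . encode_cbor ({ a => 1 });

my $buffer = "";
my @messages;

# Pretend the data arrives three octets at a time.
for my $chunk ($wire =~ /(.{1,3})/gs) {
    $buffer .= $chunk;

    # incr_parse removes each decoded value from $buffer and
    # returns nothing while the next value is still incomplete.
    while (my @decoded = $coder->incr_parse ($buffer)) {
        push @messages, @decoded;
    }
}

print "received ", scalar @messages, " messages\n";
```

List context is used for the return value so that a CBOR null (which decodes to "undef") is not mistaken for an incomplete value.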
463 root 1.13
464 root 1.2 MAPPING
465     This section describes how CBOR::XS maps Perl values to CBOR values and
466     vice versa. These mappings are designed to "do the right thing" in most
467     circumstances automatically, preserving round-tripping characteristics
468     (what you put in comes out as something equivalent).
469    
470     For the more enlightened: note that in the following descriptions,
471     lowercase *perl* refers to the Perl interpreter, while uppercase *Perl*
472     refers to the abstract Perl language itself.
473    
474     CBOR -> PERL
475 root 1.4 integers
476     CBOR integers become (numeric) perl scalars. On perls without 64 bit
477     support, 64 bit integers will be truncated or otherwise corrupted.
478    
479     byte strings
Byte strings will become octet strings in Perl (the byte values
0..255 will simply become characters of the same value in Perl).
482    
483     UTF-8 strings
UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
decoded into proper Unicode code points. By default, the UTF-8
octets are not validated, so corrupt input will result in corrupted
Perl strings (see "validate_utf8").
488    
489     arrays, maps
490     CBOR arrays and CBOR maps will be converted into references to a
491     Perl array or hash, respectively. The keys of the map will be
492     stringified during this process.
493    
494 root 1.5 null
495     CBOR null becomes "undef" in Perl.
496    
497     true, false, undefined
These CBOR values become "Types::Serialiser::true",
"Types::Serialiser::false" and "Types::Serialiser::error",
respectively. They are overloaded to act almost exactly like the
numbers 1 and 0 (for true and false) or to throw an exception on
access (for error). See the Types::Serialiser manpage for details.
503    
504 root 1.9 tagged values
Tagged items consist of a numeric tag and another CBOR value.
506 root 1.2
507 root 1.9 See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
508 root 1.10 for details on which tags are handled how.
509 root 1.4
510     anything else
511     Anything else (e.g. unsupported simple values) will raise a decoding
512     error.
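
A few of these mappings can be checked with a hand-assembled CBOR string. The input below is written out byte by byte for illustration and encodes the array [true, false, null, {1: 2}].

```perl
use strict;
use warnings;
use CBOR::XS;
use Types::Serialiser;

# 0x84 = array of four items: true, false, null and the map {1: 2}.
my $decoded = decode_cbor "\x84\xf5\xf4\xf6\xa1\x01\x02";

my ($true, $false, $null, $map) = @$decoded;

print "true is a boolean\n"  if Types::Serialiser::is_bool $true;
print "false is false\n"     if Types::Serialiser::is_false $false;
print "null became undef\n"  unless defined $null;

# The integer map key was stringified to the hash key "1".
print "map value: $map->{1}\n";
```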
513 root 1.2
514     PERL -> CBOR
515     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
516 root 1.10 typeless language. That means this module can only guess which CBOR type
517     is meant by a perl value.
518 root 1.2
519     hash references
520     Perl hash references become CBOR maps. As there is no inherent
521     ordering in hash keys (or CBOR maps), they will usually be encoded
522 root 1.10 in a pseudo-random order. This order can be different each time a
523 root 1.17 hash is encoded.
524 root 1.2
525 root 1.4 Currently, tied hashes will use the indefinite-length format, while
526     normal hashes will use the fixed-length format.
527    
528 root 1.2 array references
529 root 1.4 Perl array references become fixed-length CBOR arrays.
530 root 1.2
531     other references
532 root 1.10 Other unblessed references will be represented using the indirection
533     tag extension (tag value 22098,
534     <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
535     to be able to decode these values somehow, by either "doing the
536     right thing", decoding into a generic tagged object, simply ignoring
537     the tag, or something else.
538 root 1.4
539     CBOR::XS::Tagged objects
540     Objects of this type must be arrays consisting of a single "[tag,
541     value]" pair. The (numerical) tag will be encoded as a CBOR tag, the
542 root 1.10 value will be encoded as appropriate for the value. You must use
543 root 1.7 "CBOR::XS::tag" to create such objects.
544 root 1.2
545 root 1.5 Types::Serialiser::true, Types::Serialiser::false,
546     Types::Serialiser::error
547     These special values become CBOR true, CBOR false and CBOR undefined
548     values, respectively. You can also use "\1", "\0" and "\undef"
549     directly if you want.
550    
551     other blessed objects
552     Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
553 root 1.9 "TAG HANDLING AND EXTENSIONS" for specific classes handled by this
554     module, and "OBJECT SERIALISATION" for generic object serialisation.
555 root 1.2
556     simple scalars
557 root 1.9 Simple Perl scalars (any scalar that is not a reference) are the
558     most difficult objects to encode: CBOR::XS will encode undefined
559 root 1.4 scalars as CBOR null values, scalars that have last been used in a
560 root 1.2 string context before encoding as CBOR strings, and anything else as
561     number value:
562    
563     # dump as number
564     encode_cbor [2] # yields [2]
565     encode_cbor [-3.0e17] # yields [-3e+17]
566     my $value = 5; encode_cbor [$value] # yields [5]
567    
568 root 1.10 # used as string, so dump as string (either byte or text)
569 root 1.2 print $value;
570     encode_cbor [$value] # yields ["5"]
571    
572     # undef becomes null
573     encode_cbor [undef] # yields [null]
574    
575     You can force the type to be a CBOR string by stringifying it:
576    
577     my $x = 3.1; # some variable containing a number
578     "$x"; # stringified
579     $x .= ""; # another, more awkward way to stringify
580     print $x; # perl does it for you, too, quite often
581    
582 root 1.17 You can force whether a string is encoded as byte or text string by
583     using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
584     disabled):
585 root 1.10
586     utf8::upgrade $x; # encode $x as text string
587     utf8::downgrade $x; # encode $x as byte string
588    
589     Perl doesn't define what operations up- and downgrade strings, so if
590     the difference between byte and text is important, you should up- or
591 root 1.17 downgrade your string as late as possible before encoding. You can
592     also force the use of CBOR text strings by using "text_keys" or
593     "text_strings".
594 root 1.10
595 root 1.2 You can force the type to be a CBOR number by numifying it:
596    
597     my $x = "3"; # some variable containing a string
598     $x += 0; # numify it, ensuring it will be dumped as a number
599     $x *= 1; # same thing, the choice is yours.
600    
601     You can not currently force the type in other, less obscure, ways.
602     Tell me if you need this capability (but don't forget to explain why
603     it's needed :).
604    
605 root 1.4 Perl values that seem to be integers generally use the shortest
606     possible representation. Floating-point values will use either the
607     IEEE single format if possible without loss of precision, otherwise
608     the IEEE double format will be used. Perls that use formats other
609     than IEEE double to represent numerical values are supported, but
610     might suffer loss of precision.
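
The effect of these conversions can be made visible by dumping the encoded bytes in hex. The helper "hex_cbor" is a hypothetical function written for this sketch.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: CBOR encoding of a value as a hex string.
sub hex_cbor { unpack "H*", encode_cbor ($_[0]) }

my $x = "5";
my $as_string = hex_cbor ($x);   # 4135: byte string of length 1

$x += 0;                         # numify
my $as_number = hex_cbor ($x);   # 05: unsigned integer 5

$x .= "";                        # stringify again
utf8::upgrade $x;
my $as_text = hex_cbor ($x);     # 6135: text string of length 1

utf8::downgrade $x;
my $as_bytes = hex_cbor ($x);    # 4135: byte string of length 1

print "string: $as_string, number: $as_number, ",
      "text: $as_text, bytes: $as_bytes\n";
```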
611 root 1.2
612 root 1.5 OBJECT SERIALISATION
This module implements both a CBOR-specific and the generic
Types::Serialiser object serialisation protocol. The following
subsections explain both methods.
616    
617     ENCODING
This module knows two ways to serialise a Perl object: The CBOR-specific
way, and the generic way.
620    
621 root 1.11 Whenever the encoder encounters a Perl object that it cannot serialise
622 root 1.5 directly (most of them), it will first look up the "TO_CBOR" method on
623     it.
624    
If it has a "TO_CBOR" method, it will call it with the object as only
argument, and expects exactly one return value, which it will then
encode in place of the object.
628    
629     Otherwise, it will look up the "FREEZE" method. If it exists, it will
630     call it with the object as first argument, and the constant string
631     "CBOR" as the second argument, to distinguish it from other serialisers.
632    
633     The "FREEZE" method can return any number of values (i.e. zero or more).
634     These will be encoded as CBOR perl object, together with the classname.
635    
These methods *MUST NOT* change the data structure that is being
serialised. Failure to comply with this can result in memory
corruption - and worse.
639    
640 root 1.5 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
641     with an error.

  DECODING
    Objects encoded via "TO_CBOR" cannot (normally) be automatically
    decoded, but objects encoded via "FREEZE" can be decoded using the
    following protocol:

    When an encoded CBOR perl object is encountered by the decoder, it will
    look up the "THAW" method, by using the stored classname, and will fail
    if the method cannot be found.

    After the lookup it will call the "THAW" method with the stored
    classname as first argument, the constant string "CBOR" as second
    argument, and all values returned by "FREEZE" as remaining arguments.

  EXAMPLES
    Here is an example "TO_CBOR" method:

       sub My::Object::TO_CBOR {
          my ($obj) = @_;

          ["this is a serialised My::Object object", $obj->{id}]
       }

    When a "My::Object" is encoded to CBOR, it will instead encode a simple
    array with two members: a string, and the "object id". Decoding this
    CBOR string will yield a normal perl array reference in place of the
    object.

    A more useful and practical example would be a serialisation method for
    the URI module. CBOR has a custom tag value for URIs, namely 32:

       sub URI::TO_CBOR {
          my ($self) = @_;
          my $uri = "$self"; # stringify uri
          utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
          CBOR::XS::tag 32, $uri # tag the upgraded copy, not the object
       }

    This will encode URIs as a UTF-8 string with tag 32, which indicates a
    URI.

    Unless a filter converts tag 32 (see "NON-ENFORCED TAGS"), decoding
    such a URI will not give you a URI object, but instead a
    CBOR::XS::Tagged object with tag number 32 and the string - exactly
    what was returned by "TO_CBOR".

    To serialise an object so it can automatically be deserialised, you need
    to use "FREEZE" and "THAW". To take the URI module as example, this
    would be a possible implementation:

       sub URI::FREEZE {
          my ($self, $serialiser) = @_;
          "$self" # encode url string
       }

       sub URI::THAW {
          my ($class, $serialiser, $uri) = @_;
          $class->new ($uri)
       }

    Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
    example, a "FREEZE" method that returns "type", "id" and "variant"
    values would cause an invocation of "THAW" with 5 arguments:

       sub My::Object::FREEZE {
          my ($self, $serialiser) = @_;

          ($self->{type}, $self->{id}, $self->{variant})
       }

       sub My::Object::THAW {
          my ($class, $serialiser, $type, $id, $variant) = @_;

          $class->new (type => $type, id => $id, variant => $variant)
       }
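
    The "FREEZE"/"THAW" protocol described above can be tried end to end
    with a short script. This is only a sketch: "My::Point" and its "x"
    and "y" fields are hypothetical names made up for the illustration,
    and the default coder is assumed to allow object decoding.

```perl
use strict;
use warnings;
use CBOR::XS;

# My::Point is a hypothetical class that exists only for this illustration
{
    package My::Point;

    sub new {
        my ($class, %args) = @_;
        bless { x => $args{x}, y => $args{y} }, $class
    }

    # called by the encoder as FREEZE ($object, "CBOR")
    sub FREEZE {
        my ($self, $serialiser) = @_;
        ($self->{x}, $self->{y})
    }

    # called by the decoder as THAW ($classname, "CBOR", values from FREEZE)
    sub THAW {
        my ($class, $serialiser, $x, $y) = @_;
        $class->new (x => $x, y => $y)
    }
}

my $cbor = encode_cbor ([My::Point->new (x => 1, y => 2)]);
my $copy = decode_cbor ($cbor);

print ref ($copy->[0]), "\n";               # class of the thawed object
print "$copy->[0]{x}, $copy->[0]{y}\n";     # the values passed through FREEZE
```

    Note that "THAW" is looked up by the class name stored in the CBOR
    data, so the class must already be loaded in the decoding process.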

MAGIC HEADER
    There is no reliable way to distinguish bare CBOR data from other
    formats programmatically. To make this easier, the CBOR specification
    has a special "magic string" that can be prepended to any CBOR string
    without changing its meaning.

    This string is available as $CBOR::XS::MAGIC. This module does not
    prepend this string to the CBOR data it generates, but it will ignore it
    if present, so users can prepend this string as a "file type" indicator
    as required.
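
    A minimal sketch of using the magic string as a "file type" marker
    follows. The variable names and the prefix check are illustrative
    assumptions, not something this module provides:

```perl
use strict;
use warnings;
use CBOR::XS;

my $payload = encode_cbor ({ answer => 42 });

# prepend the magic string as a "file type" indicator
my $file_contents = $CBOR::XS::MAGIC . $payload;

# a cheap check that a reader could do before trying to decode
my $has_magic = substr ($file_contents, 0, length ($CBOR::XS::MAGIC))
                eq $CBOR::XS::MAGIC;

# the decoder ignores the magic string, so both decode to the same data
my $with    = decode_cbor ($file_contents);
my $without = decode_cbor ($payload);

print $has_magic ? "magic header found\n" : "no magic header\n";
print "$with->{answer} $without->{answer}\n";
```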

THE CBOR::XS::Tagged CLASS
    CBOR has the concept of tagged values - any CBOR value can be tagged
    with a 64 bit tag number. Tag numbers are centrally administered.

    "CBOR::XS" handles a few tags internally when en- or decoding. You can
    also create tags yourself by encoding "CBOR::XS::Tagged" objects, and
    the decoder will create "CBOR::XS::Tagged" objects itself when it hits
    an unknown tag.

    These objects are simply blessed array references - the first member of
    the array being the numerical tag, the second being the value.

    You can interact with "CBOR::XS::Tagged" objects in the following ways:

    $tagged = CBOR::XS::tag $tag, $value
        This function(!) creates a new "CBOR::XS::Tagged" object using the
        given $tag (0..2**64-1) to tag the given $value (which can be any
        Perl value that can be encoded in CBOR, including serialisable Perl
        objects and "CBOR::XS::Tagged" objects).

    $tagged->[0]
    $tagged->[0] = $new_tag
    $tag = $tagged->tag
    $new_tag = $tagged->tag ($new_tag)
        Access/mutate the tag.

    $tagged->[1]
    $tagged->[1] = $new_value
    $value = $tagged->value
    $new_value = $tagged->value ($new_value)
        Access/mutate the tagged value.
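
    The accessors listed above can be exercised with a small round trip.
    This is a sketch only: the tag number 1000 is an arbitrary choice for
    the illustration, assumed to have no handler in the default filter, so
    the decoder falls back to a "CBOR::XS::Tagged" object.

```perl
use strict;
use warnings;
use CBOR::XS;

# 1000 is an arbitrary tag number picked for this sketch
my $tagged = CBOR::XS::tag (1000, [1, 2, 3]);

# unknown tags come back as CBOR::XS::Tagged objects
my $decoded = decode_cbor (encode_cbor ($tagged));

print ref ($decoded), "\n";
print $decoded->tag, "\n";             # same as $decoded->[0]
print "@{ $decoded->value }\n";        # same as @{ $decoded->[1] }
```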

  EXAMPLES
    Here are some examples of "CBOR::XS::Tagged" uses to tag objects.

    You can look up CBOR tag values and meanings in the IANA registry at
    <http://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>.

    Prepend a magic header ($CBOR::XS::MAGIC):

       my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
       # same as:
       my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;

    Serialise some URIs and a regex in an array:

       my $cbor = encode_cbor [
          (CBOR::XS::tag 32, "http://www.nethype.de/"),
          (CBOR::XS::tag 32, "http://software.schmorp.de/"),
          (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
       ];

    Wrap CBOR data in CBOR:

       my $cbor_cbor = encode_cbor
          CBOR::XS::tag 24,
             encode_cbor [1, 2, 3];

TAG HANDLING AND EXTENSIONS
    This section describes how this module handles specific tagged values
    and extensions. If a tag is not mentioned here and no additional filters
    are provided for it, then the default handling applies (creating a
    CBOR::XS::Tagged object on decoding, and only encoding the tag when
    explicitly requested).

    Tags not handled specifically are currently converted into a
    CBOR::XS::Tagged object, which is simply a blessed array reference
    consisting of the numeric tag value followed by the (decoded) CBOR
    value.

    Future versions of this module reserve the right to special case
    additional tags (such as base64url).

  ENFORCED TAGS
    These tags are always handled when decoding, and their handling cannot
    be overridden by the user.

    26 (perl-object, <http://cbor.schmorp.de/perl-object>)
        This tag is automatically created (and decoded) for serialisable
        objects using the "FREEZE/THAW" methods (the Types::Serialiser
        object serialisation protocol). See "OBJECT SERIALISATION" for
        details.

    28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
        These tags are automatically decoded when encountered (and they do
        not result in a cyclic data structure, see "allow_cycles"),
        resulting in shared values in the decoded object. They are only
        encoded, however, when "allow_sharing" is enabled.

        Not all shared values can be successfully decoded: values that
        reference themselves will *currently* decode as "undef" (this is not
        the same as a reference pointing to itself, which will be
        represented as a value that contains an indirect reference to itself
        - these will be decoded properly).

        Note that considerably more shared value data structures can be
        decoded than will be encoded - currently, only values pointed to by
        references will be shared, others will not. While non-reference
        shared values can be generated in Perl with some effort, they were
        considered too unimportant to be supported in the encoder. The
        decoder, however, will decode these values as shared values.

    256, 25 (stringref-namespace, stringref,
    <http://cbor.schmorp.de/stringref>)
        These tags are automatically decoded when encountered. They are only
        encoded, however, when "pack_strings" is enabled.

    22098 (indirection, <http://cbor.schmorp.de/indirection>)
        This tag is automatically generated when a reference is encountered
        (with the exception of hash and array references). It is converted
        to a reference when decoding.

    55799 (self-describe CBOR, RFC 7049)
        This value is not generated on encoding (unless explicitly requested
        by the user), and is simply ignored when decoding.
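
    The effect of the value-sharing tags (28, 29) described above can be
    observed by encoding the same structure with and without
    "allow_sharing". This is a sketch under the assumption that a hash
    referenced twice from one array counts as a shared value; the variable
    names are made up for the illustration.

```perl
use strict;
use warnings;
use CBOR::XS;
use Scalar::Util qw(refaddr);

my $config = { name => "shared" };
my $data   = [$config, $config];     # two references to the same hash

my $plain   = CBOR::XS->new;
my $sharing = CBOR::XS->new->allow_sharing;

# without allow_sharing the hash is encoded twice and decodes as two copies
my $copy_plain = $plain->decode ($plain->encode ($data));

# with allow_sharing the second occurrence becomes a reference to the first;
# decoding needs no special setting
my $copy_shared = $plain->decode ($sharing->encode ($data));

print refaddr ($copy_plain->[0])  == refaddr ($copy_plain->[1])
   ? "plain: same hash\n"  : "plain: separate copies\n";
print refaddr ($copy_shared->[0]) == refaddr ($copy_shared->[1])
   ? "shared: same hash\n" : "shared: separate copies\n";
```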

  NON-ENFORCED TAGS
    These tags have default filters provided when decoding. Their handling
    can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
    or by providing a custom "filter" callback when decoding.

    When they result in decoding into a specific Perl class, the module
    usually provides a corresponding "TO_CBOR" method as well.

    When any of these need to load additional modules that are not part of
    the perl core distribution (e.g. URI), it is (currently) up to the user
    to provide these modules. The decoding usually fails with an exception
    if the required module cannot be loaded.

    0, 1 (date/time string, seconds since the epoch)
        These tags are decoded into Time::Piece objects. The corresponding
        "Time::Piece::TO_CBOR" method always encodes into tag 1 values
        currently.

        The Time::Piece API is generally surprisingly bad, and fractional
        seconds are only accidentally kept intact, so watch out. On the plus
        side, the module comes with perl since 5.10, which has to count for
        something.

    2, 3 (positive/negative bignum)
        These tags are decoded into Math::BigInt objects. The corresponding
        "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
        CBOR integers, and others into positive/negative CBOR bignums.

    4, 5, 264, 265 (decimal fraction/bigfloat)
        Both decimal fractions and bigfloats are decoded into Math::BigFloat
        objects. The corresponding "Math::BigFloat::TO_CBOR" method *always*
        encodes into a decimal fraction (either tag 4 or 264).

        NaN and infinities are not encoded properly, as they cannot be
        represented in CBOR.

        See "BIGNUM SECURITY CONSIDERATIONS" for more info.

    30 (rational numbers)
        This tag is decoded into Math::BigRat objects. The corresponding
        "Math::BigRat::TO_CBOR" method encodes rational numbers with
        denominator 1 via their numerator only, i.e., they become normal
        integers or "bignums".

        See "BIGNUM SECURITY CONSIDERATIONS" for more info.

    21, 22, 23 (expected later JSON conversion)
        CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
        these tags.

    32 (URI)
        These values decode into URI objects. The corresponding
        "URI::TO_CBOR" method again results in a CBOR URI value.
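
    Overriding the handling of a non-enforced tag with a custom "filter"
    callback might look like the sketch below. It assumes the filter is
    called with the tag number and the decoded value, that returning no
    value selects the default handling, and that
    "CBOR::XS::default_filter" can be called as a fallback. The tag
    number 1000 is an arbitrary example without a handler.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->filter (sub {
    my ($tag, $value) = @_;

    # keep tag 32 (URI) values as plain strings, so URI need not be loaded
    return $value if $tag == 32;

    # everything else: fall back to the module's default conversions
    # (no return value means a CBOR::XS::Tagged object is created)
    CBOR::XS::default_filter ($tag, $value)
});

my $cbor = encode_cbor ([
    CBOR::XS::tag (32, "http://software.schmorp.de/"),
    CBOR::XS::tag (1000, "some value"),    # arbitrary tag without handler
]);

my $data = $coder->decode ($cbor);

print "$data->[0]\n";           # a plain string, not a URI object
print ref ($data->[1]), "\n";   # the unhandled tag stays a tagged object
```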

CBOR and JSON
    CBOR is supposed to implement a superset of the JSON data model, and is,
    with some coercion, able to represent all JSON texts (something that
    other "binary JSON" formats such as BSON generally do not support).

    CBOR implements some extra hints and support for JSON interoperability,
    and the spec offers further guidance for conversion between CBOR and
    JSON. None of this is currently implemented in CBOR::XS, and the
    guidelines in the spec do not result in correct round-tripping of data.
    If JSON interoperability is improved in the future, then the goal will
    be to ensure that decoded JSON data will round-trip encoding and
    decoding to CBOR intact.

SECURITY CONSIDERATIONS
    Tl;dr... if you want to decode or encode CBOR from untrusted sources,
    you should start with a coder object created via "new_safe":

       my $coder = CBOR::XS->new_safe;

       my $data = $coder->decode ($cbor_text);
       my $cbor = $coder->encode ($data);

    Longer version: When you are using CBOR in a protocol, talking to
    untrusted potentially hostile creatures requires some thought:

    Security of the CBOR decoder itself
        First and foremost, your CBOR decoder should be secure, that is,
        should not have any buffer overflows or similar bugs that could
        potentially be exploited. Obviously, this module should ensure that
        and I am trying hard to make that true, but you never know.

    CBOR::XS can invoke almost arbitrary callbacks during decoding
        CBOR::XS supports object serialisation - decoding CBOR can cause
        calls to *any* "THAW" method in *any* package that exists in your
        process (that is, CBOR::XS will not try to load modules, but any
        existing "THAW" method or function can be called, so they all have
        to be secure).

        Less obviously, it will also invoke "TO_CBOR" and "FREEZE" methods -
        even if all your "THAW" methods are secure, encoding data structures
        from untrusted sources can invoke those and trigger bugs in those.

        So, if you are not sure about the security of all the modules you
        have loaded (you shouldn't), you should disable this part using
        "forbid_objects".

    CBOR can be extended with tags that call library code
        CBOR can be extended with tags, and "CBOR::XS" has a registry of
        conversion functions for many existing tags that can be extended via
        third-party modules (see the "filter" method).

        If you don't trust these, you should configure the "safe" filter
        function, "CBOR::XS::safe_filter", which by default only includes
        conversion functions that are considered "safe" by the author (but
        again, they can be extended by third party modules).

        Depending on your level of paranoia, you can use the "safe" filter:

           $cbor->filter (\&CBOR::XS::safe_filter);

        ... your own filter...

           $cbor->filter (sub { ... do your stuff here ... });

        ... or even no filter at all, disabling all tag decoding:

           $cbor->filter (sub { });

        This is never a problem for encoding, as the tag mechanism only
        exists in CBOR texts.

    Resource-starving attacks: object memory usage
        You need to avoid resource-starving attacks. That means you should
        limit the size of CBOR data you accept, or make sure that when your
        resources run out, that's just fine (e.g. by using a separate
        process that can crash safely). The size of a CBOR string in octets
        is usually a good indication of the size of the resources required
        to decode it into a Perl structure. While CBOR::XS can check the
        size of the CBOR text (using "max_size"), it might be too late when
        you already have it in memory, so you might want to check the size
        before you accept the string.

        As for encoding, it is possible to construct data structures that
        are relatively small but result in large CBOR texts (for example by
        having an array full of references to the same big data structure,
        which will all be deep-cloned during encoding by default). This is
        rarely an actual issue (and the worst case is still just running out
        of memory), but you can reduce this risk by using "allow_sharing".

    Resource-starving attacks: stack overflows
        CBOR::XS recurses using the C stack when decoding objects and
        arrays. The C stack is a limited resource: for instance, on my amd64
        machine with 8MB of stack size I can decode around 180k nested
        arrays but only 14k nested CBOR objects (due to perl itself
        recursing deeply on croak to free the temporary). If that is
        exceeded, the program crashes. To be conservative, the default
        nesting limit is set to 512. If your process has a smaller stack,
        you should adjust this setting accordingly with the "max_depth"
        method.

    Resource-starving attacks: CPU en-/decoding complexity
        CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
        libraries to represent and encode/decode bignums. These can be very
        slow (as in, centuries of CPU time) and can even crash your program
        (and are generally not very trustworthy). See the next section for
        details.

    Data breaches: leaking information in error messages
        CBOR::XS might leak contents of your Perl data structures in its
        error messages, so when you serialise sensitive information you
        might want to make sure that exceptions thrown by CBOR::XS will not
        end up in front of untrusted eyes.

    Something else...
        Something else could bomb you, too, that I forgot to think of. In
        that case, you get to keep the pieces. I am always open to hints,
        though...
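
    Several of the points above (size limits, nesting limits, forbidden
    objects, error messages) can be combined in a small wrapper around
    the decoder. This is a sketch only: "decode_untrusted", the 64 KiB
    limit and the depth of 32 are hypothetical choices for the
    illustration, not recommendations made by this module.

```perl
use strict;
use warnings;
use CBOR::XS;

my $MAX_OCTETS = 64 * 1024;    # hypothetical size limit for this sketch

# new_safe as the starting point, plus a tighter nesting limit
my $coder = CBOR::XS->new_safe->max_depth (32);

# returns (1, $data) on success and (0) on failure
sub decode_untrusted {
    my ($octets) = @_;

    # check the size before handing the string to the decoder
    return 0 if length ($octets) > $MAX_OCTETS;

    my $data;
    eval { $data = $coder->decode ($octets); 1 }
        or return 0;    # $@ is deliberately not passed on, it may leak data

    (1, $data)
}

my ($ok, $data) = decode_untrusted (encode_cbor ([1, 2, 3]));
print $ok ? "decoded @$data\n" : "rejected\n";

# a structure nested deeper than the limit is rejected instead of decoded
my $deep = [];
$deep = [$deep] for 1 .. 100;
my ($deep_ok) = decode_untrusted (encode_cbor ($deep));
print $deep_ok ? "decoded deep structure\n" : "rejected deep structure\n";
```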

BIGNUM SECURITY CONSIDERATIONS
    CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
    Math::BigFloat that tries to encode the number in the simplest possible
    way, that is, either a CBOR integer, a CBOR bigint/decimal fraction (tag
    4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers
    (Math::BigRat, tag 30) can also contain bignums as members.

    CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
    bigfloats (tags 5 and 265), but it will never generate these on its own.

    Using the built-in Math::BigInt::Calc support, encoding and decoding
    decimal fractions is generally fast. Decoding bigints can be slow for
    very big numbers (tens of thousands of digits, something that could
    potentially be caught by limiting the size of CBOR texts), and decoding
    bigfloats or arbitrary-exponent bigfloats can be *extremely* slow
    (minutes, decades) for large exponents (roughly 40 bit and longer).

    Additionally, Math::BigInt can take advantage of other bignum libraries,
    such as Math::GMP, which cannot handle big floats with large exponents,
    and might simply abort or crash your program, due to their code quality.

    This can be a concern if you want to parse untrusted CBOR. If it is, you
    might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
    types. You should also disable types 5 and 265, as these can be slow
    even without bigints.

    Disabling bigints will also partially or fully disable types that rely
    on them, e.g. rational numbers that use bignums.
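
    One possible way to disable the tags named above is a filter that
    skips their conversion. This is a sketch under the assumption that a
    filter returning no value leaves the data as a "CBOR::XS::Tagged"
    object, so no bignum library is invoked; the %blocked hash is a name
    made up for the illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

# tags whose conversion should be skipped when parsing untrusted CBOR
my %blocked = map { $_ => 1 } 2, 3, 5, 265;

my $coder = CBOR::XS->new->filter (sub {
    my ($tag, $value) = @_;

    # no return value: no conversion, a CBOR::XS::Tagged object is created
    return if $blocked{$tag};

    CBOR::XS::default_filter ($tag, $value)
});

# tag 2 (positive bignum) wrapping the octets 0x01 0x00, i.e. the number 256
my $cbor = encode_cbor (CBOR::XS::tag (2, "\x01\x00"));

my $value = $coder->decode ($cbor);

print ref ($value), "\n";                      # not a Math::BigInt
print unpack ("H*", $value->value), "\n";      # the raw bignum octets
```

    Code that receives such data then has to decide itself whether and
    how to interpret the raw octets.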

CBOR IMPLEMENTATION NOTES
    This section contains some random implementation notes. They do not
    describe guaranteed behaviour, but merely behaviour as-is implemented
    right now.

    64 bit integers are only properly decoded when Perl was built with 64
    bit support.

    Strings and arrays are encoded with a definite length. Hashes as well,
    unless they are tied (or otherwise magical).

    Only the double data type is supported for NV data types - when Perl
    uses long double to represent floating point values, they might not be
    encoded properly. Half precision types are accepted, but not encoded.

    Strict mode and canonical mode are not implemented.
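
    Two of the notes above can be checked directly on the octet level.
    This is a small sketch; the byte values follow the CBOR encoding
    rules (0x83 starts a definite-length array of three items, 0xf9
    starts a half precision float).

```perl
use strict;
use warnings;
use CBOR::XS;

# arrays get a definite length: 0x83 is "array of 3 items", then the items
my $cbor = encode_cbor ([1, 2, 3]);
print unpack ("H*", $cbor), "\n";       # 83010203

# half precision floats are accepted when decoding: 0xf9 0x3c 0x00 is 1.0
my $one = decode_cbor ("\xf9\x3c\x00");
print "$one\n";                         # 1
```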

LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
    On perls that were built without 64 bit integer support (these are rare
    nowadays, even on 32 bit architectures, as all major Perl distributions
    are built with 64 bit integer support), support for any kind of 64 bit
    integer in CBOR is very limited - most likely, these 64 bit values will
    be truncated, corrupted, or otherwise not decoded correctly. This also
    includes string, array and map sizes that are stored as 64 bit integers.

THREADS
    This module is *not* guaranteed to be thread safe and there are no plans
    to change this until Perl gets thread support (as opposed to the
    horribly slow so-called "threads" which are simply slow and bloated
    process simulations - use fork, it's *much* faster, cheaper, better).

    (It might actually work, but you have been warned).

BUGS
    While the goal of this module is to be correct, that unfortunately does
    not mean it's bug-free, only that I think its design is bug-free. If you
    keep reporting bugs they will be fixed swiftly, though.

    Please refrain from using rt.cpan.org or any other bug reporting
    service. I put the contact address into my modules for a reason.

SEE ALSO
    The JSON and JSON::XS modules that do similar, but human-readable,
    serialisation.

    The Types::Serialiser module provides the data model for true, false and
    error values.

AUTHOR
     Marc Lehmann <schmorp@schmorp.de>
     http://home.schmorp.de/