… | |
… | |
44 | about 20% smaller than the same data encoded as (compact) JSON or |
44 | about 20% smaller than the same data encoded as (compact) JSON or |
45 | Storable. |
45 | Storable. |
46 | |
46 | |
47 | In addition to the core CBOR data format, this module implements a |
47 | In addition to the core CBOR data format, this module implements a |
48 | number of extensions, to support cyclic and shared data structures (see |
48 | number of extensions, to support cyclic and shared data structures (see |
49 | "allow_sharing"), string deduplication (see "pack_strings") and scalar |
49 | "allow_sharing" and "allow_cycles"), string deduplication (see |
50 | references (always enabled). |
50 | "pack_strings") and scalar references (always enabled). |
51 | |
51 | |
52 | The primary goal of this module is to be *correct* and the secondary |
52 | The primary goal of this module is to be *correct* and the secondary |
53 | goal is to be *fast*. To reach the latter goal it was written in C. |
53 | goal is to be *fast*. To reach the latter goal it was written in C. |
54 | |
54 | |
55 | See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and |
55 | See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and |
… | |
… | |
141 | instead will emit a reference to the earlier value. |
141 | instead will emit a reference to the earlier value. |
142 | |
142 | |
143 | This means that such values will only be encoded once, and will not |
143 | This means that such values will only be encoded once, and will not |
144 | result in a deep cloning of the value on decode, in decoders |
144 | result in a deep cloning of the value on decode, in decoders |
145 | supporting the value sharing extension. This also makes it possible |
145 | supporting the value sharing extension. This also makes it possible |
146 | to encode cyclic data structures. |
146 | to encode cyclic data structures (which need "allow_cycles" to be |
|
|
147 | enabled to be decoded by this module). |
147 | |
148 | |
148 | It is recommended to leave it off unless you know your communication |
149 | It is recommended to leave it off unless you know your communication |
149 | partner supports the value sharing extensions to CBOR |
150 | partner supports the value sharing extensions to CBOR |
150 | (<http://cbor.schmorp.de/value-sharing>), as without decoder |
151 | (<http://cbor.schmorp.de/value-sharing>), as without decoder |
151 | support, the resulting data structure might be unusable. |
152 | support, the resulting data structure might be unusable. |
152 | |
153 | |
153 | Detecting shared values incurs a runtime overhead when values are |
154 | Detecting shared values incurs a runtime overhead when values are |
154 | encoded that have a reference counter larger than one, and might |
155 | encoded that have a reference counter larger than one, and might |
155 | unnecessarily increase the encoded size, as potentially shared |
156 | unnecessarily increase the encoded size, as potentially shared |
156 | values are encoded as sharable whether or not they are actually |
157 | values are encoded as shareable whether or not they are actually |
157 | shared. |
158 | shared. |
158 | |
159 | |
159 | At the moment, only targets of references can be shared (e.g. |
160 | At the moment, only targets of references can be shared (e.g. |
160 | scalars, arrays or hashes pointed to by a reference). Weirder |
161 | scalars, arrays or hashes pointed to by a reference). Weirder |
161 | constructs, such as an array with multiple "copies" of the *same* |
162 | constructs, such as an array with multiple "copies" of the *same* |
… | |
… | |
166 | data structures repeatedly, unsharing them in the process. Cyclic |
167 | data structures repeatedly, unsharing them in the process. Cyclic |
167 | data structures cannot be encoded in this mode. |
168 | data structures cannot be encoded in this mode. |
168 | |
169 | |
169 | This option does not affect "decode" in any way - shared values and |
170 | This option does not affect "decode" in any way - shared values and |
170 | references will always be decoded properly if present. |
171 | references will always be decoded properly if present. |
|
|
172 | |
|
|
173 | $cbor = $cbor->allow_cycles ([$enable]) |
|
|
174 | $enabled = $cbor->get_allow_cycles |
|
|
175 | If $enable is true (or missing), then "decode" will happily decode |
|
|
176 | self-referential (cyclic) data structures. By default these will not |
|
|
177 | be decoded, as they need manual cleanup to avoid memory leaks, so |
|
|
178 | code that isn't prepared for this will not leak memory. |
|
|
179 | |
|
|
180 | If $enable is false (the default), then "decode" will throw an error |
|
|
181 | when it encounters a self-referential/cyclic data structure. |
|
|
182 | |
|
|
183 | FUTURE DIRECTION: the motivation behind this option is to avoid |
|
|
184 | *real* cycles - future versions of this module might choose to decode |
|
|
185 | cyclic data structures using weak references when this option is |
|
|
186 | off, instead of throwing an error. |
|
|
187 | |
|
|
188 | This option does not affect "encode" in any way - shared values and |
|
|
189 | references will always be encoded properly if present. |
171 | |
190 | |
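As a minimal sketch of how "allow_sharing" (encoder side) and "allow_cycles" (decoder side) fit together, the following round-trips a hash that refers to itself. The variable names are illustrative only, and the manual cleanup at the end is one possible way to break the cycle, not a required one.

```perl
use strict;
use warnings;
use CBOR::XS;

# A hash that points back at itself: the simplest possible cycle.
my $node = { name => "root" };
$node->{self} = $node;

# allow_sharing is needed on the encoder, otherwise the cycle cannot be encoded.
my $data = CBOR::XS->new->allow_sharing->encode ($node);

# A default decoder is expected to refuse the cyclic structure.
my $refused = !eval { CBOR::XS->new->decode ($data); 1 };

# With allow_cycles the structure is rebuilt, cycle included.
my $copy = CBOR::XS->new->allow_cycles->decode ($data);
my $is_cyclic = $copy->{self} == $copy;

# Break both cycles by hand so the hashes can be freed.
delete $copy->{self};
delete $node->{self};
```

Code that enables "allow_cycles" takes on the responsibility for this kind of cleanup (or for weakening the back references itself).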
172 | $cbor = $cbor->pack_strings ([$enable]) |
191 | $cbor = $cbor->pack_strings ([$enable]) |
173 | $enabled = $cbor->get_pack_strings |
192 | $enabled = $cbor->get_pack_strings |
174 | If $enable is true (or missing), then "encode" will try not to |
193 | If $enable is true (or missing), then "encode" will try not to |
175 | encode the same string twice, but will instead encode a reference to |
194 | encode the same string twice, but will instead encode a reference to |
… | |
… | |
186 | the standard CBOR way. |
205 | the standard CBOR way. |
187 | |
206 | |
188 | This option does not affect "decode" in any way - string references |
207 | This option does not affect "decode" in any way - string references |
189 | will always be decoded properly if present. |
208 | will always be decoded properly if present. |
190 | |
209 | |
|
|
210 | $cbor = $cbor->text_keys ([$enable]) |
|
|
211 | $enabled = $cbor->get_text_keys |
|
|
212 | If $enable is true (or missing), then "encode" will encode all perl |
|
|
213 | hash keys as CBOR text strings/UTF-8 strings, upgrading them as |
|
|
214 | needed. |
|
|
215 | |
|
|
216 | If $enable is false (the default), then "encode" will encode hash |
|
|
217 | keys normally - upgraded perl strings (strings internally encoded as |
|
|
218 | UTF-8) as CBOR text strings, and downgraded perl strings as CBOR |
|
|
219 | byte strings. |
|
|
220 | |
|
|
221 | This option does not affect "decode" in any way. |
|
|
222 | |
|
|
223 | This option is useful for interoperability with CBOR decoders that |
|
|
224 | don't treat byte strings as a form of text. It is especially useful |
|
|
225 | as Perl gives very little control over hash keys. |
|
|
226 | |
|
|
227 | Enabling this option can be slow, as all downgraded hash keys that |
|
|
228 | are encoded need to be scanned and converted to UTF-8. |
|
|
229 | |
|
|
230 | $cbor = $cbor->text_strings ([$enable]) |
|
|
231 | $enabled = $cbor->get_text_strings |
|
|
232 | This option works similarly to "text_keys", above, but works on all |
|
|
233 | strings (including hash keys), so "text_keys" has no further effect |
|
|
234 | after enabling "text_strings". |
|
|
235 | |
|
|
236 | If $enable is true (or missing), then "encode" will encode all perl |
|
|
237 | strings as CBOR text strings/UTF-8 strings, upgrading them as |
|
|
238 | needed. |
|
|
239 | |
|
|
240 | If $enable is false (the default), then "encode" will encode strings |
|
|
241 | normally (but see "text_keys") - upgraded perl strings (strings |
|
|
242 | internally encoded as UTF-8) as CBOR text strings, and downgraded |
|
|
243 | perl strings as CBOR byte strings. |
|
|
244 | |
|
|
245 | This option does not affect "decode" in any way. |
|
|
246 | |
|
|
247 | This option has similar advantages and disadvantages as "text_keys". |
|
|
248 | In addition, this option effectively removes the ability to encode |
|
|
249 | byte strings, which might break some "FREEZE" and "TO_CBOR" methods |
|
|
250 | that rely on this, such as bignum encoding, so this option is mainly |
|
|
251 | useful for very simple data. |
|
|
252 | |
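To make the difference between byte strings and text strings visible, this sketch dumps the encoded octets in hex with and without "text_keys" and "text_strings". The helper "hex_of" is a hypothetical name used only for this illustration; the input strings are plain (downgraded) Perl strings.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hex dump helper (hypothetical name, only for this illustration).
sub hex_of { unpack "H*", $_[0] }

my $data = { a => 1 };   # "a" is a plain, downgraded Perl string

my $default   = hex_of (CBOR::XS->new->encode ($data));
my $text_keys = hex_of (CBOR::XS->new->text_keys->encode ($data));

# text_strings also affects string values, not only hash keys.
my $bytes = hex_of (CBOR::XS->new->encode ("hi"));
my $text  = hex_of (CBOR::XS->new->text_strings->encode ("hi"));

print "$default $text_keys $bytes $text\n";
# a1416101 a1616101 426869 626869
```

The only change in each pair is the major type of the string header: 0x4n marks a byte string, 0x6n a text string.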
|
|
253 | $cbor = $cbor->validate_utf8 ([$enable]) |
|
|
254 | $enabled = $cbor->get_validate_utf8 |
|
|
255 | If $enable is true (or missing), then "decode" will validate that |
|
|
256 | elements (text strings) containing UTF-8 data in fact contain valid |
|
|
257 | UTF-8 data (instead of blindly accepting it). This validation |
|
|
258 | obviously takes extra time during decoding. |
|
|
259 | |
|
|
260 | The concept of "valid UTF-8" used is perl's concept, which is a |
|
|
261 | superset of the official UTF-8. |
|
|
262 | |
|
|
263 | If $enable is false (the default), then "decode" will blindly accept |
|
|
264 | UTF-8 data, marking them as valid UTF-8 in the resulting data |
|
|
265 | structure regardless of whether that's true or not. |
|
|
266 | |
|
|
267 | Perl isn't too happy about corrupted UTF-8 in strings, but should |
|
|
268 | generally not crash or do similarly evil things. Extensions might be |
|
|
269 | not so forgiving, so it's recommended to turn on this setting if you |
|
|
270 | receive untrusted CBOR. |
|
|
271 | |
|
|
272 | This option does not affect "encode" in any way - strings that are |
|
|
273 | supposedly valid UTF-8 will simply be dumped into the resulting CBOR |
|
|
274 | string without checking whether that is, in fact, true or not. |
|
|
275 | |
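A small sketch of the effect of "validate_utf8" on untrusted input: the hand-written CBOR below is a text string whose payload is not valid UTF-8. The exact error message is not shown, since it may differ between versions.

```perl
use strict;
use warnings;
use CBOR::XS;

# CBOR text string of length 2 whose payload (C3 28) is not valid UTF-8.
my $bad = "\x62\xc3\x28";

# Default: the data is accepted blindly.
my $lenient_ok = eval { CBOR::XS->new->decode ($bad); 1 } ? 1 : 0;

# With validate_utf8, decode is expected to throw instead.
my $strict_ok = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 } ? 1 : 0;
my $error     = $@;

print "lenient: $lenient_ok, strict: $strict_ok\n";
```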
191 | $cbor = $cbor->filter ([$cb->($tag, $value)]) |
276 | $cbor = $cbor->filter ([$cb->($tag, $value)]) |
192 | $cb_or_undef = $cbor->get_filter |
277 | $cb_or_undef = $cbor->get_filter |
193 | Sets or replaces the tagged value decoding filter (when $cb is |
278 | Sets or replaces the tagged value decoding filter (when $cb is |
194 | specified) or clears the filter (if no argument or "undef" is |
279 | specified) or clears the filter (if no argument or "undef" is |
195 | provided). |
280 | provided). |
… | |
… | |
250 | the next one starts. |
335 | the next one starts. |
251 | |
336 | |
252 | CBOR::XS->new->decode_prefix ("......") |
337 | CBOR::XS->new->decode_prefix ("......") |
253 | => ("...", 3) |
338 | => ("...", 3) |
254 | |
339 | |
|
|
340 | INCREMENTAL PARSING |
|
|
341 | In some cases, there is the need for incremental parsing of CBOR texts. |
|
|
342 | While this module always has to keep both CBOR text and resulting Perl |
|
|
343 | data structure in memory at one time, it does allow you to parse a CBOR |
|
|
344 | stream incrementally, which works similarly to using "decode_prefix" to see if |
|
|
345 | a full CBOR object is available, but is much more efficient. |
|
|
346 | |
|
|
347 | It basically works by parsing as much of a CBOR string as possible - if |
|
|
348 | the CBOR data is not complete yet, the parser will remember where it |
|
|
349 | was, to be able to restart when more data has been accumulated. Once |
|
|
350 | enough data is available to either decode a complete CBOR value or raise |
|
|
351 | an error, a real decode will be attempted. |
|
|
352 | |
|
|
353 | A typical use case would be a network protocol that consists of sending |
|
|
354 | and receiving CBOR-encoded messages. The solution that works with CBOR |
|
|
355 | and about anything else is by prepending a length to every CBOR value, |
|
|
356 | so the receiver knows how many octets to read. More compact (and |
|
|
357 | slightly slower) would be to just send CBOR values back-to-back, as |
|
|
358 | "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit |
|
|
359 | length. |
|
|
360 | |
|
|
361 | The following methods help with this: |
|
|
362 | |
|
|
363 | @decoded = $cbor->incr_parse ($buffer) |
|
|
364 | This method attempts to decode exactly one CBOR value from the |
|
|
365 | beginning of the given $buffer. The value is removed from the |
|
|
366 | $buffer on success. When $buffer doesn't contain a complete value |
|
|
367 | yet, it returns nothing. Finally, when the $buffer doesn't start |
|
|
368 | with something that could ever be a valid CBOR value, it raises an |
|
|
369 | exception, just as "decode" would. In the latter case the decoder |
|
|
370 | state is undefined and must be reset before being able to parse |
|
|
371 | further. |
|
|
372 | |
|
|
373 | This method modifies the $buffer in place. When no CBOR value can be |
|
|
374 | decoded, the decoder stores the current string offset. On the next |
|
|
375 | call, continues decoding at the place where it stopped before. For |
|
|
376 | this to make sense, the $buffer must begin with the same octets as |
|
|
377 | on previous unsuccessful calls. |
|
|
378 | |
|
|
379 | You can call this method in scalar context, in which case it either |
|
|
380 | returns a decoded value or "undef". This makes it impossible to |
|
|
381 | distinguish between CBOR null values (which decode to "undef") and |
|
|
382 | an unsuccessful decode, which is often acceptable. |
|
|
383 | |
|
|
384 | @decoded = $cbor->incr_parse_multiple ($buffer) |
|
|
385 | Same as "incr_parse", but attempts to decode as many CBOR values as |
|
|
386 | possible in one go, instead of at most one. Calls to "incr_parse" |
|
|
387 | and "incr_parse_multiple" can be interleaved. |
|
|
388 | |
|
|
389 | $cbor->incr_reset |
|
|
390 | Resets the incremental decoder. This throws away any saved state, so |
|
|
391 | that subsequent calls to "incr_parse" or "incr_parse_multiple" start |
|
|
392 | to parse a new CBOR value from the beginning of the $buffer again. |
|
|
393 | |
|
|
394 | This method can be called at any time, but it *must* be called if you |
|
|
395 | want to change your $buffer or there was a decoding error and you |
|
|
396 | want to reuse the $cbor object for future incremental parsings. |
|
|
397 | |
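The back-to-back scenario described above can be sketched as follows. The three-octet chunking merely simulates a network socket and is an assumption of this example; real code would append whatever "sysread" or similar returns.

```perl
use strict;
use warnings;
use CBOR::XS;

# Two CBOR values sent back-to-back, without any length prefix.
my $stream = encode_cbor ([1, 2, 3]) . encode_cbor ({ ok => 1 });

my $cbor   = CBOR::XS->new;
my $buffer = "";
my @messages;

# Simulate a socket delivering three octets at a time.
for my $chunk (unpack "(a3)*", $stream) {
   $buffer .= $chunk;   # always append, so earlier octets stay in place

   # Returns every complete value and removes it from $buffer.
   push @messages, $cbor->incr_parse_multiple ($buffer);
}

printf "%d messages, %d octets left\n", scalar @messages, length $buffer;
# 2 messages, 0 octets left
```

If a call throws, the decoder state is undefined, so a real receive loop would wrap the call in "eval" and call "incr_reset" before reusing the object.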
255 | MAPPING |
398 | MAPPING |
256 | This section describes how CBOR::XS maps Perl values to CBOR values and |
399 | This section describes how CBOR::XS maps Perl values to CBOR values and |
257 | vice versa. These mappings are designed to "do the right thing" in most |
400 | vice versa. These mappings are designed to "do the right thing" in most |
258 | circumstances automatically, preserving round-tripping characteristics |
401 | circumstances automatically, preserving round-tripping characteristics |
259 | (what you put in comes out as something equivalent). |
402 | (what you put in comes out as something equivalent). |
… | |
… | |
309 | |
452 | |
310 | hash references |
453 | hash references |
311 | Perl hash references become CBOR maps. As there is no inherent |
454 | Perl hash references become CBOR maps. As there is no inherent |
312 | ordering in hash keys (or CBOR maps), they will usually be encoded |
455 | ordering in hash keys (or CBOR maps), they will usually be encoded |
313 | in a pseudo-random order. This order can be different each time a |
456 | in a pseudo-random order. This order can be different each time a |
314 | hahs is encoded. |
457 | hash is encoded. |
315 | |
458 | |
316 | Currently, tied hashes will use the indefinite-length format, while |
459 | Currently, tied hashes will use the indefinite-length format, while |
317 | normal hashes will use the fixed-length format. |
460 | normal hashes will use the fixed-length format. |
318 | |
461 | |
319 | array references |
462 | array references |
… | |
… | |
368 | my $x = 3.1; # some variable containing a number |
511 | my $x = 3.1; # some variable containing a number |
369 | "$x"; # stringified |
512 | "$x"; # stringified |
370 | $x .= ""; # another, more awkward way to stringify |
513 | $x .= ""; # another, more awkward way to stringify |
371 | print $x; # perl does it for you, too, quite often |
514 | print $x; # perl does it for you, too, quite often |
372 | |
515 | |
373 | You can force whether a string ie encoded as byte or text string by |
516 | You can force whether a string is encoded as byte or text string by |
374 | using "utf8::upgrade" and "utf8::downgrade"): |
517 | using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is |
|
|
518 | disabled): |
375 | |
519 | |
376 | utf8::upgrade $x; # encode $x as text string |
520 | utf8::upgrade $x; # encode $x as text string |
377 | utf8::downgrade $x; # encode $x as byte string |
521 | utf8::downgrade $x; # encode $x as byte string |
378 | |
522 | |
379 | Perl doesn't define what operations up- and downgrade strings, so if |
523 | Perl doesn't define what operations up- and downgrade strings, so if |
380 | the difference between byte and text is important, you should up- or |
524 | the difference between byte and text is important, you should up- or |
381 | downgrade your string as late as possible before encoding. |
525 | downgrade your string as late as possible before encoding. You can |
|
|
526 | also force the use of CBOR text strings by using "text_keys" or |
|
|
527 | "text_strings". |
382 | |
528 | |
383 | You can force the type to be a CBOR number by numifying it: |
529 | You can force the type to be a CBOR number by numifying it: |
384 | |
530 | |
385 | my $x = "3"; # some variable containing a string |
531 | my $x = "3"; # some variable containing a string |
386 | $x += 0; # numify it, ensuring it will be dumped as a number |
532 | $x += 0; # numify it, ensuring it will be dumped as a number |
… | |
… | |
396 | the IEEE double format will be used. Perls that use formats other |
542 | the IEEE double format will be used. Perls that use formats other |
397 | than IEEE double to represent numerical values are supported, but |
543 | than IEEE double to represent numerical values are supported, but |
398 | might suffer loss of precision. |
544 | might suffer loss of precision. |
399 | |
545 | |
400 | OBJECT SERIALISATION |
546 | OBJECT SERIALISATION |
|
|
547 | This module implements both a CBOR-specific and the generic |
|
|
548 | Types::Serialiser object serialisation protocol. The following |
|
|
549 | subsections explain both methods. |
|
|
550 | |
|
|
551 | ENCODING |
401 | This module knows two ways to serialise a Perl object: The CBOR-specific |
552 | This module knows two ways to serialise a Perl object: The CBOR-specific |
402 | way, and the generic way. |
553 | way, and the generic way. |
403 | |
554 | |
404 | Whenever the encoder encounters a Perl object that it cnanot serialise |
555 | Whenever the encoder encounters a Perl object that it cannot serialise |
405 | directly (most of them), it will first look up the "TO_CBOR" method on |
556 | directly (most of them), it will first look up the "TO_CBOR" method on |
406 | it. |
557 | it. |
407 | |
558 | |
408 | If it has a "TO_CBOR" method, it will call it with the object as only |
559 | If it has a "TO_CBOR" method, it will call it with the object as only |
409 | argument, and expects exactly one return value, which it will then |
560 | argument, and expects exactly one return value, which it will then |
… | |
… | |
414 | "CBOR" as the second argument, to distinguish it from other serialisers. |
565 | "CBOR" as the second argument, to distinguish it from other serialisers. |
415 | |
566 | |
416 | The "FREEZE" method can return any number of values (i.e. zero or more). |
567 | The "FREEZE" method can return any number of values (i.e. zero or more). |
417 | These will be encoded as CBOR perl object, together with the classname. |
568 | These will be encoded as CBOR perl object, together with the classname. |
418 | |
569 | |
|
|
570 | These methods *MUST NOT* change the data structure that is being |
|
|
571 | serialised. Failure to comply with this can result in memory corruption - |
|
|
572 | and worse. |
|
|
573 | |
419 | If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail |
574 | If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail |
420 | with an error. |
575 | with an error. |
421 | |
576 | |
|
|
577 | DECODING |
422 | Objects encoded via "TO_CBOR" cannot be automatically decoded, but |
578 | Objects encoded via "TO_CBOR" cannot (normally) be automatically |
423 | objects encoded via "FREEZE" can be decoded using the following |
579 | decoded, but objects encoded via "FREEZE" can be decoded using the |
424 | protocol: |
580 | following protocol: |
425 | |
581 | |
426 | When an encoded CBOR perl object is encountered by the decoder, it will |
582 | When an encoded CBOR perl object is encountered by the decoder, it will |
427 | look up the "THAW" method, by using the stored classname, and will fail |
583 | look up the "THAW" method, by using the stored classname, and will fail |
428 | if the method cannot be found. |
584 | if the method cannot be found. |
429 | |
585 | |
… | |
… | |
471 | "$self" # encode url string |
627 | "$self" # encode url string |
472 | } |
628 | } |
473 | |
629 | |
474 | sub URI::THAW { |
630 | sub URI::THAW { |
475 | my ($class, $serialiser, $uri) = @_; |
631 | my ($class, $serialiser, $uri) = @_; |
476 | |
|
|
477 | $class->new ($uri) |
632 | $class->new ($uri) |
478 | } |
633 | } |
479 | |
634 | |
480 | Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For |
635 | Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For |
481 | example, a "FREEZE" method that returns "type", "id" and "variant" |
636 | example, a "FREEZE" method that returns "type", "id" and "variant" |
… | |
… | |
577 | Future versions of this module reserve the right to special case |
732 | Future versions of this module reserve the right to special case |
578 | additional tags (such as base64url). |
733 | additional tags (such as base64url). |
579 | |
734 | |
580 | ENFORCED TAGS |
735 | ENFORCED TAGS |
581 | These tags are always handled when decoding, and their handling cannot |
736 | These tags are always handled when decoding, and their handling cannot |
582 | be overriden by the user. |
737 | be overridden by the user. |
583 | |
738 | |
584 | 26 (perl-object, <http://cbor.schmorp.de/perl-object>) |
739 | 26 (perl-object, <http://cbor.schmorp.de/perl-object>) |
585 | These tags are automatically created (and decoded) for serialisable |
740 | These tags are automatically created (and decoded) for serialisable |
586 | objects using the "FREEZE/THAW" methods (the Types::Serialiser object |
741 | objects using the "FREEZE/THAW" methods (the Types::Serialiser object |
587 | serialisation protocol). See "OBJECT SERIALISATION" for details. |
742 | serialisation protocol). See "OBJECT SERIALISATION" for details. |
588 | |
743 | |
589 | 28, 29 (sharable, sharedref, L <http://cbor.schmorp.de/value-sharing>) |
744 | 28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>) |
590 | These tags are automatically decoded when encountered, resulting in |
745 | These tags are automatically decoded when encountered (and they do |
|
|
746 | not result in a cyclic data structure, see "allow_cycles"), |
591 | shared values in the decoded object. They are only encoded, however, |
747 | resulting in shared values in the decoded object. They are only |
592 | when "allow_sharable" is enabled. |
748 | encoded, however, when "allow_sharing" is enabled. |
593 | |
749 | |
|
|
750 | Not all shared values can be successfully decoded: values that |
|
|
751 | reference themselves will *currently* decode as "undef" (this is not |
|
|
752 | the same as a reference pointing to itself, which will be |
|
|
753 | represented as a value that contains an indirect reference to itself |
|
|
754 | - these will be decoded properly). |
|
|
755 | |
|
|
756 | Note that considerably more shared value data structures can be |
|
|
757 | decoded than will be encoded - currently, only values pointed to by |
|
|
758 | references will be shared, others will not. While non-reference |
|
|
759 | shared values can be generated in Perl with some effort, they were |
|
|
760 | considered too unimportant to be supported in the encoder. The |
|
|
761 | decoder, however, will decode these values as shared values. |
|
|
762 | |
594 | 256, 25 (stringref-namespace, stringref, L |
763 | 256, 25 (stringref-namespace, stringref, |
595 | <http://cbor.schmorp.de/stringref>) |
764 | <http://cbor.schmorp.de/stringref>) |
596 | These tags are automatically decoded when encountered. They are only |
765 | These tags are automatically decoded when encountered. They are only |
597 | encoded, however, when "pack_strings" is enabled. |
766 | encoded, however, when "pack_strings" is enabled. |
598 | |
767 | |
599 | 22098 (indirection, <http://cbor.schmorp.de/indirection>) |
768 | 22098 (indirection, <http://cbor.schmorp.de/indirection>) |
600 | This tag is automatically generated when a reference is encountered |
769 | This tag is automatically generated when a reference is encountered |
601 | (with the exception of hash and array refernces). It is converted to |
770 | (with the exception of hash and array references). It is converted |
602 | a reference when decoding. |
771 | to a reference when decoding. |
603 | |
772 | |
604 | 55799 (self-describe CBOR, RFC 7049) |
773 | 55799 (self-describe CBOR, RFC 7049) |
605 | This value is not generated on encoding (unless explicitly requested |
774 | This value is not generated on encoding (unless explicitly requested |
606 | by the user), and is simply ignored when decoding. |
775 | by the user), and is simply ignored when decoding. |
607 | |
776 | |
608 | NON-ENFORCED TAGS |
777 | NON-ENFORCED TAGS |
609 | These tags have default filters provided when decoding. Their handling |
778 | These tags have default filters provided when decoding. Their handling |
610 | can be overriden by changing the %CBOR::XS::FILTER entry for the tag, or |
779 | can be overridden by changing the %CBOR::XS::FILTER entry for the tag, |
611 | by providing a custom "filter" callback when decoding. |
780 | or by providing a custom "filter" callback when decoding. |
612 | |
781 | |
613 | When they result in decoding into a specific Perl class, the module |
782 | When they result in decoding into a specific Perl class, the module |
614 | usually provides a corresponding "TO_CBOR" method as well. |
783 | usually provides a corresponding "TO_CBOR" method as well. |
615 | |
784 | |
616 | When any of these need to load additional modules that are not part of |
785 | When any of these need to load additional modules that are not part of |
617 | the perl core distribution (e.g. URI), it is (currently) up to the user |
786 | the perl core distribution (e.g. URI), it is (currently) up to the user |
618 | to provide these modules. The decoding usually fails with an exception |
787 | to provide these modules. The decoding usually fails with an exception |
619 | if the required module cannot be loaded. |
788 | if the required module cannot be loaded. |
620 | |
789 | |
|
|
790 | 0, 1 (date/time string, seconds since the epoch) |
|
|
791 | These tags are decoded into Time::Piece objects. The corresponding |
|
|
792 | "Time::Piece::TO_CBOR" method always encodes into tag 1 values |
|
|
793 | currently. |
|
|
794 | |
|
|
795 | The Time::Piece API is generally surprisingly bad, and fractional |
|
|
796 | seconds are only accidentally kept intact, so watch out. On the plus |
|
|
797 | side, the module comes with perl since 5.10, which has to count for |
|
|
798 | something. |
|
|
799 | |
621 | 2, 3 (positive/negative bignum) |
800 | 2, 3 (positive/negative bignum) |
622 | These tags are decoded into Math::BigInt objects. The corresponding |
801 | These tags are decoded into Math::BigInt objects. The corresponding |
623 | "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal |
802 | "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal |
624 | CBOR integers, and others into positive/negative CBOR bignums. |
803 | CBOR integers, and others into positive/negative CBOR bignums. |
625 | |
804 | |
626 | 4, 5 (decimal fraction/bigfloat) |
805 | 4, 5, 264, 265 (decimal fraction/bigfloat) |
627 | Both decimal fractions and bigfloats are decoded into Math::BigFloat |
806 | Both decimal fractions and bigfloats are decoded into Math::BigFloat |
628 | objects. The corresponding "Math::BigFloat::TO_CBOR" method *always* |
807 | objects. The corresponding "Math::BigFloat::TO_CBOR" method *always* |
629 | encodes into a decimal fraction. |
808 | encodes into a decimal fraction (either tag 4 or 264). |
630 | |
809 | |
631 | CBOR cannot represent bigfloats with *very* large exponents - |
|
|
632 | conversion of such big float objects is undefined. |
|
|
633 | |
|
|
634 | Also, NaN and infinities are not encoded properly. |
810 | NaN and infinities are not encoded properly, as they cannot be |
|
|
811 | represented in CBOR. |
|
|
812 | |
|
|
813 | See "BIGNUM SECURITY CONSIDERATIONS" for more info. |
|
|
814 | |
|
|
815 | 30 (rational numbers) |
|
|
816 | These tags are decoded into Math::BigRat objects. The corresponding |
|
|
817 | "Math::BigRat::TO_CBOR" method encodes rational numbers with |
|
|
818 | denominator 1 via their numerator only, i.e., they become normal |
|
|
819 | integers or "bignums". |
|
|
820 | |
|
|
821 | See "BIGNUM SECURITY CONSIDERATIONS" for more info. |
635 | |
822 | |
636 | 21, 22, 23 (expected later JSON conversion) |
823 | 21, 22, 23 (expected later JSON conversion) |
637 | CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore |
824 | CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore |
638 | these tags. |
825 | these tags. |
639 | |
826 | |
… | |
… | |
686 | Also keep in mind that CBOR::XS might leak contents of your Perl data |
873 | Also keep in mind that CBOR::XS might leak contents of your Perl data |
687 | structures in its error messages, so when you serialise sensitive |
874 | structures in its error messages, so when you serialise sensitive |
688 | information you might want to make sure that exceptions thrown by |
875 | information you might want to make sure that exceptions thrown by |
689 | CBOR::XS will not end up in front of untrusted eyes. |
876 | CBOR::XS will not end up in front of untrusted eyes. |
690 | |
877 | |
|
|
878 | BIGNUM SECURITY CONSIDERATIONS |
|
|
879 | CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and |
|
|
880 | Math::BigFloat that tries to encode the number in the simplest possible |
|
|
881 | way, that is, either a CBOR integer, a CBOR bigint/decimal fraction (tag |
|
|
882 | 4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers |
|
|
883 | (Math::BigRat, tag 30) can also contain bignums as members. |
|
|
884 | |
|
|
885 | CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent |
|
|
886 | bigfloats (tags 5 and 265), but it will never generate these on its own. |
|
|
887 | |
|
|
888 | Using the built-in Math::BigInt::Calc support, encoding and decoding |
|
|
889 | decimal fractions is generally fast. Decoding bigints can be slow for |
|
|
890 | very big numbers (tens of thousands of digits, something that could |
|
|
891 | potentially be caught by limiting the size of CBOR texts), and decoding |
|
|
892 | bigfloats or arbitrary-exponent bigfloats can be *extremely* slow |
|
|
893 | (minutes, decades) for large exponents (roughly 40 bit and longer). |
|
|
894 | |
|
|
895 | Additionally, Math::BigInt can take advantage of other bignum libraries, |
|
|
896 | such as Math::GMP, which cannot handle big floats with large exponents, |
|
|
897 | and might simply abort or crash your program, due to their code quality. |
|
|
898 | |
|
|
899 | This can be a concern if you want to parse untrusted CBOR. If it is, you |
|
|
900 | might want to disable decoding of tag 2 (bigint) and 3 (negative bigint) |
|
|
901 | types. You should also disable types 5 and 265, as these can be slow |
|
|
902 | even without bigints. |
|
|
903 | |
|
|
904 | Disabling bigints will also partially or fully disable types that rely |
|
|
905 | on them, e.g. rational numbers that use bignums. |
|
|
906 | |
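One possible way to keep bignum decoding away from untrusted data is to install a filter that handles no tags at all. This sketch relies on the documented behaviour that a filter returning an empty list leaves the tagged value as a CBOR::XS::Tagged object (a blessed array of tag and value). Note that it is a blunt instrument: it also turns off the default handling of every other non-enforced tag, such as date/time values.

```perl
use strict;
use warnings;
use CBOR::XS;

# Tag 2 (positive bignum) wrapping the one-octet byte string 0x01.
my $untrusted = "\xc2\x41\x01";

# An empty filter means no tag handler runs for non-enforced tags.
my $cbor  = CBOR::XS->new->filter (sub { });
my $value = $cbor->decode ($untrusted);

# $value is expected to be a CBOR::XS::Tagged object: [tag, value].
my ($tag, $payload) = @$value;

printf "%s with tag %d\n", ref $value, $tag;
```

A more selective filter could inspect the tag and reject or pass through individual tags, at the cost of more code in the callback.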
691 | CBOR IMPLEMENTATION NOTES |
907 | CBOR IMPLEMENTATION NOTES |
692 | This section contains some random implementation notes. They do not |
908 | This section contains some random implementation notes. They do not |
693 | describe guaranteed behaviour, but merely behaviour as-is implemented |
909 | describe guaranteed behaviour, but merely behaviour as-is implemented |
694 | right now. |
910 | right now. |
695 | |
911 | |
… | |
… | |
702 | Only the double data type is supported for NV data types - when Perl |
918 | Only the double data type is supported for NV data types - when Perl |
703 | uses long double to represent floating point values, they might not be |
919 | uses long double to represent floating point values, they might not be |
704 | encoded properly. Half precision types are accepted, but not encoded. |
920 | encoded properly. Half precision types are accepted, but not encoded. |
705 | |
921 | |
706 | Strict mode and canonical mode are not implemented. |
922 | Strict mode and canonical mode are not implemented. |
|
|
923 | |
|
|
924 | LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT |
|
|
925 | On perls that were built without 64 bit integer support (these are rare |
|
|
926 | nowadays, even on 32 bit architectures, as all major Perl distributions |
|
|
927 | are built with 64 bit integer support), support for any kind of 64 bit |
|
|
928 | integer in CBOR is very limited - most likely, these 64 bit values will |
|
|
929 | be truncated, corrupted, or otherwise not decoded correctly. This also |
|
|
930 | includes string, array and map sizes that are stored as 64 bit integers. |
707 | |
931 | |
708 | THREADS |
932 | THREADS |
709 | This module is *not* guaranteed to be thread safe and there are no plans |
933 | This module is *not* guaranteed to be thread safe and there are no plans |
710 | to change this until Perl gets thread support (as opposed to the |
934 | to change this until Perl gets thread support (as opposed to the |
711 | horribly slow so-called "threads" which are simply slow and bloated |
935 | horribly slow so-called "threads" which are simply slow and bloated |