/cvs/CBOR-XS/README

Comparing CBOR-XS/README (file contents):
Revision 1.12 by root, Sun Dec 1 17:10:42 2013 UTC vs.
Revision 1.17 by root, Wed Apr 27 09:40:18 2016 UTC

178 code that isn't prepared for this will not leak memory. 178 code that isn't prepared for this will not leak memory.
179 179
180 If $enable is false (the default), then "decode" will throw an error 180 If $enable is false (the default), then "decode" will throw an error
181 when it encounters a self-referential/cyclic data structure. 181 when it encounters a self-referential/cyclic data structure.
182 182
183 FUTURE DIRECTION: the motivation behind this option is to avoid
184 *real* cycles - future versions of this module might choose to decode
185 cyclic data structures using weak references when this option is
186 off, instead of throwing an error.
187
183 This option does not affect "encode" in any way - shared values and 188 This option does not affect "encode" in any way - shared values and
184 references will always be decoded properly if present. 189 references will always be encoded properly if present.
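The effect of "allow_cycles" can be seen with a minimal sketch. It assumes CBOR::XS is installed and uses a hand-assembled byte string (tags 28 and 29 from the value-sharing extension) that describes an array containing itself:

```perl
use strict;
use warnings;
use CBOR::XS;
use Scalar::Util qw(refaddr);

# Hand-assembled CBOR: tag 28 (shareable, 0xd8 0x1c) around a one-element
# array (0x81) whose element is tag 29 (sharedref, 0xd8 0x1d) pointing back
# at shareable value number 0 (0x00), i.e. an array that contains itself.
my $cyclic = "\xd8\x1c\x81\xd8\x1d\x00";

# allow_cycles is off by default, so decode throws an error.
my $rejected = !eval { CBOR::XS->new->decode ($cyclic); 1 };
print "default decoder: ", ($rejected ? "error: $@" : "accepted\n");

# With allow_cycles enabled, the array's only element refers back to it.
my $data     = CBOR::XS->new->allow_cycles->decode ($cyclic);
my $is_cycle = refaddr ($data) == refaddr ($data->[0]);
print "cycle decoded: ", ($is_cycle ? "yes" : "no"), "\n";

# Break the cycle by hand so Perl can free the structure.
@$data = ();
```

Breaking the cycle at the end matters: as the text above notes, code that decodes cyclic data is responsible for not leaking it.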
185 190
186 $cbor = $cbor->pack_strings ([$enable]) 191 $cbor = $cbor->pack_strings ([$enable])
187 $enabled = $cbor->get_pack_strings 192 $enabled = $cbor->get_pack_strings
188 If $enable is true (or missing), then "encode" will try not to 193 If $enable is true (or missing), then "encode" will try not to
189 encode the same string twice, but will instead encode a reference to 194 encode the same string twice, but will instead encode a reference to
200 the standard CBOR way. 205 the standard CBOR way.
201 206
202 This option does not affect "decode" in any way - string references 207 This option does not affect "decode" in any way - string references
203 will always be decoded properly if present. 208 will always be decoded properly if present.
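A quick way to see what "pack_strings" buys is to encode the same repetitive data with and without it. This is a sketch assuming CBOR::XS is installed; the exact byte counts depend on the encoder, so only the relative sizes are printed:

```perl
use strict;
use warnings;
use CBOR::XS;

# Ten copies of the same 11-character string.
my $data = [ ("hello world") x 10 ];

my $plain  = CBOR::XS->new->encode ($data);
my $packed = CBOR::XS->new->pack_strings->encode ($data);

# pack_strings only changes the encoder; any decoder object resolves the
# stringref tags (256 and 25) on its own.
my $roundtrip = decode_cbor ($packed);

printf "plain: %d bytes, packed: %d bytes\n", length $plain, length $packed;
```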
204 209
210 $cbor = $cbor->text_keys ([$enable])
211 $enabled = $cbor->get_text_keys
212 If $enable is true (or missing), then "encode" will encode all perl
213 hash keys as CBOR text strings/UTF-8 strings, upgrading them as
214 needed.
215
216 If $enable is false (the default), then "encode" will encode hash
217 keys normally - upgraded perl strings (strings internally encoded as
218 UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
219 byte strings.
220
221 This option does not affect "decode" in any way.
222
223 This option is useful for interoperability with CBOR decoders that
224 don't treat byte strings as a form of text. It is especially useful
225 as Perl gives very little control over hash keys.
226
227 Enabling this option can be slow, as all downgraded hash keys that
228 are encoded need to be scanned and converted to UTF-8.
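The difference "text_keys" makes is visible in the first byte of the key. In this sketch (assuming CBOR::XS is installed), the hex strings in the comments were worked out by hand from the RFC 7049 major types: 0xa1 is a map with one pair, 0x43 a byte string of length 3, 0x63 a text string of length 3:

```perl
use strict;
use warnings;
use CBOR::XS;

# "abc" is a plain (downgraded) Perl string, so by default the hash key is
# encoded as a CBOR byte string.
my $default = unpack "H*", CBOR::XS->new->encode ({ abc => 1 });

# With text_keys, the same key is upgraded and encoded as a text string.
my $text = unpack "H*", CBOR::XS->new->text_keys->encode ({ abc => 1 });

print "$default\n";    # a14361626301
print "$text\n";       # a16361626301
```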
229
230 $cbor = $cbor->text_strings ([$enable])
231 $enabled = $cbor->get_text_strings
232 This option works similarly to "text_keys", above, but applies to all
233 strings (including hash keys), so "text_keys" has no further effect
234 after enabling "text_strings".
235
236 If $enable is true (or missing), then "encode" will encode all perl
237 strings as CBOR text strings/UTF-8 strings, upgrading them as
238 needed.
239
240 If $enable is false (the default), then "encode" will encode strings
241 normally (but see "text_keys") - upgraded perl strings (strings
242 internally encoded as UTF-8) as CBOR text strings, and downgraded
243 perl strings as CBOR byte strings.
244
245 This option does not affect "decode" in any way.
246
247 This option has the same advantages and disadvantages as "text_keys".
248 In addition, this option effectively removes the ability to encode
249 byte strings, which might break some "FREEZE" and "TO_CBOR" methods
250 that rely on this, such as bignum encoding, so this option is mainly
251 useful for very simple data.
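The same check works for plain string values under "text_strings". This is a sketch assuming CBOR::XS is installed; 0x43 and 0x63 are the hand-derived headers for a byte string and a text string of length 3:

```perl
use strict;
use warnings;
use CBOR::XS;

my $value = "abc";    # a downgraded (byte) Perl string

# Default: downgraded strings become CBOR byte strings.
my $default = unpack "H*", CBOR::XS->new->encode ($value);

# text_strings: every string, not just hash keys, becomes a text string.
my $text = unpack "H*", CBOR::XS->new->text_strings->encode ($value);

print "$default\n";    # 43616263
print "$text\n";       # 63616263
```

This is also why the option can break bignum encoding: a "TO_CBOR" method that needs a real byte string will get a text string instead.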
252
205 $cbor = $cbor->validate_utf8 ([$enable]) 253 $cbor = $cbor->validate_utf8 ([$enable])
206 $enabled = $cbor->get_validate_utf8 254 $enabled = $cbor->get_validate_utf8
207 If $enable is true (or missing), then "decode" will validate that 255 If $enable is true (or missing), then "decode" will validate that
208 elements (text strings) containing UTF-8 data in fact contain valid 256 elements (text strings) containing UTF-8 data in fact contain valid
209 UTF-8 data (instead of blindly accepting it). This validation 257 UTF-8 data (instead of blindly accepting it). This validation
212 The concept of "valid UTF-8" used is perl's concept, which is a 260 The concept of "valid UTF-8" used is perl's concept, which is a
213 superset of the official UTF-8. 261 superset of the official UTF-8.
214 262
215 If $enable is false (the default), then "decode" will blindly accept 263 If $enable is false (the default), then "decode" will blindly accept
216 UTF-8 data, marking them as valid UTF-8 in the resulting data 264 UTF-8 data, marking them as valid UTF-8 in the resulting data
217 structure regardless of whether thats true or not. 265 structure regardless of whether that's true or not.
218 266
219 Perl isn't too happy about corrupted UTF-8 in strings, but should 267 Perl isn't too happy about corrupted UTF-8 in strings, but should
220 generally not crash or do similarly evil things. Extensions might be 268 generally not crash or do similarly evil things. Extensions might be
221 not so forgiving, so it's recommended to turn on this setting if you 269 not so forgiving, so it's recommended to turn on this setting if you
222 receive untrusted CBOR. 270 receive untrusted CBOR.
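As an illustration of when "validate_utf8" matters, the following sketch (assuming CBOR::XS is installed) feeds both decoder configurations a hand-assembled text string whose payload is not valid UTF-8:

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x62 = CBOR text string of length 2; "\xc3\x28" is an invalid UTF-8
# sequence (0x28 is not a continuation byte).
my $bad = "\x62\xc3\x28";

# Default: the bytes are accepted and flagged as UTF-8 without checking.
my $default_ok = eval { CBOR::XS->new->decode ($bad); 1 };

# validate_utf8: the decoder checks the payload and throws an error.
my $strict_ok = eval { CBOR::XS->new->validate_utf8->decode ($bad); 1 };

print "default: ",       ($default_ok ? "accepted" : "rejected"), "\n";
print "validate_utf8: ", ($strict_ok  ? "accepted\n" : "rejected: $@");
```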
287 the next one starts. 335 the next one starts.
288 336
289 CBOR::XS->new->decode_prefix ("......") 337 CBOR::XS->new->decode_prefix ("......")
290 => ("...", 3) 338 => ("...", 3)
291 339
340 INCREMENTAL PARSING
341 In some cases, there is the need for incremental parsing of CBOR texts.
342 While this module always has to keep both CBOR text and resulting Perl
343 data structure in memory at one time, it does allow you to parse a CBOR
344 stream incrementally, using a method similar to "decode_prefix" to see if
345 a full CBOR object is available, but much more efficient.
346
347 It basically works by parsing as much of a CBOR string as possible - if
348 the CBOR data is not complete yet, the parser will remember where it
349 was, to be able to restart when more data has been accumulated. Once
350 enough data is available to either decode a complete CBOR value or raise
351 an error, a real decode will be attempted.
352
353 A typical use case would be a network protocol that consists of sending
354 and receiving CBOR-encoded messages. The solution that works with CBOR
355 and just about anything else is to prepend a length to every CBOR value,
356 so the receiver knows how many octets to read. More compact (and
357 slightly slower) would be to just send CBOR values back-to-back, as
358 "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
359 length.
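For comparison with the back-to-back approach, here is a minimal sketch of the length-prefixed framing mentioned above. It assumes CBOR::XS is installed and uses a 32-bit big-endian length ("N/a*" in Perl's pack); the helpers frame_message and read_frame are hypothetical and not part of CBOR::XS:

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical helper: prefix one CBOR value with its length (32-bit, big-endian).
sub frame_message {
   my ($data) = @_;
   pack "N/a*", encode_cbor ($data)
}

# Hypothetical helper: remove and decode one frame from the front of $_[0],
# or return nothing if the buffer does not yet hold a complete frame.
sub read_frame {
   return unless length ($_[0]) >= 4;
   my $len = unpack "N", $_[0];
   return unless length ($_[0]) >= 4 + $len;
   my $frame = substr $_[0], 4, $len;
   substr ($_[0], 0, 4 + $len) = "";
   decode_cbor ($frame)
}

my $stream = frame_message ({ cmd => "ping" }) . frame_message ([1, 2, 3]);

my @messages;
while (my @msg = read_frame ($stream)) {
   push @messages, @msg;
}
```

The four extra bytes per message are the price for not needing an incremental decoder on the receiving side.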
360
361 The following methods help with this:
362
363 @decoded = $cbor->incr_parse ($buffer)
364 This method attempts to decode exactly one CBOR value from the
365 beginning of the given $buffer. The value is removed from the
366 $buffer on success. When $buffer doesn't contain a complete value
367 yet, it returns nothing. Finally, when the $buffer doesn't start
368 with something that could ever be a valid CBOR value, it raises an
369 exception, just as "decode" would. In the latter case the decoder
370 state is undefined and must be reset before being able to parse
371 further.
372
373 This method modifies the $buffer in place. When no CBOR value can be
374 decoded, the decoder stores the current string offset. On the next
375 call, it continues decoding at the place where it stopped before. For
376 this to make sense, the $buffer must begin with the same octets as
377 on previous unsuccessful calls.
378
379 You can call this method in scalar context, in which case it either
380 returns a decoded value or "undef". This makes it impossible to
381 distinguish between CBOR null values (which decode to "undef") and
382 an unsuccessful decode, which is often acceptable.
383
384 @decoded = $cbor->incr_parse_multiple ($buffer)
385 Same as "incr_parse", but attempts to decode as many CBOR values as
386 possible in one go, instead of at most one. Calls to "incr_parse"
387 and "incr_parse_multiple" can be interleaved.
388
389 $cbor->incr_reset
390 Resets the incremental decoder. This throws away any saved state, so
391 that subsequent calls to "incr_parse" or "incr_parse_multiple" start
392 to parse a new CBOR value from the beginning of the $buffer again.
393
394 This method can be called at any time, but it *must* be called if you
395 want to change your $buffer or there was a decoding error and you
396 want to reuse the $cbor object for future incremental parsings.
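The methods above can be combined into a simple receive loop. This sketch assumes CBOR::XS is installed and simulates a network connection by cutting back-to-back CBOR values into 3-byte pieces:

```perl
use strict;
use warnings;
use CBOR::XS;

# Three CBOR values sent back-to-back, with no length prefix.
my $wire = encode_cbor ({ id => 1 }) . encode_cbor ([1, 2, 3]) . encode_cbor ("done");

my $cbor   = CBOR::XS->new;
my $buffer = "";
my @values;

# Deliver the data in 3-byte pieces, as a slow connection might.
for my $chunk (unpack "(a3)*", $wire) {
   $buffer .= $chunk;

   # incr_parse removes each complete value from the front of $buffer and
   # remembers how far it got when the next value is still incomplete.
   while (my @decoded = $cbor->incr_parse ($buffer)) {
      push @values, @decoded;
   }
}
```

The inner loop could equally be written as a single `push @values, $cbor->incr_parse_multiple ($buffer);`. If a call throws an error, call `$cbor->incr_reset` before reusing the object.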
397
292MAPPING 398MAPPING
293 This section describes how CBOR::XS maps Perl values to CBOR values and 399 This section describes how CBOR::XS maps Perl values to CBOR values and
294 vice versa. These mappings are designed to "do the right thing" in most 400 vice versa. These mappings are designed to "do the right thing" in most
295 circumstances automatically, preserving round-tripping characteristics 401 circumstances automatically, preserving round-tripping characteristics
296 (what you put in comes out as something equivalent). 402 (what you put in comes out as something equivalent).
346 452
347 hash references 453 hash references
348 Perl hash references become CBOR maps. As there is no inherent 454 Perl hash references become CBOR maps. As there is no inherent
349 ordering in hash keys (or CBOR maps), they will usually be encoded 455 ordering in hash keys (or CBOR maps), they will usually be encoded
350 in a pseudo-random order. This order can be different each time a 456 in a pseudo-random order. This order can be different each time a
351 hahs is encoded. 457 hash is encoded.
352 458
353 Currently, tied hashes will use the indefinite-length format, while 459 Currently, tied hashes will use the indefinite-length format, while
354 normal hashes will use the fixed-length format. 460 normal hashes will use the fixed-length format.
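The fixed versus indefinite-length difference can be observed directly. This sketch assumes CBOR::XS is installed and uses the core Tie::StdHash class; since the text describes current behaviour rather than a guarantee, it only inspects the first and last bytes:

```perl
use strict;
use warnings;
use CBOR::XS;
use Tie::Hash;    # defines Tie::StdHash, a plain hash-backed tie class

my %plain = (a => 1);

tie my %tied, "Tie::StdHash";
$tied{a} = 1;

my $fixed      = encode_cbor (\%plain);
my $indefinite = encode_cbor (\%tied);

# 0xa1 = map with a fixed length of 1 pair;
# 0xbf ... 0xff = indefinite-length map terminated by a "break" byte.
printf "plain: %s\ntied:  %s\n", unpack ("H*", $fixed), unpack ("H*", $indefinite);
```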
355 461
356 array references 462 array references
405 my $x = 3.1; # some variable containing a number 511 my $x = 3.1; # some variable containing a number
406 "$x"; # stringified 512 "$x"; # stringified
407 $x .= ""; # another, more awkward way to stringify 513 $x .= ""; # another, more awkward way to stringify
408 print $x; # perl does it for you, too, quite often 514 print $x; # perl does it for you, too, quite often
409 515
410 You can force whether a string ie encoded as byte or text string by 516 You can force whether a string is encoded as byte or text string by
411 using "utf8::upgrade" and "utf8::downgrade"): 517 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
518 disabled):
412 519
413 utf8::upgrade $x; # encode $x as text string 520 utf8::upgrade $x; # encode $x as text string
414 utf8::downgrade $x; # encode $x as byte string 521 utf8::downgrade $x; # encode $x as byte string
415 522
416 Perl doesn't define what operations up- and downgrade strings, so if 523 Perl doesn't define what operations up- and downgrade strings, so if
417 the difference between byte and text is important, you should up- or 524 the difference between byte and text is important, you should up- or
418 downgrade your string as late as possible before encoding. 525 downgrade your string as late as possible before encoding. You can
526 also force the use of CBOR text strings by using "text_keys" or
527 "text_strings".
419 528
420 You can force the type to be a CBOR number by numifying it: 529 You can force the type to be a CBOR number by numifying it:
421 530
422 my $x = "3"; # some variable containing a string 531 my $x = "3"; # some variable containing a string
423 $x += 0; # numify it, ensuring it will be dumped as a number 532 $x += 0; # numify it, ensuring it will be dumped as a number
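The three ways of steering the encoding above can be checked side by side. This sketch assumes CBOR::XS is installed with "text_strings" disabled (the default); the hex values in the comment were derived by hand (0x03 = integer 3, 0x61/0x41 = text/byte string of length 1, 0x33 = "3"):

```perl
use strict;
use warnings;
use CBOR::XS;

my $num = "3";
$num += 0;                 # numified: encoded as a CBOR integer

my $text = "3";
utf8::upgrade $text;       # upgraded: encoded as a CBOR text string

my $bytes = "3";
utf8::downgrade $bytes;    # downgraded: encoded as a CBOR byte string

my $hex_num   = unpack "H*", encode_cbor ($num);
my $hex_text  = unpack "H*", encode_cbor ($text);
my $hex_bytes = unpack "H*", encode_cbor ($bytes);

print "$hex_num $hex_text $hex_bytes\n";    # 03 6133 4133
```

Separate variables are used on purpose: once a number has been stringified (or a string upgraded), Perl keeps that representation, which would change the result.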
518 "$self" # encode url string 627 "$self" # encode url string
519 } 628 }
520 629
521 sub URI::THAW { 630 sub URI::THAW {
522 my ($class, $serialiser, $uri) = @_; 631 my ($class, $serialiser, $uri) = @_;
523
524 $class->new ($uri) 632 $class->new ($uri)
525 } 633 }
526 634
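As a self-contained sketch of a "FREEZE" method returning more than one value, the following uses a hypothetical class My::Point (not part of CBOR::XS or URI) and assumes CBOR::XS is installed. The encoded result starts with tag 26 (0xd8 0x1a):

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical example class, not part of CBOR::XS.
package My::Point;

sub new {
   my ($class, %args) = @_;
   bless { x => $args{x}, y => $args{y} }, $class
}

# FREEZE may return several values; here the two coordinates.
sub FREEZE {
   my ($self, $serialiser) = @_;    # $serialiser is "CBOR" for CBOR::XS
   ($self->{x}, $self->{y})
}

# THAW receives the class name, the serialiser name and the frozen values.
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   $class->new (x => $x, y => $y)
}

package main;

my $cbor  = encode_cbor (My::Point->new (x => 3, y => 4));
my $point = decode_cbor ($cbor);
print ref ($point), " ($point->{x}, $point->{y})\n";
```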
527 Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For 635 Unlike "TO_CBOR", multiple values can be returned by "FREEZE". For
528 example, a "FREEZE" method that returns "type", "id" and "variant" 636 example, a "FREEZE" method that returns "type", "id" and "variant"
624 Future versions of this module reserve the right to special case 732 Future versions of this module reserve the right to special case
625 additional tags (such as base64url). 733 additional tags (such as base64url).
626 734
627 ENFORCED TAGS 735 ENFORCED TAGS
628 These tags are always handled when decoding, and their handling cannot 736 These tags are always handled when decoding, and their handling cannot
629 be overriden by the user. 737 be overridden by the user.
630 738
631 26 (perl-object, <http://cbor.schmorp.de/perl-object>) 739 26 (perl-object, <http://cbor.schmorp.de/perl-object>)
632 These tags are automatically created (and decoded) for serialisable 740 These tags are automatically created (and decoded) for serialisable
633 objects using the "FREEZE/THAW" methods (the Types::Serialiser object 741 objects using the "FREEZE/THAW" methods (the Types::Serialiser object
634 serialisation protocol). See "OBJECT SERIALISATION" for details. 742 serialisation protocol). See "OBJECT SERIALISATION" for details.
635 743
636 28, 29 (shareable, sharedref, L <http://cbor.schmorp.de/value-sharing>) 744 28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
637 These tags are automatically decoded when encountered (and they do 745 These tags are automatically decoded when encountered (and they do
638 not result in a cyclic data structure, see "allow_cycles"), 746 not result in a cyclic data structure, see "allow_cycles"),
639 resulting in shared values in the decoded object. They are only 747 resulting in shared values in the decoded object. They are only
640 encoded, however, when "allow_sharing" is enabled. 748 encoded, however, when "allow_sharing" is enabled.
641 749
650 references will be shared, others will not. While non-reference 758 references will be shared, others will not. While non-reference
651 shared values can be generated in Perl with some effort, they were 759 shared values can be generated in Perl with some effort, they were
652 considered too unimportant to be supported in the encoder. The 760 considered too unimportant to be supported in the encoder. The
653 decoder, however, will decode these values as shared values. 761 decoder, however, will decode these values as shared values.
654 762
655 256, 25 (stringref-namespace, stringref, L 763 256, 25 (stringref-namespace, stringref,
656 <http://cbor.schmorp.de/stringref>) 764 <http://cbor.schmorp.de/stringref>)
657 These tags are automatically decoded when encountered. They are only 765 These tags are automatically decoded when encountered. They are only
658 encoded, however, when "pack_strings" is enabled. 766 encoded, however, when "pack_strings" is enabled.
659 767
660 22098 (indirection, <http://cbor.schmorp.de/indirection>) 768 22098 (indirection, <http://cbor.schmorp.de/indirection>)
661 This tag is automatically generated when a reference is encountered 769 This tag is automatically generated when a reference is encountered
662 (with the exception of hash and array refernces). It is converted to 770 (with the exception of hash and array references). It is converted
663 a reference when decoding. 771 to a reference when decoding.
664 772
665 55799 (self-describe CBOR, RFC 7049) 773 55799 (self-describe CBOR, RFC 7049)
666 This value is not generated on encoding (unless explicitly requested 774 This value is not generated on encoding (unless explicitly requested
667 by the user), and is simply ignored when decoding. 775 by the user), and is simply ignored when decoding.
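Two of the enforced tags can be seen in a short sketch, assuming CBOR::XS is installed. The hex values were derived by hand: 22098 is 0x5652 and 55799 is 0xd9f7, each preceded by the two-byte tag header 0xd9:

```perl
use strict;
use warnings;
use CBOR::XS;

# A scalar reference is wrapped in the indirection tag 22098.
my $indirect = encode_cbor (\"hi");
print unpack ("H*", $indirect), "\n";    # d95652426869
my $deref = ${ decode_cbor ($indirect) };

# Tag 55799 is only produced on explicit request, e.g. via CBOR::XS::tag,
# and is skipped again when decoding.
my $described = encode_cbor (CBOR::XS::tag (55799, 1));
print unpack ("H*", $described), "\n";   # d9d9f701
my $plain = decode_cbor ($described);
```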
668 776
669 NON-ENFORCED TAGS 777 NON-ENFORCED TAGS
670 These tags have default filters provided when decoding. Their handling 778 These tags have default filters provided when decoding. Their handling
671 can be overriden by changing the %CBOR::XS::FILTER entry for the tag, or 779 can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
672 by providing a custom "filter" callback when decoding. 780 or by providing a custom "filter" callback when decoding.
673 781
674 When they result in decoding into a specific Perl class, the module 782 When they result in decoding into a specific Perl class, the module
675 usually provides a corresponding "TO_CBOR" method as well. 783 usually provides a corresponding "TO_CBOR" method as well.
676 784
677 When any of these need to load additional modules that are not part of 785 When any of these need to load additional modules that are not part of
692 2, 3 (positive/negative bignum) 800 2, 3 (positive/negative bignum)
693 These tags are decoded into Math::BigInt objects. The corresponding 801 These tags are decoded into Math::BigInt objects. The corresponding
694 "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal 802 "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
695 CBOR integers, and others into positive/negative CBOR bignums. 803 CBOR integers, and others into positive/negative CBOR bignums.
696 804
697 4, 5 (decimal fraction/bigfloat) 805 4, 5, 264, 265 (decimal fraction/bigfloat)
698 Both decimal fractions and bigfloats are decoded into Math::BigFloat 806 Both decimal fractions and bigfloats are decoded into Math::BigFloat
699 objects. The corresponding "Math::BigFloat::TO_CBOR" method *always* 807 objects. The corresponding "Math::BigFloat::TO_CBOR" method *always*
700 encodes into a decimal fraction. 808 encodes into a decimal fraction (either tag 4 or 264).
701 809
702 CBOR cannot represent bigfloats with *very* large exponents -
703 conversion of such big float objects is undefined.
704
705 Also, NaN and infinities are not encoded properly. 810 NaN and infinities are not encoded properly, as they cannot be
811 represented in CBOR.
812
813 See "BIGNUM SECURITY CONSIDERATIONS" for more info.
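A decimal fraction is an array of [exponent, mantissa] under tag 4. This sketch (assuming CBOR::XS and the core Math::BigFloat module are available) decodes a hand-assembled value for 27315 * 10**-2 and round-trips it through "Math::BigFloat::TO_CBOR":

```perl
use strict;
use warnings;
use CBOR::XS;

# Tag 4 (0xc4), array of 2 (0x82), exponent -2 (0x21),
# mantissa 27315 as a 16-bit integer (0x19 0x6a 0xb3).
my $fraction = decode_cbor ("\xc4\x82\x21\x19\x6a\xb3");
print ref ($fraction), " $fraction\n";

# Encoding goes through Math::BigFloat::TO_CBOR, which writes a decimal
# fraction again, so the value survives a round trip.
my $again = decode_cbor (encode_cbor ($fraction));
print "$again\n";
```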
814
815 30 (rational numbers)
816 These tags are decoded into Math::BigRat objects. The corresponding
817 "Math::BigRat::TO_CBOR" method encodes rational numbers with
818 denominator 1 via their numerator only, i.e., they become normal
819 integers or "bignums".
820
821 See "BIGNUM SECURITY CONSIDERATIONS" for more info.
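Rational numbers follow the same pattern, with tag 30 wrapping [numerator, denominator]. A sketch assuming CBOR::XS and the core Math::BigRat module are available; the input bytes were assembled by hand:

```perl
use strict;
use warnings;
use CBOR::XS;

# Tag 30 (0xd8 0x1e), array of 2 (0x82), numerator 1, denominator 3.
my $third = decode_cbor ("\xd8\x1e\x82\x01\x03");
print ref ($third), " $third\n";

# Math::BigRat::TO_CBOR writes tag 30 again, as the denominator is not 1.
my $again = decode_cbor (encode_cbor ($third));
print "$again\n";
```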
706 822
707 21, 22, 23 (expected later JSON conversion) 823 21, 22, 23 (expected later JSON conversion)
708 CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore 824 CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
709 these tags. 825 these tags.
710 826
757 Also keep in mind that CBOR::XS might leak contents of your Perl data 873 Also keep in mind that CBOR::XS might leak contents of your Perl data
758 structures in its error messages, so when you serialise sensitive 874 structures in its error messages, so when you serialise sensitive
759 information you might want to make sure that exceptions thrown by 875 information you might want to make sure that exceptions thrown by
760 CBOR::XS will not end up in front of untrusted eyes. 876 CBOR::XS will not end up in front of untrusted eyes.
761 877
878BIGNUM SECURITY CONSIDERATIONS
879 CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
880 Math::BigFloat that tries to encode the number in the simplest possible
881 way, that is, either a CBOR integer, a CBOR bignum (tags 2 and 3), a
882 decimal fraction (tag 4) or an arbitrary-exponent decimal fraction
883 (tag 264). Rational numbers (Math::BigRat, tag 30) can also contain bignums as members.
884
885 CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
886 bigfloats (tags 5 and 265), but it will never generate these on its own.
887
888 Using the built-in Math::BigInt::Calc support, encoding and decoding
889 decimal fractions is generally fast. Decoding bigints can be slow for
890 very big numbers (tens of thousands of digits, something that could
891 potentially be caught by limiting the size of CBOR texts), and decoding
892 bigfloats or arbitrary-exponent bigfloats can be *extremely* slow
893 (minutes, decades) for large exponents (roughly 40 bit and longer).
894
895 Additionally, Math::BigInt can take advantage of other bignum libraries,
896 such as Math::GMP, which cannot handle big floats with large exponents,
897 and might simply abort or crash your program, due to their code quality.
898
899 This can be a concern if you want to parse untrusted CBOR. If it is, you
900 might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
901 types. You should also disable types 5 and 265, as these can be slow
902 even without bigints.
903
904 Disabling bigints will also partially or fully disable types that rely
905 on them, e.g. rational numbers that use bignums.
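One way to disable these tags is a custom "filter" that returns nothing for them, so the decoder falls back to a plain CBOR::XS::Tagged object, and passes all other tags to CBOR::XS::default_filter. This is a sketch assuming CBOR::XS is installed; the tag list follows the advice above and the input is a hand-assembled positive bignum (256):

```perl
use strict;
use warnings;
use CBOR::XS;

# Tag 2 (positive bignum, 0xc2) wrapping the byte string "\x01\x00" (= 256).
my $bignum = "\xc2\x42\x01\x00";

# Default filter: decoded into a Math::BigInt object.
my $default = decode_cbor ($bignum);

# Returning nothing for these tags keeps them as CBOR::XS::Tagged objects;
# everything else still goes through %CBOR::XS::FILTER.
my %untrusted_tags = map { $_ => 1 } 2, 3, 5, 265;
my $cautious = CBOR::XS->new->filter (sub {
   $untrusted_tags{$_[0]} ? () : CBOR::XS::default_filter (@_)
});

my $kept = $cautious->decode ($bignum);
printf "default:  %s (%s)\ncautious: %s, tag %d\n",
   ref ($default), $default, ref ($kept), $kept->[0];
```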
906
762CBOR IMPLEMENTATION NOTES 907CBOR IMPLEMENTATION NOTES
763 This section contains some random implementation notes. They do not 908 This section contains some random implementation notes. They do not
764 describe guaranteed behaviour, but merely behaviour as-is implemented 909 describe guaranteed behaviour, but merely behaviour as-is implemented
765 right now. 910 right now.
766 911
776 921
777 Strict mode and canonical mode are not implemented. 922 Strict mode and canonical mode are not implemented.
778 923
779LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT 924LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
780 On perls that were built without 64 bit integer support (these are rare 925 On perls that were built without 64 bit integer support (these are rare
781 nowadays, even on 32 bit architectures), support for any kind of 64 bit 926 nowadays, even on 32 bit architectures, as all major Perl distributions
927 are built with 64 bit integer support), support for any kind of 64 bit
782 integer in CBOR is very limited - most likely, these 64 bit values will 928 integer in CBOR is very limited - most likely, these 64 bit values will
783 be truncated, corrupted, or otherwise not decoded correctly. This also 929 be truncated, corrupted, or otherwise not decoded correctly. This also
784 includes string, array and map sizes that are stored as 64 bit integers. 930 includes string, array and map sizes that are stored as 64 bit integers.
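To find out which case applies to a given perl, the core Config module reports the size of the native integer type. A minimal sketch:

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of Perl's native integer type; 8 means this
# perl was built with 64 bit integer support.
my $has_64bit_ints = $Config{ivsize} >= 8;

print $has_64bit_ints
   ? "64 bit integers supported\n"
   : "no 64 bit integers: large CBOR integers may not decode correctly\n";
```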
785 931
786THREADS 932THREADS
