
Comparing CBOR-XS/README (file contents):
Revision 1.18 by root, Wed Dec 7 14:14:30 2016 UTC vs.
Revision 1.23 by root, Fri Sep 8 20:03:06 2023 UTC

33 with the added ability of supporting serialisation of Perl objects. 33 with the added ability of supporting serialisation of Perl objects.
34 (JSON often compresses better than CBOR though, so if you plan to 34 (JSON often compresses better than CBOR though, so if you plan to
35 compress the data later and speed is less important you might want to 35 compress the data later and speed is less important you might want to
36 compare both formats first). 36 compare both formats first).
37 37
38 The primary goal of this module is to be *correct* and the secondary
39 goal is to be *fast*. To reach the latter goal it was written in C.
40
38 To give you a general idea about speed, with texts in the megabyte 41 To give you a general idea about speed, with texts in the megabyte
39 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or 42 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
40 JSON::XS and decodes about 15%-30% faster than those. The shorter the 43 JSON::XS and decodes about 15%-30% faster than those. The shorter the
41 data, the worse Storable performs in comparison. 44 data, the worse Storable performs in comparison.
42 45
47 In addition to the core CBOR data format, this module implements a 50 In addition to the core CBOR data format, this module implements a
48 number of extensions, to support cyclic and shared data structures (see 51 number of extensions, to support cyclic and shared data structures (see
49 "allow_sharing" and "allow_cycles"), string deduplication (see 52 "allow_sharing" and "allow_cycles"), string deduplication (see
50 "pack_strings") and scalar references (always enabled). 53 "pack_strings") and scalar references (always enabled).
51 54
52 The primary goal of this module is to be *correct* and the secondary
53 goal is to be *fast*. To reach the latter goal it was written in C.
54
55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and 55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
56 vice versa. 56 vice versa.
57 57
58FUNCTIONAL INTERFACE 58FUNCTIONAL INTERFACE
59 The following convenience methods are provided by this module. They are 59 The following convenience methods are provided by this module. They are
86 but configures the coder object to be safe to use with untrusted 86 but configures the coder object to be safe to use with untrusted
87 data. Currently, this is equivalent to: 87 data. Currently, this is equivalent to:
88 88
89 my $cbor = CBOR::XS 89 my $cbor = CBOR::XS
90 ->new 90 ->new
91 ->validate_utf8
91 ->forbid_objects 92 ->forbid_objects
92 ->filter (\&CBOR::XS::safe_filter) 93 ->filter (\&CBOR::XS::safe_filter)
93 ->max_size (1e8); 94 ->max_size (1e8);
94 95
95 But is more future proof (it is better to crash because of a change 96 But is more future proof (it is better to crash because of a change
164 partner supports the value sharing extensions to CBOR 165 partner supports the value sharing extensions to CBOR
165 (<http://cbor.schmorp.de/value-sharing>), as without decoder 166 (<http://cbor.schmorp.de/value-sharing>), as without decoder
166 support, the resulting data structure might be unusable. 167 support, the resulting data structure might be unusable.
167 168
168 Detecting shared values incurs a runtime overhead when values are 169 Detecting shared values incurs a runtime overhead when values are
169 encoded that have a reference counter large than one, and might 170 encoded that have a reference counter larger than one, and might
170 unnecessarily increase the encoded size, as potentially shared 171 unnecessarily increase the encoded size, as potentially shared
171 values are encode as shareable whether or not they are actually 172 values are encoded as shareable whether or not they are actually
172 shared. 173 shared.
173 174
174 At the moment, only targets of references can be shared (e.g. 175 At the moment, only targets of references can be shared (e.g.
175 scalars, arrays or hashes pointed to by a reference). Weirder 176 scalars, arrays or hashes pointed to by a reference). Weirder
176 constructs, such as an array with multiple "copies" of the *same* 177 constructs, such as an array with multiple "copies" of the *same*
192 code that isn't prepared for this will not leak memory. 193 code that isn't prepared for this will not leak memory.
193 194
194 If $enable is false (the default), then "decode" will throw an error 195 If $enable is false (the default), then "decode" will throw an error
195 when it encounters a self-referential/cyclic data structure. 196 when it encounters a self-referential/cyclic data structure.
196 197
197 FUTURE DIRECTION: the motivation behind this option is to avoid 198 This option does not affect "encode" in any way - shared values and
198 *real* cycles - future versions of this module might chose to decode 199 references will always be encoded properly if present.
199 cyclic data structures using weak references when this option is 200
200 off, instead of throwing an error. 201 $cbor = $cbor->allow_weak_cycles ([$enable])
202 $enabled = $cbor->get_allow_weak_cycles
203 This works like "allow_cycles" in that it allows the resulting data
204 structures to contain cycles, but unlike "allow_cycles", those
205 cyclic references will be weak. That means that code that
206 recursively walks the data structure must be prepared for cycles,
207 but at least no special precautions must be implemented to free
208 these data structures.
209
210 Only those references leading to actual cycles will be weakened -
211 other references, e.g. when the same hash or array is referenced
212 multiple times in an array, will be normal references.
201 213
202 This option does not affect "encode" in any way - shared values and 214 This option does not affect "encode" in any way - shared values and
203 references will always be encoded properly if present. 215 references will always be encoded properly if present.
204 216
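To illustrate the "allow_weak_cycles" option described above, here is a minimal sketch, assuming a CBOR::XS version that already provides this method. It encodes a self-referential array with "allow_sharing" (so the cycle can be represented at all) and decodes it with "allow_weak_cycles". The variable names are illustrative only.

```perl
use strict;
use warnings;
use CBOR::XS;
use Scalar::Util qw(isweak);

# Build a self-referential array: its only element points back at the array.
my $data = [];
push @$data, $data;

# allow_sharing lets the encoder emit the cycle as a shared reference
# instead of recursing into it.
my $cbor = CBOR::XS->new->allow_sharing->encode ($data);

# Break the original cycle by hand, as it was built with strong references.
@$data = ();

# allow_weak_cycles permits the cycle when decoding, with the
# back-reference expected to be a weak reference.
my $copy = CBOR::XS->new->allow_weak_cycles->decode ($cbor);

print "cycle restored\n"         if $copy->[0] == $copy;
print "back-reference is weak\n" if isweak ($copy->[0]);
```

Because the back-reference is weak, the decoded structure should be freed normally once $copy goes out of scope, whereas the original structure had to be taken apart manually.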
205 $cbor = $cbor->forbid_objects ([$enable]) 217 $cbor = $cbor->forbid_objects ([$enable])
276 perl strings as CBOR byte strings. 288 perl strings as CBOR byte strings.
277 289
278 This option does not affect "decode" in any way. 290 This option does not affect "decode" in any way.
279 291
280 This option has similar advantages and disadvantages as "text_keys". 292 This option has similar advantages and disadvantages as "text_keys".
281 In addition, this option effectively removes the ability to encode 293 In addition, this option effectively removes the ability to
282 byte strings, which might break some "FREEZE" and "TO_CBOR" methods 294 automatically encode byte strings, which might break some "FREEZE"
283 that rely on this, such as bignum encoding, so this option is mainly 295 and "TO_CBOR" methods that rely on this.
284 useful for very simple data. 296
297 A workaround is to use explicit type casts, which are unaffected by
298 this option.
285 299
286 $cbor = $cbor->validate_utf8 ([$enable]) 300 $cbor = $cbor->validate_utf8 ([$enable])
287 $enabled = $cbor->get_validate_utf8 301 $enabled = $cbor->get_validate_utf8
288 If $enable is true (or missing), then "decode" will validate that 302 If $enable is true (or missing), then "decode" will validate that
289 elements (text strings) containing UTF-8 data in fact contain valid 303 elements (text strings) containing UTF-8 data in fact contain valid
396 will silently stop parsing there and return the number of characters 410 will silently stop parsing there and return the number of characters
397 consumed so far. 411 consumed so far.
398 412
399 This is useful if your CBOR texts are not delimited by an outer 413 This is useful if your CBOR texts are not delimited by an outer
400 protocol and you need to know where the first CBOR string ends and 414 protocol and you need to know where the first CBOR string ends and
401 the next one starts. 415 the next one starts - CBOR strings are self-delimited, so it is
416 possible to concatenate CBOR strings without any delimiters or size
417 fields and recover their data.
402 418
403 CBOR::XS->new->decode_prefix ("......") 419 CBOR::XS->new->decode_prefix ("......")
404 => ("...", 3) 420 => ("...", 3)
405 421
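The self-delimiting property mentioned above can be sketched as follows: two CBOR strings are concatenated without any separator and then recovered one at a time with "decode_prefix". This is only an illustration; the sample data and variable names are made up.

```perl
use strict;
use warnings;
use CBOR::XS;

# Two independently encoded values, concatenated without delimiters.
my $stream = encode_cbor ([1, 2, 3]) . encode_cbor ({ a => 1 });

my $coder = CBOR::XS->new;
my @values;

while (length $stream) {
    # decode_prefix returns the decoded value and the number of octets used.
    my ($value, $octets) = $coder->decode_prefix ($stream);
    push @values, $value;

    # Remove the consumed prefix so the next value starts at offset 0.
    substr $stream, 0, $octets, "";
}

printf "decoded %d values\n", scalar @values;
```

For data that arrives in pieces, the incremental parser is likely the better fit, since "decode_prefix" needs a complete value at the start of the string.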
406 INCREMENTAL PARSING 422 INCREMENTAL PARSING
409 data structure in memory at one time, it does allow you to parse a CBOR 425 data structure in memory at one time, it does allow you to parse a CBOR
410 stream incrementally, in a way similar to using "decode_prefix" to see if 426 stream incrementally, in a way similar to using "decode_prefix" to see if
411 a full CBOR object is available, but is much more efficient. 427 a full CBOR object is available, but is much more efficient.
412 428
413 It basically works by parsing as much of a CBOR string as possible - if 429 It basically works by parsing as much of a CBOR string as possible - if
414 the CBOR data is not complete yet, the pasrer will remember where it 430 the CBOR data is not complete yet, the parser will remember where it
415 was, to be able to restart when more data has been accumulated. Once 431 was, to be able to restart when more data has been accumulated. Once
416 enough data is available to either decode a complete CBOR value or raise 432 enough data is available to either decode a complete CBOR value or raise
417 an error, a real decode will be attempted. 433 an error, a real decode will be attempted.
418 434
419 A typical use case would be a network protocol that consists of sending 435 A typical use case would be a network protocol that consists of sending
543 "CBOR::XS::tag" to create such objects. 559 "CBOR::XS::tag" to create such objects.
544 560
545 Types::Serialiser::true, Types::Serialiser::false, 561 Types::Serialiser::true, Types::Serialiser::false,
546 Types::Serialiser::error 562 Types::Serialiser::error
547 These special values become CBOR true, CBOR false and CBOR undefined 563 These special values become CBOR true, CBOR false and CBOR undefined
548 values, respectively. You can also use "\1", "\0" and "\undef" 564 values, respectively.
549 directly if you want.
550 565
551 other blessed objects 566 other blessed objects
552 Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See 567 Other blessed objects are serialised via "TO_CBOR" or "FREEZE". See
553 "TAG HANDLING AND EXTENSIONS" for specific classes handled by this 568 "TAG HANDLING AND EXTENSIONS" for specific classes handled by this
554 module, and "OBJECT SERIALISATION" for generic object serialisation. 569 module, and "OBJECT SERIALISATION" for generic object serialisation.
579 $x .= ""; # another, more awkward way to stringify 594 $x .= ""; # another, more awkward way to stringify
580 print $x; # perl does it for you, too, quite often 595 print $x; # perl does it for you, too, quite often
581 596
582 You can force whether a string is encoded as byte or text string by 597 You can force whether a string is encoded as byte or text string by
583 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is 598 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
584 disabled): 599 disabled).
585 600
586 utf8::upgrade $x; # encode $x as text string 601 utf8::upgrade $x; # encode $x as text string
587 utf8::downgrade $x; # encode $x as byte string 602 utf8::downgrade $x; # encode $x as byte string
603
604 More options are available, see "TYPE CASTS", below, and the
605 "text_keys" and "text_strings" options.
588 606
589 Perl doesn't define what operations up- and downgrade strings, so if 607 Perl doesn't define what operations up- and downgrade strings, so if
590 the difference between byte and text is important, you should up- or 608 the difference between byte and text is important, you should up- or
591 downgrade your string as late as possible before encoding. You can 609 downgrade your string as late as possible before encoding. You can
592 also force the use of CBOR text strings by using "text_keys" or 610 also force the use of CBOR text strings by using "text_keys" or
606 possible representation. Floating-point values will use either the 624 possible representation. Floating-point values will use either the
607 IEEE single format if possible without loss of precision, otherwise 625 IEEE single format if possible without loss of precision, otherwise
608 the IEEE double format will be used. Perls that use formats other 626 the IEEE double format will be used. Perls that use formats other
609 than IEEE double to represent numerical values are supported, but 627 than IEEE double to represent numerical values are supported, but
610 might suffer loss of precision. 628 might suffer loss of precision.
629
630 TYPE CASTS
631 EXPERIMENTAL: As an experimental extension, "CBOR::XS" allows you to
632 force specific CBOR types to be used when encoding. That allows you to
633 encode types not normally accessible (e.g. half floats) as well as force
634 string types even when "text_strings" is in effect.
635
636 Type forcing is done by calling a special "cast" function which keeps a
637 copy of the value and returns a new value that can be handed over to any
638 CBOR encoder function.
639
640 The following casts are currently available (all of which are unary
641 operators, that is, have a prototype of "$"):
642
643 CBOR::XS::as_int $value
644 Forces the value to be encoded as some form of (basic, not bignum)
645 integer type.
646
647 CBOR::XS::as_text $value
648 Forces the value to be encoded as (UTF-8) text values.
649
650 CBOR::XS::as_bytes $value
651 Forces the value to be encoded as a (binary) string value.
652
653 Example: encode a perl string as binary even though "text_strings"
654 is in effect.
655
656 CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
657
658 CBOR::XS::as_bool $value
659 Converts a Perl boolean (which can be any kind of scalar) into a
660 CBOR boolean. Strictly the same, but shorter to write, than:
661
662 $value ? Types::Serialiser::true : Types::Serialiser::false
663
664 CBOR::XS::as_float16 $value
665 Forces half-float (IEEE 754 binary16) encoding of the given value.
666
667 CBOR::XS::as_float32 $value
668 Forces single-float (IEEE 754 binary32) encoding of the given value.
669
670 CBOR::XS::as_float64 $value
671 Forces double-float (IEEE 754 binary64) encoding of the given value.
672
673 CBOR::XS::as_cbor $cbor_text
674 Not a type cast per-se, this type cast forces the argument to be
675 encoded as-is. This can be used to embed pre-encoded CBOR data.
676
677 Note that no checking on the validity of the $cbor_text is done -
678 it's the caller's responsibility to correctly encode values.
679
680 CBOR::XS::as_map [key => value...]
681 Treat the array reference as key value pairs and output a CBOR map.
682 This allows you to generate CBOR maps with arbitrary key types (or,
683 if you don't care about semantics, duplicate keys or pairs in a
684 custom order), which is otherwise hard to do with Perl.
685
686 The single argument must be an array reference with an even number
687 of elements.
688
689 Note that only the reference to the array is copied, the array
690 itself is not. Modifications done to the array before calling an
691 encoding function will be reflected in the encoded output.
692
693 Example: encode a CBOR map with a string and an integer as keys.
694
695 encode_cbor CBOR::XS::as_map [string => "value", 5 => "value"]
611 696
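As a rough sketch of how several of the casts listed above might be combined, the following encodes one array with "text_strings" in effect and decodes it again with a default decoder. It assumes a CBOR::XS version that ships these (experimental) cast functions. The sample values are arbitrary.

```perl
use strict;
use warnings;
use CBOR::XS;

# text_strings would normally turn every perl string into a CBOR text string.
my $coder = CBOR::XS->new->text_strings;

my $cbor = $coder->encode ([
    CBOR::XS::as_int ("42"),         # integer, although given as a string
    CBOR::XS::as_bytes ("raw"),      # byte string despite text_strings
    CBOR::XS::as_bool (0),           # CBOR false
    CBOR::XS::as_float16 (1.5),      # half-float encoding
    CBOR::XS::as_map ([1 => "one", 2 => "two"]),   # map with integer keys
]);

# Decoding needs no special options; the casts only affect encoding.
my $back = decode_cbor ($cbor);

printf "decoded %d elements\n", scalar @$back;
```

On decoding, the map comes back as an ordinary Perl hash, so its integer keys become hash key strings.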
612 OBJECT SERIALISATION 697 OBJECT SERIALISATION
613 This module implements both a CBOR-specific and the generic 698 This module implements both a CBOR-specific and the generic
614 Types::Serialiser object serialisation protocol. The following 699 Types::Serialiser object serialisation protocol. The following
615 subsections explain both methods. 700 subsections explain both methods.
907 ensure that decoded JSON data will round-trip encoding and decoding to 992 ensure that decoded JSON data will round-trip encoding and decoding to
908 CBOR intact. 993 CBOR intact.
909 994
910SECURITY CONSIDERATIONS 995SECURITY CONSIDERATIONS
911 Tl;dr... if you want to decode or encode CBOR from untrusted sources, 996 Tl;dr... if you want to decode or encode CBOR from untrusted sources,
912 you should start with a coder object created via "new_safe": 997 you should start with a coder object created via "new_safe" (which
998 implements the mitigations explained below):
913 999
914 my $coder = CBOR::XS->new_safe; 1000 my $coder = CBOR::XS->new_safe;
915 1001
916 my $data = $coder->decode ($cbor_text); 1002 my $data = $coder->decode ($cbor_text);
917 my $cbor = $coder->encode ($data); 1003 my $cbor = $coder->encode ($data);
936 even if all your "THAW" methods are secure, encoding data structures 1022 even if all your "THAW" methods are secure, encoding data structures
937 from untrusted sources can invoke those and trigger bugs in those. 1023 from untrusted sources can invoke those and trigger bugs in those.
938 1024
939 So, if you are not sure about the security of all the modules you 1025 So, if you are not sure about the security of all the modules you
940 have loaded (you shouldn't), you should disable this part using 1026 have loaded (you shouldn't), you should disable this part using
941 "forbid_objects". 1027 "forbid_objects" or using "new_safe".
942 1028
943 CBOR can be extended with tags that call library code 1029 CBOR can be extended with tags that call library code
944 CBOR can be extended with tags, and "CBOR::XS" has a registry of 1030 CBOR can be extended with tags, and "CBOR::XS" has a registry of
945 conversion functions for many existing tags that can be extended via 1031 conversion functions for many existing tags that can be extended via
946 third-party modules (see the "filter" method). 1032 third-party modules (see the "filter" method).
947 1033
948 If you don't trust these, you should configure the "safe" filter 1034 If you don't trust these, you should configure the "safe" filter
949 function, "CBOR::XS::safe_filter", which by default only includes 1035 function, "CBOR::XS::safe_filter" ("new_safe" does this), which by
950 conversion functions that are considered "safe" by the author (but 1036 default only includes conversion functions that are considered
951 again, they can be extended by third party modules). 1037 "safe" by the author (but again, they can be extended by third party
1038 modules).
952 1039
953 Depending on your level of paranoia, you can use the "safe" filter: 1040 Depending on your level of paranoia, you can use the "safe" filter:
954 1041
955 $cbor->filter (\&CBOR::XS::safe_filter); 1042 $cbor->filter (\&CBOR::XS::safe_filter);
956 1043
970 limit the size of CBOR data you accept, or make sure that when your 1057 limit the size of CBOR data you accept, or make sure that when your
971 resources run out, that's just fine (e.g. by using a separate 1058 resources run out, that's just fine (e.g. by using a separate
972 process that can crash safely). The size of a CBOR string in octets 1059 process that can crash safely). The size of a CBOR string in octets
973 is usually a good indication of the size of the resources required 1060 is usually a good indication of the size of the resources required
974 to decode it into a Perl structure. While CBOR::XS can check the 1061 to decode it into a Perl structure. While CBOR::XS can check the
975 size of the CBOR text (using "max_size"), it might be too late when 1062 size of the CBOR text (using "max_size" - done by "new_safe"), it
976 you already have it in memory, so you might want to check the size 1063 might be too late when you already have it in memory, so you might
977 before you accept the string. 1064 want to check the size before you accept the string.
978 1065
979 As for encoding, it is possible to construct data structures that 1066 As for encoding, it is possible to construct data structures that
980 are relatively small but result in large CBOR texts (for example by 1067 are relatively small but result in large CBOR texts (for example by
981 having an array full of references to the same big data structure, 1068 having an array full of references to the same big data structure,
982 which will all be deep-cloned during encoding by default). This is 1069 which will all be deep-cloned during encoding by default). This is
996 1083
997 Resource-starving attacks: CPU en-/decoding complexity 1084 Resource-starving attacks: CPU en-/decoding complexity
998 CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat 1085 CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
999 libraries to represent encode/decode bignums. These can be very slow 1086 libraries to represent encode/decode bignums. These can be very slow
1000 (as in, centuries of CPU time) and can even crash your program (and 1087 (as in, centuries of CPU time) and can even crash your program (and
1001 are generally not very trustworthy). See the next section for 1088 are generally not very trustworthy). See the next section on bignum
1002 details. 1089 security for details.
1003 1090
1004 Data breaches: leaking information in error messages 1091 Data breaches: leaking information in error messages
1005 CBOR::XS might leak contents of your Perl data structures in its 1092 CBOR::XS might leak contents of your Perl data structures in its
1006 error messages, so when you serialise sensitive information you 1093 error messages, so when you serialise sensitive information you
1007 might want to make sure that exceptions thrown by CBOR::XS will not 1094 might want to make sure that exceptions thrown by CBOR::XS will not
1060 1147
1061LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT 1148LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
1062 On perls that were built without 64 bit integer support (these are rare 1149 On perls that were built without 64 bit integer support (these are rare
1063 nowadays, even on 32 bit architectures, as all major Perl distributions 1150 nowadays, even on 32 bit architectures, as all major Perl distributions
1064 are built with 64 bit integer support), support for any kind of 64 bit 1151 are built with 64 bit integer support), support for any kind of 64 bit
1065 integer in CBOR is very limited - most likely, these 64 bit values will 1152 value in CBOR is very limited - most likely, these 64 bit values will be
1066 be truncated, corrupted, or otherwise not decoded correctly. This also 1153 truncated, corrupted, or otherwise not decoded correctly. This also
1067 includes string, array and map sizes that are stored as 64 bit integers. 1154 includes string, float, array and map sizes that are stored as 64 bit
1155 integers.
1068 1156
1069THREADS 1157THREADS
1070 This module is *not* guaranteed to be thread safe and there are no plans 1158 This module is *not* guaranteed to be thread safe and there are no plans
1071 to change this until Perl gets thread support (as opposed to the 1159 to change this until Perl gets thread support (as opposed to the
1072 horribly slow so-called "threads" which are simply slow and bloated 1160 horribly slow so-called "threads" which are simply slow and bloated
