Comparing CBOR-XS/README (file contents):
Revision 1.18 by root, Wed Dec 7 14:14:30 2016 UTC vs.
Revision 1.19 by root, Sun Nov 29 21:35:51 2020 UTC

33 with the added ability of supporting serialisation of Perl objects. 33 with the added ability of supporting serialisation of Perl objects.
34 (JSON often compresses better than CBOR though, so if you plan to 34 (JSON often compresses better than CBOR though, so if you plan to
35 compress the data later and speed is less important you might want to 35 compress the data later and speed is less important you might want to
36 compare both formats first). 36 compare both formats first).
37 37
38 The primary goal of this module is to be *correct* and the secondary
39 goal is to be *fast*. To reach the latter goal it was written in C.
40
38 To give you a general idea about speed, with texts in the megabyte 41 To give you a general idea about speed, with texts in the megabyte
39 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or 42 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
40 JSON::XS and decodes about 15%-30% faster than those. The shorter the 43 JSON::XS and decodes about 15%-30% faster than those. The shorter the
41 data, the worse Storable performs in comparison. 44 data, the worse Storable performs in comparison.
42 45
46 49
47 In addition to the core CBOR data format, this module implements a 50 In addition to the core CBOR data format, this module implements a
48 number of extensions, to support cyclic and shared data structures (see 51 number of extensions, to support cyclic and shared data structures (see
49 "allow_sharing" and "allow_cycles"), string deduplication (see 52 "allow_sharing" and "allow_cycles"), string deduplication (see
50 "pack_strings") and scalar references (always enabled). 53 "pack_strings") and scalar references (always enabled).
51
52 The primary goal of this module is to be *correct* and the secondary
53 goal is to be *fast*. To reach the latter goal it was written in C.
54 54
55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and 55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
56 vice versa. 56 vice versa.
57 57
58FUNCTIONAL INTERFACE 58FUNCTIONAL INTERFACE
166 support, the resulting data structure might be unusable. 166 support, the resulting data structure might be unusable.
167 167
168 Detecting shared values incurs a runtime overhead when values are 168 Detecting shared values incurs a runtime overhead when values are
169 encoded that have a reference counter larger than one, and might 169 encoded that have a reference counter larger than one, and might
170 unnecessarily increase the encoded size, as potentially shared 170 unnecessarily increase the encoded size, as potentially shared
171 values are encode as shareable whether or not they are actually 171 values are encoded as shareable whether or not they are actually
172 shared. 172 shared.
173 173
174 At the moment, only targets of references can be shared (e.g. 174 At the moment, only targets of references can be shared (e.g.
175 scalars, arrays or hashes pointed to by a reference). Weirder 175 scalars, arrays or hashes pointed to by a reference). Weirder
176 constructs, such as an array with multiple "copies" of the *same* 176 constructs, such as an array with multiple "copies" of the *same*
276 perl strings as CBOR byte strings. 276 perl strings as CBOR byte strings.
277 277
278 This option does not affect "decode" in any way. 278 This option does not affect "decode" in any way.
279 279
280 This option has similar advantages and disadvantages as "text_keys". 280 This option has similar advantages and disadvantages as "text_keys".
281 In addition, this option effectively removes the ability to encode 281 In addition, this option effectively removes the ability to
282 byte strings, which might break some "FREEZE" and "TO_CBOR" methods 282 automatically encode byte strings, which might break some "FREEZE"
283 that rely on this, such as bignum encoding, so this option is mainly 283 and "TO_CBOR" methods that rely on this.
284 useful for very simple data. 284
285 A workaround is to use explicit type casts, which are unaffected by
286 this option.
285 287
286 $cbor = $cbor->validate_utf8 ([$enable]) 288 $cbor = $cbor->validate_utf8 ([$enable])
287 $enabled = $cbor->get_validate_utf8 289 $enabled = $cbor->get_validate_utf8
288 If $enable is true (or missing), then "decode" will validate that 290 If $enable is true (or missing), then "decode" will validate that
289 elements (text strings) containing UTF-8 data in fact contain valid 291 elements (text strings) containing UTF-8 data in fact contain valid
396 will silently stop parsing there and return the number of characters 398 will silently stop parsing there and return the number of characters
397 consumed so far. 399 consumed so far.
398 400
399 This is useful if your CBOR texts are not delimited by an outer 401 This is useful if your CBOR texts are not delimited by an outer
400 protocol and you need to know where the first CBOR string ends and 402 protocol and you need to know where the first CBOR string ends and
401 the next one starts. 403 the next one starts - CBOR strings are self-delimited, so it is
404 possible to concatenate CBOR strings without any delimiters or size
405 fields and recover their data.
402 406
403 CBOR::XS->new->decode_prefix ("......") 407 CBOR::XS->new->decode_prefix ("......")
404 => ("...", 3) 408 => ("...", 3)
405 409
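The self-delimiting property described in this hunk can be sketched as follows. This is a minimal example, assuming CBOR::XS is installed; the sample values and the variable names ($stream, @items) are illustrative and do not come from the README.

```perl
use strict;
use warnings;
use CBOR::XS;

# Three CBOR items concatenated without delimiters or size fields.
my $stream = encode_cbor ([1, 2, 3])
           . encode_cbor ({ a => 1 })
           . encode_cbor ("tail");

my $coder = CBOR::XS->new;
my @items;

while (length $stream) {
   # decode_prefix returns the decoded value and the number of octets consumed.
   my ($value, $octets) = $coder->decode_prefix ($stream);
   push @items, $value;
   substr $stream, 0, $octets, "";   # drop the item that was just decoded
}

printf "recovered %d items\n", scalar @items;
```

For data that arrives in pieces, the incremental parser described under INCREMENTAL PARSING is probably the better fit; the loop above assumes the whole stream is already in memory.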
406 INCREMENTAL PARSING 410 INCREMENTAL PARSING
579 $x .= ""; # another, more awkward way to stringify 583 $x .= ""; # another, more awkward way to stringify
580 print $x; # perl does it for you, too, quite often 584 print $x; # perl does it for you, too, quite often
581 585
582 You can force whether a string is encoded as byte or text string by 586 You can force whether a string is encoded as byte or text string by
583 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is 587 using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
584 disabled): 588 disabled).
585 589
586 utf8::upgrade $x; # encode $x as text string 590 utf8::upgrade $x; # encode $x as text string
587 utf8::downgrade $x; # encode $x as byte string 591 utf8::downgrade $x; # encode $x as byte string
592
593 More options are available, see "TYPE CASTS", below, and the
594 "text_keys" and "text_strings" options.
588 595
589 Perl doesn't define what operations up- and downgrade strings, so if 596 Perl doesn't define what operations up- and downgrade strings, so if
590 the difference between byte and text is important, you should up- or 597 the difference between byte and text is important, you should up- or
591 downgrade your string as late as possible before encoding. You can 598 downgrade your string as late as possible before encoding. You can
592 also force the use of CBOR text strings by using "text_keys" or 599 also force the use of CBOR text strings by using "text_keys" or
606 possible representation. Floating-point values will use either the 613 possible representation. Floating-point values will use either the
607 IEEE single format if possible without loss of precision, otherwise 614 IEEE single format if possible without loss of precision, otherwise
608 the IEEE double format will be used. Perls that use formats other 615 the IEEE double format will be used. Perls that use formats other
609 than IEEE double to represent numerical values are supported, but 616 than IEEE double to represent numerical values are supported, but
610 might suffer loss of precision. 617 might suffer loss of precision.
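The "single format if possible without loss of precision" rule can be illustrated in plain Perl. This is only a sketch of the rule, not the module's C implementation; the helper name fits_in_single is hypothetical, and it assumes the native "f" pack format is IEEE single precision, which holds on common platforms.

```perl
use strict;
use warnings;

# True if $value survives a round trip through single precision unchanged,
# i.e. if a single-float encoding would lose no information.
sub fits_in_single {
   my ($value) = @_;
   my $single = unpack "f", pack "f", $value;
   return $single == $value;
}

for my $value (1.5, 0.25, 0.1, 1e300) {
   printf "%g: %s\n", $value,
      fits_in_single ($value) ? "single is lossless" : "needs double";
}
```

Values such as 1.5 and 0.25 are exactly representable in single precision, while 0.1 and 1e300 are not, so the latter two would need the double format.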
618
619 TYPE CASTS
620 EXPERIMENTAL: As an experimental extension, "CBOR::XS" allows you to
621 force specific cbor types to be used when encoding. That allows you to
622 encode types not normally accessible (e.g. half floats) as well as force
623 string types even when "text_strings" is in effect.
624
625 Type forcing is done by calling a special "cast" function which keeps a
626 copy of the value and returns a new value that can be handed over to any
627 CBOR encoder function.
628
629 The following casts are currently available (all of which are unary
630 operators):
631
632 CBOR::XS::as_int $value
633 Forces the value to be encoded as some form of (basic, not bignum)
634 integer type.
635
636 CBOR::XS::as_text $value
637 Forces the value to be encoded as (UTF-8) text values.
638
639 CBOR::XS::as_bytes $value
640 Forces the value to be encoded as a (binary) string value.
641
642 CBOR::XS::as_float16 $value
643 Forces half-float (IEEE 754 binary16) encoding of the given value.
644
645 CBOR::XS::as_float32 $value
646 Forces single-float (IEEE 754 binary32) encoding of the given value.
647
648 CBOR::XS::as_float64 $value
649 Forces double-float (IEEE 754 binary64) encoding of the given value.
650
651 CBOR::XS::as_cbor $cbor_text
652
653 Not a type cast per se, this type cast forces the argument to be
654 encoded as-is. This can be used to embed pre-encoded CBOR data.
655
656 Note that no checking on the validity of the $cbor_text is done -
657 it's the caller's responsibility to correctly encode values.
658
659 Example: encode a perl string as binary even though "text_strings" is in
660 effect.
661
662 CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
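Two further casts from the list above might be used as in the following sketch. It assumes a CBOR::XS release that already ships the experimental cast functions (the revision 1.19 side of this diff); the variable names are illustrative.

```perl
use strict;
use warnings;
use CBOR::XS;

# Force half-float (binary16) encoding; 1.5 is exactly representable there.
my $half = encode_cbor (CBOR::XS::as_float16 (1.5));

# Embed pre-encoded CBOR as-is. No validity check is performed on $inner,
# so producing well-formed CBOR is the caller's job.
my $inner = encode_cbor ([1, 2]);
my $outer = encode_cbor ([CBOR::XS::as_cbor ($inner)]);

printf "half float takes %d octets\n", length $half;
printf "pre-encoded data embedded verbatim: %s\n",
   index ($outer, $inner) >= 0 ? "yes" : "no";
```

Embedding via as_cbor avoids decoding and re-encoding data that is already available in CBOR form, at the cost of trusting that data to be valid.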
611 663
612 OBJECT SERIALISATION 664 OBJECT SERIALISATION
613 This module implements both a CBOR-specific and the generic 665 This module implements both a CBOR-specific and the generic
614 Types::Serialiser object serialisation protocol. The following 666 Types::Serialiser object serialisation protocol. The following
615 subsections explain both methods. 667 subsections explain both methods.
907 ensure that decoded JSON data will round-trip encoding and decoding to 959 ensure that decoded JSON data will round-trip encoding and decoding to
908 CBOR intact. 960 CBOR intact.
909 961
910SECURITY CONSIDERATIONS 962SECURITY CONSIDERATIONS
911 Tl;dr... if you want to decode or encode CBOR from untrusted sources, 963 Tl;dr... if you want to decode or encode CBOR from untrusted sources,
912 you should start with a coder object created via "new_safe": 964 you should start with a coder object created via "new_safe" (which
965 implements the mitigations explained below):
913 966
914 my $coder = CBOR::XS->new_safe; 967 my $coder = CBOR::XS->new_safe;
915 968
916 my $data = $coder->decode ($cbor_text); 969 my $data = $coder->decode ($cbor_text);
917 my $cbor = $coder->encode ($data); 970 my $cbor = $coder->encode ($data);
936 even if all your "THAW" methods are secure, encoding data structures 989 even if all your "THAW" methods are secure, encoding data structures
937 from untrusted sources can invoke those and trigger bugs in those. 990 from untrusted sources can invoke those and trigger bugs in those.
938 991
939 So, if you are not sure about the security of all the modules you 992 So, if you are not sure about the security of all the modules you
940 have loaded (you shouldn't), you should disable this part using 993 have loaded (you shouldn't), you should disable this part using
941 "forbid_objects". 994 "forbid_objects" or using "new_safe".
942 995
943 CBOR can be extended with tags that call library code 996 CBOR can be extended with tags that call library code
944 CBOR can be extended with tags, and "CBOR::XS" has a registry of 997 CBOR can be extended with tags, and "CBOR::XS" has a registry of
945 conversion functions for many existing tags that can be extended via 998 conversion functions for many existing tags that can be extended via
946 third-party modules (see the "filter" method). 999 third-party modules (see the "filter" method).
947 1000
948 If you don't trust these, you should configure the "safe" filter 1001 If you don't trust these, you should configure the "safe" filter
949 function, "CBOR::XS::safe_filter", which by default only includes 1002 function, "CBOR::XS::safe_filter" ("new_safe" does this), which by
950 conversion functions that are considered "safe" by the author (but 1003 default only includes conversion functions that are considered
951 again, they can be extended by third party modules). 1004 "safe" by the author (but again, they can be extended by third party
1005 modules).
952 1006
953 Depending on your level of paranoia, you can use the "safe" filter: 1007 Depending on your level of paranoia, you can use the "safe" filter:
954 1008
955 $cbor->filter (\&CBOR::XS::safe_filter); 1009 $cbor->filter (\&CBOR::XS::safe_filter);
956 1010
970 limit the size of CBOR data you accept, or make sure that when your 1024 limit the size of CBOR data you accept, or make sure that when your
971 resources run out, that's just fine (e.g. by using a separate 1025 resources run out, that's just fine (e.g. by using a separate
972 process that can crash safely). The size of a CBOR string in octets 1026 process that can crash safely). The size of a CBOR string in octets
973 is usually a good indication of the size of the resources required 1027 is usually a good indication of the size of the resources required
974 to decode it into a Perl structure. While CBOR::XS can check the 1028 to decode it into a Perl structure. While CBOR::XS can check the
975 size of the CBOR text (using "max_size"), it might be too late when 1029 size of the CBOR text (using "max_size" - done by "new_safe"), it
976 you already have it in memory, so you might want to check the size 1030 might be too late when you already have it in memory, so you might
977 before you accept the string. 1031 want to check the size before you accept the string.
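Checking the size before the string is accepted could look like the sketch below, which uses only core Perl. The helper name read_limited and the 1024-octet limit are hypothetical; an in-memory filehandle stands in for a socket or pipe.

```perl
use strict;
use warnings;

# Read at most $limit octets from $fh; die as soon as the input is larger,
# so an oversized CBOR text is never held in memory in full.
sub read_limited {
   my ($fh, $limit) = @_;
   my $buf = "";

   while (1) {
      # Never ask for more than one octet beyond the limit.
      my $want = $limit + 1 - length $buf;
      $want = 65536 if $want > 65536;

      my $got = read $fh, $buf, $want, length $buf;
      die "read error: $!\n" unless defined $got;
      last if $got == 0;   # end of input
      die "input exceeds $limit octets\n" if length ($buf) > $limit;
   }

   return $buf;
}

# Demonstration with an in-memory filehandle.
my $payload = "x" x 100;
open my $fh, "<", \$payload or die "open: $!";
binmode $fh;

my $accepted = read_limited ($fh, 1024);
printf "accepted %d octets\n", length $accepted;
```

The returned string would then be handed to a coder object, for example one created via "new_safe", whose own "max_size" check acts as a second line of defence.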
978 1032
979 As for encoding, it is possible to construct data structures that 1033 As for encoding, it is possible to construct data structures that
980 are relatively small but result in large CBOR texts (for example by 1034 are relatively small but result in large CBOR texts (for example by
981 having an array full of references to the same big data structure, 1035 having an array full of references to the same big data structure,
982 which will all be deep-cloned during encoding by default). This is 1036 which will all be deep-cloned during encoding by default). This is
996 1050
997 Resource-starving attacks: CPU en-/decoding complexity 1051 Resource-starving attacks: CPU en-/decoding complexity
998 CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat 1052 CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat
999 libraries to represent encode/decode bignums. These can be very slow 1053 libraries to represent encode/decode bignums. These can be very slow
1000 (as in, centuries of CPU time) and can even crash your program (and 1054 (as in, centuries of CPU time) and can even crash your program (and
1001 are generally not very trustworthy). See the next section for 1055 are generally not very trustworthy). See the next section on bignum
1002 details. 1056 security for details.
1003 1057
1004 Data breaches: leaking information in error messages 1058 Data breaches: leaking information in error messages
1005 CBOR::XS might leak contents of your Perl data structures in its 1059 CBOR::XS might leak contents of your Perl data structures in its
1006 error messages, so when you serialise sensitive information you 1060 error messages, so when you serialise sensitive information you
1007 might want to make sure that exceptions thrown by CBOR::XS will not 1061 might want to make sure that exceptions thrown by CBOR::XS will not
1060 1114
1061LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT 1115LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
1062 On perls that were built without 64 bit integer support (these are rare 1116 On perls that were built without 64 bit integer support (these are rare
1063 nowadays, even on 32 bit architectures, as all major Perl distributions 1117 nowadays, even on 32 bit architectures, as all major Perl distributions
1064 are built with 64 bit integer support), support for any kind of 64 bit 1118 are built with 64 bit integer support), support for any kind of 64 bit
1065 integer in CBOR is very limited - most likely, these 64 bit values will 1119 value in CBOR is very limited - most likely, these 64 bit values will be
1066 be truncated, corrupted, or otherwise not decoded correctly. This also 1120 truncated, corrupted, or otherwise not decoded correctly. This also
1067 includes string, array and map sizes that are stored as 64 bit integers. 1121 includes string, float, array and map sizes that are stored as 64 bit
1122 integers.
1068 1123
1069THREADS 1124THREADS
1070 This module is *not* guaranteed to be thread safe and there are no plans 1125 This module is *not* guaranteed to be thread safe and there are no plans
1071 to change this until Perl gets thread support (as opposed to the 1126 to change this until Perl gets thread support (as opposed to the
1072 horribly slow so-called "threads" which are simply slow and bloated 1127 horribly slow so-called "threads" which are simply slow and bloated
