… | |
… | |
33 | with the added ability of supporting serialisation of Perl objects. |
33 | with the added ability of supporting serialisation of Perl objects. |
34 | (JSON often compresses better than CBOR though, so if you plan to |
34 | (JSON often compresses better than CBOR though, so if you plan to |
35 | compress the data later and speed is less important you might want to |
35 | compress the data later and speed is less important you might want to |
36 | compare both formats first). |
36 | compare both formats first). |
37 | |
37 | |
|
|
38 | The primary goal of this module is to be *correct* and the secondary |
|
|
39 | goal is to be *fast*. To reach the latter goal it was written in C. |
|
|
40 | |
38 | To give you a general idea about speed, with texts in the megabyte |
41 | To give you a general idea about speed, with texts in the megabyte |
39 | range, "CBOR::XS" usually encodes roughly twice as fast as Storable or |
42 | range, "CBOR::XS" usually encodes roughly twice as fast as Storable or |
40 | JSON::XS and decodes about 15%-30% faster than those. The shorter the |
43 | JSON::XS and decodes about 15%-30% faster than those. The shorter the |
41 | data, the worse Storable performs in comparison. |
44 | data, the worse Storable performs in comparison. |
42 | |
45 | |
… | |
… | |
46 | |
49 | |
47 | In addition to the core CBOR data format, this module implements a |
50 | In addition to the core CBOR data format, this module implements a |
48 | number of extensions, to support cyclic and shared data structures (see |
51 | number of extensions, to support cyclic and shared data structures (see |
49 | "allow_sharing" and "allow_cycles"), string deduplication (see |
52 | "allow_sharing" and "allow_cycles"), string deduplication (see |
50 | "pack_strings") and scalar references (always enabled). |
53 | "pack_strings") and scalar references (always enabled). |
51 | |
|
|
52 | The primary goal of this module is to be *correct* and the secondary |
|
|
53 | goal is to be *fast*. To reach the latter goal it was written in C. |
|
|
54 | |
54 | |
55 | See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and |
55 | See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and |
56 | vice versa. |
56 | vice versa. |
57 | |
57 | |
58 | FUNCTIONAL INTERFACE |
58 | FUNCTIONAL INTERFACE |
… | |
… | |
166 | support, the resulting data structure might be unusable. |
166 | support, the resulting data structure might be unusable. |
167 | |
167 | |
168 | Detecting shared values incurs a runtime overhead when values are |
168 | Detecting shared values incurs a runtime overhead when values are |
169 | encoded that have a reference counter larger than one, and might
169 | encoded that have a reference counter larger than one, and might
170 | unnecessarily increase the encoded size, as potentially shared |
170 | unnecessarily increase the encoded size, as potentially shared |
171 | values are encode as shareable whether or not they are actually |
171 | values are encoded as shareable whether or not they are actually |
172 | shared. |
172 | shared. |
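As a rough illustration of the sharing behaviour described above, the following sketch encodes an array that references the same inner array twice, once with the default settings and once with "allow_sharing". The variable names are made up for this example; only "new", "allow_sharing", "encode" and "decode" are taken from the module.

```perl
use strict;
use warnings;
use CBOR::XS;

# hypothetical data: one inner array that is referenced twice
my $inner = [1, 2, 3];
my $outer = [$inner, $inner];

# default settings: the inner array is encoded twice and decodes into two copies
my $plain      = CBOR::XS->new;
my $plain_copy = $plain->decode ($plain->encode ($outer));

# allow_sharing: the inner array is encoded once and referenced afterwards
my $sharing     = CBOR::XS->new->allow_sharing;
my $shared_copy = $sharing->decode ($sharing->encode ($outer));

# comparing references numerically compares the addresses they point to
my $plain_is_shared  = $plain_copy->[0]  == $plain_copy->[1]  ? 1 : 0;
my $shared_is_shared = $shared_copy->[0] == $shared_copy->[1] ? 1 : 0;
```

For very small values the sharing tags can make the encoded form larger rather than smaller, which is the size overhead mentioned above.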
173 | |
173 | |
174 | At the moment, only targets of references can be shared (e.g. |
174 | At the moment, only targets of references can be shared (e.g. |
175 | scalars, arrays or hashes pointed to by a reference). Weirder |
175 | scalars, arrays or hashes pointed to by a reference). Weirder |
176 | constructs, such as an array with multiple "copies" of the *same* |
176 | constructs, such as an array with multiple "copies" of the *same* |
… | |
… | |
276 | perl strings as CBOR byte strings. |
276 | perl strings as CBOR byte strings. |
277 | |
277 | |
278 | This option does not affect "decode" in any way. |
278 | This option does not affect "decode" in any way. |
279 | |
279 | |
280 | This option has similar advantages and disadvantages as "text_keys". |
280 | This option has similar advantages and disadvantages as "text_keys". |
281 | In addition, this option effectively removes the ability to encode |
281 | In addition, this option effectively removes the ability to |
282 | byte strings, which might break some "FREEZE" and "TO_CBOR" methods |
282 | automatically encode byte strings, which might break some "FREEZE" |
283 | that rely on this, such as bignum encoding, so this option is mainly |
283 | and "TO_CBOR" methods that rely on this. |
284 | useful for very simple data. |
284 | |
|
|
285 | A workaround is to use explicit type casts, which are unaffected by |
|
|
286 | this option. |
285 | |
287 | |
286 | $cbor = $cbor->validate_utf8 ([$enable]) |
288 | $cbor = $cbor->validate_utf8 ([$enable]) |
287 | $enabled = $cbor->get_validate_utf8 |
289 | $enabled = $cbor->get_validate_utf8 |
288 | If $enable is true (or missing), then "decode" will validate that |
290 | If $enable is true (or missing), then "decode" will validate that |
289 | elements (text strings) containing UTF-8 data in fact contain valid |
291 | elements (text strings) containing UTF-8 data in fact contain valid |
… | |
… | |
396 | will silently stop parsing there and return the number of characters |
398 | will silently stop parsing there and return the number of characters |
397 | consumed so far. |
399 | consumed so far. |
398 | |
400 | |
399 | This is useful if your CBOR texts are not delimited by an outer |
401 | This is useful if your CBOR texts are not delimited by an outer |
400 | protocol and you need to know where the first CBOR string ends and
402 | protocol and you need to know where the first CBOR string ends and
401 | the next one starts. |
403 | the next one starts - CBOR strings are self-delimited, so it is |
|
|
404 | possible to concatenate CBOR strings without any delimiters or size |
|
|
405 | fields and recover their data. |
402 | |
406 | |
403 | CBOR::XS->new->decode_prefix ("......") |
407 | CBOR::XS->new->decode_prefix ("......") |
404 | => ("...", 3) |
408 | => ("...", 3) |
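A minimal sketch of how "decode_prefix" might be used to split concatenated CBOR strings, assuming the whole stream is already in memory. The $stream and @values names are invented for this example.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new;

# two independently encoded values, concatenated without any delimiter
my $stream = $coder->encode ([1, 2, 3]) . $coder->encode ({ key => "value" });

my @values;

while (length $stream) {
   # decode one value and learn how many octets it occupied
   my ($value, $length) = $coder->decode_prefix ($stream);

   push @values, $value;

   # drop the octets that were just consumed
   substr $stream, 0, $length, "";
}
```

If the data arrives in pieces rather than in one string, the incremental parser is likely the better fit.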
405 | |
409 | |
406 | INCREMENTAL PARSING |
410 | INCREMENTAL PARSING |
… | |
… | |
579 | $x .= ""; # another, more awkward way to stringify |
583 | $x .= ""; # another, more awkward way to stringify |
580 | print $x; # perl does it for you, too, quite often |
584 | print $x; # perl does it for you, too, quite often |
581 | |
585 | |
582 | You can force whether a string is encoded as byte or text string by |
586 | You can force whether a string is encoded as byte or text string by |
583 | using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is |
587 | using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is |
584 | disabled): |
588 | disabled). |
585 | |
589 | |
586 | utf8::upgrade $x; # encode $x as text string |
590 | utf8::upgrade $x; # encode $x as text string |
587 | utf8::downgrade $x; # encode $x as byte string |
591 | utf8::downgrade $x; # encode $x as byte string |
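To see the effect of the two calls above, one can inspect the major type of the encoded result, which CBOR stores in the top three bits of the first octet (3 for text strings, 2 for byte strings). This is only a sketch using the functional interface with its default settings; the variable names are illustrative.

```perl
use strict;
use warnings;
use CBOR::XS;

my $x = "hello";

utf8::upgrade $x;
my $as_text = encode_cbor $x;    # encoded while $x is upgraded

utf8::downgrade $x;
my $as_bytes = encode_cbor $x;   # encoded while $x is downgraded

# the CBOR major type lives in the top three bits of the first octet
my $text_major  = ord ($as_text)  >> 5;
my $bytes_major = ord ($as_bytes) >> 5;
```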
|
|
592 | |
|
|
593 | More options are available, see "TYPE CASTS", below, and the |
|
|
594 | "text_keys" and "text_strings" options. |
588 | |
595 | |
589 | Perl doesn't define what operations up- and downgrade strings, so if |
596 | Perl doesn't define what operations up- and downgrade strings, so if |
590 | the difference between byte and text is important, you should up- or |
597 | the difference between byte and text is important, you should up- or |
591 | downgrade your string as late as possible before encoding. You can |
598 | downgrade your string as late as possible before encoding. You can |
592 | also force the use of CBOR text strings by using "text_keys" or |
599 | also force the use of CBOR text strings by using "text_keys" or |
… | |
… | |
606 | possible representation. Floating-point values will use either the |
613 | possible representation. Floating-point values will use either the |
607 | IEEE single format if possible without loss of precision, otherwise |
614 | IEEE single format if possible without loss of precision, otherwise |
608 | the IEEE double format will be used. Perls that use formats other |
615 | the IEEE double format will be used. Perls that use formats other |
609 | than IEEE double to represent numerical values are supported, but |
616 | than IEEE double to represent numerical values are supported, but |
610 | might suffer loss of precision. |
617 | might suffer loss of precision. |
|
|
618 | |
|
|
619 | TYPE CASTS |
|
|
620 | EXPERIMENTAL: As an experimental extension, "CBOR::XS" allows you to |
|
|
621 | force specific cbor types to be used when encoding. That allows you to |
|
|
622 | encode types not normally accessible (e.g. half floats) as well as force |
|
|
623 | string types even when "text_strings" is in effect. |
|
|
624 | |
|
|
625 | Type forcing is done by calling a special "cast" function which keeps a |
|
|
626 | copy of the value and returns a new value that can be handed over to any |
|
|
627 | CBOR encoder function. |
|
|
628 | |
|
|
629 | The following casts are currently available (all of which are unary |
|
|
630 | operators): |
|
|
631 | |
|
|
632 | CBOR::XS::as_int $value |
|
|
633 | Forces the value to be encoded as some form of (basic, not bignum) |
|
|
634 | integer type. |
|
|
635 | |
|
|
636 | CBOR::XS::as_text $value |
|
|
637 | Forces the value to be encoded as (UTF-8) text values. |
|
|
638 | |
|
|
639 | CBOR::XS::as_bytes $value |
|
|
640 | Forces the value to be encoded as a (binary) string value. |
|
|
641 | |
|
|
642 | CBOR::XS::as_float16 $value |
|
|
643 | Forces half-float (IEEE 754 binary16) encoding of the given value. |
|
|
644 | |
|
|
645 | CBOR::XS::as_float32 $value |
|
|
646 | Forces single-float (IEEE 754 binary32) encoding of the given value. |
|
|
647 | |
|
|
648 | CBOR::XS::as_float64 $value |
|
|
649 | Forces double-float (IEEE 754 binary64) encoding of the given value. |
|
|
650 | |
|
|
651 | CBOR::XS::as_cbor $cbor_text
|
|
652 | |
|
|
653 | Not a type cast per se, this type cast forces the argument to be
|
|
654 | encoded as-is. This can be used to embed pre-encoded CBOR data. |
|
|
655 | |
|
|
656 | Note that no checking on the validity of the $cbor_text is done - |
|
|
657 | it's the caller's responsibility to correctly encode values.
|
|
658 | |
|
|
659 | Example: encode a perl string as binary even though "text_strings" is in |
|
|
660 | effect. |
|
|
661 | |
|
|
662 | CBOR::XS->new->text_strings->encode ([4, "text", CBOR::XS::as_bytes "bytevalue"]);
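The following sketch compares the default encoding of a string under "text_strings" with the same string wrapped in "CBOR::XS::as_bytes". It assumes an installed CBOR::XS version that already ships the experimental type casts; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use CBOR::XS;

my $coder = CBOR::XS->new->text_strings;

# without a cast, text_strings applies to the string
my $default = $coder->encode ("bytevalue");

# with the cast, the string type is forced regardless of text_strings
my $forced = $coder->encode (CBOR::XS::as_bytes ("bytevalue"));

# major type 3 is a text string, major type 2 a byte string
my $default_major = ord ($default) >> 5;
my $forced_major  = ord ($forced)  >> 5;
```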
611 | |
663 | |
612 | OBJECT SERIALISATION |
664 | OBJECT SERIALISATION |
613 | This module implements both a CBOR-specific and the generic |
665 | This module implements both a CBOR-specific and the generic |
614 | Types::Serialiser object serialisation protocol. The following
666 | Types::Serialiser object serialisation protocol. The following
615 | subsections explain both methods. |
667 | subsections explain both methods. |
… | |
… | |
907 | ensure that decoded JSON data will round-trip encoding and decoding to |
959 | ensure that decoded JSON data will round-trip encoding and decoding to |
908 | CBOR intact. |
960 | CBOR intact. |
909 | |
961 | |
910 | SECURITY CONSIDERATIONS |
962 | SECURITY CONSIDERATIONS |
911 | Tl;dr... if you want to decode or encode CBOR from untrusted sources, |
963 | Tl;dr... if you want to decode or encode CBOR from untrusted sources, |
912 | you should start with a coder object created via "new_safe": |
964 | you should start with a coder object created via "new_safe" (which |
|
|
965 | implements the mitigations explained below): |
913 | |
966 | |
914 | my $coder = CBOR::XS->new_safe; |
967 | my $coder = CBOR::XS->new_safe; |
915 | |
968 | |
916 | my $data = $coder->decode ($cbor_text); |
969 | my $data = $coder->decode ($cbor_text); |
917 | my $cbor = $coder->encode ($data); |
970 | my $cbor = $coder->encode ($data); |
… | |
… | |
936 | even if all your "THAW" methods are secure, encoding data structures |
989 | even if all your "THAW" methods are secure, encoding data structures |
937 | from untrusted sources can invoke those and trigger bugs in those. |
990 | from untrusted sources can invoke those and trigger bugs in those. |
938 | |
991 | |
939 | So, if you are not sure about the security of all the modules you |
992 | So, if you are not sure about the security of all the modules you |
940 | have loaded (you shouldn't), you should disable this part using |
993 | have loaded (you shouldn't), you should disable this part using |
941 | "forbid_objects". |
994 | "forbid_objects" or using "new_safe". |
942 | |
995 | |
943 | CBOR can be extended with tags that call library code |
996 | CBOR can be extended with tags that call library code |
944 | CBOR can be extended with tags, and "CBOR::XS" has a registry of |
997 | CBOR can be extended with tags, and "CBOR::XS" has a registry of |
945 | conversion functions for many existing tags that can be extended via |
998 | conversion functions for many existing tags that can be extended via |
946 | third-party modules (see the "filter" method). |
999 | third-party modules (see the "filter" method). |
947 | |
1000 | |
948 | If you don't trust these, you should configure the "safe" filter |
1001 | If you don't trust these, you should configure the "safe" filter |
949 | function, "CBOR::XS::safe_filter", which by default only includes |
1002 | function, "CBOR::XS::safe_filter" ("new_safe" does this), which by |
950 | conversion functions that are considered "safe" by the author (but |
1003 | default only includes conversion functions that are considered |
951 | again, they can be extended by third party modules). |
1004 | "safe" by the author (but again, they can be extended by third party |
|
|
1005 | modules). |
952 | |
1006 | |
953 | Depending on your level of paranoia, you can use the "safe" filter: |
1007 | Depending on your level of paranoia, you can use the "safe" filter: |
954 | |
1008 | |
955 | $cbor->filter (\&CBOR::XS::safe_filter); |
1009 | $cbor->filter (\&CBOR::XS::safe_filter); |
956 | |
1010 | |
… | |
… | |
970 | limit the size of CBOR data you accept, or make sure that when your
1024 | limit the size of CBOR data you accept, or make sure that when your
971 | resources run out, that's just fine (e.g. by using a separate |
1025 | resources run out, that's just fine (e.g. by using a separate |
972 | process that can crash safely). The size of a CBOR string in octets |
1026 | process that can crash safely). The size of a CBOR string in octets |
973 | is usually a good indication of the size of the resources required |
1027 | is usually a good indication of the size of the resources required |
974 | to decode it into a Perl structure. While CBOR::XS can check the |
1028 | to decode it into a Perl structure. While CBOR::XS can check the |
975 | size of the CBOR text (using "max_size"), it might be too late when |
1029 | size of the CBOR text (using "max_size" - done by "new_safe"), it |
976 | you already have it in memory, so you might want to check the size |
1030 | might be too late when you already have it in memory, so you might |
977 | before you accept the string. |
1031 | want to check the size before you accept the string. |
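One way to check the size before accepting the string is to bound the read itself. The sketch below is only an assumption about how an application might do this: the $MAX_OCTETS limit and the read_limited helper are hypothetical, and a real socket reader would need to loop because "read" can return fewer octets than requested.

```perl
use strict;
use warnings;
use CBOR::XS;

my $MAX_OCTETS = 64 * 1024;   # hypothetical limit, pick one that suits your application

# hypothetical helper: never buffer more than $MAX_OCTETS + 1 octets
sub read_limited {
   my ($fh) = @_;

   my $got = read $fh, my $buf, $MAX_OCTETS + 1;

   die "read error: $!\n"      unless defined $got;
   die "CBOR text too large\n" if $got > $MAX_OCTETS;

   $buf
}

# demonstration with in-memory file handles standing in for untrusted input
my $small = encode_cbor ([1, 2, 3]);
my $large = "\x00" x ($MAX_OCTETS + 1);

open my $small_fh, "<:raw", \$small or die "open: $!\n";
open my $large_fh, "<:raw", \$large or die "open: $!\n";

my $data  = CBOR::XS->new_safe->decode (read_limited ($small_fh));
my $error = eval { read_limited ($large_fh); 1 } ? "" : $@;
```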
978 | |
1032 | |
979 | As for encoding, it is possible to construct data structures that |
1033 | As for encoding, it is possible to construct data structures that |
980 | are relatively small but result in large CBOR texts (for example by |
1034 | are relatively small but result in large CBOR texts (for example by |
981 | having an array full of references to the same big data structure, |
1035 | having an array full of references to the same big data structure, |
982 | which will all be deep-cloned during encoding by default). This is |
1036 | which will all be deep-cloned during encoding by default). This is |
… | |
… | |
996 | |
1050 | |
997 | Resource-starving attacks: CPU en-/decoding complexity |
1051 | Resource-starving attacks: CPU en-/decoding complexity |
998 | CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat |
1052 | CBOR::XS will use the Math::BigInt, Math::BigFloat and Math::BigRat |
999 | libraries to represent and encode/decode bignums. These can be very slow
1053 | libraries to represent and encode/decode bignums. These can be very slow
1000 | (as in, centuries of CPU time) and can even crash your program (and |
1054 | (as in, centuries of CPU time) and can even crash your program (and |
1001 | are generally not very trustworthy). See the next section for |
1055 | are generally not very trustworthy). See the next section on bignum |
1002 | details. |
1056 | security for details. |
1003 | |
1057 | |
1004 | Data breaches: leaking information in error messages |
1058 | Data breaches: leaking information in error messages |
1005 | CBOR::XS might leak contents of your Perl data structures in its |
1059 | CBOR::XS might leak contents of your Perl data structures in its |
1006 | error messages, so when you serialise sensitive information you |
1060 | error messages, so when you serialise sensitive information you |
1007 | might want to make sure that exceptions thrown by CBOR::XS will not |
1061 | might want to make sure that exceptions thrown by CBOR::XS will not |
… | |
… | |
1060 | |
1114 | |
1061 | LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT |
1115 | LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT |
1062 | On perls that were built without 64 bit integer support (these are rare |
1116 | On perls that were built without 64 bit integer support (these are rare |
1063 | nowadays, even on 32 bit architectures, as all major Perl distributions |
1117 | nowadays, even on 32 bit architectures, as all major Perl distributions |
1064 | are built with 64 bit integer support), support for any kind of 64 bit |
1118 | are built with 64 bit integer support), support for any kind of 64 bit |
1065 | integer in CBOR is very limited - most likely, these 64 bit values will |
1119 | value in CBOR is very limited - most likely, these 64 bit values will be |
1066 | be truncated, corrupted, or otherwise not decoded correctly. This also |
1120 | truncated, corrupted, or otherwise not decoded correctly. This also |
1067 | includes string, array and map sizes that are stored as 64 bit integers. |
1121 | includes string, float, array and map sizes that are stored as 64 bit |
|
|
1122 | integers. |
1068 | |
1123 | |
1069 | THREADS |
1124 | THREADS |
1070 | This module is *not* guaranteed to be thread safe and there are no plans |
1125 | This module is *not* guaranteed to be thread safe and there are no plans |
1071 | to change this until Perl gets thread support (as opposed to the |
1126 | to change this until Perl gets thread support (as opposed to the |
1072 | horribly slow so-called "threads" which are simply slow and bloated |
1127 | horribly slow so-called "threads" which are simply slow and bloated |