--- CBOR-XS/README 2016/02/08 04:37:12 1.16
+++ CBOR-XS/README 2016/04/27 09:40:18 1.17
@@ -207,6 +207,49 @@
         This option does not affect "decode" in any way - string references
         will always be decoded properly if present.
 
+    $cbor = $cbor->text_keys ([$enable])
+    $enabled = $cbor->get_text_keys
+        If $enabled is true (or missing), then "encode" will encode all perl
+        hash keys as CBOR text strings/UTF-8 string, upgrading them as
+        needed.
+
+        If $enable is false (the default), then "encode" will encode hash
+        keys normally - upgraded perl strings (strings internally encoded as
+        UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
+        byte strings.
+
+        This option does not affect "decode" in any way.
+
+        This option is useful for interoperability with CBOR decoders that
+        don't treat byte strings as a form of text. It is especially useful
+        as Perl gives very little control over hash keys.
+
+        Enabling this option can be slow, as all downgraded hash keys that
+        are encoded need to be scanned and converted to UTF-8.
+
+    $cbor = $cbor->text_strings ([$enable])
+    $enabled = $cbor->get_text_strings
+        This option works similar to "text_keys", above, but works on all
+        strings (including hash keys), so "text_keys" has no further effect
+        after enabling "text_strings".
+
+        If $enabled is true (or missing), then "encode" will encode all perl
+        strings as CBOR text strings/UTF-8 strings, upgrading them as
+        needed.
+
+        If $enable is false (the default), then "encode" will encode strings
+        normally (but see "text_keys") - upgraded perl strings (strings
+        internally encoded as UTF-8) as CBOR text strings, and downgraded
+        perl strings as CBOR byte strings.
+
+        This option does not affect "decode" in any way.
+
+        This option has similar advantages and disadvantages as "text_keys".
+        In addition, this option effectively removes the ability to encode
+        byte strings, which might break some "FREEZE" and "TO_CBOR" methods
+        that rely on this, such as bignum encoding, so this option is mainly
+        useful for very simple data.
+
     $cbor = $cbor->validate_utf8 ([$enable])
     $enabled = $cbor->get_validate_utf8
         If $enable is true (or missing), then "decode" will validate that
@@ -219,7 +262,7 @@
 
         If $enable is false (the default), then "decode" will blindly accept
         UTF-8 data, marking them as valid UTF-8 in the resulting data
-        structure regardless of whether thats true or not.
+        structure regardless of whether that's true or not.
 
         Perl isn't too happy about corrupted UTF-8 in strings, but should
         generally not crash or do similarly evil things. Extensions might be
@@ -411,7 +454,7 @@
         Perl hash references become CBOR maps. As there is no inherent
         ordering in hash keys (or CBOR maps), they will usually be encoded
         in a pseudo-random order. This order can be different each time a
-        hahs is encoded.
+        hash is encoded.
 
         Currently, tied hashes will use the indefinite-length format, while
         normal hashes will use the fixed-length format.
@@ -470,15 +513,18 @@
             $x .= "";     # another, more awkward way to stringify
             print $x;     # perl does it for you, too, quite often
 
-        You can force whether a string ie encoded as byte or text string by
-        using "utf8::upgrade" and "utf8::downgrade"):
+        You can force whether a string is encoded as byte or text string by
+        using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
+        disabled):
 
             utf8::upgrade $x;   # encode $x as text string
             utf8::downgrade $x; # encode $x as byte string
 
         Perl doesn't define what operations up- and downgrade strings, so if
         the difference between byte and text is important, you should up- or
-        downgrade your string as late as possible before encoding.
+        downgrade your string as late as possible before encoding. You can
+        also force the use of CBOR text strings by using "text_keys" or
+        "text_strings".
 
         You can force the type to be a CBOR number by numifying it:
 
@@ -583,7 +629,6 @@
 
        sub URI::THAW {
           my ($class, $serialiser, $uri) = @_;
-
           $class->new ($uri)
        }
 
@@ -689,7 +734,7 @@
 
 ENFORCED TAGS
     These tags are always handled when decoding, and their handling cannot
-    be overriden by the user.
+    be overridden by the user.
 
     26 (perl-object, )
         These tags are automatically created (and decoded) for serialisable
@@ -722,8 +767,8 @@
 
     22098 (indirection, )
         This tag is automatically generated when a reference are encountered
-        (with the exception of hash and array refernces). It is converted to
-        a reference when decoding.
+        (with the exception of hash and array references). It is converted
+        to a reference when decoding.
 
     55799 (self-describe CBOR, RFC 7049)
         This value is not generated on encoding (unless explicitly requested
@@ -731,8 +776,8 @@
 
 NON-ENFORCED TAGS
     These tags have default filters provided when decoding. Their handling
-    can be overriden by changing the %CBOR::XS::FILTER entry for the tag, or
-    by providing a custom "filter" callback when decoding.
+    can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
+    or by providing a custom "filter" callback when decoding.
 
     When they result in decoding into a specific Perl class, the module
     usually provides a corresponding "TO_CBOR" method as well.
@@ -757,15 +802,23 @@
         "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
         CBOR integers, and others into positive/negative CBOR bignums.
 
-    4, 5 (decimal fraction/bigfloat)
+    4, 5, 264, 265 (decimal fraction/bigfloat)
         Both decimal fractions and bigfloats are decoded into Math::BigFloat
         objects. The corresponding "Math::BigFloat::TO_CBOR" method *always*
-        encodes into a decimal fraction.
+        encodes into a decimal fraction (either tag 4 or 264).
+
+        NaN and infinities are not encoded properly, as they cannot be
+        represented in CBOR.
 
-        CBOR cannot represent bigfloats with *very* large exponents -
-        conversion of such big float objects is undefined.
+        See "BIGNUM SECURITY CONSIDERATIONS" for more info.
 
-        Also, NaN and infinities are not encoded properly.
+    30 (rational numbers)
+        These tags are decoded into Math::BigRat objects. The corresponding
+        "Math::BigRat::TO_CBOR" method encodes rational numbers with
+        denominator 1 via their numerator only, i.e., they become normal
+        integers or "bignums".
+
+        See "BIGNUM SECURITY CONSIDERATIONS" for more info.
 
     21, 22, 23 (expected later JSON conversion)
         CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
@@ -822,6 +875,35 @@
     information you might want to make sure that exceptions thrown by
     CBOR::XS will not end up in front of untrusted eyes.
 
+BIGNUM SECURITY CONSIDERATIONS
+    CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
+    Math::BigFloat that tries to encode the number in the simplest possible
+    way, that is, either a CBOR integer, a CBOR bigint/decimal fraction (tag
+    4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers
+    (Math::BigRat, tag 30) can also contain bignums as members.
+
+    CBOR::XS will also understand base-2 bigfloat or arbitrary-exponent
+    bigfloats (tags 5 and 265), but it will never generate these on its own.
+
+    Using the built-in Math::BigInt::Calc support, encoding and decoding
+    decimal fractions is generally fast. Decoding bigints can be slow for
+    very big numbers (tens of thousands of digits, something that could
+    potentially be caught by limiting the size of CBOR texts), and decoding
+    bigfloats or arbitrary-exponent bigfloats can be *extremely* slow
+    (minutes, decades) for large exponents (roughly 40 bit and longer).
+
+    Additionally, Math::BigInt can take advantage of other bignum libraries,
+    such as Math::GMP, which cannot handle big floats with large exponents,
+    and might simply abort or crash your program, due to their code quality.
+
+    This can be a concern if you want to parse untrusted CBOR. If it is, you
+    might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
+    types. You should also disable types 5 and 265, as these can be slow
+    even without bigints.
+
+    Disabling bigints will also partially or fully disable types that rely
+    on them, e.g. rational numbers that use bignums.
+
 CBOR IMPLEMENTATION NOTES
     This section contains some random implementation notes. They do not
     describe guaranteed behaviour, but merely behaviour as-is implemented