--- CBOR-XS/README	2013/11/22 16:18:59	1.9
+++ CBOR-XS/README	2016/04/27 09:40:18	1.17
@@ -23,38 +23,31 @@
      }
 
 DESCRIPTION
-    WARNING! This module is very new, and not very well tested (that's up to
-    you to do). Furthermore, details of the implementation might change
-    freely before version 1.0. And lastly, most extensions depend on an IANA
-    assignment, and until that assignment is official, this implementation
-    is not interoperable with other implementations (even future versions of
-    this module) until the assignment is done.
-
-    You are still invited to try out CBOR, and this module.
-
     This module converts Perl data structures to the Concise Binary Object
     Representation (CBOR) and vice versa. CBOR is a fast binary
-    serialisation format that aims to use a superset of the JSON data model,
-    i.e. when you can represent something in JSON, you should be able to
-    represent it in CBOR.
+    serialisation format that aims to use an (almost) superset of the JSON
+    data model, i.e. when you can represent something useful in JSON, you
+    should be able to represent it in CBOR.
 
-    In short, CBOR is a faster and very compact binary alternative to JSON,
+    In short, CBOR is a faster and quite compact binary alternative to JSON,
     with the added ability of supporting serialisation of Perl objects.
     (JSON often compresses better than CBOR though, so if you plan to
-    compress the data later you might want to compare both formats first).
+    compress the data later and speed is less important, you might want to
+    compare both formats first).
 
     To give you a general idea about speed, with texts in the megabyte
     range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
     JSON::XS and decodes about 15%-30% faster than those. The shorter the
     data, the worse Storable performs in comparison.
 
-    As for compactness, "CBOR::XS" encoded data structures are usually about
-    20% smaller than the same data encoded as (compact) JSON or Storable.
+    Regarding compactness, "CBOR::XS"-encoded data structures are usually
+    about 20% smaller than the same data encoded as (compact) JSON or
+    Storable.
 
     In addition to the core CBOR data format, this module implements a
-    number of extensions, to support cyclic and self-referencing data
-    structures (see "allow_sharing"), string deduplication (see
-    "allow_stringref") and scalar references (always enabled).
+    number of extensions, to support cyclic and shared data structures (see
+    "allow_sharing" and "allow_cycles"), string deduplication (see
+    "pack_strings") and scalar references (always enabled).
 
     The primary goal of this module is to be *correct* and the secondary
     goal is to be *fast*. To reach the latter goal it was written in C.
@@ -149,48 +142,137 @@
 
         This means that such values will only be encoded once, and will not
         result in a deep cloning of the value on decode, in decoders
-        supporting the value sharing extension.
+        supporting the value sharing extension. This also makes it possible
+        to encode cyclic data structures (which need "allow_cycles" to be
+        enabled to be decoded by this module).
 
         It is recommended to leave it off unless you know your communication
         partner supports the value sharing extensions to CBOR
-        (http://cbor.schmorp.de/value-sharing).
+        (<http://cbor.schmorp.de/value-sharing>), as without decoder
+        support, the resulting data structure might be unusable.
 
         Detecting shared values incurs a runtime overhead when values are
         encoded that have a reference counter large than one, and might
         unnecessarily increase the encoded size, as potentially shared
-        values are encode as sharable whether or not they are actually
+        values are encoded as shareable whether or not they are actually
         shared.
 
         At the moment, only targets of references can be shared (e.g.
         scalars, arrays or hashes pointed to by a reference). Weirder
         constructs, such as an array with multiple "copies" of the *same*
         string, which are hard but not impossible to create in Perl, are not
-        supported (this is the same as for Storable).
+        supported (this is the same as with Storable).
 
-        If $enable is false (the default), then "encode" will encode
-        exception when it encounters anything it cannot encode as CBOR.
+        If $enable is false (the default), then "encode" will encode shared
+        data structures repeatedly, unsharing them in the process. Cyclic
+        data structures cannot be encoded in this mode.
 
         This option does not affect "decode" in any way - shared values and
         references will always be decoded properly if present.
 
-    $cbor = $cbor->allow_stringref ([$enable])
-    $enabled = $cbor->get_allow_stringref
+    $cbor = $cbor->allow_cycles ([$enable])
+    $enabled = $cbor->get_allow_cycles
+        If $enable is true (or missing), then "decode" will happily decode
+        self-referential (cyclic) data structures. By default these will not
+        be decoded, as they need manual cleanup to avoid memory leaks, so
+        code that isn't prepared for this will not leak memory.
+
+        If $enable is false (the default), then "decode" will throw an error
+        when it encounters a self-referential/cyclic data structure.
+
+        FUTURE DIRECTION: the motivation behind this option is to avoid
+        *real* cycles - future versions of this module might choose to decode
+        cyclic data structures using weak references when this option is
+        off, instead of throwing an error.
+
+        This option does not affect "encode" in any way - shared values and
+        references will always be encoded properly if present.
+
+    $cbor = $cbor->pack_strings ([$enable])
+    $enabled = $cbor->get_pack_strings
         If $enable is true (or missing), then "encode" will try not to
         encode the same string twice, but will instead encode a reference to
-        the string instead. Depending on your data format. this can save a
+        the string instead. Depending on your data format, this can save a
         lot of space, but also results in a very large runtime overhead
         (expect encoding times to be 2-4 times as high as without).
 
         It is recommended to leave it off unless you know your
         communications partner supports the stringref extension to CBOR
-        (http://cbor.schmorp.de/stringref).
+        (<http://cbor.schmorp.de/stringref>), as without decoder support,
+        the resulting data structure might not be usable.
 
-        If $enable is false (the default), then "encode" will encode
-        exception when it encounters anything it cannot encode as CBOR.
+        If $enable is false (the default), then "encode" will encode strings
+        the standard CBOR way.
 
         This option does not affect "decode" in any way - string references
         will always be decoded properly if present.
 
+    $cbor = $cbor->text_keys ([$enable])
+    $enabled = $cbor->get_text_keys
+        If $enable is true (or missing), then "encode" will encode all perl
+        hash keys as CBOR text strings/UTF-8 strings, upgrading them as
+        needed.
+
+        If $enable is false (the default), then "encode" will encode hash
+        keys normally - upgraded perl strings (strings internally encoded as
+        UTF-8) as CBOR text strings, and downgraded perl strings as CBOR
+        byte strings.
+
+        This option does not affect "decode" in any way.
+
+        This option is useful for interoperability with CBOR decoders that
+        don't treat byte strings as a form of text. It is especially useful
+        as Perl gives very little control over hash keys.
+
+        Enabling this option can be slow, as all downgraded hash keys that
+        are encoded need to be scanned and converted to UTF-8.
+
+    $cbor = $cbor->text_strings ([$enable])
+    $enabled = $cbor->get_text_strings
+        This option works similarly to "text_keys", above, but works on all
+        strings (including hash keys), so "text_keys" has no further effect
+        after enabling "text_strings".
+
+        If $enable is true (or missing), then "encode" will encode all perl
+        strings as CBOR text strings/UTF-8 strings, upgrading them as
+        needed.
+
+        If $enable is false (the default), then "encode" will encode strings
+        normally (but see "text_keys") - upgraded perl strings (strings
+        internally encoded as UTF-8) as CBOR text strings, and downgraded
+        perl strings as CBOR byte strings.
+
+        This option does not affect "decode" in any way.
+
+        This option has similar advantages and disadvantages to "text_keys".
+        In addition, this option effectively removes the ability to encode
+        byte strings, which might break some "FREEZE" and "TO_CBOR" methods
+        that rely on this, such as bignum encoding, so this option is mainly
+        useful for very simple data.
+
+    $cbor = $cbor->validate_utf8 ([$enable])
+    $enabled = $cbor->get_validate_utf8
+        If $enable is true (or missing), then "decode" will validate that
+        elements (text strings) containing UTF-8 data in fact contain valid
+        UTF-8 data (instead of blindly accepting it). This validation
+        obviously takes extra time during decoding.
+
+        The concept of "valid UTF-8" used is perl's concept, which is a
+        superset of the official UTF-8.
+
+        If $enable is false (the default), then "decode" will blindly accept
+        UTF-8 data, marking them as valid UTF-8 in the resulting data
+        structure regardless of whether that's true or not.
+
+        Perl isn't too happy about corrupted UTF-8 in strings, but should
+        generally not crash or do similarly evil things. Extensions might not
+        be so forgiving, so it's recommended to turn on this setting if you
+        receive untrusted CBOR.
+
+        This option does not affect "encode" in any way - strings that are
+        supposedly valid UTF-8 will simply be dumped into the resulting CBOR
+        string without checking whether that is, in fact, true or not.
+
     $cbor = $cbor->filter ([$cb->($tag, $value)])
     $cb_or_undef = $cbor->get_filter
         Sets or replaces the tagged value decoding filter (when $cb is
@@ -220,7 +302,7 @@
         returns no values.
 
         Example: decode all tags not handled internally into
-        CBOR::XS::Tagged objects, with no other special handling (useful
+        "CBOR::XS::Tagged" objects, with no other special handling (useful
         when working with potentially "unsafe" CBOR data).
 
            CBOR::XS->new->filter (sub { })->decode ($cbor_data);
@@ -255,6 +337,64 @@
            CBOR::XS->new->decode_prefix ("......")
            => ("...", 3)
 
+  INCREMENTAL PARSING
+    In some cases, there is the need for incremental parsing of CBOR data.
+    While this module always has to keep both CBOR text and resulting Perl
+    data structure in memory at one time, it does allow you to parse a CBOR
+    stream incrementally. This is similar to using "decode_prefix" to see if
+    a full CBOR object is available, but is much more efficient.
+
+    It basically works by parsing as much of a CBOR string as possible - if
+    the CBOR data is not complete yet, the parser will remember where it
+    was, to be able to restart when more data has been accumulated. Once
+    enough data is available to either decode a complete CBOR value or raise
+    an error, a real decode will be attempted.
+
+    A typical use case would be a network protocol that consists of sending
+    and receiving CBOR-encoded messages. A solution that works with CBOR and
+    just about anything else is to prepend a length to every CBOR value,
+    so the receiver knows how many octets to read. More compact (and
+    slightly slower) would be to just send CBOR values back-to-back, as
+    "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
+    length.
+
+    The following methods help with this:
+
+    @decoded = $cbor->incr_parse ($buffer)
+        This method attempts to decode exactly one CBOR value from the
+        beginning of the given $buffer. The value is removed from the
+        $buffer on success. When $buffer doesn't contain a complete value
+        yet, it returns nothing. Finally, when the $buffer doesn't start
+        with something that could ever be a valid CBOR value, it raises an
+        exception, just as "decode" would. In the latter case the decoder
+        state is undefined and must be reset before being able to parse
+        further.
+
+        This method modifies the $buffer in place. When no CBOR value can be
+        decoded, the decoder stores the current string offset. On the next
+        call, it continues decoding at the place where it stopped before. For
+        this to make sense, the $buffer must begin with the same octets as
+        on previous unsuccessful calls.
+
+        You can call this method in scalar context, in which case it either
+        returns a decoded value or "undef". This makes it impossible to
+        distinguish between CBOR null values (which decode to "undef") and
+        an unsuccessful decode, which is often acceptable.
+
+    @decoded = $cbor->incr_parse_multiple ($buffer)
+        Same as "incr_parse", but attempts to decode as many CBOR values as
+        possible in one go, instead of at most one. Calls to "incr_parse"
+        and "incr_parse_multiple" can be interleaved.
+
+    $cbor->incr_reset
+        Resets the incremental decoder. This throws away any saved state, so
+        that subsequent calls to "incr_parse" or "incr_parse_multiple" start
+        to parse a new CBOR value from the beginning of the $buffer again.
+
+        This method can be called at any time, but it *must* be called if you
+        want to change your $buffer or there was a decoding error and you
+        want to reuse the $cbor object for future incremental parsings.
+
 MAPPING
     This section describes how CBOR::XS maps Perl values to CBOR values and
     vice versa. These mappings are designed to "do the right thing" in most
@@ -271,7 +411,7 @@
         support, 64 bit integers will be truncated or otherwise corrupted.
 
     byte strings
-        Byte strings will become octet strings in Perl (the byte values
+        Byte strings will become octet strings in Perl (the Byte values
         0..255 will simply become characters of the same value in Perl).
 
     UTF-8 strings
@@ -299,7 +439,7 @@
         Tagged items consists of a numeric tag and another CBOR value.
 
         See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
-        for details.
+        for details on which tags are handled how.
 
     anything else
         Anything else (e.g. unsupported simple values) will raise a decoding
@@ -307,13 +447,14 @@
 
   PERL -> CBOR
     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
-    truly typeless language, so we can only guess which CBOR type is meant
-    by a Perl value.
+    typeless language. That means this module can only guess which CBOR type
+    is meant by a perl value.
 
     hash references
         Perl hash references become CBOR maps. As there is no inherent
         ordering in hash keys (or CBOR maps), they will usually be encoded
-        in a pseudo-random order.
+        in a pseudo-random order. This order can be different each time a
+        hash is encoded.
 
         Currently, tied hashes will use the indefinite-length format, while
         normal hashes will use the fixed-length format.
@@ -322,14 +463,17 @@
         Perl array references become fixed-length CBOR arrays.
 
     other references
-        Other unblessed references are generally not allowed and will cause
-        an exception to be thrown, except for references to the integers 0
-        and 1, which get turned into false and true in CBOR.
+        Other unblessed references will be represented using the indirection
+        tag extension (tag value 22098,
+        <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
+        to be able to decode these values somehow, by either "doing the
+        right thing", decoding into a generic tagged object, simply ignoring
+        the tag, or something else.
 
     CBOR::XS::Tagged objects
         Objects of this type must be arrays consisting of a single "[tag,
         value]" pair. The (numerical) tag will be encoded as a CBOR tag, the
-        value will be encoded as appropriate for the value. You cna use
+        value will be encoded as appropriate for the value. You must use
         "CBOR::XS::tag" to create such objects.
 
     Types::Serialiser::true, Types::Serialiser::false,
@@ -355,7 +499,7 @@
            encode_cbor [-3.0e17]                # yields [-3e+17]
            my $value = 5; encode_cbor [$value]  # yields [5]
 
-           # used as string, so dump as string
+           # used as string, so dump as string (either byte or text)
            print $value;
            encode_cbor [$value]                 # yields ["5"]
 
@@ -369,6 +513,19 @@
            $x .= "";    # another, more awkward way to stringify
            print $x;    # perl does it for you, too, quite often
 
+        You can force whether a string is encoded as byte or text string by
+        using "utf8::upgrade" and "utf8::downgrade" (if "text_strings" is
+        disabled):
+
+          utf8::upgrade $x;   # encode $x as text string
+          utf8::downgrade $x; # encode $x as byte string
+
+        Perl doesn't define what operations up- and downgrade strings, so if
+        the difference between byte and text is important, you should up- or
+        downgrade your string as late as possible before encoding. You can
+        also force the use of CBOR text strings by using "text_keys" or
+        "text_strings".
+
         You can force the type to be a CBOR number by numifying it:
 
            my $x = "3"; # some variable containing a string
@@ -387,10 +544,15 @@
         might suffer loss of precision.
 
   OBJECT SERIALISATION
+    This module implements both a CBOR-specific and the generic
+    Types::Serialiser object serialisation protocol. The following
+    subsections explain both methods.
+
+   ENCODING
     This module knows two way to serialise a Perl object: The CBOR-specific
     way, and the generic way.
 
-    Whenever the encoder encounters a Perl object that it cnanot serialise
+    Whenever the encoder encounters a Perl object that it cannot serialise
     directly (most of them), it will first look up the "TO_CBOR" method on
     it.
 
@@ -405,12 +567,17 @@
     The "FREEZE" method can return any number of values (i.e. zero or more).
     These will be encoded as CBOR perl object, together with the classname.
 
+    These methods *MUST NOT* change the data structure that is being
+    serialised. Failure to comply with this can result in memory corruption -
+    and worse.
+
     If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
     with an error.
 
-    Objects encoded via "TO_CBOR" cannot be automatically decoded, but
-    objects encoded via "FREEZE" can be decoded using the following
-    protocol:
+   DECODING
+    Objects encoded via "TO_CBOR" cannot (normally) be automatically
+    decoded, but objects encoded via "FREEZE" can be decoded using the
+    following protocol:
 
     When an encoded CBOR perl object is encountered by the decoder, it will
     look up the "THAW" method, by using the stored classname, and will fail
@@ -441,7 +608,7 @@
          my ($self) = @_;
          my $uri = "$self"; # stringify uri
          utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
-         CBOR::XS::tagged 32, "$_[0]"
+         CBOR::XS::tag 32, $uri
       }
 
     This will encode URIs as a UTF-8 string with tag 32, which indicates an
@@ -462,7 +629,6 @@
 
        sub URI::THAW {
           my ($class, $serialiser, $uri) = @_;
-
           $class->new ($uri)
        }
 
@@ -568,28 +734,41 @@
 
   ENFORCED TAGS
     These tags are always handled when decoding, and their handling cannot
-    be overriden by the user.
+    be overridden by the user.
 
-    <unassigned> (perl-object, <http://cbor.schmorp.de/perl-object>)
+    26 (perl-object, <http://cbor.schmorp.de/perl-object>)
         These tags are automatically created (and decoded) for serialisable
         objects using the "FREEZE/THAW" methods (the Types::Serialier object
         serialisation protocol). See "OBJECT SERIALISATION" for details.
 
-    <unassigned>, <unassigned> (sharable, sharedref, L
-    <http://cbor.schmorp.de/value-sharing>)
-        These tags are automatically decoded when encountered, resulting in
-        shared values in the decoded object. They are only encoded, however,
-        when "allow_sharable" is enabled.
+    28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
+        These tags are automatically decoded when encountered (and they do
+        not result in a cyclic data structure, see "allow_cycles"),
+        resulting in shared values in the decoded object. They are only
+        encoded, however, when "allow_sharing" is enabled.
+
+        Not all shared values can be successfully decoded: values that
+        reference themselves will *currently* decode as "undef" (this is not
+        the same as a reference pointing to itself, which will be
+        represented as a value that contains an indirect reference to itself
+        - these will be decoded properly).
+
+        Note that considerably more shared value data structures can be
+        decoded than will be encoded - currently, only values pointed to by
+        references will be shared, others will not. While non-reference
+        shared values can be generated in Perl with some effort, they were
+        considered too unimportant to be supported in the encoder. The
+        decoder, however, will decode these values as shared values.
 
-    <unassigned>, <unassigned> (stringref-namespace, stringref, L
+    256, 25 (stringref-namespace, stringref,
     <http://cbor.schmorp.de/stringref>)
         These tags are automatically decoded when encountered. They are only
-        encoded, however, when "allow_stringref" is enabled.
+        encoded, however, when "pack_strings" is enabled.
 
     22098 (indirection, <http://cbor.schmorp.de/indirection>)
         This tag is automatically generated when a reference are encountered
-        (with the exception of hash and array refernces). It is converted to
-        a reference when decoding.
+        (with the exception of hash and array references). It is converted
+        to a reference when decoding.
 
     55799 (self-describe CBOR, RFC 7049)
         This value is not generated on encoding (unless explicitly requested
@@ -597,8 +776,8 @@
 
   NON-ENFORCED TAGS
     These tags have default filters provided when decoding. Their handling
-    can be overriden by changing the %CBOR::XS::FILTER entry for the tag, or
-    by providing a custom "filter" callback when decoding.
+    can be overridden by changing the %CBOR::XS::FILTER entry for the tag,
+    or by providing a custom "filter" callback when decoding.
 
     When they result in decoding into a specific Perl class, the module
     usually provides a corresponding "TO_CBOR" method as well.
@@ -608,20 +787,38 @@
     to provide these modules. The decoding usually fails with an exception
     if the required module cannot be loaded.
 
+    0, 1 (date/time string, seconds since the epoch)
+        These tags are decoded into Time::Piece objects. The corresponding
+        "Time::Piece::TO_CBOR" method always encodes into tag 1 values
+        currently.
+
+        The Time::Piece API is generally surprisingly bad, and fractional
+        seconds are only accidentally kept intact, so watch out. On the plus
+        side, the module comes with perl since 5.10, which has to count for
+        something.
+
     2, 3 (positive/negative bignum)
         These tags are decoded into Math::BigInt objects. The corresponding
         "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
         CBOR integers, and others into positive/negative CBOR bignums.
 
-    4, 5 (decimal fraction/bigfloat)
+    4, 5, 264, 265 (decimal fraction/bigfloat)
         Both decimal fractions and bigfloats are decoded into Math::BigFloat
         objects. The corresponding "Math::BigFloat::TO_CBOR" method *always*
-        encodes into a decimal fraction.
+        encodes into a decimal fraction (either tag 4 or 264).
+
+        NaN and infinities are not encoded properly, as they cannot be
+        represented in CBOR.
 
-        CBOR cannot represent bigfloats with *very* large exponents -
-        conversion of such big float objects is undefined.
+        See "BIGNUM SECURITY CONSIDERATIONS" for more info.
 
-        Also, NaN and infinities are not encoded properly.
+    30 (rational numbers)
+        These tags are decoded into Math::BigRat objects. The corresponding
+        "Math::BigRat::TO_CBOR" method encodes rational numbers with
+        denominator 1 via their numerator only, i.e., they become normal
+        integers or "bignums".
+
+        See "BIGNUM SECURITY CONSIDERATIONS" for more info.
 
     21, 22, 23 (expected later JSON conversion)
         CBOR::XS is not a CBOR-to-JSON converter, and will simply ignore
@@ -678,6 +875,35 @@
     information you might want to make sure that exceptions thrown by
     CBOR::XS will not end up in front of untrusted eyes.
 
+BIGNUM SECURITY CONSIDERATIONS
+    CBOR::XS provides a "TO_CBOR" method for both Math::BigInt and
+    Math::BigFloat that tries to encode the number in the simplest possible
+    way, that is, either a CBOR integer, a CBOR bigint/decimal fraction (tag
+    4) or an arbitrary-exponent decimal fraction (tag 264). Rational numbers
+    (Math::BigRat, tag 30) can also contain bignums as members.
+
+    CBOR::XS will also understand base-2 bigfloats or arbitrary-exponent
+    bigfloats (tags 5 and 265), but it will never generate these on its own.
+
+    Using the built-in Math::BigInt::Calc support, encoding and decoding
+    decimal fractions is generally fast. Decoding bigints can be slow for
+    very big numbers (tens of thousands of digits, something that could
+    potentially be caught by limiting the size of CBOR texts), and decoding
+    bigfloats or arbitrary-exponent bigfloats can be *extremely* slow
+    (minutes, decades) for large exponents (roughly 40 bit and longer).
+
+    Additionally, Math::BigInt can take advantage of other bignum libraries,
+    such as Math::GMP, which cannot handle big floats with large exponents,
+    and might simply abort or crash your program, due to their code quality.
+
+    This can be a concern if you want to parse untrusted CBOR. If it is, you
+    might want to disable decoding of tag 2 (bigint) and 3 (negative bigint)
+    types. You should also disable types 5 and 265, as these can be slow
+    even without bigints.
+
+    Disabling bigints will also partially or fully disable types that rely
+    on them, e.g. rational numbers that use bignums.
+
 CBOR IMPLEMENTATION NOTES
     This section contains some random implementation notes. They do not
     describe guaranteed behaviour, but merely behaviour as-is implemented
@@ -695,6 +921,14 @@
 
     Strict mode and canonical mode are not implemented.
 
+LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
+    On perls that were built without 64 bit integer support (these are rare
+    nowadays, even on 32 bit architectures, as all major Perl distributions
+    are built with 64 bit integer support), support for any kind of 64 bit
+    integer in CBOR is very limited - most likely, these 64 bit values will
+    be truncated, corrupted, or otherwise not decoded correctly. This also
+    includes string, array and map sizes that are stored as 64 bit integers.
+
 THREADS
     This module is *not* guaranteed to be thread safe and there are no plans
     to change this until Perl gets thread support (as opposed to the