[ViewVC] Diff of: cvs/CBOR-XS/README

Comparing CBOR-XS/README (file contents):
Revision 1.9 by root, Fri Nov 22 16:18:59 2013 UTC vs.
Revision 1.13 by root, Sun Jan 5 14:24:54 2014 UTC

         # data was decoded
         substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
      }
 DESCRIPTION
-    WARNING! This module is very new, and not very well tested (that's up to
-    you to do). Furthermore, details of the implementation might change
-    freely before version 1.0. And lastly, most extensions depend on an IANA
-    assignment, and until that assignment is official, this implementation
-    is not interoperable with other implementations (even future versions of
-    this module) until the assignment is done.
-    You are still invited to try out CBOR, and this module.
     This module converts Perl data structures to the Concise Binary Object
     Representation (CBOR) and vice versa. CBOR is a fast binary
-    serialisation format that aims to use a superset of the JSON data model,
+    serialisation format that aims to use an (almost) superset of the JSON
-    i.e. when you can represent something in JSON, you should be able to
+    data model, i.e. when you can represent something useful in JSON, you
-    represent it in CBOR.
+    should be able to represent it in CBOR.
-    In short, CBOR is a faster and very compact binary alternative to JSON,
+    In short, CBOR is a faster and quite compact binary alternative to JSON,
     with the added ability of supporting serialisation of Perl objects.
     (JSON often compresses better than CBOR though, so if you plan to
-    compress the data later you might want to compare both formats first).
+    compress the data later and speed is less important you might want to
+    compare both formats first).
     To give you a general idea about speed, with texts in the megabyte
     range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
     JSON::XS and decodes about 15%-30% faster than those. The shorter the
     data, the worse Storable performs in comparison.
-    As for compactness, "CBOR::XS" encoded data structures are usually about
+    Regarding compactness, "CBOR::XS"-encoded data structures are usually
-    20% smaller than the same data encoded as (compact) JSON or Storable.
+    about 20% smaller than the same data encoded as (compact) JSON or
+    Storable.
     In addition to the core CBOR data format, this module implements a
-    number of extensions, to support cyclic and self-referencing data
+    number of extensions, to support cyclic and shared data structures (see
-    structures (see "allow_sharing"), string deduplication (see
+    "allow_sharing" and "allow_cycles"), string deduplication (see
-    "allow_stringref") and scalar references (always enabled).
+    "pack_strings") and scalar references (always enabled).
     The primary goal of this module is to be *correct* and the secondary
     goal is to be *fast*. To reach the latter goal it was written in C.
     See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
         same object, such as an array, is referenced multiple times), but
         instead will emit a reference to the earlier value.
         This means that such values will only be encoded once, and will not
         result in a deep cloning of the value on decode, in decoders
-        supporting the value sharing extension.
+        supporting the value sharing extension. This also makes it possible
+        to encode cyclic data structures (which need "allow_cycles" to be
+        enabled to be decoded by this module).
         It is recommended to leave it off unless you know your communication
         partner supports the value sharing extensions to CBOR
-        (http://cbor.schmorp.de/value-sharing).
+        (<http://cbor.schmorp.de/value-sharing>), as without decoder
+        support, the resulting data structure might be unusable.
         Detecting shared values incurs a runtime overhead when values are
         encoded that have a reference counter larger than one, and might
         unnecessarily increase the encoded size, as potentially shared
-        values are encode as sharable whether or not they are actually
+        values are encoded as shareable whether or not they are actually
         shared.
         At the moment, only targets of references can be shared (e.g.
         scalars, arrays or hashes pointed to by a reference). Weirder
         constructs, such as an array with multiple "copies" of the *same*
         string, which are hard but not impossible to create in Perl, are not
-        supported (this is the same as for Storable).
+        supported (this is the same as with Storable).
-        If $enable is false (the default), then "encode" will encode
+        If $enable is false (the default), then "encode" will encode shared
-        exception when it encounters anything it cannot encode as CBOR.
+        data structures repeatedly, unsharing them in the process. Cyclic
+        data structures cannot be encoded in this mode.
         This option does not affect "decode" in any way - shared values and
         references will always be decoded properly if present.
+    $cbor = $cbor->allow_cycles ([$enable])
+    $enabled = $cbor->get_allow_cycles
+        If $enable is true (or missing), then "decode" will happily decode
+        self-referential (cyclic) data structures. By default these will not
+        be decoded, as they need manual cleanup to avoid memory leaks, so
+        code that isn't prepared for this will not leak memory.
+        If $enable is false (the default), then "decode" will throw an error
+        when it encounters a self-referential/cyclic data structure.
+        This option does not affect "encode" in any way - shared values and
+        references will always be encoded properly if present.
-    $cbor = $cbor->allow_stringref ([$enable])
+    $cbor = $cbor->pack_strings ([$enable])
-    $enabled = $cbor->get_allow_stringref
+    $enabled = $cbor->get_pack_strings
         If $enable is true (or missing), then "encode" will try not to
         encode the same string twice, but will encode a reference to
-        the string instead. Depending on your data format. this can save a
+        the string instead. Depending on your data format, this can save a
         lot of space, but also results in a very large runtime overhead
         (expect encoding times to be 2-4 times as high as without).
         It is recommended to leave it off unless you know your
         communications partner supports the stringref extension to CBOR
-        (http://cbor.schmorp.de/stringref).
+        (<http://cbor.schmorp.de/stringref>), as without decoder support,
+        the resulting data structure might not be usable.
-        If $enable is false (the default), then "encode" will encode
+        If $enable is false (the default), then "encode" will encode strings
-        exception when it encounters anything it cannot encode as CBOR.
+        the standard CBOR way.
         This option does not affect "decode" in any way - string references
         will always be decoded properly if present.
+    $cbor = $cbor->validate_utf8 ([$enable])
+    $enabled = $cbor->get_validate_utf8
+        If $enable is true (or missing), then "decode" will validate that
+        elements (text strings) containing UTF-8 data in fact contain valid
+        UTF-8 data (instead of blindly accepting it). This validation
+        obviously takes extra time during decoding.
+        The concept of "valid UTF-8" used is perl's concept, which is a
+        superset of the official UTF-8.
+        If $enable is false (the default), then "decode" will blindly accept
+        UTF-8 data, marking it as valid UTF-8 in the resulting data
+        structure regardless of whether that's true or not.
+        Perl isn't too happy about corrupted UTF-8 in strings, but should
+        generally not crash or do similarly evil things. Extensions might
+        not be so forgiving, so it's recommended to turn on this setting if you
+        receive untrusted CBOR.
+        This option does not affect "encode" in any way - strings that are
+        supposedly valid UTF-8 will simply be dumped into the resulting CBOR
+        string without checking whether that is, in fact, true or not.
     $cbor = $cbor->filter ([$cb->($tag, $value)])
     $cb_or_undef = $cbor->get_filter
         Sets or replaces the tagged value decoding filter (when $cb is
         specified) or clears the filter (if no argument or "undef" is
         it must be a code reference that is called with tag and value, and
         is responsible for decoding the value. If no entry exists, it
         returns no values.
         Example: decode all tags not handled internally into
-        CBOR::XS::Tagged objects, with no other special handling (useful
+        "CBOR::XS::Tagged" objects, with no other special handling (useful
         when working with potentially "unsafe" CBOR data).
            CBOR::XS->new->filter (sub { })->decode ($cbor_data);
         Example: provide a global filter for tag 1347375694, converting the
         the next one starts.
            CBOR::XS->new->decode_prefix ("......")
            => ("...", 3)
+  INCREMENTAL PARSING
+    In some cases, there is a need for incremental parsing of CBOR data.
+    While this module always has to keep both the CBOR data and the
+    resulting Perl data structure in memory at one time, it does allow you
+    to parse a CBOR stream incrementally. This is similar to using
+    "decode_prefix" to see if a full CBOR object is available, but is much
+    more efficient.
+    It basically works by parsing as much of a CBOR string as possible - if
+    the CBOR data is not complete yet, the parser will remember where it
+    was, to be able to restart when more data has been accumulated. Once
+    enough data is available to either decode a complete CBOR value or raise
+    an error, a real decode will be attempted.
+    A typical use case would be a network protocol that consists of sending
+    and receiving CBOR-encoded messages. The solution that works with CBOR
+    and about anything else is to prepend a length to every CBOR value,
+    so the receiver knows how many octets to read. More compact (and
+    slightly slower) would be to just send CBOR values back-to-back, as
+    "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
+    length.
+    The following methods help with this:
+    @decoded = $cbor->incr_parse ($buffer)
+        This method attempts to decode exactly one CBOR value from the
+        beginning of the given $buffer. The value is removed from the
+        $buffer on success. When $buffer doesn't contain a complete value
+        yet, it returns nothing. Finally, when the $buffer doesn't start
+        with something that could ever be a valid CBOR value, it raises an
+        exception, just as "decode" would. In the latter case the decoder
+        state is undefined and must be reset before being able to parse
+        further.
+        This method modifies the $buffer in place. When no CBOR value can be
+        decoded, the decoder stores the current string offset. On the next
+        call, it continues decoding at the place where it stopped before. For
+        this to make sense, the $buffer must begin with the same octets as
+        on previous unsuccessful calls.
+        You can call this method in scalar context, in which case it either
+        returns a decoded value or "undef". This makes it impossible to
+        distinguish between CBOR null values (which decode to "undef") and
+        an unsuccessful decode, which is often acceptable.
+    @decoded = $cbor->incr_parse_multiple ($buffer)
+        Same as "incr_parse", but attempts to decode as many CBOR values as
+        possible in one go, instead of at most one. Calls to "incr_parse"
+        and "incr_parse_multiple" can be interleaved.
+    $cbor->incr_reset
+        Resets the incremental decoder. This throws away any saved state, so
+        that subsequent calls to "incr_parse" or "incr_parse_multiple" start
+        to parse a new CBOR value from the beginning of the $buffer again.
+        This method can be called at any time, but it *must* be called if you
+        want to change your $buffer or there was a decoding error and you
+        want to reuse the $cbor object for future incremental parsings.
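     The length-prefix approach mentioned above can be sketched in core
     Perl, without "CBOR::XS". This is only an illustration under stated
     assumptions: the helper names "frame_message" and "extract_messages"
     are hypothetical, the 32-bit big-endian prefix is one possible choice,
     and the payloads stand in for already encoded CBOR values.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a payload with its length as a 32-bit
# big-endian integer (an assumed wire format, not part of CBOR::XS).
sub frame_message {
    my ($payload) = @_;
    return pack "N/a*", $payload;
}

# Hypothetical helper: remove and return every complete message from the
# front of the buffer, which is modified in place via the @_ alias.
# A trailing partial message is left in the buffer for the next call.
sub extract_messages {
    my @messages;
    while (length($_[0]) >= 4) {
        my $len = unpack "N", $_[0];
        last if length($_[0]) < 4 + $len;    # message not complete yet
        push @messages, substr($_[0], 4, $len);
        substr($_[0], 0, 4 + $len, "");      # drop prefix and payload
    }
    return @messages;
}

# Placeholder payloads; in practice these would be encoded CBOR values.
my $buffer = frame_message("first") . frame_message("second");

my $tail = substr($buffer, -3, 3, "");  # simulate a truncated read
my @got  = extract_messages($buffer);   # only the first message is complete

$buffer .= $tail;                       # the rest of the data arrives
push @got, extract_messages($buffer);   # now the second one is complete
```

     With back-to-back CBOR values the explicit length becomes unnecessary,
     which is the case "incr_parse" and "incr_parse_multiple" are described
     as handling.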
 MAPPING
     This section describes how CBOR::XS maps Perl values to CBOR values and
     vice versa. These mappings are designed to "do the right thing" in most
     circumstances automatically, preserving round-tripping characteristics
     (what you put in comes out as something equivalent).
     integers
         CBOR integers become (numeric) perl scalars. On perls without 64 bit
         support, 64 bit integers will be truncated or otherwise corrupted.
     byte strings
-        Byte strings will become octet strings in Perl (the byte values
+        Byte strings will become octet strings in Perl (the Byte values
         0..255 will simply become characters of the same value in Perl).
     UTF-8 strings
         UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
         decoded into proper Unicode code points. At the moment, the validity
     tagged values
         Tagged items consist of a numeric tag and another CBOR value.
         See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
-        for details.
+        for details on which tags are handled how.
     anything else
         Anything else (e.g. unsupported simple values) will raise a decoding
         error.
   PERL -> CBOR
     The mapping from Perl to CBOR is slightly more difficult, as Perl is a
-    truly typeless language, so we can only guess which CBOR type is meant
+    typeless language. That means this module can only guess which CBOR type
-    by a Perl value.
+    is meant by a perl value.
     hash references
         Perl hash references become CBOR maps. As there is no inherent
         ordering in hash keys (or CBOR maps), they will usually be encoded
-        in a pseudo-random order.
+        in a pseudo-random order. This order can be different each time a
+        hash is encoded.
         Currently, tied hashes will use the indefinite-length format, while
         normal hashes will use the fixed-length format.
     array references
         Perl array references become fixed-length CBOR arrays.
     other references
-        Other unblessed references are generally not allowed and will cause
+        Other unblessed references will be represented using the indirection
-        an exception to be thrown, except for references to the integers 0
+        tag extension (tag value 22098,
-        and 1, which get turned into false and true in CBOR.
+        <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
+        to be able to decode these values somehow, by either "doing the
+        right thing", decoding into a generic tagged object, simply ignoring
+        the tag, or something else.
     CBOR::XS::Tagged objects
         Objects of this type must be arrays consisting of a single "[tag,
         value]" pair. The (numerical) tag will be encoded as a CBOR tag, the
-        value will be encoded as appropriate for the value. You cna use
+        value will be encoded as appropriate for the value. You must use
         "CBOR::XS::tag" to create such objects.
     Types::Serialiser::true, Types::Serialiser::false,
     Types::Serialiser::error
         These special values become CBOR true, CBOR false and CBOR undefined
            # dump as number
            encode_cbor [2]                      # yields [2]
            encode_cbor [-3.0e17]                # yields [-3e+17]
            my $value = 5; encode_cbor [$value]  # yields [5]
-           # used as string, so dump as string
+           # used as string, so dump as string (either byte or text)
            print $value;
            encode_cbor [$value]                 # yields ["5"]
            # undef becomes null
            encode_cbor [undef]                  # yields [null]
            my $x = 3.1; # some variable containing a number
            "$x";        # stringified
            $x .= "";    # another, more awkward way to stringify
            print $x;    # perl does it for you, too, quite often
+        You can force whether a string is encoded as byte or text string by
+        using "utf8::upgrade" and "utf8::downgrade":
+          utf8::upgrade $x;   # encode $x as text string
+          utf8::downgrade $x; # encode $x as byte string
+        Perl doesn't define what operations up- and downgrade strings, so if
+        the difference between byte and text is important, you should up- or
+        downgrade your string as late as possible before encoding.
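         The effect of up- and downgrading can be observed with core Perl
         alone. The following is a minimal sketch; the variable names are
         illustrative, and that the encoder chooses between byte and text
         strings based on this internal flag is taken from the description
         above rather than shown here.

```perl
use strict;
use warnings;

my $x = "caf\xe9";    # four characters, all with values below 256

utf8::downgrade($x);  # switch to the internal byte representation
my $flag_after_downgrade = utf8::is_utf8($x) ? 1 : 0;

utf8::upgrade($x);    # switch to the internal UTF-8 representation
my $flag_after_upgrade = utf8::is_utf8($x) ? 1 : 0;

# The characters are the same either way; only the internal flag differs.
my $unchanged = $x eq "caf\xe9" ? 1 : 0;

print "flag after downgrade: $flag_after_downgrade\n";
print "flag after upgrade:   $flag_after_upgrade\n";
print "same characters:      $unchanged\n";
```

         Since the string compares equal in both states, the difference is
         easy to lose track of, which is why converting as late as possible
         before encoding is the safer habit.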
         You can force the type to be a CBOR number by numifying it:
            my $x = "3"; # some variable containing a string
            $x += 0;     # numify it, ensuring it will be dumped as a number
         the IEEE double format will be used. Perls that use formats other
         than IEEE double to represent numerical values are supported, but
         might suffer loss of precision.
   OBJECT SERIALISATION
+    This module implements both a CBOR-specific and the generic
+    Types::Serialiser object serialisation protocol. The following
+    subsections explain both methods.
+   ENCODING
     This module knows two ways to serialise a Perl object: The CBOR-specific
     way, and the generic way.
-    Whenever the encoder encounters a Perl object that it cnanot serialise
+    Whenever the encoder encounters a Perl object that it cannot serialise
     directly (most of them), it will first look up the "TO_CBOR" method on
     it.
     If it has a "TO_CBOR" method, it will call it with the object as only
     argument, and expects exactly one return value, which it will then
     "CBOR" as the second argument, to distinguish it from other serialisers.
     The "FREEZE" method can return any number of values (i.e. zero or more).
     These will be encoded as CBOR perl object, together with the classname.
+    These methods *MUST NOT* change the data structure that is being
+    serialised. Failure to comply with this can result in memory corruption -
+    and worse.
     If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
     with an error.
+   DECODING
-    Objects encoded via "TO_CBOR" cannot be automatically decoded, but
+    Objects encoded via "TO_CBOR" cannot (normally) be automatically
-    objects encoded via "FREEZE" can be decoded using the following
+    decoded, but objects encoded via "FREEZE" can be decoded using the
-    protocol:
+    following protocol:
     When an encoded CBOR perl object is encountered by the decoder, it will
     look up the "THAW" method, by using the stored classname, and will fail
     if the method cannot be found.
       sub URI::TO_CBOR {
          my ($self) = @_;
          my $uri = "$self"; # stringify uri
          utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
-         CBOR::XS::tagged 32, "$_[0]"
+         CBOR::XS::tag 32, $uri
       }
     This will encode URIs as a UTF-8 string with tag 32, which indicates a
     URI.
   ENFORCED TAGS
     These tags are always handled when decoding, and their handling cannot
     be overridden by the user.
-    <unassigned> (perl-object, <http://cbor.schmorp.de/perl-object>)
+    26 (perl-object, <http://cbor.schmorp.de/perl-object>)
         These tags are automatically created (and decoded) for serialisable
         objects using the "FREEZE/THAW" methods (the Types::Serialiser object
         serialisation protocol). See "OBJECT SERIALISATION" for details.
-    <unassigned>, <unassigned> (sharable, sharedref, L
+    28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
-    <http://cbor.schmorp.de/value-sharing>)
-        These tags are automatically decoded when encountered, resulting in
+        These tags are automatically decoded when encountered (and they do
+        not result in a cyclic data structure, see "allow_cycles"),
-        shared values in the decoded object. They are only encoded, however,
+        resulting in shared values in the decoded object. They are only
-        when "allow_sharable" is enabled.
+        encoded, however, when "allow_sharing" is enabled.
-    <unassigned>, <unassigned> (stringref-namespace, stringref, L
+        Not all shared values can be successfully decoded: values that
+        reference themselves will *currently* decode as "undef" (this is not
+        the same as a reference pointing to itself, which will be
+        represented as a value that contains an indirect reference to itself
+        - these will be decoded properly).
+        Note that considerably more shared value data structures can be
+        decoded than will be encoded - currently, only values pointed to by
+        references will be shared, others will not. While non-reference
+        shared values can be generated in Perl with some effort, they were
+        considered too unimportant to be supported in the encoder. The
+        decoder, however, will decode these values as shared values.
+    256, 25 (stringref-namespace, stringref,
     <http://cbor.schmorp.de/stringref>)
         These tags are automatically decoded when encountered. They are only
-        encoded, however, when "allow_stringref" is enabled.
+        encoded, however, when "pack_strings" is enabled.
     22098 (indirection, <http://cbor.schmorp.de/indirection>)
         This tag is automatically generated when a reference is encountered
         (with the exception of hash and array references). It is converted to
         a reference when decoding.
     When any of these need to load additional modules that are not part of
     the perl core distribution (e.g. URI), it is (currently) up to the user
     to provide these modules. The decoding usually fails with an exception
     if the required module cannot be loaded.
+    0, 1 (date/time string, seconds since the epoch)
+        These tags are decoded into Time::Piece objects. The corresponding
+        "Time::Piece::TO_CBOR" method always encodes into tag 1 values
+        currently.
+        The Time::Piece API is generally surprisingly bad, and fractional
+        seconds are only accidentally kept intact, so watch out. On the plus
+        side, the module comes with perl since 5.10, which has to count for
+        something.
     2, 3 (positive/negative bignum)
         These tags are decoded into Math::BigInt objects. The corresponding
         "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
         CBOR integers, and others into positive/negative CBOR bignums.
     uses long double to represent floating point values, they might not be
     encoded properly. Half precision types are accepted, but not encoded.
     Strict mode and canonical mode are not implemented.
+LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
+    On perls that were built without 64 bit integer support (these are rare
+    nowadays, even on 32 bit architectures), support for any kind of 64 bit
+    integer in CBOR is very limited - most likely, these 64 bit values will
+    be truncated, corrupted, or otherwise not decoded correctly. This also
+    includes string, array and map sizes that are stored as 64 bit integers.
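     Whether a given perl is affected can be checked with the core "Config"
     module. This is a small sketch, not something this module provides;
     the variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

# $Config{ivsize} is the size in bytes of perl's native integer type (IV).
my $ivsize    = $Config{ivsize};
my $has_64bit = $ivsize >= 8 ? 1 : 0;

print "integer size: $ivsize bytes\n";
print $has_64bit
    ? "64 bit integers are supported\n"
    : "64 bit integers are NOT supported, expect the limitations above\n";
```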
 THREADS
     This module is *not* guaranteed to be thread safe and there are no plans
     to change this until Perl gets thread support (as opposed to the
     horribly slow so-called "threads" which are simply slow and bloated
     process simulations - use fork, it's *much* faster, cheaper, better).
