--- CBOR-XS/XS.pm 2013/11/28 12:11:06 1.26 +++ CBOR-XS/XS.pm 2013/11/30 15:23:59 1.29 @@ -28,38 +28,31 @@ =head1 DESCRIPTION -WARNING! This module is very new, and not very well tested (that's up -to you to do). Furthermore, details of the implementation might change -freely before version 1.0. And lastly, most extensions depend on an IANA -assignment, and until that assignment is official, this implementation is -not interoperable with other implementations (even future versions of this -module) until the assignment is done. - -You are still invited to try out CBOR, and this module. - This module converts Perl data structures to the Concise Binary Object Representation (CBOR) and vice versa. CBOR is a fast binary serialisation -format that aims to use a superset of the JSON data model, i.e. when you -can represent something in JSON, you should be able to represent it in -CBOR. +format that aims to use an (almost) superset of the JSON data model, i.e. +when you can represent something useful in JSON, you should be able to +represent it in CBOR. -In short, CBOR is a faster and very compact binary alternative to JSON, +In short, CBOR is a faster and quite compact binary alternative to JSON, with the added ability of supporting serialisation of Perl objects. (JSON often compresses better than CBOR though, so if you plan to compress the -data later you might want to compare both formats first). +data later and speed is less important you might want to compare both +formats first). To give you a general idea about speed, with texts in the megabyte range, C usually encodes roughly twice as fast as L or L and decodes about 15%-30% faster than those. The shorter the data, the worse L performs in comparison. -As for compactness, C encoded data structures are usually about -20% smaller than the same data encoded as (compact) JSON or L. 
- -In addition to the core CBOR data format, this module implements a number -of extensions, to support cyclic and self-referencing data structures -(see C), string deduplication (see C) and -scalar references (always enabled). +Regarding compactness, C-encoded data structures are usually +about 20% smaller than the same data encoded as (compact) JSON or +L. + +In addition to the core CBOR data format, this module implements a +number of extensions, to support cyclic and shared data structures (see +C), string deduplication (see C) and scalar +references (always enabled). The primary goal of this module is to be I and the secondary goal is to be I. To reach the latter goal it was written in C. @@ -73,7 +66,7 @@ use common::sense; -our $VERSION = 0.09; +our $VERSION = '1.0'; our @ISA = qw(Exporter); our @EXPORT = qw(encode_cbor decode_cbor); @@ -261,7 +254,7 @@ a code reference that is called with tag and value, and is responsible for decoding the value. If no entry exists, it returns no values. -Example: decode all tags not handled internally into CBOR::XS::Tagged +Example: decode all tags not handled internally into C objects, with no other special handling (useful when working with potentially "unsafe" CBOR data). @@ -325,7 +318,7 @@ =item byte strings -Byte strings will become octet strings in Perl (the byte values 0..255 +Byte strings will become octet strings in Perl (the Byte values 0..255 will simply become characters of the same value in Perl). =item UTF-8 strings @@ -358,7 +351,7 @@ Tagged items consists of a numeric tag and another CBOR value. See L and the description of C<< ->filter >> -for details. +for details on which tags are handled how. =item anything else @@ -371,8 +364,8 @@ =head2 PERL -> CBOR The mapping from Perl to CBOR is slightly more difficult, as Perl is a -truly typeless language, so we can only guess which CBOR type is meant by -a Perl value. +typeless language. 
That means this module can only guess which CBOR type +is meant by a perl value. =over 4 @@ -380,7 +373,7 @@ Perl hash references become CBOR maps. As there is no inherent ordering in hash keys (or CBOR maps), they will usually be encoded in a pseudo-random -order. +order. This order can be different each time a hash is encoded. Currently, tied hashes will use the indefinite-length format, while normal hashes will use the fixed-length format. @@ -391,15 +384,18 @@ =item other references -Other unblessed references are generally not allowed and will cause an -exception to be thrown, except for references to the integers C<0> and -C<1>, which get turned into false and true in CBOR. +Other unblessed references will be represented using +the indirection tag extension (tag value C<22098>, +L). CBOR decoders are guaranteed +to be able to decode these values somehow, by either "doing the right +thing", decoding into a generic tagged object, simply ignoring the tag, or +something else. =item CBOR::XS::Tagged objects Objects of this type must be arrays consisting of a single C<[tag, value]> pair. The (numerical) tag will be encoded as a CBOR tag, the value will -be encoded as appropriate for the value. You cna use C to +be encoded as appropriate for the value. You must use C to create such objects. 
=item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error @@ -426,7 +422,7 @@ encode_cbor [-3.0e17] # yields [-3e+17] my $value = 5; encode_cbor [$value] # yields [5] - # used as string, so dump as string + # used as string, so dump as string (either byte or text) print $value; encode_cbor [$value] # yields ["5"] @@ -440,6 +436,16 @@ $x .= ""; # another, more awkward way to stringify print $x; # perl does it for you, too, quite often +You can force whether a string is encoded as byte or text string by using +C<utf8::upgrade> and C<utf8::downgrade>: + + utf8::upgrade $x; # encode $x as text string + utf8::downgrade $x; # encode $x as byte string + +Perl doesn't define what operations up- and downgrade strings, so if the +difference between byte and text is important, you should up- or downgrade +your string as late as possible before encoding. + You can force the type to be a CBOR number by numifying it: my $x = "3"; # some variable containing a string @@ -461,10 +467,16 @@ =head2 OBJECT SERIALISATION +This module implements both a CBOR-specific and the generic +L object serialisation protocol. The following +subsections explain both methods. + +=head3 ENCODING + This module knows two ways to serialise a Perl object: The CBOR-specific way, and the generic way. -Whenever the encoder encounters a Perl object that it cnanot serialise +Whenever the encoder encounters a Perl object that it cannot serialise directly (most of them), it will first look up the C method on it. @@ -480,11 +492,18 @@ more). These will be encoded as CBOR perl object, together with the classname. +These methods I<MUST NOT> change the data structure that is being +serialised. Failure to comply with this can result in memory corruption - +and worse. + If an object supports neither C nor C, encoding will fail with an error. 
-Objects encoded via C cannot be automatically decoded, but -objects encoded via C can be decoded using the following protocol: +=head3 DECODING + +Objects encoded via C cannot (normally) be automatically decoded, +but objects encoded via C can be decoded using the following +protocol: When an encoded CBOR perl object is encountered by the decoder, it will look up the C method, by using the stored classname, and will fail @@ -494,7 +513,7 @@ as first argument, the constant string C as second argument, and all values returned by C as remaining arguments. -=head4 EXAMPLES +=head3 EXAMPLES Here is an example C method: @@ -515,7 +534,7 @@ my ($self) = @_; my $uri = "$self"; # stringify uri utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string - CBOR::XS::tagged 32, "$_[0]" + CBOR::XS::tag 32, "$_[0]" } This will encode URIs as a UTF-8 string with tag 32, which indicates an