--- CBOR-XS/XS.pm 2013/10/27 20:40:25 1.6
+++ CBOR-XS/XS.pm 2013/10/29 20:59:16 1.14
@@ -28,9 +28,14 @@

 =head1 DESCRIPTION

-WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
-AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
-feature-limited, it might already be useful).
+WARNING! This module is very new, and not very well tested (that's up to
+you to do). Furthermore, details of the implementation might change freely
+before version 1.0. And lastly, the object serialisation protocol depends
+on a pending IANA assignment, and until that assignment is official, this
+implementation is not interoperable with other implementations (even
+future versions of this module).
+
+You are still invited to try out CBOR, and this module.

 This module converts Perl data structures to the Concise Binary Object
 Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
@@ -38,8 +43,15 @@
 can represent something in JSON, you should be able to represent it in
 CBOR.

-This makes it a faster and more compact binary alternative to JSON, with
-the added ability of supporting serialising of perl objects.
+In short, CBOR is a faster and very compact binary alternative to JSON,
+with the added ability of supporting serialisation of Perl objects. (JSON
+often compresses better than CBOR though, so if you plan to compress the
+data later you might want to compare both formats first).
+
+To give you a general idea, with texts in the megabyte range, C
+usually encodes roughly twice as fast as L or L and
+decodes about 15%-30% faster than those. The shorter the data, the worse
+L performs in comparison.

 The primary goal of this module is to be I and the secondary goal
 is to be I. To reach the latter goal it was written in C.
@@ -53,7 +65,7 @@

 use common::sense;

-our $VERSION = 0.03;
+our $VERSION = 0.06;
 our @ISA = qw(Exporter);

 our @EXPORT = qw(encode_cbor decode_cbor);
@@ -223,16 +235,9 @@

 =item CBOR tag 256 (perl object)

-The tag value C<256> (TODO: pending iana registration) will be used to
-deserialise a Perl object.
-
-TODO For this to work, the class must be loaded and must have a
-C method. The decoder will then call the C method
-with the constructor arguments provided by the C method (see
-below).
-
-The C method must return a single value that will then be used
-as the deserialised value.
+The tag value C<256> (TODO: pending iana registration) will be used
+to deserialise a Perl object serialised with C. See L, below, for details.

 =item CBOR tag 55799 (magic header)

@@ -285,8 +290,9 @@
 =item CBOR::XS::Tagged objects

 Objects of this type must be arrays consisting of a single C<[tag, value]>
-pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
-encoded as appropriate for the value.
+pair. The (numerical) tag will be encoded as a CBOR tag, the value will
+be encoded as appropriate for the value. You can use C to
+create such objects.

 =item Types::Serialiser::true, Types::Serialiser::false, Types::Serialiser::error

@@ -294,11 +300,10 @@
 values, respectively. You can also use C<\1>, C<\0> and C<\undef> directly
 if you want.

-=item blessed objects
+=item other blessed objects

-Other blessed objects currently need to have a C method. It
-will be called on every object that is being serialised, and must return
-something that can be encoded in CBOR.
+Other blessed objects are serialised via C or C. See
+L, below, for details.

 =item simple scalars

@@ -346,8 +351,105 @@

 =back

+=head2 OBJECT SERIALISATION
+
+This module knows two ways to serialise a Perl object: The CBOR-specific
+way, and the generic way.
+
+Whenever the encoder encounters a Perl object that it cannot serialise
+directly (most of them), it will first look up the C method on
+it.
+
+If it has a C method, it will call it with the object as the only
+argument, and expects exactly one return value, which it will then
+encode in place of the object.
+
+Otherwise, it will look up the C method. If it exists, it will
+call it with the object as first argument, and the constant string C
+as the second argument, to distinguish it from other serialisers.
+
+The C method can return any number of values (i.e. zero or
+more). These will be encoded as a CBOR perl object, together with the
+classname.
+
+If an object supports neither C nor C, encoding will fail
+with an error.
+
+Objects encoded via C cannot be automatically decoded, but
+objects encoded via C can be decoded using the following protocol:
+
+When an encoded CBOR perl object is encountered by the decoder, it will
+look up the C method, by using the stored classname, and will fail
+if the method cannot be found.
+
+After the lookup it will call the C method with the stored classname
+as first argument, the constant string C as second argument, and all
+values returned by C as remaining arguments.
+
+=head4 EXAMPLES
+
+Here is an example C method:
+
+   sub My::Object::TO_CBOR {
+      my ($obj) = @_;
+
+      ["this is a serialised My::Object object", $obj->{id}]
+   }
+
+When a C is encoded to CBOR, it will instead encode a simple
+array with two members: a string, and the "object id". Decoding this CBOR
+string will yield a normal perl array reference in place of the object.

-=head2 MAGIC HEADER
+A more useful and practical example would be a serialisation method for
+the URI module. CBOR has a custom tag value for URIs, namely 32:
+
+   sub URI::TO_CBOR {
+      my ($self) = @_;
+      my $uri = "$self"; # stringify uri
+      utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
+      CBOR::XS::tag 32, $uri
+   }
+
+This will encode URIs as a UTF-8 string with tag 32, which indicates an
+URI.
+
+Decoding such an URI will not (currently) give you an URI object, but
+instead a CBOR::XS::Tagged object with tag number 32 and the string -
+exactly what was returned by C.
+
+To serialise an object so it can automatically be deserialised, you need
+to use C and C. To take the URI module as an example, this
+would be a possible implementation:
+
+   sub URI::FREEZE {
+      my ($self, $serialiser) = @_;
+      "$self" # encode url string
+   }
+
+   sub URI::THAW {
+      my ($class, $serialiser, $uri) = @_;
+
+      $class->new ($uri)
+   }
+
+Unlike C, multiple values can be returned by C. For
+example, a C method that returns "type", "id" and "variant" values
+would cause an invocation of C with 5 arguments:
+
+   sub My::Object::FREEZE {
+      my ($self, $serialiser) = @_;
+
+      ($self->{type}, $self->{id}, $self->{variant})
+   }
+
+   sub My::Object::THAW {
+      my ($class, $serialiser, $type, $id, $variant) = @_;
+
+      $class->new (type => $type, id => $id, variant => $variant)
+   }
+
+
+=head1 MAGIC HEADER

 There is no way to distinguish CBOR from other formats
 programmatically. To make it easier to distinguish CBOR from other
@@ -360,7 +462,96 @@
 required.


-=head2 CBOR and JSON
+=head1 THE CBOR::XS::Tagged CLASS
+
+CBOR has the concept of tagged values - any CBOR value can be tagged with
+a numeric 64 bit number. These numbers are centrally administered.
+
+C handles a few tags internally when en- or decoding. You can
+also create tags yourself by encoding C objects, and the
+decoder will create C objects itself when it hits an
+unknown tag.
+
+These objects are simply blessed array references - the first member of
+the array being the numerical tag, the second being the value.
+
+You can interact with C objects in the following ways:
+
+=over 4
+
+=item $tagged = CBOR::XS::tag $tag, $value
+
+This function(!) creates a new C object using the given
+C<$tag> (0..2**64-1) to tag the given C<$value> (which can be any Perl
+value that can be encoded in CBOR, including serialisable Perl objects and
+C objects).
+
+=item $tagged->[0]
+
+=item $tagged->[0] = $new_tag
+
+=item $tag = $tagged->tag
+
+=item $new_tag = $tagged->tag ($new_tag)
+
+Access/mutate the tag.
+
+=item $tagged->[1]
+
+=item $tagged->[1] = $new_value
+
+=item $value = $tagged->value
+
+=item $new_value = $tagged->value ($new_value)
+
+Access/mutate the tagged value.
+
+=back
+
+=cut
+
+sub tag($$) {
+   bless [@_], CBOR::XS::Tagged::;
+}
+
+sub CBOR::XS::Tagged::tag {
+   $_[0][0] = $_[1] if $#_;
+   $_[0][0]
+}
+
+sub CBOR::XS::Tagged::value {
+   $_[0][1] = $_[1] if $#_;
+   $_[0][1]
+}
+
+=head2 EXAMPLES
+
+Here are some examples of C uses to tag objects.
+
+You can look up CBOR tag values and meanings in the IANA registry at
+L.
+
+Prepend a magic header (C<$CBOR::XS::MAGIC>):
+
+   my $cbor = encode_cbor CBOR::XS::tag 55799, $value;
+   # same as:
+   my $cbor = $CBOR::XS::MAGIC . encode_cbor $value;
+
+Serialise some URIs and a regex in an array:
+
+   my $cbor = encode_cbor [
+      (CBOR::XS::tag 32, "http://www.nethype.de/"),
+      (CBOR::XS::tag 32, "http://software.schmorp.de/"),
+      (CBOR::XS::tag 35, "^[Pp][Ee][Rr][lL]\$"),
+   ];
+
+Wrap CBOR data in CBOR:
+
+   my $cbor_cbor = encode_cbor
+      CBOR::XS::tag 24,
+         encode_cbor [1, 2, 3];
+
+=head1 CBOR and JSON

 CBOR is supposed to implement a superset of the JSON data model, and is,
 with some coercion, able to represent all JSON texts (something that other