/cvs/CBOR-XS/README

Comparing CBOR-XS/README (file contents):
Revision 1.9 by root, Fri Nov 22 16:18:59 2013 UTC vs.
Revision 1.11 by root, Sat Nov 30 18:42:27 2013 UTC

21 # data was decoded 21 # data was decoded
22 substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string 22 substr $many_cbor_strings, 0, $length, ""; # remove decoded cbor string
23 } 23 }
24 24
25 DESCRIPTION 25 DESCRIPTION
26 WARNING! This module is very new, and not very well tested (that's up to
27 you to do). Furthermore, details of the implementation might change
28 freely before version 1.0. And lastly, most extensions depend on an IANA
29 assignment, and until that assignment is official, this implementation
30 is not interoperable with other implementations (even future versions of
31 this module) until the assignment is done.
32
33 You are still invited to try out CBOR, and this module.
34
35 This module converts Perl data structures to the Concise Binary Object 26 This module converts Perl data structures to the Concise Binary Object
36 Representation (CBOR) and vice versa. CBOR is a fast binary 27 Representation (CBOR) and vice versa. CBOR is a fast binary
37 serialisation format that aims to use a superset of the JSON data model, 28 serialisation format that aims to use an (almost) superset of the JSON
38 i.e. when you can represent something in JSON, you should be able to 29 data model, i.e. when you can represent something useful in JSON, you
39 represent it in CBOR. 30 should be able to represent it in CBOR.
40 31
41 In short, CBOR is a faster and very compact binary alternative to JSON, 32 In short, CBOR is a faster and quite compact binary alternative to JSON,
42 with the added ability of supporting serialisation of Perl objects. 33 with the added ability of supporting serialisation of Perl objects.
43 (JSON often compresses better than CBOR though, so if you plan to 34 (JSON often compresses better than CBOR though, so if you plan to
44 compress the data later you might want to compare both formats first). 35 compress the data later and speed is less important you might want to
36 compare both formats first).
45 37
46 To give you a general idea about speed, with texts in the megabyte 38 To give you a general idea about speed, with texts in the megabyte
47 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or 39 range, "CBOR::XS" usually encodes roughly twice as fast as Storable or
48 JSON::XS and decodes about 15%-30% faster than those. The shorter the 40 JSON::XS and decodes about 15%-30% faster than those. The shorter the
49 data, the worse Storable performs in comparison. 41 data, the worse Storable performs in comparison.
50 42
51 As for compactness, "CBOR::XS" encoded data structures are usually about 43 Regarding compactness, "CBOR::XS"-encoded data structures are usually
52 20% smaller than the same data encoded as (compact) JSON or Storable. 44 about 20% smaller than the same data encoded as (compact) JSON or
45 Storable.
53 46
54 In addition to the core CBOR data format, this module implements a 47 In addition to the core CBOR data format, this module implements a
55 number of extensions, to support cyclic and self-referencing data 48 number of extensions, to support cyclic and shared data structures (see
56 structures (see "allow_sharing"), string deduplication (see 49 "allow_sharing" and "allow_cycles"), string deduplication (see
57 "allow_stringref") and scalar references (always enabled). 50 "pack_strings") and scalar references (always enabled).
58 51
59 The primary goal of this module is to be *correct* and the secondary 52 The primary goal of this module is to be *correct* and the secondary
60 goal is to be *fast*. To reach the latter goal it was written in C. 53 goal is to be *fast*. To reach the latter goal it was written in C.
61 54
62 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and 55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
147 same object, such as an array, is referenced multiple times), but 140 same object, such as an array, is referenced multiple times), but
148 instead will emit a reference to the earlier value. 141 instead will emit a reference to the earlier value.
149 142
150 This means that such values will only be encoded once, and will not 143 This means that such values will only be encoded once, and will not
151 result in a deep cloning of the value on decode, in decoders 144 result in a deep cloning of the value on decode, in decoders
152 supporting the value sharing extension. 145 supporting the value sharing extension. This also makes it possible
146 to encode cyclic data structures (which need "allow_cycles" to be
147 enabled to be decoded by this module).
153 148
154 It is recommended to leave it off unless you know your communication 149 It is recommended to leave it off unless you know your communication
155 partner supports the value sharing extensions to CBOR 150 partner supports the value sharing extensions to CBOR
156 (http://cbor.schmorp.de/value-sharing). 151 (<http://cbor.schmorp.de/value-sharing>), as without decoder
152 support, the resulting data structure might be unusable.
157 153
158 Detecting shared values incurs a runtime overhead when values are 154 Detecting shared values incurs a runtime overhead when values are
159 encoded that have a reference counter larger than one, and might 155 encoded that have a reference counter larger than one, and might
160 unnecessarily increase the encoded size, as potentially shared 156 unnecessarily increase the encoded size, as potentially shared
161 values are encoded as sharable whether or not they are actually 157 values are encoded as shareable whether or not they are actually
162 shared. 158 shared.
163 159
164 At the moment, only targets of references can be shared (e.g. 160 At the moment, only targets of references can be shared (e.g.
165 scalars, arrays or hashes pointed to by a reference). Weirder 161 scalars, arrays or hashes pointed to by a reference). Weirder
166 constructs, such as an array with multiple "copies" of the *same* 162 constructs, such as an array with multiple "copies" of the *same*
167 string, which are hard but not impossible to create in Perl, are not 163 string, which are hard but not impossible to create in Perl, are not
168 supported (this is the same as for Storable). 164 supported (this is the same as with Storable).
169 165
170 If $enable is false (the default), then "encode" will encode 166 If $enable is false (the default), then "encode" will encode shared
171 exception when it encounters anything it cannot encode as CBOR. 167 data structures repeatedly, unsharing them in the process. Cyclic
168 data structures cannot be encoded in this mode.
172 169
173 This option does not affect "decode" in any way - shared values and 170 This option does not affect "decode" in any way - shared values and
174 references will always be decoded properly if present. 171 references will always be decoded properly if present.
175 172
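As a rough illustration of the "allow_sharing" behaviour described above, the following sketch encodes one array that is referenced twice, once with and once without the option. The variable names are made up for this example, and it assumes a decoder that supports the value sharing extension (which this module's decoder is documented to do).

```perl
use strict;
use warnings;
use CBOR::XS;

# One array referenced twice from the same container.
my $inner = [1, 2, 3];
my $data  = [$inner, $inner];

# Default: the inner array is simply encoded twice (unshared).
my $plain = CBOR::XS->new->encode ($data);

# With allow_sharing: encoded once, then referred back to.
my $shared = CBOR::XS->new->allow_sharing->encode ($data);

# Decoding needs no option; both slots should refer to one array again.
my $copy = CBOR::XS->new->decode ($shared);
print "slots share one array\n" if $copy->[0] == $copy->[1];

# The plain encoding decodes into two independent copies.
my $unshared = CBOR::XS->new->decode ($plain);
print "slots are separate copies\n" if $unshared->[0] != $unshared->[1];
```

Note that sharing is about preserving structure, not about size: for small values like this one the extra tags can make the shared encoding slightly longer than the plain one.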
173 $cbor = $cbor->allow_cycles ([$enable])
174 $enabled = $cbor->get_allow_cycles
175 If $enable is true (or missing), then "decode" will happily decode
176 self-referential (cyclic) data structures. By default these will not
177 be decoded, as they need manual cleanup to avoid memory leaks, so
178 code that isn't prepared for this will not leak memory.
179
180 If $enable is false (the default), then "decode" will throw an error
181 when it encounters a self-referential/cyclic data structure.
182
183 This option does not affect "encode" in any way - shared values and
184 references will always be encoded properly if present.
185
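A minimal sketch of how "allow_cycles" might be used, assuming the behaviour described above: a cyclic structure is encoded with "allow_sharing", the default decoder refuses it, and a decoder with "allow_cycles" enabled accepts it. The "self" key and the cleanup strategy are illustrative choices only.

```perl
use strict;
use warnings;
use CBOR::XS;

# A hash that contains a reference to itself.
my $node = { name => "root" };
$node->{self} = $node;

# Cyclic structures can only be encoded with allow_sharing enabled.
my $cbor = CBOR::XS->new->allow_sharing->encode ($node);

# The default decoder is expected to throw on the cycle.
my $ok = eval { CBOR::XS->new->decode ($cbor); 1 };
print "default decoder refused the cycle\n" unless $ok;

# Opting in: the caller is now responsible for cleanup.
my $copy = CBOR::XS->new->allow_cycles->decode ($cbor);

# Break the cycles manually so the hashes can be freed.
delete $copy->{self};
delete $node->{self};
```

Code that cannot guarantee this kind of cleanup is probably better off leaving "allow_cycles" disabled and treating the decoding error as a rejected input.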
176 $cbor = $cbor->allow_stringref ([$enable]) 186 $cbor = $cbor->pack_strings ([$enable])
177 $enabled = $cbor->get_allow_stringref 187 $enabled = $cbor->get_pack_strings
178 If $enable is true (or missing), then "encode" will try not to 188 If $enable is true (or missing), then "encode" will try not to
179 encode the same string twice, but will instead encode a reference to 189 encode the same string twice, but will instead encode a reference to
180 the string instead. Depending on your data format. this can save a 190 the string instead. Depending on your data format, this can save a
181 lot of space, but also results in a very large runtime overhead 191 lot of space, but also results in a very large runtime overhead
182 (expect encoding times to be 2-4 times as high as without). 192 (expect encoding times to be 2-4 times as high as without).
183 193
184 It is recommended to leave it off unless you know your 194 It is recommended to leave it off unless you know your
185 communications partner supports the stringref extension to CBOR 195 communications partner supports the stringref extension to CBOR
186 (http://cbor.schmorp.de/stringref). 196 (<http://cbor.schmorp.de/stringref>), as without decoder support,
197 the resulting data structure might not be usable.
187 198
188 If $enable is false (the default), then "encode" will encode 199 If $enable is false (the default), then "encode" will encode strings
189 exception when it encounters anything it cannot encode as CBOR. 200 the standard CBOR way.
190 201
191 This option does not affect "decode" in any way - string references 202 This option does not affect "decode" in any way - string references
192 will always be decoded properly if present. 203 will always be decoded properly if present.
193 204
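The following sketch shows one way to compare encodings with and without "pack_strings" on data with many repeated strings (here, repeated hash keys). The record layout and host names are invented for the example, and the actual sizes depend entirely on your data, so none are claimed here.

```perl
use strict;
use warnings;
use CBOR::XS;

# Many records that repeat the same hash keys over and over.
my @records = map { { hostname => "host$_.example.org", status => "ok" } } 1 .. 100;

my $plain  = CBOR::XS->new->encode (\@records);
my $packed = CBOR::XS->new->pack_strings->encode (\@records);

# Compare the sizes for your own data before deciding.
printf "plain: %d bytes, packed: %d bytes\n", length $plain, length $packed;

# String references are always understood by this module's decoder.
my $copy = CBOR::XS->new->decode ($packed);
print $copy->[99]{hostname}, "\n";
```

Since the encoding overhead is considerable, measuring both size and time on representative data is advisable before enabling the option.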
194 $cbor = $cbor->filter ([$cb->($tag, $value)]) 205 $cbor = $cbor->filter ([$cb->($tag, $value)])
218 it must be a code reference that is called with tag and value, and 229 it must be a code reference that is called with tag and value, and
219 is responsible for decoding the value. If no entry exists, it 230 is responsible for decoding the value. If no entry exists, it
220 returns no values. 231 returns no values.
221 232
222 Example: decode all tags not handled internally into 233 Example: decode all tags not handled internally into
223 CBOR::XS::Tagged objects, with no other special handling (useful 234 "CBOR::XS::Tagged" objects, with no other special handling (useful
224 when working with potentially "unsafe" CBOR data). 235 when working with potentially "unsafe" CBOR data).
225 236
226 CBOR::XS->new->filter (sub { })->decode ($cbor_data); 237 CBOR::XS->new->filter (sub { })->decode ($cbor_data);
227 238
228 Example: provide a global filter for tag 1347375694, converting the 239 Example: provide a global filter for tag 1347375694, converting the
269 integers 280 integers
270 CBOR integers become (numeric) perl scalars. On perls without 64 bit 281 CBOR integers become (numeric) perl scalars. On perls without 64 bit
271 support, 64 bit integers will be truncated or otherwise corrupted. 282 support, 64 bit integers will be truncated or otherwise corrupted.
272 283
273 byte strings 284 byte strings
274 Byte strings will become octet strings in Perl (the byte values 285 Byte strings will become octet strings in Perl (the Byte values
275 0..255 will simply become characters of the same value in Perl). 286 0..255 will simply become characters of the same value in Perl).
276 287
277 UTF-8 strings 288 UTF-8 strings
278 UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be 289 UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
279 decoded into proper Unicode code points. At the moment, the validity 290 decoded into proper Unicode code points. At the moment, the validity
297 308
298 tagged values 309 tagged values
299 Tagged items consist of a numeric tag and another CBOR value. 310 Tagged items consist of a numeric tag and another CBOR value.
300 311
301 See "TAG HANDLING AND EXTENSIONS" and the description of "->filter" 312 See "TAG HANDLING AND EXTENSIONS" and the description of "->filter"
302 for details. 313 for details on which tags are handled how.
303 314
304 anything else 315 anything else
305 Anything else (e.g. unsupported simple values) will raise a decoding 316 Anything else (e.g. unsupported simple values) will raise a decoding
306 error. 317 error.
307 318
308 PERL -> CBOR 319 PERL -> CBOR
309 The mapping from Perl to CBOR is slightly more difficult, as Perl is a 320 The mapping from Perl to CBOR is slightly more difficult, as Perl is a
310 truly typeless language, so we can only guess which CBOR type is meant 321 typeless language. That means this module can only guess which CBOR type
311 by a Perl value. 322 is meant by a perl value.
312 323
313 hash references 324 hash references
314 Perl hash references become CBOR maps. As there is no inherent 325 Perl hash references become CBOR maps. As there is no inherent
315 ordering in hash keys (or CBOR maps), they will usually be encoded 326 ordering in hash keys (or CBOR maps), they will usually be encoded
316 in a pseudo-random order. 327 in a pseudo-random order. This order can be different each time a
328 hash is encoded.
317 329
318 Currently, tied hashes will use the indefinite-length format, while 330 Currently, tied hashes will use the indefinite-length format, while
319 normal hashes will use the fixed-length format. 331 normal hashes will use the fixed-length format.
320 332
321 array references 333 array references
322 Perl array references become fixed-length CBOR arrays. 334 Perl array references become fixed-length CBOR arrays.
323 335
324 other references 336 other references
325 Other unblessed references are generally not allowed and will cause 337 Other unblessed references will be represented using the indirection
326 an exception to be thrown, except for references to the integers 0 338 tag extension (tag value 22098,
327 and 1, which get turned into false and true in CBOR. 339 <http://cbor.schmorp.de/indirection>). CBOR decoders are guaranteed
340 to be able to decode these values somehow, by either "doing the
341 right thing", decoding into a generic tagged object, simply ignoring
342 the tag, or something else.
328 343
329 CBOR::XS::Tagged objects 344 CBOR::XS::Tagged objects
330 Objects of this type must be arrays consisting of a single "[tag, 345 Objects of this type must be arrays consisting of a single "[tag,
331 value]" pair. The (numerical) tag will be encoded as a CBOR tag, the 346 value]" pair. The (numerical) tag will be encoded as a CBOR tag, the
332 value will be encoded as appropriate for the value. You cna use 347 value will be encoded as appropriate for the value. You must use
333 "CBOR::XS::tag" to create such objects. 348 "CBOR::XS::tag" to create such objects.
334 349
335 Types::Serialiser::true, Types::Serialiser::false, 350 Types::Serialiser::true, Types::Serialiser::false,
336 Types::Serialiser::error 351 Types::Serialiser::error
337 These special values become CBOR true, CBOR false and CBOR undefined 352 These special values become CBOR true, CBOR false and CBOR undefined
353 # dump as number 368 # dump as number
354 encode_cbor [2] # yields [2] 369 encode_cbor [2] # yields [2]
355 encode_cbor [-3.0e17] # yields [-3e+17] 370 encode_cbor [-3.0e17] # yields [-3e+17]
356 my $value = 5; encode_cbor [$value] # yields [5] 371 my $value = 5; encode_cbor [$value] # yields [5]
357 372
358 # used as string, so dump as string 373 # used as string, so dump as string (either byte or text)
359 print $value; 374 print $value;
360 encode_cbor [$value] # yields ["5"] 375 encode_cbor [$value] # yields ["5"]
361 376
362 # undef becomes null 377 # undef becomes null
363 encode_cbor [undef] # yields [null] 378 encode_cbor [undef] # yields [null]
366 381
367 my $x = 3.1; # some variable containing a number 382 my $x = 3.1; # some variable containing a number
368 "$x"; # stringified 383 "$x"; # stringified
369 $x .= ""; # another, more awkward way to stringify 384 $x .= ""; # another, more awkward way to stringify
370 print $x; # perl does it for you, too, quite often 385 print $x; # perl does it for you, too, quite often
386
387 You can force whether a string is encoded as byte or text string by
388 using "utf8::upgrade" and "utf8::downgrade":
389
390 utf8::upgrade $x; # encode $x as text string
391 utf8::downgrade $x; # encode $x as byte string
392
393 Perl doesn't define what operations up- and downgrade strings, so if
394 the difference between byte and text is important, you should up- or
395 downgrade your string as late as possible before encoding.
371 396
372 You can force the type to be a CBOR number by numifying it: 397 You can force the type to be a CBOR number by numifying it:
373 398
374 my $x = "3"; # some variable containing a string 399 my $x = "3"; # some variable containing a string
375 $x += 0; # numify it, ensuring it will be dumped as a number 400 $x += 0; # numify it, ensuring it will be dumped as a number
385 the IEEE double format will be used. Perls that use formats other 410 the IEEE double format will be used. Perls that use formats other
386 than IEEE double to represent numerical values are supported, but 411 than IEEE double to represent numerical values are supported, but
387 might suffer loss of precision. 412 might suffer loss of precision.
388 413
389 OBJECT SERIALISATION 414 OBJECT SERIALISATION
415 This module implements both a CBOR-specific and the generic
416 Types::Serialiser object serialisation protocol. The following
417 subsections explain both methods.
418
419 ENCODING
390 This module knows two ways to serialise a Perl object: The CBOR-specific 420 This module knows two ways to serialise a Perl object: The CBOR-specific
391 way, and the generic way. 421 way, and the generic way.
392 422
393 Whenever the encoder encounters a Perl object that it cnanot serialise 423 Whenever the encoder encounters a Perl object that it cannot serialise
394 directly (most of them), it will first look up the "TO_CBOR" method on 424 directly (most of them), it will first look up the "TO_CBOR" method on
395 it. 425 it.
396 426
397 If it has a "TO_CBOR" method, it will call it with the object as only 427 If it has a "TO_CBOR" method, it will call it with the object as only
398 argument, and expects exactly one return value, which it will then 428 argument, and expects exactly one return value, which it will then
403 "CBOR" as the second argument, to distinguish it from other serialisers. 433 "CBOR" as the second argument, to distinguish it from other serialisers.
404 434
405 The "FREEZE" method can return any number of values (i.e. zero or more). 435 The "FREEZE" method can return any number of values (i.e. zero or more).
406 These will be encoded as CBOR perl object, together with the classname. 436 These will be encoded as CBOR perl object, together with the classname.
407 437
438 These methods *MUST NOT* change the data structure that is being
439 serialised. Failure to comply with this can result in memory corruption -
440 and worse.
441
408 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail 442 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
409 with an error. 443 with an error.
410 444
445 DECODING
411 Objects encoded via "TO_CBOR" cannot be automatically decoded, but 446 Objects encoded via "TO_CBOR" cannot (normally) be automatically
412 objects encoded via "FREEZE" can be decoded using the following 447 decoded, but objects encoded via "FREEZE" can be decoded using the
413 protocol: 448 following protocol:
414 449
415 When an encoded CBOR perl object is encountered by the decoder, it will 450 When an encoded CBOR perl object is encountered by the decoder, it will
416 look up the "THAW" method, by using the stored classname, and will fail 451 look up the "THAW" method, by using the stored classname, and will fail
417 if the method cannot be found. 452 if the method cannot be found.
418 453
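A small sketch of the generic "FREEZE"/"THAW" round trip, using a hypothetical "My::Point" class. It assumes the argument order of the Types::Serialiser protocol, in which "THAW" receives the class name, the serialiser name ("CBOR") and then the values that "FREEZE" returned.

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Point;

sub new {
   my ($class, %args) = @_;
   bless { %args }, $class
}

# Called by the encoder; must not modify the object being serialised.
sub FREEZE {
   my ($self, $serialiser) = @_;
   ($self->{x}, $self->{y})   # any number of values may be returned
}

# Called by the decoder as a class method with the frozen values.
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   $class->new (x => $x, y => $y)
}

package main;

my $cbor = CBOR::XS::encode_cbor (My::Point->new (x => 1, y => 2));
my $copy = CBOR::XS::decode_cbor ($cbor);

print ref $copy, ": $copy->{x}, $copy->{y}\n";
```

Because "THAW" is looked up via the class name stored in the data, decoding untrusted input this way can call into any loaded class that provides such a method, which is worth keeping in mind.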
439 474
440 sub URI::TO_CBOR { 475 sub URI::TO_CBOR {
441 my ($self) = @_; 476 my ($self) = @_;
442 my $uri = "$self"; # stringify uri 477 my $uri = "$self"; # stringify uri
443 utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string 478 utf8::upgrade $uri; # make sure it will be encoded as UTF-8 string
444 CBOR::XS::tagged 32, "$_[0]" 479 CBOR::XS::tag 32, $uri
445 } 480 }
446 481
447 This will encode URIs as a UTF-8 string with tag 32, which indicates an 482 This will encode URIs as a UTF-8 string with tag 32, which indicates an
448 URI. 483 URI.
449 484
568 603
569 ENFORCED TAGS 604 ENFORCED TAGS
570 These tags are always handled when decoding, and their handling cannot 605 These tags are always handled when decoding, and their handling cannot
571 be overridden by the user. 606 be overridden by the user.
572 607
573 <unassigned> (perl-object, <http://cbor.schmorp.de/perl-object>) 608 26 (perl-object, <http://cbor.schmorp.de/perl-object>)
574 These tags are automatically created (and decoded) for serialisable 609 These tags are automatically created (and decoded) for serialisable
575 objects using the "FREEZE/THAW" methods (the Types::Serialiser object 610 objects using the "FREEZE/THAW" methods (the Types::Serialiser object
576 serialisation protocol). See "OBJECT SERIALISATION" for details. 611 serialisation protocol). See "OBJECT SERIALISATION" for details.
577 612
578 <unassigned>, <unassigned> (sharable, sharedref, L 613 28, 29 (shareable, sharedref, L <http://cbor.schmorp.de/value-sharing>)
579 <http://cbor.schmorp.de/value-sharing>)
580 These tags are automatically decoded when encountered, resulting in 614 These tags are automatically decoded when encountered (and they do
615 not result in a cyclic data structure, see "allow_cycles"),
581 shared values in the decoded object. They are only encoded, however, 616 resulting in shared values in the decoded object. They are only
582 when "allow_sharable" is enabled. 617 encoded, however, when "allow_sharing" is enabled.
583 618
584 <unassigned>, <unassigned> (stringref-namespace, stringref, L 619 Not all shared values can be successfully decoded: values that
620 reference themselves will *currently* decode as "undef" (this is not
621 the same as a reference pointing to itself, which will be
622 represented as a value that contains an indirect reference to itself
623 - these will be decoded properly).
624
625 Note that considerably more shared value data structures can be
626 decoded than will be encoded - currently, only values pointed to by
627 references will be shared, others will not. While non-reference
628 shared values can be generated in Perl with some effort, they were
629 considered too unimportant to be supported in the encoder. The
630 decoder, however, will decode these values as shared values.
631
632 256, 25 (stringref-namespace, stringref, L
585 <http://cbor.schmorp.de/stringref>) 633 <http://cbor.schmorp.de/stringref>)
586 These tags are automatically decoded when encountered. They are only 634 These tags are automatically decoded when encountered. They are only
587 encoded, however, when "allow_stringref" is enabled. 635 encoded, however, when "pack_strings" is enabled.
588 636
589 22098 (indirection, <http://cbor.schmorp.de/indirection>) 637 22098 (indirection, <http://cbor.schmorp.de/indirection>)
590 This tag is automatically generated when a reference is encountered 638 This tag is automatically generated when a reference is encountered
591 (with the exception of hash and array references). It is converted to 639 (with the exception of hash and array references). It is converted to
592 a reference when decoding. 640 a reference when decoding.
693 uses long double to represent floating point values, they might not be 741 uses long double to represent floating point values, they might not be
694 encoded properly. Half precision types are accepted, but not encoded. 742 encoded properly. Half precision types are accepted, but not encoded.
695 743
696 Strict mode and canonical mode are not implemented. 744 Strict mode and canonical mode are not implemented.
697 745
746 LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
747 On perls that were built without 64 bit integer support (these are rare
748 nowadays, even on 32 bit architectures), support for any kind of 64 bit
749 integer in CBOR is very limited - most likely, these 64 bit values will
750 be truncated, corrupted, or otherwise not decoded correctly. This also
751 includes string, array and map sizes that are stored as 64 bit integers.
752
698 THREADS 753 THREADS
699 This module is *not* guaranteed to be thread safe and there are no plans 754 This module is *not* guaranteed to be thread safe and there are no plans
700 to change this until Perl gets thread support (as opposed to the 755 to change this until Perl gets thread support (as opposed to the
701 horribly slow so-called "threads" which are simply slow and bloated 756 horribly slow so-called "threads" which are simply slow and bloated
702 process simulations - use fork, it's *much* faster, cheaper, better). 757 process simulations - use fork, it's *much* faster, cheaper, better).
