/cvs/CBOR-XS/README

Comparing CBOR-XS/README (file contents):
Revision 1.10 by root, Thu Nov 28 16:09:04 2013 UTC vs.
Revision 1.16 by root, Mon Feb 8 04:37:12 2016 UTC

44 about 20% smaller than the same data encoded as (compact) JSON or 44 about 20% smaller than the same data encoded as (compact) JSON or
45 Storable. 45 Storable.
46 46
47 In addition to the core CBOR data format, this module implements a 47 In addition to the core CBOR data format, this module implements a
48 number of extensions, to support cyclic and shared data structures (see 48 number of extensions, to support cyclic and shared data structures (see
49 "allow_sharing"), string deduplication (see "pack_strings") and scalar 49 "allow_sharing" and "allow_cycles"), string deduplication (see
50 references (always enabled). 50 "pack_strings") and scalar references (always enabled).
51 51
52 The primary goal of this module is to be *correct* and the secondary 52 The primary goal of this module is to be *correct* and the secondary
53 goal is to be *fast*. To reach the latter goal it was written in C. 53 goal is to be *fast*. To reach the latter goal it was written in C.
54 54
55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and 55 See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
141 instead will emit a reference to the earlier value. 141 instead will emit a reference to the earlier value.
142 142
143 This means that such values will only be encoded once, and will not 143 This means that such values will only be encoded once, and will not
144 result in a deep cloning of the value on decode, in decoders 144 result in a deep cloning of the value on decode, in decoders
145 supporting the value sharing extension. This also makes it possible 145 supporting the value sharing extension. This also makes it possible
146 to encode cyclic data structures. 146 to encode cyclic data structures (which need "allow_cycles" to be
147 enabled to be decoded by this module).
147 148
148 It is recommended to leave it off unless you know your communication 149 It is recommended to leave it off unless you know your communication
149 partner supports the value sharing extensions to CBOR 150 partner supports the value sharing extensions to CBOR
150 (<http://cbor.schmorp.de/value-sharing>), as without decoder 151 (<http://cbor.schmorp.de/value-sharing>), as without decoder
151 support, the resulting data structure might be unusable. 152 support, the resulting data structure might be unusable.
152 153
153 Detecting shared values incurs a runtime overhead when values are 154 Detecting shared values incurs a runtime overhead when values are
154 encoded that have a reference counter larger than one, and might 155 encoded that have a reference counter larger than one, and might
155 unnecessarily increase the encoded size, as potentially shared 156 unnecessarily increase the encoded size, as potentially shared
156 values are encoded as sharable whether or not they are actually 157 values are encoded as shareable whether or not they are actually
157 shared. 158 shared.
158 159
159 At the moment, only targets of references can be shared (e.g. 160 At the moment, only targets of references can be shared (e.g.
160 scalars, arrays or hashes pointed to by a reference). Weirder 161 scalars, arrays or hashes pointed to by a reference). Weirder
161 constructs, such as an array with multiple "copies" of the *same* 162 constructs, such as an array with multiple "copies" of the *same*
166 data structures repeatedly, unsharing them in the process. Cyclic 167 data structures repeatedly, unsharing them in the process. Cyclic
167 data structures cannot be encoded in this mode. 168 data structures cannot be encoded in this mode.
168 169
169 This option does not affect "decode" in any way - shared values and 170 This option does not affect "decode" in any way - shared values and
170 references will always be decoded properly if present. 171 references will always be decoded properly if present.
172
173 $cbor = $cbor->allow_cycles ([$enable])
174 $enabled = $cbor->get_allow_cycles
175 If $enable is true (or missing), then "decode" will happily decode
176 self-referential (cyclic) data structures. By default these will not
177 be decoded, as they need manual cleanup to avoid memory leaks, so
178 code that isn't prepared for this will not leak memory.
179
180 If $enable is false (the default), then "decode" will throw an error
181 when it encounters a self-referential/cyclic data structure.
182
183 FUTURE DIRECTION: the motivation behind this option is to avoid
184 *real* cycles - future versions of this module might choose to decode
185 cyclic data structures using weak references when this option is
186 off, instead of throwing an error.
187
188 This option does not affect "encode" in any way - shared values and
189 references will always be encoded properly if present.
171 190
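The interplay between "allow_sharing" on the encoder and "allow_cycles" on the decoder can be sketched as follows. This is a minimal illustration, not taken from the README: the variable names are hypothetical, and the manual cleanup at the end is one possible way to break the cycle.

```perl
use strict;
use warnings;
use CBOR::XS;

# A structure that refers back to itself (hypothetical example data).
my $node = { name => "root" };
$node->{self} = $node;

# Cyclic structures can only be encoded with value sharing enabled.
my $data = CBOR::XS->new->allow_sharing->encode ($node);

# With allow_cycles off (the default), decode is documented to throw.
my $default_ok = eval { CBOR::XS->new->decode ($data); 1 };

# Opting in makes the decoder rebuild the cycle.
my $copy = CBOR::XS->new->allow_cycles->decode ($data);
my $is_cyclic = $copy->{self} == $copy;

# Cycles need manual cleanup, otherwise perl never frees them.
delete $copy->{self};
delete $node->{self};
```

Breaking the cycle by hand (or weakening one of the references) is the price of enabling "allow_cycles", which is why it is off by default.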
172 $cbor = $cbor->pack_strings ([$enable]) 191 $cbor = $cbor->pack_strings ([$enable])
173 $enabled = $cbor->get_pack_strings 192 $enabled = $cbor->get_pack_strings
174 If $enable is true (or missing), then "encode" will try not to 193 If $enable is true (or missing), then "encode" will try not to
175 encode the same string twice, but will instead encode a reference to 194 encode the same string twice, but will instead encode a reference to
186 the standard CBOR way. 205 the standard CBOR way.
187 206
188 This option does not affect "decode" in any way - string references 207 This option does not affect "decode" in any way - string references
189 will always be decoded properly if present. 208 will always be decoded properly if present.
190 209
210 $cbor = $cbor->validate_utf8 ([$enable])
211 $enabled = $cbor->get_validate_utf8
212 If $enable is true (or missing), then "decode" will validate that
213 elements (text strings) containing UTF-8 data in fact contain valid
214 UTF-8 data (instead of blindly accepting it). This validation
215 obviously takes extra time during decoding.
216
217 The concept of "valid UTF-8" used is perl's concept, which is a
218 superset of the official UTF-8.
219
220 If $enable is false (the default), then "decode" will blindly accept
221 UTF-8 data, marking them as valid UTF-8 in the resulting data
222 structure regardless of whether that's true or not.
223
224 Perl isn't too happy about corrupted UTF-8 in strings, but should
225 generally not crash or do similarly evil things. Extensions might be
226 not so forgiving, so it's recommended to turn on this setting if you
227 receive untrusted CBOR.
228
229 This option does not affect "encode" in any way - strings that are
230 supposedly valid UTF-8 will simply be dumped into the resulting CBOR
231 string without checking whether that is, in fact, true or not.
232
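As a rough illustration of the difference, the sketch below feeds a hand-built CBOR text string with invalid UTF-8 to two decoders. The input octets and variable names are assumptions made for this example only.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hand-built CBOR: 0x61 is a text string of length 1, followed by a lone
# continuation octet (0x80), which is not valid UTF-8 on its own.
my $bad = "\x61\x80";

my $lenient = CBOR::XS->new;                  # validate_utf8 is off by default
my $strict  = CBOR::XS->new->validate_utf8;   # enable validation

# The lenient decoder accepts the data blindly...
my $lenient_ok = eval { $lenient->decode ($bad); 1 };

# ...while the validating decoder is expected to throw an exception.
my $strict_ok = eval { $strict->decode ($bad); 1 };

print "lenient decoder: ", ($lenient_ok ? "accepted" : "rejected"), "\n";
print "strict decoder:  ", ($strict_ok  ? "accepted" : "rejected"), "\n";
```

Note that the sketch never prints or otherwise inspects the leniently decoded string, since it is marked as UTF-8 without actually being valid.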
191 $cbor = $cbor->filter ([$cb->($tag, $value)]) 233 $cbor = $cbor->filter ([$cb->($tag, $value)])
192 $cb_or_undef = $cbor->get_filter 234 $cb_or_undef = $cbor->get_filter
193 Sets or replaces the tagged value decoding filter (when $cb is 235 Sets or replaces the tagged value decoding filter (when $cb is
194 specified) or clears the filter (if no argument or "undef" is 236 specified) or clears the filter (if no argument or "undef" is
195 provided). 237 provided).
249 protocol and you need to know where the first CBOR string ends and 291 protocol and you need to know where the first CBOR string ends and
250 the next one starts. 292 the next one starts.
251 293
252 CBOR::XS->new->decode_prefix ("......") 294 CBOR::XS->new->decode_prefix ("......")
253 => ("...", 3) 295 => ("...", 3)
296
297 INCREMENTAL PARSING
298 In some cases, there is the need for incremental parsing of CBOR data.
299 While this module always has to keep both CBOR text and resulting Perl
300 data structure in memory at one time, it does allow you to parse a CBOR
301 stream incrementally. This is similar to using "decode_prefix" to see if
302 a full CBOR object is available, but is much more efficient.
303
304 It basically works by parsing as much of a CBOR string as possible - if
305 the CBOR data is not complete yet, the parser will remember where it
306 was, to be able to restart when more data has been accumulated. Once
307 enough data is available to either decode a complete CBOR value or raise
308 an error, a real decode will be attempted.
309
310 A typical use case would be a network protocol that consists of sending
311 and receiving CBOR-encoded messages. The solution that works with CBOR
312 and about anything else is by prepending a length to every CBOR value,
313 so the receiver knows how many octets to read. More compact (and
314 slightly slower) would be to just send CBOR values back-to-back, as
315 "CBOR::XS" knows where a CBOR value ends, and doesn't need an explicit
316 length.
317
318 The following methods help with this:
319
320 @decoded = $cbor->incr_parse ($buffer)
321 This method attempts to decode exactly one CBOR value from the
322 beginning of the given $buffer. The value is removed from the
323 $buffer on success. When $buffer doesn't contain a complete value
324 yet, it returns nothing. Finally, when the $buffer doesn't start
325 with something that could ever be a valid CBOR value, it raises an
326 exception, just as "decode" would. In the latter case the decoder
327 state is undefined and must be reset before being able to parse
328 further.
329
330 This method modifies the $buffer in place. When no CBOR value can be
331 decoded, the decoder stores the current string offset. On the next
332 call, it continues decoding at the place where it stopped before. For
333 this to make sense, the $buffer must begin with the same octets as
334 on previous unsuccessful calls.
335
336 You can call this method in scalar context, in which case it either
337 returns a decoded value or "undef". This makes it impossible to
338 distinguish between CBOR null values (which decode to "undef") and
339 an unsuccessful decode, which is often acceptable.
340
341 @decoded = $cbor->incr_parse_multiple ($buffer)
342 Same as "incr_parse", but attempts to decode as many CBOR values as
343 possible in one go, instead of at most one. Calls to "incr_parse"
344 and "incr_parse_multiple" can be interleaved.
345
346 $cbor->incr_reset
347 Resets the incremental decoder. This throws away any saved state, so
348 that subsequent calls to "incr_parse" or "incr_parse_multiple" start
349 to parse a new CBOR value from the beginning of the $buffer again.
350
351 This method can be called at any time, but it *must* be called if you
352 want to change your $buffer or there was a decoding error and you
353 want to reuse the $cbor object for future incremental parsings.
254 354
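The incremental methods above might be used in a read loop along the following lines. This is only a sketch: the two-octet "chunks" stand in for data arriving from a socket, the variable names are hypothetical, and discarding the buffer on error is just one possible recovery policy.

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor = CBOR::XS->new;

# Two CBOR values sent back-to-back, without any length prefix.
my $stream = $cbor->encode ([1, 2, 3]) . $cbor->encode ({ ok => 1 });

my $buffer = "";   # octets received so far that were not decoded yet
my @messages;      # fully decoded values

# Simulate a network read loop that delivers two octets at a time.
while (length $stream) {
   $buffer .= substr $stream, 0, 2, "";

   # Decoded values are removed from the front of $buffer in place.
   my @decoded = eval { $cbor->incr_parse_multiple ($buffer) };

   if ($@) {
      # After a decoding error the decoder state is undefined, so reset
      # it and discard the buffer before parsing anything else.
      $cbor->incr_reset;
      $buffer = "";
      next;
   }

   push @messages, @decoded;
}

printf "decoded %d values, %d octets left over\n",
   scalar @messages, length $buffer;
```

Because new octets are only ever appended, the buffer always begins with the same octets as on the previous unsuccessful call, which is what the decoder requires.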
255 MAPPING 355 MAPPING
256 This section describes how CBOR::XS maps Perl values to CBOR values and 356 This section describes how CBOR::XS maps Perl values to CBOR values and
257 vice versa. These mappings are designed to "do the right thing" in most 357 vice versa. These mappings are designed to "do the right thing" in most
258 circumstances automatically, preserving round-tripping characteristics 358 circumstances automatically, preserving round-tripping characteristics
396 the IEEE double format will be used. Perls that use formats other 496 the IEEE double format will be used. Perls that use formats other
397 than IEEE double to represent numerical values are supported, but 497 than IEEE double to represent numerical values are supported, but
398 might suffer loss of precision. 498 might suffer loss of precision.
399 499
400 OBJECT SERIALISATION 500 OBJECT SERIALISATION
501 This module implements both a CBOR-specific and the generic
502 Types::Serialiser object serialisation protocol. The following
503 subsections explain both methods.
504
505 ENCODING
401 This module knows two ways to serialise a Perl object: The CBOR-specific 506 This module knows two ways to serialise a Perl object: The CBOR-specific
402 way, and the generic way. 507 way, and the generic way.
403 508
404 Whenever the encoder encounters a Perl object that it cnanot serialise 509 Whenever the encoder encounters a Perl object that it cannot serialise
405 directly (most of them), it will first look up the "TO_CBOR" method on 510 directly (most of them), it will first look up the "TO_CBOR" method on
406 it. 511 it.
407 512
408 If it has a "TO_CBOR" method, it will call it with the object as only 513 If it has a "TO_CBOR" method, it will call it with the object as only
409 argument, and expects exactly one return value, which it will then 514 argument, and expects exactly one return value, which it will then
414 "CBOR" as the second argument, to distinguish it from other serialisers. 519 "CBOR" as the second argument, to distinguish it from other serialisers.
415 520
416 The "FREEZE" method can return any number of values (i.e. zero or more). 521 The "FREEZE" method can return any number of values (i.e. zero or more).
417 These will be encoded as CBOR perl object, together with the classname. 522 These will be encoded as CBOR perl object, together with the classname.
418 523
524 These methods *MUST NOT* change the data structure that is being
525 serialised. Failure to comply with this can result in memory corruption -
526 and worse.
527
419 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail 528 If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
420 with an error. 529 with an error.
421 530
531 DECODING
422 Objects encoded via "TO_CBOR" cannot be automatically decoded, but 532 Objects encoded via "TO_CBOR" cannot (normally) be automatically
423 objects encoded via "FREEZE" can be decoded using the following 533 decoded, but objects encoded via "FREEZE" can be decoded using the
424 protocol: 534 following protocol:
425 535
426 When an encoded CBOR perl object is encountered by the decoder, it will 536 When an encoded CBOR perl object is encountered by the decoder, it will
427 look up the "THAW" method, by using the stored classname, and will fail 537 look up the "THAW" method, by using the stored classname, and will fail
428 if the method cannot be found. 538 if the method cannot be found.
429 539
584 26 (perl-object, <http://cbor.schmorp.de/perl-object>) 694 26 (perl-object, <http://cbor.schmorp.de/perl-object>)
585 These tags are automatically created (and decoded) for serialisable 695 These tags are automatically created (and decoded) for serialisable
586 objects using the "FREEZE/THAW" methods (the Types::Serialiser object 696 objects using the "FREEZE/THAW" methods (the Types::Serialiser object
587 serialisation protocol). See "OBJECT SERIALISATION" for details. 697 serialisation protocol). See "OBJECT SERIALISATION" for details.
588 698
589 28, 29 (sharable, sharedref, L <http://cbor.schmorp.de/value-sharing>) 699 28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
590 These tags are automatically decoded when encountered, resulting in 700 These tags are automatically decoded when encountered (and they do
701 not result in a cyclic data structure, see "allow_cycles"),
591 shared values in the decoded object. They are only encoded, however, 702 resulting in shared values in the decoded object. They are only
592 when "allow_sharable" is enabled. 703 encoded, however, when "allow_sharing" is enabled.
593 704
705 Not all shared values can be successfully decoded: values that
706 reference themselves will *currently* decode as "undef" (this is not
707 the same as a reference pointing to itself, which will be
708 represented as a value that contains an indirect reference to itself
709 - these will be decoded properly).
710
711 Note that considerably more shared value data structures can be
712 decoded than will be encoded - currently, only values pointed to by
713 references will be shared, others will not. While non-reference
714 shared values can be generated in Perl with some effort, they were
715 considered too unimportant to be supported in the encoder. The
716 decoder, however, will decode these values as shared values.
717
594 256, 25 (stringref-namespace, stringref, L 718 256, 25 (stringref-namespace, stringref,
595 <http://cbor.schmorp.de/stringref>) 719 <http://cbor.schmorp.de/stringref>)
596 These tags are automatically decoded when encountered. They are only 720 These tags are automatically decoded when encountered. They are only
597 encoded, however, when "pack_strings" is enabled. 721 encoded, however, when "pack_strings" is enabled.
598 722
599 22098 (indirection, <http://cbor.schmorp.de/indirection>) 723 22098 (indirection, <http://cbor.schmorp.de/indirection>)
615 739
616 When any of these need to load additional modules that are not part of 740 When any of these need to load additional modules that are not part of
617 the perl core distribution (e.g. URI), it is (currently) up to the user 741 the perl core distribution (e.g. URI), it is (currently) up to the user
618 to provide these modules. The decoding usually fails with an exception 742 to provide these modules. The decoding usually fails with an exception
619 if the required module cannot be loaded. 743 if the required module cannot be loaded.
744
745 0, 1 (date/time string, seconds since the epoch)
746 These tags are decoded into Time::Piece objects. The corresponding
747 "Time::Piece::TO_CBOR" method always encodes into tag 1 values
748 currently.
749
750 The Time::Piece API is generally surprisingly bad, and fractional
751 seconds are only accidentally kept intact, so watch out. On the plus
752 side, the module comes with perl since 5.10, which has to count for
753 something.
620 754
621 2, 3 (positive/negative bignum) 755 2, 3 (positive/negative bignum)
622 These tags are decoded into Math::BigInt objects. The corresponding 756 These tags are decoded into Math::BigInt objects. The corresponding
623 "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal 757 "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
624 CBOR integers, and others into positive/negative CBOR bignums. 758 CBOR integers, and others into positive/negative CBOR bignums.
703 uses long double to represent floating point values, they might not be 837 uses long double to represent floating point values, they might not be
704 encoded properly. Half precision types are accepted, but not encoded. 838 encoded properly. Half precision types are accepted, but not encoded.
705 839
706 Strict mode and canonical mode are not implemented. 840 Strict mode and canonical mode are not implemented.
707 841
842 LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
843 On perls that were built without 64 bit integer support (these are rare
844 nowadays, even on 32 bit architectures, as all major Perl distributions
845 are built with 64 bit integer support), support for any kind of 64 bit
846 integer in CBOR is very limited - most likely, these 64 bit values will
847 be truncated, corrupted, or otherwise not decoded correctly. This also
848 includes string, array and map sizes that are stored as 64 bit integers.
849
708 THREADS 850 THREADS
709 This module is *not* guaranteed to be thread safe and there are no plans 851 This module is *not* guaranteed to be thread safe and there are no plans
710 to change this until Perl gets thread support (as opposed to the 852 to change this until Perl gets thread support (as opposed to the
711 horribly slow so-called "threads" which are simply slow and bloated 853 horribly slow so-called "threads" which are simply slow and bloated
712 process simulations - use fork, it's *much* faster, cheaper, better). 854 process simulations - use fork, it's *much* faster, cheaper, better).
