Comparing CBOR-XS/README (file contents):
Revision 1.10 by root, Thu Nov 28 16:09:04 2013 UTC vs.
Revision 1.13 by root, Sun Jan 5 14:24:54 2014 UTC

[...]
    about 20% smaller than the same data encoded as (compact) JSON or
    Storable.

    In addition to the core CBOR data format, this module implements a
    number of extensions, to support cyclic and shared data structures (see
    "allow_sharing" and "allow_cycles"), string deduplication (see
    "pack_strings") and scalar references (always enabled).
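
    A minimal sketch of the always-enabled scalar reference extension,
    assuming a current CBOR::XS from CPAN (whose functional interface
    exports "encode_cbor" and "decode_cbor" by default): a reference to a
    plain scalar survives the round trip as a scalar reference instead of
    being flattened into the string it points to.

```perl
use strict;
use warnings;
use CBOR::XS;    # exports encode_cbor and decode_cbor by default

my $text    = "hello";
my $decoded = decode_cbor (encode_cbor ([ \$text ]));    # an array holding a scalar reference

# The element comes back as a reference to a scalar, not as the plain string.
print ref ($decoded->[0]), " -> ${ $decoded->[0] }\n";    # prints "SCALAR -> hello"
```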

    The primary goal of this module is to be *correct* and the secondary
    goal is to be *fast*. To reach the latter goal it was written in C.

    See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
[...]
        instead will emit a reference to the earlier value.

        This means that such values will only be encoded once, and will not
        result in a deep cloning of the value on decode, in decoders
        supporting the value sharing extension. This also makes it possible
        to encode cyclic data structures (which need "allow_cycles" to be
        enabled to be decoded by this module).

        It is recommended to leave it off unless you know your communication
        partner supports the value sharing extensions to CBOR
        (<http://cbor.schmorp.de/value-sharing>), as without decoder
        support, the resulting data structure might be unusable.

        Detecting shared values incurs a runtime overhead when values are
        encoded that have a reference count larger than one, and might
        unnecessarily increase the encoded size, as potentially shared
        values are encoded as shareable whether or not they are actually
        shared.

        At the moment, only targets of references can be shared (e.g.
        scalars, arrays or hashes pointed to by a reference). Weirder
        constructs, such as an array with multiple "copies" of the *same*
[...]
        If $enable is false (the default), then "encode" will encode shared
        data structures repeatedly, unsharing them in the process. Cyclic
        data structures cannot be encoded in this mode.

        This option does not affect "decode" in any way - shared values and
        references will always be decoded properly if present.
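
        A small sketch of what "allow_sharing" changes, assuming a current
        CBOR::XS: the same array is referenced twice, and only the sharing
        encoder preserves that identity across a round trip. Comparing two
        references with "==" compares the addresses they point to.

```perl
use strict;
use warnings;
use CBOR::XS;

my $shared = [1, 2, 3];
my $data   = [$shared, $shared];    # the same array, referenced twice

# Default encoder: the array is written out twice and decodes as two copies.
my $unshared = CBOR::XS->new->decode (CBOR::XS->new->encode ($data));

# allow_sharing: the second occurrence is encoded as a reference to the first.
my $cbor     = CBOR::XS->new->allow_sharing;
my $restored = $cbor->decode ($cbor->encode ($data));

# Comparing references numerically compares the addresses they point to.
printf "default: %s, allow_sharing: %s\n",
   ($unshared->[0] == $unshared->[1] ? "shared" : "copied"),
   ($restored->[0] == $restored->[1] ? "shared" : "copied");
```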

    $cbor = $cbor->allow_cycles ([$enable])
    $enabled = $cbor->get_allow_cycles
        If $enable is true (or missing), then "decode" will happily decode
        self-referential (cyclic) data structures. By default these will not
        be decoded, as they need manual cleanup to avoid memory leaks, so
        code that isn't prepared for this will not leak memory.

        If $enable is false (the default), then "decode" will throw an error
        when it encounters a self-referential/cyclic data structure.

        This option does not affect "encode" in any way.
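
        A sketch of a full cycle round trip, assuming a current CBOR::XS:
        encoding the cycle needs "allow_sharing", decoding it needs
        "allow_cycles", and the caller remains responsible for breaking the
        cycle afterwards. The hash layout is made up for the example.

```perl
use strict;
use warnings;
use CBOR::XS;

my $node = { name => "loop" };
$node->{self} = $node;    # the hash now refers to itself

# Cyclic data can only be encoded with allow_sharing enabled.
my $bytes = CBOR::XS->new->allow_sharing->encode ($node);

# By default, decode throws an error instead of building the cycle.
my $refused = !eval { CBOR::XS->new->decode ($bytes); 1 };

# With allow_cycles, the self-reference is rebuilt.
my $back     = CBOR::XS->new->allow_cycles->decode ($bytes);
my $is_cycle = $back->{self} == $back;

print "default decode refused the cycle\n" if $refused;
print "allow_cycles restored the cycle\n"  if $is_cycle;

# Break both cycles by hand, otherwise this memory is never freed.
delete $node->{self};
delete $back->{self};
```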

    $cbor = $cbor->pack_strings ([$enable])
    $enabled = $cbor->get_pack_strings
        If $enable is true (or missing), then "encode" will try not to
[...]
        the standard CBOR way.

        This option does not affect "decode" in any way - string references
        will always be decoded properly if present.
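
        A sketch comparing encoded sizes with and without "pack_strings",
        assuming a current CBOR::XS. The exact octet counts depend on the
        data, so the code only prints them; the repeated string is
        arbitrary example data.

```perl
use strict;
use warnings;
use CBOR::XS;

# Example data: the same string ten times (think repeated keys or status values).
my $data = [ ("some repeated value") x 10 ];

my $plain  = CBOR::XS->new->encode ($data);
my $packed = CBOR::XS->new->pack_strings->encode ($data);

printf "plain: %d octets, pack_strings: %d octets\n",
   length $plain, length $packed;

# Decoding needs no option: string references are always resolved.
my $back = CBOR::XS->new->decode ($packed);
```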

    $cbor = $cbor->validate_utf8 ([$enable])
    $enabled = $cbor->get_validate_utf8
        If $enable is true (or missing), then "decode" will validate that
        elements (text strings) containing UTF-8 data in fact contain valid
        UTF-8 data (instead of blindly accepting it). This validation
        obviously takes extra time during decoding.

        The concept of "valid UTF-8" used is perl's concept, which is a
        superset of the official UTF-8.

        If $enable is false (the default), then "decode" will blindly accept
        UTF-8 data, marking them as valid UTF-8 in the resulting data
        structure regardless of whether that's true or not.

        Perl isn't too happy about corrupted UTF-8 in strings, but should
        generally not crash or do similarly evil things. Extensions might
        not be so forgiving, so it's recommended to turn on this setting if
        you receive untrusted CBOR.

        This option does not affect "encode" in any way - strings that are
        supposedly valid UTF-8 will simply be dumped into the resulting CBOR
        string without checking whether that is, in fact, true or not.
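
        A sketch of the difference "validate_utf8" makes, assuming a
        current CBOR::XS. The input is hand-assembled: 0x62 starts a CBOR
        text string of two octets, and 0xc3 0x28 is malformed UTF-8 because
        0xc3 must be followed by a continuation octet (0x80 to 0xbf).

```perl
use strict;
use warnings;
use CBOR::XS;

# 0x62: text string of 2 octets; 0xc3 0x28: malformed UTF-8.
my $corrupt = "\x62\xc3\x28";

# Default: the octets are accepted and flagged as UTF-8 without any check.
my $lax = CBOR::XS->new->decode ($corrupt);

# With validate_utf8, the same input makes decode throw an exception.
my $strict_ok = eval { CBOR::XS->new->validate_utf8->decode ($corrupt); 1 };
my $verdict   = $strict_ok ? "accepted\n" : "rejected: $@";
print $verdict;
```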

    $cbor = $cbor->filter ([$cb->($tag, $value)])
    $cb_or_undef = $cbor->get_filter
        Sets or replaces the tagged value decoding filter (when $cb is
        specified) or clears the filter (if no argument or "undef" is
        provided).
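
        The lines above only cover how the filter is set. The sketch below
        additionally assumes the callback semantics documented for current
        CBOR::XS releases: returning exactly one value replaces the tagged
        value, returning nothing falls back to default handling. Tag
        1234567 and its "kelvin" meaning are hypothetical.

```perl
use strict;
use warnings;
use CBOR::XS;

# Hypothetical application tag meaning "temperature in kelvin".
use constant TAG_KELVIN => 1234567;

my $bytes = encode_cbor [ CBOR::XS::tag (TAG_KELVIN, 300) ];

my $cbor = CBOR::XS->new->filter (sub {
   my ($tag, $value) = @_;
   return $value - 273.15 if $tag == TAG_KELVIN;   # one value: replaces the tagged value
   return;                                         # no values: default handling
});

my $decoded = $cbor->decode ($bytes);
printf "%.2f degrees Celsius\n", $decoded->[0];    # prints "26.85 degrees Celsius"
```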
[...]
        protocol and you need to know where the first CBOR string ends and
        the next one starts.

           CBOR::XS->new->decode_prefix ("......")
           => ("...", 3)
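
        A sketch of splitting back-to-back values with "decode_prefix",
        assuming a current CBOR::XS: the second return value is the number
        of octets consumed, which tells you where the next value starts.

```perl
use strict;
use warnings;
use CBOR::XS;

# Two CBOR values written back-to-back, with no framing in between.
my $stream = encode_cbor ([1, 2]) . encode_cbor ({ ok => 1 });

my $cbor = CBOR::XS->new;
my ($first, $used) = $cbor->decode_prefix ($stream);
substr ($stream, 0, $used) = "";    # drop the octets belonging to the first value
my ($second) = $cbor->decode_prefix ($stream);

printf "first value: %d octets, second value: %s\n",
   $used, join ",", %$second;
```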

  INCREMENTAL PARSING
    In some cases, there is the need for incremental parsing of CBOR texts.
    While this module always has to keep both CBOR text and resulting Perl
    data structure in memory at one time, it does allow you to parse a CBOR
    stream incrementally, using a mechanism similar to calling
    "decode_prefix" to see if a full CBOR object is available, but much
    more efficient.

    It basically works by parsing as much of a CBOR string as possible - if
    the CBOR data is not complete yet, the parser will remember where it
    was, to be able to restart when more data has been accumulated. Once
    enough data is available to either decode a complete CBOR value or raise
    an error, a real decode will be attempted.

    A typical use case would be a network protocol that consists of sending
    and receiving CBOR-encoded messages. The solution that works with CBOR
    and about anything else is to prepend a length to every CBOR value, so
    the receiver knows how many octets to read. More compact (and slightly
    slower) would be to just send CBOR values back-to-back, as "CBOR::XS"
    knows where a CBOR value ends, and doesn't need an explicit length.
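
    One possible sketch of the length-prefixed framing mentioned above. The
    4-octet big-endian length (pack "N/a*") is an arbitrary choice for
    illustration, not something CBOR::XS defines; the module only encodes
    and decodes the payloads.

```perl
use strict;
use warnings;
use CBOR::XS;

my @outgoing = ({ op => "ping" }, [1, 2, 3]);

# Sender: every message is a 4-octet big-endian length followed by its CBOR octets.
my $wire = join "", map { pack "N/a*", encode_cbor ($_) } @outgoing;

# Receiver: read the length, wait until that many octets are there, then decode.
my @received;
while (length $wire >= 4) {
   my $len = unpack "N", $wire;
   last if length ($wire) < 4 + $len;    # incomplete frame: wait for more data
   my $frame = substr $wire, 4, $len;
   push @received, decode_cbor ($frame);
   substr ($wire, 0, 4 + $len) = "";     # consume the frame
}
```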

    The following methods help with this:

    @decoded = $cbor->incr_parse ($buffer)
        This method attempts to decode exactly one CBOR value from the
        beginning of the given $buffer. The value is removed from the
        $buffer on success. When $buffer doesn't contain a complete value
        yet, it returns nothing. Finally, when the $buffer doesn't start
        with something that could ever be a valid CBOR value, it raises an
        exception, just as "decode" would. In the latter case the decoder
        state is undefined and must be reset before being able to parse
        further.

        This method modifies the $buffer in place. When no CBOR value can be
        decoded, the decoder stores the current string offset. On the next
        call, it continues decoding at the place where it stopped before.
        For this to make sense, the $buffer must begin with the same octets
        as on previous unsuccessful calls.

        You can call this method in scalar context, in which case it either
        returns a decoded value or "undef". This makes it impossible to
        distinguish between CBOR null values (which decode to "undef") and
        an unsuccessful decode, which is often acceptable.
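
        A sketch of "incr_parse" on data that trickles in, assuming a
        current CBOR::XS. The one-octet-at-a-time loop stands in for a real
        socket; the important part is that the buffer is only ever appended
        to between calls.

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor = CBOR::XS->new;
my $wire = encode_cbor ({ id => 1 }) . encode_cbor ([ "a", "b" ]);

# Simulate a slow connection that delivers one octet at a time.
my $buffer = "";
my @messages;
for my $octet (split //, $wire) {
   $buffer .= $octet;                            # only ever append to the buffer
   push @messages, $cbor->incr_parse ($buffer);  # returns nothing until a value is complete
}

printf "%d messages decoded, %d octets left over\n",
   scalar @messages, length $buffer;
```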

    @decoded = $cbor->incr_parse_multiple ($buffer)
        Same as "incr_parse", but attempts to decode as many CBOR values as
        possible in one go, instead of at most one. Calls to "incr_parse"
        and "incr_parse_multiple" can be interleaved.
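
        A sketch of "incr_parse_multiple" interleaved with "incr_parse",
        assuming a current CBOR::XS: three complete values are decoded in
        one call, while a fourth one is cut short and finished by a later
        call once the rest of its octets arrive.

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor   = CBOR::XS->new;
my $fourth = encode_cbor ({ four => 4 });

# Three complete values followed by the first 3 octets of a fourth one.
my $buffer = join ("", map { encode_cbor ($_) } 1, "two", [3])
           . substr ($fourth, 0, 3);

my @values      = $cbor->incr_parse_multiple ($buffer);  # decodes the complete values
my $first_batch = @values;                               # the partial value stays in $buffer

$buffer .= substr ($fourth, 3);               # the rest of the fourth value arrives
push @values, $cbor->incr_parse ($buffer);    # the two methods can be interleaved
```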

    $cbor->incr_reset
        Resets the incremental decoder. This throws away any saved state, so
        that subsequent calls to "incr_parse" or "incr_parse_multiple" start
        to parse a new CBOR value from the beginning of the $buffer again.

        This method can be called at any time, but it *must* be called if you
        want to change your $buffer or there was a decoding error and you
        want to reuse the $cbor object for future incremental parsings.
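
        A sketch of recovering from a decoding error with "incr_reset",
        assuming a current CBOR::XS. The initial octet 0x1c is chosen
        because additional-information value 28 is reserved in CBOR, so no
        valid value can start with it.

```perl
use strict;
use warnings;
use CBOR::XS;

my $cbor   = CBOR::XS->new;
my $buffer = "\x1c";    # additional information 28 is reserved: no CBOR value starts like this

my @values = eval { $cbor->incr_parse ($buffer) };
if ($@) {
   # After an error the decoder state is undefined: reset it, and drop the bad octets.
   $cbor->incr_reset;
   $buffer = "";
}

$buffer .= encode_cbor [42];
push @values, $cbor->incr_parse ($buffer);
```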

MAPPING
    This section describes how CBOR::XS maps Perl values to CBOR values and
    vice versa. These mappings are designed to "do the right thing" in most
    circumstances automatically, preserving round-tripping characteristics
[...]
        the IEEE double format will be used. Perls that use formats other
        than IEEE double to represent numerical values are supported, but
        might suffer loss of precision.

  OBJECT SERIALISATION
    This module implements both a CBOR-specific and the generic
    Types::Serialiser object serialisation protocol. The following
    subsections explain both methods.

   ENCODING
    This module knows two ways to serialise a Perl object: The CBOR-specific
    way, and the generic way.

    Whenever the encoder encounters a Perl object that it cannot serialise
    directly (most of them), it will first look up the "TO_CBOR" method on
    it.

    If it has a "TO_CBOR" method, it will call it with the object as only
    argument, and expects exactly one return value, which it will then
[...]
    "CBOR" as the second argument, to distinguish it from other serialisers.

    The "FREEZE" method can return any number of values (i.e. zero or more).
    These will be encoded as a CBOR perl object, together with the classname.

    These methods *MUST NOT* change the data structure that is being
    serialised. Failure to comply with this can result in memory corruption -
    and worse.

    If an object supports neither "TO_CBOR" nor "FREEZE", encoding will fail
    with an error.

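    A sketch of the CBOR-specific "TO_CBOR" route, assuming a current
    CBOR::XS. "My::Temperature" is a hypothetical class; note that the
    decoder gets back a plain hash, not an object.

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Temperature;    # hypothetical class, for illustration only

sub new { my ($class, $kelvin) = @_; return bless { kelvin => $kelvin }, $class }

# TO_CBOR must return exactly one value; here an ordinary unblessed hash.
sub TO_CBOR { my ($self) = @_; return +{ kelvin => $self->{kelvin}, unit => "K" } }

package main;

my $plain = decode_cbor (encode_cbor (My::Temperature->new (300)));

# The decoder only sees a CBOR map: neither the class name nor TO_CBOR is involved.
print ref ($plain), " with kelvin=$plain->{kelvin}\n";    # prints "HASH with kelvin=300"
```
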
   DECODING
    Objects encoded via "TO_CBOR" cannot (normally) be automatically
    decoded, but objects encoded via "FREEZE" can be decoded using the
    following protocol:

    When an encoded CBOR perl object is encountered by the decoder, it will
    look up the "THAW" method, by using the stored classname, and will fail
    if the method cannot be found.
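
    A sketch of the generic "FREEZE"/"THAW" route, assuming a current
    CBOR::XS and the Types::Serialiser calling convention ("FREEZE"
    receives the object and the serialiser name, "THAW" receives the class
    name, the serialiser name and the frozen values). "My::Point" is a
    hypothetical class.

```perl
use strict;
use warnings;
use CBOR::XS;

package My::Point;    # hypothetical class, for illustration only

sub new { my ($class, $x, $y) = @_; return bless { x => $x, y => $y }, $class }

# FREEZE gets the object and the serialiser name ("CBOR"), returns any number of values.
sub FREEZE {
   my ($self, $serialiser) = @_;
   return ($self->{x}, $self->{y});
}

# THAW is called on the stored class name with the serialiser name and those values.
sub THAW {
   my ($class, $serialiser, $x, $y) = @_;
   return $class->new ($x, $y);
}

package main;

my $point = decode_cbor (encode_cbor (My::Point->new (3, 4)));
print ref ($point), " at ($point->{x}, $point->{y})\n";    # prints "My::Point at (3, 4)"
```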

[...]
    26 (perl-object, <http://cbor.schmorp.de/perl-object>)
        These tags are automatically created (and decoded) for serialisable
        objects using the "FREEZE/THAW" methods (the Types::Serialiser object
        serialisation protocol). See "OBJECT SERIALISATION" for details.

    28, 29 (shareable, sharedref, <http://cbor.schmorp.de/value-sharing>)
        These tags are automatically decoded when encountered (unless they
        would result in a cyclic data structure while "allow_cycles" is
        disabled), resulting in shared values in the decoded object. They
        are only encoded, however, when "allow_sharing" is enabled.

        Not all shared values can be successfully decoded: values that
        reference themselves will *currently* decode as "undef" (this is not
        the same as a reference pointing to itself, which will be
        represented as a value that contains an indirect reference to itself
        - these will be decoded properly).

        Note that considerably more shared value data structures can be
        decoded than will be encoded - currently, only values pointed to by
        references will be shared, others will not. While non-reference
        shared values can be generated in Perl with some effort, they were
        considered too unimportant to be supported in the encoder. The
        decoder, however, will decode these values as shared values.

    256, 25 (stringref-namespace, stringref,
    <http://cbor.schmorp.de/stringref>)
        These tags are automatically decoded when encountered. They are only
        encoded, however, when "pack_strings" is enabled.
[...]

    When any of these need to load additional modules that are not part of
    the perl core distribution (e.g. URI), it is (currently) up to the user
    to provide these modules. The decoding usually fails with an exception
    if the required module cannot be loaded.

    0, 1 (date/time string, seconds since the epoch)
        These tags are decoded into Time::Piece objects. The corresponding
        "Time::Piece::TO_CBOR" method currently always encodes into tag 1
        values.

        The Time::Piece API is generally surprisingly bad, and fractional
        seconds are only accidentally kept intact, so watch out. On the plus
        side, the module comes with perl since 5.10, which has to count for
        something.
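
        A sketch of the tag 1 round trip, assuming a current CBOR::XS whose
        default filter turns tag 1 into a Time::Piece object. Whole seconds
        are used on purpose, given the caveat about fractional seconds
        above.

```perl
use strict;
use warnings;
use CBOR::XS;
use Time::Piece ();    # load the class without overriding localtime/gmtime

my $when = Time::Piece->gmtime (86400);          # 1970-01-02T00:00:00 UTC
my $back = decode_cbor (encode_cbor ($when));    # travels as tag 1 (seconds since the epoch)

print ref ($back), " ", $back->datetime, "\n";
```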

    2, 3 (positive/negative bignum)
        These tags are decoded into Math::BigInt objects. The corresponding
        "Math::BigInt::TO_CBOR" method encodes "small" bigints into normal
        CBOR integers, and others into positive/negative CBOR bignums.
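
        A sketch of both bignum paths, assuming a current CBOR::XS and the
        core Math::BigInt module: a value too large for 64 bits travels as a
        bignum tag and decodes as a Math::BigInt, while a small bigint is
        written as a plain CBOR integer and decodes as a plain number.

```perl
use strict;
use warnings;
use CBOR::XS;
use Math::BigInt;

my $big   = Math::BigInt->new ("123456789012345678901234567890");    # needs more than 64 bits
my $small = Math::BigInt->new (42);

my $back = decode_cbor (encode_cbor ([ $big, $small ]));

# The large value comes back as a Math::BigInt (via a bignum tag) ...
print ref ($back->[0]), " ", $back->[0], "\n";
# ... the small one was written as a plain CBOR integer and decodes as a plain number.
print ref ($back->[1]) || "plain number", " ", $back->[1], "\n";
```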
[...]
    uses long double to represent floating point values, they might not be
    encoded properly. Half precision types are accepted, but not encoded.

    Strict mode and canonical mode are not implemented.

LIMITATIONS ON PERLS WITHOUT 64-BIT INTEGER SUPPORT
    On perls that were built without 64 bit integer support (these are rare
    nowadays, even on 32 bit architectures), support for any kind of 64 bit
    integer in CBOR is very limited - most likely, these 64 bit values will
    be truncated, corrupted, or otherwise not decoded correctly. This also
    includes string, array and map sizes that are stored as 64 bit integers.
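
    A sketch of checking for this limitation at runtime, using the core
    Config module: "ivsize" is the size in octets of perl's native integer
    type, so a value below 8 indicates a perl without 64 bit integer
    support. The warning text is just an example.

```perl
use strict;
use warnings;
use Config;

# ivsize: size in octets of perl's native integer type (IV).
my $has_64bit_ints = $Config{ivsize} >= 8;

warn "this perl lacks 64 bit integers: large CBOR integers and sizes may not decode correctly\n"
   unless $has_64bit_ints;
```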

THREADS
    This module is *not* guaranteed to be thread safe and there are no plans
    to change this until Perl gets thread support (as opposed to the
    horribly slow so-called "threads" which are simply slow and bloated
    process simulations - use fork, it's *much* faster, cheaper, better).
