/cvs/CBOR-XS/XS.pm
Revision: 1.5
Committed: Sat Oct 26 23:02:55 2013 UTC (10 years, 6 months ago) by root
Branch: MAIN
CVS Tags: rel-0_03
Changes since 1.4: +14 -6 lines
Log Message:
0.03

File Contents

=head1 NAME

CBOR::XS - Concise Binary Object Representation (CBOR, RFC7049)

=encoding utf-8

=head1 SYNOPSIS

   use CBOR::XS;

   $binary_cbor_data = encode_cbor $perl_value;
   $perl_value       = decode_cbor $binary_cbor_data;

   # OO-interface

   $coder = CBOR::XS->new;
   #TODO

=head1 DESCRIPTION

WARNING! THIS IS A PRE-ALPHA RELEASE! IT WILL CRASH, CORRUPT YOUR DATA
AND EAT YOUR CHILDREN! (Actually, apart from being untested and a bit
feature-limited, it might already be useful).

This module converts Perl data structures to the Concise Binary Object
Representation (CBOR) and vice versa. CBOR is a fast binary serialisation
format that aims to use a superset of the JSON data model, i.e. when you
can represent something in JSON, you should be able to represent it in
CBOR.

This makes it a faster and more compact binary alternative to JSON.

The primary goal of this module is to be I<correct> and the secondary goal
is to be I<fast>. To reach the latter goal it was written in C.

See MAPPING, below, on how CBOR::XS maps perl values to CBOR values and
vice versa.

=cut

package CBOR::XS;

use common::sense;

our $VERSION = 0.03;
our @ISA = qw(Exporter);

our @EXPORT = qw(encode_cbor decode_cbor);

use Exporter;
use XSLoader;

our $MAGIC = "\xd9\xd9\xf7";

=head1 FUNCTIONAL INTERFACE

The following convenience functions are provided by this module. They are
exported by default:

=over 4

=item $cbor_data = encode_cbor $perl_scalar

Converts the given Perl data structure to CBOR representation. Croaks on
error.

=item $perl_scalar = decode_cbor $cbor_data

The opposite of C<encode_cbor>: expects a valid CBOR string to parse,
returning the resulting perl scalar. Croaks on error.

=back

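As a minimal sketch of the functional interface (the variable names and
sample data are assumptions made for illustration only), a round trip
might look like this. Since both functions croak on error, the decode of
possibly invalid input is wrapped in C<eval>:

```perl
use strict;
use warnings;
use CBOR::XS;

# sample data, chosen only for illustration
my $data = { name => "example", list => [1, 2, 3] };

my $cbor = encode_cbor $data;   # binary CBOR string
my $copy = decode_cbor $cbor;   # equivalent Perl structure

print "list has ", scalar @{ $copy->{list} }, " elements\n";

# "\x82\x01" announces an array of two elements but contains only one,
# so decoding is expected to croak
my $ok = eval { decode_cbor "\x82\x01"; 1 };
print "decode failed: $@" unless $ok;
```

The exact wording of the error message in C<$@> is not guaranteed and
should not be parsed.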

=head1 OBJECT-ORIENTED INTERFACE

The object oriented interface lets you configure your own encoding or
decoding style, within the limits of supported formats.

=over 4

=item $cbor = new CBOR::XS

Creates a new CBOR::XS object that can be used to de/encode CBOR
strings. All boolean flags described below are by default I<disabled>.

The mutators for flags all return the CBOR object again and thus calls can
be chained:

   #TODO
   my $cbor = CBOR::XS->new->encode ({a => [1,2]});

=item $cbor = $cbor->max_depth ([$maximum_nesting_depth])

=item $max_depth = $cbor->get_max_depth

Sets the maximum nesting level (default C<512>) accepted while encoding
or decoding. If a higher nesting level is detected in CBOR data or a Perl
data structure, then the encoder and decoder will stop and croak at that
point.

Nesting level is defined by the number of hash- or arrayrefs that the
encoder needs to traverse to reach a given point, or by the number of CBOR
arrays or maps that are still open at a given point in the CBOR data being
decoded.

Setting the maximum depth to one disallows any nesting, which ensures
that the data is only a single hash/map or array.

If no argument is given, the highest possible setting will be used, which
is rarely useful.

Note that nesting is implemented by recursion in C. The default value has
been chosen to be as large as typical operating systems allow without
crashing.

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor = $cbor->max_size ([$maximum_string_size])

=item $max_size = $cbor->get_max_size

Set the maximum length a CBOR string may have (in bytes) where decoding
is being attempted. The default is C<0>, meaning no limit. When C<decode>
is called on a string that is longer than this many bytes, it will not
attempt to decode the string but throw an exception. This setting has no
effect on C<encode> (yet).

If no argument is given, the limit check will be deactivated (same as when
C<0> is specified).

See SECURITY CONSIDERATIONS, below, for more info on why this is useful.

=item $cbor_data = $cbor->encode ($perl_scalar)

Converts the given Perl data structure (a scalar value) to its CBOR
representation.

=item $perl_scalar = $cbor->decode ($cbor_data)

The opposite of C<encode>: expects CBOR data and tries to parse it,
returning the resulting simple scalar or reference. Croaks on error.

=item ($perl_scalar, $octets) = $cbor->decode_prefix ($cbor_data)

This works like the C<decode> method, but instead of raising an exception
when there is trailing garbage after the CBOR string, it will silently
stop parsing there and return the number of octets consumed so far.

This is useful if your CBOR strings are not delimited by an outer protocol
and you need to know where the first CBOR string ends and the next one
starts.

   CBOR::XS->new->decode_prefix ("......")
   => ("...", 3)

=back

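The following sketch shows how the methods above might be combined: the
mutators are chained to configure a coder, and C<decode_prefix> is used
to split a string that contains several CBOR values back to back. The
limits and sample values are arbitrary assumptions for this example:

```perl
use strict;
use warnings;
use CBOR::XS;

# limits chosen arbitrarily for this sketch
my $coder = CBOR::XS->new->max_depth (16)->max_size (1024);

# three CBOR values written back to back, with no framing between them
my $stream = "";
$stream .= $coder->encode ($_) for [1, 2], { a => 1 }, "text";

my @items;
while (length $stream) {
    my ($value, $octets) = $coder->decode_prefix ($stream);
    push @items, $value;
    substr $stream, 0, $octets, "";   # drop the octets just consumed
}

printf "decoded %d items\n", scalar @items;
```

Note that C<decode_prefix> still croaks when the first value itself is
malformed or incomplete, so a real reader would wrap the loop body in
C<eval>.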

=head1 MAPPING

This section describes how CBOR::XS maps Perl values to CBOR values and
vice versa. These mappings are designed to "do the right thing" in most
circumstances automatically, preserving round-tripping characteristics
(what you put in comes out as something equivalent).

For the more enlightened: note that in the following descriptions,
lowercase I<perl> refers to the Perl interpreter, while uppercase I<Perl>
refers to the abstract Perl language itself.


=head2 CBOR -> PERL

=over 4

=item integers

CBOR integers become (numeric) perl scalars. On perls without 64 bit
support, 64 bit integers will be truncated or otherwise corrupted.

=item byte strings

Byte strings will become octet strings in Perl (the byte values 0..255
will simply become characters of the same value in Perl).

=item UTF-8 strings

UTF-8 strings in CBOR will be decoded, i.e. the UTF-8 octets will be
decoded into proper Unicode code points. At the moment, the validity of
the UTF-8 octets will not be checked - corrupt input will result in
corrupted Perl strings.

=item arrays, maps

CBOR arrays and CBOR maps will be converted into references to a Perl
array or hash, respectively. The keys of the map will be stringified
during this process.

=item true, false

These CBOR values become C<CBOR::XS::true> and C<CBOR::XS::false>,
respectively. They are overloaded to act almost exactly like the numbers
C<1> and C<0>. You can check whether a scalar is a CBOR boolean by using
the C<CBOR::XS::is_bool> function.

=item null, undefined

CBOR null and undefined values become C<undef> in Perl (in the future,
undefined may raise an exception or something else).

=item tags

Tagged items consist of a numeric tag and another CBOR value. The tag
55799 is ignored (this tag implements the magic header).

All other tags are currently converted into a L<CBOR::XS::Tagged> object,
which is simply a blessed array reference consisting of the numeric tag
value followed by the (decoded) CBOR value.

=item anything else

Anything else (e.g. unsupported simple values) will raise a decoding
error.

=back

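To illustrate the decoding rules above, the following sketch decodes two
hand-written CBOR strings. The byte sequences and the tag number 60000
are assumptions picked for this example, not values used by the module:

```perl
use strict;
use warnings;
use CBOR::XS;

# a2             map with two pairs
#   61 61 / 01   text string "a" => integer 1
#   61 62 / 82   text string "b" => array of two elements:
#     02 f6        integer 2, null
my $map = decode_cbor "\xa2\x61\x61\x01\x61\x62\x82\x02\xf6";

print "a = $map->{a}\n";
print "b has ", scalar @{ $map->{b} }, " elements, the second is ",
      (defined $map->{b}[1] ? "defined" : "undef"), "\n";

# d9 ea 60       tag 60000 (arbitrary, picked for this sketch)
#   05           integer 5
my $tagged = decode_cbor "\xd9\xea\x60\x05";
print ref ($tagged), ": tag $tagged->[0], value $tagged->[1]\n";
```

The map becomes a hash reference, CBOR null becomes C<undef>, and the
tagged item becomes an array-based object holding the tag and the value.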

=head2 PERL -> CBOR

The mapping from Perl to CBOR is slightly more difficult, as Perl is a
truly typeless language, so we can only guess which CBOR type is meant by
a Perl value.

=over 4

=item hash references

Perl hash references become CBOR maps. As there is no inherent ordering in
hash keys (or CBOR maps), they will usually be encoded in a pseudo-random
order.

Currently, tied hashes will use the indefinite-length format, while normal
hashes will use the fixed-length format.

=item array references

Perl array references become fixed-length CBOR arrays.

=item other references

Other unblessed references are generally not allowed and will cause an
exception to be thrown, except for references to the integers C<0> and
C<1>, which get turned into false and true in CBOR.

=item CBOR::XS::Tagged objects

Objects of this type must be arrays consisting of a single C<[tag, value]>
pair. The (numerical) tag will be encoded as a CBOR tag, the value will be
encoded as appropriate for the value.

=item CBOR::XS::true, CBOR::XS::false

These special values become CBOR true and CBOR false values,
respectively. You can also use C<\1> and C<\0> directly if you want.

=item blessed objects

Other blessed objects currently need to have a C<TO_CBOR> method. It
will be called on every object that is being serialised, and must return
something that can be encoded in CBOR.

=item simple scalars

TODO
Simple Perl scalars (any scalar that is not a reference) are the most
difficult objects to encode: CBOR::XS will encode undefined scalars as
CBOR null values, scalars that have last been used in a string context
before encoding as CBOR strings, and anything else as number value:

   # dump as number
   encode_cbor [2]                      # yields [2]
   encode_cbor [-3.0e17]                # yields [-3e+17]
   my $value = 5; encode_cbor [$value]  # yields [5]

   # used as string, so dump as string
   print $value;
   encode_cbor [$value]                 # yields ["5"]

   # undef becomes null
   encode_cbor [undef]                  # yields [null]

You can force the type to be a CBOR string by stringifying it:

   my $x = 3.1; # some variable containing a number
   "$x";        # stringified
   $x .= "";    # another, more awkward way to stringify
   print $x;    # perl does it for you, too, quite often

You can force the type to be a CBOR number by numifying it:

   my $x = "3"; # some variable containing a string
   $x += 0;     # numify it, ensuring it will be dumped as a number
   $x *= 1;     # same thing, the choice is yours.

You can not currently force the type in other, less obscure, ways. Tell me
if you need this capability (but don't forget to explain why it's needed
:).

Perl values that seem to be integers generally use the shortest possible
representation. Floating-point values will use the IEEE single format if
that is possible without loss of precision, otherwise the IEEE double
format will be used. Perls that use formats other than IEEE double to
represent numerical values are supported, but might suffer loss of
precision.

=back

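As a rough illustration of two of the rules above, the following sketch
encodes an object through a C<TO_CBOR> method and a numified string as an
integer. The C<Point> class is hypothetical and exists only for this
example:

```perl
use strict;
use warnings;
use CBOR::XS;

{
    # hypothetical class, used only for this illustration
    package Point;

    sub new { my ($class, %arg) = @_; bless { %arg }, $class }

    # return something that CBOR can represent: here, a plain array
    sub TO_CBOR { my ($self) = @_; [ $self->{x}, $self->{y} ] }
}

# the object is encoded as whatever its TO_CBOR method returns
my $cbor = encode_cbor [ Point->new (x => 1, y => 2) ];
print unpack ("H*", $cbor), "\n";   # 81820102: [[1, 2]]

# a string that has been numified is encoded as a CBOR integer
my $n = "3";
$n += 0;
print unpack ("H*", encode_cbor $n), "\n";   # 03
```

Printing the result as hex with C<unpack "H*"> is just a convenient way
to inspect the binary output.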

=head2 MAGIC HEADER

There is no way to distinguish CBOR from other formats
programmatically. To make it easier to distinguish CBOR from other
formats, the CBOR specification has a special "magic string" that can be
prepended to any CBOR string without changing its meaning.

This string is available as C<$CBOR::XS::MAGIC>. This module does not
prepend this string to the CBOR data it generates, but it will ignore it
if present, so users can prepend this string as a "file type" indicator as
required.

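A short sketch of how the magic string might be used as a "file type"
indicator (the variable names are assumptions for this example):

```perl
use strict;
use warnings;
use CBOR::XS;

# prepend the magic string to the encoded data
my $file = $CBOR::XS::MAGIC . encode_cbor ([1, 2, 3]);

# a reader can test for the magic string before trying to decode
my $has_magic = substr ($file, 0, length $CBOR::XS::MAGIC) eq $CBOR::XS::MAGIC;
print "looks like CBOR\n" if $has_magic;

# the decoder ignores the magic tag, so the data decodes unchanged
my $data = decode_cbor $file;
print "decoded ", scalar @$data, " elements\n";
```

The three octets of the magic string are the CBOR encoding of tag 55799,
which is why the decoder can skip them.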

=head2 CBOR and JSON

CBOR is supposed to implement a superset of the JSON data model, and is,
with some coercion, able to represent all JSON texts (something that other
"binary JSON" formats such as BSON generally do not support).

CBOR implements some extra hints and support for JSON interoperability,
and the spec offers further guidance for conversion between CBOR and
JSON. None of this is currently implemented in CBOR::XS, and the guidelines
in the spec do not result in correct round-tripping of data. If JSON
interoperability is improved in the future, then the goal will be to
ensure that decoded JSON data will round-trip encoding and decoding to
CBOR intact.


=head1 SECURITY CONSIDERATIONS

When you are using CBOR in a protocol, talking to untrusted potentially
hostile creatures requires relatively few measures.

First of all, your CBOR decoder should be secure, that is, should not have
any buffer overflows. Obviously, this module should ensure that and I am
trying hard to make that true, but you never know.

Second, you need to avoid resource-starving attacks. That means you should
limit the size of CBOR data you accept, or make sure that when your
resources run out, that's just fine (e.g. by using a separate process that
can crash safely). The size of a CBOR string in octets is usually a good
indication of the size of the resources required to decode it into a Perl
structure. While CBOR::XS can check the size of the CBOR data, it might be
too late when you already have it in memory, so you might want to check
the size before you accept the string.

Third, CBOR::XS recurses using the C stack when decoding objects and
arrays. The C stack is a limited resource: for instance, on my amd64
machine with 8MB of stack size I can decode around 180k nested arrays but
only 14k nested CBOR objects (due to perl itself recursing deeply on croak
to free the temporary). If that is exceeded, the program crashes. To be
conservative, the default nesting limit is set to 512. If your process
has a smaller stack, you should adjust this setting accordingly with the
C<max_depth> method.

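The second and third points can be sketched as follows. The limits and
the hand-built inputs are assumptions for this example, and suitable
values depend on the application:

```perl
use strict;
use warnings;
use CBOR::XS;

# limits picked for this sketch only
my $coder = CBOR::XS->new->max_depth (4)->max_size (64);

# ten nested one-element arrays around the integer 1
my $deep = ("\x81" x 10) . "\x01";
my $ok_depth = eval { $coder->decode ($deep); 1 };
print "rejected deep nesting: $@" unless $ok_depth;

# a byte string of 98 octets plus a two-octet header: 100 octets in total
my $big = "\x58\x62" . ("x" x 98);
my $ok_size = eval { $coder->decode ($big); 1 };
print "rejected oversized input: $@" unless $ok_size;
```
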
Something else could bomb you, too, that I forgot to think of. In that
case, you get to keep the pieces. I am always open for hints, though...

Also keep in mind that CBOR::XS might leak contents of your Perl data
structures in its error messages, so when you serialise sensitive
information you might want to make sure that exceptions thrown by CBOR::XS
will not end up in front of untrusted eyes.

=head1 CBOR IMPLEMENTATION NOTES

This section contains some random implementation notes. They do not
describe guaranteed behaviour, but merely behaviour as it is implemented
right now.

64 bit integers are only properly decoded when Perl was built with 64 bit
support.

Strings and arrays are encoded with a definite length. Hashes as well,
unless they are tied (or otherwise magical).

Only the double data type is supported for NV data types - when Perl uses
long double to represent floating point values, they might not be encoded
properly. Half precision types are accepted, but not encoded.

Strict mode and canonical mode are not implemented.


=head1 THREADS

This module is I<not> guaranteed to be thread safe and there are no
plans to change this until Perl gets thread support (as opposed to the
horribly slow so-called "threads" which are simply slow and bloated
process simulations - use fork, it's I<much> faster, cheaper, better).

(It might actually work, but you have been warned).


=head1 BUGS

While the goal of this module is to be correct, that unfortunately does
not mean it's bug-free, only that I think its design is bug-free. If you
keep reporting bugs they will be fixed swiftly, though.

Please refrain from using rt.cpan.org or any other bug reporting
service. I put the contact address into my modules for a reason.

=cut

our $true  = do { bless \(my $dummy = 1), "CBOR::XS::Boolean" };
our $false = do { bless \(my $dummy = 0), "CBOR::XS::Boolean" };

sub true()  { $true  }
sub false() { $false }

sub is_bool($) {
   UNIVERSAL::isa $_[0], "CBOR::XS::Boolean"
#      or UNIVERSAL::isa $_[0], "CBOR::Literal"
}

XSLoader::load "CBOR::XS", $VERSION;

package CBOR::XS::Boolean;

use overload
   "0+"     => sub { ${$_[0]} },
   "++"     => sub { $_[0] = ${$_[0]} + 1 },
   "--"     => sub { $_[0] = ${$_[0]} - 1 },
   fallback => 1;

1;

=head1 SEE ALSO

The L<JSON> and L<JSON::XS> modules that do similar, but human-readable,
serialisation.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut