
Comparing CBOR-XS/doc/stringref.pod (file contents):
Revision 1.2 by root, Thu Nov 28 09:13:12 2013 UTC vs.
Revision 1.6 by root, Mon Apr 30 11:24:17 2018 UTC

1=head1 REGISTRATION INFORMATION 1=head1 REGISTRATION INFORMATION
2 2
3 Tag <unassigned> (stringref-namespace) 3 Tag 256 (stringref-namespace)
4 Data Item multiple 4 Data Item multiple
5 Semantics mark value as having string references 5 Semantics mark value as having string references
6 Reference http://cbor.schmorp.de/stringref 6 Reference http://cbor.schmorp.de/stringref
7 Contact Marc A. Lehmann <cbor@schmorp.de> 7 Contact Marc A. Lehmann <cbor@schmorp.de>
8 8
9 Tag <unassigned> (stringref) 9 Tag 25 (stringref)
10 Data Item unsigned integer 10 Data Item unsigned integer
11 Semantics reference the nth previously seen string 11 Semantics reference the nth previously seen string
12 Reference http://cbor.schmorp.de/stringref 12 Reference http://cbor.schmorp.de/stringref
13 Contact Marc A. Lehmann <cbor@schmorp.de> 13 Contact Marc A. Lehmann <cbor@schmorp.de>
14 14
49This scheme can be used to reduce this overhead with a simple scheme that 49This scheme can be used to reduce this overhead with a simple scheme that
50is easily implementable. 50is easily implementable.
51 51
52=head1 DESCRIPTION 52=head1 DESCRIPTION
53 53
54Stringref consists of two tags, stringref-namespace (value <unassigned>), 54Stringref consists of two tags, stringref-namespace (value C<256>),
55which marks a value as containing string references, and stringref (value 55which marks a value as containing string references, and stringref (value
56<unassigned>), which references a string previously encoded in the value. 56C<25>), which references a string previously encoded in the value.
57 57
58The stringref-namespace tag is used to define a namespace for the string 58The stringref-namespace tag is used to define a namespace for the string
59reference ids. stringref tags are only valid inside CBOR values marked 59reference ids. stringref tags are only valid inside CBOR values marked
60with stringref-namespace. 60with stringref-namespace.
61 61
182the array length as the next index to be assigned, and pushing the 182the array length as the next index to be assigned, and pushing the
183string onto the end of the array when it is long enough. 183string onto the end of the array when it is long enough.
184 184
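The table handling described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the names `min_length` and `StringTable` are hypothetical, and the minimum length is assumed to equal the encoded size of the reference itself (two bytes for tag 25 plus the CBOR encoding of the index), which is consistent with the examples in this document.

```python
def min_length(index):
    """Assumed rule: a string is registered only if it is at least as long
    as the stringref that would replace it (tag 25 head + encoded index)."""
    if index < 24:
        int_size = 1          # index fits into the initial byte
    elif index < 2**8:
        int_size = 2
    elif index < 2**16:
        int_size = 3
    elif index < 2**32:
        int_size = 5
    else:
        int_size = 9
    return 2 + int_size       # tag 25 is encoded as the two bytes d8 19


class StringTable:
    """Per-namespace table; the list length is the next index to assign."""

    def __init__(self):
        self.strings = []

    def saw(self, s):
        # Call this for every literal string, in stream order.
        if len(s) >= min_length(len(self.strings)):
            self.strings.append(s)

    def lookup(self, index):
        return self.strings[index]


table = StringTable()
for s in (b"1", b"222", b"333", b"4"):
    table.saw(s)
print(table.strings)  # only the three-byte strings are registered
```

An encoder and a decoder have to apply exactly the same rule in the same order, otherwise their indices drift apart.
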
185=head2 IMPLEMENTATION NOTE 185=head2 IMPLEMENTATION NOTE
186 186
187 The semantics of stringref tags require the decoder to be aware and 187The semantics of stringref tags require the decoder to be aware and the
188 the encoder to be under control of the sequence in which data items 188encoder to be under control of the sequence in which data items are
189 are encoded into the CBOR stream. This means these tags cannot be 189encoded into the CBOR stream. This means these tags cannot be implemented
190 implemented on top of every generic CBOR encoder/decoder (which might 190on top of every generic CBOR encoder/decoder (which might reorder entries
191 reorder entries in a map); they need to be integrated into their works. 191in a map); they typically need to be integrated into their works.
192
193=head2 DESIGN RATIONALE
194
195The stringref tag was chosen to be short, without requiring standards
196action. The namespace tag is rare, so doesn't benefit from a short
197encoding as much.
198
199Implicit tagging/counting was chosen to support stream encoders. Having
200to tag strings first requires either multiple passes over the data (which
201might not be available, ruling out some encoders) or tagging more strings
202than needed (wasting space). Explicit tagging also isn't necessarily
203better even under optimal conditions, as the explicit tags waste space.
204
205Stream decoders are affected less by implicit tagging than encoders.
206
207The namespace tag was introduced for two reasons: first, to allow embedding
208of CBOR strings into other CBOR strings; second, for decoding efficiency:
209the decoder only has to expect stringref tags inside namespaces and
210therefore doesn't have to maintain extra state outside of them.
192 211
193=head1 EXAMPLES 212=head1 EXAMPLES
194 213
195<TBD> 214The array-of-maps from the rationale example would normally encode to
215CBOR data of 83 bytes. Using this extension where possible, this reduces
216to 72 bytes:
196 217
218 d9 0100 # tag(256)
219 83 # array(3)
220 a3 # map(3)
221 44 # bytes(4)
222 72616e6b # "rank"
223 04 # unsigned(4)
224 45 # bytes(5)
225 636f756e74 # "count"
226 19 01a1 # unsigned(417)
227 44 # bytes(4)
228 6e616d65 # "name"
229 48 # bytes(8)
230 436f636b7461696c # "Cocktail"
231 a3 # map(3)
232 d8 19 # tag(25)
233 02 # unsigned(2)
234 44 # bytes(4)
235 42617468 # "Bath"
236 d8 19 # tag(25)
237 01 # unsigned(1)
238 19 0138 # unsigned(312)
239 d8 19 # tag(25)
240 00 # unsigned(0)
241 04 # unsigned(4)
242 a3 # map(3)
243 d8 19 # tag(25)
244 02 # unsigned(2)
245 44 # bytes(4)
246 466f6f64 # "Food"
247 d8 19 # tag(25)
248 01 # unsigned(1)
249 19 02b3 # unsigned(691)
250 d8 19 # tag(25)
251 00 # unsigned(0)
252 04 # unsigned(4)
253
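To make the encoding above easier to follow, here is a hedged sketch of a tiny stream encoder in Python that emits it. It is not CBOR::XS and covers only non-negative integers, byte strings, arrays and maps. The helper names (`head`, `min_length`, `encode`, `encode_with_stringrefs`) are made up for this illustration, the minimum-length rule is assumed to be "at least as long as the reference", and the input data is read back from the dump rather than taken from the rationale section.

```python
def head(major, n):
    """Encode a CBOR initial byte plus argument for the given major type."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for CBOR")


def min_length(index):
    # Assumed rule: size of the stringref itself (tag 25 head + index).
    return len(head(6, 25)) + len(head(0, index))


def encode(value, table):
    """Encode `value`; `table` maps already indexed strings to their index."""
    if isinstance(value, int):
        return head(0, value)                      # unsigned integers only
    if isinstance(value, bytes):
        if value in table:                         # seen before: emit 25(n)
            return head(6, 25) + head(0, table[value])
        if len(value) >= min_length(len(table)):   # long enough: assign index
            table[value] = len(table)
        return head(2, len(value)) + value
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(encode(v, table) for v in value)
    if isinstance(value, dict):                    # insertion order is kept
        out = head(5, len(value))
        for k, v in value.items():
            out += encode(k, table) + encode(v, table)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_with_stringrefs(value):
    # Wrap the value in a stringref-namespace tag (256) with a fresh table.
    return head(6, 256) + encode(value, {})


example = [
    {b"rank": 4, b"count": 417, b"name": b"Cocktail"},
    {b"name": b"Bath", b"count": 312, b"rank": 4},
    {b"name": b"Food", b"count": 691, b"rank": 4},
]
encoded = encode_with_stringrefs(example)
print(encoded.hex())
```

The sketch relies on Python dictionaries preserving insertion order, so the encoder controls the order in which map entries reach the stream.
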
254The following JSON array illustrates the effect of the index on the
255minimum string length:
256
257 [ "1", "222", "333", "4", "555", "666", "777", "888", "999",
258 "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
259 "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr",
260 "333",
261 "ssss",
262 "qqq", "rrr", "ssss"]
263
264The strings "1" and "4" are too short to get an index assigned, as is "rrr",
265which would need at least four bytes at index 24. All other strings that are
266not encoded with a stringref do get one (this assumes that JSON strings are encoded as CBOR byte strings):
267
268 d9 0100 # tag(256)
269 98 20 # array(32)
270 41 # bytes(1)
271 31 # "1"
272 43 # bytes(3)
273 323232 # "222"
274 43 # bytes(3)
275 333333 # "333"
276 41 # bytes(1)
277 34 # "4"
278 43 # bytes(3)
279 353535 # "555"
280 43 # bytes(3)
281 363636 # "666"
282 43 # bytes(3)
283 373737 # "777"
284 43 # bytes(3)
285 383838 # "888"
286 43 # bytes(3)
287 393939 # "999"
288 43 # bytes(3)
289 616161 # "aaa"
290 43 # bytes(3)
291 626262 # "bbb"
292 43 # bytes(3)
293 636363 # "ccc"
294 43 # bytes(3)
295 646464 # "ddd"
296 43 # bytes(3)
297 656565 # "eee"
298 43 # bytes(3)
299 666666 # "fff"
300 43 # bytes(3)
301 676767 # "ggg"
302 43 # bytes(3)
303 686868 # "hhh"
304 43 # bytes(3)
305 696969 # "iii"
306 43 # bytes(3)
307 6a6a6a # "jjj"
308 43 # bytes(3)
309 6b6b6b # "kkk"
310 43 # bytes(3)
311 6c6c6c # "lll"
312 43 # bytes(3)
313 6d6d6d # "mmm"
314 43 # bytes(3)
315 6e6e6e # "nnn"
316 43 # bytes(3)
317 6f6f6f # "ooo"
318 43 # bytes(3)
319 707070 # "ppp"
320 43 # bytes(3)
321 717171 # "qqq"
322 43 # bytes(3)
323 727272 # "rrr"
324 d8 19 # tag(25)
325 01 # unsigned(1)
326 44 # bytes(4)
327 73737373 # "ssss"
328 d8 19 # tag(25)
329 17 # unsigned(23)
330 43 # bytes(3)
331 727272 # "rrr"
332 d8 19 # tag(25)
333 18 18 # unsigned(24)
334
335This example shows three stringref-namespace tags, two of which are nested
336inside another:
337
338 256(["aaa", 25(0), 256(["bbb", "aaa", 25(1)]), 256(["ccc", 25(0)]), 25(0)])
339
340 d9 0100 # tag(256)
341 85 # array(5)
342 63 # text(3)
343 616161 # "aaa"
344 d8 19 # tag(25)
345 00 # unsigned(0)
346 d9 0100 # tag(256)
347 83 # array(3)
348 63 # text(3)
349 626262 # "bbb"
350 63 # text(3)
351 616161 # "aaa"
352 d8 19 # tag(25)
353 01 # unsigned(1)
354 d9 0100 # tag(256)
355 82 # array(2)
356 63 # text(3)
357 636363 # "ccc"
358 d8 19 # tag(25)
359 00 # unsigned(0)
360 d8 19 # tag(25)
361 00 # unsigned(0)
362
363The decoded data structure might look like this:
364
365 ["aaa","aaa",["bbb","aaa","aaa"],["ccc","ccc"],"aaa"]
366
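The nesting behaviour shown in this example can be sketched as a small resolver in Python. It works on an already parsed tree rather than on raw bytes, handles only text strings and arrays, and uses hypothetical names (`Tag`, `min_length`, `resolve`). It also assumes that each stringref-namespace tag starts with an empty table and that the enclosing table continues unchanged afterwards, which is what the decoded result above implies.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    number: int
    value: object


def min_length(index):
    # Assumed rule: two bytes for tag 25 plus the encoded size of the index.
    for limit, int_size in ((24, 1), (2**8, 2), (2**16, 3), (2**32, 5)):
        if index < limit:
            return 2 + int_size
    return 2 + 9


def resolve(item, table=None):
    """Replace 25(n) references; `table` is None outside any namespace."""
    if isinstance(item, Tag) and item.number == 256:
        # A namespace gets its own, initially empty table.
        return resolve(item.value, [])
    if isinstance(item, Tag) and item.number == 25:
        if table is None:
            raise ValueError("stringref outside of a stringref-namespace")
        return table[item.value]
    if isinstance(item, str):
        # Literal strings are registered in stream order when long enough.
        if table is not None and len(item.encode("utf-8")) >= min_length(len(table)):
            table.append(item)
        return item
    if isinstance(item, list):
        return [resolve(x, table) for x in item]
    return item


tree = Tag(256, [
    "aaa", Tag(25, 0),
    Tag(256, ["bbb", "aaa", Tag(25, 1)]),
    Tag(256, ["ccc", Tag(25, 0)]),
    Tag(25, 0),
])
decoded = resolve(tree)
print(decoded)
```

Because the inner namespaces use their own tables, the final `25(0)` still refers to the outer "aaa".
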
367=head1 IMPLEMENTATIONS
368
369This section lists known implementations of this extension (L<drop me a
370mail|mailto:cbor@schmorp.de?Subject=CBOR-stringref> if you want to be
371listed here).
372
373=over 4
374
375=item * [Perl] L<CBOR::XS|http://software.schmorp.de/pkg/CBOR-XS.html> (reference implementation)
376
377=back
378
