/cvs/CBOR-XS/doc/stringref.pod

Comparing CBOR-XS/doc/stringref.pod (file contents):
Revision 1.2 by root, Thu Nov 28 09:13:12 2013 UTC vs.
Revision 1.3 by root, Thu Nov 28 10:16:50 2013 UTC

1=head1 REGISTRATION INFORMATION 1=head1 REGISTRATION INFORMATION
2 2
3 Tag <unassigned> (stringref-namespace) 3 Tag 256 (stringref-namespace)
4 Data Item multiple 4 Data Item multiple
5 Semantics mark value as having string references 5 Semantics mark value as having string references
6 Reference http://cbor.schmorp.de/stringref 6 Reference http://cbor.schmorp.de/stringref
7 Contact Marc A. Lehmann <cbor@schmorp.de> 7 Contact Marc A. Lehmann <cbor@schmorp.de>
8 8
9 Tag <unassigned> (stringref) 9 Tag 25 (stringref)
10 Data Item unsigned integer 10 Data Item unsigned integer
11 Semantics reference the nth previously seen string 11 Semantics reference the nth previously seen string
12 Reference http://cbor.schmorp.de/stringref 12 Reference http://cbor.schmorp.de/stringref
13 Contact Marc A. Lehmann <cbor@schmorp.de> 13 Contact Marc A. Lehmann <cbor@schmorp.de>
14 14
49This scheme can be used to reduce this overhead with a simple scheme that 49This scheme can be used to reduce this overhead with a simple scheme that
50is easily implementable. 50is easily implementable.
51 51
52=head1 DESCRIPTION 52=head1 DESCRIPTION
53 53
54Stringref consists of two tags, stringref-namespace (value <unassigned>), 54Stringref consists of two tags, stringref-namespace (value C<256>),
55which marks a value as containing string references, and stringref (value 55which marks a value as containing string references, and stringref (value
56<unassigned>), which references a string previously encoded in the value. 56C<25>), which references a string previously encoded in the value.
57 57
58The stringref-namespace tag is used to define a namespace for the string 58The stringref-namespace tag is used to define a namespace for the string
59reference ids. stringref tags are only valid inside CBOR values marked 59reference ids. stringref tags are only valid inside CBOR values marked
60with stringref-namespace. 60with stringref-namespace.
61 61
182the array length as the next index to be assigned, and pushing the 182the array length as the next index to be assigned, and pushing the
183string onto the end of the array when it is long enough. 183string onto the end of the array when it is long enough.
184 184
185=head2 IMPLEMENTATION NOTE 185=head2 IMPLEMENTATION NOTE
186 186
187 The semantics of stringref tags require the decoder to be aware and 187The semantics of stringref tags require the decoder to be aware and the
188 the encoder to be under control of the sequence in which data items 188encoder to be under control of the sequence in which data items are
189 are encoded into the CBOR stream. This means these tags cannot be 189encoded into the CBOR stream. This means these tags cannot be implemented
190 implemented on top of every generic CBOR encoder/decoder (which might 190on top of every generic CBOR encoder/decoder (which might reorder entries
191 reorder entries in a map); they need to be integrated into their works. 191in a map); they typically need to be integrated into their works.
192 192
193=head1 EXAMPLES 193=head1 EXAMPLES
194 194
195<TBD> 195The array-of-maps from the rationale example would normally compress to a
196CBOR text of 83 bytes. Using this extension where possible, this reduces
197to 74 bytes:
196 198
199 d9 0100 # tag(256)
200 83 # array(3)
201 a3 # map(3)
202 44 # bytes(4)
203 72616e6b # "rank"
204 04 # unsigned(4)
205 45 # bytes(5)
206 636f756e74 # "count"
207 19 01a1 # unsigned(417)
208 44 # bytes(4)
209 6e616d65 # "name"
210 48 # bytes(8)
211 436f636b7461696c # "Cocktail"
212 a3 # map(3)
213 d8 19 # tag(25)
214 02 # unsigned(2)
215 44 # bytes(4)
216 42617468 # "Bath"
217 d8 19 # tag(25)
218 01 # unsigned(1)
219 19 0138 # unsigned(312)
220 d8 19 # tag(25)
221 00 # unsigned(0)
222 04 # unsigned(4)
223 a3 # map(3)
224 d8 19 # tag(25)
225 02 # unsigned(2)
226 44 # bytes(4)
227 466f6f64 # "Food"
228 d8 19 # tag(25)
229 01 # unsigned(1)
230 19 02b3 # unsigned(691)
231 d8 19 # tag(25)
232 00 # unsigned(0)
233 04 # unsigned(4)
234
235The following JSON array illustrates the effect of the index on the
236minimum string length:
237
238 [ "1", "222", "333", "4", "555", "666", "777", "888", "999",
239 "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
240 "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr",
241 "333",
242 "ssss",
243 "qqq", "rrr", "ssss"]
244
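245The strings "1" and "4" are too short to ever get an index assigned, and "rrr" is too short because
246the next free index at that point, 24, requires at least four bytes. All other strings that are not encoded as a stringref get an index: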
247
248 d9 0100 # tag(256)
249 98 20 # array(32)
250 41 # bytes(1)
251 31 # "1"
252 43 # bytes(3)
253 323232 # "222"
254 43 # bytes(3)
255 333333 # "333"
256 41 # bytes(1)
257 34 # "4"
258 43 # bytes(3)
259 353535 # "555"
260 43 # bytes(3)
261 363636 # "666"
262 43 # bytes(3)
263 373737 # "777"
264 43 # bytes(3)
265 383838 # "888"
266 43 # bytes(3)
267 393939 # "999"
268 43 # bytes(3)
269 616161 # "aaa"
270 43 # bytes(3)
271 626262 # "bbb"
272 43 # bytes(3)
273 636363 # "ccc"
274 43 # bytes(3)
275 646464 # "ddd"
276 43 # bytes(3)
277 656565 # "eee"
278 43 # bytes(3)
279 666666 # "fff"
280 43 # bytes(3)
281 676767 # "ggg"
282 43 # bytes(3)
283 686868 # "hhh"
284 43 # bytes(3)
285 696969 # "iii"
286 43 # bytes(3)
287 6a6a6a # "jjj"
288 43 # bytes(3)
289 6b6b6b # "kkk"
290 43 # bytes(3)
291 6c6c6c # "lll"
292 43 # bytes(3)
293 6d6d6d # "mmm"
294 43 # bytes(3)
295 6e6e6e # "nnn"
296 43 # bytes(3)
297 6f6f6f # "ooo"
298 43 # bytes(3)
299 707070 # "ppp"
300 43 # bytes(3)
301 717171 # "qqq"
302 43 # bytes(3)
303 727272 # "rrr"
304 d8 19 # tag(25)
305 01 # unsigned(1)
306 44 # bytes(4)
307 73737373 # "ssss"
308 d8 19 # tag(25)
309 17 # unsigned(23)
310 43 # bytes(3)
311 727272 # "rrr"
312 d8 19 # tag(25)
313 18 18 # unsigned(24)
314
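The index assignment in the example above can be reproduced with a few lines of Python. This is only a sketch, not code from CBOR::XS: the names `min_length` and `assign_refs` are made up for illustration. It assumes that a string is indexed only when it is at least as long as the encoded reference to it would be; the example itself confirms the first two thresholds (3 bytes up to index 23, 4 bytes from index 24), the larger ones are derived from the CBOR integer sizes.

```python
# Illustrative sketch only; function names are hypothetical, not CBOR::XS API.

def min_length(next_index):
    """Smallest string length (in bytes) worth indexing at next_index.

    Assumption: the threshold equals the size of the encoded reference,
    i.e. tag(25) (2 bytes) plus the unsigned integer that follows it.
    """
    if next_index <= 23:
        return 3    # d8 19 + 1-byte integer
    if next_index <= 0xFF:
        return 4    # d8 19 18 xx
    if next_index <= 0xFFFF:
        return 5    # d8 19 19 xxxx
    if next_index <= 0xFFFFFFFF:
        return 7    # d8 19 1a xxxxxxxx
    return 11       # d8 19 1b + 8 bytes


def assign_refs(strings):
    """Decide, in stream order, which strings become stringrefs."""
    table = {}      # string -> assigned index; len(table) is the next index
    plan = []
    for s in strings:
        if s in table:
            plan.append(("ref", table[s]))      # seen before: emit tag(25)
            continue
        plan.append(("str", s))                 # emit the string itself
        if len(s.encode("utf-8")) >= min_length(len(table)):
            table[s] = len(table)               # long enough: assign an index
    return plan


items = ["1", "222", "333", "4", "555", "666", "777", "888", "999",
         "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
         "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr",
         "333", "ssss", "qqq", "rrr", "ssss"]
plan = assign_refs(items)
print(plan[-5:])
```

For the last five items of the JSON array this yields a reference to index 1, the literal "ssss", a reference to index 23, the literal "rrr" and a reference to index 24. The first "rrr" never receives an index, which is why its repetition has to be written out again.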
315This example shows three stringref-namespace tags, two of which are nested
316inside another:
317
318 256(["aaa", 25(0), 256(["bbb", "aaa", 25(1)]), 256(["ccc", 25(0)]), 25(0)])
319
320 d9 0100 # tag(256)
321 85 # array(5)
322 63 # text(3)
323 616161 # "aaa"
324 d8 19 # tag(25)
325 00 # unsigned(0)
326 d9 0100 # tag(256)
327 83 # array(3)
328 63 # text(3)
329 626262 # "bbb"
330 63 # text(3)
331 616161 # "aaa"
332 d8 19 # tag(25)
333 01 # unsigned(1)
334 d9 0100 # tag(256)
335 82 # array(2)
336 63 # text(3)
337 636363 # "ccc"
338 d8 19 # tag(25)
339 00 # unsigned(0)
340 d8 19 # tag(25)
341 00 # unsigned(0)
342
343The decoded data structure might look like this:
344
345 ["aaa","aaa",["bbb","aaa","aaa"],["ccc","ccc"],"aaa"]
346
347
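The nesting behaviour of the last example can be sketched on the decoder side as well. The following Python is a hypothetical illustration, not the CBOR::XS implementation: it works on already parsed values, models tags with a made-up `Tag` class, handles only arrays and strings (maps are omitted), and assumes the same minimum-length thresholds the encoder uses.

```python
# Illustrative sketch only; Tag, min_length and resolve are hypothetical names.
from dataclasses import dataclass


@dataclass
class Tag:
    number: int      # CBOR tag number, e.g. 256 or 25
    value: object    # the tagged data item


def min_length(next_index):
    """Assumed threshold: the size of the encoded tag(25) reference."""
    for limit, size in ((23, 3), (0xFF, 4), (0xFFFF, 5), (0xFFFFFFFF, 7)):
        if next_index <= limit:
            return size
    return 11


def resolve(item, table=None):
    """Replace stringref tags by the strings they refer to."""
    if isinstance(item, Tag) and item.number == 256:
        # stringref-namespace: start a fresh table for the tagged value
        return resolve(item.value, [])
    if isinstance(item, Tag) and item.number == 25:
        if table is None:
            raise ValueError("stringref outside of a stringref-namespace")
        return table[item.value]
    if isinstance(item, list):
        return [resolve(x, table) for x in item]    # preserves stream order
    if isinstance(item, str):
        if table is not None and len(item.encode("utf-8")) >= min_length(len(table)):
            table.append(item)                      # array length is the next index
        return item
    return item


encoded = Tag(256, ["aaa", Tag(25, 0),
                    Tag(256, ["bbb", "aaa", Tag(25, 1)]),
                    Tag(256, ["ccc", Tag(25, 0)]),
                    Tag(25, 0)])
decoded = resolve(encoded)
print(decoded)
```

Because each tag 256 gets its own table, the inner arrays cannot see or disturb the outer indices: the final `25(0)` still resolves to "aaa", and `decoded` equals the structure shown above.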
