
Comparing CBOR-XS/doc/stringref.pod (file contents):
Revision 1.2 by root, Thu Nov 28 09:13:12 2013 UTC vs.
Revision 1.5 by root, Wed Apr 25 06:37:12 2018 UTC

1=head1 REGISTRATION INFORMATION 1=head1 REGISTRATION INFORMATION
2 2
3 Tag <unassigned> (stringref-namespace) 3 Tag 256 (stringref-namespace)
4 Data Item multiple 4 Data Item multiple
5 Semantics mark value as having string references 5 Semantics mark value as having string references
6 Reference http://cbor.schmorp.de/stringref 6 Reference http://cbor.schmorp.de/stringref
7 Contact Marc A. Lehmann <cbor@schmorp.de> 7 Contact Marc A. Lehmann <cbor@schmorp.de>
8 8
9 Tag <unassigned> (stringref) 9 Tag 25 (stringref)
10 Data Item unsigned integer 10 Data Item unsigned integer
11 Semantics reference the nth previously seen string 11 Semantics reference the nth previously seen string
12 Reference http://cbor.schmorp.de/stringref 12 Reference http://cbor.schmorp.de/stringref
13 Contact Marc A. Lehmann <cbor@schmorp.de> 13 Contact Marc A. Lehmann <cbor@schmorp.de>
14 14
49This scheme can be used to reduce this overhead with a simple scheme that 49This scheme can be used to reduce this overhead with a simple scheme that
50is easily implementable. 50is easily implementable.
51 51
52=head1 DESCRIPTION 52=head1 DESCRIPTION
53 53
54Stringref consists of two tags, stringref-namespace (value <unassigned>), 54Stringref consists of two tags, stringref-namespace (value C<256>),
55which marks a value as containing string references, and stringref (value 55which marks a value as containing string references, and stringref (value
56<unassigned>), which references a string previously encoded in the value. 56C<25>), which references a string previously encoded in the value.
57 57
58The stringref-namespace tag is used to define a namespace for the string 58The stringref-namespace tag is used to define a namespace for the string
59reference ids. stringref tags are only valid inside CBOR values marked 59reference ids. stringref tags are only valid inside CBOR values marked
60with stringref-namespace. 60with stringref-namespace.
61 61
182the array length as the next index to be assigned, and pushing the 182the array length as the next index to be assigned, and pushing the
183string onto the end of the array when it is long enough. 183string onto the end of the array when it is long enough.
184 184
185=head2 IMPLEMENTATION NOTE 185=head2 IMPLEMENTATION NOTE
186 186
187 The semantics of stringref tags require the decoder to be aware and 187The semantics of stringref tags require the decoder to be aware and the
188 the encoder to be under control of the sequence in which data items 188encoder to be under control of the sequence in which data items are
189 are encoded into the CBOR stream. This means these tags cannot be 189encoded into the CBOR stream. This means these tags cannot be implemented
190 implemented on top of every generic CBOR encoder/decoder (which might 190on top of every generic CBOR encoder/decoder (which might reorder entries
191 reorder entries in a map); they need to be integrated into their works. 191in a map); they typically need to be integrated into their works.
192 192
193=head1 EXAMPLES 193=head1 EXAMPLES
194 194
195<TBD> 195The array-of-maps from the rationale example would normally compress to a
196CBOR text of 83 bytes. Using this extension where possible, this reduces
197to 74 bytes:
196 198
199 d9 0100 # tag(256)
200 83 # array(3)
201 a3 # map(3)
202 44 # bytes(4)
203 72616e6b # "rank"
204 04 # unsigned(4)
205 45 # bytes(5)
206 636f756e74 # "count"
207 19 01a1 # unsigned(417)
208 44 # bytes(4)
209 6e616d65 # "name"
210 48 # bytes(8)
211 436f636b7461696c # "Cocktail"
212 a3 # map(3)
213 d8 19 # tag(25)
214 02 # unsigned(2)
215 44 # bytes(4)
216 42617468 # "Bath"
217 d8 19 # tag(25)
218 01 # unsigned(1)
219 19 0138 # unsigned(312)
220 d8 19 # tag(25)
221 00 # unsigned(0)
222 04 # unsigned(4)
223 a3 # map(3)
224 d8 19 # tag(25)
225 02 # unsigned(2)
226 44 # bytes(4)
227 466f6f64 # "Food"
228 d8 19 # tag(25)
229 01 # unsigned(1)
230 19 02b3 # unsigned(691)
231 d8 19 # tag(25)
232 00 # unsigned(0)
233 04 # unsigned(4)
234
235The following JSON array illustrates the effect of the index on the
236minimum string length:
237
238 [ "1", "222", "333", "4", "555", "666", "777", "888", "999",
239 "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
240 "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr",
241 "333",
242 "ssss",
243 "qqq", "rrr", "ssss"]
244
245The strings "1" and "4" are too short to get an index assigned, and so is
246"rrr": by then the next index is 24, which requires at least four bytes. All
247other strings not encoded with a stringref do get an index (this assumes that JSON strings are encoded as CBOR byte strings):
248
249 d9 0100 # tag(256)
250 98 20 # array(32)
251 41 # bytes(1)
252 31 # "1"
253 43 # bytes(3)
254 323232 # "222"
255 43 # bytes(3)
256 333333 # "333"
257 41 # bytes(1)
258 34 # "4"
259 43 # bytes(3)
260 353535 # "555"
261 43 # bytes(3)
262 363636 # "666"
263 43 # bytes(3)
264 373737 # "777"
265 43 # bytes(3)
266 383838 # "888"
267 43 # bytes(3)
268 393939 # "999"
269 43 # bytes(3)
270 616161 # "aaa"
271 43 # bytes(3)
272 626262 # "bbb"
273 43 # bytes(3)
274 636363 # "ccc"
275 43 # bytes(3)
276 646464 # "ddd"
277 43 # bytes(3)
278 656565 # "eee"
279 43 # bytes(3)
280 666666 # "fff"
281 43 # bytes(3)
282 676767 # "ggg"
283 43 # bytes(3)
284 686868 # "hhh"
285 43 # bytes(3)
286 696969 # "iii"
287 43 # bytes(3)
288 6a6a6a # "jjj"
289 43 # bytes(3)
290 6b6b6b # "kkk"
291 43 # bytes(3)
292 6c6c6c # "lll"
293 43 # bytes(3)
294 6d6d6d # "mmm"
295 43 # bytes(3)
296 6e6e6e # "nnn"
297 43 # bytes(3)
298 6f6f6f # "ooo"
299 43 # bytes(3)
300 707070 # "ppp"
301 43 # bytes(3)
302 717171 # "qqq"
303 43 # bytes(3)
304 727272 # "rrr"
305 d8 19 # tag(25)
306 01 # unsigned(1)
307 44 # bytes(4)
308 73737373 # "ssss"
309 d8 19 # tag(25)
310 17 # unsigned(23)
311 43 # bytes(3)
312 727272 # "rrr"
313 d8 19 # tag(25)
314 18 18 # unsigned(24)
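The index assignment in the dump above can be reproduced with a short sketch. It is not taken from CBOR::XS; the names `min_length` and `assign_indices` are hypothetical, and the thresholds assume that a reference costs two bytes for tag 25 (`d8 19`) plus the usual CBOR encoding of the index, so a string is only registered when it is at least that long.

```python
def min_length(next_index):
    """Smallest string length (in bytes) worth registering at this index.

    Assumption: a reference is 2 bytes of tag 25 plus the CBOR-encoded
    index, i.e. 3, 4, 5, 7 or 11 bytes in total.
    """
    if next_index <= 23:
        return 3
    if next_index <= 0xFF:
        return 4
    if next_index <= 0xFFFF:
        return 5
    if next_index <= 0xFFFFFFFF:
        return 7
    return 11


def assign_indices(strings):
    """Walk strings in encoding order within one stringref-namespace.

    Returns ("str", s) for strings emitted literally and ("ref", index)
    for strings replaced by a stringref.
    """
    table = {}  # string -> assigned index
    out = []
    for s in strings:
        if s in table:
            out.append(("ref", table[s]))
            continue
        out.append(("str", s))
        # The next index is the number of strings registered so far.
        if len(s.encode("utf-8")) >= min_length(len(table)):
            table[s] = len(table)
    return out
```

Fed the 32 strings of the JSON array, this sketch yields references 1, 23 and 24 and leaves both occurrences of "rrr" as literal strings, which agrees with the dump.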
315
316This example shows three stringref-namespace tags, two of which are nested
317inside another:
318
319 256(["aaa", 25(0), 256(["bbb", "aaa", 25(1)]), 256(["ccc", 25(0)]), 25(0)])
320
321 d9 0100 # tag(256)
322 85 # array(5)
323 63 # text(3)
324 616161 # "aaa"
325 d8 19 # tag(25)
326 00 # unsigned(0)
327 d9 0100 # tag(256)
328 83 # array(3)
329 63 # text(3)
330 626262 # "bbb"
331 63 # text(3)
332 616161 # "aaa"
333 d8 19 # tag(25)
334 01 # unsigned(1)
335 d9 0100 # tag(256)
336 82 # array(2)
337 63 # text(3)
338 636363 # "ccc"
339 d8 19 # tag(25)
340 00 # unsigned(0)
341 d8 19 # tag(25)
342 00 # unsigned(0)
343
344The decoded data structure might look like this:
345
346 ["aaa","aaa",["bbb","aaa","aaa"],["ccc","ccc"],"aaa"]
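How a decoder might keep the namespaces apart can be sketched on an already-parsed toy model rather than on raw bytes. The `Tag` class and the `resolve` function are hypothetical helpers, not part of any CBOR library, and the sketch only handles arrays, strings and integers (maps would additionally need to be walked in their encoded key/value order). The minimum lengths are the same assumption as in the encoding: tag 25 plus the encoded index.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Toy stand-in for a parsed CBOR tag (hypothetical, for illustration)."""
    number: int
    value: object


def min_length(next_index):
    # Assumed thresholds: 2 bytes for tag 25 plus the encoded index.
    for limit, length in ((23, 3), (0xFF, 4), (0xFFFF, 5), (0xFFFFFFFF, 7)):
        if next_index <= limit:
            return length
    return 11


def resolve(item, table=None):
    """Replace stringref tags by the strings they refer to.

    table is the string array of the innermost enclosing
    stringref-namespace, or None outside of any namespace.
    """
    if isinstance(item, Tag):
        if item.number == 256:
            # A nested namespace starts with an empty table; the outer
            # table is untouched and in effect again once we return.
            return resolve(item.value, [])
        if item.number == 25:
            if table is None:
                raise ValueError("stringref outside of stringref-namespace")
            return table[item.value]
        return Tag(item.number, resolve(item.value, table))
    if isinstance(item, list):
        return [resolve(x, table) for x in item]  # in encoding order
    if isinstance(item, (str, bytes)) and table is not None:
        size = len(item.encode("utf-8")) if isinstance(item, str) else len(item)
        if size >= min_length(len(table)):
            table.append(item)
    return item
```

Applied to the nested example, `resolve(Tag(256, ["aaa", Tag(25, 0), Tag(256, ["bbb", "aaa", Tag(25, 1)]), Tag(256, ["ccc", Tag(25, 0)]), Tag(25, 0)]))` gives the decoded structure shown above.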
347
348=head1 IMPLEMENTATIONS
349
350This section lists known implementations of this extension (L<drop me a
351mail|mailto:cbor@schmorp.de?Subject=CBOR-stringref> if you want to be
352listed here).
353
354=over 4
355
356=item * [Perl] L<CBOR::XS|http://software.schmorp.de/pkg/CBOR-XS.html> (reference implementation)
357
358=back
359
