1 | =head1 REGISTRATION INFORMATION |
1 | =head1 REGISTRATION INFORMATION |
2 | |
2 | |
3 | Tag <unassigned> (stringref-namespace) |
3 | Tag 256 (stringref-namespace) |
4 | Data Item multiple |
4 | Data Item multiple |
5 | Semantics mark value as having string references |
5 | Semantics mark value as having string references |
6 | Reference http://cbor.schmorp.de/stringref |
6 | Reference http://cbor.schmorp.de/stringref |
7 | Contact Marc A. Lehmann <cbor@schmorp.de> |
7 | Contact Marc A. Lehmann <cbor@schmorp.de> |
8 | |
8 | |
9 | Tag <unassigned> (stringref) |
9 | Tag 25 (stringref) |
10 | Data Item unsigned integer |
10 | Data Item unsigned integer |
11 | Semantics reference the nth previously seen string |
11 | Semantics reference the nth previously seen string |
12 | Reference http://cbor.schmorp.de/stringref |
12 | Reference http://cbor.schmorp.de/stringref |
13 | Contact Marc A. Lehmann <cbor@schmorp.de> |
13 | Contact Marc A. Lehmann <cbor@schmorp.de> |
14 | |
14 | |
… | |
… | |
49 | This scheme can be used to reduce this overhead with a simple scheme that |
49 | This scheme can be used to reduce this overhead with a simple scheme that |
50 | is easily implementable. |
50 | is easily implementable. |
51 | |
51 | |
52 | =head1 DESCRIPTION |
52 | =head1 DESCRIPTION |
53 | |
53 | |
54 | Stringref consists of two tags, stringref-namespace (value <unassigned>), |
54 | Stringref consists of two tags, stringref-namespace (value C<256>), |
55 | which marks a value as containing string references, and stringref (value |
55 | which marks a value as containing string references, and stringref (value |
56 | <unassigned>), which references a string previously encoded in the value. |
56 | C<25>), which references a string previously encoded in the value. |
57 | |
57 | |
58 | The stringref-namespace tag is used to define a namespace for the string |
58 | The stringref-namespace tag is used to define a namespace for the string |
59 | reference ids. stringref tags are only valid inside CBOR values marked |
59 | reference ids. stringref tags are only valid inside CBOR values marked |
60 | with stringref-namespace. |
60 | with stringref-namespace. |
61 | |
61 | |
… | |
… | |
91 | |
91 | |
92 | string index minimum string length in octets |
92 | string index minimum string length in octets |
93 | 0 .. 23 3 |
93 | 0 .. 23 3 |
94 | 24 .. 255 4 |
94 | 24 .. 255 4 |
95 | 256 .. 65535 5 |
95 | 256 .. 65535 5 |
96 | 65536 .. 4294967295 6 |
96 | 65536 .. 4294967295 7 |
97 | 4294967296 .. 7 |
97 | 4294967296 .. 11 |
98 | |
98 | |
99 | The minimum string length is simply the length of the stringref tag (2 |
99 | The minimum string length is simply the length of the stringref tag (2 |
100 | octets), plus the minimum size the resulting index takes up, using major |
100 | octets), plus the minimum size the resulting index takes up, using major |
101 | type 0 encoding. |
101 | type 0 encoding. |
102 | |
102 | |
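The rule behind the table can be written down directly. The following is a minimal Python sketch, not part of the specification: the function name `min_stringref_length` is hypothetical, and it assumes the stringref tag value 25 from the new side of this diff, so that the tag itself always takes 2 octets.

```python
def min_stringref_length(index: int) -> int:
    """Smallest string length (in octets) that is worth registering at `index`.

    Assumption: the stringref tag is 25, which encodes as the 2 octets d8 19.
    The index is encoded as a CBOR major type 0 unsigned integer.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    tag_octets = 2
    if index < 24:
        index_octets = 1   # value stored in the initial byte
    elif index < 1 << 8:
        index_octets = 2   # initial byte + 1 octet
    elif index < 1 << 16:
        index_octets = 3   # initial byte + 2 octets
    elif index < 1 << 32:
        index_octets = 5   # initial byte + 4 octets
    elif index < 1 << 64:
        index_octets = 9   # initial byte + 8 octets
    else:
        raise ValueError("index does not fit a CBOR unsigned integer")
    return tag_octets + index_octets
```

Under these assumptions the function gives 3, 4, 5, 7 and 11 for the five index ranges.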
… | |
… | |
182 | the array length as the next index to be assigned, and pushing the |
182 | the array length as the next index to be assigned, and pushing the |
183 | string onto the end of the array when it is long enough. |
183 | string onto the end of the array when it is long enough. |
184 | |
184 | |
185 | =head2 IMPLEMENTATION NOTE |
185 | =head2 IMPLEMENTATION NOTE |
186 | |
186 | |
187 | The semantics of stringref tags require the decoder to be aware and |
187 | The semantics of stringref tags require the decoder to be aware and the |
188 | the encoder to be under control of the sequence in which data items |
188 | encoder to be under control of the sequence in which data items are |
189 | are encoded into the CBOR stream. This means these tags cannot be |
189 | encoded into the CBOR stream. This means these tags cannot be implemented |
190 | implemented on top of every generic CBOR encoder/decoder (which might |
190 | on top of every generic CBOR encoder/decoder (which might reorder entries |
191 | reorder entries in a map); they need to be integrated into their works. |
191 | in a map); they typically need to be integrated into their works. |
192 | |
192 | |
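To illustrate why the encoder has to control the emission order, here is a minimal Python sketch of an encoder for a flat list of strings. It is an illustration under stated assumptions, not the reference implementation: the names `cbor_head`, `min_len` and `encode_strings` are hypothetical, the tag values 256 and 25 are taken from the new side of this diff, and every string is emitted as a byte string (major type 2).

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument for the given major type."""
    if value < 24:
        return bytes([major << 5 | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def min_len(index: int) -> int:
    """Minimum string length in octets for a string to receive `index`."""
    if index < 24:
        return 3
    if index < 1 << 8:
        return 4
    if index < 1 << 16:
        return 5
    if index < 1 << 32:
        return 7
    return 11


def encode_strings(strings):
    """Encode a list of str as tag 256 around an array of byte strings.

    Indices are assigned strictly in the order the strings are written,
    which is why the encoder must own the output sequence.
    """
    seen = {}  # encoded string -> assigned index
    out = bytearray(cbor_head(6, 256) + cbor_head(4, len(strings)))
    for text in strings:
        raw = text.encode("utf-8")
        if raw in seen:
            # Already registered: emit tag 25 with the index.
            out += cbor_head(6, 25) + cbor_head(0, seen[raw])
        else:
            out += cbor_head(2, len(raw)) + raw
            # Register only if a later reference would not be longer.
            if len(raw) >= min_len(len(seen)):
                seen[raw] = len(seen)
    return bytes(out)
```

For example, `encode_strings(["aaa", "aaa"]).hex()` gives `d901008243616161d81900`: the second element is replaced by a reference to index 0.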
193 | =head1 EXAMPLES |
193 | =head1 EXAMPLES |
194 | |
194 | |
195 | <TBD> |
195 | The array-of-maps from the rationale example would normally compress to a |
|
|
196 | CBOR text of 83 bytes. Using this extension where possible, this reduces |
|
|
197 | to 72 bytes: |
196 | |
198 | |
|
|
199 | d9 0100 # tag(256) |
|
|
200 | 83 # array(3) |
|
|
201 | a3 # map(3) |
|
|
202 | 44 # bytes(4) |
|
|
203 | 72616e6b # "rank" |
|
|
204 | 04 # unsigned(4) |
|
|
205 | 45 # bytes(5) |
|
|
206 | 636f756e74 # "count" |
|
|
207 | 19 01a1 # unsigned(417) |
|
|
208 | 44 # bytes(4) |
|
|
209 | 6e616d65 # "name" |
|
|
210 | 48 # bytes(8) |
|
|
211 | 436f636b7461696c # "Cocktail" |
|
|
212 | a3 # map(3) |
|
|
213 | d8 19 # tag(25) |
|
|
214 | 02 # unsigned(2) |
|
|
215 | 44 # bytes(4) |
|
|
216 | 42617468 # "Bath" |
|
|
217 | d8 19 # tag(25) |
|
|
218 | 01 # unsigned(1) |
|
|
219 | 19 0138 # unsigned(312) |
|
|
220 | d8 19 # tag(25) |
|
|
221 | 00 # unsigned(0) |
|
|
222 | 04 # unsigned(4) |
|
|
223 | a3 # map(3) |
|
|
224 | d8 19 # tag(25) |
|
|
225 | 02 # unsigned(2) |
|
|
226 | 44 # bytes(4) |
|
|
227 | 466f6f64 # "Food" |
|
|
228 | d8 19 # tag(25) |
|
|
229 | 01 # unsigned(1) |
|
|
230 | 19 02b3 # unsigned(691) |
|
|
231 | d8 19 # tag(25) |
|
|
232 | 00 # unsigned(0) |
|
|
233 | 04 # unsigned(4) |
|
|
234 | |
|
|
235 | The following JSON array illustrates the effect of the index on the |
|
|
236 | minimum string length: |
|
|
237 | |
|
|
238 | [ "1", "222", "333", "4", "555", "666", "777", "888", "999", |
|
|
239 | "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", |
|
|
240 | "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr", |
|
|
241 | "333", |
|
|
242 | "ssss", |
|
|
243 | "qqq", "rrr", "ssss"] |
|
|
244 | |
|
|
245 | The strings "1", "4" and "rrr" are too short to get an index assigned. All others that are |
|
|
246 | not encoded with a stringref do: |
|
|
247 | |
|
|
248 | d9 0100 # tag(256) |
|
|
249 | 98 20 # array(32) |
|
|
250 | 41 # bytes(1) |
|
|
251 | 31 # "1" |
|
|
252 | 43 # bytes(3) |
|
|
253 | 323232 # "222" |
|
|
254 | 43 # bytes(3) |
|
|
255 | 333333 # "333" |
|
|
256 | 41 # bytes(1) |
|
|
257 | 34 # "4" |
|
|
258 | 43 # bytes(3) |
|
|
259 | 353535 # "555" |
|
|
260 | 43 # bytes(3) |
|
|
261 | 363636 # "666" |
|
|
262 | 43 # bytes(3) |
|
|
263 | 373737 # "777" |
|
|
264 | 43 # bytes(3) |
|
|
265 | 383838 # "888" |
|
|
266 | 43 # bytes(3) |
|
|
267 | 393939 # "999" |
|
|
268 | 43 # bytes(3) |
|
|
269 | 616161 # "aaa" |
|
|
270 | 43 # bytes(3) |
|
|
271 | 626262 # "bbb" |
|
|
272 | 43 # bytes(3) |
|
|
273 | 636363 # "ccc" |
|
|
274 | 43 # bytes(3) |
|
|
275 | 646464 # "ddd" |
|
|
276 | 43 # bytes(3) |
|
|
277 | 656565 # "eee" |
|
|
278 | 43 # bytes(3) |
|
|
279 | 666666 # "fff" |
|
|
280 | 43 # bytes(3) |
|
|
281 | 676767 # "ggg" |
|
|
282 | 43 # bytes(3) |
|
|
283 | 686868 # "hhh" |
|
|
284 | 43 # bytes(3) |
|
|
285 | 696969 # "iii" |
|
|
286 | 43 # bytes(3) |
|
|
287 | 6a6a6a # "jjj" |
|
|
288 | 43 # bytes(3) |
|
|
289 | 6b6b6b # "kkk" |
|
|
290 | 43 # bytes(3) |
|
|
291 | 6c6c6c # "lll" |
|
|
292 | 43 # bytes(3) |
|
|
293 | 6d6d6d # "mmm" |
|
|
294 | 43 # bytes(3) |
|
|
295 | 6e6e6e # "nnn" |
|
|
296 | 43 # bytes(3) |
|
|
297 | 6f6f6f # "ooo" |
|
|
298 | 43 # bytes(3) |
|
|
299 | 707070 # "ppp" |
|
|
300 | 43 # bytes(3) |
|
|
301 | 717171 # "qqq" |
|
|
302 | 43 # bytes(3) |
|
|
303 | 727272 # "rrr" |
|
|
304 | d8 19 # tag(25) |


305 | 01 # unsigned(1) |


306 | 44 # bytes(4) |


307 | 73737373 # "ssss" |
|
|
308 | d8 19 # tag(25) |
|
|
309 | 17 # unsigned(23) |
|
|
310 | 43 # bytes(3) |
|
|
311 | 727272 # "rrr" |
|
|
312 | d8 19 # tag(25) |
|
|
313 | 18 18 # unsigned(24) |
|
|
314 | |
|
|
315 | This example shows three stringref-namespace tags, two of which are nested |
|
|
316 | inside another: |
|
|
317 | |
|
|
318 | 256(["aaa", 25(0), 256(["bbb", "aaa", 25(1)]), 256(["ccc", 25(0)]), 25(0)]) |
|
|
319 | |
|
|
320 | d9 0100 # tag(256) |
|
|
321 | 85 # array(5) |
|
|
322 | 63 # text(3) |
|
|
323 | 616161 # "aaa" |
|
|
324 | d8 19 # tag(25) |
|
|
325 | 00 # unsigned(0) |
|
|
326 | d9 0100 # tag(256) |
|
|
327 | 83 # array(3) |
|
|
328 | 63 # text(3) |
|
|
329 | 626262 # "bbb" |
|
|
330 | 63 # text(3) |
|
|
331 | 616161 # "aaa" |
|
|
332 | d8 19 # tag(25) |
|
|
333 | 01 # unsigned(1) |
|
|
334 | d9 0100 # tag(256) |
|
|
335 | 82 # array(2) |
|
|
336 | 63 # text(3) |
|
|
337 | 636363 # "ccc" |
|
|
338 | d8 19 # tag(25) |
|
|
339 | 00 # unsigned(0) |
|
|
340 | d8 19 # tag(25) |
|
|
341 | 00 # unsigned(0) |
|
|
342 | |
|
|
343 | The decoded data structure might look like this: |
|
|
344 | |
|
|
345 | ["aaa","aaa",["bbb","aaa","aaa"],["ccc","ccc"],"aaa"] |
|
|
346 | |
|
|
347 | |
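The nesting behaviour can be checked with a small decoder. The following Python sketch is a hedged illustration, not the reference implementation: the function names are hypothetical, it handles only unsigned integers, byte and text strings, arrays, maps and the two tags, and it assumes that each stringref-namespace tag starts a fresh string table which is dropped again when the tagged value ends.

```python
def min_len(index: int) -> int:
    """Minimum string length in octets for a string to receive `index`."""
    if index < 24:
        return 3
    if index < 1 << 8:
        return 4
    if index < 1 << 16:
        return 5
    if index < 1 << 32:
        return 7
    return 11


def decode_stringref(data: bytes):
    """Decode a small CBOR subset with tags 256 and 25 (assumed values)."""
    pos = 0
    tables = []  # stack of string tables, innermost namespace last

    def head():
        nonlocal pos
        byte = data[pos]
        pos += 1
        major, info = byte >> 5, byte & 0x1F
        if info < 24:
            return major, info
        if info > 27:
            raise ValueError("unsupported additional information")
        size = 1 << (info - 24)
        value = int.from_bytes(data[pos:pos + size], "big")
        pos += size
        return major, value

    def item():
        nonlocal pos
        major, arg = head()
        if major == 0:
            return arg
        if major in (2, 3):
            raw = data[pos:pos + arg]
            pos += arg
            value = raw if major == 2 else raw.decode("utf-8")
            # Register the string if it is long enough for the next index.
            if tables and arg >= min_len(len(tables[-1])):
                tables[-1].append(value)
            return value
        if major == 4:
            return [item() for _ in range(arg)]
        if major == 5:
            result = {}
            for _ in range(arg):
                key = item()
                result[key] = item()
            return result
        if major == 6 and arg == 256:
            tables.append([])  # new namespace
            try:
                return item()
            finally:
                tables.pop()   # restore the enclosing namespace
        if major == 6 and arg == 25:
            index_major, index = head()
            if index_major != 0 or not tables:
                raise ValueError("invalid stringref")
            return tables[-1][index]
        raise ValueError("unsupported item")

    return item()
```

Feeding it the octets of the nested example, for instance via `decode_stringref(bytes.fromhex("d901008563616161d81900d90100836362626263616161d81901d901008263636363d81900d81900"))`, returns `["aaa", "aaa", ["bbb", "aaa", "aaa"], ["ccc", "ccc"], "aaa"]`.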