--- CBOR-XS/doc/stringref.pod 2013/11/28 09:13:12 1.2
+++ CBOR-XS/doc/stringref.pod 2013/11/28 10:16:50 1.3
@@ -1,12 +1,12 @@
 =head1 REGISTRATION INFORMATION
 
-   Tag             (stringref-namespace)
+   Tag             256 (stringref-namespace)
    Data Item       multiple
    Semantics       mark value as having string references
    Reference       http://cbor.schmorp.de/stringref
    Contact         Marc A. Lehmann
 
-   Tag             (stringref)
+   Tag             25 (stringref)
    Data Item       unsigned integer
    Semantics       reference the nth previously seen string
    Reference       http://cbor.schmorp.de/stringref
@@ -51,9 +51,9 @@
 
 =head1 DESCRIPTION
 
-Stringref consists of two tags, stringref-namespace (value ),
+Stringref consists of two tags, stringref-namespace (value C<256>),
 which marks a value as containing string references, and stringref (value
-), which references a string previously encoded in the value.
+C<25>), which references a string previously encoded in the value.
 
 The stringref-namespace tag is used to define a namespace for the string
 reference ids. stringref tags are only valid inside CBOR values marked
@@ -184,13 +184,164 @@
 
 =head2 IMPLEMENTATION NOTE
 
-  The semantics of stringref tags require the decoder to be aware and
-  the encoder to be under control of the sequence in which data items
-  are encoded into the CBOR stream. This means these tags cannot be
-  implemented on top of every generic CBOR encoder/decoder (which might
-  reorder entries in a map); they need to be integrated into their works.
+The semantics of stringref tags require the decoder to be aware and the
+encoder to be under control of the sequence in which data items are
+encoded into the CBOR stream. This means these tags cannot be implemented
+on top of every generic CBOR encoder/decoder (which might reorder entries
+in a map); they typically need to be integrated into the encoder and decoder.
 
 =head1 EXAMPLES
 
-
+The array-of-maps from the rationale example would normally encode to a
+CBOR text of 83 bytes.
+Using this extension where possible, this reduces it to 72 bytes:
+
+   d9 0100                          # tag(256)
+      83                            # array(3)
+         a3                         # map(3)
+            44                      # bytes(4)
+               72616e6b             # "rank"
+            04                      # unsigned(4)
+            45                      # bytes(5)
+               636f756e74           # "count"
+            19 01a1                 # unsigned(417)
+            44                      # bytes(4)
+               6e616d65             # "name"
+            48                      # bytes(8)
+               436f636b7461696c     # "Cocktail"
+         a3                         # map(3)
+            d8 19                   # tag(25)
+               02                   # unsigned(2)
+            44                      # bytes(4)
+               42617468             # "Bath"
+            d8 19                   # tag(25)
+               01                   # unsigned(1)
+            19 0138                 # unsigned(312)
+            d8 19                   # tag(25)
+               00                   # unsigned(0)
+            04                      # unsigned(4)
+         a3                         # map(3)
+            d8 19                   # tag(25)
+               02                   # unsigned(2)
+            44                      # bytes(4)
+               466f6f64             # "Food"
+            d8 19                   # tag(25)
+               01                   # unsigned(1)
+            19 02b3                 # unsigned(691)
+            d8 19                   # tag(25)
+               00                   # unsigned(0)
+            04                      # unsigned(4)
+
+The following JSON array illustrates the effect of the index on the
+minimum string length:
+
+   [ "1", "222", "333", "4", "555", "666", "777", "888", "999",
+     "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
+     "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr",
+     "333",
+     "ssss",
+     "qqq", "rrr", "ssss"]
+
+The strings "1", "4" and "rrr" are too short to get an index assigned.
+All others that are not encoded with a stringref do:
+
+   d9 0100                          # tag(256)
+      98 20                         # array(32)
+         41                         # bytes(1)
+            31                      # "1"
+         43                         # bytes(3)
+            323232                  # "222"
+         43                         # bytes(3)
+            333333                  # "333"
+         41                         # bytes(1)
+            34                      # "4"
+         43                         # bytes(3)
+            353535                  # "555"
+         43                         # bytes(3)
+            363636                  # "666"
+         43                         # bytes(3)
+            373737                  # "777"
+         43                         # bytes(3)
+            383838                  # "888"
+         43                         # bytes(3)
+            393939                  # "999"
+         43                         # bytes(3)
+            616161                  # "aaa"
+         43                         # bytes(3)
+            626262                  # "bbb"
+         43                         # bytes(3)
+            636363                  # "ccc"
+         43                         # bytes(3)
+            646464                  # "ddd"
+         43                         # bytes(3)
+            656565                  # "eee"
+         43                         # bytes(3)
+            666666                  # "fff"
+         43                         # bytes(3)
+            676767                  # "ggg"
+         43                         # bytes(3)
+            686868                  # "hhh"
+         43                         # bytes(3)
+            696969                  # "iii"
+         43                         # bytes(3)
+            6a6a6a                  # "jjj"
+         43                         # bytes(3)
+            6b6b6b                  # "kkk"
+         43                         # bytes(3)
+            6c6c6c                  # "lll"
+         43                         # bytes(3)
+            6d6d6d                  # "mmm"
+         43                         # bytes(3)
+            6e6e6e                  # "nnn"
+         43                         # bytes(3)
+            6f6f6f                  # "ooo"
+         43                         # bytes(3)
+            707070                  # "ppp"
+         43                         # bytes(3)
+            717171                  # "qqq"
+         43                         # bytes(3)
+            727272                  # "rrr"
+         d8 19                      # tag(25)
+            01                      # unsigned(1)
+         44                         # bytes(4)
+            73737373                # "ssss"
+         d8 19                      # tag(25)
+            17                      # unsigned(23)
+         43                         # bytes(3)
+            727272                  # "rrr"
+         d8 19                      # tag(25)
+            18 18                   # unsigned(24)
+
+This example shows three stringref-namespace tags, two of which are nested
+inside the third:
+
+   256(["aaa", 25(0), 256(["bbb", "aaa", 25(1)]), 256(["ccc", 25(0)]), 25(0)])
+
+   d9 0100                          # tag(256)
+      85                            # array(5)
+         63                         # text(3)
+            616161                  # "aaa"
+         d8 19                      # tag(25)
+            00                      # unsigned(0)
+         d9 0100                    # tag(256)
+            83                      # array(3)
+               63                   # text(3)
+                  626262            # "bbb"
+               63                   # text(3)
+                  616161            # "aaa"
+               d8 19                # tag(25)
+                  01                # unsigned(1)
+         d9 0100                    # tag(256)
+            82                      # array(2)
+               63                   # text(3)
+                  636363            # "ccc"
+               d8 19                # tag(25)
+                  00                # unsigned(0)
+         d8 19                      # tag(25)
+            00                      # unsigned(0)
+
+The decoded data structure might look like this:
+
+   ["aaa","aaa",["bbb","aaa","aaa"],["ccc","ccc"],"aaa"]
+
 
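The decoder side of these examples can be checked with a minimal Python sketch. It is an illustration under stated assumptions, not the CBOR::XS implementation, and its function names (decode, _item, _argument, _min_length) are hypothetical. It handles only the subset the examples use: unsigned integers, definite-length byte and text strings, arrays, maps, and tags 256 and 25. It assumes the minimum-length rule of the full stringref specification, read here as follows. A literal string only receives the next index if it is at least as long as a tag(25) reference to that index would be. That means 3 bytes for indices 0 to 23, 4 bytes for 24 to 255, 5 bytes up to 65535, 7 bytes up to 2**32 - 1 and 11 bytes beyond. Under that rule "rrr", which arrives after 24 strings have been indexed, gets no index in the second example, while "ssss" gets index 24.

```python
# Minimal sketch of a stringref-aware CBOR decoder. Hypothetical helper names;
# this is NOT the CBOR::XS implementation. Supported subset: unsigned integers,
# definite-length byte/text strings, arrays, maps, tags 256 and 25.


def _min_length(index: int) -> int:
    """Assumed rule: a string may take `index` only if it is at least as long
    as the tag(25) reference to that index would be."""
    if index < 24:
        return 3   # d8 19 xx
    if index < 256:
        return 4   # d8 19 18 xx
    if index < 65536:
        return 5   # d8 19 19 xx xx
    if index < 2**32:
        return 7   # d8 19 1a + 4 bytes
    return 11      # d8 19 1b + 8 bytes


def _argument(data: bytes, pos: int, info: int):
    """Read the argument encoded by the low 5 bits; return (value, new_pos)."""
    if info < 24:
        return info, pos
    if info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
        if pos + size > len(data):
            raise ValueError("truncated argument")
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError(f"unsupported additional information {info}")


def _item(data: bytes, pos: int, strings):
    """Decode the item at `pos`; `strings` is the innermost namespace list or None."""
    if pos >= len(data):
        raise ValueError("truncated input")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    arg, pos = _argument(data, pos + 1, info)

    if major == 0:  # unsigned integer
        return arg, pos
    if major in (2, 3):  # byte string or text string
        raw = data[pos:pos + arg]
        if len(raw) != arg:
            raise ValueError("truncated string")
        value = raw if major == 2 else raw.decode("utf-8")
        # A literal string long enough for the next index joins the namespace.
        if strings is not None and arg >= _min_length(len(strings)):
            strings.append(value)
        return value, pos + arg
    if major == 4:  # array: elements in stream order
        items = []
        for _ in range(arg):
            value, pos = _item(data, pos, strings)
            items.append(value)
        return items, pos
    if major == 5:  # map: each key, then its value, in stream order
        result = {}
        for _ in range(arg):
            key, pos = _item(data, pos, strings)
            value, pos = _item(data, pos, strings)
            result[key] = value
        return result, pos
    if major == 6 and arg == 256:  # stringref-namespace: new, empty list
        return _item(data, pos, [])
    if major == 6 and arg == 25:  # stringref: earlier string by index
        if strings is None:
            raise ValueError("stringref outside any stringref-namespace")
        index, pos = _item(data, pos, None)
        if not isinstance(index, int) or index >= len(strings):
            raise ValueError(f"invalid stringref index {index!r}")
        return strings[index], pos
    raise ValueError(f"unsupported item: major type {major}, argument {arg}")


def decode(data: bytes):
    """Decode one complete CBOR data item, rejecting trailing bytes."""
    value, pos = _item(data, 0, None)
    if pos != len(data):
        raise ValueError("trailing bytes after data item")
    return value


if __name__ == "__main__":
    # Third example from the text: three namespaces, two nested.
    nested = bytes.fromhex(
        "d90100" "85" "63616161" "d81900"
        "d90100" "83" "63626262" "63616161" "d81901"
        "d90100" "82" "63636363" "d81900"
        "d81900"
    )
    print(decode(nested))
    # prints ['aaa', 'aaa', ['bbb', 'aaa', 'aaa'], ['ccc', 'ccc'], 'aaa']
```

Each tag(256) hands a new, empty list to its content. The inner namespaces in the third example therefore never see or extend the outer list. The final 25(0) resolves against the outer list again and yields "aaa" rather than "ccc". An encoder would have to apply the same length rule while emitting items in exactly this order.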