--- CBOR-XS/doc/stringref.pod 2016/12/27 21:24:33 1.4
+++ CBOR-XS/doc/stringref.pod 2018/04/30 11:24:17 1.6
@@ -190,6 +190,25 @@
 on top of every generic CBOR encoder/decoder (which might reorder entries
 in a map); they typically need to be integrated into their works.
 
+=head2 DESIGN RATIONALE
+
+The stringref tag was chosen to be short, without requiring standards
+action. The namespace tag is rare, so doesn't benefit from a short
+encoding as much.
+
+Implicit tagging/counting was chosen to support stream encoders. Having
+to tag strings first requires either multiple passes over the data (which
+might not be available, ruling out some encoders) or tagging more strings
+than needed (wasting space). Explicit tagging also isn't necessarily
+better even under optimal conditions, as the explicit tags waste space.
+
+Stream decoders are affected less by implicit tagging than encoders.
+
+The namespace tag was introduced for two reasons: first to allow embedding
+of CBOR strings into other CBOR strings, secondly for decoding efficiency
+- the decoder only has to expect stringref tags inside namespaces and
+therefore doesn't have to maintain extra state outside of them.
+
 =head1 EXAMPLES
 
 The array-of-maps from the rationale example would normally compress to a
@@ -242,8 +261,9 @@
    "ssss", "qqq", "rrr", "ssss"]
 
-The strings "1", "4" and "rrr" are too short to get an index assigned. All others that are
-not encoded with a stringref do:
+The strings "1", "4" and "rrr" are too short to get an index assigned. All
+others that are not encoded with a stringref do (this assumes that JSON
+strings are encoded as CBOR byte strings):
 
    d9 0100         # tag(256)
       98 20        # array(32)
@@ -301,10 +321,10 @@
          717171    # "qqq"
          43        # bytes(3)
          727272    # "rrr"
-         44        # bytes(4)
-         73737373  # "ssss"
          d8 19     # tag(25)
             01     # unsigned(1)
+         44        # bytes(4)
+         73737373  # "ssss"
          d8 19     # tag(25)
             17     # unsigned(23)
          43        # bytes(3)
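
The implicit counting described in the DESIGN RATIONALE hunk, and the ordering fix in the last hunk, can be illustrated with a small single-pass encoder. This is only a sketch, not the CBOR::XS implementation: the function names are hypothetical, it handles byte strings only, and the minimum-length thresholds are an assumption taken from my reading of the stringref specification (a string is registered only if it is at least as long as the reference that would replace it).

```python
def min_len_for_index(next_index: int) -> int:
    """Shortest string worth registering at this index (assumed thresholds)."""
    if next_index <= 23:
        return 3
    if next_index <= 0xFF:
        return 4
    if next_index <= 0xFFFF:
        return 5
    if next_index <= 0xFFFFFFFF:
        return 7
    return 11


def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument for the given major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def encode_stringref_array(strings: list[bytes]) -> bytes:
    """Encode an array of byte strings in one pass inside a stringref namespace."""
    out = bytearray()
    out += cbor_head(6, 256)             # tag(256): stringref namespace
    out += cbor_head(4, len(strings))    # array header
    table: dict[bytes, int] = {}         # string -> implicitly assigned index
    for s in strings:
        if s in table:
            out += cbor_head(6, 25)      # tag(25): stringref
            out += cbor_head(0, table[s])
            continue
        out += cbor_head(2, len(s)) + s  # literal byte string
        # No explicit tag: a long-enough literal simply takes the next index.
        if len(s) >= min_len_for_index(len(table)):
            table[s] = len(table)
    return bytes(out)
```

Because indices are handed out as strings are written, the encoder never needs a second pass over its input. Fed the 32-element array from the example, the sketch emits the reference to "333" (index 1) before the first literal "ssss", which is the order the last hunk restores.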