ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/CBOR-XS/doc/value-sharing.pod
Revision: 1.10
Committed: Thu Sep 22 08:04:27 2022 UTC (19 months, 3 weeks ago) by root
Branch: MAIN
CVS Tags: rel-1_87, HEAD
Changes since 1.9: +27 -3 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 REGISTRATION INFORMATION
2
3 Tag 28 (shareable)
4 Data Item multiple
5 Semantics mark value as (potentially) shared
6 Reference http://cbor.schmorp.de/value-sharing
7 Contact Marc A. Lehmann <cbor@schmorp.de>
8
9 Tag 29 (sharedref)
10 Data Item unsigned integer
11 Semantics reference nth marked value
12 Reference http://cbor.schmorp.de/value-sharing
13 Contact Marc A. Lehmann <cbor@schmorp.de>
14
15 =head1 SUMMARY
16
17 These two tags can be used to implement shared value support in CBOR.
18
19 =head1 RATIONALE
20
21 Many serialisable data structures can contain values that are
22 shared. For example, in Perl, you could have an array with two hash
23 references pointing to the same object.
24
25 When serialising these data structures to CBOR, these values either
26 become unshared (duplicated), or, when the structure contains cycles,
27 they are not serialisable into CBOR at all.
28
29 This extension implements explicit shared value support - encoders need
30 to explicitly mark values as potentially shared and can later refer to
31 them.
32
33 This extension is not meant to save space in the CBOR representation by
34 encoding duplicated values only once - the shared values are supposed
35 to refer to the same value after decoding (e.g. when implemented as a
36 pointer, all references to the value should point to the same memory
37 object).
38
39 Space saving is a side effect of encoding data structures with large
40 shared substructures using this extension - the CBOR representation will
41 then have similar space requirements as the original data structure.
42
43 =head1 DESCRIPTION
44
45 To share values, the first occurrence of the value must be explicitly
46 tagged with the shareable tag (value C<28>).
47
48 Subsequent occurrences can then be encoded by encoding the index
49 of a previously marked value tagged with the sharedref tag (value
50 C<29>). That is, index 0 refers to the first value marked as
51 shareable in the CBOR stream, index 1 to the second and so on.
52
53 Any taggable value can be marked for sharing, and there is no requirement
54 to actually refer to a value marked as shareable - encoders can mark any
55 value they want without ever referring to them.
56
57 Implementors are advised that, to be able to encode cyclic structures,
58 it must be possible to refer to a value before it is completely
59 decoded. For example, during decoding of a map, some entries can
60 refer to the map being decoded. Thus an implementation cannot decode
61 a shareable value and then record it for later references - it has to
62 record the reference before decoding the value.
63
64 This can be handled in a variety of ways. For example, in Perl, values
65 not explicitly referenced (hash, array, scalar ref) can not (normally) be
66 shared, so it only has to handle explicit references. Other shared values
67 will usually become unshared, for efficiency reasons.
68
69 Implementations that do not support sharing can duplicate the values
70 after decoding, or they can use fix-up lists to fix shared references
71 after decoding.
72
73 =head2 IMPLEMENTATION NOTE
74
75 The semantics of shareable/sharedref tags require the decoder to be
76 aware and the encoder to be under control of the sequence in which data
77 items are encoded into the CBOR stream. This means these tags cannot be
78 implemented on top of every generic CBOR encoder/decoder (which might
79 reorder entries in a map); they typically need to be integrated into their
80 works.
81
82 =head1 EXAMPLES
83
84 =head2 A simple shared array
85
86 The following Perl fragment creates an array reference with three entries,
87 all of which array references themselves. All of them contain the same
88 data, but the first two actually reference the same (shared) data
89 structure:
90
91 $data = [ ([]) x 2, [] ];
92
93 This is another way to create it:
94
95 my $shared = [];
96 $data = [ $shared, $shared, [] ];
97
98 The shared aspect means that setting an element of the first nested
99 arrayref also makes it visible inside the second nested arrayref, as it is
100 the same array:
101
102 $data->[0][0] = "test";
103 # results in: [["test"],["test"],[]]
104
105 A standard CBOR en-/decoder will encode three separate arrays, which
106 will decode into three separate arrays again. So when the original data
107 structure is en- and then decoded, the arrays will be unshared:
108
109 $unshared = decode_cbor encode_cbor [ ([]) x 2, [] ];
110 $unshared->[0][0] = "test";
111 # results in: [["test"],[],[]]
112
113 The CBOR encoding might be:
114
115 83 # array(3)
116 80 # array(0)
117 80 # array(0)
118 80 # array(0)
119
120 The value-sharing extension allows an encoder to flag the two arrays and
121 keep the shared arrays actually shared. For example, the CBOR::XS encoder,
122 when configured to use the value sharing extension, will emit this CBOR
123 value:
124
125 [28([]), 29(0), []]
126
127 83 # array(3)
128 d8 1c # tag(28)
129 80 # array(0)
130 d8 1d # tag(29)
131 00 # unsigned(0)
132 80 # array(0)
133
134 When decoding it, the first two array references will point to the same array.
135
136 =head2 A cyclic data structure
137
138 The following cyclic Perl data structure references itself from within
139 itself. Here a decoder will see a reference to the shared value I<before>
140 it has completely decoded the shared value:
141
142 $data = [];
143 $data->[0] = $data; # make the first array element refer to the array
144
145 This data structure is not representable in standard CBOR. Using the value
146 sharing extension, it can be encoded as follows:
147
148 28([29(0)])
149
150 d8 1c # tag(28)
151 81 # array(1)
152 d8 1d # tag(29)
153 00 # unsigned(0)
154
155 =head1 SECURITY CONSIDERATIONS
156
157 Implementing this extension can open up a decoder for additional resource
158 exhaustion attacks.
159
160 The possibility to create cyclic data structures can create problems
161 for implementtaions that rely on garbage collection to clean up data
162 structures, and it could potentially lead to problems with algorithms that
163 can't cope with this, leading to infinite loops or similar problems. One
164 way to counter that is to not enable decoding of shared values by default,
165 so code that wants it can opt-in. Another way would be to disallow
166 decoding of cyclic data structures while still allowing other forms of
167 shared values (for example, by tracking whether a referenced value has
168 been fully decoded already and fail if not).
169
170 Implementations that decode shared values by duplicating them could
171 also suffer from excessive expansion (e.g. having a string referenced
172 multiple times in an array, then having this array referenced multiple
173 times in another array and so on). This is not usually a problem as most
174 implementations already treat arrays and objects as references, and can
175 therefore avoid duplication.
176
177 =head1 IMPLEMENTATIONS
178
179 This section lists known implementations of this extension (L<drop me a
180 mail|mailto:cbor@schmorp.de?Subject=CBOR-value-sharing> if you want to be
181 listed here).
182
183 =over 4
184
185 =item * [Perl] L<CBOR::XS|http://software.schmorp.de/pkg/CBOR-XS.html> (reference implementation)
186
187 =item * [JavaScript] L<borc-refs|https://github.com/sandhawke/borc-refs>
188
189 =item * [Perl] L<CBOR::Free|https://metacpan.org/pod/CBOR::Free>
190
191 =item * [Nodejs] L<node-cbor|https://github.com/hildjj/node-cbor/>
192
193 =back
194