1 |
=head1 REGISTRATION INFORMATION |
2 |
|
3 |
Tag 28 (shareable) |
4 |
Data Item multiple |
5 |
Semantics mark value as (potentially) shared |
6 |
Reference http://cbor.schmorp.de/value-sharing |
7 |
Contact Marc A. Lehmann <cbor@schmorp.de> |
8 |
|
9 |
Tag 29 (sharedref) |
10 |
Data Item unsigned integer |
11 |
Semantics reference nth marked value |
12 |
Reference http://cbor.schmorp.de/value-sharing |
13 |
Contact Marc A. Lehmann <cbor@schmorp.de> |
14 |
|
15 |
=head1 SUMMARY |
16 |
|
17 |
These two tags can be used to implement shared value support in CBOR. |
18 |
|
19 |
=head1 RATIONALE |
20 |
|
21 |
Many serialisable data structures can contain values that are |
22 |
shared. For example, in Perl, you could have an array with two hash |
23 |
references pointing to the same object. |
24 |
|
25 |
When serialising these data structures to CBOR, these values either |
26 |
become unshared (duplicated), or, when the structure contains cycles, |
27 |
they are not serialisable into CBOR at all. |
28 |
|
29 |
This extension implements explicit shared value support - encoders need |
30 |
to explicitly mark values as potentially shared and can later refer to |
31 |
them. |
32 |
|
33 |
This extension is not meant to save space in the CBOR representation by |
34 |
encoding duplicated values only once - the shared values are supposed |
35 |
to refer to the same value after decoding (e.g. when implemented as a |
36 |
pointer, all references to the value should point to the same memory |
37 |
object). |
38 |
|
39 |
Space saving is a side effect of encoding data structures with large |
40 |
shared substructures using this extension - the CBOR representation will |
41 |
then have similar space requirements as the original data structure. |
42 |
|
43 |
=head1 DESCRIPTION |
44 |
|
45 |
To share values, the first occurrence of the value must be explicitly |
46 |
tagged with the shareable tag (value C<28>). |
47 |
|
48 |
Subsequent occurrences can then be encoded by encoding the index |
49 |
of a previously marked value tagged with the sharedref tag (value |
50 |
C<29>). That is, index 0 refers to the first value marked as |
51 |
shareable in the CBOR stream, index 1 to the second and so on. |
52 |
|
53 |
Any taggable value can be marked for sharing, and there is no requirement |
54 |
to actually refer to a value marked as shareable - encoders can mark any |
55 |
value they want without ever referring to them. |
56 |
|
57 |
Implementors are advised that, to be able to encode cyclic structures, |
58 |
it must be possible to refer to a value before it is completely |
59 |
decoded. For example, during decoding of a map, some entries can |
60 |
refer to the map being decoded. Thus an implementation cannot decode |
61 |
a shareable value and then record it for later references - it has to |
62 |
record the reference before decoding the value. |
63 |
|
64 |
This can be handled in a variety of ways. For example, in Perl, values |
65 |
not explicitly referenced (hash, array, scalar ref) can not (normally) be |
66 |
shared, so it only has to handle explicit references. Other shared values |
67 |
will usually become unshared, for efficiency reasons. |
68 |
|
69 |
Implementations that do not support sharing can duplicate the values |
70 |
after decoding, or they can use fix-up lists to fix shared references |
71 |
after decoding. |
72 |
|
73 |
=head2 IMPLEMENTATION NOTE |
74 |
|
75 |
The semantics of shareable/sharedref tags require the decoder to be |
76 |
aware and the encoder to be under control of the sequence in which data |
77 |
items are encoded into the CBOR stream. This means these tags cannot be |
78 |
implemented on top of every generic CBOR encoder/decoder (which might |
79 |
reorder entries in a map); they typically need to be integrated into their |
80 |
works. |
81 |
|
82 |
=head1 EXAMPLES |
83 |
|
84 |
=head2 A simple shared array |
85 |
|
86 |
The following Perl fragment creates an array reference with three entries, |
87 |
all of which array references themselves. All of them contain the same |
88 |
data, but the first two actually reference the same (shared) data |
89 |
structure: |
90 |
|
91 |
$data = [ ([]) x 2, [] ]; |
92 |
|
93 |
This is another way to create it: |
94 |
|
95 |
my $shared = []; |
96 |
$data = [ $shared, $shared, [] ]; |
97 |
|
98 |
The shared aspect means that setting an element of the first nested |
99 |
arrayref also makes it visible inside the second nested arrayref, as it is |
100 |
the same array: |
101 |
|
102 |
$data->[0][0] = "test"; |
103 |
# results in: [["test"],["test"],[]] |
104 |
|
105 |
A standard CBOR en-/decoder will encode three separate arrays, which |
106 |
will decode into three separate arrays again. So when the original data |
107 |
structure is en- and then decoded, the arrays will be unshared: |
108 |
|
109 |
$unshared = decode_cbor encode_cbor [ ([]) x 2, [] ]; |
110 |
$unshared->[0][0] = "test"; |
111 |
# results in: [["test"],[],[]] |
112 |
|
113 |
The CBOR encoding might be: |
114 |
|
115 |
83 # array(3) |
116 |
80 # array(0) |
117 |
80 # array(0) |
118 |
80 # array(0) |
119 |
|
120 |
The value-sharing extension allows an encoder to flag the two arrays and |
121 |
keep the shared arrays actually shared. For example, the CBOR::XS encoder, |
122 |
when configured to use the value sharing extension, will emit this CBOR |
123 |
value: |
124 |
|
125 |
[28([]), 29(0), []] |
126 |
|
127 |
83 # array(3) |
128 |
d8 1c # tag(28) |
129 |
80 # array(0) |
130 |
d8 1d # tag(29) |
131 |
00 # unsigned(0) |
132 |
80 # array(0) |
133 |
|
134 |
When decoding it, the first two array references will point to the same array. |
135 |
|
136 |
=head2 A cyclic data structure |
137 |
|
138 |
The following cyclic Perl data structure references itself from within |
139 |
itself. Here a decoder will see a reference to the shared value I<before> |
140 |
it has completely decoded the shared value: |
141 |
|
142 |
$data = []; |
143 |
$data->[0] = $data; # make the first array element refer to the array |
144 |
|
145 |
This data structure is not representable in standard CBOR. Using the value |
146 |
sharing extension, it can be encoded as follows: |
147 |
|
148 |
28([29(0)]) |
149 |
|
150 |
d8 1c # tag(28) |
151 |
81 # array(1) |
152 |
d8 1d # tag(29) |
153 |
00 # unsigned(0) |
154 |
|
155 |
=head1 SECURITY CONSIDERATIONS |
156 |
|
157 |
Implementing this extension can open up a decoder for additional resource |
158 |
exhaustion attacks. |
159 |
|
160 |
The possibility to create cyclic data structures can create problems |
161 |
for implementtaions that rely on garbage collection to clean up data |
162 |
structures, and it could potentially lead to problems with algorithms that |
163 |
can't cope with this, leading to infinite loops or similar problems. One |
164 |
way to counter that is to not enable decoding of shared values by default, |
165 |
so code that wants it can opt-in. Another way would be to disallow |
166 |
decoding of cyclic data structures while still allowing other forms of |
167 |
shared values (for example, by tracking whether a referenced value has |
168 |
been fully decoded already and fail if not). |
169 |
|
170 |
Implementations that decode shared values by duplicating them could |
171 |
also suffer from excessive expansion (e.g. having a string referenced |
172 |
multiple times in an array, then having this array referenced multiple |
173 |
times in another array and so on). This is not usually a problem as most |
174 |
implementations already treat arrays and objects as references, and can |
175 |
therefore avoid duplication. |
176 |
|
177 |
=head1 IMPLEMENTATIONS |
178 |
|
179 |
This section lists known implementations of this extension (L<drop me a |
180 |
mail|mailto:cbor@schmorp.de?Subject=CBOR-value-sharing> if you want to be |
181 |
listed here). |
182 |
|
183 |
=over 4 |
184 |
|
185 |
=item * [Perl] L<CBOR::XS|http://software.schmorp.de/pkg/CBOR-XS.html> (reference implementation) |
186 |
|
187 |
=item * [JavaScript] L<borc-refs|https://github.com/sandhawke/borc-refs> |
188 |
|
189 |
=item * [Perl] L<CBOR::Free|https://metacpan.org/pod/CBOR::Free> |
190 |
|
191 |
=item * [Nodejs] L<node-cbor|https://github.com/hildjj/node-cbor/> |
192 |
|
193 |
=back |
194 |
|