1 |
root |
1.1 |
=head1 REGISTRATION INFORMATION |
2 |
|
|
|
3 |
root |
1.2 |
Tag 28 (shareable) |
4 |
root |
1.1 |
Data Item multiple |
5 |
|
|
Semantics mark value as (potentially) shared |
6 |
|
|
Reference http://cbor.schmorp.de/value-sharing |
7 |
|
|
Contact Marc A. Lehmann <cbor@schmorp.de> |
8 |
|
|
|
9 |
root |
1.2 |
Tag 29 (sharedref) |
10 |
root |
1.1 |
Data Item unsigned integer |
11 |
|
|
Semantics reference nth marked value |
12 |
|
|
Reference http://cbor.schmorp.de/value-sharing |
13 |
|
|
Contact Marc A. Lehmann <cbor@schmorp.de> |
14 |
|
|
|
15 |
|
|
=head1 SUMMARY |
16 |
|
|
|
17 |
|
|
These two tags can be used to implement shared value support in CBOR. |
18 |
|
|
|
19 |
|
|
=head1 RATIONALE |
20 |
|
|
|
21 |
|
|
Many serialisable data structures can contain values that are |
22 |
|
|
shared. For example, in Perl, you could have an array with two hash |
23 |
|
|
references pointing to the same object. |
24 |
|
|
|
25 |
|
|
When serialising these data structures to CBOR, these values either |
26 |
|
|
become unshared (duplicated), or, when the structure contains cycles, |
27 |
|
|
they are not serialisable into CBOR at all. |
28 |
|
|
|
29 |
|
|
This extension implements explicit shared value support - encoders need |
30 |
|
|
to explicitly mark values as potentially shared and can later refer to |
31 |
|
|
them. |
32 |
|
|
|
33 |
|
|
This extension is not meant to save space in the CBOR representation by |
34 |
|
|
encoding duplicated values only once - the shared values are supposed |
35 |
|
|
to refer to the same value after decoding (e.g. when implemented as a |
36 |
|
|
pointer, all references to the value should point to the same memory |
37 |
|
|
object). |
38 |
|
|
|
39 |
root |
1.2 |
Space saving is a side effect of encoding data structures with large |
40 |
|
|
shared substructures using this extension - the CBOR representation will |
41 |
root |
1.5 |
then have similar space requirements as the original data structure. |
42 |
root |
1.2 |
|
43 |
root |
1.1 |
=head1 DESCRIPTION |
44 |
|
|
|
45 |
|
|
To share values, the first occurrence of the value must be explicitly |
46 |
root |
1.2 |
tagged with the shareable tag (value C<28>). |
47 |
root |
1.1 |
|
48 |
|
|
Subsequent occurrences can then be encoded by encoding the index |
49 |
|
|
of a previously marked value tagged with the sharedref tag (value |
50 |
root |
1.2 |
C<29>). That is, index 0 refers to the first value marked as |
51 |
root |
1.1 |
shareable in the CBOR stream, index 1 to the second and so on. |
52 |
|
|
|
53 |
root |
1.10 |
Any taggable value can be marked for sharing, and there is no requirement |
54 |
|
|
to actually refer to a value marked as shareable - encoders can mark any |
55 |
|
|
value they want without ever referring to them. |
56 |
root |
1.1 |
|
57 |
|
|
Implementors are advised that, to be able to encode cyclic structures, |
58 |
|
|
it must be possible to refer to a value before it is completely |
59 |
|
|
decoded. For example, during decoding of a map, some entries can |
60 |
|
|
refer to the map being decoded. Thus an implementation cannot decode |
61 |
|
|
a shareable value and then record it for later references - it has to |
62 |
|
|
record the reference before decoding the value. |
63 |
|
|
|
64 |
|
|
This can be handled in a variety of ways. For example, in Perl, values |
65 |
root |
1.5 |
not explicitly referenced (hash, array, scalar ref) can not (normally) be |
66 |
|
|
shared, so it only has to handle explicit references. Other shared values |
67 |
|
|
will usually become unshared, for efficiency reasons. |
68 |
root |
1.1 |
|
69 |
|
|
Implementations that do not support sharing can duplicate the values |
70 |
|
|
after decoding, or they can use fix-up lists to fix shared references |
71 |
|
|
after decoding. |
72 |
|
|
|
73 |
|
|
=head2 IMPLEMENTATION NOTE |
74 |
|
|
|
75 |
|
|
The semantics of shareable/sharedref tags require the decoder to be |
76 |
|
|
aware and the encoder to be under control of the sequence in which data |
77 |
|
|
items are encoded into the CBOR stream. This means these tags cannot be |
78 |
|
|
implemented on top of every generic CBOR encoder/decoder (which might |
79 |
root |
1.2 |
reorder entries in a map); they typically need to be integrated into their |
80 |
|
|
works. |
81 |
root |
1.1 |
|
82 |
|
|
=head1 EXAMPLES |
83 |
|
|
|
84 |
root |
1.2 |
=head2 A simple shared array |
85 |
|
|
|
86 |
|
|
The following Perl fragment creates an array reference with three entries, |
87 |
|
|
all of which array references themselves. All of them contain the same |
88 |
|
|
data, but the first two actually reference the same (shared) data |
89 |
|
|
structure: |
90 |
|
|
|
91 |
|
|
$data = [ ([]) x 2, [] ]; |
92 |
|
|
|
93 |
|
|
This is another way to create it: |
94 |
|
|
|
95 |
|
|
my $shared = []; |
96 |
|
|
$data = [ $shared, $shared, [] ]; |
97 |
|
|
|
98 |
|
|
The shared aspect means that setting an element of the first nested |
99 |
|
|
arrayref also makes it visible inside the second nested arrayref, as it is |
100 |
|
|
the same array: |
101 |
|
|
|
102 |
|
|
$data->[0][0] = "test"; |
103 |
|
|
# results in: [["test"],["test"],[]] |
104 |
|
|
|
105 |
|
|
A standard CBOR en-/decoder will encode three separate arrays, which |
106 |
|
|
will decode into three separate arrays again. So when the original data |
107 |
|
|
structure is en- and then decoded, the arrays will be unshared: |
108 |
|
|
|
109 |
|
|
$unshared = decode_cbor encode_cbor [ ([]) x 2, [] ]; |
110 |
|
|
$unshared->[0][0] = "test"; |
111 |
|
|
# results in: [["test"],[],[]] |
112 |
|
|
|
113 |
|
|
The CBOR encoding might be: |
114 |
|
|
|
115 |
|
|
83 # array(3) |
116 |
|
|
80 # array(0) |
117 |
|
|
80 # array(0) |
118 |
|
|
80 # array(0) |
119 |
|
|
|
120 |
|
|
The value-sharing extension allows an encoder to flag the two arrays and |
121 |
|
|
keep the shared arrays actually shared. For example, the CBOR::XS encoder, |
122 |
|
|
when configured to use the value sharing extension, will emit this CBOR |
123 |
|
|
value: |
124 |
|
|
|
125 |
|
|
[28([]), 29(0), []] |
126 |
|
|
|
127 |
|
|
83 # array(3) |
128 |
|
|
d8 1c # tag(28) |
129 |
|
|
80 # array(0) |
130 |
|
|
d8 1d # tag(29) |
131 |
|
|
00 # unsigned(0) |
132 |
|
|
80 # array(0) |
133 |
|
|
|
134 |
|
|
When decoding it, the first two array references will point to the same array. |
135 |
|
|
|
136 |
|
|
=head2 A cyclic data structure |
137 |
|
|
|
138 |
|
|
The following cyclic Perl data structure references itself from within |
139 |
|
|
itself. Here a decoder will see a reference to the shared value I<before> |
140 |
root |
1.3 |
it has completely decoded the shared value: |
141 |
root |
1.2 |
|
142 |
|
|
$data = []; |
143 |
root |
1.4 |
$data->[0] = $data; # make the first array element refer to the array |
144 |
root |
1.2 |
|
145 |
|
|
This data structure is not representable in standard CBOR. Using the value |
146 |
|
|
sharing extension, it can be encoded as follows: |
147 |
|
|
|
148 |
|
|
28([29(0)]) |
149 |
|
|
|
150 |
|
|
d8 1c # tag(28) |
151 |
|
|
81 # array(1) |
152 |
|
|
d8 1d # tag(29) |
153 |
|
|
00 # unsigned(0) |
154 |
root |
1.1 |
|
155 |
root |
1.10 |
=head1 SECURITY CONSIDERATIONS |
156 |
|
|
|
157 |
|
|
Implementing this extension can open up a decoder for additional resource |
158 |
|
|
exhaustion attacks. |
159 |
|
|
|
160 |
|
|
The possibility to create cyclic data structures can create problems |
161 |
|
|
for implementtaions that rely on garbage collection to clean up data |
162 |
|
|
structures, and it could potentially lead to problems with algorithms that |
163 |
|
|
can't cope with this, leading to infinite loops or similar problems. One |
164 |
|
|
way to counter that is to not enable decoding of shared values by default, |
165 |
|
|
so code that wants it can opt-in. Another way would be to disallow |
166 |
|
|
decoding of cyclic data structures while still allowing other forms of |
167 |
|
|
shared values (for example, by tracking whether a referenced value has |
168 |
|
|
been fully decoded already and fail if not). |
169 |
|
|
|
170 |
|
|
Implementations that decode shared values by duplicating them could |
171 |
|
|
also suffer from excessive expansion (e.g. having a string referenced |
172 |
|
|
multiple times in an array, then having this array referenced multiple |
173 |
|
|
times in another array and so on). This is not usually a problem as most |
174 |
|
|
implementations already treat arrays and objects as references, and can |
175 |
|
|
therefore avoid duplication. |
176 |
|
|
|
177 |
root |
1.7 |
=head1 IMPLEMENTATIONS |
178 |
root |
1.6 |
|
179 |
root |
1.7 |
This section lists known implementations of this extension (L<drop me a |
180 |
|
|
mail|mailto:cbor@schmorp.de?Subject=CBOR-value-sharing> if you want to be |
181 |
|
|
listed here). |
182 |
root |
1.6 |
|
183 |
|
|
=over 4 |
184 |
|
|
|
185 |
root |
1.7 |
=item * [Perl] L<CBOR::XS|http://software.schmorp.de/pkg/CBOR-XS.html> (reference implementation) |
186 |
|
|
|
187 |
|
|
=item * [JavaScript] L<borc-refs|https://github.com/sandhawke/borc-refs> |
188 |
root |
1.6 |
|
189 |
root |
1.9 |
=item * [Perl] L<CBOR::Free|https://metacpan.org/pod/CBOR::Free> |
190 |
|
|
|
191 |
root |
1.10 |
=item * [Nodejs] L<node-cbor|https://github.com/hildjj/node-cbor/> |
192 |
|
|
|
193 |
root |
1.6 |
=back |
194 |
root |
1.8 |
|