[ViewVC] Diff of: cvs/CBOR-XS/XS.pm

Comparing CBOR-XS/XS.pm (file contents):
Revision 1.31 by root, Sat Nov 30 18:13:53 2013 UTC vs.
Revision 1.41 by root, Mon Jan 6 04:15:31 2014 UTC

…		…
64		64
65	package CBOR::XS;	65	package CBOR::XS;
66		66
67	use common::sense;	67	use common::sense;
68		68
69	our $VERSION = '1.0';	69	our $VERSION = 1.25;
70	our @ISA = qw(Exporter);	70	our @ISA = qw(Exporter);
71		71
72	our @EXPORT = qw(encode_cbor decode_cbor);	72	our @EXPORT = qw(encode_cbor decode_cbor);
73		73
74	use Exporter;	74	use Exporter;
…		…
218	isn't prepared for this will not leak memory.	218	isn't prepared for this will not leak memory.
219		219
220	If C<$enable> is false (the default), then C<decode> will throw an error	220	If C<$enable> is false (the default), then C<decode> will throw an error
221	when it encounters a self-referential/cyclic data structure.	221	when it encounters a self-referential/cyclic data structure.
222		222
		223	FUTURE DIRECTION: the motivation behind this option is to avoid I<real>
		224	cycles - future versions of this module might chose to decode cyclic data
		225	structures using weak references when this option is off, instead of
		226	throwing an error.
		227
223	This option does not affect C<encode> in any way - shared values and	228	This option does not affect C<encode> in any way - shared values and
224	references will always be decoded properly if present.	229	references will always be decoded properly if present.
225		230
226	=item $cbor = $cbor->pack_strings ([$enable])	231	=item $cbor = $cbor->pack_strings ([$enable])
227		232
…		…
242	the standard CBOR way.	247	the standard CBOR way.
243		248
244	This option does not affect C<decode> in any way - string references will	249	This option does not affect C<decode> in any way - string references will
245	always be decoded properly if present.	250	always be decoded properly if present.
246		251
		252	=item $cbor = $cbor->validate_utf8 ([$enable])
		253
		254	=item $enabled = $cbor->get_validate_utf8
		255
		256	If C<$enable> is true (or missing), then C<decode> will validate that
		257	elements (text strings) containing UTF-8 data in fact contain valid UTF-8
		258	data (instead of blindly accepting it). This validation obviously takes
		259	extra time during decoding.
		260
		261	The concept of "valid UTF-8" used is perl's concept, which is a superset
		262	of the official UTF-8.
		263
		264	If C<$enable> is false (the default), then C<decode> will blindly accept
		265	UTF-8 data, marking them as valid UTF-8 in the resulting data structure
		266	regardless of whether thats true or not.
		267
		268	Perl isn't too happy about corrupted UTF-8 in strings, but should
		269	generally not crash or do similarly evil things. Extensions might be not
		270	so forgiving, so it's recommended to turn on this setting if you receive
		271	untrusted CBOR.
		272
		273	This option does not affect C<encode> in any way - strings that are
		274	supposedly valid UTF-8 will simply be dumped into the resulting CBOR
		275	string without checking whether that is, in fact, true or not.
		276
247	=item $cbor = $cbor->filter ([$cb->($tag, $value)])	277	=item $cbor = $cbor->filter ([$cb->($tag, $value)])
248		278
249	=item $cb_or_undef = $cbor->get_filter	279	=item $cb_or_undef = $cbor->get_filter
250		280
251	Sets or replaces the tagged value decoding filter (when C<$cb> is	281	Sets or replaces the tagged value decoding filter (when C<$cb> is
…		…
305	and you need to know where the first CBOR string ends amd the next one	335	and you need to know where the first CBOR string ends amd the next one
306	starts.	336	starts.
307		337
308	CBOR::XS->new->decode_prefix ("......")	338	CBOR::XS->new->decode_prefix ("......")
309	=> ("...", 3)	339	=> ("...", 3)
		340
		341	=back
		342
		343	=head2 INCREMENTAL PARSING
		344
		345	In some cases, there is the need for incremental parsing of JSON
		346	texts. While this module always has to keep both CBOR text and resulting
		347	Perl data structure in memory at one time, it does allow you to parse a
		348	CBOR stream incrementally, using a similar to using "decode_prefix" to see
		349	if a full CBOR object is available, but is much more efficient.
		350
		351	It basically works by parsing as much of a CBOR string as possible - if
		352	the CBOR data is not complete yet, the pasrer will remember where it was,
		353	to be able to restart when more data has been accumulated. Once enough
		354	data is available to either decode a complete CBOR value or raise an
		355	error, a real decode will be attempted.
		356
		357	A typical use case would be a network protocol that consists of sending
		358	and receiving CBOR-encoded messages. The solution that works with CBOR and
		359	about anything else is by prepending a length to every CBOR value, so the
		360	receiver knows how many octets to read. More compact (and slightly slower)
		361	would be to just send CBOR values back-to-back, as C<CBOR::XS> knows where
		362	a CBOR value ends, and doesn't need an explicit length.
		363
		364	The following methods help with this:
		365
		366	=over 4
		367
		368	=item @decoded = $cbor->incr_parse ($buffer)
		369
		370	This method attempts to decode exactly one CBOR value from the beginning
		371	of the given C<$buffer>. The value is removed from the C<$buffer> on
		372	success. When C<$buffer> doesn't contain a complete value yet, it returns
		373	nothing. Finally, when the C<$buffer> doesn't start with something
		374	that could ever be a valid CBOR value, it raises an exception, just as
		375	C<decode> would. In the latter case the decoder state is undefined and
		376	must be reset before being able to parse further.
		377
		378	This method modifies the C<$buffer> in place. When no CBOR value can be
		379	decoded, the decoder stores the current string offset. On the next call,
		380	continues decoding at the place where it stopped before. For this to make
		381	sense, the C<$buffer> must begin with the same octets as on previous
		382	unsuccessful calls.
		383
		384	You can call this method in scalar context, in which case it either
		385	returns a decoded value or C<undef>. This makes it impossible to
		386	distinguish between CBOR null values (which decode to C<undef>) and an
		387	unsuccessful decode, which is often acceptable.
		388
		389	=item @decoded = $cbor->incr_parse_multiple ($buffer)
		390
		391	Same as C<incr_parse>, but attempts to decode as many CBOR values as
		392	possible in one go, instead of at most one. Calls to C<incr_parse> and
		393	C<incr_parse_multiple> can be interleaved.
		394
		395	=item $cbor->incr_reset
		396
		397	Resets the incremental decoder. This throws away any saved state, so that
		398	subsequent calls to C<incr_parse> or C<incr_parse_multiple> start to parse
		399	a new CBOR value from the beginning of the C<$buffer> again.
		400
		401	This method can be caled at any time, but it I<must> be called if you want
		402	to change your C<$buffer> or there was a decoding error and you want to
		403	reuse the C<$cbor> object for future incremental parsings.
310		404
311	=back	405	=back
312		406
313		407
314	=head1 MAPPING	408	=head1 MAPPING
…		…
773	perl core distribution (e.g. L<URI>), it is (currently) up to the user to	867	perl core distribution (e.g. L<URI>), it is (currently) up to the user to
774	provide these modules. The decoding usually fails with an exception if the	868	provide these modules. The decoding usually fails with an exception if the
775	required module cannot be loaded.	869	required module cannot be loaded.
776		870
777	=over 4	871	=over 4
		872
		873	=item 0, 1 (date/time string, seconds since the epoch)
		874
		875	These tags are decoded into L<Time::Piece> objects. The corresponding
		876	C<Time::Piece::TO_CBOR> method always encodes into tag 1 values currently.
		877
		878	The L<Time::Piece> API is generally surprisingly bad, and fractional
		879	seconds are only accidentally kept intact, so watch out. On the plus side,
		880	the module comes with perl since 5.10, which has to count for something.
778		881
779	=item 2, 3 (positive/negative bignum)	882	=item 2, 3 (positive/negative bignum)
780		883
781	These tags are decoded into L<Math::BigInt> objects. The corresponding	884	These tags are decoded into L<Math::BigInt> objects. The corresponding
782	C<Math::BigInt::TO_CBOR> method encodes "small" bigints into normal CBOR	885	C<Math::BigInt::TO_CBOR> method encodes "small" bigints into normal CBOR
…		…
947	service. I put the contact address into my modules for a reason.	1050	service. I put the contact address into my modules for a reason.
948		1051
949	=cut	1052	=cut
950		1053
951	our %FILTER = (	1054	our %FILTER = (
952	# 0 # rfc4287 datetime, utf-8	1055	0 => sub { # rfc4287 datetime, utf-8
953	# 1 # unix timestamp, any	1056	require Time::Piece;
		1057	# Time::Piece::Strptime uses the "incredibly flexible date parsing routine"
		1058	# from FreeBSD, which can't parse ISO 8601, RFC3339, RFC4287 or much of anything
		1059	# else either. Whats incredibe over standard strptime totally escapes me.
		1060	# doesn't do fractional times, either. sigh.
		1061	# In fact, it's all a lie, it uses whatever strptime it wants, and of course,
		1062	# they are all incomptible. The openbsd one simply ignores %z (but according to the
		1063	# docs, it would be much more incredibly flexible indeed. If it worked, that is.).
		1064	scalar eval {
		1065	my $s = $_[1];
		1066
		1067	$s =~ s/Z$/+00:00/;
		1068	$s =~ s/(\.[0-9]+)?([+-][0-9][0-9]):([0-9][0-9])$//
		1069	or die;
		1070
		1071	my $b = $1 - ($2 * 60 + $3) * 60; # fractional part + offset. hopefully
		1072	my $d = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");
		1073
		1074	Time::Piece::gmtime ($d->epoch + $b)
		1075	} || die "corrupted CBOR date/time string ($_[0])";
		1076	},
		1077
		1078	1 => sub { # seconds since the epoch, possibly fractional
		1079	require Time::Piece;
		1080	scalar Time::Piece::gmtime (pop)
		1081	},
954		1082
955	2 => sub { # pos bigint	1083	2 => sub { # pos bigint
956	require Math::BigInt;	1084	require Math::BigInt;
957	Math::BigInt->new ("0x" . unpack "H*", pop)	1085	Math::BigInt->new ("0x" . unpack "H*", pop)
958	},	1086	},
…		…
994	}	1122	}
995		1123
996	sub URI::TO_CBOR {	1124	sub URI::TO_CBOR {
997	my $uri = $_[0]->as_string;	1125	my $uri = $_[0]->as_string;
998	utf8::upgrade $uri;	1126	utf8::upgrade $uri;
999	CBOR::XS::tag 32, $uri	1127	tag 32, $uri
1000	}	1128	}
1001		1129
1002	sub Math::BigInt::TO_CBOR {	1130	sub Math::BigInt::TO_CBOR {
1003	if ($_[0] >= -2147483648 && $_[0] <= 2147483647) {	1131	if ($_[0] >= -2147483648 && $_[0] <= 2147483647) {
1004	$_[0]->numify	1132	$_[0]->numify
1005	} else {	1133	} else {
1006	my $hex = substr $_[0]->as_hex, 2;	1134	my $hex = substr $_[0]->as_hex, 2;
1007	$hex = "0$hex" if 1 & length $hex; # sigh	1135	$hex = "0$hex" if 1 & length $hex; # sigh
1008	CBOR::XS::tag $_[0] >= 0 ? 2 : 3, pack "H*", $hex	1136	tag $_[0] >= 0 ? 2 : 3, pack "H*", $hex
1009	}	1137	}
1010	}	1138	}
1011		1139
1012	sub Math::BigFloat::TO_CBOR {	1140	sub Math::BigFloat::TO_CBOR {
1013	my ($m, $e) = $_[0]->parts;	1141	my ($m, $e) = $_[0]->parts;
1014	CBOR::XS::tag 4, [$e->numify, $m]	1142	tag 4, [$e->numify, $m]
		1143	}
		1144
		1145	sub Time::Piece::TO_CBOR {
		1146	tag 1, 0 + $_[0]->epoch
1015	}	1147	}
1016		1148
1017	XSLoader::load "CBOR::XS", $VERSION;	1149	XSLoader::load "CBOR::XS", $VERSION;
1018		1150
1019	=head1 SEE ALSO	1151	=head1 SEE ALSO
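
The tag 0 filter added in revision 1.41 strips the fractional seconds and the UTC offset from the date/time string before handing the remainder to Time::Piece->strptime. The following is a minimal standalone sketch of that idea, not the committed code: the helper name parse_datetime_string is hypothetical, and it needs only Time::Piece. Unlike the committed expression, the sketch applies the sign of the offset to the minutes as well as the hours, which only matters for negative offsets with non-zero minutes (e.g. -05:30).

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper (not part of CBOR::XS): turn an RFC 3339 style
# date/time string into a Time::Piece object in UTC.
sub parse_datetime_string {
   my ($s) = @_;

   $s =~ s/Z$/+00:00/;   # treat a trailing "Z" as a zero offset

   # Strip optional fractional seconds and the mandatory offset.
   $s =~ s/(\.[0-9]+)?([+-])([0-9][0-9]):([0-9][0-9])$//
      or die "unsupported date/time string: $_[0]\n";
   my ($frac, $sign, $hh, $mm) = ($1 // 0, $2, $3, $4);

   my $offset = ($hh * 60 + $mm) * 60;   # offset in seconds
   $offset = -$offset if $sign eq "-";   # sign covers hours and minutes

   # strptime interprets the remaining string as UTC.
   my $t = Time::Piece->strptime ($s, "%Y-%m-%dT%H:%M:%S");

   # UTC = local time - offset; whether the fraction survives depends
   # on Time::Piece.
   scalar Time::Piece::gmtime ($t->epoch + $frac - $offset);
}

print parse_datetime_string ("2013-03-21T21:04:00+01:00")->epoch, "\n";
```

Whether fractional seconds survive in the resulting object depends on the Time::Piece version, as the comments in the diff already warn, so the sketch makes no promise about them.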

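The INCREMENTAL PARSING section added in revision 1.41 describes receiving CBOR values sent back-to-back without a length prefix. A minimal sketch of that usage follows, assuming an installed CBOR::XS release that provides incr_parse_multiple; the four-octet chunking merely simulates a socket and is not part of the module.

```perl
use strict;
use warnings;
use CBOR::XS;   # assumes a release that includes the incremental parser

# Three CBOR values sent back-to-back, without length prefixes.
my @sent   = ([1, 2, 3], { op => "ping" }, "done");
my $stream = join "", map { encode_cbor ($_) } @sent;

my $cbor   = CBOR::XS->new;
my $buffer = "";
my @received;

# Simulate a socket that delivers the stream four octets at a time.
for (my $pos = 0; $pos < length $stream; $pos += 4) {
   # Only append: the buffer must keep starting with the same octets
   # as on the previous unsuccessful call.
   $buffer .= substr $stream, $pos, 4;

   # Decodes every complete value at the start of $buffer and removes
   # it; returns an empty list while the next value is incomplete.
   push @received, $cbor->incr_parse_multiple ($buffer);
}

printf "received %d values, %d octets left over\n",
   scalar @received, length $buffer;
```

If a call throws, the decoder state is undefined per the documentation in the diff, so a real receiver would wrap the call in eval and call $cbor->incr_reset before reusing the object.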