
Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.103 by root, Thu Feb 24 12:13:11 2011 UTC vs.
Revision 1.118 by root, Mon Nov 18 01:01:02 2013 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '2.1'; 51our $VERSION = '2.15';
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
89C<http_request> returns a "cancellation guard" - you have to keep the 89C<http_request> returns a "cancellation guard" - you have to keep the
90object at least alive until the callback get called. If the object gets 90object at least alive until the callback get called. If the object gets
91destroyed before the callback is called, the request will be cancelled. 91destroyed before the callback is called, the request will be cancelled.
92 92
93The callback will be called with the response body data as first argument 93The callback will be called with the response body data as first argument
94(or C<undef> if an error occured), and a hash-ref with response headers 94(or C<undef> if an error occurred), and a hash-ref with response headers
95(and trailers) as second argument. 95(and trailers) as second argument.
96 96
97All the headers in that hash are lowercased. In addition to the response 97All the headers in that hash are lowercased. In addition to the response
98headers, the "pseudo-headers" (uppercase to avoid clashing with possible 98headers, the "pseudo-headers" (uppercase to avoid clashing with possible
99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
123C<590>-C<599> and the C<Reason> pseudo-header will contain an error 123C<590>-C<599> and the C<Reason> pseudo-header will contain an error
124message. Currently the following status codes are used: 124message. Currently the following status codes are used:
125 125
126=over 4 126=over 4
127 127
128=item 595 - errors during connection etsbalishment, proxy handshake. 128=item 595 - errors during connection establishment, proxy handshake.
129 129
130=item 596 - errors during TLS negotiation, request sending and header processing. 130=item 596 - errors during TLS negotiation, request sending and header processing.
131 131
132=item 597 - errors during body receiving or processing. 132=item 597 - errors during body receiving or processing.
133 133
154 154
155=over 4 155=over 4
156 156
157=item recurse => $count (default: $MAX_RECURSE) 157=item recurse => $count (default: $MAX_RECURSE)
158 158
159Whether to recurse requests or not, e.g. on redirects, authentication 159Whether to recurse requests or not, e.g. on redirects, authentication and
160retries and so on, and how often to do so. 160other retries and so on, and how often to do so.
161 161
162=item headers => hashref 162=item headers => hashref
163 163
164The request headers to use. Currently, C<http_request> may provide its own 164The request headers to use. Currently, C<http_request> may provide its own
165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
168they won't be sent at all). 168they won't be sent at all).
169 169
170You really should provide your own C<User-Agent:> header value that is 170You really should provide your own C<User-Agent:> header value that is
171appropriate for your program - I wouldn't be surprised if the default 171appropriate for your program - I wouldn't be surprised if the default
172AnyEvent string gets blocked by webservers sooner or later. 172AnyEvent string gets blocked by webservers sooner or later.
173
174Also, make sure that your headers names and values do not contain any
175embedded newlines.
173 176
174=item timeout => $seconds 177=item timeout => $seconds
175 178
176The time-out to use for various stages - each connect attempt will reset 179The time-out to use for various stages - each connect attempt will reset
177the timeout, as will read or write activity, i.e. this is not an overall 180the timeout, as will read or write activity, i.e. this is not an overall
239context) - only connections using the same unique ID will be reused. 242context) - only connections using the same unique ID will be reused.
240 243
241=item on_prepare => $callback->($fh) 244=item on_prepare => $callback->($fh)
242 245
243In rare cases you need to "tune" the socket before it is used to 246In rare cases you need to "tune" the socket before it is used to
244connect (for exmaple, to bind it on a given IP address). This parameter 247connect (for example, to bind it on a given IP address). This parameter
245overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 248overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
246and behaves exactly the same way (e.g. it has to provide a 249and behaves exactly the same way (e.g. it has to provide a
247timeout). See the description for the C<$prepare_cb> argument of 250timeout). See the description for the C<$prepare_cb> argument of
248C<AnyEvent::Socket::tcp_connect> for details. 251C<AnyEvent::Socket::tcp_connect> for details.
249 252
381 384
382Example: do a HTTP HEAD request on https://www.google.com/, use a 385Example: do a HTTP HEAD request on https://www.google.com/, use a
383timeout of 30 seconds. 386timeout of 30 seconds.
384 387
385 http_request 388 http_request
386 GET => "https://www.google.com", 389 HEAD => "https://www.google.com",
387 headers => { "user-agent" => "MySearchClient 1.0" }, 390 headers => { "user-agent" => "MySearchClient 1.0" },
388 timeout => 30, 391 timeout => 30,
389 sub { 392 sub {
390 my ($body, $hdr) = @_; 393 my ($body, $hdr) = @_;
391 use Data::Dumper; 394 use Data::Dumper;
686 689
687 $cb->(undef, $hdr); 690 $cb->(undef, $hdr);
688 () 691 ()
689} 692}
690 693
694our %IDEMPOTENT = (
695 DELETE => 1,
696 GET => 1,
697 HEAD => 1,
698 OPTIONS => 1,
699 PUT => 1,
700 TRACE => 1,
701
702 ACL => 1,
703 "BASELINE-CONTROL" => 1,
704 BIND => 1,
705 CHECKIN => 1,
706 CHECKOUT => 1,
707 COPY => 1,
708 LABEL => 1,
709 LINK => 1,
710 MERGE => 1,
711 MKACTIVITY => 1,
712 MKCALENDAR => 1,
713 MKCOL => 1,
714 MKREDIRECTREF => 1,
715 MKWORKSPACE => 1,
716 MOVE => 1,
717 ORDERPATCH => 1,
718 PROPFIND => 1,
719 PROPPATCH => 1,
720 REBIND => 1,
721 REPORT => 1,
722 SEARCH => 1,
723 UNBIND => 1,
724 UNCHECKOUT => 1,
725 UNLINK => 1,
726 UNLOCK => 1,
727 UPDATE => 1,
728 UPDATEREDIRECTREF => 1,
729 "VERSION-CONTROL" => 1,
730);
731
691sub http_request($$@) { 732sub http_request($$@) {
692 my $cb = pop; 733 my $cb = pop;
693 my ($method, $url, %arg) = @_; 734 my ($method, $url, %arg) = @_;
694 735
695 my %hdr; 736 my %hdr;
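Note on the hunk above: revision 1.118 adds an `%IDEMPOTENT` lookup table, and a later hunk replaces the method regex with a lookup in it. The sketch below compares the two checks for a few methods. It is a standalone illustration, not code from the module: the table is an abbreviated copy, and `%result` and `$old_check` are names made up for this example.

```perl
use strict;
use warnings;

# Abbreviated copy of the table added in revision 1.118 (the full list is in the diff).
my %IDEMPOTENT = map { $_ => 1 } qw(DELETE GET HEAD OPTIONS PUT TRACE PROPFIND REPORT);

# The pattern used by revision 1.103 to detect idempotent methods.
my $old_check = qr/^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;

my %result;
for my $method (qw(GET POST PROPFIND)) {
   $result{$method} = {
      old => ($method =~ $old_check   ? 1 : 0),
      new => ($IDEMPOTENT{$method}    ? 1 : 0),
   };
   printf "%-8s old=%d new=%d\n", $method, @{ $result{$method} }{qw(old new)};
}
```

Under these assumptions only `PROPFIND` changes classification. Because the `persistent` default follows the idempotency check, WebDAV-style methods in the table now default to persistent connections.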
770 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 811 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
771 812
772 $hdr{"content-length"} = length $arg{body} 813 $hdr{"content-length"} = length $arg{body}
773 if length $arg{body} || $method ne "GET"; 814 if length $arg{body} || $method ne "GET";
774 815
775 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 816 my $idempotent = $IDEMPOTENT{$method};
776 817
777 # default value for keepalive is true iff the request is for an idempotent method 818 # default value for keepalive is true iff the request is for an idempotent method
778 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent; 819 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
779 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy; 820 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
780 my $was_persistent; # true if this is actually a recycled connection 821 my $was_persistent; # true if this is actually a recycled connection
781 822
782 # the key to use in the keepalive cache 823 # the key to use in the keepalive cache
783 my $ka_key = "$uhost\x00$arg{sessionid}"; 824 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
784 825
785 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive " : "" : "close ") . "Te"; #1.1 826 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"; #1.1
786 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 827 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
787 828
788 my %state = (connect_guard => 1); 829 my %state = (connect_guard => 1);
789 830
790 my $ae_error = 595; # connecting 831 my $ae_error = 595; # connecting
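Note on the hunk above: the keepalive cache key gains the scheme and port. A likely motivation, not stated in the diff itself, is that the old key could not tell a plain-HTTP connection from a TLS connection to the same host. The sketch below shows that collision. `ka_key_old` and `ka_key_new` are hypothetical helpers that only mirror the two key formats; they do not exist in AnyEvent::HTTP.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; they reproduce the two key formats from the diff.
sub ka_key_old {
   my ($uscheme, $uhost, $uport, $sessionid) = @_;
   "$uhost\x00$sessionid"                               # revision 1.103
}

sub ka_key_new {
   my ($uscheme, $uhost, $uport, $sessionid) = @_;
   "$uscheme\x00$uhost\x00$uport\x00$sessionid"         # revision 1.118
}

# Same host, no session id, but different scheme and port.
my @plain = ("http",  "example.com",  80, "");
my @tls   = ("https", "example.com", 443, "");

print "old keys collide: ", (ka_key_old (@plain) eq ka_key_old (@tls) ? "yes" : "no"), "\n";
print "new keys collide: ", (ka_key_new (@plain) eq ka_key_new (@tls) ? "yes" : "no"), "\n";
```

With the old format both requests map to the same cache slot, so a cached connection could be handed to a request for a different scheme or port. The new format keeps them apart.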
803 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 844 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
804 . "\015\012" 845 . "\015\012"
805 . (delete $arg{body}) 846 . (delete $arg{body})
806 ); 847 );
807 848
808 # return if error occured during push_write() 849 # return if error occurred during push_write()
809 return unless %state; 850 return unless %state;
810 851
811 # reduce memory usage, save a kitten, also re-use it for the response headers. 852 # reduce memory usage, save a kitten, also re-use it for the response headers.
812 %hdr = (); 853 %hdr = ();
813 854
959 my $body = ""; 1000 my $body = "";
960 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1001 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
961 1002
962 $state{read_chunk} = sub { 1003 $state{read_chunk} = sub {
963 $_[1] =~ /^([0-9a-fA-F]+)/ 1004 $_[1] =~ /^([0-9a-fA-F]+)/
964 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1005 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
965 1006
966 my $len = hex $1; 1007 my $len = hex $1;
967 1008
968 if ($len) { 1009 if ($len) {
969 $cl += $len; 1010 $cl += $len;
1047 _destroy_state %state; 1088 _destroy_state %state;
1048 1089
1049 %state = (); 1090 %state = ();
1050 $state{recurse} = 1091 $state{recurse} =
1051 http_request ( 1092 http_request (
1052 $method => $url, 1093 $method => $url,
1053 %arg, 1094 %arg,
1095 recurse => $recurse - 1,
1054 keepalive => 0, 1096 keepalive => 0,
1055 sub { 1097 sub {
1056 %state = (); 1098 %state = ();
1057 &$cb 1099 &$cb
1058 } 1100 }
1135 if ($persistent && $KA_CACHE{$ka_key}) { 1177 if ($persistent && $KA_CACHE{$ka_key}) {
1136 $was_persistent = 1; 1178 $was_persistent = 1;
1137 1179
1138 $state{handle} = ka_fetch $ka_key; 1180 $state{handle} = ka_fetch $ka_key;
1139 $state{handle}->destroyed 1181 $state{handle}->destroyed
1140 and die "got a destructed habndle. pah\n";#d# 1182 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1141 $prepare_handle->(); 1183 $prepare_handle->();
1142 $state{handle}->destroyed 1184 $state{handle}->destroyed
1143 and die "got a destructed habndle. pa2\n";#d# 1185 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1144 $handle_actual_request->(); 1186 $handle_actual_request->();
1145 $state{handle}->destroyed
1146 and die "got a destructed habndle. pa3\n";#d#
1147 1187
1148 } else { 1188 } else {
1149 my $tcp_connect = $arg{tcp_connect} 1189 my $tcp_connect = $arg{tcp_connect}
1150 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1190 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1151 1191
1192Sets the default proxy server to use. The proxy-url must begin with a 1232Sets the default proxy server to use. The proxy-url must begin with a
1193string of the form C<http://host:port>, croaks otherwise. 1233string of the form C<http://host:port>, croaks otherwise.
1194 1234
1195To clear an already-set proxy, use C<undef>. 1235To clear an already-set proxy, use C<undef>.
1196 1236
1197When AnyEvent::HTTP is laoded for the first time it will query the 1237When AnyEvent::HTTP is loaded for the first time it will query the
1198default proxy from the operating system, currently by looking at 1238default proxy from the operating system, currently by looking at
1199C<$ENV{http_proxy>}. 1239C<$ENV{http_proxy>}.
1200 1240
1201=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1241=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1202 1242
1204C<$session_end> is given and true, then additionally remove all session 1244C<$session_end> is given and true, then additionally remove all session
1205cookies. 1245cookies.
1206 1246
1207You should call this function (with a true C<$session_end>) before you 1247You should call this function (with a true C<$session_end>) before you
1208save cookies to disk, and you should call this function after loading them 1248save cookies to disk, and you should call this function after loading them
1209again. If you have a long-running program you can additonally call this 1249again. If you have a long-running program you can additionally call this
1210function from time to time. 1250function from time to time.
1211 1251
1212A cookie jar is initially an empty hash-reference that is managed by this 1252A cookie jar is initially an empty hash-reference that is managed by this
1213module. It's format is subject to change, but currently it is like this: 1253module. It's format is subject to change, but currently it is like this:
1214 1254
1215The key C<version> has to contain C<1>, otherwise the hash gets 1255The key C<version> has to contain C<1>, otherwise the hash gets
1216emptied. All other keys are hostnames or IP addresses pointing to 1256emptied. All other keys are hostnames or IP addresses pointing to
1217hash-references. The key for these inner hash references is the 1257hash-references. The key for these inner hash references is the
1218server path for which this cookie is meant, and the values are again 1258server path for which this cookie is meant, and the values are again
1219hash-references. The keys of those hash-references is the cookie name, and 1259hash-references. Each key of those hash-references is a cookie name, and
1220the value, you guessed it, is another hash-reference, this time with the 1260the value, you guessed it, is another hash-reference, this time with the
1221key-value pairs from the cookie, except for C<expires> and C<max-age>, 1261key-value pairs from the cookie, except for C<expires> and C<max-age>,
1222which have been replaced by a C<_expires> key that contains the cookie 1262which have been replaced by a C<_expires> key that contains the cookie
1223expiry timestamp. 1263expiry timestamp. Session cookies are indicated by not having an
1264C<_expires> key.
1224 1265
1225Here is an example of a cookie jar with a single cookie, so you have a 1266Here is an example of a cookie jar with a single cookie, so you have a
1226chance of understanding the above paragraph: 1267chance of understanding the above paragraph:
1227 1268
1228 { 1269 {
1252 1293
1253The default value for the C<recurse> request parameter (default: C<10>). 1294The default value for the C<recurse> request parameter (default: C<10>).
1254 1295
1255=item $AnyEvent::HTTP::TIMEOUT 1296=item $AnyEvent::HTTP::TIMEOUT
1256 1297
1257The default timeout for conenction operations (default: C<300>). 1298The default timeout for connection operations (default: C<300>).
1258 1299
1259=item $AnyEvent::HTTP::USERAGENT 1300=item $AnyEvent::HTTP::USERAGENT
1260 1301
1261The default value for the C<User-Agent> header (the default is 1302The default value for the C<User-Agent> header (the default is
1262C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1303C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1270 1311
1271The default value for this is C<4>, and it is highly advisable to not 1312The default value for this is C<4>, and it is highly advisable to not
1272increase it much. 1313increase it much.
1273 1314
1274For comparison: the RFC's recommend 4 non-persistent or 2 persistent 1315For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1275connections, older browsers used 2, newers (such as firefox 3) typically 1316connections, older browsers used 2, newer ones (such as firefox 3)
1276use 6, and Opera uses 8 because like, they have the fastest browser and 1317typically use 6, and Opera uses 8 because like, they have the fastest
1277give a shit for everybody else on the planet. 1318browser and give a shit for everybody else on the planet.
1278 1319
1279=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 1320=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1280 1321
1281The time after which idle persistent conenctions get closed by 1322The time after which idle persistent connections get closed by
1282AnyEvent::HTTP (default: C<3>). 1323AnyEvent::HTTP (default: C<3>).
1283 1324
1284=item $AnyEvent::HTTP::ACTIVE 1325=item $AnyEvent::HTTP::ACTIVE
1285 1326
1286The number of active connections. This is not the number of currently 1327The number of active connections. This is not the number of currently
1327 # other formats fail in the loop below 1368 # other formats fail in the loop below
1328 1369
1329 for (0..11) { 1370 for (0..11) {
1330 if ($m eq $month[$_]) { 1371 if ($m eq $month[$_]) {
1331 require Time::Local; 1372 require Time::Local;
1332 return Time::Local::timegm ($S, $M, $H, $d, $_, $y); 1373 return eval { Time::Local::timegm ($S, $M, $H, $d, $_, $y) };
1333 } 1374 }
1334 } 1375 }
1335 1376
1336 undef 1377 undef
1337} 1378}
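Note on the hunk above: wrapping `Time::Local::timegm` in `eval` matters because `timegm` croaks on out-of-range fields, so a malformed date could previously raise an exception out of the date parser. The sketch below shows the effect with a hypothetical wrapper, `safe_timegm`, which is not the module's own function.

```perl
use strict;
use warnings;
use Time::Local ();

# Hypothetical wrapper for illustration. Arguments follow timegm's order,
# with a zero-based month and a four-digit year.
sub safe_timegm {
   my ($S, $M, $H, $d, $m, $y) = @_;
   # eval turns the exception thrown for impossible dates into undef (in scalar context)
   return eval { Time::Local::timegm ($S, $M, $H, $d, $m, $y) };
}

my $valid   = safe_timegm (0, 0, 0,  1, 0, 2013);   # 1 January 2013, 00:00:00 UTC
my $invalid = safe_timegm (0, 0, 0, 31, 1, 2013);   # 31 February 2013 does not exist

print defined $valid   ? "valid date parsed\n"   : "valid date rejected\n";
print defined $invalid ? "invalid date parsed\n" : "invalid date rejected\n";
```

The caller then only has to check for an undefined result rather than trap an exception.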
1351 set_proxy $ENV{http_proxy}; 1392 set_proxy $ENV{http_proxy};
1352}; 1393};
1353 1394
1354=head2 SHOWCASE 1395=head2 SHOWCASE
1355 1396
1356This section contaisn some more elaborate "real-world" examples or code 1397This section contains some more elaborate "real-world" examples or code
1357snippets. 1398snippets.
1358 1399
1359=head2 HTTP/1.1 FILE DOWNLOAD 1400=head2 HTTP/1.1 FILE DOWNLOAD
1360 1401
1361Downloading files with HTTP can be quite tricky, especially when something 1402Downloading files with HTTP can be quite tricky, especially when something
1365last modified time to check for file content changes, and works with many 1406last modified time to check for file content changes, and works with many
1366HTTP/1.0 servers as well, and usually falls back to a complete re-download 1407HTTP/1.0 servers as well, and usually falls back to a complete re-download
1367on older servers. 1408on older servers.
1368 1409
1369It calls the completion callback with either C<undef>, which means a 1410It calls the completion callback with either C<undef>, which means a
1370nonretryable error occured, C<0> when the download was partial and should 1411nonretryable error occurred, C<0> when the download was partial and should
1371be retried, and C<1> if it was successful. 1412be retried, and C<1> if it was successful.
1372 1413
1373 use AnyEvent::HTTP; 1414 use AnyEvent::HTTP;
1374 1415
1375 sub download($$$) { 1416 sub download($$$) {
1383 1424
1384 warn stat $fh; 1425 warn stat $fh;
1385 warn -s _; 1426 warn -s _;
1386 if (stat $fh and -s _) { 1427 if (stat $fh and -s _) {
1387 $ofs = -s _; 1428 $ofs = -s _;
1388 warn "-s is ", $ofs;#d# 1429 warn "-s is ", $ofs;
1389 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1430 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1390 $hdr{"range"} = "bytes=$ofs-"; 1431 $hdr{"range"} = "bytes=$ofs-";
1391 } 1432 }
1392 1433
1393 http_get $url, 1434 http_get $url,
