/cvs/AnyEvent-HTTP/HTTP.pm

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.94 by root, Tue Jan 11 23:49:37 2011 UTC vs.
Revision 1.116 by root, Fri May 17 07:19:23 2013 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '2.01'; 51our $VERSION = '2.15';
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
123C<590>-C<599> and the C<Reason> pseudo-header will contain an error 123C<590>-C<599> and the C<Reason> pseudo-header will contain an error
124message. Currently the following status codes are used: 124message. Currently the following status codes are used:
125 125
126=over 4 126=over 4
127 127
128=item 595 - errors during connection etsbalishment, proxy handshake. 128=item 595 - errors during connection establishment, proxy handshake.
129 129
130=item 596 - errors during TLS negotiation, request sending and header processing. 130=item 596 - errors during TLS negotiation, request sending and header processing.
131 131
132=item 597 - errors during body receiving or processing. 132=item 597 - errors during body receiving or processing.
133 133
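The 59x codes listed above are generated by AnyEvent::HTTP itself, not sent by a server. A callback can therefore use them to tell transport failures apart from real HTTP responses. The following is a minimal sketch that assumes a hypothetical helper named C<classify_status>, which is not part of the module. The helper takes the C<$hdr> hash reference that is passed to the request callback.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent::HTTP): classify the Status
# pseudo-header found in the $hdr hash passed to the request callback.
sub classify_status {
    my ($hdr) = @_;
    my $status = $hdr->{Status};

    return "connect" if $status == 595;   # connection establishment, proxy handshake
    return "request" if $status == 596;   # TLS negotiation, request sending, header processing
    return "body"    if $status == 597;   # body receiving or processing
    return "other"   if $status >= 590 && $status <= 599;   # other module-generated errors
    return "http";                        # status actually sent by the server
}
```

Inside a callback, any result other than C<http> means that the C<Reason> pseudo-header holds an AnyEvent::HTTP error message rather than a server reason phrase.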
154 154
155=over 4 155=over 4
156 156
157=item recurse => $count (default: $MAX_RECURSE) 157=item recurse => $count (default: $MAX_RECURSE)
158 158
159Whether to recurse requests or not, e.g. on redirects, authentication 159Whether to recurse requests or not, e.g. on redirects, authentication and
160retries and so on, and how often to do so. 160other retries and so on, and how often to do so.
161 161
162=item headers => hashref 162=item headers => hashref
163 163
164The request headers to use. Currently, C<http_request> may provide its own 164The request headers to use. Currently, C<http_request> may provide its own
165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
169 169
170You really should provide your own C<User-Agent:> header value that is 170You really should provide your own C<User-Agent:> header value that is
171appropriate for your program - I wouldn't be surprised if the default 171appropriate for your program - I wouldn't be surprised if the default
172AnyEvent string gets blocked by webservers sooner or later. 172AnyEvent string gets blocked by webservers sooner or later.
173 173
174Also, make sure that your headers names and values do not contain any
175embedded newlines.
176
174=item timeout => $seconds 177=item timeout => $seconds
175 178
176The time-out to use for various stages - each connect attempt will reset 179The time-out to use for various stages - each connect attempt will reset
177the timeout, as will read or write activity, i.e. this is not an overall 180the timeout, as will read or write activity, i.e. this is not an overall
178timeout. 181timeout.
179 182
180Default timeout is 5 minutes. 183Default timeout is 5 minutes.
181 184
182=item proxy => [$host, $port[, $scheme]] or undef 185=item proxy => [$host, $port[, $scheme]] or undef
183 186
184Use the given http proxy for all requests. If not specified, then the 187Use the given http proxy for all requests, or no proxy if C<undef> is
185default proxy (as specified by C<$ENV{http_proxy}>) is used. 188used.
186 189
187C<$scheme> must be either missing or must be C<http> for HTTP. 190C<$scheme> must be either missing or must be C<http> for HTTP.
191
192If not specified, then the default proxy is used (see
193C<AnyEvent::HTTP::set_proxy>).
188 194
189=item body => $string 195=item body => $string
190 196
191The request body, usually empty. Will be sent as-is (future versions of 197The request body, usually empty. Will be sent as-is (future versions of
192this module might offer more options). 198this module might offer more options).
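Revision 1.116 also changes how the C<proxy> argument falls back to the default. The old code used C<$arg{proxy} || $PROXY>, so an explicit C<undef> silently selected the default proxy. The new code tests C<exists> instead, so C<proxy =E<gt> undef> really means "no proxy", as the updated documentation says. The sketch below contrasts the two rules. C<proxy_old>, C<proxy_new> and the C<$PROXY> value are illustrative stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module-wide default proxy.
our $PROXY = ["proxy.example", 3128];

# Revision 1.94 rule: any false value, including an explicit undef,
# falls back to the default proxy.
sub proxy_old { my %arg = @_; $arg{proxy} || $PROXY }

# Revision 1.116 rule: only a missing key falls back; an explicit
# undef disables the proxy for this request.
sub proxy_new { my %arg = @_; exists $arg{proxy} ? $arg{proxy} : $PROXY }
```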
378 384
379Example: do a HTTP HEAD request on https://www.google.com/, use a 385Example: do a HTTP HEAD request on https://www.google.com/, use a
380timeout of 30 seconds. 386timeout of 30 seconds.
381 387
382 http_request 388 http_request
383 GET => "https://www.google.com", 389 HEAD => "https://www.google.com",
384 headers => { "user-agent" => "MySearchClient 1.0" }, 390 headers => { "user-agent" => "MySearchClient 1.0" },
385 timeout => 30, 391 timeout => 30,
386 sub { 392 sub {
387 my ($body, $hdr) = @_; 393 my ($body, $hdr) = @_;
388 use Data::Dumper; 394 use Data::Dumper;
529 while ( 535 while (
530 m{ 536 m{
531 \G\s* 537 \G\s*
532 (?: 538 (?:
533 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+) 539 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
534 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )? 540 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^;,[:space:]]*) ) )?
535 ) 541 )
536 }gcxsi 542 }gcxsi
537 ) { 543 ) {
538 my $name = $2; 544 my $name = $2;
539 my $value = $4; 545 my $value = $4;
546 # quoted 552 # quoted
547 $value = $3; 553 $value = $3;
548 $value =~ s/\\(.)/$1/gs; 554 $value =~ s/\\(.)/$1/gs;
549 } 555 }
550 556
551 push @kv, lc $name, $value; 557 push @kv, @kv ? lc $name : $name, $value;
552 558
553 last unless /\G\s*;/gc; 559 last unless /\G\s*;/gc;
554 } 560 }
555 561
556 last unless @kv; 562 last unless @kv;
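The cookie parser now uses C<@kv ? lc $name : $name> instead of C<lc $name>. As a result, the first name/value pair of a C<Set-Cookie> header, which is the cookie itself, keeps its original case, while the attribute names that follow, such as C<Path> and C<Expires>, are still lowercased. The unquoted value pattern also no longer excludes C<=>, so values like C<a=b> survive intact. The sketch below applies the same case rule to a deliberately simplified parser. C<parse_cookie_pairs> is hypothetical and simply splits on C<;> and the first C<=>. It does not handle the quoting or the comma-containing C<expires> dates that the real regex deals with.

```perl
use strict;
use warnings;

# Hypothetical, simplified Set-Cookie splitter: no quoting, no
# "expires" date handling. It only illustrates the case rule.
sub parse_cookie_pairs {
    my ($str) = @_;
    my @kv;
    for my $part (split /\s*;\s*/, $str) {
        # split on the first "=" only, so values like "a=b" stay whole
        my ($name, $value) = split /=/, $part, 2;
        # keep the cookie name's case, lowercase the attribute names
        push @kv, @kv ? lc $name : $name, $value;
    }
    @kv
}
```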
683 689
684 $cb->(undef, $hdr); 690 $cb->(undef, $hdr);
685 () 691 ()
686} 692}
687 693
694our %IDEMPOTENT = (
695 DELETE => 1,
696 GET => 1,
697 HEAD => 1,
698 OPTIONS => 1,
699 PUT => 1,
700 TRACE => 1,
701
702 ACL => 1,
703 "BASELINE-CONTROL" => 1,
704 BIND => 1,
705 CHECKIN => 1,
706 CHECKOUT => 1,
707 COPY => 1,
708 LABEL => 1,
709 LINK => 1,
710 MERGE => 1,
711 MKACTIVITY => 1,
712 MKCALENDAR => 1,
713 MKCOL => 1,
714 MKREDIRECTREF => 1,
715 MKWORKSPACE => 1,
716 MOVE => 1,
717 ORDERPATCH => 1,
718 PROPFIND => 1,
719 PROPPATCH => 1,
720 REBIND => 1,
721 REPORT => 1,
722 SEARCH => 1,
723 UNBIND => 1,
724 UNCHECKOUT => 1,
725 UNLINK => 1,
726 UNLOCK => 1,
727 UPDATE => 1,
728 UPDATEREDIRECTREF => 1,
729 "VERSION-CONTROL" => 1,
730);
731
688sub http_request($$@) { 732sub http_request($$@) {
689 my $cb = pop; 733 my $cb = pop;
690 my ($method, $url, %arg) = @_; 734 my ($method, $url, %arg) = @_;
691 735
692 my %hdr; 736 my %hdr;
709 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 753 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
710 754
711 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" }) 755 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" })
712 if $recurse < 0; 756 if $recurse < 0;
713 757
714 my $proxy = $arg{proxy} || $PROXY; 758 my $proxy = exists $arg{proxy} ? $arg{proxy} : $PROXY;
715 my $timeout = $arg{timeout} || $TIMEOUT; 759 my $timeout = $arg{timeout} || $TIMEOUT;
716 760
717 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment 761 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment
718 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|; 762 $url =~ m|^([^:]+):(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?$|;
719 763
720 $uscheme = lc $uscheme; 764 $uscheme = lc $uscheme;
721 765
722 my $uport = $uscheme eq "http" ? 80 766 my $uport = $uscheme eq "http" ? 80
723 : $uscheme eq "https" ? 443 767 : $uscheme eq "https" ? 443
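In revision 1.116 the URL pattern is anchored with C<^> and C<$>, and the scheme is no longer optional. A string such as C<www.google.com> with no C<scheme:> prefix now fails to match, whereas before it was accepted with an undefined scheme. The small sketch below applies the same regular expression. The wrapper name C<split_url> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the revision 1.116 URL pattern.
# Returns (scheme, authority, path, query), or an empty list on no match.
sub split_url {
    my ($url) = @_;
    my ($scheme, $authority, $path, $query, undef) =   # ignore fragment
        $url =~ m|^([^:]+):(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?$|
        or return;
    (lc $scheme, $authority, $path, $query)
}
```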
767 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 811 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
768 812
769 $hdr{"content-length"} = length $arg{body} 813 $hdr{"content-length"} = length $arg{body}
770 if length $arg{body} || $method ne "GET"; 814 if length $arg{body} || $method ne "GET";
771 815
772 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 816 my $idempotent = $IDEMPOTENT{$method};
773 817
774 # default value for keepalive is true iff the request is for an idempotent method 818 # default value for keepalive is true iff the request is for an idempotent method
775 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : $idempotent; 819 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
776 my $keepalive10 = exists $arg{keepalive10} ? $arg{keepalive10} : !$proxy; 820 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
777 my $keptalive; # true if this is actually a recycled connection 821 my $was_persistent; # true if this is actually a recycled connection
778 822
779 # the key to use in the keepalive cache 823 # the key to use in the keepalive cache
780 my $ka_key = "$uhost\x00$arg{sessionid}"; 824 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
781 825
782 $hdr{connection} = ($keepalive ? $keepalive10 ? "keep-alive " : "" : "close ") . "Te"; #1.1 826 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive " : "" : "close ") . "Te"; #1.1
783 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 827 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
784 828
785 my %state = (connect_guard => 1); 829 my %state = (connect_guard => 1);
786 830
787 my $ae_error = 595; # connecting 831 my $ae_error = 595; # connecting
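This hunk renames the old C<keepalive> argument to C<persistent> and the old C<keepalive10> argument to C<keepalive>. It also looks up idempotency in the new C<%IDEMPOTENT> table instead of using a fixed regex. The sketch below re-creates only the defaulting logic for the C<Connection:> header. C<connection_header> is a hypothetical helper, and its method table is a subset of C<%IDEMPOTENT>.

```perl
use strict;
use warnings;

# Subset of %IDEMPOTENT from revision 1.116 (the real table also lists
# WebDAV/DeltaV methods such as PROPFIND, MKCOL and MOVE).
my %IDEMPOTENT = (DELETE => 1, GET => 1, HEAD => 1, OPTIONS => 1, PUT => 1, TRACE => 1);

# Hypothetical helper reproducing the Connection: header defaults.
#   persistent: try to reuse the connection (default: method is idempotent)
#   keepalive:  also send the HTTP/1.0 "keep-alive" token (default: no proxy)
sub connection_header {
    my ($method, $proxy, %arg) = @_;

    my $idempotent = $IDEMPOTENT{$method};
    my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
    my $keepalive  = exists $arg{keepalive}  ? !!$arg{keepalive}  : !$proxy;

    ($persistent ? $keepalive ? "keep-alive " : "" : "close ") . "Te"
}
```

The same hunk also widens the keep-alive cache key. It used to contain only the host and session id; it now also contains the scheme and port. For example, C<http://host/> and C<https://host/> no longer share a cached connection.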
871 } elsif ($status == 307) { 915 } elsif ($status == 307) {
872 $redirect = 1; 916 $redirect = 1;
873 } 917 }
874 } 918 }
875 919
876 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 920 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent])
877 if ($state{handle}) { 921 if ($state{handle}) {
878 # handle keepalive 922 # handle keepalive
879 if ( 923 if (
880 $keepalive 924 $persistent
881 && $_[3] 925 && $_[3]
882 && ($hdr{HTTPVersion} < 1.1 926 && ($hdr{HTTPVersion} < 1.1
883 ? $hdr{connection} =~ /\bkeep-?alive\b/i 927 ? $hdr{connection} =~ /\bkeep-?alive\b/i
884 : $hdr{connection} !~ /\bclose\b/i) 928 : $hdr{connection} !~ /\bclose\b/i)
885 ) { 929 ) {
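The C<$finish> callback returns a connection to the keep-alive cache only when three conditions hold:

- the request was C<$persistent>;
- the call site passed a true fourth argument (C<$_[3]>);
- the server agreed to keep the connection open.

HTTP/1.0 servers must send a C<keep-alive> token to agree, while HTTP/1.1 connections stay open unless the server sends C<close>. The hypothetical function C<can_reuse> below expresses the same test in isolation.

```perl
use strict;
use warnings;

# Hypothetical restatement of the reuse test in $finish. $hdr holds the
# lowercased response headers plus the HTTPVersion pseudo-header.
sub can_reuse {
    my ($persistent, $clean_finish, $hdr) = @_;
    my $connection = defined $hdr->{connection} ? $hdr->{connection} : "";

    return !!(
        $persistent
        && $clean_finish
        && ($hdr->{HTTPVersion} < 1.1
              ? $connection =~ /\bkeep-?alive\b/i   # HTTP/1.0: server must opt in
              : $connection !~ /\bclose\b/i)        # HTTP/1.1: persistent unless "close"
    );
}
```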
904 948
905 if ($redirect && exists $hdr{location}) { 949 if ($redirect && exists $hdr{location}) {
906 # we ignore any errors, as it is very common to receive 950 # we ignore any errors, as it is very common to receive
907 # Content-Length != 0 but no actual body 951 # Content-Length != 0 but no actual body
908 # we also access %hdr, as $_[1] might be an error 952 # we also access %hdr, as $_[1] might be an error
953 $state{recurse} =
909 http_request ( 954 http_request (
910 $method => $hdr{location}, 955 $method => $hdr{location},
911 %arg, 956 %arg,
912 recurse => $recurse - 1, 957 recurse => $recurse - 1,
913 Redirect => [$_[0], \%hdr], 958 Redirect => [$_[0], \%hdr],
959 sub {
960 %state = ();
914 $cb 961 &$cb
962 },
915 ); 963 );
916 } else { 964 } else {
917 $cb->($_[0], \%hdr); 965 $cb->($_[0], \%hdr);
918 } 966 }
919 }; 967 };
920 968
952 my $body = ""; 1000 my $body = "";
953 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1001 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
954 1002
955 $state{read_chunk} = sub { 1003 $state{read_chunk} = sub {
956 $_[1] =~ /^([0-9a-fA-F]+)/ 1004 $_[1] =~ /^([0-9a-fA-F]+)/
957 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1005 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
958 1006
959 my $len = hex $1; 1007 my $len = hex $1;
960 1008
961 if ($len) { 1009 if ($len) {
962 $cl += $len; 1010 $cl += $len;
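The C<return> added in front of the C<$finish> call matters because C<$finish> only reports the error. Without the C<return>, C<read_chunk> would fall through to C<my $len = hex $1> and keep processing a stream that had already been reported as broken. The sketch below is a hypothetical stand-alone version of the chunk-size check.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the chunk-size check in read_chunk.
# Returns the chunk length, or undef if the size line is garbled.
sub chunk_len {
    my ($line) = @_;
    $line =~ /^([0-9a-fA-F]+)/
        or return undef;   # stop here, like "or return $finish->(...)"
    return hex $1;         # chunk extensions after the digits are ignored
}
```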
1032 } 1080 }
1033 }; 1081 };
1034 1082
1035 # if keepalive is enabled, then the server closing the connection 1083 # if keepalive is enabled, then the server closing the connection
1036 # before a response can happen legally - we retry on idempotent methods. 1084 # before a response can happen legally - we retry on idempotent methods.
1037 if ($keptalive && $idempotent) { 1085 if ($was_persistent && $idempotent) {
1038 my $old_eof = $hdl->{on_eof}; 1086 my $old_eof = $hdl->{on_eof};
1039 $hdl->{on_eof} = sub { 1087 $hdl->{on_eof} = sub {
1040 _destroy_state %state; 1088 _destroy_state %state;
1041 1089
1090 %state = ();
1091 $state{recurse} =
1042 http_request ( 1092 http_request (
1043 $method => $url, 1093 $method => $url,
1044 %arg, 1094 %arg,
1095 recurse => $recurse - 1,
1045 keepalive => 0, 1096 keepalive => 0,
1097 sub {
1098 %state = ();
1046 $cb 1099 &$cb
1100 }
1047 ); 1101 );
1048 }; 1102 };
1049 $hdl->on_read (sub { 1103 $hdl->on_read (sub {
1050 return unless %state; 1104 return unless %state;
1051 1105
1052 # as soon as we receive something, a connection close 1106 # as soon as we receive something, a connection close
1060 }; 1114 };
1061 1115
1062 my $prepare_handle = sub { 1116 my $prepare_handle = sub {
1063 my ($hdl) = $state{handle}; 1117 my ($hdl) = $state{handle};
1064 1118
1065 $hdl->timeout ($timeout);
1066 $hdl->on_error (sub { 1119 $hdl->on_error (sub {
1067 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] }; 1120 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1068 }); 1121 });
1069 $hdl->on_eof (sub { 1122 $hdl->on_eof (sub {
1070 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" }; 1123 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1071 }); 1124 });
1125 $hdl->timeout_reset;
1126 $hdl->timeout ($timeout);
1072 }; 1127 };
1073 1128
1074 # connected to proxy (or origin server) 1129 # connected to proxy (or origin server)
1075 my $connect_cb = sub { 1130 my $connect_cb = sub {
1076 my $fh = shift 1131 my $fh = shift
1117 1172
1118 return unless $state{connect_guard}; 1173 return unless $state{connect_guard};
1119 1174
1120 # try to use an existing keepalive connection, but only if we, ourselves, plan 1175 # try to use an existing keepalive connection, but only if we, ourselves, plan
1121 # on a keepalive request (in theory, this should be a separate config option). 1176 # on a keepalive request (in theory, this should be a separate config option).
1122 if ($keepalive && $KA_CACHE{$ka_key}) { 1177 if ($persistent && $KA_CACHE{$ka_key}) {
1123 $keptalive = 1; 1178 $was_persistent = 1;
1179
1124 $state{handle} = ka_fetch $ka_key; 1180 $state{handle} = ka_fetch $ka_key;
1181 $state{handle}->destroyed
1182 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1125 $prepare_handle->(); 1183 $prepare_handle->();
1184 $state{handle}->destroyed
1185 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1126 $handle_actual_request->(); 1186 $handle_actual_request->();
1127 1187
1128 } else { 1188 } else {
1129 my $tcp_connect = $arg{tcp_connect} 1189 my $tcp_connect = $arg{tcp_connect}
1130 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1190 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1172Sets the default proxy server to use. The proxy-url must begin with a 1232Sets the default proxy server to use. The proxy-url must begin with a
1173string of the form C<http://host:port>, croaks otherwise. 1233string of the form C<http://host:port>, croaks otherwise.
1174 1234
1175To clear an already-set proxy, use C<undef>. 1235To clear an already-set proxy, use C<undef>.
1176 1236
1237When AnyEvent::HTTP is loaded for the first time it will query the
1238default proxy from the operating system, currently by looking at
1239C<$ENV{http_proxy}>.
1240
1177=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1241=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1178 1242
1179Remove all cookies from the cookie jar that have been expired. If 1243Remove all cookies from the cookie jar that have been expired. If
1180C<$session_end> is given and true, then additionally remove all session 1244C<$session_end> is given and true, then additionally remove all session
1181cookies. 1245cookies.
1190 1254
1191The key C<version> has to contain C<1>, otherwise the hash gets 1255The key C<version> has to contain C<1>, otherwise the hash gets
1192emptied. All other keys are hostnames or IP addresses pointing to 1256emptied. All other keys are hostnames or IP addresses pointing to
1193hash-references. The key for these inner hash references is the 1257hash-references. The key for these inner hash references is the
1194server path for which this cookie is meant, and the values are again 1258server path for which this cookie is meant, and the values are again
1195hash-references. The keys of those hash-references is the cookie name, and 1259hash-references. Each key of those hash-references is a cookie name, and
1196the value, you guessed it, is another hash-reference, this time with the 1260the value, you guessed it, is another hash-reference, this time with the
1197key-value pairs from the cookie, except for C<expires> and C<max-age>, 1261key-value pairs from the cookie, except for C<expires> and C<max-age>,
1198which have been replaced by a C<_expires> key that contains the cookie 1262which have been replaced by a C<_expires> key that contains the cookie
1199expiry timestamp. 1263expiry timestamp. Session cookies are indicated by not having an
1264C<_expires> key.
1200 1265
1201Here is an example of a cookie jar with a single cookie, so you have a 1266Here is an example of a cookie jar with a single cookie, so you have a
1202chance of understanding the above paragraph: 1267chance of understanding the above paragraph:
1203 1268
1204 { 1269 {
1228 1293
1229The default value for the C<recurse> request parameter (default: C<10>). 1294The default value for the C<recurse> request parameter (default: C<10>).
1230 1295
1231=item $AnyEvent::HTTP::TIMEOUT 1296=item $AnyEvent::HTTP::TIMEOUT
1232 1297
1233The default timeout for conenction operations (default: C<300>). 1298The default timeout for connection operations (default: C<300>).
1234 1299
1235=item $AnyEvent::HTTP::USERAGENT 1300=item $AnyEvent::HTTP::USERAGENT
1236 1301
1237The default value for the C<User-Agent> header (the default is 1302The default value for the C<User-Agent> header (the default is
1238C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1303C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1252use 6, and Opera uses 8 because like, they have the fastest browser and 1317use 6, and Opera uses 8 because like, they have the fastest browser and
1253give a shit for everybody else on the planet. 1318give a shit for everybody else on the planet.
1254 1319
1255=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 1320=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1256 1321
1257The time after which idle persistent conenctions get closed by 1322The time after which idle persistent connections get closed by
1258AnyEvent::HTTP (default: C<3>). 1323AnyEvent::HTTP (default: C<3>).
1259 1324
1260=item $AnyEvent::HTTP::ACTIVE 1325=item $AnyEvent::HTTP::ACTIVE
1261 1326
1262The number of active connections. This is not the number of currently 1327The number of active connections. This is not the number of currently
1303 # other formats fail in the loop below 1368 # other formats fail in the loop below
1304 1369
1305 for (0..11) { 1370 for (0..11) {
1306 if ($m eq $month[$_]) { 1371 if ($m eq $month[$_]) {
1307 require Time::Local; 1372 require Time::Local;
1308 return Time::Local::timegm ($S, $M, $H, $d, $_, $y); 1373 return eval { Time::Local::timegm ($S, $M, $H, $d, $_, $y) };
1309 } 1374 }
1310 } 1375 }
1311 1376
1312 undef 1377 undef
1313} 1378}
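C<Time::Local::timegm> croaks when a field is out of range, for example day 32. Wrapping the call in C<eval> lets C<parse_date> return C<undef> for such a malformed date instead of letting the exception propagate out of C<parse_date>. The following is a minimal illustration using a hypothetical C<safe_timegm> wrapper.

```perl
use strict;
use warnings;
use Time::Local ();

# Hypothetical wrapper mirroring the revision 1.116 parse_date change.
# $mon is 0-based (0 = January), $y is a four-digit year.
sub safe_timegm {
    my ($S, $M, $H, $d, $mon, $y) = @_;
    # timegm dies on out-of-range fields; eval turns that into undef
    return scalar eval { Time::Local::timegm ($S, $M, $H, $d, $mon, $y) };
}
```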
1332This section contains some more elaborate "real-world" examples or code 1397This section contains some more elaborate "real-world" examples or code
1333snippets. 1398snippets.
1334 1399
1335=head2 HTTP/1.1 FILE DOWNLOAD 1400=head2 HTTP/1.1 FILE DOWNLOAD
1336 1401
1337Downloading files with HTTP cna be quite tricky, especially when something 1402Downloading files with HTTP can be quite tricky, especially when something
1338goes wrong and you want tor esume. 1403goes wrong and you want to resume.
1339 1404
1340Here is a function that initiates and resumes a download. It uses the 1405Here is a function that initiates and resumes a download. It uses the
1341last modified time to check for file content changes, and works with many 1406last modified time to check for file content changes, and works with many
1342HTTP/1.0 servers as well, and usually falls back to a complete re-download 1407HTTP/1.0 servers as well, and usually falls back to a complete re-download
1343on older servers. 1408on older servers.
1359 1424
1360 warn stat $fh; 1425 warn stat $fh;
1361 warn -s _; 1426 warn -s _;
1362 if (stat $fh and -s _) { 1427 if (stat $fh and -s _) {
1363 $ofs = -s _; 1428 $ofs = -s _;
1364 warn "-s is ", $ofs;#d# 1429 warn "-s is ", $ofs;
1365 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1430 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1366 $hdr{"range"} = "bytes=$ofs-"; 1431 $hdr{"range"} = "bytes=$ofs-";
1367 } 1432 }
1368 1433
1369 http_get $url, 1434 http_get $url,
