ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.102 by root, Sat Feb 19 06:46:14 2011 UTC vs.
Revision 1.140 by root, Wed Mar 6 19:29:18 2024 UTC

4 4
5=head1 SYNOPSIS 5=head1 SYNOPSIS
6 6
7 use AnyEvent::HTTP; 7 use AnyEvent::HTTP;
8 8
9 http_get "http://www.nethype.de/", sub { print $_[1] }; 9 http_get "http://www.nethype.de/", sub {
10 my ($body, $hdr) = @_;
11 print "$hdr->{URL} Status: $hdr->{Status}\n";
12 print $body;
13 };
10 14
11 # ... do something else here 15 # ... do something else here
12 16
13=head1 DESCRIPTION 17=head1 DESCRIPTION
14 18
46use AnyEvent::Util (); 50use AnyEvent::Util ();
47use AnyEvent::Handle (); 51use AnyEvent::Handle ();
48 52
49use base Exporter::; 53use base Exporter::;
50 54
51our $VERSION = '2.04'; 55our $VERSION = 2.25;
52 56
53our @EXPORT = qw(http_get http_post http_head http_request); 57our @EXPORT = qw(http_get http_post http_head http_request);
54 58
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 59our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 60our $MAX_RECURSE = 10;
89C<http_request> returns a "cancellation guard" - you have to keep the 93C<http_request> returns a "cancellation guard" - you have to keep the
90object at least alive until the callback get called. If the object gets 94object at least alive until the callback get called. If the object gets
91destroyed before the callback is called, the request will be cancelled. 95destroyed before the callback is called, the request will be cancelled.
92 96
93The callback will be called with the response body data as first argument 97The callback will be called with the response body data as first argument
94(or C<undef> if an error occured), and a hash-ref with response headers 98(or C<undef> if an error occurred), and a hash-ref with response headers
95(and trailers) as second argument. 99(and trailers) as second argument.
96 100
97All the headers in that hash are lowercased. In addition to the response 101All the headers in that hash are lowercased. In addition to the response
98headers, the "pseudo-headers" (uppercase to avoid clashing with possible 102headers, the "pseudo-headers" (uppercase to avoid clashing with possible
99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 103response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
123C<590>-C<599> and the C<Reason> pseudo-header will contain an error 127C<590>-C<599> and the C<Reason> pseudo-header will contain an error
124message. Currently the following status codes are used: 128message. Currently the following status codes are used:
125 129
126=over 4 130=over 4
127 131
128=item 595 - errors during connection etsbalishment, proxy handshake. 132=item 595 - errors during connection establishment, proxy handshake.
129 133
130=item 596 - errors during TLS negotiation, request sending and header processing. 134=item 596 - errors during TLS negotiation, request sending and header processing.
131 135
132=item 597 - errors during body receiving or processing. 136=item 597 - errors during body receiving or processing.
133 137
154 158
155=over 4 159=over 4
156 160
157=item recurse => $count (default: $MAX_RECURSE) 161=item recurse => $count (default: $MAX_RECURSE)
158 162
159Whether to recurse requests or not, e.g. on redirects, authentication 163Whether to recurse requests or not, e.g. on redirects, authentication and
160retries and so on, and how often to do so. 164other retries and so on, and how often to do so.
165
166Only redirects to http and https URLs are supported. While most common
167redirection forms are handled entirely within this module, some require
168the use of the optional L<URI> module. If it is required but missing, then
169the request will fail with an error.
161 170
162=item headers => hashref 171=item headers => hashref
163 172
164The request headers to use. Currently, C<http_request> may provide its own 173The request headers to use. Currently, C<http_request> may provide its own
165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 174C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
169 178
170You really should provide your own C<User-Agent:> header value that is 179You really should provide your own C<User-Agent:> header value that is
171appropriate for your program - I wouldn't be surprised if the default 180appropriate for your program - I wouldn't be surprised if the default
172AnyEvent string gets blocked by webservers sooner or later. 181AnyEvent string gets blocked by webservers sooner or later.
173 182
183Also, make sure that your headers names and values do not contain any
184embedded newlines.
185
174=item timeout => $seconds 186=item timeout => $seconds
175 187
176The time-out to use for various stages - each connect attempt will reset 188The time-out to use for various stages - each connect attempt will reset
177the timeout, as will read or write activity, i.e. this is not an overall 189the timeout, as will read or write activity, i.e. this is not an overall
178timeout. 190timeout.
186 198
187C<$scheme> must be either missing or must be C<http> for HTTP. 199C<$scheme> must be either missing or must be C<http> for HTTP.
188 200
189If not specified, then the default proxy is used (see 201If not specified, then the default proxy is used (see
190C<AnyEvent::HTTP::set_proxy>). 202C<AnyEvent::HTTP::set_proxy>).
203
204Currently, if your proxy requires authorization, you have to specify an
205appropriate "Proxy-Authorization" header in every request.
206
207Note that this module will prefer an existing persistent connection,
208even if that connection was made using another proxy. If you need to
209ensure that a new connection is made in this case, you can either force
210C<persistent> to false or e.g. use the proxy address in your C<sessionid>.
191 211
192=item body => $string 212=item body => $string
193 213
194The request body, usually empty. Will be sent as-is (future versions of 214The request body, usually empty. Will be sent as-is (future versions of
195this module might offer more options). 215this module might offer more options).
228The default for this option is C<low>, which could be interpreted as "give 248The default for this option is C<low>, which could be interpreted as "give
229me the page, no matter what". 249me the page, no matter what".
230 250
231See also the C<sessionid> parameter. 251See also the C<sessionid> parameter.
232 252
233=item session => $string 253=item sessionid => $string
234 254
235The module might reuse connections to the same host internally. Sometimes 255The module might reuse connections to the same host internally (regardless
236(e.g. when using TLS), you do not want to reuse connections from other 256of other settings, such as C<tcp_connect> or C<proxy>). Sometimes (e.g.
257when using TLS or a specfic proxy), you do not want to reuse connections
237sessions. This can be achieved by setting this parameter to some unique 258from other sessions. This can be achieved by setting this parameter to
238ID (such as the address of an object storing your state data, or the TLS 259some unique ID (such as the address of an object storing your state data
239context) - only connections using the same unique ID will be reused. 260or the TLS context, or the proxy IP) - only connections using the same
261unique ID will be reused.
240 262
241=item on_prepare => $callback->($fh) 263=item on_prepare => $callback->($fh)
242 264
243In rare cases you need to "tune" the socket before it is used to 265In rare cases you need to "tune" the socket before it is used to
244connect (for exmaple, to bind it on a given IP address). This parameter 266connect (for example, to bind it on a given IP address). This parameter
245overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 267overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
246and behaves exactly the same way (e.g. it has to provide a 268and behaves exactly the same way (e.g. it has to provide a
247timeout). See the description for the C<$prepare_cb> argument of 269timeout). See the description for the C<$prepare_cb> argument of
248C<AnyEvent::Socket::tcp_connect> for details. 270C<AnyEvent::Socket::tcp_connect> for details.
249 271
252In even rarer cases you want total control over how AnyEvent::HTTP 274In even rarer cases you want total control over how AnyEvent::HTTP
253establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 275establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
254to do this, but you can provide your own C<tcp_connect> function - 276to do this, but you can provide your own C<tcp_connect> function -
255obviously, it has to follow the same calling conventions, except that it 277obviously, it has to follow the same calling conventions, except that it
256may always return a connection guard object. 278may always return a connection guard object.
279
280The connections made by this hook will be treated as equivalent to
281connections made the built-in way, specifically, they will be put into
282and taken from the persistent connection cache. If your C<$tcp_connect>
283function is incompatible with this kind of re-use, consider switching off
284C<persistent> connections and/or providing a C<sessionid> identifier.
257 285
258There are probably lots of weird uses for this function, starting from 286There are probably lots of weird uses for this function, starting from
259tracing the hosts C<http_request> actually tries to connect, to (inexact 287tracing the hosts C<http_request> actually tries to connect, to (inexact
260but fast) host => IP address caching or even socks protocol support. 288but fast) host => IP address caching or even socks protocol support.
261 289
331=item persistent => $boolean 359=item persistent => $boolean
332 360
333Try to create/reuse a persistent connection. When this flag is set 361Try to create/reuse a persistent connection. When this flag is set
334(default: true for idempotent requests, false for all others), then 362(default: true for idempotent requests, false for all others), then
335C<http_request> tries to re-use an existing (previously-created) 363C<http_request> tries to re-use an existing (previously-created)
336persistent connection to the host and, failing that, tries to create a new 364persistent connection to same host (i.e. identical URL scheme, hostname,
337one. 365port and sessionid) and, failing that, tries to create a new one.
338 366
339Requests failing in certain ways will be automatically retried once, which 367Requests failing in certain ways will be automatically retried once, which
340is dangerous for non-idempotent requests, which is why it defaults to off 368is dangerous for non-idempotent requests, which is why it defaults to off
341for them. The reason for this is because the bozos who designed HTTP/1.1 369for them. The reason for this is because the bozos who designed HTTP/1.1
342made it impossible to distinguish between a fatal error and a normal 370made it impossible to distinguish between a fatal error and a normal
343connection timeout, so you never know whether there was a problem with 371connection timeout, so you never know whether there was a problem with
344your request or not. 372your request or not.
345 373
346When reusing an existent connection, many parameters (such as TLS context) 374When reusing an existent connection, many parameters (such as TLS context)
347will be ignored. See the C<session> parameter for a workaround. 375will be ignored. See the C<sessionid> parameter for a workaround.
348 376
349=item keepalive => $boolean 377=item keepalive => $boolean
350 378
351Only used when C<persistent> is also true. This parameter decides whether 379Only used when C<persistent> is also true. This parameter decides whether
352C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection 380C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
381 409
382Example: do a HTTP HEAD request on https://www.google.com/, use a 410Example: do a HTTP HEAD request on https://www.google.com/, use a
383timeout of 30 seconds. 411timeout of 30 seconds.
384 412
385 http_request 413 http_request
386 GET => "https://www.google.com", 414 HEAD => "https://www.google.com",
387 headers => { "user-agent" => "MySearchClient 1.0" }, 415 headers => { "user-agent" => "MySearchClient 1.0" },
388 timeout => 30, 416 timeout => 30,
389 sub { 417 sub {
390 my ($body, $hdr) = @_; 418 my ($body, $hdr) = @_;
391 use Data::Dumper; 419 use Data::Dumper;
443 471
444# expire cookies 472# expire cookies
445sub cookie_jar_expire($;$) { 473sub cookie_jar_expire($;$) {
446 my ($jar, $session_end) = @_; 474 my ($jar, $session_end) = @_;
447 475
448 %$jar = () if $jar->{version} != 1; 476 %$jar = () if $jar->{version} != 2;
449 477
450 my $anow = AE::now; 478 my $anow = AE::now;
451 479
452 while (my ($chost, $paths) = each %$jar) { 480 while (my ($chost, $paths) = each %$jar) {
453 next unless ref $paths; 481 next unless ref $paths;
473 501
474# extract cookies from jar 502# extract cookies from jar
475sub cookie_jar_extract($$$$) { 503sub cookie_jar_extract($$$$) {
476 my ($jar, $scheme, $host, $path) = @_; 504 my ($jar, $scheme, $host, $path) = @_;
477 505
478 %$jar = () if $jar->{version} != 1; 506 %$jar = () if $jar->{version} != 2;
507
508 $host = AnyEvent::Util::idn_to_ascii $host
509 if $host =~ /[^\x00-\x7f]/;
479 510
480 my @cookies; 511 my @cookies;
481 512
482 while (my ($chost, $paths) = each %$jar) { 513 while (my ($chost, $paths) = each %$jar) {
483 next unless ref $paths; 514 next unless ref $paths;
484 515
485 if ($chost =~ /^\./) { 516 # exact match or suffix including . match
486 next unless $chost eq substr $host, -length $chost; 517 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
487 } elsif ($chost =~ /\./) {
488 next unless $chost eq $host;
489 } else {
490 next; 518 or next;
491 }
492 519
493 while (my ($cpath, $cookies) = each %$paths) { 520 while (my ($cpath, $cookies) = each %$paths) {
494 next unless $cpath eq substr $path, 0, length $cpath; 521 next unless $cpath eq substr $path, 0, length $cpath;
495 522
496 while (my ($cookie, $kv) = each %$cookies) { 523 while (my ($cookie, $kv) = each %$cookies) {
517} 544}
518 545
519# parse set_cookie header into jar 546# parse set_cookie header into jar
520sub cookie_jar_set_cookie($$$$) { 547sub cookie_jar_set_cookie($$$$) {
521 my ($jar, $set_cookie, $host, $date) = @_; 548 my ($jar, $set_cookie, $host, $date) = @_;
549
550 %$jar = () if $jar->{version} != 2;
522 551
523 my $anow = int AE::now; 552 my $anow = int AE::now;
524 my $snow; # server-now 553 my $snow; # server-now
525 554
526 for ($set_cookie) { 555 for ($set_cookie) {
572 601
573 my $cdom; 602 my $cdom;
574 my $cpath = (delete $kv{path}) || "/"; 603 my $cpath = (delete $kv{path}) || "/";
575 604
576 if (exists $kv{domain}) { 605 if (exists $kv{domain}) {
577 $cdom = delete $kv{domain}; 606 $cdom = $kv{domain};
578 607
579 $cdom =~ s/^\.?/./; # make sure it starts with a "." 608 $cdom =~ s/^\.?/./; # make sure it starts with a "."
580 609
581 next if $cdom =~ /\.$/; 610 next if $cdom =~ /\.$/;
582 611
583 # this is not rfc-like and not netscape-like. go figure. 612 # this is not rfc-like and not netscape-like. go figure.
584 my $ndots = $cdom =~ y/.//; 613 my $ndots = $cdom =~ y/.//;
585 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 614 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
615
616 $cdom = substr $cdom, 1; # remove initial .
586 } else { 617 } else {
587 $cdom = $host; 618 $cdom = $host;
588 } 619 }
589 620
590 # store it 621 # store it
591 $jar->{version} = 1; 622 $jar->{version} = 2;
592 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 623 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
593 624
594 redo if /\G\s*,/gc; 625 redo if /\G\s*,/gc;
595 } 626 }
596} 627}
686 717
687 $cb->(undef, $hdr); 718 $cb->(undef, $hdr);
688 () 719 ()
689} 720}
690 721
722our %IDEMPOTENT = (
723 DELETE => 1,
724 GET => 1,
725 QUERY => 1,
726 HEAD => 1,
727 OPTIONS => 1,
728 PUT => 1,
729 TRACE => 1,
730
731 ACL => 1,
732 "BASELINE-CONTROL" => 1,
733 BIND => 1,
734 CHECKIN => 1,
735 CHECKOUT => 1,
736 COPY => 1,
737 LABEL => 1,
738 LINK => 1,
739 MERGE => 1,
740 MKACTIVITY => 1,
741 MKCALENDAR => 1,
742 MKCOL => 1,
743 MKREDIRECTREF => 1,
744 MKWORKSPACE => 1,
745 MOVE => 1,
746 ORDERPATCH => 1,
747 PRI => 1,
748 PROPFIND => 1,
749 PROPPATCH => 1,
750 REBIND => 1,
751 REPORT => 1,
752 SEARCH => 1,
753 UNBIND => 1,
754 UNCHECKOUT => 1,
755 UNLINK => 1,
756 UNLOCK => 1,
757 UPDATE => 1,
758 UPDATEREDIRECTREF => 1,
759 "VERSION-CONTROL" => 1,
760);
761
691sub http_request($$@) { 762sub http_request($$@) {
692 my $cb = pop; 763 my $cb = pop;
693 my ($method, $url, %arg) = @_; 764 my ($method, $url, %arg) = @_;
694 765
695 my %hdr; 766 my %hdr;
724 795
725 my $uport = $uscheme eq "http" ? 80 796 my $uport = $uscheme eq "http" ? 80
726 : $uscheme eq "https" ? 443 797 : $uscheme eq "https" ? 443
727 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 798 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
728 799
729 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 800 $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
730 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 801 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
731 802
732 my $uhost = lc $1; 803 my $uhost = lc $1;
733 $uport = $2 if defined $2; 804 $uport = $2 if defined $2;
734 805
770 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 841 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
771 842
772 $hdr{"content-length"} = length $arg{body} 843 $hdr{"content-length"} = length $arg{body}
773 if length $arg{body} || $method ne "GET"; 844 if length $arg{body} || $method ne "GET";
774 845
775 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 846 my $idempotent = $IDEMPOTENT{$method};
776 847
777 # default value for keepalive is true iff the request is for an idempotent method 848 # default value for keepalive is true iff the request is for an idempotent method
778 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : $idempotent; 849 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
779 my $keepalive10 = exists $arg{keepalive10} ? $arg{keepalive10} : !$proxy; 850 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
780 my $keptalive; # true if this is actually a recycled connection 851 my $was_persistent; # true if this is actually a recycled connection
781 852
782 # the key to use in the keepalive cache 853 # the key to use in the keepalive cache
783 my $ka_key = "$uhost\x00$arg{sessionid}"; 854 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
784 855
785 $hdr{connection} = ($keepalive ? $keepalive10 ? "keep-alive " : "" : "close ") . "Te"; #1.1 856 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"; #1.1
786 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 857 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
787 858
788 my %state = (connect_guard => 1); 859 my %state = (connect_guard => 1);
789 860
790 my $ae_error = 595; # connecting 861 my $ae_error = 595; # connecting
800 # send request 871 # send request
801 $hdl->push_write ( 872 $hdl->push_write (
802 "$method $rpath HTTP/1.1\015\012" 873 "$method $rpath HTTP/1.1\015\012"
803 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 874 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
804 . "\015\012" 875 . "\015\012"
805 . (delete $arg{body}) 876 . $arg{body}
806 ); 877 );
807 878
808 # return if error occured during push_write() 879 # return if error occurred during push_write()
809 return unless %state; 880 return unless %state;
810 881
811 # reduce memory usage, save a kitten, also re-use it for the response headers. 882 # reduce memory usage, save a kitten, also re-use it for the response headers.
812 %hdr = (); 883 %hdr = ();
813 884
840 911
841 %hdr = (%$hdr, @pseudo); 912 %hdr = (%$hdr, @pseudo);
842 } 913 }
843 914
844 # redirect handling 915 # redirect handling
845 # microsoft and other shitheads don't give a shit for following standards, 916 # relative uri handling forced by microsoft and other shitheads.
846 # try to support some common forms of broken Location headers. 917 # we give our best and fall back to URI if available.
847 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 918 if (exists $hdr{location}) {
919 my $loc = $hdr{location};
920
921 if ($loc =~ m%^//%) { # //
922 $loc = "$uscheme:$loc";
923
924 } elsif ($loc eq "") {
925 $loc = $url;
926
927 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
848 $hdr{location} =~ s/^\.\/+//; 928 $loc =~ s/^\.\/+//;
849 929
850 my $url = "$rscheme://$uhost:$uport"; 930 if ($loc !~ m%^[.?#]%) {
931 my $prefix = "$uscheme://$uauthority";
851 932
852 unless ($hdr{location} =~ s/^\///) { 933 unless ($loc =~ s/^\///) {
853 $url .= $upath; 934 $prefix .= $upath;
854 $url =~ s/\/[^\/]*$//; 935 $prefix =~ s/\/[^\/]*$//;
936 }
937
938 $loc = "$prefix/$loc";
939
940 } elsif (eval { require URI }) { # uri
941 $loc = URI->new_abs ($loc, $url)->as_string;
942
943 } else {
944 return _error %state, $cb, { @pseudo, Status => 599, Reason => "Cannot parse Location (URI module missing)" };
945 #$hdr{Status} = 599;
946 #$hdr{Reason} = "Unparsable Redirect (URI module missing)";
947 #$recurse = 0;
948 }
855 } 949 }
856 950
857 $hdr{location} = "$url/$hdr{location}"; 951 $hdr{location} = $loc;
858 } 952 }
859 953
860 my $redirect; 954 my $redirect;
861 955
862 if ($recurse) { 956 if ($recurse) {
864 958
865 # industry standard is to redirect POST as GET for 959 # industry standard is to redirect POST as GET for
866 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1. 960 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
867 # also, the UA should ask the user for 301 and 307 and POST, 961 # also, the UA should ask the user for 301 and 307 and POST,
868 # industry standard seems to be to simply follow. 962 # industry standard seems to be to simply follow.
869 # we go with the industry standard. 963 # we go with the industry standard. 308 is defined
964 # by rfc7538
870 if ($status == 301 or $status == 302 or $status == 303) { 965 if ($status == 301 or $status == 302 or $status == 303) {
966 $redirect = 1;
871 # HTTP/1.1 is unclear on how to mutate the method 967 # HTTP/1.1 is unclear on how to mutate the method
872 $method = "GET" unless $method eq "HEAD"; 968 unless ($method eq "HEAD") {
873 $redirect = 1; 969 $method = "GET";
970 delete $arg{body};
971 }
874 } elsif ($status == 307) { 972 } elsif ($status == 307 or $status == 308) {
875 $redirect = 1; 973 $redirect = 1;
876 } 974 }
877 } 975 }
878 976
879 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 977 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent])
880 if ($state{handle}) { 978 if ($state{handle}) {
881 # handle keepalive 979 # handle keepalive
882 if ( 980 if (
883 $keepalive 981 $persistent
884 && $_[3] 982 && $_[3]
885 && ($hdr{HTTPVersion} < 1.1 983 && ($hdr{HTTPVersion} < 1.1
886 ? $hdr{connection} =~ /\bkeep-?alive\b/i 984 ? $hdr{connection} =~ /\bkeep-?alive\b/i
887 : $hdr{connection} !~ /\bclose\b/i) 985 : $hdr{connection} !~ /\bclose\b/i)
888 ) { 986 ) {
907 1005
908 if ($redirect && exists $hdr{location}) { 1006 if ($redirect && exists $hdr{location}) {
909 # we ignore any errors, as it is very common to receive 1007 # we ignore any errors, as it is very common to receive
910 # Content-Length != 0 but no actual body 1008 # Content-Length != 0 but no actual body
911 # we also access %hdr, as $_[1] might be an erro 1009 # we also access %hdr, as $_[1] might be an erro
1010 $state{recurse} =
912 http_request ( 1011 http_request (
913 $method => $hdr{location}, 1012 $method => $hdr{location},
914 %arg, 1013 %arg,
915 recurse => $recurse - 1, 1014 recurse => $recurse - 1,
916 Redirect => [$_[0], \%hdr], 1015 Redirect => [$_[0], \%hdr],
1016 sub {
1017 %state = ();
917 $cb 1018 &$cb
1019 },
918 ); 1020 );
919 } else { 1021 } else {
920 $cb->($_[0], \%hdr); 1022 $cb->($_[0], \%hdr);
921 } 1023 }
922 }; 1024 };
923 1025
951 $finish->(delete $state{handle}); 1053 $finish->(delete $state{handle});
952 1054
953 } elsif ($chunked) { 1055 } elsif ($chunked) {
954 my $cl = 0; 1056 my $cl = 0;
955 my $body = ""; 1057 my $body = "";
956 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1058 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
957 1059
958 $state{read_chunk} = sub { 1060 $state{read_chunk} = sub {
959 $_[1] =~ /^([0-9a-fA-F]+)/ 1061 $_[1] =~ /^([0-9a-fA-F]+)/
960 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1062 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
961 1063
962 my $len = hex $1; 1064 my $len = hex $1;
963 1065
964 if ($len) { 1066 if ($len) {
965 $cl += $len; 1067 $cl += $len;
994 } 1096 }
995 }; 1097 };
996 1098
997 $_[0]->push_read (line => $state{read_chunk}); 1099 $_[0]->push_read (line => $state{read_chunk});
998 1100
999 } elsif ($arg{on_body}) { 1101 } elsif (!$redirect && $arg{on_body}) {
1000 if (defined $len) { 1102 if (defined $len) {
1001 $_[0]->on_read (sub { 1103 $_[0]->on_read (sub {
1002 $len -= length $_[0]{rbuf}; 1104 $len -= length $_[0]{rbuf};
1003 1105
1004 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1106 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
1035 } 1137 }
1036 }; 1138 };
1037 1139
1038 # if keepalive is enabled, then the server closing the connection 1140 # if keepalive is enabled, then the server closing the connection
1039 # before a response can happen legally - we retry on idempotent methods. 1141 # before a response can happen legally - we retry on idempotent methods.
1040 if ($keptalive && $idempotent) { 1142 if ($was_persistent && $idempotent) {
1041 my $old_eof = $hdl->{on_eof}; 1143 my $old_eof = $hdl->{on_eof};
1042 $hdl->{on_eof} = sub { 1144 $hdl->{on_eof} = sub {
1043 _destroy_state %state; 1145 _destroy_state %state;
1044 1146
1147 %state = ();
1148 $state{recurse} =
1045 http_request ( 1149 http_request (
1046 $method => $url, 1150 $method => $url,
1047 %arg, 1151 %arg,
1048 keepalive => 0, 1152 recurse => $recurse - 1,
1153 persistent => 0,
1154 sub {
1155 %state = ();
1049 $cb 1156 &$cb
1157 }
1050 ); 1158 );
1051 }; 1159 };
1052 $hdl->on_read (sub { 1160 $hdl->on_read (sub {
1053 return unless %state; 1161 return unless %state;
1054 1162
1055 # as soon as we receive something, a connection close 1163 # as soon as we receive something, a connection close
1063 }; 1171 };
1064 1172
1065 my $prepare_handle = sub { 1173 my $prepare_handle = sub {
1066 my ($hdl) = $state{handle}; 1174 my ($hdl) = $state{handle};
1067 1175
1068 $hdl->timeout ($timeout);
1069 $hdl->on_error (sub { 1176 $hdl->on_error (sub {
1070 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] }; 1177 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1071 }); 1178 });
1072 $hdl->on_eof (sub { 1179 $hdl->on_eof (sub {
1073 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" }; 1180 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1074 }); 1181 });
1182 $hdl->timeout_reset;
1183 $hdl->timeout ($timeout);
1075 }; 1184 };
1076 1185
1077 # connected to proxy (or origin server) 1186 # connected to proxy (or origin server)
1078 my $connect_cb = sub { 1187 my $connect_cb = sub {
1079 my $fh = shift 1188 my $fh = shift
1095 1204
1096 # now handle proxy-CONNECT method 1205 # now handle proxy-CONNECT method
1097 if ($proxy && $uscheme eq "https") { 1206 if ($proxy && $uscheme eq "https") {
1098 # oh dear, we have to wrap it into a connect request 1207 # oh dear, we have to wrap it into a connect request
1099 1208
1209 my $auth = exists $hdr{"proxy-authorization"}
1210 ? "proxy-authorization: " . (delete $hdr{"proxy-authorization"}) . "\015\012"
1211 : "";
1212
1100 # maybe re-use $uauthority with patched port? 1213 # maybe re-use $uauthority with patched port?
1101 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1214 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012");
1102 $state{handle}->push_read (line => $qr_nlnl, sub { 1215 $state{handle}->push_read (line => $qr_nlnl, sub {
1103 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1216 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
1104 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }; 1217 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
1105 1218
1106 if ($2 == 200) { 1219 if ($2 == 200) {
1109 } else { 1222 } else {
1110 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 }; 1223 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
1111 } 1224 }
1112 }); 1225 });
1113 } else { 1226 } else {
1227 delete $hdr{"proxy-authorization"} unless $proxy;
1228
1114 $handle_actual_request->(); 1229 $handle_actual_request->();
1115 } 1230 }
1116 }; 1231 };
1117 1232
1118 _get_slot $uhost, sub { 1233 _get_slot $uhost, sub {
1120 1235
1121 return unless $state{connect_guard}; 1236 return unless $state{connect_guard};
1122 1237
1123 # try to use an existing keepalive connection, but only if we, ourselves, plan 1238 # try to use an existing keepalive connection, but only if we, ourselves, plan
1124 # on a keepalive request (in theory, this should be a separate config option). 1239 # on a keepalive request (in theory, this should be a separate config option).
1125 if ($keepalive && $KA_CACHE{$ka_key}) { 1240 if ($persistent && $KA_CACHE{$ka_key}) {
1126 $keptalive = 1; 1241 $was_persistent = 1;
1242
1127 $state{handle} = ka_fetch $ka_key; 1243 $state{handle} = ka_fetch $ka_key;
1244# $state{handle}->destroyed
1245# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1128 $prepare_handle->(); 1246 $prepare_handle->();
1247# $state{handle}->destroyed
1248# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1249 $rpath = $upath;
1129 $handle_actual_request->(); 1250 $handle_actual_request->();
1130 1251
1131 } else { 1252 } else {
1132 my $tcp_connect = $arg{tcp_connect} 1253 my $tcp_connect = $arg{tcp_connect}
1133 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1254 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1175Sets the default proxy server to use. The proxy-url must begin with a 1296Sets the default proxy server to use. The proxy-url must begin with a
1176string of the form C<http://host:port>, croaks otherwise. 1297string of the form C<http://host:port>, croaks otherwise.
1177 1298
1178To clear an already-set proxy, use C<undef>. 1299To clear an already-set proxy, use C<undef>.
1179 1300
1180When AnyEvent::HTTP is laoded for the first time it will query the 1301When AnyEvent::HTTP is loaded for the first time it will query the
1181default proxy from the operating system, currently by looking at 1302default proxy from the operating system, currently by looking at
1182C<$ENV{http_proxy>}. 1303C<$ENV{http_proxy>}.
1183 1304
1184=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1305=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1185 1306
1187C<$session_end> is given and true, then additionally remove all session 1308C<$session_end> is given and true, then additionally remove all session
1188cookies. 1309cookies.
1189 1310
1190You should call this function (with a true C<$session_end>) before you 1311You should call this function (with a true C<$session_end>) before you
1191save cookies to disk, and you should call this function after loading them 1312save cookies to disk, and you should call this function after loading them
1192again. If you have a long-running program you can additonally call this 1313again. If you have a long-running program you can additionally call this
1193function from time to time. 1314function from time to time.
1194 1315
1195A cookie jar is initially an empty hash-reference that is managed by this 1316A cookie jar is initially an empty hash-reference that is managed by this
1196module. It's format is subject to change, but currently it is like this: 1317module. Its format is subject to change, but currently it is as follows:
1197 1318
1198The key C<version> has to contain C<1>, otherwise the hash gets 1319The key C<version> has to contain C<2>, otherwise the hash gets
1199emptied. All other keys are hostnames or IP addresses pointing to 1320cleared. All other keys are hostnames or IP addresses pointing to
1200hash-references. The key for these inner hash references is the 1321hash-references. The key for these inner hash references is the
1201server path for which this cookie is meant, and the values are again 1322server path for which this cookie is meant, and the values are again
1202hash-references. The keys of those hash-references is the cookie name, and 1323hash-references. Each key of those hash-references is a cookie name, and
1203the value, you guessed it, is another hash-reference, this time with the 1324the value, you guessed it, is another hash-reference, this time with the
1204key-value pairs from the cookie, except for C<expires> and C<max-age>, 1325key-value pairs from the cookie, except for C<expires> and C<max-age>,
1205which have been replaced by a C<_expires> key that contains the cookie 1326which have been replaced by a C<_expires> key that contains the cookie
1206expiry timestamp. 1327expiry timestamp. Session cookies are indicated by not having an
1328C<_expires> key.
1207 1329
1208Here is an example of a cookie jar with a single cookie, so you have a 1330Here is an example of a cookie jar with a single cookie, so you have a
1209chance of understanding the above paragraph: 1331chance of understanding the above paragraph:
1210 1332
1211 { 1333 {
1212 version => 1, 1334 version => 2,
1213 "10.0.0.1" => { 1335 "10.0.0.1" => {
1214 "/" => { 1336 "/" => {
1215 "mythweb_id" => { 1337 "mythweb_id" => {
1216 _expires => 1293917923, 1338 _expires => 1293917923,
1217 value => "ooRung9dThee3ooyXooM1Ohm", 1339 value => "ooRung9dThee3ooyXooM1Ohm",
1235 1357
1236The default value for the C<recurse> request parameter (default: C<10>). 1358The default value for the C<recurse> request parameter (default: C<10>).
1237 1359
1238=item $AnyEvent::HTTP::TIMEOUT 1360=item $AnyEvent::HTTP::TIMEOUT
1239 1361
1240The default timeout for conenction operations (default: C<300>). 1362The default timeout for connection operations (default: C<300>).
1241 1363
1242=item $AnyEvent::HTTP::USERAGENT 1364=item $AnyEvent::HTTP::USERAGENT
1243 1365
1244The default value for the C<User-Agent> header (the default is 1366The default value for the C<User-Agent> header (the default is
1245C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1367C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1246 1368
1247=item $AnyEvent::HTTP::MAX_PER_HOST 1369=item $AnyEvent::HTTP::MAX_PER_HOST
1248 1370
1249The maximum number of concurrent connections to the same host (identified 1371The maximum number of concurrent connections to the same host (identified
1250by the hostname). If the limit is exceeded, then the additional requests 1372by the hostname). If the limit is exceeded, then additional requests
1251are queued until previous connections are closed. Both persistent and 1373are queued until previous connections are closed. Both persistent and
1252non-persistent connections are counted in this limit. 1374non-persistent connections are counted in this limit.
1253 1375
1254The default value for this is C<4>, and it is highly advisable to not 1376The default value for this is C<4>, and it is highly advisable to not
1255increase it much. 1377increase it much.
1256 1378
1257For comparison: the RFC's recommend 4 non-persistent or 2 persistent 1379For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1258connections, older browsers used 2, newers (such as firefox 3) typically 1380connections, older browsers used 2, newer ones (such as firefox 3)
1259use 6, and Opera uses 8 because like, they have the fastest browser and 1381typically use 6, and Opera uses 8 because like, they have the fastest
1260give a shit for everybody else on the planet. 1382browser and give a shit for everybody else on the planet.
1261 1383
1262=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 1384=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1263 1385
1264The time after which idle persistent conenctions get closed by 1386The time after which idle persistent connections get closed by
1265AnyEvent::HTTP (default: C<3>). 1387AnyEvent::HTTP (default: C<3>).
1266 1388
1267=item $AnyEvent::HTTP::ACTIVE 1389=item $AnyEvent::HTTP::ACTIVE
1268 1390
1269The number of active connections. This is not the number of currently 1391The number of active connections. This is not the number of currently
1310 # other formats fail in the loop below 1432 # other formats fail in the loop below
1311 1433
1312 for (0..11) { 1434 for (0..11) {
1313 if ($m eq $month[$_]) { 1435 if ($m eq $month[$_]) {
1314 require Time::Local; 1436 require Time::Local;
1315 return Time::Local::timegm ($S, $M, $H, $d, $_, $y); 1437 return eval { Time::Local::timegm ($S, $M, $H, $d, $_, $y) };
1316 } 1438 }
1317 } 1439 }
1318 1440
1319 undef 1441 undef
1320} 1442}
1334 set_proxy $ENV{http_proxy}; 1456 set_proxy $ENV{http_proxy};
1335}; 1457};
1336 1458
1337=head2 SHOWCASE 1459=head2 SHOWCASE
1338 1460
1339This section contaisn some more elaborate "real-world" examples or code 1461This section contains some more elaborate "real-world" examples or code
1340snippets. 1462snippets.
1341 1463
1342=head2 HTTP/1.1 FILE DOWNLOAD 1464=head2 HTTP/1.1 FILE DOWNLOAD
1343 1465
1344Downloading files with HTTP can be quite tricky, especially when something 1466Downloading files with HTTP can be quite tricky, especially when something
1348last modified time to check for file content changes, and works with many 1470last modified time to check for file content changes, and works with many
1349HTTP/1.0 servers as well, and usually falls back to a complete re-download 1471HTTP/1.0 servers as well, and usually falls back to a complete re-download
1350on older servers. 1472on older servers.
1351 1473
1352It calls the completion callback with either C<undef>, which means a 1474It calls the completion callback with either C<undef>, which means a
1353nonretryable error occured, C<0> when the download was partial and should 1475nonretryable error occurred, C<0> when the download was partial and should
1354be retried, and C<1> if it was successful. 1476be retried, and C<1> if it was successful.
1355 1477
1356 use AnyEvent::HTTP; 1478 use AnyEvent::HTTP;
1357 1479
1358 sub download($$$) { 1480 sub download($$$) {
1362 or die "$file: $!"; 1484 or die "$file: $!";
1363 1485
1364 my %hdr; 1486 my %hdr;
1365 my $ofs = 0; 1487 my $ofs = 0;
1366 1488
1367 warn stat $fh;
1368 warn -s _;
1369 if (stat $fh and -s _) { 1489 if (stat $fh and -s _) {
1370 $ofs = -s _; 1490 $ofs = -s _;
1371 warn "-s is ", $ofs;#d# 1491 warn "-s is ", $ofs;
1372 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1492 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1373 $hdr{"range"} = "bytes=$ofs-"; 1493 $hdr{"range"} = "bytes=$ofs-";
1374 } 1494 }
1375 1495
1376 http_get $url, 1496 http_get $url,
1401 my (undef, $hdr) = @_; 1521 my (undef, $hdr) = @_;
1402 1522
1403 my $status = $hdr->{Status}; 1523 my $status = $hdr->{Status};
1404 1524
1405 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { 1525 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1406 utime $fh, $time, $time; 1526 utime $time, $time, $fh;
1407 } 1527 }
1408 1528
1409 if ($status == 200 || $status == 206 || $status == 416) { 1529 if ($status == 200 || $status == 206 || $status == 416) {
1410 # download ok || resume ok || file already fully downloaded 1530 # download ok || resume ok || file already fully downloaded
1411 $cb->(1, $hdr); 1531 $cb->(1, $hdr);

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines