ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.91 by root, Mon Jan 3 01:03:29 2011 UTC vs.
Revision 1.134 by root, Fri Sep 7 22:11:31 2018 UTC

15This module is an L<AnyEvent> user, you need to make sure that you use and 15This module is an L<AnyEvent> user, you need to make sure that you use and
16run a supported event loop. 16run a supported event loop.
17 17
18This module implements a simple, stateless and non-blocking HTTP 18This module implements a simple, stateless and non-blocking HTTP
19client. It supports GET, POST and other request methods, cookies and more, 19client. It supports GET, POST and other request methods, cookies and more,
20all on a very low level. It can follow redirects supports proxies and 20all on a very low level. It can follow redirects, supports proxies, and
21automatically limits the number of connections to the values specified in 21automatically limits the number of connections to the values specified in
22the RFC. 22the RFC.
23 23
24It should generally be a "good client" that is enough for most HTTP 24It should generally be a "good client" that is enough for most HTTP
25tasks. Simple tasks should be simple, but complex tasks should still be 25tasks. Simple tasks should be simple, but complex tasks should still be
46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '1.5'; 51our $VERSION = 2.24;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
57our $MAX_PERSISTENT = 8;
58our $PERSISTENT_TIMEOUT = 2; 57our $PERSISTENT_TIMEOUT = 3;
59our $TIMEOUT = 300; 58our $TIMEOUT = 300;
60 59our $MAX_PER_HOST = 4; # changing this is evil
61# changing these is evil
62our $MAX_PERSISTENT_PER_HOST = 2;
63our $MAX_PER_HOST = 4;
64 60
65our $PROXY; 61our $PROXY;
66our $ACTIVE = 0; 62our $ACTIVE = 0;
67 63
68my %KA_COUNT; # number of open keep-alive connections per host 64my %KA_CACHE; # indexed by uhost currently, points to [$handle...] array
69my %CO_SLOT; # number of open connections, and wait queue, per host 65my %CO_SLOT; # number of open connections, and wait queue, per host
70 66
71=item http_get $url, key => value..., $cb->($data, $headers) 67=item http_get $url, key => value..., $cb->($data, $headers)
72 68
73Executes an HTTP-GET request. See the http_request function for details on 69Executes an HTTP-GET request. See the http_request function for details on
93C<http_request> returns a "cancellation guard" - you have to keep the 89C<http_request> returns a "cancellation guard" - you have to keep the
94object at least alive until the callback get called. If the object gets 90object at least alive until the callback get called. If the object gets
95destroyed before the callback is called, the request will be cancelled. 91destroyed before the callback is called, the request will be cancelled.
96 92
97The callback will be called with the response body data as first argument 93The callback will be called with the response body data as first argument
98(or C<undef> if an error occured), and a hash-ref with response headers 94(or C<undef> if an error occurred), and a hash-ref with response headers
99(and trailers) as second argument. 95(and trailers) as second argument.
100 96
101All the headers in that hash are lowercased. In addition to the response 97All the headers in that hash are lowercased. In addition to the response
102headers, the "pseudo-headers" (uppercase to avoid clashing with possible 98headers, the "pseudo-headers" (uppercase to avoid clashing with possible
103response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
127C<590>-C<599> and the C<Reason> pseudo-header will contain an error 123C<590>-C<599> and the C<Reason> pseudo-header will contain an error
128message. Currently the following status codes are used: 124message. Currently the following status codes are used:
129 125
130=over 4 126=over 4
131 127
132=item 595 - errors during connection etsbalishment, proxy handshake. 128=item 595 - errors during connection establishment, proxy handshake.
133 129
134=item 596 - errors during TLS negotiation, request sending and header processing. 130=item 596 - errors during TLS negotiation, request sending and header processing.
135 131
136=item 597 - errors during body receiving or processing. 132=item 597 - errors during body receiving or processing.
137 133
158 154
159=over 4 155=over 4
160 156
161=item recurse => $count (default: $MAX_RECURSE) 157=item recurse => $count (default: $MAX_RECURSE)
162 158
163Whether to recurse requests or not, e.g. on redirects, authentication 159Whether to recurse requests or not, e.g. on redirects, authentication and
164retries and so on, and how often to do so. 160other retries and so on, and how often to do so.
161
162Only redirects to http and https URLs are supported. While most common
163redirection forms are handled entirely within this module, some require
164the use of the optional L<URI> module. If it is required but missing, then
165the request will fail with an error.
165 166
166=item headers => hashref 167=item headers => hashref
167 168
168The request headers to use. Currently, C<http_request> may provide its own 169The request headers to use. Currently, C<http_request> may provide its own
169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 170C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
173 174
174You really should provide your own C<User-Agent:> header value that is 175You really should provide your own C<User-Agent:> header value that is
175appropriate for your program - I wouldn't be surprised if the default 176appropriate for your program - I wouldn't be surprised if the default
176AnyEvent string gets blocked by webservers sooner or later. 177AnyEvent string gets blocked by webservers sooner or later.
177 178
179Also, make sure that your headers names and values do not contain any
180embedded newlines.
181
178=item timeout => $seconds 182=item timeout => $seconds
179 183
180The time-out to use for various stages - each connect attempt will reset 184The time-out to use for various stages - each connect attempt will reset
181the timeout, as will read or write activity, i.e. this is not an overall 185the timeout, as will read or write activity, i.e. this is not an overall
182timeout. 186timeout.
183 187
184Default timeout is 5 minutes. 188Default timeout is 5 minutes.
185 189
186=item proxy => [$host, $port[, $scheme]] or undef 190=item proxy => [$host, $port[, $scheme]] or undef
187 191
188Use the given http proxy for all requests. If not specified, then the 192Use the given http proxy for all requests, or no proxy if C<undef> is
189default proxy (as specified by C<$ENV{http_proxy}>) is used. 193used.
190 194
191C<$scheme> must be either missing, C<http> for HTTP or C<https> for 195C<$scheme> must be either missing or must be C<http> for HTTP.
192HTTPS. 196
197If not specified, then the default proxy is used (see
198C<AnyEvent::HTTP::set_proxy>).
199
200Currently, if your proxy requires authorization, you have to specify an
201appropriate "Proxy-Authorization" header in every request.
193 202
194=item body => $string 203=item body => $string
195 204
196The request body, usually empty. Will be sent as-is (future versions of 205The request body, usually empty. Will be sent as-is (future versions of
197this module might offer more options). 206this module might offer more options).
228verification) TLS context. 237verification) TLS context.
229 238
230The default for this option is C<low>, which could be interpreted as "give 239The default for this option is C<low>, which could be interpreted as "give
231me the page, no matter what". 240me the page, no matter what".
232 241
242See also the C<sessionid> parameter.
243
244=item session => $string
245
246The module might reuse connections to the same host internally. Sometimes
247(e.g. when using TLS), you do not want to reuse connections from other
248sessions. This can be achieved by setting this parameter to some unique
249ID (such as the address of an object storing your state data, or the TLS
250context) - only connections using the same unique ID will be reused.
251
233=item on_prepare => $callback->($fh) 252=item on_prepare => $callback->($fh)
234 253
235In rare cases you need to "tune" the socket before it is used to 254In rare cases you need to "tune" the socket before it is used to
236connect (for exmaple, to bind it on a given IP address). This parameter 255connect (for example, to bind it on a given IP address). This parameter
237overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 256overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
238and behaves exactly the same way (e.g. it has to provide a 257and behaves exactly the same way (e.g. it has to provide a
239timeout). See the description for the C<$prepare_cb> argument of 258timeout). See the description for the C<$prepare_cb> argument of
240C<AnyEvent::Socket::tcp_connect> for details. 259C<AnyEvent::Socket::tcp_connect> for details.
241 260
244In even rarer cases you want total control over how AnyEvent::HTTP 263In even rarer cases you want total control over how AnyEvent::HTTP
245establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 264establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
246to do this, but you can provide your own C<tcp_connect> function - 265to do this, but you can provide your own C<tcp_connect> function -
247obviously, it has to follow the same calling conventions, except that it 266obviously, it has to follow the same calling conventions, except that it
248may always return a connection guard object. 267may always return a connection guard object.
268
269The connections made by this hook will be treated as equivalent to
270connecitons made the built-in way, specifically, they will be put into
271and taken from the persistent conneciton cache. If your C<$tcp_connect>
272function is incompatible with this kind of re-use, consider switching off
273C<persistent> connections and/or providing a C<session> identifier.
249 274
250There are probably lots of weird uses for this function, starting from 275There are probably lots of weird uses for this function, starting from
251tracing the hosts C<http_request> actually tries to connect, to (inexact 276tracing the hosts C<http_request> actually tries to connect, to (inexact
252but fast) host => IP address caching or even socks protocol support. 277but fast) host => IP address caching or even socks protocol support.
253 278
306called. Instead of the C<$body> argument containing the body data, the 331called. Instead of the C<$body> argument containing the body data, the
307callback will receive the L<AnyEvent::Handle> object associated with the 332callback will receive the L<AnyEvent::Handle> object associated with the
308connection. In error cases, C<undef> will be passed. When there is no body 333connection. In error cases, C<undef> will be passed. When there is no body
309(e.g. status C<304>), the empty string will be passed. 334(e.g. status C<304>), the empty string will be passed.
310 335
311The handle object might or might not be in TLS mode, might be connected to 336The handle object might or might not be in TLS mode, might be connected
312a proxy, be a persistent connection etc., and configured in unspecified 337to a proxy, be a persistent connection, use chunked transfer encoding
313ways. The user is responsible for this handle (it will not be used by this 338etc., and configured in unspecified ways. The user is responsible for this
314module anymore). 339handle (it will not be used by this module anymore).
315 340
316This is useful with some push-type services, where, after the initial 341This is useful with some push-type services, where, after the initial
317headers, an interactive protocol is used (typical example would be the 342headers, an interactive protocol is used (typical example would be the
318push-style twitter API which starts a JSON/XML stream). 343push-style twitter API which starts a JSON/XML stream).
319 344
320If you think you need this, first have a look at C<on_body>, to see if 345If you think you need this, first have a look at C<on_body>, to see if
321that doesn't solve your problem in a better way. 346that doesn't solve your problem in a better way.
347
348=item persistent => $boolean
349
350Try to create/reuse a persistent connection. When this flag is set
351(default: true for idempotent requests, false for all others), then
352C<http_request> tries to re-use an existing (previously-created)
353persistent connection to same host (i.e. identical URL scheme, hostname,
354port and session) and, failing that, tries to create a new one.
355
356Requests failing in certain ways will be automatically retried once, which
357is dangerous for non-idempotent requests, which is why it defaults to off
358for them. The reason for this is because the bozos who designed HTTP/1.1
359made it impossible to distinguish between a fatal error and a normal
360connection timeout, so you never know whether there was a problem with
361your request or not.
362
363When reusing an existent connection, many parameters (such as TLS context)
364will be ignored. See the C<session> parameter for a workaround.
365
366=item keepalive => $boolean
367
368Only used when C<persistent> is also true. This parameter decides whether
369C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
370(as opposed to only a HTTP/1.1 persistent connection).
371
372The default is true, except when using a proxy, in which case it defaults
373to false, as HTTP/1.0 proxies cannot support this in a meaningful way.
374
375=item handle_params => { key => value ... }
376
377The key-value pairs in this hash will be passed to any L<AnyEvent::Handle>
378constructor that is called - not all requests will create a handle, and
379sometimes more than one is created, so this parameter is only good for
380setting hints.
381
382Example: set the maximum read size to 4096, to potentially conserve memory
383at the cost of speed.
384
385 handle_params => {
386 max_read_size => 4096,
387 },
322 388
323=back 389=back
324 390
325Example: do a simple HTTP GET request for http://www.nethype.de/ and print 391Example: do a simple HTTP GET request for http://www.nethype.de/ and print
326the response body. 392the response body.
332 398
333Example: do a HTTP HEAD request on https://www.google.com/, use a 399Example: do a HTTP HEAD request on https://www.google.com/, use a
334timeout of 30 seconds. 400timeout of 30 seconds.
335 401
336 http_request 402 http_request
337 GET => "https://www.google.com", 403 HEAD => "https://www.google.com",
338 headers => { "user-agent" => "MySearchClient 1.0" }, 404 headers => { "user-agent" => "MySearchClient 1.0" },
339 timeout => 30, 405 timeout => 30,
340 sub { 406 sub {
341 my ($body, $hdr) = @_; 407 my ($body, $hdr) = @_;
342 use Data::Dumper; 408 use Data::Dumper;
353 }; 419 };
354 420
355 undef $request; 421 undef $request;
356 422
357=cut 423=cut
424
425#############################################################################
426# wait queue/slots
358 427
359sub _slot_schedule; 428sub _slot_schedule;
360sub _slot_schedule($) { 429sub _slot_schedule($) {
361 my $host = shift; 430 my $host = shift;
362 431
385 454
386 _slot_schedule $_[0]; 455 _slot_schedule $_[0];
387} 456}
388 457
389############################################################################# 458#############################################################################
459# cookie handling
390 460
391# expire cookies 461# expire cookies
392sub cookie_jar_expire($;$) { 462sub cookie_jar_expire($;$) {
393 my ($jar, $session_end) = @_; 463 my ($jar, $session_end) = @_;
394 464
395 %$jar = () if $jar->{version} != 1; 465 %$jar = () if $jar->{version} != 2;
396 466
397 my $anow = AE::now; 467 my $anow = AE::now;
398 468
399 while (my ($chost, $paths) = each %$jar) { 469 while (my ($chost, $paths) = each %$jar) {
400 next unless ref $paths; 470 next unless ref $paths;
418 } 488 }
419} 489}
420 490
421# extract cookies from jar 491# extract cookies from jar
422sub cookie_jar_extract($$$$) { 492sub cookie_jar_extract($$$$) {
423 my ($jar, $uscheme, $uhost, $upath) = @_; 493 my ($jar, $scheme, $host, $path) = @_;
424 494
425 %$jar = () if $jar->{version} != 1; 495 %$jar = () if $jar->{version} != 2;
496
497 $host = AnyEvent::Util::idn_to_ascii $host
498 if $host =~ /[^\x00-\x7f]/;
426 499
427 my @cookies; 500 my @cookies;
428 501
429 while (my ($chost, $paths) = each %$jar) { 502 while (my ($chost, $paths) = each %$jar) {
430 next unless ref $paths; 503 next unless ref $paths;
431 504
432 if ($chost =~ /^\./) { 505 # exact match or suffix including . match
433 next unless $chost eq substr $uhost, -length $chost; 506 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
434 } elsif ($chost =~ /\./) {
435 next unless $chost eq $uhost;
436 } else {
437 next; 507 or next;
438 }
439 508
440 while (my ($cpath, $cookies) = each %$paths) { 509 while (my ($cpath, $cookies) = each %$paths) {
441 next unless $cpath eq substr $upath, 0, length $cpath; 510 next unless $cpath eq substr $path, 0, length $cpath;
442 511
443 while (my ($cookie, $kv) = each %$cookies) { 512 while (my ($cookie, $kv) = each %$cookies) {
444 next if $uscheme ne "https" && exists $kv->{secure}; 513 next if $scheme ne "https" && exists $kv->{secure};
445 514
446 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) { 515 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
447 delete $cookies->{$cookie}; 516 delete $cookies->{$cookie};
448 next; 517 next;
449 } 518 }
463 \@cookies 532 \@cookies
464} 533}
465 534
466# parse set_cookie header into jar 535# parse set_cookie header into jar
467sub cookie_jar_set_cookie($$$$) { 536sub cookie_jar_set_cookie($$$$) {
468 my ($jar, $set_cookie, $uhost, $date) = @_; 537 my ($jar, $set_cookie, $host, $date) = @_;
538
539 %$jar = () if $jar->{version} != 2;
469 540
470 my $anow = int AE::now; 541 my $anow = int AE::now;
471 my $snow; # server-now 542 my $snow; # server-now
472 543
473 for ($set_cookie) { 544 for ($set_cookie) {
479 while ( 550 while (
480 m{ 551 m{
481 \G\s* 552 \G\s*
482 (?: 553 (?:
483 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+) 554 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
484 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )? 555 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^;,[:space:]]*) ) )?
485 ) 556 )
486 }gcxsi 557 }gcxsi
487 ) { 558 ) {
488 my $name = $2; 559 my $name = $2;
489 my $value = $4; 560 my $value = $4;
496 # quoted 567 # quoted
497 $value = $3; 568 $value = $3;
498 $value =~ s/\\(.)/$1/gs; 569 $value =~ s/\\(.)/$1/gs;
499 } 570 }
500 571
501 push @kv, lc $name, $value; 572 push @kv, @kv ? lc $name : $name, $value;
502 573
503 last unless /\G\s*;/gc; 574 last unless /\G\s*;/gc;
504 } 575 }
505 576
506 last unless @kv; 577 last unless @kv;
519 590
520 my $cdom; 591 my $cdom;
521 my $cpath = (delete $kv{path}) || "/"; 592 my $cpath = (delete $kv{path}) || "/";
522 593
523 if (exists $kv{domain}) { 594 if (exists $kv{domain}) {
524 $cdom = delete $kv{domain}; 595 $cdom = $kv{domain};
525 596
526 $cdom =~ s/^\.?/./; # make sure it starts with a "." 597 $cdom =~ s/^\.?/./; # make sure it starts with a "."
527 598
528 next if $cdom =~ /\.$/; 599 next if $cdom =~ /\.$/;
529 600
530 # this is not rfc-like and not netscape-like. go figure. 601 # this is not rfc-like and not netscape-like. go figure.
531 my $ndots = $cdom =~ y/.//; 602 my $ndots = $cdom =~ y/.//;
532 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 603 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
604
605 $cdom = substr $cdom, 1; # remove initial .
533 } else { 606 } else {
534 $cdom = $uhost; 607 $cdom = $host;
535 } 608 }
536 609
537 # store it 610 # store it
538 $jar->{version} = 1; 611 $jar->{version} = 2;
539 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 612 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
540 613
541 redo if /\G\s*,/gc; 614 redo if /\G\s*,/gc;
542 } 615 }
543} 616}
544 617
618#############################################################################
619# keepalive/persistent connection cache
620
621# fetch a connection from the keepalive cache
622sub ka_fetch($) {
623 my $ka_key = shift;
624
625 my $hdl = pop @{ $KA_CACHE{$ka_key} }; # currently we reuse the MOST RECENTLY USED connection
626 delete $KA_CACHE{$ka_key}
627 unless @{ $KA_CACHE{$ka_key} };
628
629 $hdl
630}
631
632sub ka_store($$) {
633 my ($ka_key, $hdl) = @_;
634
635 my $kaa = $KA_CACHE{$ka_key} ||= [];
636
637 my $destroy = sub {
638 my @ka = grep $_ != $hdl, @{ $KA_CACHE{$ka_key} };
639
640 $hdl->destroy;
641
642 @ka
643 ? $KA_CACHE{$ka_key} = \@ka
644 : delete $KA_CACHE{$ka_key};
645 };
646
647 # on error etc., destroy
648 $hdl->on_error ($destroy);
649 $hdl->on_eof ($destroy);
650 $hdl->on_read ($destroy);
651 $hdl->timeout ($PERSISTENT_TIMEOUT);
652
653 push @$kaa, $hdl;
654 shift @$kaa while @$kaa > $MAX_PER_HOST;
655}
656
657#############################################################################
658# utilities
659
545# continue to parse $_ for headers and place them into the arg 660# continue to parse $_ for headers and place them into the arg
546sub parse_hdr() { 661sub _parse_hdr() {
547 my %hdr; 662 my %hdr;
548 663
549 # things seen, not parsed: 664 # things seen, not parsed:
550 # p3pP="NON CUR OTPi OUR NOR UNI" 665 # p3pP="NON CUR OTPi OUR NOR UNI"
551 666
565 for values %hdr; 680 for values %hdr;
566 681
567 \%hdr 682 \%hdr
568} 683}
569 684
685#############################################################################
686# http_get
687
570our $qr_nlnl = qr{(?<![^\012])\015?\012}; 688our $qr_nlnl = qr{(?<![^\012])\015?\012};
571 689
572our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 690our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
573our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 691our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
692
693# maybe it should just become a normal object :/
694
695sub _destroy_state(\%) {
696 my ($state) = @_;
697
698 $state->{handle}->destroy if $state->{handle};
699 %$state = ();
700}
701
702sub _error(\%$$) {
703 my ($state, $cb, $hdr) = @_;
704
705 &_destroy_state ($state);
706
707 $cb->(undef, $hdr);
708 ()
709}
710
711our %IDEMPOTENT = (
712 DELETE => 1,
713 GET => 1,
714 HEAD => 1,
715 OPTIONS => 1,
716 PUT => 1,
717 TRACE => 1,
718
719 ACL => 1,
720 "BASELINE-CONTROL" => 1,
721 BIND => 1,
722 CHECKIN => 1,
723 CHECKOUT => 1,
724 COPY => 1,
725 LABEL => 1,
726 LINK => 1,
727 MERGE => 1,
728 MKACTIVITY => 1,
729 MKCALENDAR => 1,
730 MKCOL => 1,
731 MKREDIRECTREF => 1,
732 MKWORKSPACE => 1,
733 MOVE => 1,
734 ORDERPATCH => 1,
735 PROPFIND => 1,
736 PROPPATCH => 1,
737 REBIND => 1,
738 REPORT => 1,
739 SEARCH => 1,
740 UNBIND => 1,
741 UNCHECKOUT => 1,
742 UNLINK => 1,
743 UNLOCK => 1,
744 UPDATE => 1,
745 UPDATEREDIRECTREF => 1,
746 "VERSION-CONTROL" => 1,
747);
574 748
575sub http_request($$@) { 749sub http_request($$@) {
576 my $cb = pop; 750 my $cb = pop;
577 my ($method, $url, %arg) = @_; 751 my ($method, $url, %arg) = @_;
578 752
596 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 770 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
597 771
598 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" }) 772 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" })
599 if $recurse < 0; 773 if $recurse < 0;
600 774
601 my $proxy = $arg{proxy} || $PROXY; 775 my $proxy = exists $arg{proxy} ? $arg{proxy} : $PROXY;
602 my $timeout = $arg{timeout} || $TIMEOUT; 776 my $timeout = $arg{timeout} || $TIMEOUT;
603 777
604 my ($uscheme, $uauthority, $upath, $query, $fragment) = 778 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment
605 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|; 779 $url =~ m|^([^:]+):(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?$|;
606 780
607 $uscheme = lc $uscheme; 781 $uscheme = lc $uscheme;
608 782
609 my $uport = $uscheme eq "http" ? 80 783 my $uport = $uscheme eq "http" ? 80
610 : $uscheme eq "https" ? 443 784 : $uscheme eq "https" ? 443
611 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 785 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
612 786
613 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 787 $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
614 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 788 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
615 789
616 my $uhost = lc $1; 790 my $uhost = lc $1;
617 $uport = $2 if defined $2; 791 $uport = $2 if defined $2;
618 792
654 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 828 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
655 829
656 $hdr{"content-length"} = length $arg{body} 830 $hdr{"content-length"} = length $arg{body}
657 if length $arg{body} || $method ne "GET"; 831 if length $arg{body} || $method ne "GET";
658 832
659 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 833 my $idempotent = $IDEMPOTENT{$method};
660 834
661 # default value for keepalive is true iff the request is for an idempotent method 835 # default value for keepalive is true iff the request is for an idempotent method
662 my $keepalive = exists $arg{keepalive} 836 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
663 ? $arg{keepalive}*1 837 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
664 : $idempotent ? $PERSISTENT_TIMEOUT : 0; 838 my $was_persistent; # true if this is actually a recycled connection
665 839
840 # the key to use in the keepalive cache
841 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
842
666 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1 843 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"; #1.1
667 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 844 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
668 845
669 my %state = (connect_guard => 1); 846 my %state = (connect_guard => 1);
670 847
671 my $ae_error = 595; # connecting 848 my $ae_error = 595; # connecting
672 849
673 # handle actual, non-tunneled, request 850 # handle actual, non-tunneled, request
674 my $handle_actual_request = sub { 851 my $handle_actual_request = sub {
675 $ae_error = 596; # request phase 852 $ae_error = 596; # request phase
676 853
854 my $hdl = $state{handle};
855
677 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 856 $hdl->starttls ("connect") if $uscheme eq "https" && !exists $hdl->{tls};
678 857
679 # send request 858 # send request
680 $state{handle}->push_write ( 859 $hdl->push_write (
681 "$method $rpath HTTP/1.1\015\012" 860 "$method $rpath HTTP/1.1\015\012"
682 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 861 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
683 . "\015\012" 862 . "\015\012"
684 . (delete $arg{body}) 863 . $arg{body}
685 ); 864 );
686 865
687 # return if error occured during push_write() 866 # return if error occurred during push_write()
688 return unless %state; 867 return unless %state;
689 868
690 # reduce memory usage, save a kitten, also re-use it for the response headers. 869 # reduce memory usage, save a kitten, also re-use it for the response headers.
691 %hdr = (); 870 %hdr = ();
692 871
693 # status line and headers 872 # status line and headers
694 $state{read_response} = sub { 873 $state{read_response} = sub {
874 return unless %state;
875
695 for ("$_[1]") { 876 for ("$_[1]") {
696 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 877 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
697 878
698 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci 879 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
699 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" })); 880 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid server response" };
700 881
701 # 100 Continue handling 882 # 100 Continue handling
702 # should not happen as we don't send expect: 100-continue, 883 # should not happen as we don't send expect: 100-continue,
703 # but we handle it just in case. 884 # but we handle it just in case.
704 # since we send the request body regardless, if we get an error 885 # since we send the request body regardless, if we get an error
710 HTTPVersion => $1, 891 HTTPVersion => $1,
711 Status => $2, 892 Status => $2,
712 Reason => $3, 893 Reason => $3,
713 ; 894 ;
714 895
715 my $hdr = parse_hdr 896 my $hdr = _parse_hdr
716 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" })); 897 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Garbled response headers" };
717 898
718 %hdr = (%$hdr, @pseudo); 899 %hdr = (%$hdr, @pseudo);
719 } 900 }
720 901
721 # redirect handling 902 # redirect handling
722 # microsoft and other shitheads don't give a shit for following standards, 903 # relative uri handling forced by microsoft and other shitheads.
723 # try to support some common forms of broken Location headers. 904 # we give our best and fall back to URI if available.
724 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 905 if (exists $hdr{location}) {
906 my $loc = $hdr{location};
907
908 if ($loc =~ m%^//%) { # //
909 $loc = "$uscheme:$loc";
910
911 } elsif ($loc eq "") {
912 $loc = $url;
913
914 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
725 $hdr{location} =~ s/^\.\/+//; 915 $loc =~ s/^\.\/+//;
726 916
727 my $url = "$rscheme://$uhost:$uport"; 917 if ($loc !~ m%^[.?#]%) {
918 my $prefix = "$uscheme://$uauthority";
728 919
729 unless ($hdr{location} =~ s/^\///) { 920 unless ($loc =~ s/^\///) {
730 $url .= $upath; 921 $prefix .= $upath;
731 $url =~ s/\/[^\/]*$//; 922 $prefix =~ s/\/[^\/]*$//;
923 }
924
925 $loc = "$prefix/$loc";
926
927 } elsif (eval { require URI }) { # uri
928 $loc = URI->new_abs ($loc, $url)->as_string;
929
930 } else {
931 return _error %state, $cb, { @pseudo, Status => 599, Reason => "Cannot parse Location (URI module missing)" };
932 #$hdr{Status} = 599;
933 #$hdr{Reason} = "Unparsable Redirect (URI module missing)";
934 #$recurse = 0;
935 }
732 } 936 }
733 937
734 $hdr{location} = "$url/$hdr{location}"; 938 $hdr{location} = $loc;
735 } 939 }
736 940
737 my $redirect; 941 my $redirect;
738 942
739 if ($recurse) { 943 if ($recurse) {
741 945
742 # industry standard is to redirect POST as GET for 946 # industry standard is to redirect POST as GET for
743 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1. 947 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
744 # also, the UA should ask the user for 301 and 307 and POST, 948 # also, the UA should ask the user for 301 and 307 and POST,
745 # industry standard seems to be to simply follow. 949 # industry standard seems to be to simply follow.
746 # we go with the industry standard. 950 # we go with the industry standard. 308 is defined
951 # by rfc7538
747 if ($status == 301 or $status == 302 or $status == 303) { 952 if ($status == 301 or $status == 302 or $status == 303) {
953 $redirect = 1;
748 # HTTP/1.1 is unclear on how to mutate the method 954 # HTTP/1.1 is unclear on how to mutate the method
749 $method = "GET" unless $method eq "HEAD"; 955 unless ($method eq "HEAD") {
750 $redirect = 1; 956 $method = "GET";
957 delete $arg{body};
958 }
751 } elsif ($status == 307) { 959 } elsif ($status == 307 or $status == 308) {
752 $redirect = 1; 960 $redirect = 1;
753 } 961 }
754 } 962 }
755 963
756 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 964 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent])
757 my $may_keep_alive = $_[3]; 965 if ($state{handle}) {
758 966 # handle keepalive
967 if (
968 $persistent
969 && $_[3]
970 && ($hdr{HTTPVersion} < 1.1
971 ? $hdr{connection} =~ /\bkeep-?alive\b/i
972 : $hdr{connection} !~ /\bclose\b/i)
973 ) {
974 ka_store $ka_key, delete $state{handle};
975 } else {
976 # no keepalive, destroy the handle
759 $state{handle}->destroy if $state{handle}; 977 $state{handle}->destroy;
978 }
979 }
980
760 %state = (); 981 %state = ();
761 982
762 if (defined $_[1]) { 983 if (defined $_[1]) {
763 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1]; 984 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
764 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2]; 985 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
771 992
772 if ($redirect && exists $hdr{location}) { 993 if ($redirect && exists $hdr{location}) {
773 # we ignore any errors, as it is very common to receive 994 # we ignore any errors, as it is very common to receive
774 # Content-Length != 0 but no actual body 995 # Content-Length != 0 but no actual body
775 # we also access %hdr, as $_[1] might be an erro 996 # we also access %hdr, as $_[1] might be an erro
997 $state{recurse} =
776 http_request ( 998 http_request (
777 $method => $hdr{location}, 999 $method => $hdr{location},
778 %arg, 1000 %arg,
779 recurse => $recurse - 1, 1001 recurse => $recurse - 1,
780 Redirect => [$_[0], \%hdr], 1002 Redirect => [$_[0], \%hdr],
1003 sub {
1004 %state = ();
1005 &$cb
1006 },
781 $cb); 1007 );
782 } else { 1008 } else {
783 $cb->($_[0], \%hdr); 1009 $cb->($_[0], \%hdr);
784 } 1010 }
785 }; 1011 };
786 1012
813 1039
814 $finish->(delete $state{handle}); 1040 $finish->(delete $state{handle});
815 1041
816 } elsif ($chunked) { 1042 } elsif ($chunked) {
817 my $cl = 0; 1043 my $cl = 0;
818 my $body = undef; 1044 my $body = "";
819 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1045 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
820 1046
821 $state{read_chunk} = sub { 1047 $state{read_chunk} = sub {
822 $_[1] =~ /^([0-9a-fA-F]+)/ 1048 $_[1] =~ /^([0-9a-fA-F]+)/
823 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1049 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
824 1050
825 my $len = hex $1; 1051 my $len = hex $1;
826 1052
827 if ($len) { 1053 if ($len) {
828 $cl += $len; 1054 $cl += $len;
843 $_[0]->push_read (line => $qr_nlnl, sub { 1069 $_[0]->push_read (line => $qr_nlnl, sub {
844 if (length $_[1]) { 1070 if (length $_[1]) {
845 for ("$_[1]") { 1071 for ("$_[1]") {
846 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 1072 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
847 1073
848 my $hdr = parse_hdr 1074 my $hdr = _parse_hdr
849 or return $finish->(undef, $ae_error => "Garbled response trailers"); 1075 or return $finish->(undef, $ae_error => "Garbled response trailers");
850 1076
851 %hdr = (%hdr, %$hdr); 1077 %hdr = (%hdr, %$hdr);
852 } 1078 }
853 } 1079 }
857 } 1083 }
858 }; 1084 };
859 1085
860 $_[0]->push_read (line => $state{read_chunk}); 1086 $_[0]->push_read (line => $state{read_chunk});
861 1087
862 } elsif ($arg{on_body}) { 1088 } elsif (!$redirect && $arg{on_body}) {
863 if (defined $len) { 1089 if (defined $len) {
864 $_[0]->on_read (sub { 1090 $_[0]->on_read (sub {
865 $len -= length $_[0]{rbuf}; 1091 $len -= length $_[0]{rbuf};
866 1092
867 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1093 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
896 $_[0]->on_read (sub { }); 1122 $_[0]->on_read (sub { });
897 } 1123 }
898 } 1124 }
899 }; 1125 };
900 1126
1127 # if keepalive is enabled, then the server closing the connection
1128 # before a response can happen legally - we retry on idempotent methods.
1129 if ($was_persistent && $idempotent) {
1130 my $old_eof = $hdl->{on_eof};
1131 $hdl->{on_eof} = sub {
1132 _destroy_state %state;
1133
1134 %state = ();
1135 $state{recurse} =
1136 http_request (
1137 $method => $url,
1138 %arg,
1139 recurse => $recurse - 1,
1140 persistent => 0,
1141 sub {
1142 %state = ();
1143 &$cb
1144 }
1145 );
1146 };
1147 $hdl->on_read (sub {
1148 return unless %state;
1149
1150 # as soon as we receive something, a connection close
1151 # once more becomes a hard error
1152 $hdl->{on_eof} = $old_eof;
1153 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1154 });
1155 } else {
901 $state{handle}->push_read (line => $qr_nlnl, $state{read_response}); 1156 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1157 }
902 }; 1158 };
903 1159
1160 my $prepare_handle = sub {
1161 my ($hdl) = $state{handle};
1162
1163 $hdl->on_error (sub {
1164 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1165 });
1166 $hdl->on_eof (sub {
1167 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1168 });
1169 $hdl->timeout_reset;
1170 $hdl->timeout ($timeout);
1171 };
1172
1173 # connected to proxy (or origin server)
904 my $connect_cb = sub { 1174 my $connect_cb = sub {
905 $state{fh} = shift 1175 my $fh = shift
906 or do {
907 my $err = "$!";
908 %state = ();
909 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err }); 1176 or return _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "$!" };
910 };
911 1177
912 return unless delete $state{connect_guard}; 1178 return unless delete $state{connect_guard};
913 1179
914 # get handle 1180 # get handle
915 $state{handle} = new AnyEvent::Handle 1181 $state{handle} = new AnyEvent::Handle
1182 %{ $arg{handle_params} },
916 fh => $state{fh}, 1183 fh => $fh,
917 peername => $rhost, 1184 peername => $uhost,
918 tls_ctx => $arg{tls_ctx}, 1185 tls_ctx => $arg{tls_ctx},
919 # these need to be reconfigured on keepalive handles
920 timeout => $timeout,
921 on_error => sub {
922 %state = ();
923 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
924 },
925 on_eof => sub {
926 %state = ();
927 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
928 },
929 ; 1186 ;
930 1187
931 # limit the number of persistent connections 1188 $prepare_handle->();
932 # keepalive not yet supported
933# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
934# ++$KA_COUNT{$_[1]};
935# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
936# --$KA_COUNT{$_[1]}
937# };
938# $hdr{connection} = "keep-alive";
939# }
940 1189
941 $state{handle}->starttls ("connect") if $rscheme eq "https"; 1190 #$state{handle}->starttls ("connect") if $rscheme eq "https";
942 1191
943 # now handle proxy-CONNECT method 1192 # now handle proxy-CONNECT method
944 if ($proxy && $uscheme eq "https") { 1193 if ($proxy && $uscheme eq "https") {
945 # oh dear, we have to wrap it into a connect request 1194 # oh dear, we have to wrap it into a connect request
946 1195
1196 my $auth = exists $hdr{"proxy-authorization"}
1197 ? "proxy-authorization: " . (delete $hdr{"proxy-authorization"}) . "\015\012"
1198 : "";
1199
947 # maybe re-use $uauthority with patched port? 1200 # maybe re-use $uauthority with patched port?
948 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1201 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012");
949 $state{handle}->push_read (line => $qr_nlnl, sub { 1202 $state{handle}->push_read (line => $qr_nlnl, sub {
950 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1203 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
951 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 1204 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
952 1205
953 if ($2 == 200) { 1206 if ($2 == 200) {
954 $rpath = $upath; 1207 $rpath = $upath;
955 $handle_actual_request->(); 1208 $handle_actual_request->();
956 } else { 1209 } else {
957 %state = ();
958 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 1210 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
959 } 1211 }
960 }); 1212 });
961 } else { 1213 } else {
1214 delete $hdr{"proxy-authorization"} unless $proxy;
1215
962 $handle_actual_request->(); 1216 $handle_actual_request->();
963 } 1217 }
964 }; 1218 };
965 1219
966 _get_slot $uhost, sub { 1220 _get_slot $uhost, sub {
967 $state{slot_guard} = shift; 1221 $state{slot_guard} = shift;
968 1222
969 return unless $state{connect_guard}; 1223 return unless $state{connect_guard};
970 1224
1225 # try to use an existing keepalive connection, but only if we, ourselves, plan
1226 # on a keepalive request (in theory, this should be a separate config option).
1227 if ($persistent && $KA_CACHE{$ka_key}) {
1228 $was_persistent = 1;
1229
1230 $state{handle} = ka_fetch $ka_key;
1231# $state{handle}->destroyed
1232# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1233 $prepare_handle->();
1234# $state{handle}->destroyed
1235# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1236 $handle_actual_request->();
1237
1238 } else {
971 my $tcp_connect = $arg{tcp_connect} 1239 my $tcp_connect = $arg{tcp_connect}
972 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1240 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
973 1241
974 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 1242 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
1243 }
975 }; 1244 };
976 1245
977 defined wantarray && AnyEvent::Util::guard { %state = () } 1246 defined wantarray && AnyEvent::Util::guard { _destroy_state %state }
978} 1247}
979 1248
980sub http_get($@) { 1249sub http_get($@) {
981 unshift @_, "GET"; 1250 unshift @_, "GET";
982 &http_request 1251 &http_request
1000AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for 1269AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
1001the actual connection, which in turn uses AnyEvent::DNS to resolve 1270the actual connection, which in turn uses AnyEvent::DNS to resolve
1002hostnames. The latter is a simple stub resolver and does no caching 1271hostnames. The latter is a simple stub resolver and does no caching
1003on its own. If you want DNS caching, you currently have to provide 1272on its own. If you want DNS caching, you currently have to provide
1004your own default resolver (by storing a suitable resolver object in 1273your own default resolver (by storing a suitable resolver object in
1005C<$AnyEvent::DNS::RESOLVER>). 1274C<$AnyEvent::DNS::RESOLVER>) or your own C<tcp_connect> callback.
1006 1275
1007=head2 GLOBAL FUNCTIONS AND VARIABLES 1276=head2 GLOBAL FUNCTIONS AND VARIABLES
1008 1277
1009=over 4 1278=over 4
1010 1279
1011=item AnyEvent::HTTP::set_proxy "proxy-url" 1280=item AnyEvent::HTTP::set_proxy "proxy-url"
1012 1281
1013Sets the default proxy server to use. The proxy-url must begin with a 1282Sets the default proxy server to use. The proxy-url must begin with a
1014string of the form C<http://host:port> (optionally C<https:...>), croaks 1283string of the form C<http://host:port>, croaks otherwise.
1015otherwise.
1016 1284
1017To clear an already-set proxy, use C<undef>. 1285To clear an already-set proxy, use C<undef>.
1286
1287When AnyEvent::HTTP is loaded for the first time it will query the
1288default proxy from the operating system, currently by looking at
1289C<$ENV{http_proxy>}.
1018 1290
1019=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1291=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1020 1292
1021Remove all cookies from the cookie jar that have been expired. If 1293Remove all cookies from the cookie jar that have been expired. If
1022C<$session_end> is given and true, then additionally remove all session 1294C<$session_end> is given and true, then additionally remove all session
1023cookies. 1295cookies.
1024 1296
1025You should call this function (with a true C<$session_end>) before you 1297You should call this function (with a true C<$session_end>) before you
1026save cookies to disk, and you should call this function after loading them 1298save cookies to disk, and you should call this function after loading them
1027again. If you have a long-running program you can additonally call this 1299again. If you have a long-running program you can additionally call this
1028function from time to time. 1300function from time to time.
1029 1301
1030A cookie jar is initially an empty hash-reference that is managed by this 1302A cookie jar is initially an empty hash-reference that is managed by this
1031module. It's format is subject to change, but currently it is like this: 1303module. Its format is subject to change, but currently it is as follows:
1032 1304
1033The key C<version> has to contain C<1>, otherwise the hash gets 1305The key C<version> has to contain C<2>, otherwise the hash gets
1034emptied. All other keys are hostnames or IP addresses pointing to 1306cleared. All other keys are hostnames or IP addresses pointing to
1035hash-references. The key for these inner hash references is the 1307hash-references. The key for these inner hash references is the
1036server path for which this cookie is meant, and the values are again 1308server path for which this cookie is meant, and the values are again
1037hash-references. The keys of those hash-references is the cookie name, and 1309hash-references. Each key of those hash-references is a cookie name, and
1038the value, you guessed it, is another hash-reference, this time with the 1310the value, you guessed it, is another hash-reference, this time with the
1039key-value pairs from the cookie, except for C<expires> and C<max-age>, 1311key-value pairs from the cookie, except for C<expires> and C<max-age>,
1040which have been replaced by a C<_expires> key that contains the cookie 1312which have been replaced by a C<_expires> key that contains the cookie
1041expiry timestamp. 1313expiry timestamp. Session cookies are indicated by not having an
1314C<_expires> key.
1042 1315
1043Here is an example of a cookie jar with a single cookie, so you have a 1316Here is an example of a cookie jar with a single cookie, so you have a
1044chance of understanding the above paragraph: 1317chance of understanding the above paragraph:
1045 1318
1046 { 1319 {
1047 version => 1, 1320 version => 2,
1048 "10.0.0.1" => { 1321 "10.0.0.1" => {
1049 "/" => { 1322 "/" => {
1050 "mythweb_id" => { 1323 "mythweb_id" => {
1051 _expires => 1293917923, 1324 _expires => 1293917923,
1052 value => "ooRung9dThee3ooyXooM1Ohm", 1325 value => "ooRung9dThee3ooyXooM1Ohm",
1068 1341
1069=item $AnyEvent::HTTP::MAX_RECURSE 1342=item $AnyEvent::HTTP::MAX_RECURSE
1070 1343
1071The default value for the C<recurse> request parameter (default: C<10>). 1344The default value for the C<recurse> request parameter (default: C<10>).
1072 1345
1346=item $AnyEvent::HTTP::TIMEOUT
1347
1348The default timeout for connection operations (default: C<300>).
1349
1073=item $AnyEvent::HTTP::USERAGENT 1350=item $AnyEvent::HTTP::USERAGENT
1074 1351
1075The default value for the C<User-Agent> header (the default is 1352The default value for the C<User-Agent> header (the default is
1076C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1353C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1077 1354
1078=item $AnyEvent::HTTP::MAX_PER_HOST 1355=item $AnyEvent::HTTP::MAX_PER_HOST
1079 1356
1080The maximum number of concurrent connections to the same host (identified 1357The maximum number of concurrent connections to the same host (identified
1081by the hostname). If the limit is exceeded, then the additional requests 1358by the hostname). If the limit is exceeded, then additional requests
1082are queued until previous connections are closed. 1359are queued until previous connections are closed. Both persistent and
1360non-persistent connections are counted in this limit.
1083 1361
1084The default value for this is C<4>, and it is highly advisable to not 1362The default value for this is C<4>, and it is highly advisable to not
1085increase it. 1363increase it much.
1364
1365For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1366connections, older browsers used 2, newer ones (such as firefox 3)
1367typically use 6, and Opera uses 8 because like, they have the fastest
1368browser and give a shit for everybody else on the planet.
1369
1370=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1371
1372The time after which idle persistent connections get closed by
1373AnyEvent::HTTP (default: C<3>).
1086 1374
1087=item $AnyEvent::HTTP::ACTIVE 1375=item $AnyEvent::HTTP::ACTIVE
1088 1376
1089The number of active connections. This is not the number of currently 1377The number of active connections. This is not the number of currently
1090running requests, but the number of currently open and non-idle TCP 1378running requests, but the number of currently open and non-idle TCP
1091connections. This number of can be useful for load-leveling. 1379connections. This number can be useful for load-leveling.
1092 1380
1093=back 1381=back
1094 1382
1095=cut 1383=cut
1096 1384
1130 # other formats fail in the loop below 1418 # other formats fail in the loop below
1131 1419
1132 for (0..11) { 1420 for (0..11) {
1133 if ($m eq $month[$_]) { 1421 if ($m eq $month[$_]) {
1134 require Time::Local; 1422 require Time::Local;
1135 return Time::Local::timegm ($S, $M, $H, $d, $_, $y); 1423 return eval { Time::Local::timegm ($S, $M, $H, $d, $_, $y) };
1136 } 1424 }
1137 } 1425 }
1138 1426
1139 undef 1427 undef
1140} 1428}
1141 1429
1142sub set_proxy($) { 1430sub set_proxy($) {
1143 if (length $_[0]) { 1431 if (length $_[0]) {
1144 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 1432 $_[0] =~ m%^(http):// ([^:/]+) (?: : (\d*) )?%ix
1145 or Carp::croak "$_[0]: invalid proxy URL"; 1433 or Carp::croak "$_[0]: invalid proxy URL";
1146 $PROXY = [$2, $3 || 3128, $1] 1434 $PROXY = [$2, $3 || 3128, $1]
1147 } else { 1435 } else {
1148 undef $PROXY; 1436 undef $PROXY;
1149 } 1437 }
1152# initialise proxy from environment 1440# initialise proxy from environment
1153eval { 1441eval {
1154 set_proxy $ENV{http_proxy}; 1442 set_proxy $ENV{http_proxy};
1155}; 1443};
1156 1444
1445=head2 SHOWCASE
1446
1447This section contains some more elaborate "real-world" examples or code
1448snippets.
1449
1450=head2 HTTP/1.1 FILE DOWNLOAD
1451
1452Downloading files with HTTP can be quite tricky, especially when something
1453goes wrong and you want to resume.
1454
1455Here is a function that initiates and resumes a download. It uses the
1456last modified time to check for file content changes, and works with many
1457HTTP/1.0 servers as well, and usually falls back to a complete re-download
1458on older servers.
1459
1460It calls the completion callback with either C<undef>, which means a
1461nonretryable error occurred, C<0> when the download was partial and should
1462be retried, and C<1> if it was successful.
1463
1464 use AnyEvent::HTTP;
1465
1466 sub download($$$) {
1467 my ($url, $file, $cb) = @_;
1468
1469 open my $fh, "+<", $file
1470 or die "$file: $!";
1471
1472 my %hdr;
1473 my $ofs = 0;
1474
1475 if (stat $fh and -s _) {
1476 $ofs = -s _;
1477 warn "-s is ", $ofs;
1478 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1479 $hdr{"range"} = "bytes=$ofs-";
1480 }
1481
1482 http_get $url,
1483 headers => \%hdr,
1484 on_header => sub {
1485 my ($hdr) = @_;
1486
1487 if ($hdr->{Status} == 200 && $ofs) {
1488 # resume failed
1489 truncate $fh, $ofs = 0;
1490 }
1491
1492 sysseek $fh, $ofs, 0;
1493
1494 1
1495 },
1496 on_body => sub {
1497 my ($data, $hdr) = @_;
1498
1499 if ($hdr->{Status} =~ /^2/) {
1500 length $data == syswrite $fh, $data
1501 or return; # abort on write errors
1502 }
1503
1504 1
1505 },
1506 sub {
1507 my (undef, $hdr) = @_;
1508
1509 my $status = $hdr->{Status};
1510
1511 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1512 utime $time, $time, $fh;
1513 }
1514
1515 if ($status == 200 || $status == 206 || $status == 416) {
1516 # download ok || resume ok || file already fully downloaded
1517 $cb->(1, $hdr);
1518
1519 } elsif ($status == 412) {
1520 # file has changed while resuming, delete and retry
1521 unlink $file;
1522 $cb->(0, $hdr);
1523
1524 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
1525 # retry later
1526 $cb->(0, $hdr);
1527
1528 } else {
1529 $cb->(undef, $hdr);
1530 }
1531 }
1532 ;
1533 }
1534
1535 download "http://server/somelargefile", "/tmp/somelargefile", sub {
1536 if ($_[0]) {
1537 print "OK!\n";
1538 } elsif (defined $_[0]) {
1539 print "please retry later\n";
1540 } else {
1541 print "ERROR\n";
1542 }
1543 };
1544
1157=head2 SOCKS PROXIES 1545=head3 SOCKS PROXIES
1158 1546
1159Socks proxies are not directly supported by AnyEvent::HTTP. You can 1547Socks proxies are not directly supported by AnyEvent::HTTP. You can
1160compile your perl to support socks, or use an external program such as 1548compile your perl to support socks, or use an external program such as
1161F<socksify> (dante) or F<tsocks> to make your program use a socks proxy 1549F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
1162transparently. 1550transparently.

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines