ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.89 by root, Mon Jan 3 00:23:25 2011 UTC vs.
Revision 1.95 by root, Wed Jan 12 03:30:05 2011 UTC

15This module is an L<AnyEvent> user, you need to make sure that you use and 15This module is an L<AnyEvent> user, you need to make sure that you use and
16run a supported event loop. 16run a supported event loop.
17 17
18This module implements a simple, stateless and non-blocking HTTP 18This module implements a simple, stateless and non-blocking HTTP
19client. It supports GET, POST and other request methods, cookies and more, 19client. It supports GET, POST and other request methods, cookies and more,
20all on a very low level. It can follow redirects supports proxies and 20all on a very low level. It can follow redirects, supports proxies, and
21automatically limits the number of connections to the values specified in 21automatically limits the number of connections to the values specified in
22the RFC. 22the RFC.
23 23
24It should generally be a "good client" that is enough for most HTTP 24It should generally be a "good client" that is enough for most HTTP
25tasks. Simple tasks should be simple, but complex tasks should still be 25tasks. Simple tasks should be simple, but complex tasks should still be
46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '1.5'; 51our $VERSION = '2.02';
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
57our $MAX_PERSISTENT = 8;
58our $PERSISTENT_TIMEOUT = 2; 57our $PERSISTENT_TIMEOUT = 3;
59our $TIMEOUT = 300; 58our $TIMEOUT = 300;
60 59our $MAX_PER_HOST = 4; # changing this is evil
61# changing these is evil
62our $MAX_PERSISTENT_PER_HOST = 2;
63our $MAX_PER_HOST = 4;
64 60
65our $PROXY; 61our $PROXY;
66our $ACTIVE = 0; 62our $ACTIVE = 0;
67 63
68my %KA_COUNT; # number of open keep-alive connections per host 64my %KA_CACHE; # indexed by uhost currently, points to [$handle...] array
69my %CO_SLOT; # number of open connections, and wait queue, per host 65my %CO_SLOT; # number of open connections, and wait queue, per host
70 66
71=item http_get $url, key => value..., $cb->($data, $headers) 67=item http_get $url, key => value..., $cb->($data, $headers)
72 68
73Executes an HTTP-GET request. See the http_request function for details on 69Executes an HTTP-GET request. See the http_request function for details on
169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
170will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 166will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
171(this can be suppressed by using C<undef> for these headers in which case 167(this can be suppressed by using C<undef> for these headers in which case
172they won't be sent at all). 168they won't be sent at all).
173 169
170You really should provide your own C<User-Agent:> header value that is
171appropriate for your program - I wouldn't be surprised if the default
172AnyEvent string gets blocked by webservers sooner or later.
173
174=item timeout => $seconds 174=item timeout => $seconds
175 175
176The time-out to use for various stages - each connect attempt will reset 176The time-out to use for various stages - each connect attempt will reset
177the timeout, as will read or write activity, i.e. this is not an overall 177the timeout, as will read or write activity, i.e. this is not an overall
178timeout. 178timeout.
182=item proxy => [$host, $port[, $scheme]] or undef 182=item proxy => [$host, $port[, $scheme]] or undef
183 183
184Use the given http proxy for all requests. If not specified, then the 184Use the given http proxy for all requests. If not specified, then the
185default proxy (as specified by C<$ENV{http_proxy}>) is used. 185default proxy (as specified by C<$ENV{http_proxy}>) is used.
186 186
187C<$scheme> must be either missing, C<http> for HTTP or C<https> for 187C<$scheme> must be either missing or must be C<http> for HTTP.
188HTTPS.
189 188
190=item body => $string 189=item body => $string
191 190
192The request body, usually empty. Will be sent as-is (future versions of 191The request body, usually empty. Will be sent as-is (future versions of
193this module might offer more options). 192this module might offer more options).
223verification, highest compatibility) and high-security (CA and common-name 222verification, highest compatibility) and high-security (CA and common-name
224verification) TLS context. 223verification) TLS context.
225 224
226The default for this option is C<low>, which could be interpreted as "give 225The default for this option is C<low>, which could be interpreted as "give
227me the page, no matter what". 226me the page, no matter what".
227
228See also the C<sessionid> parameter.
229
230=item session => $string
231
232The module might reuse connections to the same host internally. Sometimes
233(e.g. when using TLS), you do not want to reuse connections from other
234sessions. This can be achieved by setting this parameter to some unique
235ID (such as the address of an object storing your state data, or the TLS
236context) - only connections using the same unique ID will be reused.
228 237
229=item on_prepare => $callback->($fh) 238=item on_prepare => $callback->($fh)
230 239
231In rare cases you need to "tune" the socket before it is used to 240In rare cases you need to "tune" the socket before it is used to
232connect (for exmaple, to bind it on a given IP address). This parameter 241connect (for exmaple, to bind it on a given IP address). This parameter
302called. Instead of the C<$body> argument containing the body data, the 311called. Instead of the C<$body> argument containing the body data, the
303callback will receive the L<AnyEvent::Handle> object associated with the 312callback will receive the L<AnyEvent::Handle> object associated with the
304connection. In error cases, C<undef> will be passed. When there is no body 313connection. In error cases, C<undef> will be passed. When there is no body
305(e.g. status C<304>), the empty string will be passed. 314(e.g. status C<304>), the empty string will be passed.
306 315
307The handle object might or might not be in TLS mode, might be connected to 316The handle object might or might not be in TLS mode, might be connected
308a proxy, be a persistent connection etc., and configured in unspecified 317to a proxy, be a persistent connection, use chunked transfer encoding
309ways. The user is responsible for this handle (it will not be used by this 318etc., and configured in unspecified ways. The user is responsible for this
310module anymore). 319handle (it will not be used by this module anymore).
311 320
312This is useful with some push-type services, where, after the initial 321This is useful with some push-type services, where, after the initial
313headers, an interactive protocol is used (typical example would be the 322headers, an interactive protocol is used (typical example would be the
314push-style twitter API which starts a JSON/XML stream). 323push-style twitter API which starts a JSON/XML stream).
315 324
316If you think you need this, first have a look at C<on_body>, to see if 325If you think you need this, first have a look at C<on_body>, to see if
317that doesn't solve your problem in a better way. 326that doesn't solve your problem in a better way.
327
328=item persistent => $boolean
329
330Try to create/reuse a persistent connection. When this flag is set
331(default: true for idempotent requests, false for all others), then
332C<http_request> tries to re-use an existing (previously-created)
333persistent connection to the host and, failing that, tries to create a new
334one.
335
336Requests failing in certain ways will be automatically retried once, which
337is dangerous for non-idempotent requests, which is why it defaults to off
338for them. The reason for this is because the bozos who designed HTTP/1.1
339made it impossible to distinguish between a fatal error and a normal
340connection timeout, so you never know whether there was a problem with
341your request or not.
342
343When reusing an existent connection, many parameters (such as TLS context)
344will be ignored. See the C<session> parameter for a workaround.
345
346=item keepalive => $boolean
347
348Only used when C<persistent> is also true. This parameter decides whether
349C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
350(as opposed to only a HTTP/1.1 persistent connection).
351
352The default is true, except when using a proxy, in which case it defaults
353to false, as HTTP/1.0 proxies cannot support this in a meaningful way.
354
355=item handle_params => { key => value ... }
356
357The key-value pairs in this hash will be passed to any L<AnyEvent::Handle>
358constructor that is called - not all requests will create a handle, and
359sometimes more than one is created, so this parameter is only good for
360setting hints.
361
362Example: set the maximum read size to 4096, to potentially conserve memory
363at the cost of speed.
364
365 handle_params => {
366 max_read_size => 4096,
367 },
318 368
319=back 369=back
320 370
321Example: do a simple HTTP GET request for http://www.nethype.de/ and print 371Example: do a simple HTTP GET request for http://www.nethype.de/ and print
322the response body. 372the response body.
329Example: do a HTTP HEAD request on https://www.google.com/, use a 379Example: do a HTTP HEAD request on https://www.google.com/, use a
330timeout of 30 seconds. 380timeout of 30 seconds.
331 381
332 http_request 382 http_request
333 GET => "https://www.google.com", 383 GET => "https://www.google.com",
384 headers => { "user-agent" => "MySearchClient 1.0" },
334 timeout => 30, 385 timeout => 30,
335 sub { 386 sub {
336 my ($body, $hdr) = @_; 387 my ($body, $hdr) = @_;
337 use Data::Dumper; 388 use Data::Dumper;
338 print Dumper $hdr; 389 print Dumper $hdr;
348 }; 399 };
349 400
350 undef $request; 401 undef $request;
351 402
352=cut 403=cut
404
405#############################################################################
406# wait queue/slots
353 407
354sub _slot_schedule; 408sub _slot_schedule;
355sub _slot_schedule($) { 409sub _slot_schedule($) {
356 my $host = shift; 410 my $host = shift;
357 411
380 434
381 _slot_schedule $_[0]; 435 _slot_schedule $_[0];
382} 436}
383 437
384############################################################################# 438#############################################################################
439# cookie handling
385 440
386# expire cookies 441# expire cookies
387sub cookie_jar_expire($;$) { 442sub cookie_jar_expire($;$) {
388 my ($jar, $session_end) = @_; 443 my ($jar, $session_end) = @_;
389 444
413 } 468 }
414} 469}
415 470
416# extract cookies from jar 471# extract cookies from jar
417sub cookie_jar_extract($$$$) { 472sub cookie_jar_extract($$$$) {
418 my ($jar, $uscheme, $uhost, $upath) = @_; 473 my ($jar, $scheme, $host, $path) = @_;
419 474
420 %$jar = () if $jar->{version} != 1; 475 %$jar = () if $jar->{version} != 1;
421 476
422 my @cookies; 477 my @cookies;
423 478
424 while (my ($chost, $paths) = each %$jar) { 479 while (my ($chost, $paths) = each %$jar) {
425 next unless ref $paths; 480 next unless ref $paths;
426 481
427 if ($chost =~ /^\./) { 482 if ($chost =~ /^\./) {
428 next unless $chost eq substr $uhost, -length $chost; 483 next unless $chost eq substr $host, -length $chost;
429 } elsif ($chost =~ /\./) { 484 } elsif ($chost =~ /\./) {
430 next unless $chost eq $uhost; 485 next unless $chost eq $host;
431 } else { 486 } else {
432 next; 487 next;
433 } 488 }
434 489
435 while (my ($cpath, $cookies) = each %$paths) { 490 while (my ($cpath, $cookies) = each %$paths) {
436 next unless $cpath eq substr $upath, 0, length $cpath; 491 next unless $cpath eq substr $path, 0, length $cpath;
437 492
438 while (my ($cookie, $kv) = each %$cookies) { 493 while (my ($cookie, $kv) = each %$cookies) {
439 next if $uscheme ne "https" && exists $kv->{secure}; 494 next if $scheme ne "https" && exists $kv->{secure};
440 495
441 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) { 496 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
442 delete $cookies->{$cookie}; 497 delete $cookies->{$cookie};
443 next; 498 next;
444 } 499 }
458 \@cookies 513 \@cookies
459} 514}
460 515
461# parse set_cookie header into jar 516# parse set_cookie header into jar
462sub cookie_jar_set_cookie($$$$) { 517sub cookie_jar_set_cookie($$$$) {
463 my ($jar, $set_cookie, $uhost, $date) = @_; 518 my ($jar, $set_cookie, $host, $date) = @_;
464 519
465 my $anow = int AE::now; 520 my $anow = int AE::now;
466 my $snow; # server-now 521 my $snow; # server-now
467 522
468 for ($set_cookie) { 523 for ($set_cookie) {
491 # quoted 546 # quoted
492 $value = $3; 547 $value = $3;
493 $value =~ s/\\(.)/$1/gs; 548 $value =~ s/\\(.)/$1/gs;
494 } 549 }
495 550
496 push @kv, lc $name, $value; 551 push @kv, @kv ? lc $name : $name, $value;
497 552
498 last unless /\G\s*;/gc; 553 last unless /\G\s*;/gc;
499 } 554 }
500 555
501 last unless @kv; 556 last unless @kv;
524 579
525 # this is not rfc-like and not netscape-like. go figure. 580 # this is not rfc-like and not netscape-like. go figure.
526 my $ndots = $cdom =~ y/.//; 581 my $ndots = $cdom =~ y/.//;
527 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 582 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
528 } else { 583 } else {
529 $cdom = $uhost; 584 $cdom = $host;
530 } 585 }
531 586
532 # store it 587 # store it
533 $jar->{version} = 1; 588 $jar->{version} = 1;
534 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 589 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
535 590
536 redo if /\G\s*,/gc; 591 redo if /\G\s*,/gc;
537 } 592 }
538} 593}
539 594
595#############################################################################
596# keepalive/persistent connection cache
597
598# fetch a connection from the keepalive cache
599sub ka_fetch($) {
600 my $ka_key = shift;
601
602 my $hdl = pop @{ $KA_CACHE{$ka_key} }; # currently we reuse the MOST RECENTLY USED connection
603 delete $KA_CACHE{$ka_key}
604 unless @{ $KA_CACHE{$ka_key} };
605
606 $hdl
607}
608
609sub ka_store($$) {
610 my ($ka_key, $hdl) = @_;
611
612 my $kaa = $KA_CACHE{$ka_key} ||= [];
613
614 my $destroy = sub {
615 my @ka = grep $_ != $hdl, @{ $KA_CACHE{$ka_key} };
616
617 $hdl->destroy;
618
619 @ka
620 ? $KA_CACHE{$ka_key} = \@ka
621 : delete $KA_CACHE{$ka_key};
622 };
623
624 # on error etc., destroy
625 $hdl->on_error ($destroy);
626 $hdl->on_eof ($destroy);
627 $hdl->on_read ($destroy);
628 $hdl->timeout ($PERSISTENT_TIMEOUT);
629
630 push @$kaa, $hdl;
631 shift @$kaa while @$kaa > $MAX_PER_HOST;
632}
633
634#############################################################################
635# utilities
636
540# continue to parse $_ for headers and place them into the arg 637# continue to parse $_ for headers and place them into the arg
541sub parse_hdr() { 638sub _parse_hdr() {
542 my %hdr; 639 my %hdr;
543 640
544 # things seen, not parsed: 641 # things seen, not parsed:
545 # p3pP="NON CUR OTPi OUR NOR UNI" 642 # p3pP="NON CUR OTPi OUR NOR UNI"
546 643
560 for values %hdr; 657 for values %hdr;
561 658
562 \%hdr 659 \%hdr
563} 660}
564 661
662#############################################################################
663# http_get
664
565our $qr_nlnl = qr{(?<![^\012])\015?\012}; 665our $qr_nlnl = qr{(?<![^\012])\015?\012};
566 666
567our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 667our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
568our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 668our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
669
670# maybe it should just become a normal object :/
671
672sub _destroy_state(\%) {
673 my ($state) = @_;
674
675 $state->{handle}->destroy if $state->{handle};
676 %$state = ();
677}
678
679sub _error(\%$$) {
680 my ($state, $cb, $hdr) = @_;
681
682 &_destroy_state ($state);
683
684 $cb->(undef, $hdr);
685 ()
686}
569 687
570sub http_request($$@) { 688sub http_request($$@) {
571 my $cb = pop; 689 my $cb = pop;
572 my ($method, $url, %arg) = @_; 690 my ($method, $url, %arg) = @_;
573 691
594 if $recurse < 0; 712 if $recurse < 0;
595 713
596 my $proxy = $arg{proxy} || $PROXY; 714 my $proxy = $arg{proxy} || $PROXY;
597 my $timeout = $arg{timeout} || $TIMEOUT; 715 my $timeout = $arg{timeout} || $TIMEOUT;
598 716
599 my ($uscheme, $uauthority, $upath, $query, $fragment) = 717 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment
600 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|; 718 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|;
601 719
602 $uscheme = lc $uscheme; 720 $uscheme = lc $uscheme;
603 721
604 my $uport = $uscheme eq "http" ? 80 722 my $uport = $uscheme eq "http" ? 80
652 if length $arg{body} || $method ne "GET"; 770 if length $arg{body} || $method ne "GET";
653 771
654 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 772 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
655 773
656 # default value for keepalive is true iff the request is for an idempotent method 774 # default value for keepalive is true iff the request is for an idempotent method
657 my $keepalive = exists $arg{keepalive} 775 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : $idempotent;
658 ? $arg{keepalive}*1 776 my $keepalive10 = exists $arg{keepalive10} ? $arg{keepalive10} : !$proxy;
659 : $idempotent ? $PERSISTENT_TIMEOUT : 0; 777 my $keptalive; # true if this is actually a recycled connection
660 778
779 # the key to use in the keepalive cache
780 my $ka_key = "$uhost\x00$arg{sessionid}";
781
661 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1 782 $hdr{connection} = ($keepalive ? $keepalive10 ? "keep-alive " : "" : "close ") . "Te"; #1.1
662 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 783 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
663 784
664 my %state = (connect_guard => 1); 785 my %state = (connect_guard => 1);
665 786
666 my $ae_error = 595; # connecting 787 my $ae_error = 595; # connecting
667 788
668 # handle actual, non-tunneled, request 789 # handle actual, non-tunneled, request
669 my $handle_actual_request = sub { 790 my $handle_actual_request = sub {
670 $ae_error = 596; # request phase 791 $ae_error = 596; # request phase
671 792
793 my $hdl = $state{handle};
794
672 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 795 $hdl->starttls ("connect") if $uscheme eq "https" && !exists $hdl->{tls};
673 796
674 # send request 797 # send request
675 $state{handle}->push_write ( 798 $hdl->push_write (
676 "$method $rpath HTTP/1.1\015\012" 799 "$method $rpath HTTP/1.1\015\012"
677 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 800 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
678 . "\015\012" 801 . "\015\012"
679 . (delete $arg{body}) 802 . (delete $arg{body})
680 ); 803 );
685 # reduce memory usage, save a kitten, also re-use it for the response headers. 808 # reduce memory usage, save a kitten, also re-use it for the response headers.
686 %hdr = (); 809 %hdr = ();
687 810
688 # status line and headers 811 # status line and headers
689 $state{read_response} = sub { 812 $state{read_response} = sub {
813 return unless %state;
814
690 for ("$_[1]") { 815 for ("$_[1]") {
691 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 816 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
692 817
693 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci 818 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
694 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" })); 819 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid server response" };
695 820
696 # 100 Continue handling 821 # 100 Continue handling
697 # should not happen as we don't send expect: 100-continue, 822 # should not happen as we don't send expect: 100-continue,
698 # but we handle it just in case. 823 # but we handle it just in case.
699 # since we send the request body regardless, if we get an error 824 # since we send the request body regardless, if we get an error
705 HTTPVersion => $1, 830 HTTPVersion => $1,
706 Status => $2, 831 Status => $2,
707 Reason => $3, 832 Reason => $3,
708 ; 833 ;
709 834
710 my $hdr = parse_hdr 835 my $hdr = _parse_hdr
711 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" })); 836 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Garbled response headers" };
712 837
713 %hdr = (%$hdr, @pseudo); 838 %hdr = (%$hdr, @pseudo);
714 } 839 }
715 840
716 # redirect handling 841 # redirect handling
747 $redirect = 1; 872 $redirect = 1;
748 } 873 }
749 } 874 }
750 875
751 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 876 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
752 my $may_keep_alive = $_[3]; 877 if ($state{handle}) {
753 878 # handle keepalive
879 if (
880 $keepalive
881 && $_[3]
882 && ($hdr{HTTPVersion} < 1.1
883 ? $hdr{connection} =~ /\bkeep-?alive\b/i
884 : $hdr{connection} !~ /\bclose\b/i)
885 ) {
886 ka_store $ka_key, delete $state{handle};
887 } else {
888 # no keepalive, destroy the handle
754 $state{handle}->destroy if $state{handle}; 889 $state{handle}->destroy;
890 }
891 }
892
755 %state = (); 893 %state = ();
756 894
757 if (defined $_[1]) { 895 if (defined $_[1]) {
758 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1]; 896 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
759 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2]; 897 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
771 http_request ( 909 http_request (
772 $method => $hdr{location}, 910 $method => $hdr{location},
773 %arg, 911 %arg,
774 recurse => $recurse - 1, 912 recurse => $recurse - 1,
775 Redirect => [$_[0], \%hdr], 913 Redirect => [$_[0], \%hdr],
776 $cb); 914 $cb
915 );
777 } else { 916 } else {
778 $cb->($_[0], \%hdr); 917 $cb->($_[0], \%hdr);
779 } 918 }
780 }; 919 };
781 920
782 $ae_error = 597; # body phase 921 $ae_error = 597; # body phase
783 922
923 my $chunked = $hdr{"transfer-encoding"} =~ /\bchunked\b/i; # not quite correct...
924
784 my $len = $hdr{"content-length"}; 925 my $len = $chunked ? undef : $hdr{"content-length"};
785 926
786 # body handling, many different code paths 927 # body handling, many different code paths
787 # - no body expected 928 # - no body expected
788 # - want_body_handle 929 # - want_body_handle
789 # - te chunked 930 # - te chunked
804 $_[0]->on_error (undef); 945 $_[0]->on_error (undef);
805 $_[0]->on_read (undef); 946 $_[0]->on_read (undef);
806 947
807 $finish->(delete $state{handle}); 948 $finish->(delete $state{handle});
808 949
809 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) { 950 } elsif ($chunked) {
810 my $cl = 0; 951 my $cl = 0;
811 my $body = undef; 952 my $body = "";
812 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 953 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
813 954
814 $state{read_chunk} = sub { 955 $state{read_chunk} = sub {
815 $_[1] =~ /^([0-9a-fA-F]+)/ 956 $_[1] =~ /^([0-9a-fA-F]+)/
816 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 957 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
836 $_[0]->push_read (line => $qr_nlnl, sub { 977 $_[0]->push_read (line => $qr_nlnl, sub {
837 if (length $_[1]) { 978 if (length $_[1]) {
838 for ("$_[1]") { 979 for ("$_[1]") {
839 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 980 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
840 981
841 my $hdr = parse_hdr 982 my $hdr = _parse_hdr
842 or return $finish->(undef, $ae_error => "Garbled response trailers"); 983 or return $finish->(undef, $ae_error => "Garbled response trailers");
843 984
844 %hdr = (%hdr, %$hdr); 985 %hdr = (%hdr, %$hdr);
845 } 986 }
846 } 987 }
889 $_[0]->on_read (sub { }); 1030 $_[0]->on_read (sub { });
890 } 1031 }
891 } 1032 }
892 }; 1033 };
893 1034
1035 # if keepalive is enabled, then the server closing the connection
1036 # before a response can happen legally - we retry on idempotent methods.
1037 if ($keptalive && $idempotent) {
1038 my $old_eof = $hdl->{on_eof};
1039 $hdl->{on_eof} = sub {
1040 _destroy_state %state;
1041
1042 http_request (
1043 $method => $url,
1044 %arg,
1045 keepalive => 0,
1046 $cb
1047 );
1048 };
1049 $hdl->on_read (sub {
1050 return unless %state;
1051
1052 # as soon as we receive something, a connection close
1053 # once more becomes a hard error
1054 $hdl->{on_eof} = $old_eof;
1055 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1056 });
1057 } else {
894 $state{handle}->push_read (line => $qr_nlnl, $state{read_response}); 1058 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1059 }
895 }; 1060 };
896 1061
1062 my $prepare_handle = sub {
1063 my ($hdl) = $state{handle};
1064
1065 $hdl->timeout ($timeout);
1066 $hdl->on_error (sub {
1067 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1068 });
1069 $hdl->on_eof (sub {
1070 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1071 });
1072 };
1073
1074 # connected to proxy (or origin server)
897 my $connect_cb = sub { 1075 my $connect_cb = sub {
898 $state{fh} = shift 1076 my $fh = shift
899 or do {
900 my $err = "$!";
901 %state = ();
902 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err }); 1077 or return _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "$!" };
903 };
904 1078
905 return unless delete $state{connect_guard}; 1079 return unless delete $state{connect_guard};
906 1080
907 # get handle 1081 # get handle
908 $state{handle} = new AnyEvent::Handle 1082 $state{handle} = new AnyEvent::Handle
1083 %{ $arg{handle_params} },
909 fh => $state{fh}, 1084 fh => $fh,
910 peername => $rhost, 1085 peername => $uhost,
911 tls_ctx => $arg{tls_ctx}, 1086 tls_ctx => $arg{tls_ctx},
912 # these need to be reconfigured on keepalive handles
913 timeout => $timeout,
914 on_error => sub {
915 %state = ();
916 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
917 },
918 on_eof => sub {
919 %state = ();
920 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
921 },
922 ; 1087 ;
923 1088
924 # limit the number of persistent connections 1089 $prepare_handle->();
925 # keepalive not yet supported
926# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
927# ++$KA_COUNT{$_[1]};
928# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
929# --$KA_COUNT{$_[1]}
930# };
931# $hdr{connection} = "keep-alive";
932# }
933 1090
934 $state{handle}->starttls ("connect") if $rscheme eq "https"; 1091 #$state{handle}->starttls ("connect") if $rscheme eq "https";
935 1092
936 # now handle proxy-CONNECT method 1093 # now handle proxy-CONNECT method
937 if ($proxy && $uscheme eq "https") { 1094 if ($proxy && $uscheme eq "https") {
938 # oh dear, we have to wrap it into a connect request 1095 # oh dear, we have to wrap it into a connect request
939 1096
940 # maybe re-use $uauthority with patched port? 1097 # maybe re-use $uauthority with patched port?
941 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1098 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
942 $state{handle}->push_read (line => $qr_nlnl, sub { 1099 $state{handle}->push_read (line => $qr_nlnl, sub {
943 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1100 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
944 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 1101 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
945 1102
946 if ($2 == 200) { 1103 if ($2 == 200) {
947 $rpath = $upath; 1104 $rpath = $upath;
948 $handle_actual_request->(); 1105 $handle_actual_request->();
949 } else { 1106 } else {
950 %state = ();
951 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 1107 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
952 } 1108 }
953 }); 1109 });
954 } else { 1110 } else {
955 $handle_actual_request->(); 1111 $handle_actual_request->();
956 } 1112 }
959 _get_slot $uhost, sub { 1115 _get_slot $uhost, sub {
960 $state{slot_guard} = shift; 1116 $state{slot_guard} = shift;
961 1117
962 return unless $state{connect_guard}; 1118 return unless $state{connect_guard};
963 1119
1120 # try to use an existing keepalive connection, but only if we, ourselves, plan
1121 # on a keepalive request (in theory, this should be a separate config option).
1122 if ($keepalive && $KA_CACHE{$ka_key}) {
1123 $keptalive = 1;
1124 $state{handle} = ka_fetch $ka_key;
1125 $prepare_handle->();
1126 $handle_actual_request->();
1127
1128 } else {
964 my $tcp_connect = $arg{tcp_connect} 1129 my $tcp_connect = $arg{tcp_connect}
965 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1130 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
966 1131
967 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 1132 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
1133 }
968 }; 1134 };
969 1135
970 defined wantarray && AnyEvent::Util::guard { %state = () } 1136 defined wantarray && AnyEvent::Util::guard { _destroy_state %state }
971} 1137}
972 1138
973sub http_get($@) { 1139sub http_get($@) {
974 unshift @_, "GET"; 1140 unshift @_, "GET";
975 &http_request 1141 &http_request
993AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for 1159AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
994the actual connection, which in turn uses AnyEvent::DNS to resolve 1160the actual connection, which in turn uses AnyEvent::DNS to resolve
995hostnames. The latter is a simple stub resolver and does no caching 1161hostnames. The latter is a simple stub resolver and does no caching
996on its own. If you want DNS caching, you currently have to provide 1162on its own. If you want DNS caching, you currently have to provide
997your own default resolver (by storing a suitable resolver object in 1163your own default resolver (by storing a suitable resolver object in
998C<$AnyEvent::DNS::RESOLVER>). 1164C<$AnyEvent::DNS::RESOLVER>) or your own C<tcp_connect> callback.
999 1165
1000=head2 GLOBAL FUNCTIONS AND VARIABLES 1166=head2 GLOBAL FUNCTIONS AND VARIABLES
1001 1167
1002=over 4 1168=over 4
1003 1169
1004=item AnyEvent::HTTP::set_proxy "proxy-url" 1170=item AnyEvent::HTTP::set_proxy "proxy-url"
1005 1171
1006Sets the default proxy server to use. The proxy-url must begin with a 1172Sets the default proxy server to use. The proxy-url must begin with a
1007string of the form C<http://host:port> (optionally C<https:...>), croaks 1173string of the form C<http://host:port>, croaks otherwise.
1008otherwise.
1009 1174
1010To clear an already-set proxy, use C<undef>. 1175To clear an already-set proxy, use C<undef>.
1011 1176
1012=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1177=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1013 1178
1061 1226
1062=item $AnyEvent::HTTP::MAX_RECURSE 1227=item $AnyEvent::HTTP::MAX_RECURSE
1063 1228
1064The default value for the C<recurse> request parameter (default: C<10>). 1229The default value for the C<recurse> request parameter (default: C<10>).
1065 1230
1231=item $AnyEvent::HTTP::TIMEOUT
1232
1233The default timeout for conenction operations (default: C<300>).
1234
1066=item $AnyEvent::HTTP::USERAGENT 1235=item $AnyEvent::HTTP::USERAGENT
1067 1236
1068The default value for the C<User-Agent> header (the default is 1237The default value for the C<User-Agent> header (the default is
1069C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1238C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1070 1239
1071=item $AnyEvent::HTTP::MAX_PER_HOST 1240=item $AnyEvent::HTTP::MAX_PER_HOST
1072 1241
1073The maximum number of concurrent connections to the same host (identified 1242The maximum number of concurrent connections to the same host (identified
1074by the hostname). If the limit is exceeded, then the additional requests 1243by the hostname). If the limit is exceeded, then the additional requests
1075are queued until previous connections are closed. 1244are queued until previous connections are closed. Both persistent and
1245non-persistent connections are counted in this limit.
1076 1246
1077The default value for this is C<4>, and it is highly advisable to not 1247The default value for this is C<4>, and it is highly advisable to not
1078increase it. 1248increase it much.
1249
1250For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1251connections, older browsers used 2, newers (such as firefox 3) typically
1252use 6, and Opera uses 8 because like, they have the fastest browser and
1253give a shit for everybody else on the planet.
1254
1255=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1256
1257The time after which idle persistent conenctions get closed by
1258AnyEvent::HTTP (default: C<3>).
1079 1259
1080=item $AnyEvent::HTTP::ACTIVE 1260=item $AnyEvent::HTTP::ACTIVE
1081 1261
1082The number of active connections. This is not the number of currently 1262The number of active connections. This is not the number of currently
1083running requests, but the number of currently open and non-idle TCP 1263running requests, but the number of currently open and non-idle TCP
1084connections. This number of can be useful for load-leveling. 1264connections. This number can be useful for load-leveling.
1085 1265
1086=back 1266=back
1087 1267
1088=cut 1268=cut
1089 1269
1132 undef 1312 undef
1133} 1313}
1134 1314
1135sub set_proxy($) { 1315sub set_proxy($) {
1136 if (length $_[0]) { 1316 if (length $_[0]) {
1137 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 1317 $_[0] =~ m%^(http):// ([^:/]+) (?: : (\d*) )?%ix
1138 or Carp::croak "$_[0]: invalid proxy URL"; 1318 or Carp::croak "$_[0]: invalid proxy URL";
1139 $PROXY = [$2, $3 || 3128, $1] 1319 $PROXY = [$2, $3 || 3128, $1]
1140 } else { 1320 } else {
1141 undef $PROXY; 1321 undef $PROXY;
1142 } 1322 }
1145# initialise proxy from environment 1325# initialise proxy from environment
1146eval { 1326eval {
1147 set_proxy $ENV{http_proxy}; 1327 set_proxy $ENV{http_proxy};
1148}; 1328};
1149 1329
1330=head2 SHOWCASE
1331
1332This section contaisn some more elaborate "real-world" examples or code
1333snippets.
1334
1335=head2 HTTP/1.1 FILE DOWNLOAD
1336
1337Downloading files with HTTP cna be quite tricky, especially when something
1338goes wrong and you want tor esume.
1339
1340Here is a function that initiates and resumes a download. It uses the
1341last modified time to check for file content changes, and works with many
1342HTTP/1.0 servers as well, and usually falls back to a complete re-download
1343on older servers.
1344
1345It calls the completion callback with either C<undef>, which means a
1346nonretryable error occured, C<0> when the download was partial and should
1347be retried, and C<1> if it was successful.
1348
1349 use AnyEvent::HTTP;
1350
1351 sub download($$$) {
1352 my ($url, $file, $cb) = @_;
1353
1354 open my $fh, "+<", $file
1355 or die "$file: $!";
1356
1357 my %hdr;
1358 my $ofs = 0;
1359
1360 warn stat $fh;
1361 warn -s _;
1362 if (stat $fh and -s _) {
1363 $ofs = -s _;
1364 warn "-s is ", $ofs;#d#
1365 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1366 $hdr{"range"} = "bytes=$ofs-";
1367 }
1368
1369 http_get $url,
1370 headers => \%hdr,
1371 on_header => sub {
1372 my ($hdr) = @_;
1373
1374 if ($hdr->{Status} == 200 && $ofs) {
1375 # resume failed
1376 truncate $fh, $ofs = 0;
1377 }
1378
1379 sysseek $fh, $ofs, 0;
1380
1381 1
1382 },
1383 on_body => sub {
1384 my ($data, $hdr) = @_;
1385
1386 if ($hdr->{Status} =~ /^2/) {
1387 length $data == syswrite $fh, $data
1388 or return; # abort on write errors
1389 }
1390
1391 1
1392 },
1393 sub {
1394 my (undef, $hdr) = @_;
1395
1396 my $status = $hdr->{Status};
1397
1398 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1399 utime $fh, $time, $time;
1400 }
1401
1402 if ($status == 200 || $status == 206 || $status == 416) {
1403 # download ok || resume ok || file already fully downloaded
1404 $cb->(1, $hdr);
1405
1406 } elsif ($status == 412) {
1407 # file has changed while resuming, delete and retry
1408 unlink $file;
1409 $cb->(0, $hdr);
1410
1411 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
1412 # retry later
1413 $cb->(0, $hdr);
1414
1415 } else {
1416 $cb->(undef, $hdr);
1417 }
1418 }
1419 ;
1420 }
1421
1422 download "http://server/somelargefile", "/tmp/somelargefile", sub {
1423 if ($_[0]) {
1424 print "OK!\n";
1425 } elsif (defined $_[0]) {
1426 print "please retry later\n";
1427 } else {
1428 print "ERROR\n";
1429 }
1430 };
1431
1150=head2 SOCKS PROXIES 1432=head3 SOCKS PROXIES
1151 1433
1152Socks proxies are not directly supported by AnyEvent::HTTP. You can 1434Socks proxies are not directly supported by AnyEvent::HTTP. You can
1153compile your perl to support socks, or use an external program such as 1435compile your perl to support socks, or use an external program such as
1154F<socksify> (dante) or F<tsocks> to make your program use a socks proxy 1436F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
1155transparently. 1437transparently.

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines