ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.88 by root, Sun Jan 2 20:57:03 2011 UTC vs.
Revision 1.97 by root, Tue Jan 18 17:50:20 2011 UTC

15This module is an L<AnyEvent> user, you need to make sure that you use and 15This module is an L<AnyEvent> user, you need to make sure that you use and
16run a supported event loop. 16run a supported event loop.
17 17
18This module implements a simple, stateless and non-blocking HTTP 18This module implements a simple, stateless and non-blocking HTTP
19client. It supports GET, POST and other request methods, cookies and more, 19client. It supports GET, POST and other request methods, cookies and more,
20all on a very low level. It can follow redirects supports proxies and 20all on a very low level. It can follow redirects, supports proxies, and
21automatically limits the number of connections to the values specified in 21automatically limits the number of connections to the values specified in
22the RFC. 22the RFC.
23 23
24It should generally be a "good client" that is enough for most HTTP 24It should generally be a "good client" that is enough for most HTTP
25tasks. Simple tasks should be simple, but complex tasks should still be 25tasks. Simple tasks should be simple, but complex tasks should still be
46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '1.5'; 51our $VERSION = '2.03';
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
57our $MAX_PERSISTENT = 8;
58our $PERSISTENT_TIMEOUT = 2; 57our $PERSISTENT_TIMEOUT = 3;
59our $TIMEOUT = 300; 58our $TIMEOUT = 300;
60 59our $MAX_PER_HOST = 4; # changing this is evil
61# changing these is evil
62our $MAX_PERSISTENT_PER_HOST = 2;
63our $MAX_PER_HOST = 4;
64 60
65our $PROXY; 61our $PROXY;
66our $ACTIVE = 0; 62our $ACTIVE = 0;
67 63
68my %KA_COUNT; # number of open keep-alive connections per host 64my %KA_CACHE; # indexed by uhost currently, points to [$handle...] array
69my %CO_SLOT; # number of open connections, and wait queue, per host 65my %CO_SLOT; # number of open connections, and wait queue, per host
70 66
71=item http_get $url, key => value..., $cb->($data, $headers) 67=item http_get $url, key => value..., $cb->($data, $headers)
72 68
73Executes an HTTP-GET request. See the http_request function for details on 69Executes an HTTP-GET request. See the http_request function for details on
169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
170will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 166will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
171(this can be suppressed by using C<undef> for these headers in which case 167(this can be suppressed by using C<undef> for these headers in which case
172they won't be sent at all). 168they won't be sent at all).
173 169
170You really should provide your own C<User-Agent:> header value that is
171appropriate for your program - I wouldn't be surprised if the default
172AnyEvent string gets blocked by webservers sooner or later.
173
174=item timeout => $seconds 174=item timeout => $seconds
175 175
176The time-out to use for various stages - each connect attempt will reset 176The time-out to use for various stages - each connect attempt will reset
177the timeout, as will read or write activity, i.e. this is not an overall 177the timeout, as will read or write activity, i.e. this is not an overall
178timeout. 178timeout.
182=item proxy => [$host, $port[, $scheme]] or undef 182=item proxy => [$host, $port[, $scheme]] or undef
183 183
184Use the given http proxy for all requests. If not specified, then the 184Use the given http proxy for all requests. If not specified, then the
185default proxy (as specified by C<$ENV{http_proxy}>) is used. 185default proxy (as specified by C<$ENV{http_proxy}>) is used.
186 186
187C<$scheme> must be either missing, C<http> for HTTP or C<https> for 187C<$scheme> must be either missing or must be C<http> for HTTP.
188HTTPS.
189 188
190=item body => $string 189=item body => $string
191 190
192The request body, usually empty. Will be sent as-is (future versions of 191The request body, usually empty. Will be sent as-is (future versions of
193this module might offer more options). 192this module might offer more options).
223verification, highest compatibility) and high-security (CA and common-name 222verification, highest compatibility) and high-security (CA and common-name
224verification) TLS context. 223verification) TLS context.
225 224
226The default for this option is C<low>, which could be interpreted as "give 225The default for this option is C<low>, which could be interpreted as "give
227me the page, no matter what". 226me the page, no matter what".
227
228See also the C<sessionid> parameter.
229
230=item session => $string
231
232The module might reuse connections to the same host internally. Sometimes
233(e.g. when using TLS), you do not want to reuse connections from other
234sessions. This can be achieved by setting this parameter to some unique
235ID (such as the address of an object storing your state data, or the TLS
236context) - only connections using the same unique ID will be reused.
228 237
229=item on_prepare => $callback->($fh) 238=item on_prepare => $callback->($fh)
230 239
231In rare cases you need to "tune" the socket before it is used to 240In rare cases you need to "tune" the socket before it is used to
232connect (for exmaple, to bind it on a given IP address). This parameter 241connect (for exmaple, to bind it on a given IP address). This parameter
302called. Instead of the C<$body> argument containing the body data, the 311called. Instead of the C<$body> argument containing the body data, the
303callback will receive the L<AnyEvent::Handle> object associated with the 312callback will receive the L<AnyEvent::Handle> object associated with the
304connection. In error cases, C<undef> will be passed. When there is no body 313connection. In error cases, C<undef> will be passed. When there is no body
305(e.g. status C<304>), the empty string will be passed. 314(e.g. status C<304>), the empty string will be passed.
306 315
307The handle object might or might not be in TLS mode, might be connected to 316The handle object might or might not be in TLS mode, might be connected
308a proxy, be a persistent connection etc., and configured in unspecified 317to a proxy, be a persistent connection, use chunked transfer encoding
309ways. The user is responsible for this handle (it will not be used by this 318etc., and configured in unspecified ways. The user is responsible for this
310module anymore). 319handle (it will not be used by this module anymore).
311 320
312This is useful with some push-type services, where, after the initial 321This is useful with some push-type services, where, after the initial
313headers, an interactive protocol is used (typical example would be the 322headers, an interactive protocol is used (typical example would be the
314push-style twitter API which starts a JSON/XML stream). 323push-style twitter API which starts a JSON/XML stream).
315 324
316If you think you need this, first have a look at C<on_body>, to see if 325If you think you need this, first have a look at C<on_body>, to see if
317that doesn't solve your problem in a better way. 326that doesn't solve your problem in a better way.
327
328=item persistent => $boolean
329
330Try to create/reuse a persistent connection. When this flag is set
331(default: true for idempotent requests, false for all others), then
332C<http_request> tries to re-use an existing (previously-created)
333persistent connection to the host and, failing that, tries to create a new
334one.
335
336Requests failing in certain ways will be automatically retried once, which
337is dangerous for non-idempotent requests, which is why it defaults to off
338for them. The reason for this is because the bozos who designed HTTP/1.1
339made it impossible to distinguish between a fatal error and a normal
340connection timeout, so you never know whether there was a problem with
341your request or not.
342
343When reusing an existent connection, many parameters (such as TLS context)
344will be ignored. See the C<session> parameter for a workaround.
345
346=item keepalive => $boolean
347
348Only used when C<persistent> is also true. This parameter decides whether
349C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
350(as opposed to only a HTTP/1.1 persistent connection).
351
352The default is true, except when using a proxy, in which case it defaults
353to false, as HTTP/1.0 proxies cannot support this in a meaningful way.
354
355=item handle_params => { key => value ... }
356
357The key-value pairs in this hash will be passed to any L<AnyEvent::Handle>
358constructor that is called - not all requests will create a handle, and
359sometimes more than one is created, so this parameter is only good for
360setting hints.
361
362Example: set the maximum read size to 4096, to potentially conserve memory
363at the cost of speed.
364
365 handle_params => {
366 max_read_size => 4096,
367 },
318 368
319=back 369=back
320 370
321Example: do a simple HTTP GET request for http://www.nethype.de/ and print 371Example: do a simple HTTP GET request for http://www.nethype.de/ and print
322the response body. 372the response body.
329Example: do a HTTP HEAD request on https://www.google.com/, use a 379Example: do a HTTP HEAD request on https://www.google.com/, use a
330timeout of 30 seconds. 380timeout of 30 seconds.
331 381
332 http_request 382 http_request
333 GET => "https://www.google.com", 383 GET => "https://www.google.com",
384 headers => { "user-agent" => "MySearchClient 1.0" },
334 timeout => 30, 385 timeout => 30,
335 sub { 386 sub {
336 my ($body, $hdr) = @_; 387 my ($body, $hdr) = @_;
337 use Data::Dumper; 388 use Data::Dumper;
338 print Dumper $hdr; 389 print Dumper $hdr;
348 }; 399 };
349 400
350 undef $request; 401 undef $request;
351 402
352=cut 403=cut
404
405#############################################################################
406# wait queue/slots
353 407
354sub _slot_schedule; 408sub _slot_schedule;
355sub _slot_schedule($) { 409sub _slot_schedule($) {
356 my $host = shift; 410 my $host = shift;
357 411
380 434
381 _slot_schedule $_[0]; 435 _slot_schedule $_[0];
382} 436}
383 437
384############################################################################# 438#############################################################################
439# cookie handling
385 440
386# expire cookies 441# expire cookies
387sub cookie_jar_expire($;$) { 442sub cookie_jar_expire($;$) {
388 my ($jar, $session_end) = @_; 443 my ($jar, $session_end) = @_;
389 444
413 } 468 }
414} 469}
415 470
416# extract cookies from jar 471# extract cookies from jar
417sub cookie_jar_extract($$$$) { 472sub cookie_jar_extract($$$$) {
418 my ($jar, $uscheme, $uhost, $upath) = @_; 473 my ($jar, $scheme, $host, $path) = @_;
419 474
420 %$jar = () if $jar->{version} != 1; 475 %$jar = () if $jar->{version} != 1;
421 476
422 my @cookies; 477 my @cookies;
423 478
424 while (my ($chost, $paths) = each %$jar) { 479 while (my ($chost, $paths) = each %$jar) {
425 next unless ref $paths; 480 next unless ref $paths;
426 481
427 if ($chost =~ /^\./) { 482 if ($chost =~ /^\./) {
428 next unless $chost eq substr $uhost, -length $chost; 483 next unless $chost eq substr $host, -length $chost;
429 } elsif ($chost =~ /\./) { 484 } elsif ($chost =~ /\./) {
430 next unless $chost eq $uhost; 485 next unless $chost eq $host;
431 } else { 486 } else {
432 next; 487 next;
433 } 488 }
434 489
435 while (my ($cpath, $cookies) = each %$paths) { 490 while (my ($cpath, $cookies) = each %$paths) {
436 next unless $cpath eq substr $upath, 0, length $cpath; 491 next unless $cpath eq substr $path, 0, length $cpath;
437 492
438 while (my ($cookie, $kv) = each %$cookies) { 493 while (my ($cookie, $kv) = each %$cookies) {
439 next if $uscheme ne "https" && exists $kv->{secure}; 494 next if $scheme ne "https" && exists $kv->{secure};
440 495
441 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) { 496 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
442 delete $cookies->{$cookie}; 497 delete $cookies->{$cookie};
443 next; 498 next;
444 } 499 }
458 \@cookies 513 \@cookies
459} 514}
460 515
461# parse set_cookie header into jar 516# parse set_cookie header into jar
462sub cookie_jar_set_cookie($$$$) { 517sub cookie_jar_set_cookie($$$$) {
463 my ($jar, $set_cookie, $uhost, $date) = @_; 518 my ($jar, $set_cookie, $host, $date) = @_;
464 519
465 my $anow = int AE::now; 520 my $anow = int AE::now;
466 my $snow; # server-now 521 my $snow; # server-now
467 522
468 for ($set_cookie) { 523 for ($set_cookie) {
491 # quoted 546 # quoted
492 $value = $3; 547 $value = $3;
493 $value =~ s/\\(.)/$1/gs; 548 $value =~ s/\\(.)/$1/gs;
494 } 549 }
495 550
496 push @kv, lc $name, $value; 551 push @kv, @kv ? lc $name : $name, $value;
497 552
498 last unless /\G\s*;/gc; 553 last unless /\G\s*;/gc;
499 } 554 }
500 555
501 last unless @kv; 556 last unless @kv;
524 579
525 # this is not rfc-like and not netscape-like. go figure. 580 # this is not rfc-like and not netscape-like. go figure.
526 my $ndots = $cdom =~ y/.//; 581 my $ndots = $cdom =~ y/.//;
527 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 582 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
528 } else { 583 } else {
529 $cdom = $uhost; 584 $cdom = $host;
530 } 585 }
531 586
532 # store it 587 # store it
533 $jar->{version} = 1; 588 $jar->{version} = 1;
534 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 589 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
535 590
536 redo if /\G\s*,/gc; 591 redo if /\G\s*,/gc;
537 } 592 }
538} 593}
539 594
595#############################################################################
596# keepalive/persistent connection cache
597
598# fetch a connection from the keepalive cache
599sub ka_fetch($) {
600 my $ka_key = shift;
601
602 my $hdl = pop @{ $KA_CACHE{$ka_key} }; # currently we reuse the MOST RECENTLY USED connection
603 delete $KA_CACHE{$ka_key}
604 unless @{ $KA_CACHE{$ka_key} };
605
606 $hdl
607}
608
609sub ka_store($$) {
610 my ($ka_key, $hdl) = @_;
611
612 my $kaa = $KA_CACHE{$ka_key} ||= [];
613
614 my $destroy = sub {
615 my @ka = grep $_ != $hdl, @{ $KA_CACHE{$ka_key} };
616
617 $hdl->destroy;
618
619 @ka
620 ? $KA_CACHE{$ka_key} = \@ka
621 : delete $KA_CACHE{$ka_key};
622 };
623
624 # on error etc., destroy
625 $hdl->on_error ($destroy);
626 $hdl->on_eof ($destroy);
627 $hdl->on_read ($destroy);
628 $hdl->timeout ($PERSISTENT_TIMEOUT);
629
630 push @$kaa, $hdl;
631 shift @$kaa while @$kaa > $MAX_PER_HOST;
632}
633
634#############################################################################
635# utilities
636
540# continue to parse $_ for headers and place them into the arg 637# continue to parse $_ for headers and place them into the arg
541sub parse_hdr() { 638sub _parse_hdr() {
542 my %hdr; 639 my %hdr;
543 640
544 # things seen, not parsed: 641 # things seen, not parsed:
545 # p3pP="NON CUR OTPi OUR NOR UNI" 642 # p3pP="NON CUR OTPi OUR NOR UNI"
546 643
560 for values %hdr; 657 for values %hdr;
561 658
562 \%hdr 659 \%hdr
563} 660}
564 661
662#############################################################################
663# http_get
664
565our $qr_nlnl = qr{(?<![^\012])\015?\012}; 665our $qr_nlnl = qr{(?<![^\012])\015?\012};
566 666
567our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 667our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
568our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 668our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
669
670# maybe it should just become a normal object :/
671
672sub _destroy_state(\%) {
673 my ($state) = @_;
674
675 $state->{handle}->destroy if $state->{handle};
676 %$state = ();
677}
678
679sub _error(\%$$) {
680 my ($state, $cb, $hdr) = @_;
681
682 &_destroy_state ($state);
683
684 $cb->(undef, $hdr);
685 ()
686}
569 687
570sub http_request($$@) { 688sub http_request($$@) {
571 my $cb = pop; 689 my $cb = pop;
572 my ($method, $url, %arg) = @_; 690 my ($method, $url, %arg) = @_;
573 691
594 if $recurse < 0; 712 if $recurse < 0;
595 713
596 my $proxy = $arg{proxy} || $PROXY; 714 my $proxy = $arg{proxy} || $PROXY;
597 my $timeout = $arg{timeout} || $TIMEOUT; 715 my $timeout = $arg{timeout} || $TIMEOUT;
598 716
599 my ($uscheme, $uauthority, $upath, $query, $fragment) = 717 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment
600 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|; 718 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|;
601 719
602 $uscheme = lc $uscheme; 720 $uscheme = lc $uscheme;
603 721
604 my $uport = $uscheme eq "http" ? 80 722 my $uport = $uscheme eq "http" ? 80
649 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 767 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
650 768
651 $hdr{"content-length"} = length $arg{body} 769 $hdr{"content-length"} = length $arg{body}
652 if length $arg{body} || $method ne "GET"; 770 if length $arg{body} || $method ne "GET";
653 771
654 $hdr{connection} = "close Te"; #1.1 772 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
773
774 # default value for keepalive is true iff the request is for an idempotent method
775 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : $idempotent;
776 my $keepalive10 = exists $arg{keepalive10} ? $arg{keepalive10} : !$proxy;
777 my $keptalive; # true if this is actually a recycled connection
778
779 # the key to use in the keepalive cache
780 my $ka_key = "$uhost\x00$arg{sessionid}";
781
782 $hdr{connection} = ($keepalive ? $keepalive10 ? "keep-alive " : "" : "close ") . "Te"; #1.1
655 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 783 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
656 784
657 my %state = (connect_guard => 1); 785 my %state = (connect_guard => 1);
658 786
659 my $ae_error = 595; # connecting 787 my $ae_error = 595; # connecting
660 788
661 # handle actual, non-tunneled, request 789 # handle actual, non-tunneled, request
662 my $handle_actual_request = sub { 790 my $handle_actual_request = sub {
663 $ae_error = 596; # request phase 791 $ae_error = 596; # request phase
664 792
793 my $hdl = $state{handle};
794
665 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 795 $hdl->starttls ("connect") if $uscheme eq "https" && !exists $hdl->{tls};
666 796
667 # send request 797 # send request
668 $state{handle}->push_write ( 798 $hdl->push_write (
669 "$method $rpath HTTP/1.1\015\012" 799 "$method $rpath HTTP/1.1\015\012"
670 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 800 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
671 . "\015\012" 801 . "\015\012"
672 . (delete $arg{body}) 802 . (delete $arg{body})
673 ); 803 );
674 804
675 # return if error occured during push_write() 805 # return if error occured during push_write()
676 return unless %state; 806 return unless %state;
677 807
678 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use 808 # reduce memory usage, save a kitten, also re-use it for the response headers.
809 %hdr = ();
679 810
680 # status line and headers 811 # status line and headers
681 $state{read_response} = sub { 812 $state{read_response} = sub {
813 return unless %state;
814
682 for ("$_[1]") { 815 for ("$_[1]") {
683 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 816 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
684 817
685 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci 818 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
686 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" })); 819 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid server response" };
687 820
688 # 100 Continue handling 821 # 100 Continue handling
689 # should not happen as we don't send expect: 100-continue, 822 # should not happen as we don't send expect: 100-continue,
690 # but we handle it just in case. 823 # but we handle it just in case.
691 # since we send the request body regardless, if we get an error 824 # since we send the request body regardless, if we get an error
697 HTTPVersion => $1, 830 HTTPVersion => $1,
698 Status => $2, 831 Status => $2,
699 Reason => $3, 832 Reason => $3,
700 ; 833 ;
701 834
702 my $hdr = parse_hdr 835 my $hdr = _parse_hdr
703 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" })); 836 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Garbled response headers" };
704 837
705 %hdr = (%$hdr, @pseudo); 838 %hdr = (%$hdr, @pseudo);
706 } 839 }
707 840
708 # redirect handling 841 # redirect handling
739 $redirect = 1; 872 $redirect = 1;
740 } 873 }
741 } 874 }
742 875
743 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 876 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
744 my $may_keep_alive = $_[3]; 877 if ($state{handle}) {
745 878 # handle keepalive
879 if (
880 $keepalive
881 && $_[3]
882 && ($hdr{HTTPVersion} < 1.1
883 ? $hdr{connection} =~ /\bkeep-?alive\b/i
884 : $hdr{connection} !~ /\bclose\b/i)
885 ) {
886 ka_store $ka_key, delete $state{handle};
887 } else {
888 # no keepalive, destroy the handle
746 $state{handle}->destroy if $state{handle}; 889 $state{handle}->destroy;
890 }
891 }
892
747 %state = (); 893 %state = ();
748 894
749 if (defined $_[1]) { 895 if (defined $_[1]) {
750 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1]; 896 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
751 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2]; 897 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
763 http_request ( 909 http_request (
764 $method => $hdr{location}, 910 $method => $hdr{location},
765 %arg, 911 %arg,
766 recurse => $recurse - 1, 912 recurse => $recurse - 1,
767 Redirect => [$_[0], \%hdr], 913 Redirect => [$_[0], \%hdr],
768 $cb); 914 $cb
915 );
769 } else { 916 } else {
770 $cb->($_[0], \%hdr); 917 $cb->($_[0], \%hdr);
771 } 918 }
772 }; 919 };
773 920
774 $ae_error = 597; # body phase 921 $ae_error = 597; # body phase
775 922
923 my $chunked = $hdr{"transfer-encoding"} =~ /\bchunked\b/i; # not quite correct...
924
776 my $len = $hdr{"content-length"}; 925 my $len = $chunked ? undef : $hdr{"content-length"};
777 926
778 # body handling, many different code paths 927 # body handling, many different code paths
779 # - no body expected 928 # - no body expected
780 # - want_body_handle 929 # - want_body_handle
781 # - te chunked 930 # - te chunked
796 $_[0]->on_error (undef); 945 $_[0]->on_error (undef);
797 $_[0]->on_read (undef); 946 $_[0]->on_read (undef);
798 947
799 $finish->(delete $state{handle}); 948 $finish->(delete $state{handle});
800 949
801 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) { 950 } elsif ($chunked) {
802 my $cl = 0; 951 my $cl = 0;
803 my $body = undef; 952 my $body = "";
804 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 953 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
805 954
806 $state{read_chunk} = sub { 955 $state{read_chunk} = sub {
807 $_[1] =~ /^([0-9a-fA-F]+)/ 956 $_[1] =~ /^([0-9a-fA-F]+)/
808 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 957 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
828 $_[0]->push_read (line => $qr_nlnl, sub { 977 $_[0]->push_read (line => $qr_nlnl, sub {
829 if (length $_[1]) { 978 if (length $_[1]) {
830 for ("$_[1]") { 979 for ("$_[1]") {
831 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 980 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
832 981
833 my $hdr = parse_hdr 982 my $hdr = _parse_hdr
834 or return $finish->(undef, $ae_error => "Garbled response trailers"); 983 or return $finish->(undef, $ae_error => "Garbled response trailers");
835 984
836 %hdr = (%hdr, %$hdr); 985 %hdr = (%hdr, %$hdr);
837 } 986 }
838 } 987 }
881 $_[0]->on_read (sub { }); 1030 $_[0]->on_read (sub { });
882 } 1031 }
883 } 1032 }
884 }; 1033 };
885 1034
1035 # if keepalive is enabled, then the server closing the connection
1036 # before a response can happen legally - we retry on idempotent methods.
1037 if ($keptalive && $idempotent) {
1038 my $old_eof = $hdl->{on_eof};
1039 $hdl->{on_eof} = sub {
1040 _destroy_state %state;
1041
1042 http_request (
1043 $method => $url,
1044 %arg,
1045 keepalive => 0,
1046 $cb
1047 );
1048 };
1049 $hdl->on_read (sub {
1050 return unless %state;
1051
1052 # as soon as we receive something, a connection close
1053 # once more becomes a hard error
1054 $hdl->{on_eof} = $old_eof;
1055 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1056 });
1057 } else {
886 $state{handle}->push_read (line => $qr_nlnl, $state{read_response}); 1058 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1059 }
887 }; 1060 };
888 1061
1062 my $prepare_handle = sub {
1063 my ($hdl) = $state{handle};
1064
1065 $hdl->timeout ($timeout);
1066 $hdl->on_error (sub {
1067 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1068 });
1069 $hdl->on_eof (sub {
1070 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1071 });
1072 };
1073
1074 # connected to proxy (or origin server)
889 my $connect_cb = sub { 1075 my $connect_cb = sub {
890 $state{fh} = shift 1076 my $fh = shift
891 or do {
892 my $err = "$!";
893 %state = ();
894 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err }); 1077 or return _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "$!" };
895 };
896 1078
897 return unless delete $state{connect_guard}; 1079 return unless delete $state{connect_guard};
898 1080
899 # get handle 1081 # get handle
900 $state{handle} = new AnyEvent::Handle 1082 $state{handle} = new AnyEvent::Handle
1083 %{ $arg{handle_params} },
901 fh => $state{fh}, 1084 fh => $fh,
902 peername => $rhost, 1085 peername => $uhost,
903 tls_ctx => $arg{tls_ctx}, 1086 tls_ctx => $arg{tls_ctx},
904 # these need to be reconfigured on keepalive handles
905 timeout => $timeout,
906 on_error => sub {
907 %state = ();
908 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
909 },
910 on_eof => sub {
911 %state = ();
912 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
913 },
914 ; 1087 ;
915 1088
916 # limit the number of persistent connections 1089 $prepare_handle->();
917 # keepalive not yet supported
918# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
919# ++$KA_COUNT{$_[1]};
920# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
921# --$KA_COUNT{$_[1]}
922# };
923# $hdr{connection} = "keep-alive";
924# }
925 1090
926 $state{handle}->starttls ("connect") if $rscheme eq "https"; 1091 #$state{handle}->starttls ("connect") if $rscheme eq "https";
927 1092
928 # now handle proxy-CONNECT method 1093 # now handle proxy-CONNECT method
929 if ($proxy && $uscheme eq "https") { 1094 if ($proxy && $uscheme eq "https") {
930 # oh dear, we have to wrap it into a connect request 1095 # oh dear, we have to wrap it into a connect request
931 1096
932 # maybe re-use $uauthority with patched port? 1097 # maybe re-use $uauthority with patched port?
933 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1098 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
934 $state{handle}->push_read (line => $qr_nlnl, sub { 1099 $state{handle}->push_read (line => $qr_nlnl, sub {
935 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1100 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
936 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 1101 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
937 1102
938 if ($2 == 200) { 1103 if ($2 == 200) {
939 $rpath = $upath; 1104 $rpath = $upath;
940 $handle_actual_request->(); 1105 $handle_actual_request->();
941 } else { 1106 } else {
942 %state = ();
943 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 1107 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
944 } 1108 }
945 }); 1109 });
946 } else { 1110 } else {
947 $handle_actual_request->(); 1111 $handle_actual_request->();
948 } 1112 }
951 _get_slot $uhost, sub { 1115 _get_slot $uhost, sub {
952 $state{slot_guard} = shift; 1116 $state{slot_guard} = shift;
953 1117
954 return unless $state{connect_guard}; 1118 return unless $state{connect_guard};
955 1119
1120 # try to use an existing keepalive connection, but only if we, ourselves, plan
1121 # on a keepalive request (in theory, this should be a separate config option).
1122 if ($keepalive && $KA_CACHE{$ka_key}) {
1123 $keptalive = 1;
1124 $state{handle} = ka_fetch $ka_key;
1125 $prepare_handle->();
1126 $handle_actual_request->();
1127
1128 } else {
956 my $tcp_connect = $arg{tcp_connect} 1129 my $tcp_connect = $arg{tcp_connect}
957 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1130 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
958 1131
959 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 1132 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
1133 }
960 }; 1134 };
961 1135
962 defined wantarray && AnyEvent::Util::guard { %state = () } 1136 defined wantarray && AnyEvent::Util::guard { _destroy_state %state }
963} 1137}
964 1138
965sub http_get($@) { 1139sub http_get($@) {
966 unshift @_, "GET"; 1140 unshift @_, "GET";
967 &http_request 1141 &http_request
985AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for 1159AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
986the actual connection, which in turn uses AnyEvent::DNS to resolve 1160the actual connection, which in turn uses AnyEvent::DNS to resolve
987hostnames. The latter is a simple stub resolver and does no caching 1161hostnames. The latter is a simple stub resolver and does no caching
988on its own. If you want DNS caching, you currently have to provide 1162on its own. If you want DNS caching, you currently have to provide
989your own default resolver (by storing a suitable resolver object in 1163your own default resolver (by storing a suitable resolver object in
990C<$AnyEvent::DNS::RESOLVER>). 1164C<$AnyEvent::DNS::RESOLVER>) or your own C<tcp_connect> callback.
991 1165
992=head2 GLOBAL FUNCTIONS AND VARIABLES 1166=head2 GLOBAL FUNCTIONS AND VARIABLES
993 1167
994=over 4 1168=over 4
995 1169
996=item AnyEvent::HTTP::set_proxy "proxy-url" 1170=item AnyEvent::HTTP::set_proxy "proxy-url"
997 1171
998Sets the default proxy server to use. The proxy-url must begin with a 1172Sets the default proxy server to use. The proxy-url must begin with a
999string of the form C<http://host:port> (optionally C<https:...>), croaks 1173string of the form C<http://host:port>, croaks otherwise.
1000otherwise.
1001 1174
1002To clear an already-set proxy, use C<undef>. 1175To clear an already-set proxy, use C<undef>.
1003 1176
1004=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1177=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1005 1178
1053 1226
1054=item $AnyEvent::HTTP::MAX_RECURSE 1227=item $AnyEvent::HTTP::MAX_RECURSE
1055 1228
1056The default value for the C<recurse> request parameter (default: C<10>). 1229The default value for the C<recurse> request parameter (default: C<10>).
1057 1230
1231=item $AnyEvent::HTTP::TIMEOUT
1232
1233The default timeout for conenction operations (default: C<300>).
1234
1058=item $AnyEvent::HTTP::USERAGENT 1235=item $AnyEvent::HTTP::USERAGENT
1059 1236
1060The default value for the C<User-Agent> header (the default is 1237The default value for the C<User-Agent> header (the default is
1061C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1238C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1062 1239
1063=item $AnyEvent::HTTP::MAX_PER_HOST 1240=item $AnyEvent::HTTP::MAX_PER_HOST
1064 1241
1065The maximum number of concurrent connections to the same host (identified 1242The maximum number of concurrent connections to the same host (identified
1066by the hostname). If the limit is exceeded, then the additional requests 1243by the hostname). If the limit is exceeded, then the additional requests
1067are queued until previous connections are closed. 1244are queued until previous connections are closed. Both persistent and
1245non-persistent connections are counted in this limit.
1068 1246
1069The default value for this is C<4>, and it is highly advisable to not 1247The default value for this is C<4>, and it is highly advisable to not
1070increase it. 1248increase it much.
1249
1250For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1251connections, older browsers used 2, newers (such as firefox 3) typically
1252use 6, and Opera uses 8 because like, they have the fastest browser and
1253give a shit for everybody else on the planet.
1254
1255=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1256
1257The time after which idle persistent conenctions get closed by
1258AnyEvent::HTTP (default: C<3>).
1071 1259
1072=item $AnyEvent::HTTP::ACTIVE 1260=item $AnyEvent::HTTP::ACTIVE
1073 1261
1074The number of active connections. This is not the number of currently 1262The number of active connections. This is not the number of currently
1075running requests, but the number of currently open and non-idle TCP 1263running requests, but the number of currently open and non-idle TCP
1076connections. This number of can be useful for load-leveling. 1264connections. This number can be useful for load-leveling.
1077 1265
1078=back 1266=back
1079 1267
1080=cut 1268=cut
1081 1269
1124 undef 1312 undef
1125} 1313}
1126 1314
1127sub set_proxy($) { 1315sub set_proxy($) {
1128 if (length $_[0]) { 1316 if (length $_[0]) {
1129 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 1317 $_[0] =~ m%^(http):// ([^:/]+) (?: : (\d*) )?%ix
1130 or Carp::croak "$_[0]: invalid proxy URL"; 1318 or Carp::croak "$_[0]: invalid proxy URL";
1131 $PROXY = [$2, $3 || 3128, $1] 1319 $PROXY = [$2, $3 || 3128, $1]
1132 } else { 1320 } else {
1133 undef $PROXY; 1321 undef $PROXY;
1134 } 1322 }
1137# initialise proxy from environment 1325# initialise proxy from environment
1138eval { 1326eval {
1139 set_proxy $ENV{http_proxy}; 1327 set_proxy $ENV{http_proxy};
1140}; 1328};
1141 1329
1330=head2 SHOWCASE
1331
1332This section contaisn some more elaborate "real-world" examples or code
1333snippets.
1334
1335=head2 HTTP/1.1 FILE DOWNLOAD
1336
1337Downloading files with HTTP can be quite tricky, especially when something
1338goes wrong and you want tor esume.
1339
1340Here is a function that initiates and resumes a download. It uses the
1341last modified time to check for file content changes, and works with many
1342HTTP/1.0 servers as well, and usually falls back to a complete re-download
1343on older servers.
1344
1345It calls the completion callback with either C<undef>, which means a
1346nonretryable error occured, C<0> when the download was partial and should
1347be retried, and C<1> if it was successful.
1348
1349 use AnyEvent::HTTP;
1350
1351 sub download($$$) {
1352 my ($url, $file, $cb) = @_;
1353
1354 open my $fh, "+<", $file
1355 or die "$file: $!";
1356
1357 my %hdr;
1358 my $ofs = 0;
1359
1360 warn stat $fh;
1361 warn -s _;
1362 if (stat $fh and -s _) {
1363 $ofs = -s _;
1364 warn "-s is ", $ofs;#d#
1365 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1366 $hdr{"range"} = "bytes=$ofs-";
1367 }
1368
1369 http_get $url,
1370 headers => \%hdr,
1371 on_header => sub {
1372 my ($hdr) = @_;
1373
1374 if ($hdr->{Status} == 200 && $ofs) {
1375 # resume failed
1376 truncate $fh, $ofs = 0;
1377 }
1378
1379 sysseek $fh, $ofs, 0;
1380
1381 1
1382 },
1383 on_body => sub {
1384 my ($data, $hdr) = @_;
1385
1386 if ($hdr->{Status} =~ /^2/) {
1387 length $data == syswrite $fh, $data
1388 or return; # abort on write errors
1389 }
1390
1391 1
1392 },
1393 sub {
1394 my (undef, $hdr) = @_;
1395
1396 my $status = $hdr->{Status};
1397
1398 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1399 utime $fh, $time, $time;
1400 }
1401
1402 if ($status == 200 || $status == 206 || $status == 416) {
1403 # download ok || resume ok || file already fully downloaded
1404 $cb->(1, $hdr);
1405
1406 } elsif ($status == 412) {
1407 # file has changed while resuming, delete and retry
1408 unlink $file;
1409 $cb->(0, $hdr);
1410
1411 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
1412 # retry later
1413 $cb->(0, $hdr);
1414
1415 } else {
1416 $cb->(undef, $hdr);
1417 }
1418 }
1419 ;
1420 }
1421
1422 download "http://server/somelargefile", "/tmp/somelargefile", sub {
1423 if ($_[0]) {
1424 print "OK!\n";
1425 } elsif (defined $_[0]) {
1426 print "please retry later\n";
1427 } else {
1428 print "ERROR\n";
1429 }
1430 };
1431
1142=head2 SOCKS PROXIES 1432=head3 SOCKS PROXIES
1143 1433
1144Socks proxies are not directly supported by AnyEvent::HTTP. You can 1434Socks proxies are not directly supported by AnyEvent::HTTP. You can
1145compile your perl to support socks, or use an external program such as 1435compile your perl to support socks, or use an external program such as
1146F<socksify> (dante) or F<tsocks> to make your program use a socks proxy 1436F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
1147transparently. 1437transparently.

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines