ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.90 by root, Mon Jan 3 00:41:25 2011 UTC vs.
Revision 1.101 by root, Wed Feb 16 16:34:34 2011 UTC

15This module is an L<AnyEvent> user, you need to make sure that you use and 15This module is an L<AnyEvent> user, you need to make sure that you use and
16run a supported event loop. 16run a supported event loop.
17 17
18This module implements a simple, stateless and non-blocking HTTP 18This module implements a simple, stateless and non-blocking HTTP
19client. It supports GET, POST and other request methods, cookies and more, 19client. It supports GET, POST and other request methods, cookies and more,
20all on a very low level. It can follow redirects supports proxies and 20all on a very low level. It can follow redirects, supports proxies, and
21automatically limits the number of connections to the values specified in 21automatically limits the number of connections to the values specified in
22the RFC. 22the RFC.
23 23
24It should generally be a "good client" that is enough for most HTTP 24It should generally be a "good client" that is enough for most HTTP
25tasks. Simple tasks should be simple, but complex tasks should still be 25tasks. Simple tasks should be simple, but complex tasks should still be
46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '1.5'; 51our $VERSION = '2.03';
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
57our $MAX_PERSISTENT = 8;
58our $PERSISTENT_TIMEOUT = 2; 57our $PERSISTENT_TIMEOUT = 3;
59our $TIMEOUT = 300; 58our $TIMEOUT = 300;
60 59our $MAX_PER_HOST = 4; # changing this is evil
61# changing these is evil
62our $MAX_PERSISTENT_PER_HOST = 2;
63our $MAX_PER_HOST = 4;
64 60
65our $PROXY; 61our $PROXY;
66our $ACTIVE = 0; 62our $ACTIVE = 0;
67 63
68my %KA_COUNT; # number of open keep-alive connections per host 64my %KA_CACHE; # indexed by uhost currently, points to [$handle...] array
69my %CO_SLOT; # number of open connections, and wait queue, per host 65my %CO_SLOT; # number of open connections, and wait queue, per host
70 66
71=item http_get $url, key => value..., $cb->($data, $headers) 67=item http_get $url, key => value..., $cb->($data, $headers)
72 68
73Executes an HTTP-GET request. See the http_request function for details on 69Executes an HTTP-GET request. See the http_request function for details on
186=item proxy => [$host, $port[, $scheme]] or undef 182=item proxy => [$host, $port[, $scheme]] or undef
187 183
188Use the given http proxy for all requests. If not specified, then the 184Use the given http proxy for all requests. If not specified, then the
189default proxy (as specified by C<$ENV{http_proxy}>) is used. 185default proxy (as specified by C<$ENV{http_proxy}>) is used.
190 186
191C<$scheme> must be either missing, C<http> for HTTP or C<https> for 187C<$scheme> must be either missing or must be C<http> for HTTP.
192HTTPS.
193 188
194=item body => $string 189=item body => $string
195 190
196The request body, usually empty. Will be sent as-is (future versions of 191The request body, usually empty. Will be sent as-is (future versions of
197this module might offer more options). 192this module might offer more options).
227verification, highest compatibility) and high-security (CA and common-name 222verification, highest compatibility) and high-security (CA and common-name
228verification) TLS context. 223verification) TLS context.
229 224
230The default for this option is C<low>, which could be interpreted as "give 225The default for this option is C<low>, which could be interpreted as "give
231me the page, no matter what". 226me the page, no matter what".
227
228See also the C<sessionid> parameter.
229
230=item session => $string
231
232The module might reuse connections to the same host internally. Sometimes
233(e.g. when using TLS), you do not want to reuse connections from other
234sessions. This can be achieved by setting this parameter to some unique
235ID (such as the address of an object storing your state data, or the TLS
236context) - only connections using the same unique ID will be reused.
232 237
233=item on_prepare => $callback->($fh) 238=item on_prepare => $callback->($fh)
234 239
235In rare cases you need to "tune" the socket before it is used to 240In rare cases you need to "tune" the socket before it is used to
236connect (for exmaple, to bind it on a given IP address). This parameter 241connect (for exmaple, to bind it on a given IP address). This parameter
306called. Instead of the C<$body> argument containing the body data, the 311called. Instead of the C<$body> argument containing the body data, the
307callback will receive the L<AnyEvent::Handle> object associated with the 312callback will receive the L<AnyEvent::Handle> object associated with the
308connection. In error cases, C<undef> will be passed. When there is no body 313connection. In error cases, C<undef> will be passed. When there is no body
309(e.g. status C<304>), the empty string will be passed. 314(e.g. status C<304>), the empty string will be passed.
310 315
311The handle object might or might not be in TLS mode, might be connected to 316The handle object might or might not be in TLS mode, might be connected
312a proxy, be a persistent connection etc., and configured in unspecified 317to a proxy, be a persistent connection, use chunked transfer encoding
313ways. The user is responsible for this handle (it will not be used by this 318etc., and configured in unspecified ways. The user is responsible for this
314module anymore). 319handle (it will not be used by this module anymore).
315 320
316This is useful with some push-type services, where, after the initial 321This is useful with some push-type services, where, after the initial
317headers, an interactive protocol is used (typical example would be the 322headers, an interactive protocol is used (typical example would be the
318push-style twitter API which starts a JSON/XML stream). 323push-style twitter API which starts a JSON/XML stream).
319 324
320If you think you need this, first have a look at C<on_body>, to see if 325If you think you need this, first have a look at C<on_body>, to see if
321that doesn't solve your problem in a better way. 326that doesn't solve your problem in a better way.
327
328=item persistent => $boolean
329
330Try to create/reuse a persistent connection. When this flag is set
331(default: true for idempotent requests, false for all others), then
332C<http_request> tries to re-use an existing (previously-created)
333persistent connection to the host and, failing that, tries to create a new
334one.
335
336Requests failing in certain ways will be automatically retried once, which
337is dangerous for non-idempotent requests, which is why it defaults to off
338for them. The reason for this is because the bozos who designed HTTP/1.1
339made it impossible to distinguish between a fatal error and a normal
340connection timeout, so you never know whether there was a problem with
341your request or not.
342
343When reusing an existent connection, many parameters (such as TLS context)
344will be ignored. See the C<session> parameter for a workaround.
345
346=item keepalive => $boolean
347
348Only used when C<persistent> is also true. This parameter decides whether
349C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
350(as opposed to only a HTTP/1.1 persistent connection).
351
352The default is true, except when using a proxy, in which case it defaults
353to false, as HTTP/1.0 proxies cannot support this in a meaningful way.
354
355=item handle_params => { key => value ... }
356
357The key-value pairs in this hash will be passed to any L<AnyEvent::Handle>
358constructor that is called - not all requests will create a handle, and
359sometimes more than one is created, so this parameter is only good for
360setting hints.
361
362Example: set the maximum read size to 4096, to potentially conserve memory
363at the cost of speed.
364
365 handle_params => {
366 max_read_size => 4096,
367 },
322 368
323=back 369=back
324 370
325Example: do a simple HTTP GET request for http://www.nethype.de/ and print 371Example: do a simple HTTP GET request for http://www.nethype.de/ and print
326the response body. 372the response body.
353 }; 399 };
354 400
355 undef $request; 401 undef $request;
356 402
357=cut 403=cut
404
405#############################################################################
406# wait queue/slots
358 407
359sub _slot_schedule; 408sub _slot_schedule;
360sub _slot_schedule($) { 409sub _slot_schedule($) {
361 my $host = shift; 410 my $host = shift;
362 411
385 434
386 _slot_schedule $_[0]; 435 _slot_schedule $_[0];
387} 436}
388 437
389############################################################################# 438#############################################################################
439# cookie handling
390 440
391# expire cookies 441# expire cookies
392sub cookie_jar_expire($;$) { 442sub cookie_jar_expire($;$) {
393 my ($jar, $session_end) = @_; 443 my ($jar, $session_end) = @_;
394 444
418 } 468 }
419} 469}
420 470
421# extract cookies from jar 471# extract cookies from jar
422sub cookie_jar_extract($$$$) { 472sub cookie_jar_extract($$$$) {
423 my ($jar, $uscheme, $uhost, $upath) = @_; 473 my ($jar, $scheme, $host, $path) = @_;
424 474
425 %$jar = () if $jar->{version} != 1; 475 %$jar = () if $jar->{version} != 1;
426 476
427 my @cookies; 477 my @cookies;
428 478
429 while (my ($chost, $paths) = each %$jar) { 479 while (my ($chost, $paths) = each %$jar) {
430 next unless ref $paths; 480 next unless ref $paths;
431 481
432 if ($chost =~ /^\./) { 482 if ($chost =~ /^\./) {
433 next unless $chost eq substr $uhost, -length $chost; 483 next unless $chost eq substr $host, -length $chost;
434 } elsif ($chost =~ /\./) { 484 } elsif ($chost =~ /\./) {
435 next unless $chost eq $uhost; 485 next unless $chost eq $host;
436 } else { 486 } else {
437 next; 487 next;
438 } 488 }
439 489
440 while (my ($cpath, $cookies) = each %$paths) { 490 while (my ($cpath, $cookies) = each %$paths) {
441 next unless $cpath eq substr $upath, 0, length $cpath; 491 next unless $cpath eq substr $path, 0, length $cpath;
442 492
443 while (my ($cookie, $kv) = each %$cookies) { 493 while (my ($cookie, $kv) = each %$cookies) {
444 next if $uscheme ne "https" && exists $kv->{secure}; 494 next if $scheme ne "https" && exists $kv->{secure};
445 495
446 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) { 496 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
447 delete $cookies->{$cookie}; 497 delete $cookies->{$cookie};
448 next; 498 next;
449 } 499 }
463 \@cookies 513 \@cookies
464} 514}
465 515
466# parse set_cookie header into jar 516# parse set_cookie header into jar
467sub cookie_jar_set_cookie($$$$) { 517sub cookie_jar_set_cookie($$$$) {
468 my ($jar, $set_cookie, $uhost, $date) = @_; 518 my ($jar, $set_cookie, $host, $date) = @_;
469 519
470 my $anow = int AE::now; 520 my $anow = int AE::now;
471 my $snow; # server-now 521 my $snow; # server-now
472 522
473 for ($set_cookie) { 523 for ($set_cookie) {
479 while ( 529 while (
480 m{ 530 m{
481 \G\s* 531 \G\s*
482 (?: 532 (?:
483 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+) 533 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
484 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )? 534 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^;,[:space:]]*) ) )?
485 ) 535 )
486 }gcxsi 536 }gcxsi
487 ) { 537 ) {
488 my $name = $2; 538 my $name = $2;
489 my $value = $4; 539 my $value = $4;
496 # quoted 546 # quoted
497 $value = $3; 547 $value = $3;
498 $value =~ s/\\(.)/$1/gs; 548 $value =~ s/\\(.)/$1/gs;
499 } 549 }
500 550
501 push @kv, lc $name, $value; 551 push @kv, @kv ? lc $name : $name, $value;
502 552
503 last unless /\G\s*;/gc; 553 last unless /\G\s*;/gc;
504 } 554 }
505 555
506 last unless @kv; 556 last unless @kv;
529 579
530 # this is not rfc-like and not netscape-like. go figure. 580 # this is not rfc-like and not netscape-like. go figure.
531 my $ndots = $cdom =~ y/.//; 581 my $ndots = $cdom =~ y/.//;
532 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 582 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
533 } else { 583 } else {
534 $cdom = $uhost; 584 $cdom = $host;
535 } 585 }
536 586
537 # store it 587 # store it
538 $jar->{version} = 1; 588 $jar->{version} = 1;
539 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 589 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
540 590
541 redo if /\G\s*,/gc; 591 redo if /\G\s*,/gc;
542 } 592 }
543} 593}
544 594
595#############################################################################
596# keepalive/persistent connection cache
597
598# fetch a connection from the keepalive cache
599sub ka_fetch($) {
600 my $ka_key = shift;
601
602 my $hdl = pop @{ $KA_CACHE{$ka_key} }; # currently we reuse the MOST RECENTLY USED connection
603 delete $KA_CACHE{$ka_key}
604 unless @{ $KA_CACHE{$ka_key} };
605
606 $hdl
607}
608
609sub ka_store($$) {
610 my ($ka_key, $hdl) = @_;
611
612 my $kaa = $KA_CACHE{$ka_key} ||= [];
613
614 my $destroy = sub {
615 my @ka = grep $_ != $hdl, @{ $KA_CACHE{$ka_key} };
616
617 $hdl->destroy;
618
619 @ka
620 ? $KA_CACHE{$ka_key} = \@ka
621 : delete $KA_CACHE{$ka_key};
622 };
623
624 # on error etc., destroy
625 $hdl->on_error ($destroy);
626 $hdl->on_eof ($destroy);
627 $hdl->on_read ($destroy);
628 $hdl->timeout ($PERSISTENT_TIMEOUT);
629
630 push @$kaa, $hdl;
631 shift @$kaa while @$kaa > $MAX_PER_HOST;
632}
633
634#############################################################################
635# utilities
636
545# continue to parse $_ for headers and place them into the arg 637# continue to parse $_ for headers and place them into the arg
546sub parse_hdr() { 638sub _parse_hdr() {
547 my %hdr; 639 my %hdr;
548 640
549 # things seen, not parsed: 641 # things seen, not parsed:
550 # p3pP="NON CUR OTPi OUR NOR UNI" 642 # p3pP="NON CUR OTPi OUR NOR UNI"
551 643
565 for values %hdr; 657 for values %hdr;
566 658
567 \%hdr 659 \%hdr
568} 660}
569 661
662#############################################################################
663# http_get
664
570our $qr_nlnl = qr{(?<![^\012])\015?\012}; 665our $qr_nlnl = qr{(?<![^\012])\015?\012};
571 666
572our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 667our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
573our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 668our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
669
670# maybe it should just become a normal object :/
671
672sub _destroy_state(\%) {
673 my ($state) = @_;
674
675 $state->{handle}->destroy if $state->{handle};
676 %$state = ();
677}
678
679sub _error(\%$$) {
680 my ($state, $cb, $hdr) = @_;
681
682 &_destroy_state ($state);
683
684 $cb->(undef, $hdr);
685 ()
686}
574 687
575sub http_request($$@) { 688sub http_request($$@) {
576 my $cb = pop; 689 my $cb = pop;
577 my ($method, $url, %arg) = @_; 690 my ($method, $url, %arg) = @_;
578 691
599 if $recurse < 0; 712 if $recurse < 0;
600 713
601 my $proxy = $arg{proxy} || $PROXY; 714 my $proxy = $arg{proxy} || $PROXY;
602 my $timeout = $arg{timeout} || $TIMEOUT; 715 my $timeout = $arg{timeout} || $TIMEOUT;
603 716
604 my ($uscheme, $uauthority, $upath, $query, $fragment) = 717 my ($uscheme, $uauthority, $upath, $query, undef) = # ignore fragment
605 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|; 718 $url =~ m|^([^:]+):(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?$|;
606 719
607 $uscheme = lc $uscheme; 720 $uscheme = lc $uscheme;
608 721
609 my $uport = $uscheme eq "http" ? 80 722 my $uport = $uscheme eq "http" ? 80
610 : $uscheme eq "https" ? 443 723 : $uscheme eq "https" ? 443
657 if length $arg{body} || $method ne "GET"; 770 if length $arg{body} || $method ne "GET";
658 771
659 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 772 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
660 773
661 # default value for keepalive is true iff the request is for an idempotent method 774 # default value for keepalive is true iff the request is for an idempotent method
662 my $keepalive = exists $arg{keepalive} 775 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : $idempotent;
663 ? $arg{keepalive}*1 776 my $keepalive10 = exists $arg{keepalive10} ? $arg{keepalive10} : !$proxy;
664 : $idempotent ? $PERSISTENT_TIMEOUT : 0; 777 my $keptalive; # true if this is actually a recycled connection
665 778
779 # the key to use in the keepalive cache
780 my $ka_key = "$uhost\x00$arg{sessionid}";
781
666 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1 782 $hdr{connection} = ($keepalive ? $keepalive10 ? "keep-alive " : "" : "close ") . "Te"; #1.1
667 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 783 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
668 784
669 my %state = (connect_guard => 1); 785 my %state = (connect_guard => 1);
670 786
671 my $ae_error = 595; # connecting 787 my $ae_error = 595; # connecting
672 788
673 # handle actual, non-tunneled, request 789 # handle actual, non-tunneled, request
674 my $handle_actual_request = sub { 790 my $handle_actual_request = sub {
675 $ae_error = 596; # request phase 791 $ae_error = 596; # request phase
676 792
793 my $hdl = $state{handle};
794
677 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 795 $hdl->starttls ("connect") if $uscheme eq "https" && !exists $hdl->{tls};
678 796
679 # send request 797 # send request
680 $state{handle}->push_write ( 798 $hdl->push_write (
681 "$method $rpath HTTP/1.1\015\012" 799 "$method $rpath HTTP/1.1\015\012"
682 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 800 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
683 . "\015\012" 801 . "\015\012"
684 . (delete $arg{body}) 802 . (delete $arg{body})
685 ); 803 );
690 # reduce memory usage, save a kitten, also re-use it for the response headers. 808 # reduce memory usage, save a kitten, also re-use it for the response headers.
691 %hdr = (); 809 %hdr = ();
692 810
693 # status line and headers 811 # status line and headers
694 $state{read_response} = sub { 812 $state{read_response} = sub {
813 return unless %state;
814
695 for ("$_[1]") { 815 for ("$_[1]") {
696 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 816 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
697 817
698 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci 818 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
699 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" })); 819 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid server response" };
700 820
701 # 100 Continue handling 821 # 100 Continue handling
702 # should not happen as we don't send expect: 100-continue, 822 # should not happen as we don't send expect: 100-continue,
703 # but we handle it just in case. 823 # but we handle it just in case.
704 # since we send the request body regardless, if we get an error 824 # since we send the request body regardless, if we get an error
710 HTTPVersion => $1, 830 HTTPVersion => $1,
711 Status => $2, 831 Status => $2,
712 Reason => $3, 832 Reason => $3,
713 ; 833 ;
714 834
715 my $hdr = parse_hdr 835 my $hdr = _parse_hdr
716 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" })); 836 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Garbled response headers" };
717 837
718 %hdr = (%$hdr, @pseudo); 838 %hdr = (%$hdr, @pseudo);
719 } 839 }
720 840
721 # redirect handling 841 # redirect handling
752 $redirect = 1; 872 $redirect = 1;
753 } 873 }
754 } 874 }
755 875
756 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 876 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
757 my $may_keep_alive = $_[3]; 877 if ($state{handle}) {
758 878 # handle keepalive
879 if (
880 $keepalive
881 && $_[3]
882 && ($hdr{HTTPVersion} < 1.1
883 ? $hdr{connection} =~ /\bkeep-?alive\b/i
884 : $hdr{connection} !~ /\bclose\b/i)
885 ) {
886 ka_store $ka_key, delete $state{handle};
887 } else {
888 # no keepalive, destroy the handle
759 $state{handle}->destroy if $state{handle}; 889 $state{handle}->destroy;
890 }
891 }
892
760 %state = (); 893 %state = ();
761 894
762 if (defined $_[1]) { 895 if (defined $_[1]) {
763 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1]; 896 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
764 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2]; 897 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
776 http_request ( 909 http_request (
777 $method => $hdr{location}, 910 $method => $hdr{location},
778 %arg, 911 %arg,
779 recurse => $recurse - 1, 912 recurse => $recurse - 1,
780 Redirect => [$_[0], \%hdr], 913 Redirect => [$_[0], \%hdr],
781 $cb); 914 $cb
915 );
782 } else { 916 } else {
783 $cb->($_[0], \%hdr); 917 $cb->($_[0], \%hdr);
784 } 918 }
785 }; 919 };
786 920
787 $ae_error = 597; # body phase 921 $ae_error = 597; # body phase
788 922
923 my $chunked = $hdr{"transfer-encoding"} =~ /\bchunked\b/i; # not quite correct...
924
789 my $len = $hdr{"content-length"}; 925 my $len = $chunked ? undef : $hdr{"content-length"};
790 926
791 # body handling, many different code paths 927 # body handling, many different code paths
792 # - no body expected 928 # - no body expected
793 # - want_body_handle 929 # - want_body_handle
794 # - te chunked 930 # - te chunked
809 $_[0]->on_error (undef); 945 $_[0]->on_error (undef);
810 $_[0]->on_read (undef); 946 $_[0]->on_read (undef);
811 947
812 $finish->(delete $state{handle}); 948 $finish->(delete $state{handle});
813 949
814 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) { 950 } elsif ($chunked) {
815 my $cl = 0; 951 my $cl = 0;
816 my $body = undef; 952 my $body = "";
817 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 953 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
818 954
819 $state{read_chunk} = sub { 955 $state{read_chunk} = sub {
820 $_[1] =~ /^([0-9a-fA-F]+)/ 956 $_[1] =~ /^([0-9a-fA-F]+)/
821 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 957 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
841 $_[0]->push_read (line => $qr_nlnl, sub { 977 $_[0]->push_read (line => $qr_nlnl, sub {
842 if (length $_[1]) { 978 if (length $_[1]) {
843 for ("$_[1]") { 979 for ("$_[1]") {
844 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 980 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
845 981
846 my $hdr = parse_hdr 982 my $hdr = _parse_hdr
847 or return $finish->(undef, $ae_error => "Garbled response trailers"); 983 or return $finish->(undef, $ae_error => "Garbled response trailers");
848 984
849 %hdr = (%hdr, %$hdr); 985 %hdr = (%hdr, %$hdr);
850 } 986 }
851 } 987 }
894 $_[0]->on_read (sub { }); 1030 $_[0]->on_read (sub { });
895 } 1031 }
896 } 1032 }
897 }; 1033 };
898 1034
1035 # if keepalive is enabled, then the server closing the connection
1036 # before a response can happen legally - we retry on idempotent methods.
1037 if ($keptalive && $idempotent) {
1038 my $old_eof = $hdl->{on_eof};
1039 $hdl->{on_eof} = sub {
1040 _destroy_state %state;
1041
1042 http_request (
1043 $method => $url,
1044 %arg,
1045 keepalive => 0,
1046 $cb
1047 );
1048 };
1049 $hdl->on_read (sub {
1050 return unless %state;
1051
1052 # as soon as we receive something, a connection close
1053 # once more becomes a hard error
1054 $hdl->{on_eof} = $old_eof;
1055 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1056 });
1057 } else {
899 $state{handle}->push_read (line => $qr_nlnl, $state{read_response}); 1058 $hdl->push_read (line => $qr_nlnl, $state{read_response});
1059 }
900 }; 1060 };
901 1061
1062 my $prepare_handle = sub {
1063 my ($hdl) = $state{handle};
1064
1065 $hdl->timeout ($timeout);
1066 $hdl->on_error (sub {
1067 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => $_[2] };
1068 });
1069 $hdl->on_eof (sub {
1070 _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" };
1071 });
1072 };
1073
1074 # connected to proxy (or origin server)
902 my $connect_cb = sub { 1075 my $connect_cb = sub {
903 $state{fh} = shift 1076 my $fh = shift
904 or do {
905 my $err = "$!";
906 %state = ();
907 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err }); 1077 or return _error %state, $cb, { @pseudo, Status => $ae_error, Reason => "$!" };
908 };
909 1078
910 return unless delete $state{connect_guard}; 1079 return unless delete $state{connect_guard};
911 1080
912 # get handle 1081 # get handle
913 $state{handle} = new AnyEvent::Handle 1082 $state{handle} = new AnyEvent::Handle
1083 %{ $arg{handle_params} },
914 fh => $state{fh}, 1084 fh => $fh,
915 peername => $rhost, 1085 peername => $uhost,
916 tls_ctx => $arg{tls_ctx}, 1086 tls_ctx => $arg{tls_ctx},
917 # these need to be reconfigured on keepalive handles
918 timeout => $timeout,
919 on_error => sub {
920 %state = ();
921 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
922 },
923 on_eof => sub {
924 %state = ();
925 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
926 },
927 ; 1087 ;
928 1088
929 # limit the number of persistent connections 1089 $prepare_handle->();
930 # keepalive not yet supported
931# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
932# ++$KA_COUNT{$_[1]};
933# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
934# --$KA_COUNT{$_[1]}
935# };
936# $hdr{connection} = "keep-alive";
937# }
938 1090
939 $state{handle}->starttls ("connect") if $rscheme eq "https"; 1091 #$state{handle}->starttls ("connect") if $rscheme eq "https";
940 1092
941 # now handle proxy-CONNECT method 1093 # now handle proxy-CONNECT method
942 if ($proxy && $uscheme eq "https") { 1094 if ($proxy && $uscheme eq "https") {
943 # oh dear, we have to wrap it into a connect request 1095 # oh dear, we have to wrap it into a connect request
944 1096
945 # maybe re-use $uauthority with patched port? 1097 # maybe re-use $uauthority with patched port?
946 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1098 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
947 $state{handle}->push_read (line => $qr_nlnl, sub { 1099 $state{handle}->push_read (line => $qr_nlnl, sub {
948 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1100 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
949 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 1101 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
950 1102
951 if ($2 == 200) { 1103 if ($2 == 200) {
952 $rpath = $upath; 1104 $rpath = $upath;
953 $handle_actual_request->(); 1105 $handle_actual_request->();
954 } else { 1106 } else {
955 %state = ();
956 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 1107 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
957 } 1108 }
958 }); 1109 });
959 } else { 1110 } else {
960 $handle_actual_request->(); 1111 $handle_actual_request->();
961 } 1112 }
964 _get_slot $uhost, sub { 1115 _get_slot $uhost, sub {
965 $state{slot_guard} = shift; 1116 $state{slot_guard} = shift;
966 1117
967 return unless $state{connect_guard}; 1118 return unless $state{connect_guard};
968 1119
1120 # try to use an existing keepalive connection, but only if we, ourselves, plan
1121 # on a keepalive request (in theory, this should be a separate config option).
1122 if ($keepalive && $KA_CACHE{$ka_key}) {
1123 $keptalive = 1;
1124 $state{handle} = ka_fetch $ka_key;
1125 $prepare_handle->();
1126 $handle_actual_request->();
1127
1128 } else {
969 my $tcp_connect = $arg{tcp_connect} 1129 my $tcp_connect = $arg{tcp_connect}
970 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1130 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
971 1131
972 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 1132 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
1133 }
973 }; 1134 };
974 1135
975 defined wantarray && AnyEvent::Util::guard { %state = () } 1136 defined wantarray && AnyEvent::Util::guard { _destroy_state %state }
976} 1137}
977 1138
978sub http_get($@) { 1139sub http_get($@) {
979 unshift @_, "GET"; 1140 unshift @_, "GET";
980 &http_request 1141 &http_request
998AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for 1159AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
999the actual connection, which in turn uses AnyEvent::DNS to resolve 1160the actual connection, which in turn uses AnyEvent::DNS to resolve
1000hostnames. The latter is a simple stub resolver and does no caching 1161hostnames. The latter is a simple stub resolver and does no caching
1001on its own. If you want DNS caching, you currently have to provide 1162on its own. If you want DNS caching, you currently have to provide
1002your own default resolver (by storing a suitable resolver object in 1163your own default resolver (by storing a suitable resolver object in
1003C<$AnyEvent::DNS::RESOLVER>). 1164C<$AnyEvent::DNS::RESOLVER>) or your own C<tcp_connect> callback.
1004 1165
1005=head2 GLOBAL FUNCTIONS AND VARIABLES 1166=head2 GLOBAL FUNCTIONS AND VARIABLES
1006 1167
1007=over 4 1168=over 4
1008 1169
1009=item AnyEvent::HTTP::set_proxy "proxy-url" 1170=item AnyEvent::HTTP::set_proxy "proxy-url"
1010 1171
1011Sets the default proxy server to use. The proxy-url must begin with a 1172Sets the default proxy server to use. The proxy-url must begin with a
1012string of the form C<http://host:port> (optionally C<https:...>), croaks 1173string of the form C<http://host:port>, croaks otherwise.
1013otherwise.
1014 1174
1015To clear an already-set proxy, use C<undef>. 1175To clear an already-set proxy, use C<undef>.
1016 1176
1017=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1177=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1018 1178
1066 1226
1067=item $AnyEvent::HTTP::MAX_RECURSE 1227=item $AnyEvent::HTTP::MAX_RECURSE
1068 1228
1069The default value for the C<recurse> request parameter (default: C<10>). 1229The default value for the C<recurse> request parameter (default: C<10>).
1070 1230
1231=item $AnyEvent::HTTP::TIMEOUT
1232
1233The default timeout for conenction operations (default: C<300>).
1234
1071=item $AnyEvent::HTTP::USERAGENT 1235=item $AnyEvent::HTTP::USERAGENT
1072 1236
1073The default value for the C<User-Agent> header (the default is 1237The default value for the C<User-Agent> header (the default is
1074C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1238C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1075 1239
1076=item $AnyEvent::HTTP::MAX_PER_HOST 1240=item $AnyEvent::HTTP::MAX_PER_HOST
1077 1241
1078The maximum number of concurrent connections to the same host (identified 1242The maximum number of concurrent connections to the same host (identified
1079by the hostname). If the limit is exceeded, then the additional requests 1243by the hostname). If the limit is exceeded, then the additional requests
1080are queued until previous connections are closed. 1244are queued until previous connections are closed. Both persistent and
1245non-persistent connections are counted in this limit.
1081 1246
1082The default value for this is C<4>, and it is highly advisable to not 1247The default value for this is C<4>, and it is highly advisable to not
1083increase it. 1248increase it much.
1249
1250For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1251connections, older browsers used 2, newers (such as firefox 3) typically
1252use 6, and Opera uses 8 because like, they have the fastest browser and
1253give a shit for everybody else on the planet.
1254
1255=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1256
1257The time after which idle persistent conenctions get closed by
1258AnyEvent::HTTP (default: C<3>).
1084 1259
1085=item $AnyEvent::HTTP::ACTIVE 1260=item $AnyEvent::HTTP::ACTIVE
1086 1261
1087The number of active connections. This is not the number of currently 1262The number of active connections. This is not the number of currently
1088running requests, but the number of currently open and non-idle TCP 1263running requests, but the number of currently open and non-idle TCP
1089connections. This number of can be useful for load-leveling. 1264connections. This number can be useful for load-leveling.
1090 1265
1091=back 1266=back
1092 1267
1093=cut 1268=cut
1094 1269
1137 undef 1312 undef
1138} 1313}
1139 1314
1140sub set_proxy($) { 1315sub set_proxy($) {
1141 if (length $_[0]) { 1316 if (length $_[0]) {
1142 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 1317 $_[0] =~ m%^(http):// ([^:/]+) (?: : (\d*) )?%ix
1143 or Carp::croak "$_[0]: invalid proxy URL"; 1318 or Carp::croak "$_[0]: invalid proxy URL";
1144 $PROXY = [$2, $3 || 3128, $1] 1319 $PROXY = [$2, $3 || 3128, $1]
1145 } else { 1320 } else {
1146 undef $PROXY; 1321 undef $PROXY;
1147 } 1322 }
1150# initialise proxy from environment 1325# initialise proxy from environment
1151eval { 1326eval {
1152 set_proxy $ENV{http_proxy}; 1327 set_proxy $ENV{http_proxy};
1153}; 1328};
1154 1329
1330=head2 SHOWCASE
1331
1332This section contaisn some more elaborate "real-world" examples or code
1333snippets.
1334
1335=head2 HTTP/1.1 FILE DOWNLOAD
1336
1337Downloading files with HTTP can be quite tricky, especially when something
1338goes wrong and you want to resume.
1339
1340Here is a function that initiates and resumes a download. It uses the
1341last modified time to check for file content changes, and works with many
1342HTTP/1.0 servers as well, and usually falls back to a complete re-download
1343on older servers.
1344
1345It calls the completion callback with either C<undef>, which means a
1346nonretryable error occured, C<0> when the download was partial and should
1347be retried, and C<1> if it was successful.
1348
1349 use AnyEvent::HTTP;
1350
1351 sub download($$$) {
1352 my ($url, $file, $cb) = @_;
1353
1354 open my $fh, "+<", $file
1355 or die "$file: $!";
1356
1357 my %hdr;
1358 my $ofs = 0;
1359
1360 warn stat $fh;
1361 warn -s _;
1362 if (stat $fh and -s _) {
1363 $ofs = -s _;
1364 warn "-s is ", $ofs;#d#
1365 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1366 $hdr{"range"} = "bytes=$ofs-";
1367 }
1368
1369 http_get $url,
1370 headers => \%hdr,
1371 on_header => sub {
1372 my ($hdr) = @_;
1373
1374 if ($hdr->{Status} == 200 && $ofs) {
1375 # resume failed
1376 truncate $fh, $ofs = 0;
1377 }
1378
1379 sysseek $fh, $ofs, 0;
1380
1381 1
1382 },
1383 on_body => sub {
1384 my ($data, $hdr) = @_;
1385
1386 if ($hdr->{Status} =~ /^2/) {
1387 length $data == syswrite $fh, $data
1388 or return; # abort on write errors
1389 }
1390
1391 1
1392 },
1393 sub {
1394 my (undef, $hdr) = @_;
1395
1396 my $status = $hdr->{Status};
1397
1398 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1399 utime $fh, $time, $time;
1400 }
1401
1402 if ($status == 200 || $status == 206 || $status == 416) {
1403 # download ok || resume ok || file already fully downloaded
1404 $cb->(1, $hdr);
1405
1406 } elsif ($status == 412) {
1407 # file has changed while resuming, delete and retry
1408 unlink $file;
1409 $cb->(0, $hdr);
1410
1411 } elsif ($status == 500 or $status == 503 or $status =~ /^59/) {
1412 # retry later
1413 $cb->(0, $hdr);
1414
1415 } else {
1416 $cb->(undef, $hdr);
1417 }
1418 }
1419 ;
1420 }
1421
1422 download "http://server/somelargefile", "/tmp/somelargefile", sub {
1423 if ($_[0]) {
1424 print "OK!\n";
1425 } elsif (defined $_[0]) {
1426 print "please retry later\n";
1427 } else {
1428 print "ERROR\n";
1429 }
1430 };
1431
1155=head2 SOCKS PROXIES 1432=head3 SOCKS PROXIES
1156 1433
1157Socks proxies are not directly supported by AnyEvent::HTTP. You can 1434Socks proxies are not directly supported by AnyEvent::HTTP. You can
1158compile your perl to support socks, or use an external program such as 1435compile your perl to support socks, or use an external program such as
1159F<socksify> (dante) or F<tsocks> to make your program use a socks proxy 1436F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
1160transparently. 1437transparently.

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines