ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.55 by root, Wed Jun 16 19:17:30 2010 UTC vs.
Revision 1.65 by root, Fri Dec 31 03:47:32 2010 UTC

43 43
44use Errno (); 44use Errno ();
45 45
46use AnyEvent 5.0 (); 46use AnyEvent 5.0 ();
47use AnyEvent::Util (); 47use AnyEvent::Util ();
48use AnyEvent::Socket ();
49use AnyEvent::Handle (); 48use AnyEvent::Handle ();
50 49
51use base Exporter::; 50use base Exporter::;
52 51
53our $VERSION = '1.45'; 52our $VERSION = '1.5';
54 53
55our @EXPORT = qw(http_get http_post http_head http_request); 54our @EXPORT = qw(http_get http_post http_head http_request);
56 55
57our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 56our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58our $MAX_RECURSE = 10; 57our $MAX_RECURSE = 10;
92must be an absolute http or https URL. 91must be an absolute http or https URL.
93 92
94When called in void context, nothing is returned. In other contexts, 93When called in void context, nothing is returned. In other contexts,
95C<http_request> returns a "cancellation guard" - you have to keep the 94C<http_request> returns a "cancellation guard" - you have to keep the
96object at least alive until the callback get called. If the object gets 95object at least alive until the callback get called. If the object gets
97destroyed before the callbakc is called, the request will be cancelled. 96destroyed before the callback is called, the request will be cancelled.
98 97
99The callback will be called with the response body data as first argument 98The callback will be called with the response body data as first argument
100(or C<undef> if an error occured), and a hash-ref with response headers as 99(or C<undef> if an error occured), and a hash-ref with response headers as
101second argument. 100second argument.
102 101
103All the headers in that hash are lowercased. In addition to the response 102All the headers in that hash are lowercased. In addition to the response
104headers, the "pseudo-headers" (uppercase to avoid clashing with possible 103headers, the "pseudo-headers" (uppercase to avoid clashing with possible
105response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 104response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
106three parts of the HTTP Status-Line of the same name. 105three parts of the HTTP Status-Line of the same name. If an error occurs
106during the body phase of a request, then the original C<Status> and
107C<Reason> values from the header are available as C<OrigStatus> and
108C<OrigReason>.
107 109
108The pseudo-header C<URL> contains the actual URL (which can differ from 110The pseudo-header C<URL> contains the actual URL (which can differ from
109the requested URL when following redirects - for example, you might get 111the requested URL when following redirects - for example, you might get
110an error that your URL scheme is not supported even though your URL is a 112an error that your URL scheme is not supported even though your URL is a
111valid http URL because it redirected to an ftp URL, in which case you can 113valid http URL because it redirected to an ftp URL, in which case you can
211connect (for exmaple, to bind it on a given IP address). This parameter 213connect (for exmaple, to bind it on a given IP address). This parameter
212overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 214overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
213and behaves exactly the same way (e.g. it has to provide a 215and behaves exactly the same way (e.g. it has to provide a
214timeout). See the description for the C<$prepare_cb> argument of 216timeout). See the description for the C<$prepare_cb> argument of
215C<AnyEvent::Socket::tcp_connect> for details. 217C<AnyEvent::Socket::tcp_connect> for details.
218
219=item tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)
220
221In even rarer cases you want total control over how AnyEvent::HTTP
222establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
223to do this, but you can provide your own C<tcp_connect> function -
224obviously, it has to follow the same calling conventions, except that it
225may always return a connection guard object.
226
227There are probably lots of weird uses for this function, starting from
228tracing the hosts C<http_request> actually tries to connect, to (inexact
229but fast) host => IP address caching or even socks protocol support.
216 230
217=item on_header => $callback->($headers) 231=item on_header => $callback->($headers)
218 232
219When specified, this callback will be called with the header hash as soon 233When specified, this callback will be called with the header hash as soon
220as headers have been successfully received from the remote server (not on 234as headers have been successfully received from the remote server (not on
338 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 352 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
339 353
340 _slot_schedule $_[0]; 354 _slot_schedule $_[0];
341} 355}
342 356
343our $qr_nl = qr{\015?\012};
344our $qr_nlnl = qr{(?<![^\012])\015?\012}; 357our $qr_nlnl = qr{(?<![^\012])\015?\012};
345 358
346our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 359our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
347our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 360our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
348 361
367 my @pseudo = (URL => $url); 380 my @pseudo = (URL => $url);
368 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect}; 381 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect};
369 382
370 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 383 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
371 384
372 return $cb->(undef, { Status => 599, Reason => "Too many redirections", @pseudo }) 385 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" })
373 if $recurse < 0; 386 if $recurse < 0;
374 387
375 my $proxy = $arg{proxy} || $PROXY; 388 my $proxy = $arg{proxy} || $PROXY;
376 my $timeout = $arg{timeout} || $TIMEOUT; 389 my $timeout = $arg{timeout} || $TIMEOUT;
377 390
378 my ($uscheme, $uauthority, $upath, $query, $fragment) = 391 my ($uscheme, $uauthority, $upath, $query, $fragment) =
379 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|; 392 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|;
380 393
381 $uscheme = lc $uscheme; 394 $uscheme = lc $uscheme;
382 395
383 my $uport = $uscheme eq "http" ? 80 396 my $uport = $uscheme eq "http" ? 80
384 : $uscheme eq "https" ? 443 397 : $uscheme eq "https" ? 443
385 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", @pseudo }); 398 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
386 399
387 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 400 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
388 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", @pseudo }); 401 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
389 402
390 my $uhost = $1; 403 my $uhost = $1;
391 $uport = $2 if defined $2; 404 $uport = $2 if defined $2;
392 405
393 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 406 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
394 unless exists $hdr{host}; 407 unless exists $hdr{host};
395 408
396 $uhost =~ s/^\[(.*)\]$/$1/; 409 $uhost =~ s/^\[(.*)\]$/$1/;
397 $upath .= "?$query" if length $query; 410 $upath .= $query if length $query;
398 411
399 $upath =~ s%^/?%/%; 412 $upath =~ s%^/?%/%;
400 413
401 # cookie processing 414 # cookie processing
402 if (my $jar = $arg{cookie_jar}) { 415 if (my $jar = $arg{cookie_jar}) {
455 _get_slot $uhost, sub { 468 _get_slot $uhost, sub {
456 $state{slot_guard} = shift; 469 $state{slot_guard} = shift;
457 470
458 return unless $state{connect_guard}; 471 return unless $state{connect_guard};
459 472
460 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub { 473 my $connect_cb = sub {
461 $state{fh} = shift 474 $state{fh} = shift
462 or do { 475 or do {
463 my $err = "$!"; 476 my $err = "$!";
464 %state = (); 477 %state = ();
465 return $cb->(undef, { Status => 599, Reason => $err, @pseudo }); 478 return $cb->(undef, { @pseudo, Status => 599, Reason => $err });
466 }; 479 };
467 480
468 pop; # free memory, save a tree 481 pop; # free memory, save a tree
469 482
470 return unless delete $state{connect_guard}; 483 return unless delete $state{connect_guard};
471 484
472 # get handle 485 # get handle
473 $state{handle} = new AnyEvent::Handle 486 $state{handle} = new AnyEvent::Handle
474 fh => $state{fh}, 487 fh => $state{fh},
488 peername => $rhost,
489 tls_ctx => $arg{tls_ctx},
490 # these need to be reconfigured on keepalive handles
475 timeout => $timeout, 491 timeout => $timeout,
476 peername => $rhost, 492 on_error => sub {
477 tls_ctx => $arg{tls_ctx}; 493 %state = ();
494 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] });
495 },
496 on_eof => sub {
497 %state = ();
498 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" });
499 },
500 ;
478 501
479 # limit the number of persistent connections 502 # limit the number of persistent connections
480 # keepalive not yet supported 503 # keepalive not yet supported
481 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 504# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
482 ++$KA_COUNT{$_[1]}; 505# ++$KA_COUNT{$_[1]};
483 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 506# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
484 --$KA_COUNT{$_[1]} 507# --$KA_COUNT{$_[1]}
485 }; 508# };
486 $hdr{connection} = "keep-alive"; 509# $hdr{connection} = "keep-alive";
487 } else { 510# } else {
488 delete $hdr{connection}; 511 delete $hdr{connection};
489 } 512# }
490
491 # (re-)configure handle
492 $state{handle}->on_error (sub {
493 %state = ();
494 $cb->(undef, { Status => 599, Reason => $_[2], @pseudo });
495 });
496 $state{handle}->on_eof (sub {
497 %state = ();
498 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", @pseudo });
499 });
500 513
501 $state{handle}->starttls ("connect") if $rscheme eq "https"; 514 $state{handle}->starttls ("connect") if $rscheme eq "https";
502 515
503 # handle actual, non-tunneled, request 516 # handle actual, non-tunneled, request
504 my $handle_actual_request = sub { 517 my $handle_actual_request = sub {
515 # return if error occured during push_write() 528 # return if error occured during push_write()
516 return unless %state; 529 return unless %state;
517 530
518 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use 531 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
519 532
520 # status line 533 # status line and headers
521 $state{handle}->push_read (line => $qr_nl, sub { 534 $state{handle}->push_read (line => $qr_nlnl, sub {
522 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 535 my $keepalive = pop;
523 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", @pseudo }));
524 536
525 push @pseudo,
526 HTTPVersion => $1,
527 Status => $2,
528 Reason => $3,
529 ;
530
531 # headers, could be optimized a bit
532 $state{handle}->unshift_read (line => $qr_nlnl, sub {
533 for ("$_[1]") { 537 for ("$_[1]") {
534 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 538 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
535 539
540 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )? \015?\012/igxc
541 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
542
543 push @pseudo,
544 HTTPVersion => $1,
545 Status => $2,
546 Reason => $3,
547 ;
548
536 # things seen, not parsed: 549 # things seen, not parsed:
537 # p3pP="NON CUR OTPi OUR NOR UNI" 550 # p3pP="NON CUR OTPi OUR NOR UNI"
538 551
539 $hdr{lc $1} .= ",$2" 552 $hdr{lc $1} .= ",$2"
540 while /\G 553 while /\G
541 ([^:\000-\037]*): 554 ([^:\000-\037]*):
542 [\011\040]* 555 [\011\040]*
543 ((?: [^\012]+ | \012[\011\040] )*) 556 ((?: [^\012]+ | \012[\011\040] )*)
544 \012 557 \012
545 /gxc; 558 /gxc;
546 559
547 /\G$/ 560 /\G$/
548 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", @pseudo })); 561 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
562 }
563
564 # remove the "," prefix we added to all headers above
565 substr $_, 0, 1, ""
566 for values %hdr;
567
568 # patch in all pseudo headers
569 %hdr = (%hdr, @pseudo);
570
571 # redirect handling
572 # microsoft and other shitheads don't give a shit for following standards,
573 # try to support some common forms of broken Location headers.
574 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
575 $hdr{location} =~ s/^\.\/+//;
576
577 my $url = "$rscheme://$uhost:$uport";
578
579 unless ($hdr{location} =~ s/^\///) {
580 $url .= $upath;
581 $url =~ s/\/[^\/]*$//;
549 } 582 }
550 583
551 # remove the "," prefix we added to all headers above
552 substr $_, 0, 1, ""
553 for values %hdr;
554
555 # patch in all pseudo headers
556 %hdr = (%hdr, @pseudo);
557
558 # redirect handling
559 # microsoft and other shitheads don't give a shit for following standards,
560 # try to support some common forms of broken Location headers.
561 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
562 $hdr{location} =~ s/^\.\/+//;
563
564 my $url = "$rscheme://$uhost:$uport";
565
566 unless ($hdr{location} =~ s/^\///) {
567 $url .= $upath;
568 $url =~ s/\/[^\/]*$//;
569 }
570
571 $hdr{location} = "$url/$hdr{location}"; 584 $hdr{location} = "$url/$hdr{location}";
585 }
586
587 my $redirect;
588
589 if ($recurse) {
590 my $status = $hdr{Status};
591
592 # industry standard is to redirect POST as GET for
593 # 301, 302 and 303, in contrast to http/1.0 and 1.1.
594 # also, the UA should ask the user for 301 and 307 and POST,
595 # industry standard seems to be to simply follow.
596 # we go with the industry standard.
597 if ($status == 301 or $status == 302 or $status == 303) {
598 # HTTP/1.1 is unclear on how to mutate the method
599 $method = "GET" unless $method eq "HEAD";
600 $redirect = 1;
601 } elsif ($status == 307) {
602 $redirect = 1;
572 } 603 }
604 }
573 605
574 my $redirect; 606 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
607 $state{handle}->destroy if $state{handle};
608 %state = ();
575 609
610 if (defined $_[1]) {
611 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
612 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
613 }
614
615 # set-cookie processing
576 if ($recurse) { 616 if ($arg{cookie_jar}) {
577 my $status = $hdr{Status}; 617 for ($hdr{"set-cookie"}) {
618 # parse NAME=VALUE
619 my @kv;
578 620
579 if (($status == 301 || $status == 302) && $method ne "POST") { 621 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
580 # apparently, mozilla et al. just change POST to GET here
581 # more research is needed before we do the same
582 $redirect = 1; 622 my $name = $1;
583 } elsif ($status == 303) { 623 my $value = $3;
584 # even http/1.1 is unclear on how to mutate the method 624
585 $method = "GET" unless $method eq "HEAD"; 625 unless ($value) {
626 $value = $2;
627 $value =~ s/\\(.)/$1/gs;
628 }
629
630 push @kv, $name => $value;
631
632 last unless /\G\s*;/gc;
633 }
634
635 last unless @kv;
636
637 my $name = shift @kv;
638 my %kv = (value => shift @kv, @kv);
639
640 my $cdom;
641 my $cpath = (delete $kv{path}) || "/";
642
643 if (exists $kv{domain}) {
644 $cdom = delete $kv{domain};
645
646 $cdom =~ s/^\.?/./; # make sure it starts with a "."
647
648 next if $cdom =~ /\.$/;
649
650 # this is not rfc-like and not netscape-like. go figure.
651 my $ndots = $cdom =~ y/.//;
652 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
653 } else {
654 $cdom = $uhost;
655 }
656
586 $redirect = 1; 657 # store it
587 } elsif ($status == 307 && $method =~ /^(?:GET|HEAD)$/) { 658 $arg{cookie_jar}{version} = 1;
588 $redirect = 1; 659 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
660
661 redo if /\G\s*,/gc;
589 } 662 }
590 } 663 }
591 664
592 my $finish = sub { 665 if ($redirect && exists $hdr{location}) {
593 $state{handle}->destroy if $state{handle}; 666 # we ignore any errors, as it is very common to receive
594 %state = (); 667 # Content-Length != 0 but no actual body
595 668 # we also access %hdr, as $_[1] might be an erro
596 # set-cookie processing 669 http_request (
597 if ($arg{cookie_jar}) { 670 $method => $hdr{location},
598 for ($_[1]{"set-cookie"}) {
599 # parse NAME=VALUE
600 my @kv;
601
602 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
603 my $name = $1;
604 my $value = $3;
605
606 unless ($value) {
607 $value = $2;
608 $value =~ s/\\(.)/$1/gs;
609 } 671 %arg,
672 recurse => $recurse - 1,
673 Redirect => [$_[0], \%hdr],
674 $cb);
675 } else {
676 $cb->($_[0], \%hdr);
677 }
678 };
610 679
611 push @kv, $name => $value; 680 my $len = $hdr{"content-length"};
612 681
613 last unless /\G\s*;/gc; 682 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
683 $finish->(undef, 598 => "Request cancelled by on_header");
684 } elsif (
685 $hdr{Status} =~ /^(?:1..|204|205|304)$/
686 or $method eq "HEAD"
687 or (defined $len && !$len)
688 ) {
689 # no body
690 $finish->("", undef, undef, 1);
691 } else {
692 # body handling, four different code paths
693 # for want_body_handle, on_body (2x), normal (2x)
694 # we might read too much here, but it does not matter yet (no pipelining)
695 if (!$redirect && $arg{want_body_handle}) {
696 $_[0]->on_eof (undef);
697 $_[0]->on_error (undef);
698 $_[0]->on_read (undef);
699
700 $finish->(delete $state{handle});
701
702 } elsif ($arg{on_body}) {
703 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
704 if ($len) {
705 $_[0]->on_read (sub {
706 $len -= length $_[0]{rbuf};
707
708 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
709 or $finish->(undef, 598 => "Request cancelled by on_body");
710
614 } 711 $len > 0
615 712 or $finish->("", undef, undef, 1);
616 last unless @kv;
617
618 my $name = shift @kv;
619 my %kv = (value => shift @kv, @kv);
620
621 my $cdom;
622 my $cpath = (delete $kv{path}) || "/";
623
624 if (exists $kv{domain}) {
625 $cdom = delete $kv{domain};
626
627 $cdom =~ s/^\.?/./; # make sure it starts with a "."
628
629 next if $cdom =~ /\.$/;
630
631 # this is not rfc-like and not netscape-like. go figure.
632 my $ndots = $cdom =~ y/.//;
633 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
634 } else {
635 $cdom = $uhost;
636 }
637
638 # store it
639 $arg{cookie_jar}{version} = 1;
640 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
641
642 redo if /\G\s*,/gc;
643 } 713 });
714 } else {
715 $_[0]->on_eof (sub {
716 $finish->("");
717 });
718 $_[0]->on_read (sub {
719 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
720 or $finish->(undef, 598 => "Request cancelled by on_body");
721 });
644 } 722 }
723 } else {
724 $_[0]->on_eof (undef);
645 725
646 if ($redirect && exists $hdr{location}) { 726 if ($len) {
647 # we ignore any errors, as it is very common to receive 727 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
648 # Content-Length != 0 but no actual body 728 $_[0]->on_read (sub {
649 # we also access %hdr, as $_[1] might be an erro 729 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
650 http_request ( 730 if $len <= length $_[0]{rbuf};
651 $method => $hdr{location},
652 %arg,
653 recurse => $recurse - 1,
654 Redirect => \@_,
655 $cb); 731 });
656 } else { 732 } else {
657 $cb->($_[0], $_[1]);
658 }
659 };
660
661 my $len = $hdr{"content-length"};
662
663 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
664 $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", @pseudo });
665 } elsif (
666 $hdr{Status} =~ /^(?:1..|[23]04)$/
667 or $method eq "HEAD"
668 or (defined $len && !$len)
669 ) {
670 # no body
671 $finish->("", \%hdr);
672 } else {
673 # body handling, four different code paths
674 # for want_body_handle, on_body (2x), normal (2x)
675 # we might read too much here, but it does not matter yet (no pers. connections)
676 if (!$redirect && $arg{want_body_handle}) {
677 $_[0]->on_eof (undef);
678 $_[0]->on_error (undef);
679 $_[0]->on_read (undef);
680
681 $finish->(delete $state{handle}, \%hdr);
682
683 } elsif ($arg{on_body}) {
684 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
685 if ($len) {
686 $_[0]->on_eof (undef);
687 $_[0]->on_read (sub {
688 $len -= length $_[0]{rbuf};
689
690 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
691 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
692
693 $len > 0
694 or $finish->("", \%hdr);
695 });
696 } else {
697 $_[0]->on_eof (sub {
698 $finish->("", \%hdr);
699 });
700 $_[0]->on_read (sub {
701 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
702 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
703 });
704 }
705 } else {
706 $_[0]->on_eof (undef);
707
708 if ($len) {
709 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
710 $_[0]->on_read (sub {
711 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
712 if $len <= length $_[0]{rbuf};
713 });
714 } else {
715 $_[0]->on_error (sub { 733 $_[0]->on_error (sub {
716 $! == Errno::EPIPE || !$! 734 ($! == Errno::EPIPE || !$!)
717 ? $finish->(delete $_[0]{rbuf}, \%hdr) 735 ? $finish->(delete $_[0]{rbuf})
718 : $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }); 736 : $finish->(undef, 599 => $_[2]);
719 }); 737 });
720 $_[0]->on_read (sub { }); 738 $_[0]->on_read (sub { });
721 }
722 } 739 }
723 } 740 }
724 }); 741 }
725 }); 742 });
726 }; 743 };
727 744
728 # now handle proxy-CONNECT method 745 # now handle proxy-CONNECT method
729 if ($proxy && $uscheme eq "https") { 746 if ($proxy && $uscheme eq "https") {
731 748
732 # maybe re-use $uauthority with patched port? 749 # maybe re-use $uauthority with patched port?
733 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 750 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
734 $state{handle}->push_read (line => $qr_nlnl, sub { 751 $state{handle}->push_read (line => $qr_nlnl, sub {
735 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 752 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
736 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", @pseudo })); 753 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
737 754
738 if ($2 == 200) { 755 if ($2 == 200) {
739 $rpath = $upath; 756 $rpath = $upath;
740 &$handle_actual_request; 757 &$handle_actual_request;
741 } else { 758 } else {
742 %state = (); 759 %state = ();
743 $cb->(undef, { Status => $2, Reason => $3, @pseudo }); 760 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
744 } 761 }
745 }); 762 });
746 } else { 763 } else {
747 &$handle_actual_request; 764 &$handle_actual_request;
748 } 765 }
766 };
749 767
750 }, $arg{on_prepare} || sub { $timeout }; 768 my $tcp_connect = $arg{tcp_connect}
769 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
770
771 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
772
751 }; 773 };
752 774
753 defined wantarray && AnyEvent::Util::guard { %state = () } 775 defined wantarray && AnyEvent::Util::guard { %state = () }
754} 776}
755 777
790string of the form C<http://host:port> (optionally C<https:...>), croaks 812string of the form C<http://host:port> (optionally C<https:...>), croaks
791otherwise. 813otherwise.
792 814
793To clear an already-set proxy, use C<undef>. 815To clear an already-set proxy, use C<undef>.
794 816
817=item $date = AnyEvent::HTTP::format_date $timestamp
818
819Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
820Date (RFC 2616).
821
822=item $timestamp = AnyEvent::HTTP::parse_date $date
823
824Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX
825timestamp, or C<undef> if the date cannot be parsed.
826
795=item $AnyEvent::HTTP::MAX_RECURSE 827=item $AnyEvent::HTTP::MAX_RECURSE
796 828
797The default value for the C<recurse> request parameter (default: C<10>). 829The default value for the C<recurse> request parameter (default: C<10>).
798 830
799=item $AnyEvent::HTTP::USERAGENT 831=item $AnyEvent::HTTP::USERAGENT
817connections. This number of can be useful for load-leveling. 849connections. This number of can be useful for load-leveling.
818 850
819=back 851=back
820 852
821=cut 853=cut
854
855our @month = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
856our @weekday = qw(Sun Mon Tue Wed Thu Fri Sat);
857
858sub format_date($) {
859 my ($time) = @_;
860
861 # RFC 822/1123 format
862 my ($S, $M, $H, $mday, $mon, $year, $wday, $yday, undef) = gmtime $time;
863
864 sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
865 $weekday[$wday], $mday, $month[$mon], $year + 1900,
866 $H, $M, $S;
867}
868
869sub parse_date($) {
870 my ($date) = @_;
871
872 my ($d, $m, $y, $H, $M, $S);
873
874 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
875 # RFC 822/1123, required by RFC 2616
876 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
877
878 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
879 # RFC 850
880 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
881
882 } elsif ($date =~ /^[A-Z][a-z][a-z] ([A-Z][a-z][a-z]) ([0-9 ][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ([0-9][0-9][0-9][0-9])$/) {
883 # ISO C's asctime
884 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
885 }
886 # other formats fail in the loop below
887
888 for (0..11) {
889 if ($m eq $month[$_]) {
890 require Time::Local;
891 return Time::Local::timegm ($S, $M, $H, $d, $_, $y);
892 }
893 }
894
895 undef
896}
822 897
823sub set_proxy($) { 898sub set_proxy($) {
824 if (length $_[0]) { 899 if (length $_[0]) {
825 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 900 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
826 or Carp::croak "$_[0]: invalid proxy URL"; 901 or Carp::croak "$_[0]: invalid proxy URL";
833# initialise proxy from environment 908# initialise proxy from environment
834eval { 909eval {
835 set_proxy $ENV{http_proxy}; 910 set_proxy $ENV{http_proxy};
836}; 911};
837 912
913=head2 SOCKS PROXIES
914
915Socks proxies are not directly supported by AnyEvent::HTTP. You can
916compile your perl to support socks, or use an external program such as
917F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
918transparently.
919
920Alternatively, for AnyEvent::HTTP only, you can use your own
921C<tcp_connect> function that does the proxy handshake - here is an example
922that works with socks4a proxies:
923
924 use Errno;
925 use AnyEvent::Util;
926 use AnyEvent::Socket;
927 use AnyEvent::Handle;
928
929 # host, port and username of/for your socks4a proxy
930 my $socks_host = "10.0.0.23";
931 my $socks_port = 9050;
932 my $socks_user = "";
933
934 sub socks4a_connect {
935 my ($host, $port, $connect_cb, $prepare_cb) = @_;
936
937 my $hdl = new AnyEvent::Handle
938 connect => [$socks_host, $socks_port],
939 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
940 on_error => sub { $connect_cb->() },
941 ;
942
943 $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);
944
945 $hdl->push_read (chunk => 8, sub {
946 my ($hdl, $chunk) = @_;
947 my ($status, $port, $ipn) = unpack "xCna4", $chunk;
948
949 if ($status == 0x5a) {
950 $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
951 } else {
952 $! = Errno::ENXIO; $connect_cb->();
953 }
954 });
955
956 $hdl
957 }
958
959Use C<socks4a_connect> instead of C<tcp_connect> when doing C<http_request>s,
960possibly after switching off other proxy types:
961
962 AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies
963
964 http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
965 my ($data, $headers) = @_;
966 ...
967 };
968
838=head1 SEE ALSO 969=head1 SEE ALSO
839 970
840L<AnyEvent>. 971L<AnyEvent>.
841 972
842=head1 AUTHOR 973=head1 AUTHOR

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines