ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.79 by root, Sat Jan 1 20:01:07 2011 UTC vs.
Revision 1.90 by root, Mon Jan 3 00:41:25 2011 UTC

36 36
37=cut 37=cut
38 38
39package AnyEvent::HTTP; 39package AnyEvent::HTTP;
40 40
41use strict; 41use common::sense;
42no warnings;
43 42
44use Errno (); 43use Errno ();
45 44
46use AnyEvent 5.0 (); 45use AnyEvent 5.0 ();
47use AnyEvent::Util (); 46use AnyEvent::Util ();
58our $MAX_PERSISTENT = 8; 57our $MAX_PERSISTENT = 8;
59our $PERSISTENT_TIMEOUT = 2; 58our $PERSISTENT_TIMEOUT = 2;
60our $TIMEOUT = 300; 59our $TIMEOUT = 300;
61 60
62# changing these is evil 61# changing these is evil
63our $MAX_PERSISTENT_PER_HOST = 0; 62our $MAX_PERSISTENT_PER_HOST = 2;
64our $MAX_PER_HOST = 4; 63our $MAX_PER_HOST = 4;
65 64
66our $PROXY; 65our $PROXY;
67our $ACTIVE = 0; 66our $ACTIVE = 0;
68 67
170C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
171will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 170will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
172(this can be suppressed by using C<undef> for these headers in which case 171(this can be suppressed by using C<undef> for these headers in which case
173they won't be sent at all). 172they won't be sent at all).
174 173
174You really should provide your own C<User-Agent:> header value that is
175appropriate for your program - I wouldn't be surprised if the default
176AnyEvent string gets blocked by webservers sooner or later.
177
175=item timeout => $seconds 178=item timeout => $seconds
176 179
177The time-out to use for various stages - each connect attempt will reset 180The time-out to use for various stages - each connect attempt will reset
178the timeout, as will read or write activity, i.e. this is not an overall 181the timeout, as will read or write activity, i.e. this is not an overall
179timeout. 182timeout.
196=item cookie_jar => $hash_ref 199=item cookie_jar => $hash_ref
197 200
198Passing this parameter enables (simplified) cookie-processing, loosely 201Passing this parameter enables (simplified) cookie-processing, loosely
199based on the original netscape specification. 202based on the original netscape specification.
200 203
201The C<$hash_ref> must be an (initially empty) hash reference which will 204The C<$hash_ref> must be an (initially empty) hash reference which
202get updated automatically. It is possible to save the cookie jar to 205will get updated automatically. It is possible to save the cookie jar
203persistent storage with something like JSON or Storable, but this is not 206to persistent storage with something like JSON or Storable - see the
204recommended, as session-only cookies might survive longer than expected. 207C<AnyEvent::HTTP::cookie_jar_expire> function if you wish to remove
208expired or session-only cookies, and also for documentation on the format
209of the cookie jar.
205 210
206Note that this cookie implementation is not meant to be complete. If 211Note that this cookie implementation is not meant to be complete. If
207you want complete cookie management you have to do that on your 212you want complete cookie management you have to do that on your
208own. C<cookie_jar> is meant as a quick fix to get some cookie-using sites 213own. C<cookie_jar> is meant as a quick fix to get most cookie-using sites
209working. Cookies are a privacy disaster, do not use them unless required 214working. Cookies are a privacy disaster, do not use them unless required
210to. 215to.
211 216
212When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:> 217When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
213headers will be set and handled by this module, otherwise they will be 218headers will be set and handled by this module, otherwise they will be
328Example: do a HTTP HEAD request on https://www.google.com/, use a 333Example: do a HTTP HEAD request on https://www.google.com/, use a
329timeout of 30 seconds. 334timeout of 30 seconds.
330 335
331 http_request 336 http_request
332 GET => "https://www.google.com", 337 GET => "https://www.google.com",
338 headers => { "user-agent" => "MySearchClient 1.0" },
333 timeout => 30, 339 timeout => 30,
334 sub { 340 sub {
335 my ($body, $hdr) = @_; 341 my ($body, $hdr) = @_;
336 use Data::Dumper; 342 use Data::Dumper;
337 print Dumper $hdr; 343 print Dumper $hdr;
378 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 384 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
379 385
380 _slot_schedule $_[0]; 386 _slot_schedule $_[0];
381} 387}
382 388
389#############################################################################
390
391# expire cookies
392sub cookie_jar_expire($;$) {
393 my ($jar, $session_end) = @_;
394
395 %$jar = () if $jar->{version} != 1;
396
397 my $anow = AE::now;
398
399 while (my ($chost, $paths) = each %$jar) {
400 next unless ref $paths;
401
402 while (my ($cpath, $cookies) = each %$paths) {
403 while (my ($cookie, $kv) = each %$cookies) {
404 if (exists $kv->{_expires}) {
405 delete $cookies->{$cookie}
406 if $anow > $kv->{_expires};
407 } elsif ($session_end) {
408 delete $cookies->{$cookie};
409 }
410 }
411
412 delete $paths->{$cpath}
413 unless %$cookies;
414 }
415
416 delete $jar->{$chost}
417 unless %$paths;
418 }
419}
420
383# extract cookies from jar 421# extract cookies from jar
384sub cookie_jar_extract($$$$) { 422sub cookie_jar_extract($$$$) {
385 my ($jar, $uscheme, $uhost, $upath) = @_; 423 my ($jar, $uscheme, $uhost, $upath) = @_;
386 424
387 %$jar = () if $jar->{version} != 1; 425 %$jar = () if $jar->{version} != 1;
403 next unless $cpath eq substr $upath, 0, length $cpath; 441 next unless $cpath eq substr $upath, 0, length $cpath;
404 442
405 while (my ($cookie, $kv) = each %$cookies) { 443 while (my ($cookie, $kv) = each %$cookies) {
406 next if $uscheme ne "https" && exists $kv->{secure}; 444 next if $uscheme ne "https" && exists $kv->{secure};
407 445
408 if (exists $kv->{expires}) { 446 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
409 if (AE::now > parse_date ($kv->{expires})) {
410 delete $cookies->{$cookie}; 447 delete $cookies->{$cookie};
411 next; 448 next;
412 }
413 } 449 }
414 450
415 my $value = $kv->{value}; 451 my $value = $kv->{value};
416 452
417 if ($value =~ /[=;,[:space:]]/) { 453 if ($value =~ /[=;,[:space:]]/) {
426 462
427 \@cookies 463 \@cookies
428} 464}
429 465
430# parse set_cookie header into jar 466# parse set_cookie header into jar
431sub cookie_jar_set_cookie($$$) { 467sub cookie_jar_set_cookie($$$$) {
432 my ($jar, $set_cookie, $uhost) = @_; 468 my ($jar, $set_cookie, $uhost, $date) = @_;
469
470 my $anow = int AE::now;
471 my $snow; # server-now
433 472
434 for ($set_cookie) { 473 for ($set_cookie) {
435 # parse NAME=VALUE 474 # parse NAME=VALUE
436 my @kv; 475 my @kv;
437 476
440 while ( 479 while (
441 m{ 480 m{
442 \G\s* 481 \G\s*
443 (?: 482 (?:
444 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+) 483 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
445 | ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) 484 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )?
446 ) 485 )
447 }gcxsi 486 }gcxsi
448 ) { 487 ) {
449 my $name = $2; 488 my $name = $2;
450 my $value = $4; 489 my $value = $4;
451 490
452 unless (defined $name) { 491 if (defined $1) {
453 # expires 492 # expires
454 $name = "expires"; 493 $name = "expires";
455 $value = $1; 494 $value = $1;
456 } elsif (!defined $value) { 495 } elsif (defined $3) {
457 # quoted 496 # quoted
458 $value = $3; 497 $value = $3;
459 $value =~ s/\\(.)/$1/gs; 498 $value =~ s/\\(.)/$1/gs;
460 } 499 }
461 500
467 last unless @kv; 506 last unless @kv;
468 507
469 my $name = shift @kv; 508 my $name = shift @kv;
470 my %kv = (value => shift @kv, @kv); 509 my %kv = (value => shift @kv, @kv);
471 510
472 $kv{expires} ||= format_date (AE::now + $kv{"max-age"})
473 if exists $kv{"max-age"}; 511 if (exists $kv{"max-age"}) {
512 $kv{_expires} = $anow + delete $kv{"max-age"};
513 } elsif (exists $kv{expires}) {
514 $snow ||= parse_date ($date) || $anow;
515 $kv{_expires} = $anow + (parse_date (delete $kv{expires}) - $snow);
516 } else {
517 delete $kv{_expires};
518 }
474 519
475 my $cdom; 520 my $cdom;
476 my $cpath = (delete $kv{path}) || "/"; 521 my $cpath = (delete $kv{path}) || "/";
477 522
478 if (exists $kv{domain}) { 523 if (exists $kv{domain}) {
489 $cdom = $uhost; 534 $cdom = $uhost;
490 } 535 }
491 536
492 # store it 537 # store it
493 $jar->{version} = 1; 538 $jar->{version} = 1;
494 $jar->{$cdom}{$cpath}{$name} = \%kv; 539 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
495 540
496 redo if /\G\s*,/gc; 541 redo if /\G\s*,/gc;
497 } 542 }
498} 543}
499 544
566 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 611 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
567 612
568 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 613 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
569 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 614 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
570 615
571 my $uhost = $1; 616 my $uhost = lc $1;
572 $uport = $2 if defined $2; 617 $uport = $2 if defined $2;
573 618
574 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 619 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
575 unless exists $hdr{host}; 620 unless exists $hdr{host};
576 621
595 $rscheme = "http" unless defined $rscheme; 640 $rscheme = "http" unless defined $rscheme;
596 641
597 # don't support https requests over https-proxy transport, 642 # don't support https requests over https-proxy transport,
598 # can't be done with tls as spec'ed, unless you double-encrypt. 643 # can't be done with tls as spec'ed, unless you double-encrypt.
599 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https"; 644 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
645
646 $rhost = lc $rhost;
647 $rscheme = lc $rscheme;
600 } else { 648 } else {
601 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath); 649 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
602 } 650 }
603 651
604 # leave out fragment and query string, just a heuristic 652 # leave out fragment and query string, just a heuristic
606 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 654 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
607 655
608 $hdr{"content-length"} = length $arg{body} 656 $hdr{"content-length"} = length $arg{body}
609 if length $arg{body} || $method ne "GET"; 657 if length $arg{body} || $method ne "GET";
610 658
611 $hdr{connection} = "close TE"; #1.1 659 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
660
661 # default value for keepalive is true iff the request is for an idempotent method
662 my $keepalive = exists $arg{keepalive}
663 ? $arg{keepalive}*1
664 : $idempotent ? $PERSISTENT_TIMEOUT : 0;
665
666 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1
612 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 667 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
613 668
614 my %state = (connect_guard => 1); 669 my %state = (connect_guard => 1);
615 670
616 _get_slot $uhost, sub {
617 $state{slot_guard} = shift;
618
619 return unless $state{connect_guard};
620
621 my $ae_error = 595; # connecting 671 my $ae_error = 595; # connecting
622 672
623 my $connect_cb = sub { 673 # handle actual, non-tunneled, request
624 $state{fh} = shift 674 my $handle_actual_request = sub {
675 $ae_error = 596; # request phase
676
677 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
678
679 # send request
680 $state{handle}->push_write (
681 "$method $rpath HTTP/1.1\015\012"
682 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
683 . "\015\012"
684 . (delete $arg{body})
685 );
686
687 # return if error occured during push_write()
688 return unless %state;
689
690 # reduce memory usage, save a kitten, also re-use it for the response headers.
691 %hdr = ();
692
693 # status line and headers
694 $state{read_response} = sub {
695 for ("$_[1]") {
696 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
697
698 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
699 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
700
701 # 100 Continue handling
702 # should not happen as we don't send expect: 100-continue,
703 # but we handle it just in case.
704 # since we send the request body regardless, if we get an error
705 # we are out of-sync, which we currently do NOT handle correctly.
706 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
707 if $2 eq 100;
708
709 push @pseudo,
710 HTTPVersion => $1,
711 Status => $2,
712 Reason => $3,
625 or do { 713 ;
626 my $err = "$!"; 714
715 my $hdr = parse_hdr
716 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
717
718 %hdr = (%$hdr, @pseudo);
719 }
720
721 # redirect handling
722 # microsoft and other shitheads don't give a shit for following standards,
723 # try to support some common forms of broken Location headers.
724 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
725 $hdr{location} =~ s/^\.\/+//;
726
727 my $url = "$rscheme://$uhost:$uport";
728
729 unless ($hdr{location} =~ s/^\///) {
730 $url .= $upath;
731 $url =~ s/\/[^\/]*$//;
732 }
733
734 $hdr{location} = "$url/$hdr{location}";
735 }
736
737 my $redirect;
738
739 if ($recurse) {
740 my $status = $hdr{Status};
741
742 # industry standard is to redirect POST as GET for
743 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
744 # also, the UA should ask the user for 301 and 307 and POST,
745 # industry standard seems to be to simply follow.
746 # we go with the industry standard.
747 if ($status == 301 or $status == 302 or $status == 303) {
748 # HTTP/1.1 is unclear on how to mutate the method
749 $method = "GET" unless $method eq "HEAD";
750 $redirect = 1;
751 } elsif ($status == 307) {
752 $redirect = 1;
753 }
754 }
755
756 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
757 my $may_keep_alive = $_[3];
758
759 $state{handle}->destroy if $state{handle};
627 %state = (); 760 %state = ();
628 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err }); 761
762 if (defined $_[1]) {
763 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
764 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
765 }
766
767 # set-cookie processing
768 if ($arg{cookie_jar}) {
769 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost, $hdr{date};
770 }
771
772 if ($redirect && exists $hdr{location}) {
773 # we ignore any errors, as it is very common to receive
774 # Content-Length != 0 but no actual body
775 # we also access %hdr, as $_[1] might be an erro
776 http_request (
777 $method => $hdr{location},
778 %arg,
779 recurse => $recurse - 1,
780 Redirect => [$_[0], \%hdr],
781 $cb);
782 } else {
783 $cb->($_[0], \%hdr);
784 }
785 };
786
787 $ae_error = 597; # body phase
788
789 my $len = $hdr{"content-length"};
790
791 # body handling, many different code paths
792 # - no body expected
793 # - want_body_handle
794 # - te chunked
795 # - 2x length known (with or without on_body)
796 # - 2x length not known (with or without on_body)
797 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
798 $finish->(undef, 598 => "Request cancelled by on_header");
799 } elsif (
800 $hdr{Status} =~ /^(?:1..|204|205|304)$/
801 or $method eq "HEAD"
802 or (defined $len && $len == 0) # == 0, not !, because "0 " is true
803 ) {
804 # no body
805 $finish->("", undef, undef, 1);
806
807 } elsif (!$redirect && $arg{want_body_handle}) {
808 $_[0]->on_eof (undef);
809 $_[0]->on_error (undef);
810 $_[0]->on_read (undef);
811
812 $finish->(delete $state{handle});
813
814 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
815 my $cl = 0;
816 my $body = undef;
817 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
818
819 $state{read_chunk} = sub {
820 $_[1] =~ /^([0-9a-fA-F]+)/
821 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
822
823 my $len = hex $1;
824
825 if ($len) {
826 $cl += $len;
827
828 $_[0]->push_read (chunk => $len, sub {
829 $on_body->($_[1], \%hdr)
830 or return $finish->(undef, 598 => "Request cancelled by on_body");
831
832 $_[0]->push_read (line => sub {
833 length $_[1]
834 and return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
835 $_[0]->push_read (line => $state{read_chunk});
836 });
837 });
838 } else {
839 $hdr{"content-length"} ||= $cl;
840
841 $_[0]->push_read (line => $qr_nlnl, sub {
842 if (length $_[1]) {
843 for ("$_[1]") {
844 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
845
846 my $hdr = parse_hdr
847 or return $finish->(undef, $ae_error => "Garbled response trailers");
848
849 %hdr = (%hdr, %$hdr);
850 }
851 }
852
853 $finish->($body, undef, undef, 1);
854 });
855 }
629 }; 856 };
630 857
858 $_[0]->push_read (line => $state{read_chunk});
859
860 } elsif ($arg{on_body}) {
861 if (defined $len) {
862 $_[0]->on_read (sub {
863 $len -= length $_[0]{rbuf};
864
865 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
866 or return $finish->(undef, 598 => "Request cancelled by on_body");
867
868 $len > 0
869 or $finish->("", undef, undef, 1);
870 });
871 } else {
872 $_[0]->on_eof (sub {
873 $finish->("");
874 });
875 $_[0]->on_read (sub {
876 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
877 or $finish->(undef, 598 => "Request cancelled by on_body");
878 });
879 }
880 } else {
881 $_[0]->on_eof (undef);
882
883 if (defined $len) {
884 $_[0]->on_read (sub {
885 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
886 if $len <= length $_[0]{rbuf};
887 });
888 } else {
889 $_[0]->on_error (sub {
890 ($! == Errno::EPIPE || !$!)
891 ? $finish->(delete $_[0]{rbuf})
892 : $finish->(undef, $ae_error => $_[2]);
893 });
894 $_[0]->on_read (sub { });
895 }
896 }
897 };
898
899 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
900 };
901
902 my $connect_cb = sub {
903 $state{fh} = shift
904 or do {
905 my $err = "$!";
906 %state = ();
907 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err });
908 };
909
631 return unless delete $state{connect_guard}; 910 return unless delete $state{connect_guard};
632 911
633 # get handle 912 # get handle
634 $state{handle} = new AnyEvent::Handle 913 $state{handle} = new AnyEvent::Handle
635 fh => $state{fh}, 914 fh => $state{fh},
636 peername => $rhost, 915 peername => $rhost,
637 tls_ctx => $arg{tls_ctx}, 916 tls_ctx => $arg{tls_ctx},
638 # these need to be reconfigured on keepalive handles 917 # these need to be reconfigured on keepalive handles
639 timeout => $timeout, 918 timeout => $timeout,
640 on_error => sub { 919 on_error => sub {
641 %state = (); 920 %state = ();
642 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] }); 921 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
643 }, 922 },
644 on_eof => sub { 923 on_eof => sub {
645 %state = (); 924 %state = ();
646 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" }); 925 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
647 }, 926 },
648 ; 927 ;
649 928
650 # limit the number of persistent connections 929 # limit the number of persistent connections
651 # keepalive not yet supported 930 # keepalive not yet supported
652# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 931# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
653# ++$KA_COUNT{$_[1]}; 932# ++$KA_COUNT{$_[1]};
654# $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 933# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
655# --$KA_COUNT{$_[1]} 934# --$KA_COUNT{$_[1]}
656# }; 935# };
657# $hdr{connection} = "keep-alive"; 936# $hdr{connection} = "keep-alive";
658# } 937# }
659 938
660 $state{handle}->starttls ("connect") if $rscheme eq "https"; 939 $state{handle}->starttls ("connect") if $rscheme eq "https";
661 940
662 # handle actual, non-tunneled, request
663 my $handle_actual_request = sub {
664 $ae_error = 596; # request phase
665
666 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
667
668 # send request
669 $state{handle}->push_write (
670 "$method $rpath HTTP/1.1\015\012"
671 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
672 . "\015\012"
673 . (delete $arg{body})
674 );
675
676 # return if error occured during push_write()
677 return unless %state;
678
679 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
680
681 # status line and headers
682 $state{read_response} = sub {
683 for ("$_[1]") {
684 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
685
686 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
687 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
688
689 # 100 Continue handling
690 # should not happen as we don't send expect: 100-continue,
691 # but we handle it just in case.
692 # since we send the request body regardless, if we get an error
693 # we are out of-sync, which we currently do NOT handle correctly.
694 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
695 if $2 eq 100;
696
697 push @pseudo,
698 HTTPVersion => $1,
699 Status => $2,
700 Reason => $3,
701 ;
702
703 my $hdr = parse_hdr
704 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
705
706 %hdr = (%$hdr, @pseudo);
707 }
708
709 # redirect handling
710 # microsoft and other shitheads don't give a shit for following standards,
711 # try to support some common forms of broken Location headers.
712 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
713 $hdr{location} =~ s/^\.\/+//;
714
715 my $url = "$rscheme://$uhost:$uport";
716
717 unless ($hdr{location} =~ s/^\///) {
718 $url .= $upath;
719 $url =~ s/\/[^\/]*$//;
720 }
721
722 $hdr{location} = "$url/$hdr{location}";
723 }
724
725 my $redirect;
726
727 if ($recurse) {
728 my $status = $hdr{Status};
729
730 # industry standard is to redirect POST as GET for
731 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
732 # also, the UA should ask the user for 301 and 307 and POST,
733 # industry standard seems to be to simply follow.
734 # we go with the industry standard.
735 if ($status == 301 or $status == 302 or $status == 303) {
736 # HTTP/1.1 is unclear on how to mutate the method
737 $method = "GET" unless $method eq "HEAD";
738 $redirect = 1;
739 } elsif ($status == 307) {
740 $redirect = 1;
741 }
742 }
743
744 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
745 my $may_keep_alive = $_[3];
746
747 $state{handle}->destroy if $state{handle};
748 %state = ();
749
750 if (defined $_[1]) {
751 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
752 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
753 }
754
755 # set-cookie processing
756 if ($arg{cookie_jar}) {
757 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost;
758 }
759
760 if ($redirect && exists $hdr{location}) {
761 # we ignore any errors, as it is very common to receive
762 # Content-Length != 0 but no actual body
763 # we also access %hdr, as $_[1] might be an erro
764 http_request (
765 $method => $hdr{location},
766 %arg,
767 recurse => $recurse - 1,
768 Redirect => [$_[0], \%hdr],
769 $cb);
770 } else {
771 $cb->($_[0], \%hdr);
772 }
773 };
774
775 $ae_error = 597; # body phase
776
777 my $len = $hdr{"content-length"};
778
779 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
780 $finish->(undef, 598 => "Request cancelled by on_header");
781 } elsif (
782 $hdr{Status} =~ /^(?:1..|204|205|304)$/
783 or $method eq "HEAD"
784 or (defined $len && !$len)
785 ) {
786 # no body
787 $finish->("", undef, undef, 1);
788 } else {
789 # body handling, many different code paths
790 # - no body expected
791 # - want_body_handle
792 # - te chunked
793 # - 2x length known (with or without on_body)
794 # - 2x length not known (with or without on_body)
795 if (!$redirect && $arg{want_body_handle}) {
796 $_[0]->on_eof (undef);
797 $_[0]->on_error (undef);
798 $_[0]->on_read (undef);
799
800 $finish->(delete $state{handle});
801
802 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
803 my $cl = 0;
804 my $body = undef;
805 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
806
807 my $read_chunk; $read_chunk = sub {
808 $_[1] =~ /^([0-9a-fA-F]+)/
809 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
810
811 my $len = hex $1;
812
813 if ($len) {
814 $cl += $len;
815
816 $_[0]->push_read (chunk => $len, sub {
817 $on_body->($_[1], \%hdr)
818 or return $finish->(undef, 598 => "Request cancelled by on_body");
819
820 $_[0]->push_read (line => sub {
821 length $_[1]
822 and return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
823 $_[0]->push_read (line => $read_chunk);
824 });
825 });
826 } else {
827 $hdr{"content-length"} ||= $cl;
828
829 $_[0]->push_read (line => $qr_nlnl, sub {
830 if (length $_[1]) {
831 for ("$_[1]") {
832 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
833
834 my $hdr = parse_hdr
835 or return $finish->(undef, $ae_error => "Garbled response trailers");
836
837 %hdr = (%hdr, %$hdr);
838 }
839 }
840
841 $finish->($body, undef, undef, 1);
842 });
843 }
844 };
845
846 $_[0]->push_read (line => $read_chunk);
847
848 } elsif ($arg{on_body}) {
849 if ($len) {
850 $_[0]->on_read (sub {
851 $len -= length $_[0]{rbuf};
852
853 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
854 or return $finish->(undef, 598 => "Request cancelled by on_body");
855
856 $len > 0
857 or $finish->("", undef, undef, 1);
858 });
859 } else {
860 $_[0]->on_eof (sub {
861 $finish->("");
862 });
863 $_[0]->on_read (sub {
864 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
865 or $finish->(undef, 598 => "Request cancelled by on_body");
866 });
867 }
868 } else {
869 $_[0]->on_eof (undef);
870
871 if ($len) {
872 $_[0]->on_read (sub {
873 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
874 if $len <= length $_[0]{rbuf};
875 });
876 } else {
877 $_[0]->on_error (sub {
878 ($! == Errno::EPIPE || !$!)
879 ? $finish->(delete $_[0]{rbuf})
880 : $finish->(undef, $ae_error => $_[2]);
881 });
882 $_[0]->on_read (sub { });
883 }
884 }
885 }
886 };
887
888 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
889 };
890
891 # now handle proxy-CONNECT method 941 # now handle proxy-CONNECT method
892 if ($proxy && $uscheme eq "https") { 942 if ($proxy && $uscheme eq "https") {
893 # oh dear, we have to wrap it into a connect request 943 # oh dear, we have to wrap it into a connect request
894 944
895 # maybe re-use $uauthority with patched port? 945 # maybe re-use $uauthority with patched port?
896 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 946 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
897 $state{handle}->push_read (line => $qr_nlnl, sub { 947 $state{handle}->push_read (line => $qr_nlnl, sub {
898 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 948 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
899 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 949 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
900 950
901 if ($2 == 200) { 951 if ($2 == 200) {
902 $rpath = $upath; 952 $rpath = $upath;
903 &$handle_actual_request; 953 $handle_actual_request->();
904 } else { 954 } else {
905 %state = (); 955 %state = ();
906 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 956 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
907 }
908 }); 957 }
909 } else {
910 &$handle_actual_request;
911 } 958 });
959 } else {
960 $handle_actual_request->();
912 }; 961 }
962 };
963
964 _get_slot $uhost, sub {
965 $state{slot_guard} = shift;
966
967 return unless $state{connect_guard};
913 968
914 my $tcp_connect = $arg{tcp_connect} 969 my $tcp_connect = $arg{tcp_connect}
915 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 970 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
916 971
917 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 972 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
918
919 }; 973 };
920 974
921 defined wantarray && AnyEvent::Util::guard { %state = () } 975 defined wantarray && AnyEvent::Util::guard { %state = () }
922} 976}
923 977
957Sets the default proxy server to use. The proxy-url must begin with a 1011Sets the default proxy server to use. The proxy-url must begin with a
958string of the form C<http://host:port> (optionally C<https:...>), croaks 1012string of the form C<http://host:port> (optionally C<https:...>), croaks
959otherwise. 1013otherwise.
960 1014
961To clear an already-set proxy, use C<undef>. 1015To clear an already-set proxy, use C<undef>.
1016
1017=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1018
1019Remove all cookies from the cookie jar that have been expired. If
1020C<$session_end> is given and true, then additionally remove all session
1021cookies.
1022
1023You should call this function (with a true C<$session_end>) before you
1024save cookies to disk, and you should call this function after loading them
1025again. If you have a long-running program you can additonally call this
1026function from time to time.
1027
1028A cookie jar is initially an empty hash-reference that is managed by this
1029module. It's format is subject to change, but currently it is like this:
1030
1031The key C<version> has to contain C<1>, otherwise the hash gets
1032emptied. All other keys are hostnames or IP addresses pointing to
1033hash-references. The key for these inner hash references is the
1034server path for which this cookie is meant, and the values are again
1035hash-references. The keys of those hash-references is the cookie name, and
1036the value, you guessed it, is another hash-reference, this time with the
1037key-value pairs from the cookie, except for C<expires> and C<max-age>,
1038which have been replaced by a C<_expires> key that contains the cookie
1039expiry timestamp.
1040
1041Here is an example of a cookie jar with a single cookie, so you have a
1042chance of understanding the above paragraph:
1043
1044 {
1045 version => 1,
1046 "10.0.0.1" => {
1047 "/" => {
1048 "mythweb_id" => {
1049 _expires => 1293917923,
1050 value => "ooRung9dThee3ooyXooM1Ohm",
1051 },
1052 },
1053 },
1054 }
962 1055
963=item $date = AnyEvent::HTTP::format_date $timestamp 1056=item $date = AnyEvent::HTTP::format_date $timestamp
964 1057
965Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP 1058Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
966Date (RFC 2616). 1059Date (RFC 2616).

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines