/cvs/AnyEvent-HTTP/HTTP.pm

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.73 by root, Fri Dec 31 21:00:36 2010 UTC vs.
Revision 1.91 by root, Mon Jan 3 01:03:29 2011 UTC

36 36
37=cut 37=cut
38 38
39package AnyEvent::HTTP; 39package AnyEvent::HTTP;
40 40
41use strict; 41use common::sense;
42no warnings;
43 42
44use Errno (); 43use Errno ();
45 44
46use AnyEvent 5.0 (); 45use AnyEvent 5.0 ();
47use AnyEvent::Util (); 46use AnyEvent::Util ();
58our $MAX_PERSISTENT = 8; 57our $MAX_PERSISTENT = 8;
59our $PERSISTENT_TIMEOUT = 2; 58our $PERSISTENT_TIMEOUT = 2;
60our $TIMEOUT = 300; 59our $TIMEOUT = 300;
61 60
62# changing these is evil 61# changing these is evil
63our $MAX_PERSISTENT_PER_HOST = 0; 62our $MAX_PERSISTENT_PER_HOST = 2;
64our $MAX_PER_HOST = 4; 63our $MAX_PER_HOST = 4;
65 64
66our $PROXY; 65our $PROXY;
67our $ACTIVE = 0; 66our $ACTIVE = 0;
68 67
122 121
123If the server sends a header multiple times, then their contents will be 122If the server sends a header multiple times, then their contents will be
124joined together with a comma (C<,>), as per the HTTP spec. 123joined together with a comma (C<,>), as per the HTTP spec.
125 124
126If an internal error occurs, such as not being able to resolve a hostname, 125If an internal error occurs, such as not being able to resolve a hostname,
127then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x> 126then C<$data> will be C<undef>, C<< $headers->{Status} >> will be
128(usually C<599>) and the C<Reason> pseudo-header will contain an error 127C<590>-C<599> and the C<Reason> pseudo-header will contain an error
129message. 128message. Currently the following status codes are used:
129
130=over 4
131
132=item 595 - errors during connection establishment, proxy handshake.
133
134=item 596 - errors during TLS negotiation, request sending and header processing.
135
136=item 597 - errors during body receiving or processing.
137
138=item 598 - user aborted request via C<on_header> or C<on_body>.
139
140=item 599 - other, usually nonretryable, errors (garbled URL etc.).
141
142=back
130 143
131A typical callback might look like this: 144A typical callback might look like this:
132 145
133 sub { 146 sub {
134 my ($body, $hdr) = @_; 147 my ($body, $hdr) = @_;
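The 590-599 range documented in this hunk is produced by the module itself rather than by the remote server, so a callback can tell transport failures apart from real HTTP responses. The following is a minimal sketch of such a helper. The name `classify_status` and the category labels are hypothetical and not part of AnyEvent::HTTP; any unlisted code in 590-594 is simply reported as "internal".

```perl
use strict;
use warnings;

# Hypothetical mapping of the internal status codes documented in
# revision 1.91 to coarse categories a caller might act on.
my %INTERNAL = (
   595 => "connect",   # connection establishment or proxy handshake
   596 => "request",   # TLS negotiation, request sending, header processing
   597 => "body",      # receiving or processing the body
   598 => "aborted",   # on_header or on_body returned false
   599 => "other",     # usually non-retryable (garbled URL etc.)
);

# Returns a category for the value found in $hdr->{Status}.
sub classify_status {
   my ($status) = @_;

   # 590-599 are synthesized by the module, not sent by the server
   return $INTERNAL{$status} // "internal"
      if $status >= 590 && $status <= 599;

   return "server";
}

# Usage with header hashes shaped like the ones passed to the callback.
for my $hdr ({ Status => 595, Reason => "Connection refused" },
             { Status => 404, Reason => "Not Found" }) {
   printf "%d %s -> %s\n",
      $hdr->{Status}, $hdr->{Reason}, classify_status ($hdr->{Status});
}
```

Inside a real `http_request` callback one would call `classify_status ($hdr->{Status})` and, for example, only consider retrying on `connect`.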
156C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
157will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 170will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
158(this can be suppressed by using C<undef> for these headers in which case 171(this can be suppressed by using C<undef> for these headers in which case
159they won't be sent at all). 172they won't be sent at all).
160 173
174You really should provide your own C<User-Agent:> header value that is
175appropriate for your program - I wouldn't be surprised if the default
176AnyEvent string gets blocked by webservers sooner or later.
177
161=item timeout => $seconds 178=item timeout => $seconds
162 179
163The time-out to use for various stages - each connect attempt will reset 180The time-out to use for various stages - each connect attempt will reset
164the timeout, as will read or write activity, i.e. this is not an overall 181the timeout, as will read or write activity, i.e. this is not an overall
165timeout. 182timeout.
182=item cookie_jar => $hash_ref 199=item cookie_jar => $hash_ref
183 200
184Passing this parameter enables (simplified) cookie-processing, loosely 201Passing this parameter enables (simplified) cookie-processing, loosely
185based on the original netscape specification. 202based on the original netscape specification.
186 203
187The C<$hash_ref> must be an (initially empty) hash reference which will 204The C<$hash_ref> must be an (initially empty) hash reference which
188get updated automatically. It is possible to save the cookie jar to 205will get updated automatically. It is possible to save the cookie jar
189persistent storage with something like JSON or Storable, but this is not 206to persistent storage with something like JSON or Storable - see the
190recommended, as session-only cookies might survive longer than expected. 207C<AnyEvent::HTTP::cookie_jar_expire> function if you wish to remove
208expired or session-only cookies, and also for documentation on the format
209of the cookie jar.
191 210
192Note that this cookie implementation is not meant to be complete. If 211Note that this cookie implementation is not meant to be complete. If
193you want complete cookie management you have to do that on your 212you want complete cookie management you have to do that on your
194own. C<cookie_jar> is meant as a quick fix to get some cookie-using sites 213own. C<cookie_jar> is meant as a quick fix to get most cookie-using sites
195working. Cookies are a privacy disaster, do not use them unless required 214working. Cookies are a privacy disaster, do not use them unless required
196to. 215to.
197 216
198When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:> 217When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
199headers will be set and handled by this module, otherwise they will be 218headers will be set and handled by this module, otherwise they will be
314Example: do a HTTP HEAD request on https://www.google.com/, use a 333Example: do a HTTP HEAD request on https://www.google.com/, use a
315timeout of 30 seconds. 334timeout of 30 seconds.
316 335
317 http_request 336 http_request
318 GET => "https://www.google.com", 337 GET => "https://www.google.com",
338 headers => { "user-agent" => "MySearchClient 1.0" },
319 timeout => 30, 339 timeout => 30,
320 sub { 340 sub {
321 my ($body, $hdr) = @_; 341 my ($body, $hdr) = @_;
322 use Data::Dumper; 342 use Data::Dumper;
323 print Dumper $hdr; 343 print Dumper $hdr;
364 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 384 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
365 385
366 _slot_schedule $_[0]; 386 _slot_schedule $_[0];
367} 387}
368 388
389#############################################################################
390
391# expire cookies
392sub cookie_jar_expire($;$) {
393 my ($jar, $session_end) = @_;
394
395 %$jar = () if $jar->{version} != 1;
396
397 my $anow = AE::now;
398
399 while (my ($chost, $paths) = each %$jar) {
400 next unless ref $paths;
401
402 while (my ($cpath, $cookies) = each %$paths) {
403 while (my ($cookie, $kv) = each %$cookies) {
404 if (exists $kv->{_expires}) {
405 delete $cookies->{$cookie}
406 if $anow > $kv->{_expires};
407 } elsif ($session_end) {
408 delete $cookies->{$cookie};
409 }
410 }
411
412 delete $paths->{$cpath}
413 unless %$cookies;
414 }
415
416 delete $jar->{$chost}
417 unless %$paths;
418 }
419}
420
369# extract cookies from jar 421# extract cookies from jar
370sub cookie_jar_extract($$$$) { 422sub cookie_jar_extract($$$$) {
371 my ($jar, $uscheme, $uhost, $upath) = @_; 423 my ($jar, $uscheme, $uhost, $upath) = @_;
372 424
373 %$jar = () if $jar->{version} != 1; 425 %$jar = () if $jar->{version} != 1;
389 next unless $cpath eq substr $upath, 0, length $cpath; 441 next unless $cpath eq substr $upath, 0, length $cpath;
390 442
391 while (my ($cookie, $kv) = each %$cookies) { 443 while (my ($cookie, $kv) = each %$cookies) {
392 next if $uscheme ne "https" && exists $kv->{secure}; 444 next if $uscheme ne "https" && exists $kv->{secure};
393 445
394 if (exists $kv->{expires}) { 446 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
395 if (AE::now > parse_date ($kv->{expires})) {
396 delete $cookies->{$cookie}; 447 delete $cookies->{$cookie};
397 next; 448 next;
398 }
399 } 449 }
400 450
401 my $value = $kv->{value}; 451 my $value = $kv->{value};
402 452
403 if ($value =~ /[=;,[:space:]]/) { 453 if ($value =~ /[=;,[:space:]]/) {
412 462
413 \@cookies 463 \@cookies
414} 464}
415 465
416# parse set_cookie header into jar 466# parse set_cookie header into jar
417sub cookie_jar_set_cookie($$$) { 467sub cookie_jar_set_cookie($$$$) {
418 my ($jar, $set_cookie, $uhost) = @_; 468 my ($jar, $set_cookie, $uhost, $date) = @_;
469
470 my $anow = int AE::now;
471 my $snow; # server-now
419 472
420 for ($set_cookie) { 473 for ($set_cookie) {
421 # parse NAME=VALUE 474 # parse NAME=VALUE
422 my @kv; 475 my @kv;
423 476
477 # expires is not http-compliant in the original cookie-spec,
478 # we support the official date format and some extensions
424 while ( 479 while (
425 m{ 480 m{
426 \G\s* 481 \G\s*
427 (?: 482 (?:
428 expires \s*=\s* ([A-Z][a-z][a-z],\ [^,;]+) 483 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
429 | ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) 484 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )?
430 ) 485 )
431 }gcxsi 486 }gcxsi
432 ) { 487 ) {
433 my $name = $2; 488 my $name = $2;
434 my $value = $4; 489 my $value = $4;
435 490
436 unless (defined $name) { 491 if (defined $1) {
437 # expires 492 # expires
438 $name = "expires"; 493 $name = "expires";
439 $value = $1; 494 $value = $1;
440 } elsif (!defined $value) { 495 } elsif (defined $3) {
441 # quoted 496 # quoted
442 $value = $3; 497 $value = $3;
443 $value =~ s/\\(.)/$1/gs; 498 $value =~ s/\\(.)/$1/gs;
444 } 499 }
445 500
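The regex change in this hunk does two things: `[A-Z][a-z][a-z]+,` lets `expires` accept full weekday names such as `Sunday,` (the old pattern only accepted three-letter names, so the comma inside such a date would be taken as a cookie separator), and the `(?: \s*=\s* ... )?` group makes the value optional so bare attributes like `HttpOnly` parse. The sketch below reuses the revised pattern in a standalone tokenizer. The function name is hypothetical, and the `;` handling and lowercasing of attribute names are assumptions, since the lines between the match and `last unless @kv` are not shown in this diff.

```perl
use strict;
use warnings;

# Hypothetical standalone tokenizer for ONE cookie of a Set-Cookie header.
# Returns a flat (name, value, attr, value, ...) list; attributes without
# a value (e.g. "HttpOnly") get undef.
sub tokenize_set_cookie {
   my ($header) = @_;
   my @kv;

   for ($header) {
      while (
         m{
            \G\s*
            (?:
               expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
               | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )?
            )
         }gcxsi
      ) {
         my ($name, $value) = ($2, $4);

         if (defined $1) {        # expires=..., whose date contains a comma
            ($name, $value) = ("expires", $1);
         } elsif (defined $3) {   # quoted value: undo backslash escapes
            $value = $3;
            $value =~ s/\\(.)/$1/gs;
         }

         # keep the cookie name as-is, lowercase attribute names
         push @kv, (@kv ? lc ($name) : $name), $value;

         last unless /\G\s*;/gc;  # attributes are separated by ";"
      }
   }

   @kv
}

my @kv = tokenize_set_cookie ('id=abc; expires=Sunday, 06-Nov-2011 08:49:37 GMT; path=/; HttpOnly');
print join (" | ", map { defined ? $_ : "(undef)" } @kv), "\n";
```

As the unchanged line `my %kv = (value => shift @kv, @kv);` shows, the module then treats the first value as the cookie value and the remaining pairs as attributes.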
451 last unless @kv; 506 last unless @kv;
452 507
453 my $name = shift @kv; 508 my $name = shift @kv;
454 my %kv = (value => shift @kv, @kv); 509 my %kv = (value => shift @kv, @kv);
455 510
456 $kv{expires} ||= format_date (AE::now + $kv{"max-age"})
457 if exists $kv{"max-age"}; 511 if (exists $kv{"max-age"}) {
512 $kv{_expires} = $anow + delete $kv{"max-age"};
513 } elsif (exists $kv{expires}) {
514 $snow ||= parse_date ($date) || $anow;
515 $kv{_expires} = $anow + (parse_date (delete $kv{expires}) - $snow);
516 } else {
517 delete $kv{_expires};
518 }
458 519
459 my $cdom; 520 my $cdom;
460 my $cpath = (delete $kv{path}) || "/"; 521 my $cpath = (delete $kv{path}) || "/";
461 522
462 if (exists $kv{domain}) { 523 if (exists $kv{domain}) {
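The new `_expires` computation in this hunk corrects for clock skew: `max-age` is relative and is added to the local time directly, while an absolute `expires` date is converted to "seconds after the server's `Date:` header" and then added to the local time. Below is a small sketch of that arithmetic with the clocks passed in explicitly. The function name is hypothetical, and `expires` is assumed to be already converted to epoch seconds (the module calls `parse_date` for that).

```perl
use strict;
use warnings;

# Hypothetical restatement of the expiry logic.
#   $anow       - local epoch time when the response arrived (int AE::now)
#   $server_now - server's Date: header as epoch seconds, or undef
#   %attr       - cookie attributes, "expires" already in epoch seconds
sub cookie_expiry {
   my ($anow, $server_now, %attr) = @_;

   # max-age wins and needs no clock correction
   return $anow + $attr{"max-age"}
      if exists $attr{"max-age"};

   if (exists $attr{expires}) {
      # fall back to the local clock if the server sent no usable Date:
      my $snow = $server_now || $anow;
      # keep only the distance between Expires and the server's clock
      return $anow + ($attr{expires} - $snow);
   }

   return undef;   # session cookie: no _expires key is stored
}

# Local clock is 1000 s behind the server; cookie lives one hour.
print cookie_expiry (1_000_000, 1_001_000, expires => 1_004_600), "\n";
```

Without this correction, a client whose clock runs behind the server would keep cookies longer than intended, and one running ahead would drop them early.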
473 $cdom = $uhost; 534 $cdom = $uhost;
474 } 535 }
475 536
476 # store it 537 # store it
477 $jar->{version} = 1; 538 $jar->{version} = 1;
478 $jar->{$cdom}{$cpath}{$name} = \%kv; 539 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
479 540
480 redo if /\G\s*,/gc; 541 redo if /\G\s*,/gc;
481 } 542 }
482} 543}
483 544
550 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 611 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
551 612
552 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 613 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
553 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 614 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
554 615
555 my $uhost = $1; 616 my $uhost = lc $1;
556 $uport = $2 if defined $2; 617 $uport = $2 if defined $2;
557 618
558 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 619 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
559 unless exists $hdr{host}; 620 unless exists $hdr{host};
560 621
579 $rscheme = "http" unless defined $rscheme; 640 $rscheme = "http" unless defined $rscheme;
580 641
581 # don't support https requests over https-proxy transport, 642 # don't support https requests over https-proxy transport,
582 # can't be done with tls as spec'ed, unless you double-encrypt. 643 # can't be done with tls as spec'ed, unless you double-encrypt.
583 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https"; 644 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
645
646 $rhost = lc $rhost;
647 $rscheme = lc $rscheme;
584 } else { 648 } else {
585 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath); 649 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
586 } 650 }
587 651
588 # leave out fragment and query string, just a heuristic 652 # leave out fragment and query string, just a heuristic
590 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 654 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
591 655
592 $hdr{"content-length"} = length $arg{body} 656 $hdr{"content-length"} = length $arg{body}
593 if length $arg{body} || $method ne "GET"; 657 if length $arg{body} || $method ne "GET";
594 658
595 $hdr{connection} = "close TE"; #1.1 659 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
660
661 # default value for keepalive is true iff the request is for an idempotent method
662 my $keepalive = exists $arg{keepalive}
663 ? $arg{keepalive}*1
664 : $idempotent ? $PERSISTENT_TIMEOUT : 0;
665
666 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1
596 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 667 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
597 668
598 my %state = (connect_guard => 1); 669 my %state = (connect_guard => 1);
599 670
600 _get_slot $uhost, sub { 671 my $ae_error = 595; # connecting
601 $state{slot_guard} = shift;
602 672
673 # handle actual, non-tunneled, request
674 my $handle_actual_request = sub {
675 $ae_error = 596; # request phase
676
677 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
678
679 # send request
680 $state{handle}->push_write (
681 "$method $rpath HTTP/1.1\015\012"
682 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
683 . "\015\012"
684 . (delete $arg{body})
685 );
686
687 # return if error occurred during push_write()
603 return unless $state{connect_guard}; 688 return unless %state;
604 689
605 my $connect_cb = sub { 690 # reduce memory usage, save a kitten, also re-use it for the response headers.
606 $state{fh} = shift 691 %hdr = ();
692
693 # status line and headers
694 $state{read_response} = sub {
695 for ("$_[1]") {
696 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
697
698 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
699 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
700
701 # 100 Continue handling
702 # should not happen as we don't send expect: 100-continue,
703 # but we handle it just in case.
704 # since we send the request body regardless, if we get an error
705 # we are out of-sync, which we currently do NOT handle correctly.
706 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
707 if $2 eq 100;
708
709 push @pseudo,
710 HTTPVersion => $1,
711 Status => $2,
712 Reason => $3,
607 or do { 713 ;
608 my $err = "$!"; 714
715 my $hdr = parse_hdr
716 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
717
718 %hdr = (%$hdr, @pseudo);
719 }
720
721 # redirect handling
722 # microsoft and other shitheads don't give a shit for following standards,
723 # try to support some common forms of broken Location headers.
724 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
725 $hdr{location} =~ s/^\.\/+//;
726
727 my $url = "$rscheme://$uhost:$uport";
728
729 unless ($hdr{location} =~ s/^\///) {
730 $url .= $upath;
731 $url =~ s/\/[^\/]*$//;
732 }
733
734 $hdr{location} = "$url/$hdr{location}";
735 }
736
737 my $redirect;
738
739 if ($recurse) {
740 my $status = $hdr{Status};
741
742 # industry standard is to redirect POST as GET for
743 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
744 # also, the UA should ask the user for 301 and 307 and POST,
745 # industry standard seems to be to simply follow.
746 # we go with the industry standard.
747 if ($status == 301 or $status == 302 or $status == 303) {
748 # HTTP/1.1 is unclear on how to mutate the method
749 $method = "GET" unless $method eq "HEAD";
750 $redirect = 1;
751 } elsif ($status == 307) {
752 $redirect = 1;
753 }
754 }
755
756 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
757 my $may_keep_alive = $_[3];
758
759 $state{handle}->destroy if $state{handle};
609 %state = (); 760 %state = ();
610 return $cb->(undef, { @pseudo, Status => 599, Reason => $err }); 761
762 if (defined $_[1]) {
763 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
764 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
765 }
766
767 # set-cookie processing
768 if ($arg{cookie_jar}) {
769 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost, $hdr{date};
770 }
771
772 if ($redirect && exists $hdr{location}) {
773 # we ignore any errors, as it is very common to receive
774 # Content-Length != 0 but no actual body
775 # we also access %hdr, as $_[1] might be an error
776 http_request (
777 $method => $hdr{location},
778 %arg,
779 recurse => $recurse - 1,
780 Redirect => [$_[0], \%hdr],
781 $cb);
782 } else {
783 $cb->($_[0], \%hdr);
784 }
785 };
786
787 $ae_error = 597; # body phase
788
789 my $chunked = $hdr{"transfer-encoding"} =~ /\bchunked\b/i; # not quite correct...
790
791 my $len = $chunked ? undef : $hdr{"content-length"};
792
793 # body handling, many different code paths
794 # - no body expected
795 # - want_body_handle
796 # - te chunked
797 # - 2x length known (with or without on_body)
798 # - 2x length not known (with or without on_body)
799 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
800 $finish->(undef, 598 => "Request cancelled by on_header");
801 } elsif (
802 $hdr{Status} =~ /^(?:1..|204|205|304)$/
803 or $method eq "HEAD"
804 or (defined $len && $len == 0) # == 0, not !, because "0 " is true
805 ) {
806 # no body
807 $finish->("", undef, undef, 1);
808
809 } elsif (!$redirect && $arg{want_body_handle}) {
810 $_[0]->on_eof (undef);
811 $_[0]->on_error (undef);
812 $_[0]->on_read (undef);
813
814 $finish->(delete $state{handle});
815
816 } elsif ($chunked) {
817 my $cl = 0;
818 my $body = undef;
819 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
820
821 $state{read_chunk} = sub {
822 $_[1] =~ /^([0-9a-fA-F]+)/
823 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
824
825 my $len = hex $1;
826
827 if ($len) {
828 $cl += $len;
829
830 $_[0]->push_read (chunk => $len, sub {
831 $on_body->($_[1], \%hdr)
832 or return $finish->(undef, 598 => "Request cancelled by on_body");
833
834 $_[0]->push_read (line => sub {
835 length $_[1]
836 and return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
837 $_[0]->push_read (line => $state{read_chunk});
838 });
839 });
840 } else {
841 $hdr{"content-length"} ||= $cl;
842
843 $_[0]->push_read (line => $qr_nlnl, sub {
844 if (length $_[1]) {
845 for ("$_[1]") {
846 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
847
848 my $hdr = parse_hdr
849 or return $finish->(undef, $ae_error => "Garbled response trailers");
850
851 %hdr = (%hdr, %$hdr);
852 }
853 }
854
855 $finish->($body, undef, undef, 1);
856 });
857 }
611 }; 858 };
612 859
613 pop; # free memory, save a tree 860 $_[0]->push_read (line => $state{read_chunk});
614 861
862 } elsif ($arg{on_body}) {
863 if (defined $len) {
864 $_[0]->on_read (sub {
865 $len -= length $_[0]{rbuf};
866
867 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
868 or return $finish->(undef, 598 => "Request cancelled by on_body");
869
870 $len > 0
871 or $finish->("", undef, undef, 1);
872 });
873 } else {
874 $_[0]->on_eof (sub {
875 $finish->("");
876 });
877 $_[0]->on_read (sub {
878 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
879 or $finish->(undef, 598 => "Request cancelled by on_body");
880 });
881 }
882 } else {
883 $_[0]->on_eof (undef);
884
885 if (defined $len) {
886 $_[0]->on_read (sub {
887 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
888 if $len <= length $_[0]{rbuf};
889 });
890 } else {
891 $_[0]->on_error (sub {
892 ($! == Errno::EPIPE || !$!)
893 ? $finish->(delete $_[0]{rbuf})
894 : $finish->(undef, $ae_error => $_[2]);
895 });
896 $_[0]->on_read (sub { });
897 }
898 }
899 };
900
901 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
902 };
903
904 my $connect_cb = sub {
905 $state{fh} = shift
906 or do {
907 my $err = "$!";
908 %state = ();
909 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err });
910 };
911
615 return unless delete $state{connect_guard}; 912 return unless delete $state{connect_guard};
616 913
617 # get handle 914 # get handle
618 $state{handle} = new AnyEvent::Handle 915 $state{handle} = new AnyEvent::Handle
619 fh => $state{fh}, 916 fh => $state{fh},
620 peername => $rhost, 917 peername => $rhost,
621 tls_ctx => $arg{tls_ctx}, 918 tls_ctx => $arg{tls_ctx},
622 # these need to be reconfigured on keepalive handles 919 # these need to be reconfigured on keepalive handles
623 timeout => $timeout, 920 timeout => $timeout,
624 on_error => sub { 921 on_error => sub {
625 %state = (); 922 %state = ();
626 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] }); 923 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
627 }, 924 },
628 on_eof => sub { 925 on_eof => sub {
629 %state = (); 926 %state = ();
630 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" }); 927 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
631 }, 928 },
632 ; 929 ;
633 930
634 # limit the number of persistent connections 931 # limit the number of persistent connections
635 # keepalive not yet supported 932 # keepalive not yet supported
636# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 933# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
637# ++$KA_COUNT{$_[1]}; 934# ++$KA_COUNT{$_[1]};
638# $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 935# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
639# --$KA_COUNT{$_[1]} 936# --$KA_COUNT{$_[1]}
640# }; 937# };
641# $hdr{connection} = "keep-alive"; 938# $hdr{connection} = "keep-alive";
642# } 939# }
643 940
644 $state{handle}->starttls ("connect") if $rscheme eq "https"; 941 $state{handle}->starttls ("connect") if $rscheme eq "https";
645 942
646 # handle actual, non-tunneled, request
647 my $handle_actual_request = sub {
648 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
649
650 # send request
651 $state{handle}->push_write (
652 "$method $rpath HTTP/1.1\015\012"
653 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
654 . "\015\012"
655 . (delete $arg{body})
656 );
657
658 # return if error occurred during push_write()
659 return unless %state;
660
661 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
662
663 # status line and headers
664 $state{read_response} = sub {
665 for ("$_[1]") {
666 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
667
668 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/igxc
669 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
670
671 # 100 Continue handling
672 # should not happen as we don't send expect: 100-continue,
673 # but we handle it just in case.
674 # since we send the request body regardless, if we get an error
675 # we are out of-sync, which we currently do NOT handle correctly.
676 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
677 if $2 eq 100;
678
679 push @pseudo,
680 HTTPVersion => $1,
681 Status => $2,
682 Reason => $3,
683 ;
684
685 my $hdr = parse_hdr
686 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
687
688 %hdr = (%$hdr, @pseudo);
689 }
690
691 # redirect handling
692 # microsoft and other shitheads don't give a shit for following standards,
693 # try to support some common forms of broken Location headers.
694 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
695 $hdr{location} =~ s/^\.\/+//;
696
697 my $url = "$rscheme://$uhost:$uport";
698
699 unless ($hdr{location} =~ s/^\///) {
700 $url .= $upath;
701 $url =~ s/\/[^\/]*$//;
702 }
703
704 $hdr{location} = "$url/$hdr{location}";
705 }
706
707 my $redirect;
708
709 if ($recurse) {
710 my $status = $hdr{Status};
711
712 # industry standard is to redirect POST as GET for
713 # 301, 302 and 303, in contrast to http/1.0 and 1.1.
714 # also, the UA should ask the user for 301 and 307 and POST,
715 # industry standard seems to be to simply follow.
716 # we go with the industry standard.
717 if ($status == 301 or $status == 302 or $status == 303) {
718 # HTTP/1.1 is unclear on how to mutate the method
719 $method = "GET" unless $method eq "HEAD";
720 $redirect = 1;
721 } elsif ($status == 307) {
722 $redirect = 1;
723 }
724 }
725
726 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
727 my $keepalive = pop;
728
729 $state{handle}->destroy if $state{handle};
730 %state = ();
731
732 if (defined $_[1]) {
733 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
734 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
735 }
736
737 # set-cookie processing
738 if ($arg{cookie_jar}) {
739 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost;
740 }
741
742 if ($redirect && exists $hdr{location}) {
743 # we ignore any errors, as it is very common to receive
744 # Content-Length != 0 but no actual body
745 # we also access %hdr, as $_[1] might be an error
746 http_request (
747 $method => $hdr{location},
748 %arg,
749 recurse => $recurse - 1,
750 Redirect => [$_[0], \%hdr],
751 $cb);
752 } else {
753 $cb->($_[0], \%hdr);
754 }
755 };
756
757 my $len = $hdr{"content-length"};
758
759 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
760 $finish->(undef, 598 => "Request cancelled by on_header");
761 } elsif (
762 $hdr{Status} =~ /^(?:1..|204|205|304)$/
763 or $method eq "HEAD"
764 or (defined $len && !$len)
765 ) {
766 # no body
767 $finish->("", undef, undef, 1);
768 } else {
769 # body handling, many different code paths
770 # - no body expected
771 # - want_body_handle
772 # - te chunked
773 # - 2x length known (with or without on_body)
774 # - 2x length not known (with or without on_body)
775 if (!$redirect && $arg{want_body_handle}) {
776 $_[0]->on_eof (undef);
777 $_[0]->on_error (undef);
778 $_[0]->on_read (undef);
779
780 $finish->(delete $state{handle});
781
782 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
783 my $cl = 0;
784 my $body = undef;
785 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
786
787 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
788
789 my $read_chunk; $read_chunk = sub {
790 $_[1] =~ /^([0-9a-fA-F]+)/
791 or $finish->(undef, 599 => "Garbled chunked transfer encoding");
792
793 my $len = hex $1;
794
795 if ($len) {
796 $cl += $len;
797
798 $_[0]->push_read (chunk => $len, sub {
799 $on_body->($_[1], \%hdr)
800 or return $finish->(undef, 598 => "Request cancelled by on_body");
801
802 $_[0]->push_read (line => sub {
803 length $_[1]
804 and return $finish->(undef, 599 => "Garbled chunked transfer encoding");
805 $_[0]->push_read (line => $read_chunk);
806 });
807 });
808 } else {
809 $hdr{"content-length"} ||= $cl;
810
811 $_[0]->push_read (line => $qr_nlnl, sub {
812 if (length $_[1]) {
813 for ("$_[1]") {
814 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
815
816 my $hdr = parse_hdr
817 or return $finish->(undef, 599 => "Garbled response trailers");
818
819 %hdr = (%hdr, %$hdr);
820 }
821 }
822
823 $finish->($body, undef, undef, 1);
824 });
825 }
826 };
827
828 $_[0]->push_read (line => $read_chunk);
829
830 } elsif ($arg{on_body}) {
831 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
832
833 if ($len) {
834 $_[0]->on_read (sub {
835 $len -= length $_[0]{rbuf};
836
837 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
838 or return $finish->(undef, 598 => "Request cancelled by on_body");
839
840 $len > 0
841 or $finish->("", undef, undef, 1);
842 });
843 } else {
844 $_[0]->on_eof (sub {
845 $finish->("");
846 });
847 $_[0]->on_read (sub {
848 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
849 or $finish->(undef, 598 => "Request cancelled by on_body");
850 });
851 }
852 } else {
853 $_[0]->on_eof (undef);
854
855 if ($len) {
856 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
857 $_[0]->on_read (sub {
858 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
859 if $len <= length $_[0]{rbuf};
860 });
861 } else {
862 $_[0]->on_error (sub {
863 ($! == Errno::EPIPE || !$!)
864 ? $finish->(delete $_[0]{rbuf})
865 : $finish->(undef, 599 => $_[2]);
866 });
867 $_[0]->on_read (sub { });
868 }
869 }
870 }
871 };
872
873 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
874 };
875
876 # now handle proxy-CONNECT method 943 # now handle proxy-CONNECT method
877 if ($proxy && $uscheme eq "https") { 944 if ($proxy && $uscheme eq "https") {
878 # oh dear, we have to wrap it into a connect request 945 # oh dear, we have to wrap it into a connect request
879 946
880 # maybe re-use $uauthority with patched port? 947 # maybe re-use $uauthority with patched port?
881 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 948 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
882 $state{handle}->push_read (line => $qr_nlnl, sub { 949 $state{handle}->push_read (line => $qr_nlnl, sub {
883 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 950 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
884 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 951 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
885 952
886 if ($2 == 200) { 953 if ($2 == 200) {
887 $rpath = $upath; 954 $rpath = $upath;
888 &$handle_actual_request; 955 $handle_actual_request->();
889 } else { 956 } else {
890 %state = (); 957 %state = ();
891 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 958 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
892 }
893 }); 959 }
894 } else {
895 &$handle_actual_request;
896 } 960 });
961 } else {
962 $handle_actual_request->();
897 }; 963 }
964 };
965
966 _get_slot $uhost, sub {
967 $state{slot_guard} = shift;
968
969 return unless $state{connect_guard};
898 970
899 my $tcp_connect = $arg{tcp_connect} 971 my $tcp_connect = $arg{tcp_connect}
900 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 972 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
901 973
902 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 974 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
903
904 }; 975 };
905 976
906 defined wantarray && AnyEvent::Util::guard { %state = () } 977 defined wantarray && AnyEvent::Util::guard { %state = () }
907} 978}
908 979
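In revision 1.91 the `Connection:` header is no longer hard-wired to `close TE`: an explicit `keepalive` argument is used if given, otherwise only idempotent methods get `$PERSISTENT_TIMEOUT` as their keepalive value. `Te` stays in the header because TE is a hop-by-hop header that RFC 2616 requires to be named in `Connection:`. The sketch below restates that defaulting as a pure function; the function name is hypothetical, and `$PERSISTENT_TIMEOUT` (2 in this revision) is passed in as a parameter. Note that, in the hunks shown, the computed value only affects the header, as the persistent-connection code is still commented out.

```perl
use strict;
use warnings;

# Hypothetical restatement of the keepalive defaulting shown above.
# Returns ($keepalive, $connection_header_value).
sub connection_defaults {
   my ($method, $persistent_timeout, %arg) = @_;

   # retrying these methods on a stale connection is considered safe
   my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;

   # an explicit keepalive argument wins, otherwise default by method
   my $keepalive = exists $arg{keepalive}
      ? $arg{keepalive} * 1
      : $idempotent ? $persistent_timeout : 0;

   # "Te" must be listed because TE is a hop-by-hop header
   my $connection = ($keepalive ? "" : "close ") . "Te";

   return ($keepalive, $connection);
}

for my $case (["GET"], ["POST"], ["POST", keepalive => 1]) {
   my ($method, @arg) = @$case;
   my ($ka, $conn) = connection_defaults ($method, 2, @arg);
   print "$method: keepalive=$ka Connection: $conn\n";
}
```

Defaulting non-idempotent methods such as `POST` to `close` avoids the situation where a reused connection is closed by the server mid-request and the request cannot safely be replayed.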
943string of the form C<http://host:port> (optionally C<https:...>), croaks 1014string of the form C<http://host:port> (optionally C<https:...>), croaks
944otherwise. 1015otherwise.
945 1016
946To clear an already-set proxy, use C<undef>. 1017To clear an already-set proxy, use C<undef>.
947 1018
1019=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1020
1021Remove all cookies from the cookie jar that have expired. If
1022C<$session_end> is given and true, then additionally remove all session
1023cookies.
1024
1025You should call this function (with a true C<$session_end>) before you
1026save cookies to disk, and you should call this function after loading them
1027again. If you have a long-running program you can additionally call this
1028function from time to time.
1029
1030A cookie jar is initially an empty hash-reference that is managed by this
1031module. Its format is subject to change, but currently it is like this:
1032
1033The key C<version> has to contain C<1>, otherwise the hash gets
1034emptied. All other keys are hostnames or IP addresses pointing to
1035hash-references. The key for these inner hash references is the
1036server path for which this cookie is meant, and the values are again
1037hash-references. The keys of those hash-references are the cookie names, and
1038the value, you guessed it, is another hash-reference, this time with the
1039key-value pairs from the cookie, except for C<expires> and C<max-age>,
1040which have been replaced by a C<_expires> key that contains the cookie
1041expiry timestamp.
1042
1043Here is an example of a cookie jar with a single cookie, so you have a
1044chance of understanding the above paragraph:
1045
1046 {
1047 version => 1,
1048 "10.0.0.1" => {
1049 "/" => {
1050 "mythweb_id" => {
1051 _expires => 1293917923,
1052 value => "ooRung9dThee3ooyXooM1Ohm",
1053 },
1054 },
1055 },
1056 }
1057
948=item $date = AnyEvent::HTTP::format_date $timestamp 1058=item $date = AnyEvent::HTTP::format_date $timestamp
949 1059
950Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP 1060Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
951Date (RFC 2616). 1061Date (RFC 2616).
952 1062
953=item $timestamp = AnyEvent::HTTP::parse_date $date 1063=item $timestamp = AnyEvent::HTTP::parse_date $date
954 1064
955Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec) and 1065Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec) or a
956returns the corresponding POSIX timestamp, or C<undef> if the date cannot 1066bunch of minor variations of those, and returns the corresponding POSIX
957be parsed. 1067timestamp, or C<undef> if the date cannot be parsed.
958 1068
959=item $AnyEvent::HTTP::MAX_RECURSE 1069=item $AnyEvent::HTTP::MAX_RECURSE
960 1070
961The default value for the C<recurse> request parameter (default: C<10>). 1071The default value for the C<recurse> request parameter (default: C<10>).
962 1072
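The `cookie_jar_expire` documentation recommends expiring session cookies before saving a jar and expiring again after loading it. The sketch below shows that round trip with the core module JSON::PP. Because `cookie_jar_expire` reads the clock via `AE::now`, this example uses a hypothetical stand-in, `expire_jar`, that walks the same documented jar layout but takes the current time as an argument; in real code one would call `AnyEvent::HTTP::cookie_jar_expire` instead. `save_jar` and `load_jar` are likewise hypothetical names.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical stand-in for AnyEvent::HTTP::cookie_jar_expire that uses the
# documented layout: $jar->{$host}{$path}{$name} = { value => ..., _expires => ... }
sub expire_jar {
   my ($jar, $now, $session_end) = @_;

   %$jar = () if ($jar->{version} // 0) != 1;

   for my $host (grep $_ ne "version", keys %$jar) {
      my $paths = $jar->{$host};
      next unless ref $paths;

      for my $path (keys %$paths) {
         my $cookies = $paths->{$path};

         for my $name (keys %$cookies) {
            my $kv = $cookies->{$name};
            # expired cookies always go; session cookies only at session end
            delete $cookies->{$name}
               if (exists $kv->{_expires} ? $now > $kv->{_expires} : $session_end);
         }

         delete $paths->{$path} unless %$cookies;
      }

      delete $jar->{$host} unless %$paths;
   }
}

# Before saving: drop expired and session cookies (modifies the live jar).
sub save_jar {
   my ($jar, $now) = @_;
   expire_jar ($jar, $now, 1);
   JSON::PP->new->canonical->encode ($jar)
}

# After loading: time has passed, so expire again.
sub load_jar {
   my ($json, $now) = @_;
   my $jar = JSON::PP->new->decode ($json);
   expire_jar ($jar, $now);
   $jar
}

my $jar = {
   version    => 1,
   "10.0.0.1" => { "/" => {
      keep    => { value => "a", _expires => 2000 },
      old     => { value => "b", _expires => 500 },
      session => { value => "c" },
   } },
};

print save_jar ($jar, 1000), "\n";
```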
1001sub parse_date($) { 1111sub parse_date($) {
1002 my ($date) = @_; 1112 my ($date) = @_;
1003 1113
1004 my ($d, $m, $y, $H, $M, $S); 1114 my ($d, $m, $y, $H, $M, $S);
1005 1115
1006 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9])[\- ]([A-Z][a-z][a-z])[\- ]([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1116 if ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)[\- ]([A-Z][a-z][a-z])[\- ]([0-9][0-9][0-9][0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
1007 # RFC 822/1123, required by RFC 2616 (with " ") 1117 # RFC 822/1123, required by RFC 2616 (with " ")
1008 # cookie dates (with "-") 1118 # cookie dates (with "-")
1009 1119
1010 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6); 1120 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
1011 1121
1012 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1122 } elsif ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
1013 # RFC 850 1123 # RFC 850
1014 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6); 1124 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
1015 1125
1016 } elsif ($date =~ /^[A-Z][a-z][a-z] ([A-Z][a-z][a-z]) ([0-9 ][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ([0-9][0-9][0-9][0-9])$/) { 1126 } elsif ($date =~ /^[A-Z][a-z][a-z]+ ([A-Z][a-z][a-z]) ([0-9 ]?[0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) ([0-9][0-9][0-9][0-9])$/) {
1017 # ISO C's asctime 1127 # ISO C's asctime
1018 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5); 1128 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
1019 } 1129 }
1020 # other formats fail in the loop below 1130 # other formats fail in the loop below
1021 1131
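The revised `parse_date` regexes accept full weekday names, single-digit day and time fields, and the three date shapes named in the comments (RFC 1123, RFC 850, asctime). The part of `parse_date` that turns the captured fields into a timestamp is not shown in this diff, so the sketch below assumes `Time::Local::timegm` for that step. It also includes a `format_date`-style formatter built from `gmtime` (avoiding locale-dependent `strftime` names). Both function names are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTH = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MON;
@MON{@MONTH} = 0 .. 11;

# Format an epoch timestamp as an RFC 1123 date in GMT.
sub format_http_date {
   my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $_[0];
   sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $DAY[$wday], $mday, $MONTH[$mon], $year + 1900, $hour, $min, $sec;
}

# Parse the three shapes matched by the revised regexes; undef otherwise.
sub parse_http_date {
   my ($date) = @_;
   my ($d, $m, $y, $H, $M, $S);

   if ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)[\- ]([A-Z][a-z][a-z])[\- ]([0-9]{4}) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
      # RFC 1123 with " ", netscape cookie dates with "-"
      ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
   } elsif ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
      # RFC 850 with two-digit year: 00-68 -> 20xx, 69-99 -> 19xx
      ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
   } elsif ($date =~ /^[A-Z][a-z][a-z]+ ([A-Z][a-z][a-z]) ([0-9 ]?[0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) ([0-9]{4})$/) {
      # ISO C asctime, day of month may be space-padded
      ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
   } else {
      return undef;
   }

   return undef unless exists $MON{$m};

   # timegm croaks on out-of-range fields, so turn that into undef
   return eval { timegm ($S, $M, $H, int ($d), $MON{$m}, $y) };
}

print format_http_date (784111777), "\n";
print parse_http_date ("Sunday, 06-Nov-94 08:49:37 GMT"), "\n";
```

All three spellings of 6 November 1994, 08:49:37 GMT map to the same timestamp, which is why `cookie_jar_set_cookie` can pass any of them to `parse_date` without further normalization.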
