Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.69 by root, Fri Dec 31 19:32:47 2010 UTC vs.
Revision 1.90 by root, Mon Jan 3 00:41:25 2011 UTC

36 36
37=cut 37=cut
38 38
39package AnyEvent::HTTP; 39package AnyEvent::HTTP;
40 40
41use strict; 41use common::sense;
42no warnings;
43 42
44use Errno (); 43use Errno ();
45 44
46use AnyEvent 5.0 (); 45use AnyEvent 5.0 ();
47use AnyEvent::Util (); 46use AnyEvent::Util ();
58our $MAX_PERSISTENT = 8; 57our $MAX_PERSISTENT = 8;
59our $PERSISTENT_TIMEOUT = 2; 58our $PERSISTENT_TIMEOUT = 2;
60our $TIMEOUT = 300; 59our $TIMEOUT = 300;
61 60
62# changing these is evil 61# changing these is evil
63our $MAX_PERSISTENT_PER_HOST = 0; 62our $MAX_PERSISTENT_PER_HOST = 2;
64our $MAX_PER_HOST = 4; 63our $MAX_PER_HOST = 4;
65 64
66our $PROXY; 65our $PROXY;
67our $ACTIVE = 0; 66our $ACTIVE = 0;
68 67
122 121
123If the server sends a header multiple times, then their contents will be 122If the server sends a header multiple times, then their contents will be
124joined together with a comma (C<,>), as per the HTTP spec. 123joined together with a comma (C<,>), as per the HTTP spec.
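
The merging rule for repeated headers can be illustrated with a small stand-alone sketch. The helper name C<merge_headers> is hypothetical and is not part of AnyEvent::HTTP; lowercasing the header names and joining without a space after the comma are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the merging rule; this is not the
# module's own header parser.
sub merge_headers {
   my %hdr;

   for my $pair (@_) {
      my ($name, $value) = @$pair;
      $name = lc $name;   # assumption: names are compared case-insensitively

      # a repeated header is appended to the existing value with a comma
      $hdr{$name} = exists $hdr{$name} ? "$hdr{$name},$value" : $value;
   }

   \%hdr
}

my $hdr = merge_headers (
   ["Vary"         => "Accept-Encoding"],
   ["Content-Type" => "text/html"],
   ["vary"         => "User-Agent"],
);

print $hdr->{vary}, "\n";   # prints "Accept-Encoding,User-Agent"
```

A callback that needs the individual values would have to split such a header on commas again, which is only safe for headers whose values cannot themselves contain commas.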
125 124
126If an internal error occurs, such as not being able to resolve a hostname, 125If an internal error occurs, such as not being able to resolve a hostname,
127then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x> 126then C<$data> will be C<undef>, C<< $headers->{Status} >> will be
128(usually C<599>) and the C<Reason> pseudo-header will contain an error 127C<590>-C<599> and the C<Reason> pseudo-header will contain an error
129message. 128message. Currently the following status codes are used:
129
130=over 4
131
132=item 595 - errors during connection establishment, proxy handshake.
133
134=item 596 - errors during TLS negotiation, request sending and header processing.
135
136=item 597 - errors during body receiving or processing.
137
138=item 598 - user aborted request via C<on_header> or C<on_body>.
139
140=item 599 - other, usually nonretryable, errors (garbled URL etc.).
141
142=back
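
A callback could use these codes to decide how to react to a failure. The following is a minimal sketch, assuming only the code ranges listed above; the helper C<error_phase> and the phase names it returns are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical mapping from the 59x pseudo-status codes to a short phase name.
my %PHASE = (
   595 => "connect",   # connection establishment, proxy handshake
   596 => "request",   # TLS negotiation, request sending, header processing
   597 => "body",      # body receiving or processing
   598 => "aborted",   # cancelled via on_header or on_body
   599 => "other",     # usually nonretryable (garbled URL etc.)
);

# Returns a phase name for internal errors, undef for real HTTP statuses.
sub error_phase {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   return undef unless defined $status && $status =~ /^59[0-9]$/;

   $PHASE{$status} || "unknown"
}

print error_phase ({ Status => 595, Reason => "Connection refused" }), "\n";   # prints "connect"
```

Whether a given phase is worth retrying is left to the caller; the list above only says that 599 is usually not.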
130 143
131A typical callback might look like this: 144A typical callback might look like this:
132 145
133 sub { 146 sub {
134 my ($body, $hdr) = @_; 147 my ($body, $hdr) = @_;
156C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 169C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
157will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 170will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
158(this can be suppressed by using C<undef> for these headers in which case 171(this can be suppressed by using C<undef> for these headers in which case
159they won't be sent at all). 172they won't be sent at all).
160 173
174You really should provide your own C<User-Agent:> header value that is
175appropriate for your program - I wouldn't be surprised if the default
176AnyEvent string gets blocked by webservers sooner or later.
177
161=item timeout => $seconds 178=item timeout => $seconds
162 179
163The time-out to use for various stages - each connect attempt will reset 180The time-out to use for various stages - each connect attempt will reset
164the timeout, as will read or write activity, i.e. this is not an overall 181the timeout, as will read or write activity, i.e. this is not an overall
165timeout. 182timeout.
182=item cookie_jar => $hash_ref 199=item cookie_jar => $hash_ref
183 200
184Passing this parameter enables (simplified) cookie-processing, loosely 201Passing this parameter enables (simplified) cookie-processing, loosely
185based on the original netscape specification. 202based on the original netscape specification.
186 203
187The C<$hash_ref> must be an (initially empty) hash reference which will 204The C<$hash_ref> must be an (initially empty) hash reference which
188get updated automatically. It is possible to save the cookie_jar to 205will get updated automatically. It is possible to save the cookie jar
189persistent storage with something like JSON or Storable, but this is not 206to persistent storage with something like JSON or Storable - see the
190recommended, as expiry times are currently being ignored. 207C<AnyEvent::HTTP::cookie_jar_expire> function if you wish to remove
208expired or session-only cookies, and also for documentation on the format
209of the cookie jar.
191 210
192Note that this cookie implementation is not of very high quality, nor 211Note that this cookie implementation is not meant to be complete. If
193meant to be complete. If you want complete cookie management you have to 212you want complete cookie management you have to do that on your
194do that on your own. C<cookie_jar> is meant as a quick fix to get some 213own. C<cookie_jar> is meant as a quick fix to get most cookie-using sites
195cookie-using sites working. Cookies are a privacy disaster, do not use 214working. Cookies are a privacy disaster, do not use them unless required
196them unless required to. 215to.
197 216
198When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:> 217When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
199headers will be ste and handled by this module, otherwise they will be 218headers will be set and handled by this module, otherwise they will be
200left untouched. 219left untouched.
201 220
202=item tls_ctx => $scheme | $tls_ctx 221=item tls_ctx => $scheme | $tls_ctx
203 222
204Specifies the AnyEvent::TLS context to be used for https connections. This 223Specifies the AnyEvent::TLS context to be used for https connections. This
314Example: do a HTTP HEAD request on https://www.google.com/, use a 333Example: do a HTTP HEAD request on https://www.google.com/, use a
315timeout of 30 seconds. 334timeout of 30 seconds.
316 335
317 http_request 336 http_request
318 GET => "https://www.google.com", 337 GET => "https://www.google.com",
338 headers => { "user-agent" => "MySearchClient 1.0" },
319 timeout => 30, 339 timeout => 30,
320 sub { 340 sub {
321 my ($body, $hdr) = @_; 341 my ($body, $hdr) = @_;
322 use Data::Dumper; 342 use Data::Dumper;
323 print Dumper $hdr; 343 print Dumper $hdr;
364 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 384 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
365 385
366 _slot_schedule $_[0]; 386 _slot_schedule $_[0];
367} 387}
368 388
389#############################################################################
390
391# expire cookies
392sub cookie_jar_expire($;$) {
393 my ($jar, $session_end) = @_;
394
395 %$jar = () if $jar->{version} != 1;
396
397 my $anow = AE::now;
398
399 while (my ($chost, $paths) = each %$jar) {
400 next unless ref $paths;
401
402 while (my ($cpath, $cookies) = each %$paths) {
403 while (my ($cookie, $kv) = each %$cookies) {
404 if (exists $kv->{_expires}) {
405 delete $cookies->{$cookie}
406 if $anow > $kv->{_expires};
407 } elsif ($session_end) {
408 delete $cookies->{$cookie};
409 }
410 }
411
412 delete $paths->{$cpath}
413 unless %$cookies;
414 }
415
416 delete $jar->{$chost}
417 unless %$paths;
418 }
419}
420
421# extract cookies from jar
422sub cookie_jar_extract($$$$) {
423 my ($jar, $uscheme, $uhost, $upath) = @_;
424
425 %$jar = () if $jar->{version} != 1;
426
427 my @cookies;
428
429 while (my ($chost, $paths) = each %$jar) {
430 next unless ref $paths;
431
432 if ($chost =~ /^\./) {
433 next unless $chost eq substr $uhost, -length $chost;
434 } elsif ($chost =~ /\./) {
435 next unless $chost eq $uhost;
436 } else {
437 next;
438 }
439
440 while (my ($cpath, $cookies) = each %$paths) {
441 next unless $cpath eq substr $upath, 0, length $cpath;
442
443 while (my ($cookie, $kv) = each %$cookies) {
444 next if $uscheme ne "https" && exists $kv->{secure};
445
446 if (exists $kv->{_expires} and AE::now > $kv->{_expires}) {
447 delete $cookies->{$cookie};
448 next;
449 }
450
451 my $value = $kv->{value};
452
453 if ($value =~ /[=;,[:space:]]/) {
454 $value =~ s/([\\"])/\\$1/g;
455 $value = "\"$value\"";
456 }
457
458 push @cookies, "$cookie=$value";
459 }
460 }
461 }
462
463 \@cookies
464}
465
466# parse set_cookie header into jar
467sub cookie_jar_set_cookie($$$$) {
468 my ($jar, $set_cookie, $uhost, $date) = @_;
469
470 my $anow = int AE::now;
471 my $snow; # server-now
472
473 for ($set_cookie) {
474 # parse NAME=VALUE
475 my @kv;
476
477 # expires is not http-compliant in the original cookie-spec,
478 # we support the official date format and some extensions
479 while (
480 m{
481 \G\s*
482 (?:
483 expires \s*=\s* ([A-Z][a-z][a-z]+,\ [^,;]+)
484 | ([^=;,[:space:]]+) (?: \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) ) )?
485 )
486 }gcxsi
487 ) {
488 my $name = $2;
489 my $value = $4;
490
491 if (defined $1) {
492 # expires
493 $name = "expires";
494 $value = $1;
495 } elsif (defined $3) {
496 # quoted
497 $value = $3;
498 $value =~ s/\\(.)/$1/gs;
499 }
500
501 push @kv, lc $name, $value;
502
503 last unless /\G\s*;/gc;
504 }
505
506 last unless @kv;
507
508 my $name = shift @kv;
509 my %kv = (value => shift @kv, @kv);
510
511 if (exists $kv{"max-age"}) {
512 $kv{_expires} = $anow + delete $kv{"max-age"};
513 } elsif (exists $kv{expires}) {
514 $snow ||= parse_date ($date) || $anow;
515 $kv{_expires} = $anow + (parse_date (delete $kv{expires}) - $snow);
516 } else {
517 delete $kv{_expires};
518 }
519
520 my $cdom;
521 my $cpath = (delete $kv{path}) || "/";
522
523 if (exists $kv{domain}) {
524 $cdom = delete $kv{domain};
525
526 $cdom =~ s/^\.?/./; # make sure it starts with a "."
527
528 next if $cdom =~ /\.$/;
529
530 # this is not rfc-like and not netscape-like. go figure.
531 my $ndots = $cdom =~ y/.//;
532 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
533 } else {
534 $cdom = $uhost;
535 }
536
537 # store it
538 $jar->{version} = 1;
539 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
540
541 redo if /\G\s*,/gc;
542 }
543}
544
369# continue to parse $_ for headers and place them into the arg 545# continue to parse $_ for headers and place them into the arg
370sub parse_hdr() { 546sub parse_hdr() {
371 my %hdr; 547 my %hdr;
372 548
373 # things seen, not parsed: 549 # things seen, not parsed:
435 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 611 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
436 612
437 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 613 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
438 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 614 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
439 615
440 my $uhost = $1; 616 my $uhost = lc $1;
441 $uport = $2 if defined $2; 617 $uport = $2 if defined $2;
442 618
443 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 619 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
444 unless exists $hdr{host}; 620 unless exists $hdr{host};
445 621
448 624
449 $upath =~ s%^/?%/%; 625 $upath =~ s%^/?%/%;
450 626
451 # cookie processing 627 # cookie processing
452 if (my $jar = $arg{cookie_jar}) { 628 if (my $jar = $arg{cookie_jar}) {
453 %$jar = () if $jar->{version} != 1; 629 my $cookies = cookie_jar_extract $jar, $uscheme, $uhost, $upath;
454 630
455 my @cookie;
456
457 while (my ($chost, $v) = each %$jar) {
458 if ($chost =~ /^\./) {
459 next unless $chost eq substr $uhost, -length $chost;
460 } elsif ($chost =~ /\./) {
461 next unless $chost eq $uhost;
462 } else {
463 next;
464 }
465
466 while (my ($cpath, $v) = each %$v) {
467 next unless $cpath eq substr $upath, 0, length $cpath;
468
469 while (my ($k, $v) = each %$v) {
470 next if $uscheme ne "https" && exists $v->{secure};
471 my $value = $v->{value};
472 $value =~ s/([\\"])/\\$1/g;
473 push @cookie, "$k=\"$value\"";
474 }
475 }
476 }
477
478 $hdr{cookie} = join "; ", @cookie 631 $hdr{cookie} = join "; ", @$cookies
479 if @cookie; 632 if @$cookies;
480 } 633 }
481 634
482 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path 635 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
483 636
484 if ($proxy) { 637 if ($proxy) {
487 $rscheme = "http" unless defined $rscheme; 640 $rscheme = "http" unless defined $rscheme;
488 641
489 # don't support https requests over https-proxy transport, 642 # don't support https requests over https-proxy transport,
490 # can't be done with tls as spec'ed, unless you double-encrypt. 643 # can't be done with tls as spec'ed, unless you double-encrypt.
491 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https"; 644 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
645
646 $rhost = lc $rhost;
647 $rscheme = lc $rscheme;
492 } else { 648 } else {
493 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath); 649 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
494 } 650 }
495 651
496 # leave out fragment and query string, just a heuristic 652 # leave out fragment and query string, just a heuristic
498 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 654 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
499 655
500 $hdr{"content-length"} = length $arg{body} 656 $hdr{"content-length"} = length $arg{body}
501 if length $arg{body} || $method ne "GET"; 657 if length $arg{body} || $method ne "GET";
502 658
503 $hdr{connection} = "close TE"; #1.1 659 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/;
660
661 # default value for keepalive is true iff the request is for an idempotent method
662 my $keepalive = exists $arg{keepalive}
663 ? $arg{keepalive}*1
664 : $idempotent ? $PERSISTENT_TIMEOUT : 0;
665
666 $hdr{connection} = ($keepalive ? "" : "close ") . "Te"; #1.1
504 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 667 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
505 668
506 my %state = (connect_guard => 1); 669 my %state = (connect_guard => 1);
507 670
508 _get_slot $uhost, sub { 671 my $ae_error = 595; # connecting
509 $state{slot_guard} = shift;
510 672
673 # handle actual, non-tunneled, request
674 my $handle_actual_request = sub {
675 $ae_error = 596; # request phase
676
677 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
678
679 # send request
680 $state{handle}->push_write (
681 "$method $rpath HTTP/1.1\015\012"
682 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
683 . "\015\012"
684 . (delete $arg{body})
685 );
686
687 # return if error occured during push_write()
511 return unless $state{connect_guard}; 688 return unless %state;
512 689
513 my $connect_cb = sub { 690 # reduce memory usage, save a kitten, also re-use it for the response headers.
514 $state{fh} = shift 691 %hdr = ();
692
693 # status line and headers
694 $state{read_response} = sub {
695 for ("$_[1]") {
696 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
697
698 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
699 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
700
701 # 100 Continue handling
702 # should not happen as we don't send expect: 100-continue,
703 # but we handle it just in case.
704 # since we send the request body regardless, if we get an error
705 # we are out of-sync, which we currently do NOT handle correctly.
706 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
707 if $2 eq 100;
708
709 push @pseudo,
710 HTTPVersion => $1,
711 Status => $2,
712 Reason => $3,
515 or do { 713 ;
516 my $err = "$!"; 714
715 my $hdr = parse_hdr
716 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
717
718 %hdr = (%$hdr, @pseudo);
719 }
720
721 # redirect handling
722 # microsoft and other shitheads don't give a shit for following standards,
723 # try to support some common forms of broken Location headers.
724 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
725 $hdr{location} =~ s/^\.\/+//;
726
727 my $url = "$rscheme://$uhost:$uport";
728
729 unless ($hdr{location} =~ s/^\///) {
730 $url .= $upath;
731 $url =~ s/\/[^\/]*$//;
732 }
733
734 $hdr{location} = "$url/$hdr{location}";
735 }
736
737 my $redirect;
738
739 if ($recurse) {
740 my $status = $hdr{Status};
741
742 # industry standard is to redirect POST as GET for
743 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
744 # also, the UA should ask the user for 301 and 307 and POST,
745 # industry standard seems to be to simply follow.
746 # we go with the industry standard.
747 if ($status == 301 or $status == 302 or $status == 303) {
748 # HTTP/1.1 is unclear on how to mutate the method
749 $method = "GET" unless $method eq "HEAD";
750 $redirect = 1;
751 } elsif ($status == 307) {
752 $redirect = 1;
753 }
754 }
755
756 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
757 my $may_keep_alive = $_[3];
758
759 $state{handle}->destroy if $state{handle};
517 %state = (); 760 %state = ();
518 return $cb->(undef, { @pseudo, Status => 599, Reason => $err }); 761
762 if (defined $_[1]) {
763 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
764 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
765 }
766
767 # set-cookie processing
768 if ($arg{cookie_jar}) {
769 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost, $hdr{date};
770 }
771
772 if ($redirect && exists $hdr{location}) {
773 # we ignore any errors, as it is very common to receive
774 # Content-Length != 0 but no actual body
775 # we also access %hdr, as $_[1] might be an erro
776 http_request (
777 $method => $hdr{location},
778 %arg,
779 recurse => $recurse - 1,
780 Redirect => [$_[0], \%hdr],
781 $cb);
782 } else {
783 $cb->($_[0], \%hdr);
784 }
785 };
786
787 $ae_error = 597; # body phase
788
789 my $len = $hdr{"content-length"};
790
791 # body handling, many different code paths
792 # - no body expected
793 # - want_body_handle
794 # - te chunked
795 # - 2x length known (with or without on_body)
796 # - 2x length not known (with or without on_body)
797 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
798 $finish->(undef, 598 => "Request cancelled by on_header");
799 } elsif (
800 $hdr{Status} =~ /^(?:1..|204|205|304)$/
801 or $method eq "HEAD"
802 or (defined $len && $len == 0) # == 0, not !, because "0 " is true
803 ) {
804 # no body
805 $finish->("", undef, undef, 1);
806
807 } elsif (!$redirect && $arg{want_body_handle}) {
808 $_[0]->on_eof (undef);
809 $_[0]->on_error (undef);
810 $_[0]->on_read (undef);
811
812 $finish->(delete $state{handle});
813
814 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
815 my $cl = 0;
816 my $body = undef;
817 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
818
819 $state{read_chunk} = sub {
820 $_[1] =~ /^([0-9a-fA-F]+)/
821 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
822
823 my $len = hex $1;
824
825 if ($len) {
826 $cl += $len;
827
828 $_[0]->push_read (chunk => $len, sub {
829 $on_body->($_[1], \%hdr)
830 or return $finish->(undef, 598 => "Request cancelled by on_body");
831
832 $_[0]->push_read (line => sub {
833 length $_[1]
834 and return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
835 $_[0]->push_read (line => $state{read_chunk});
836 });
837 });
838 } else {
839 $hdr{"content-length"} ||= $cl;
840
841 $_[0]->push_read (line => $qr_nlnl, sub {
842 if (length $_[1]) {
843 for ("$_[1]") {
844 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
845
846 my $hdr = parse_hdr
847 or return $finish->(undef, $ae_error => "Garbled response trailers");
848
849 %hdr = (%hdr, %$hdr);
850 }
851 }
852
853 $finish->($body, undef, undef, 1);
854 });
855 }
519 }; 856 };
520 857
521 pop; # free memory, save a tree 858 $_[0]->push_read (line => $state{read_chunk});
522 859
860 } elsif ($arg{on_body}) {
861 if (defined $len) {
862 $_[0]->on_read (sub {
863 $len -= length $_[0]{rbuf};
864
865 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
866 or return $finish->(undef, 598 => "Request cancelled by on_body");
867
868 $len > 0
869 or $finish->("", undef, undef, 1);
870 });
871 } else {
872 $_[0]->on_eof (sub {
873 $finish->("");
874 });
875 $_[0]->on_read (sub {
876 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
877 or $finish->(undef, 598 => "Request cancelled by on_body");
878 });
879 }
880 } else {
881 $_[0]->on_eof (undef);
882
883 if (defined $len) {
884 $_[0]->on_read (sub {
885 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
886 if $len <= length $_[0]{rbuf};
887 });
888 } else {
889 $_[0]->on_error (sub {
890 ($! == Errno::EPIPE || !$!)
891 ? $finish->(delete $_[0]{rbuf})
892 : $finish->(undef, $ae_error => $_[2]);
893 });
894 $_[0]->on_read (sub { });
895 }
896 }
897 };
898
899 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
900 };
901
902 my $connect_cb = sub {
903 $state{fh} = shift
904 or do {
905 my $err = "$!";
906 %state = ();
907 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err });
908 };
909
523 return unless delete $state{connect_guard}; 910 return unless delete $state{connect_guard};
524 911
525 # get handle 912 # get handle
526 $state{handle} = new AnyEvent::Handle 913 $state{handle} = new AnyEvent::Handle
527 fh => $state{fh}, 914 fh => $state{fh},
528 peername => $rhost, 915 peername => $rhost,
529 tls_ctx => $arg{tls_ctx}, 916 tls_ctx => $arg{tls_ctx},
530 # these need to be reconfigured on keepalive handles 917 # these need to be reconfigured on keepalive handles
531 timeout => $timeout, 918 timeout => $timeout,
532 on_error => sub { 919 on_error => sub {
533 %state = (); 920 %state = ();
534 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] }); 921 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
535 }, 922 },
536 on_eof => sub { 923 on_eof => sub {
537 %state = (); 924 %state = ();
538 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" }); 925 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
539 }, 926 },
540 ; 927 ;
541 928
542 # limit the number of persistent connections 929 # limit the number of persistent connections
543 # keepalive not yet supported 930 # keepalive not yet supported
544# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 931# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
545# ++$KA_COUNT{$_[1]}; 932# ++$KA_COUNT{$_[1]};
546# $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 933# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
547# --$KA_COUNT{$_[1]} 934# --$KA_COUNT{$_[1]}
548# }; 935# };
549# $hdr{connection} = "keep-alive"; 936# $hdr{connection} = "keep-alive";
550# } 937# }
551 938
552 $state{handle}->starttls ("connect") if $rscheme eq "https"; 939 $state{handle}->starttls ("connect") if $rscheme eq "https";
553 940
554 # handle actual, non-tunneled, request
555 my $handle_actual_request = sub {
556 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
557
558 # send request
559 $state{handle}->push_write (
560 "$method $rpath HTTP/1.1\015\012"
561 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
562 . "\015\012"
563 . (delete $arg{body})
564 );
565
566 # return if error occured during push_write()
567 return unless %state;
568
569 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
570
571 # status line and headers
572 $state{read_response} = sub {
573 for ("$_[1]") {
574 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
575
576 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/igxc
577 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
578
579 # 100 Continue handling
580 # should not happen as we don't send expect: 100-continue,
581 # but we handle it just in case.
582 # since we send the request body regardless, if we get an error
583 # we are out of-sync, which we currently do NOT handle correctly.
584 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
585 if $2 eq 100;
586
587 push @pseudo,
588 HTTPVersion => $1,
589 Status => $2,
590 Reason => $3,
591 ;
592
593 my $hdr = parse_hdr
594 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
595
596 %hdr = (%$hdr, @pseudo);
597 }
598
599 # redirect handling
600 # microsoft and other shitheads don't give a shit for following standards,
601 # try to support some common forms of broken Location headers.
602 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
603 $hdr{location} =~ s/^\.\/+//;
604
605 my $url = "$rscheme://$uhost:$uport";
606
607 unless ($hdr{location} =~ s/^\///) {
608 $url .= $upath;
609 $url =~ s/\/[^\/]*$//;
610 }
611
612 $hdr{location} = "$url/$hdr{location}";
613 }
614
615 my $redirect;
616
617 if ($recurse) {
618 my $status = $hdr{Status};
619
620 # industry standard is to redirect POST as GET for
621 # 301, 302 and 303, in contrast to http/1.0 and 1.1.
622 # also, the UA should ask the user for 301 and 307 and POST,
623 # industry standard seems to be to simply follow.
624 # we go with the industry standard.
625 if ($status == 301 or $status == 302 or $status == 303) {
626 # HTTP/1.1 is unclear on how to mutate the method
627 $method = "GET" unless $method eq "HEAD";
628 $redirect = 1;
629 } elsif ($status == 307) {
630 $redirect = 1;
631 }
632 }
633
634 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
635 my $keepalive = pop;
636
637 $state{handle}->destroy if $state{handle};
638 %state = ();
639
640 if (defined $_[1]) {
641 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
642 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
643 }
644
645 # set-cookie processing
646 if ($arg{cookie_jar}) {
647 for ($hdr{"set-cookie"}) {
648 # parse NAME=VALUE
649 my @kv;
650
651 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
652 my $name = $1;
653 my $value = $3;
654
655 unless ($value) {
656 $value = $2;
657 $value =~ s/\\(.)/$1/gs;
658 }
659
660 push @kv, $name => $value;
661
662 last unless /\G\s*;/gc;
663 }
664
665 last unless @kv;
666
667 my $name = shift @kv;
668 my %kv = (value => shift @kv, @kv);
669
670 my $cdom;
671 my $cpath = (delete $kv{path}) || "/";
672
673 if (exists $kv{domain}) {
674 $cdom = delete $kv{domain};
675
676 $cdom =~ s/^\.?/./; # make sure it starts with a "."
677
678 next if $cdom =~ /\.$/;
679
680 # this is not rfc-like and not netscape-like. go figure.
681 my $ndots = $cdom =~ y/.//;
682 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
683 } else {
684 $cdom = $uhost;
685 }
686
687 # store it
688 $arg{cookie_jar}{version} = 1;
689 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
690
691 redo if /\G\s*,/gc;
692 }
693 }
694
695 if ($redirect && exists $hdr{location}) {
696 # we ignore any errors, as it is very common to receive
697 # Content-Length != 0 but no actual body
698 # we also access %hdr, as $_[1] might be an erro
699 http_request (
700 $method => $hdr{location},
701 %arg,
702 recurse => $recurse - 1,
703 Redirect => [$_[0], \%hdr],
704 $cb);
705 } else {
706 $cb->($_[0], \%hdr);
707 }
708 };
709
710 my $len = $hdr{"content-length"};
711
712 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
713 $finish->(undef, 598 => "Request cancelled by on_header");
714 } elsif (
715 $hdr{Status} =~ /^(?:1..|204|205|304)$/
716 or $method eq "HEAD"
717 or (defined $len && !$len)
718 ) {
719 # no body
720 $finish->("", undef, undef, 1);
721 } else {
722 # body handling, many different code paths
723 # - no body expected
724 # - want_body_handle
725 # - te chunked
726 # - 2x length known (with or without on_body)
727 # - 2x length not known (with or without on_body)
728 if (!$redirect && $arg{want_body_handle}) {
729 $_[0]->on_eof (undef);
730 $_[0]->on_error (undef);
731 $_[0]->on_read (undef);
732
733 $finish->(delete $state{handle});
734
735 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
736 my $cl = 0;
737 my $body = undef;
738 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
739
740 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
741
742 my $read_chunk; $read_chunk = sub {
743 $_[1] =~ /^([0-9a-fA-F]+)/
744 or $finish->(undef, 599 => "Garbled chunked transfer encoding");
745
746 my $len = hex $1;
747
748 if ($len) {
749 $cl += $len;
750
751 $_[0]->push_read (chunk => $len, sub {
752 $on_body->($_[1], \%hdr)
753 or return $finish->(undef, 598 => "Request cancelled by on_body");
754
755 $_[0]->push_read (line => sub {
756 length $_[1]
757 and return $finish->(undef, 599 => "Garbled chunked transfer encoding");
758 $_[0]->push_read (line => $read_chunk);
759 });
760 });
761 } else {
762 $hdr{"content-length"} ||= $cl;
763
764 $_[0]->push_read (line => $qr_nlnl, sub {
765 if (length $_[1]) {
766 for ("$_[1]") {
767 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
768
769 my $hdr = parse_hdr
770 or return $finish->(undef, 599 => "Garbled response trailers");
771
772 %hdr = (%hdr, %$hdr);
773 }
774 }
775
776 $finish->($body, undef, undef, 1);
777 });
778 }
779 };
780
781 $_[0]->push_read (line => $read_chunk);
782
783 } elsif ($arg{on_body}) {
784 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
785
786 if ($len) {
787 $_[0]->on_read (sub {
788 $len -= length $_[0]{rbuf};
789
790 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
791 or return $finish->(undef, 598 => "Request cancelled by on_body");
792
793 $len > 0
794 or $finish->("", undef, undef, 1);
795 });
796 } else {
797 $_[0]->on_eof (sub {
798 $finish->("");
799 });
800 $_[0]->on_read (sub {
801 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
802 or $finish->(undef, 598 => "Request cancelled by on_body");
803 });
804 }
805 } else {
806 $_[0]->on_eof (undef);
807
808 if ($len) {
809 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
810 $_[0]->on_read (sub {
811 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
812 if $len <= length $_[0]{rbuf};
813 });
814 } else {
815 $_[0]->on_error (sub {
816 ($! == Errno::EPIPE || !$!)
817 ? $finish->(delete $_[0]{rbuf})
818 : $finish->(undef, 599 => $_[2]);
819 });
820 $_[0]->on_read (sub { });
821 }
822 }
823 }
824 };
825
826 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
827 };
828
829 # now handle proxy-CONNECT method 941 # now handle proxy-CONNECT method
830 if ($proxy && $uscheme eq "https") { 942 if ($proxy && $uscheme eq "https") {
831 # oh dear, we have to wrap it into a connect request 943 # oh dear, we have to wrap it into a connect request
832 944
833 # maybe re-use $uauthority with patched port? 945 # maybe re-use $uauthority with patched port?
834 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 946 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012");
835 $state{handle}->push_read (line => $qr_nlnl, sub { 947 $state{handle}->push_read (line => $qr_nlnl, sub {
836 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 948 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
837 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" })); 949 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
838 950
839 if ($2 == 200) { 951 if ($2 == 200) {
840 $rpath = $upath; 952 $rpath = $upath;
841 &$handle_actual_request; 953 $handle_actual_request->();
842 } else { 954 } else {
843 %state = (); 955 %state = ();
844 $cb->(undef, { @pseudo, Status => $2, Reason => $3 }); 956 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
845 }
846 }); 957 }
847 } else {
848 &$handle_actual_request;
849 } 958 });
959 } else {
960 $handle_actual_request->();
850 }; 961 }
962 };
963
964 _get_slot $uhost, sub {
965 $state{slot_guard} = shift;
966
967 return unless $state{connect_guard};
851 968
852 my $tcp_connect = $arg{tcp_connect} 969 my $tcp_connect = $arg{tcp_connect}
853 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 970 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
854 971
855 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout }); 972 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
856
857 }; 973 };
858 974
859 defined wantarray && AnyEvent::Util::guard { %state = () } 975 defined wantarray && AnyEvent::Util::guard { %state = () }
860} 976}
861 977
896string of the form C<http://host:port> (optionally C<https:...>), croaks 1012string of the form C<http://host:port> (optionally C<https:...>), croaks
897otherwise. 1013otherwise.
898 1014
899To clear an already-set proxy, use C<undef>. 1015To clear an already-set proxy, use C<undef>.
900 1016
1017=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1018
1019Remove all cookies from the cookie jar that have expired. If
1020C<$session_end> is given and true, then additionally remove all session
1021cookies.
1022
1023You should call this function (with a true C<$session_end>) before you
1024save cookies to disk, and you should call this function after loading them
1025again. If you have a long-running program you can additionally call this
1026function from time to time.
1027
1028A cookie jar is initially an empty hash-reference that is managed by this
1029module. Its format is subject to change, but currently it is like this:
1030
1031The key C<version> has to contain C<1>, otherwise the hash gets
1032emptied. All other keys are hostnames or IP addresses pointing to
1033hash-references. The key for these inner hash references is the
1034server path for which this cookie is meant, and the values are again
1035hash-references. The keys of those hash-references are the cookie names, and
1036each value, you guessed it, is another hash-reference, this time with the
1037key-value pairs from the cookie, except for C<expires> and C<max-age>,
1038which have been replaced by a C<_expires> key that contains the cookie
1039expiry timestamp.
1040
1041Here is an example of a cookie jar with a single cookie, so you have a
1042chance of understanding the above paragraph:
1043
1044 {
1045 version => 1,
1046 "10.0.0.1" => {
1047 "/" => {
1048 "mythweb_id" => {
1049 _expires => 1293917923,
1050 value => "ooRung9dThee3ooyXooM1Ohm",
1051 },
1052 },
1053 },
1054 }
1055
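To make the nesting described above concrete, here is a minimal sketch of how such a jar could be walked and pruned. It is not the module's own code: C<expire_jar_sketch> is a hypothetical helper, the caller passes in the current time, and a cookie without an C<_expires> key is assumed to be a session cookie.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of AnyEvent::HTTP: prunes a jar that is laid
# out as host => path => cookie name => key-value pairs.
# $now is a POSIX timestamp supplied by the caller.
sub expire_jar_sketch {
   my ($jar, $now, $session_end) = @_;

   # a jar whose version is not 1 gets emptied, as the text describes
   %$jar = ()
      unless defined $jar->{version} && $jar->{version} eq "1";

   for my $host (keys %$jar) {
      my $paths = $jar->{$host};
      next unless ref $paths eq "HASH"; # skips the "version" key

      for my $path (keys %$paths) {
         my $cookies = $paths->{$path};

         for my $name (keys %$cookies) {
            my $kv = $cookies->{$name};

            if (exists $kv->{_expires}) {
               delete $cookies->{$name} if $now > $kv->{_expires};
            } elsif ($session_end) {
               # assumption: no _expires key means a session cookie
               delete $cookies->{$name};
            }
         }

         # drop path and host levels once they become empty
         delete $paths->{$path} unless %$cookies;
      }

      delete $jar->{$host} unless %$paths;
   }

   $jar
}
```

Calling it with a true third argument before saving, and with a false one after loading, mirrors the usage recommended above.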
901=item $date = AnyEvent::HTTP::format_date $timestamp 1056=item $date = AnyEvent::HTTP::format_date $timestamp
902 1057
903Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP 1058Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
904Date (RFC 2616). 1059Date (RFC 2616).
905 1060
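As an illustration of the output format only, the following sketch builds an RFC 1123 style date string from a timestamp with core Perl. C<format_http_date_sketch> is a hypothetical name and this is not necessarily how the module implements C<format_date>.

```perl
use strict;
use warnings;

my @DOW = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: formats a POSIX timestamp as "Sun, 06 Nov 1994 08:49:37 GMT".
sub format_http_date_sketch {
   my ($timestamp) = @_;

   # gmtime returns the month 0-based and the year as an offset from 1900
   my ($S, $M, $H, $mday, $mon, $year, $wday) = gmtime $timestamp;

   sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
      $DOW[$wday], $mday, $MON[$mon], $year + 1900, $H, $M, $S
}

print format_http_date_sketch (784111777), "\n";
```

Fixed English day and month names are used instead of C<strftime> so that the result does not depend on the current locale.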
906=item $timestamp = AnyEvent::HTTP::parse_date $date 1061=item $timestamp = AnyEvent::HTTP::parse_date $date
907 1062
908Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX 1063Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec) or a
1064bunch of minor variations of those, and returns the corresponding POSIX
909timestamp, or C<undef> if the date cannot be parsed. 1065timestamp, or C<undef> if the date cannot be parsed.
910 1066
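The strict RFC 1123 case can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions: C<parse_http_date_sketch> is a hypothetical name, and it deliberately handles none of the cookie, RFC 850 or asctime variations that C<parse_date> accepts.

```perl
use strict;
use warnings;
use Time::Local ();

# month abbreviation => 0-based month number, as timegm expects
my %MONTH = do {
   my $i = 0;
   map { $_ => $i++ } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
};

# Hypothetical helper: parses "Sun, 06 Nov 1994 08:49:37 GMT" only.
# Returns a POSIX timestamp, or undef if the string does not match.
sub parse_http_date_sketch {
   my ($date) = @_;

   my ($d, $mon, $y, $H, $M, $S) = $date =~
      /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9]{4}) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/
      or return undef;

   exists $MONTH{$mon}
      or return undef;

   # timegm croaks on out-of-range fields, so trap that and yield undef
   my $timestamp = eval { Time::Local::timegm ($S, $M, $H, $d, $MONTH{$mon}, $y) };

   $timestamp
}
```

A caller would test the result with C<defined>, since C<0> is a valid timestamp.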
911=item $AnyEvent::HTTP::MAX_RECURSE 1067=item $AnyEvent::HTTP::MAX_RECURSE
912 1068
913The default value for the C<recurse> request parameter (default: C<10>). 1069The default value for the C<recurse> request parameter (default: C<10>).
953sub parse_date($) { 1109sub parse_date($) {
954 my ($date) = @_; 1110 my ($date) = @_;
955 1111
956 my ($d, $m, $y, $H, $M, $S); 1112 my ($d, $m, $y, $H, $M, $S);
957 1113
958 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1114 if ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)[\- ]([A-Z][a-z][a-z])[\- ]([0-9][0-9][0-9][0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
959 # RFC 822/1123, required by RFC 2616 1115 # RFC 822/1123, required by RFC 2616 (with " ")
1116 # cookie dates (with "-")
1117
960 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6); 1118 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
961 1119
962 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1120 } elsif ($date =~ /^[A-Z][a-z][a-z]+, ([0-9][0-9]?)-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) GMT$/) {
963 # RFC 850 1121 # RFC 850
964 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6); 1122 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
965 1123
966 } elsif ($date =~ /^[A-Z][a-z][a-z] ([A-Z][a-z][a-z]) ([0-9 ][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ([0-9][0-9][0-9][0-9])$/) { 1124 } elsif ($date =~ /^[A-Z][a-z][a-z]+ ([A-Z][a-z][a-z]) ([0-9 ]?[0-9]) ([0-9][0-9]?):([0-9][0-9]?):([0-9][0-9]?) ([0-9][0-9][0-9][0-9])$/) {
967 # ISO C's asctime 1125 # ISO C's asctime
968 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5); 1126 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
969 } 1127 }
970 # other formats fail in the loop below 1128 # other formats fail in the loop below
971 1129
