ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.69 by root, Fri Dec 31 19:32:47 2010 UTC vs.
Revision 1.77 by root, Sat Jan 1 19:13:41 2011 UTC

122 122
123If the server sends a header multiple times, then their contents will be 123If the server sends a header multiple times, then their contents will be
124joined together with a comma (C<,>), as per the HTTP spec. 124joined together with a comma (C<,>), as per the HTTP spec.
125 125
126If an internal error occurs, such as not being able to resolve a hostname, 126If an internal error occurs, such as not being able to resolve a hostname,
127then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x> 127then C<$data> will be C<undef>, C<< $headers->{Status} >> will be
128(usually C<599>) and the C<Reason> pseudo-header will contain an error 128C<590>-C<599> and the C<Reason> pseudo-header will contain an error
129message. 129message. Currently the following status codes are used:
130
131=over 4
132
133=item 595 - errors during connection etsbalishment, proxy handshake.
134
135=item 596 - errors during TLS negotiation, request sending and header processing.
136
137=item 597 - errors during body receive or processing.
138
139=item 598 - user aborted request in C<on_header> or C<on_body>.
140
141=item 599 - other, usually nonretryable, errors (garbled URL etc.).
142
143=back
130 144
131A typical callback might look like this: 145A typical callback might look like this:
132 146
133 sub { 147 sub {
134 my ($body, $hdr) = @_; 148 my ($body, $hdr) = @_;
183 197
184Passing this parameter enables (simplified) cookie-processing, loosely 198Passing this parameter enables (simplified) cookie-processing, loosely
185based on the original netscape specification. 199based on the original netscape specification.
186 200
187The C<$hash_ref> must be an (initially empty) hash reference which will 201The C<$hash_ref> must be an (initially empty) hash reference which will
188get updated automatically. It is possible to save the cookie_jar to 202get updated automatically. It is possible to save the cookie jar to
189persistent storage with something like JSON or Storable, but this is not 203persistent storage with something like JSON or Storable, but this is not
190recommended, as expiry times are currently being ignored. 204recommended, as session-only cookies might survive longer than expected.
191 205
192Note that this cookie implementation is not of very high quality, nor 206Note that this cookie implementation is not meant to be complete. If
193meant to be complete. If you want complete cookie management you have to 207you want complete cookie management you have to do that on your
194do that on your own. C<cookie_jar> is meant as a quick fix to get some 208own. C<cookie_jar> is meant as a quick fix to get some cookie-using sites
195cookie-using sites working. Cookies are a privacy disaster, do not use 209working. Cookies are a privacy disaster, do not use them unless required
196them unless required to. 210to.
197 211
198When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:> 212When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
199headers will be ste and handled by this module, otherwise they will be 213headers will be set and handled by this module, otherwise they will be
200left untouched. 214left untouched.
201 215
202=item tls_ctx => $scheme | $tls_ctx 216=item tls_ctx => $scheme | $tls_ctx
203 217
204Specifies the AnyEvent::TLS context to be used for https connections. This 218Specifies the AnyEvent::TLS context to be used for https connections. This
364 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 378 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
365 379
366 _slot_schedule $_[0]; 380 _slot_schedule $_[0];
367} 381}
368 382
383# extract cookies from jar
384sub cookie_jar_extract($$$$) {
385 my ($jar, $uscheme, $uhost, $upath) = @_;
386
387 %$jar = () if $jar->{version} != 1;
388
389 my @cookies;
390
391 while (my ($chost, $paths) = each %$jar) {
392 next unless ref $paths;
393
394 if ($chost =~ /^\./) {
395 next unless $chost eq substr $uhost, -length $chost;
396 } elsif ($chost =~ /\./) {
397 next unless $chost eq $uhost;
398 } else {
399 next;
400 }
401
402 while (my ($cpath, $cookies) = each %$paths) {
403 next unless $cpath eq substr $upath, 0, length $cpath;
404
405 while (my ($cookie, $kv) = each %$cookies) {
406 next if $uscheme ne "https" && exists $kv->{secure};
407
408 if (exists $kv->{expires}) {
409 if (AE::now > parse_date ($kv->{expires})) {
410 delete $cookies->{$cookie};
411 next;
412 }
413 }
414
415 my $value = $kv->{value};
416
417 if ($value =~ /[=;,[:space:]]/) {
418 $value =~ s/([\\"])/\\$1/g;
419 $value = "\"$value\"";
420 }
421
422 push @cookies, "$cookie=$value";
423 }
424 }
425 }
426
427 \@cookies
428}
429
430# parse set_cookie header into jar
431sub cookie_jar_set_cookie($$$) {
432 my ($jar, $set_cookie, $uhost) = @_;
433
434 for ($set_cookie) {
435 # parse NAME=VALUE
436 my @kv;
437
438 while (
439 m{
440 \G\s*
441 (?:
442 expires \s*=\s* ([A-Z][a-z][a-z],\ [^,;]+)
443 | ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )
444 )
445 }gcxsi
446 ) {
447 my $name = $2;
448 my $value = $4;
449
450 unless (defined $name) {
451 # expires
452 $name = "expires";
453 $value = $1;
454 } elsif (!defined $value) {
455 # quoted
456 $value = $3;
457 $value =~ s/\\(.)/$1/gs;
458 }
459
460 push @kv, lc $name, $value;
461
462 last unless /\G\s*;/gc;
463 }
464
465 last unless @kv;
466
467 my $name = shift @kv;
468 my %kv = (value => shift @kv, @kv);
469
470 $kv{expires} ||= format_date (AE::now + $kv{"max-age"})
471 if exists $kv{"max-age"};
472
473 my $cdom;
474 my $cpath = (delete $kv{path}) || "/";
475
476 if (exists $kv{domain}) {
477 $cdom = delete $kv{domain};
478
479 $cdom =~ s/^\.?/./; # make sure it starts with a "."
480
481 next if $cdom =~ /\.$/;
482
483 # this is not rfc-like and not netscape-like. go figure.
484 my $ndots = $cdom =~ y/.//;
485 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
486 } else {
487 $cdom = $uhost;
488 }
489
490 # store it
491 $jar->{version} = 1;
492 $jar->{$cdom}{$cpath}{$name} = \%kv;
493
494 redo if /\G\s*,/gc;
495 }
496}
497
369# continue to parse $_ for headers and place them into the arg 498# continue to parse $_ for headers and place them into the arg
370sub parse_hdr() { 499sub parse_hdr() {
371 my %hdr; 500 my %hdr;
372 501
373 # things seen, not parsed: 502 # things seen, not parsed:
448 577
449 $upath =~ s%^/?%/%; 578 $upath =~ s%^/?%/%;
450 579
451 # cookie processing 580 # cookie processing
452 if (my $jar = $arg{cookie_jar}) { 581 if (my $jar = $arg{cookie_jar}) {
453 %$jar = () if $jar->{version} != 1; 582 my $cookies = cookie_jar_extract $jar, $uscheme, $uhost, $upath;
454 583
455 my @cookie;
456
457 while (my ($chost, $v) = each %$jar) {
458 if ($chost =~ /^\./) {
459 next unless $chost eq substr $uhost, -length $chost;
460 } elsif ($chost =~ /\./) {
461 next unless $chost eq $uhost;
462 } else {
463 next;
464 }
465
466 while (my ($cpath, $v) = each %$v) {
467 next unless $cpath eq substr $upath, 0, length $cpath;
468
469 while (my ($k, $v) = each %$v) {
470 next if $uscheme ne "https" && exists $v->{secure};
471 my $value = $v->{value};
472 $value =~ s/([\\"])/\\$1/g;
473 push @cookie, "$k=\"$value\"";
474 }
475 }
476 }
477
478 $hdr{cookie} = join "; ", @cookie 584 $hdr{cookie} = join "; ", @$cookies
479 if @cookie; 585 if @$cookies;
480 } 586 }
481 587
482 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path 588 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
483 589
484 if ($proxy) { 590 if ($proxy) {
508 _get_slot $uhost, sub { 614 _get_slot $uhost, sub {
509 $state{slot_guard} = shift; 615 $state{slot_guard} = shift;
510 616
511 return unless $state{connect_guard}; 617 return unless $state{connect_guard};
512 618
619 my $ae_error = 595; # connecting
620
513 my $connect_cb = sub { 621 my $connect_cb = sub {
514 $state{fh} = shift 622 $state{fh} = shift
515 or do { 623 or do {
516 my $err = "$!"; 624 my $err = "$!";
517 %state = (); 625 %state = ();
518 return $cb->(undef, { @pseudo, Status => 599, Reason => $err }); 626 return $cb->(undef, { @pseudo, Status => $ae_error, Reason => $err });
519 }; 627 };
520
521 pop; # free memory, save a tree
522 628
523 return unless delete $state{connect_guard}; 629 return unless delete $state{connect_guard};
524 630
525 # get handle 631 # get handle
526 $state{handle} = new AnyEvent::Handle 632 $state{handle} = new AnyEvent::Handle
529 tls_ctx => $arg{tls_ctx}, 635 tls_ctx => $arg{tls_ctx},
530 # these need to be reconfigured on keepalive handles 636 # these need to be reconfigured on keepalive handles
531 timeout => $timeout, 637 timeout => $timeout,
532 on_error => sub { 638 on_error => sub {
533 %state = (); 639 %state = ();
534 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] }); 640 $cb->(undef, { @pseudo, Status => $ae_error, Reason => $_[2] });
535 }, 641 },
536 on_eof => sub { 642 on_eof => sub {
537 %state = (); 643 %state = ();
538 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" }); 644 $cb->(undef, { @pseudo, Status => $ae_error, Reason => "Unexpected end-of-file" });
539 }, 645 },
540 ; 646 ;
541 647
542 # limit the number of persistent connections 648 # limit the number of persistent connections
543 # keepalive not yet supported 649 # keepalive not yet supported
551 657
552 $state{handle}->starttls ("connect") if $rscheme eq "https"; 658 $state{handle}->starttls ("connect") if $rscheme eq "https";
553 659
554 # handle actual, non-tunneled, request 660 # handle actual, non-tunneled, request
555 my $handle_actual_request = sub { 661 my $handle_actual_request = sub {
662 $ae_error = 596; # request phase
663
556 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 664 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
557 665
558 # send request 666 # send request
559 $state{handle}->push_write ( 667 $state{handle}->push_write (
560 "$method $rpath HTTP/1.1\015\012" 668 "$method $rpath HTTP/1.1\015\012"
571 # status line and headers 679 # status line and headers
572 $state{read_response} = sub { 680 $state{read_response} = sub {
573 for ("$_[1]") { 681 for ("$_[1]") {
574 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 682 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
575 683
576 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/igxc 684 /^HTTP\/0*([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/gxci
577 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" })); 685 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
578 686
579 # 100 Continue handling 687 # 100 Continue handling
580 # should not happen as we don't send expect: 100-continue, 688 # should not happen as we don't send expect: 100-continue,
581 # but we handle it just in case. 689 # but we handle it just in case.
616 724
617 if ($recurse) { 725 if ($recurse) {
618 my $status = $hdr{Status}; 726 my $status = $hdr{Status};
619 727
620 # industry standard is to redirect POST as GET for 728 # industry standard is to redirect POST as GET for
621 # 301, 302 and 303, in contrast to http/1.0 and 1.1. 729 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
622 # also, the UA should ask the user for 301 and 307 and POST, 730 # also, the UA should ask the user for 301 and 307 and POST,
623 # industry standard seems to be to simply follow. 731 # industry standard seems to be to simply follow.
624 # we go with the industry standard. 732 # we go with the industry standard.
625 if ($status == 301 or $status == 302 or $status == 303) { 733 if ($status == 301 or $status == 302 or $status == 303) {
626 # HTTP/1.1 is unclear on how to mutate the method 734 # HTTP/1.1 is unclear on how to mutate the method
630 $redirect = 1; 738 $redirect = 1;
631 } 739 }
632 } 740 }
633 741
634 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive]) 742 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
635 my $keepalive = pop; 743 my $may_keep_alive = $_[3];
636 744
637 $state{handle}->destroy if $state{handle}; 745 $state{handle}->destroy if $state{handle};
638 %state = (); 746 %state = ();
639 747
640 if (defined $_[1]) { 748 if (defined $_[1]) {
642 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2]; 750 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
643 } 751 }
644 752
645 # set-cookie processing 753 # set-cookie processing
646 if ($arg{cookie_jar}) { 754 if ($arg{cookie_jar}) {
647 for ($hdr{"set-cookie"}) { 755 cookie_jar_set_cookie $arg{cookie_jar}, $hdr{"set-cookie"}, $uhost;
648 # parse NAME=VALUE
649 my @kv;
650
651 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
652 my $name = $1;
653 my $value = $3;
654
655 unless ($value) {
656 $value = $2;
657 $value =~ s/\\(.)/$1/gs;
658 }
659
660 push @kv, $name => $value;
661
662 last unless /\G\s*;/gc;
663 }
664
665 last unless @kv;
666
667 my $name = shift @kv;
668 my %kv = (value => shift @kv, @kv);
669
670 my $cdom;
671 my $cpath = (delete $kv{path}) || "/";
672
673 if (exists $kv{domain}) {
674 $cdom = delete $kv{domain};
675
676 $cdom =~ s/^\.?/./; # make sure it starts with a "."
677
678 next if $cdom =~ /\.$/;
679
680 # this is not rfc-like and not netscape-like. go figure.
681 my $ndots = $cdom =~ y/.//;
682 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
683 } else {
684 $cdom = $uhost;
685 }
686
687 # store it
688 $arg{cookie_jar}{version} = 1;
689 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
690
691 redo if /\G\s*,/gc;
692 }
693 } 756 }
694 757
695 if ($redirect && exists $hdr{location}) { 758 if ($redirect && exists $hdr{location}) {
696 # we ignore any errors, as it is very common to receive 759 # we ignore any errors, as it is very common to receive
697 # Content-Length != 0 but no actual body 760 # Content-Length != 0 but no actual body
704 $cb); 767 $cb);
705 } else { 768 } else {
706 $cb->($_[0], \%hdr); 769 $cb->($_[0], \%hdr);
707 } 770 }
708 }; 771 };
772
773 $ae_error = 597; # body phase
709 774
710 my $len = $hdr{"content-length"}; 775 my $len = $hdr{"content-length"};
711 776
712 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) { 777 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
713 $finish->(undef, 598 => "Request cancelled by on_header"); 778 $finish->(undef, 598 => "Request cancelled by on_header");
735 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) { 800 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
736 my $cl = 0; 801 my $cl = 0;
737 my $body = undef; 802 my $body = undef;
738 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 803 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
739 804
740 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
741
742 my $read_chunk; $read_chunk = sub { 805 my $read_chunk; $read_chunk = sub {
743 $_[1] =~ /^([0-9a-fA-F]+)/ 806 $_[1] =~ /^([0-9a-fA-F]+)/
744 or $finish->(undef, 599 => "Garbled chunked transfer encoding"); 807 or $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
745 808
746 my $len = hex $1; 809 my $len = hex $1;
747 810
748 if ($len) { 811 if ($len) {
749 $cl += $len; 812 $cl += $len;
752 $on_body->($_[1], \%hdr) 815 $on_body->($_[1], \%hdr)
753 or return $finish->(undef, 598 => "Request cancelled by on_body"); 816 or return $finish->(undef, 598 => "Request cancelled by on_body");
754 817
755 $_[0]->push_read (line => sub { 818 $_[0]->push_read (line => sub {
756 length $_[1] 819 length $_[1]
757 and return $finish->(undef, 599 => "Garbled chunked transfer encoding"); 820 and return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
758 $_[0]->push_read (line => $read_chunk); 821 $_[0]->push_read (line => $read_chunk);
759 }); 822 });
760 }); 823 });
761 } else { 824 } else {
762 $hdr{"content-length"} ||= $cl; 825 $hdr{"content-length"} ||= $cl;
765 if (length $_[1]) { 828 if (length $_[1]) {
766 for ("$_[1]") { 829 for ("$_[1]") {
767 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 830 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
768 831
769 my $hdr = parse_hdr 832 my $hdr = parse_hdr
770 or return $finish->(undef, 599 => "Garbled response trailers"); 833 or return $finish->(undef, $ae_error => "Garbled response trailers");
771 834
772 %hdr = (%hdr, %$hdr); 835 %hdr = (%hdr, %$hdr);
773 } 836 }
774 } 837 }
775 838
779 }; 842 };
780 843
781 $_[0]->push_read (line => $read_chunk); 844 $_[0]->push_read (line => $read_chunk);
782 845
783 } elsif ($arg{on_body}) { 846 } elsif ($arg{on_body}) {
784 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
785
786 if ($len) { 847 if ($len) {
787 $_[0]->on_read (sub { 848 $_[0]->on_read (sub {
788 $len -= length $_[0]{rbuf}; 849 $len -= length $_[0]{rbuf};
789 850
790 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 851 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
804 } 865 }
805 } else { 866 } else {
806 $_[0]->on_eof (undef); 867 $_[0]->on_eof (undef);
807 868
808 if ($len) { 869 if ($len) {
809 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
810 $_[0]->on_read (sub { 870 $_[0]->on_read (sub {
811 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1) 871 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
812 if $len <= length $_[0]{rbuf}; 872 if $len <= length $_[0]{rbuf};
813 }); 873 });
814 } else { 874 } else {
815 $_[0]->on_error (sub { 875 $_[0]->on_error (sub {
816 ($! == Errno::EPIPE || !$!) 876 ($! == Errno::EPIPE || !$!)
817 ? $finish->(delete $_[0]{rbuf}) 877 ? $finish->(delete $_[0]{rbuf})
818 : $finish->(undef, 599 => $_[2]); 878 : $finish->(undef, $ae_error => $_[2]);
819 }); 879 });
820 $_[0]->on_read (sub { }); 880 $_[0]->on_read (sub { });
821 } 881 }
822 } 882 }
823 } 883 }
903Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP 963Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
904Date (RFC 2616). 964Date (RFC 2616).
905 965
906=item $timestamp = AnyEvent::HTTP::parse_date $date 966=item $timestamp = AnyEvent::HTTP::parse_date $date
907 967
908Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX 968Takes a HTTP Date (RFC 2616) or a Cookie date (netscape cookie spec) and
909timestamp, or C<undef> if the date cannot be parsed. 969returns the corresponding POSIX timestamp, or C<undef> if the date cannot
970be parsed.
910 971
911=item $AnyEvent::HTTP::MAX_RECURSE 972=item $AnyEvent::HTTP::MAX_RECURSE
912 973
913The default value for the C<recurse> request parameter (default: C<10>). 974The default value for the C<recurse> request parameter (default: C<10>).
914 975
953sub parse_date($) { 1014sub parse_date($) {
954 my ($date) = @_; 1015 my ($date) = @_;
955 1016
956 my ($d, $m, $y, $H, $M, $S); 1017 my ($d, $m, $y, $H, $M, $S);
957 1018
958 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1019 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9])[\- ]([A-Z][a-z][a-z])[\- ]([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
959 # RFC 822/1123, required by RFC 2616 1020 # RFC 822/1123, required by RFC 2616 (with " ")
1021 # cookie dates (with "-")
1022
960 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6); 1023 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
961 1024
962 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) { 1025 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
963 # RFC 850 1026 # RFC 850
964 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6); 1027 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines