Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.122 by root, Fri May 8 17:28:39 2015 UTC vs.
Revision 1.134 by root, Fri Sep 7 22:11:31 2018 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = 2.21; 51our $VERSION = 2.24;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
195C<$scheme> must be either missing or must be C<http> for HTTP. 195C<$scheme> must be either missing or must be C<http> for HTTP.
196 196
197If not specified, then the default proxy is used (see 197If not specified, then the default proxy is used (see
198C<AnyEvent::HTTP::set_proxy>). 198C<AnyEvent::HTTP::set_proxy>).
199 199
200Currently, if your proxy requires authorization, you have to specify an
201appropriate "Proxy-Authorization" header in every request.
202
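The note above about proxy authorization can be illustrated with a minimal sketch. It assumes the proxy accepts HTTP Basic authentication, which the source does not state; the helper name `proxy_auth_header` and the credentials are hypothetical.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build a "Proxy-Authorization" header, assuming the
# proxy uses the Basic scheme (an assumption, not taken from the diff).
sub proxy_auth_header {
    my ($user, $pass) = @_;

    # The second argument "" suppresses the trailing newline that
    # encode_base64 would otherwise append.
    return ("Proxy-Authorization" => "Basic " . encode_base64("$user:$pass", ""));
}

my %headers = proxy_auth_header("alice", "s3cret");

# The header then has to accompany every request, for example:
#   http_get $url, headers => \%headers, proxy => [$proxy_host, $proxy_port], sub { ... };
```

Because the header is not added automatically, it may be convenient to keep the hash around and pass it to each `http_request` call.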
200=item body => $string 203=item body => $string
201 204
202The request body, usually empty. Will be sent as-is (future versions of 205The request body, usually empty. Will be sent as-is (future versions of
203this module might offer more options). 206this module might offer more options).
204 207
260In even rarer cases you want total control over how AnyEvent::HTTP 263In even rarer cases you want total control over how AnyEvent::HTTP
261establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 264establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
262to do this, but you can provide your own C<tcp_connect> function - 265to do this, but you can provide your own C<tcp_connect> function -
263obviously, it has to follow the same calling conventions, except that it 266obviously, it has to follow the same calling conventions, except that it
264may always return a connection guard object. 267may always return a connection guard object.
268
269The connections made by this hook will be treated as equivalent to
270connections made the built-in way; specifically, they will be put into
271and taken from the persistent connection cache. If your C<$tcp_connect>
272function is incompatible with this kind of re-use, consider switching off
273C<persistent> connections and/or providing a C<session> identifier.
265 274
266There are probably lots of weird uses for this function, starting from 275There are probably lots of weird uses for this function, starting from
267tracing the hosts C<http_request> actually tries to connect, to (inexact 276tracing the hosts C<http_request> actually tries to connect, to (inexact
268but fast) host => IP address caching or even socks protocol support. 277but fast) host => IP address caching or even socks protocol support.
269 278
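As a rough illustration of the "tracing" use mentioned above, the following sketch wraps a connect function and records each host and port before delegating. The wrapper name `tracing_connect` is hypothetical, and a stand-in replaces `AnyEvent::Socket::tcp_connect` so the sketch runs without a network.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a function with the tcp_connect calling
# convention ($host, $service, $connect_cb, $prepare_cb) that logs the
# target and then delegates to the wrapped connect function.
sub tracing_connect {
    my ($connect, $log) = @_;

    return sub {
        my ($host, $port, @rest) = @_;
        push @$log, "$host:$port";
        return $connect->($host, $port, @rest);    # pass the guard through
    };
}

# Stand-in for the real connect function; it calls the connect callback
# immediately and returns a dummy guard value.
my $fake_connect = sub {
    my ($host, $port, $cb) = @_;
    $cb->();
    return "guard";
};

my @log;
my $connected = 0;
my $hook  = tracing_connect($fake_connect, \@log);
my $guard = $hook->("example.org", 80, sub { $connected++ });
```

In real use the wrapped function would presumably be `\&AnyEvent::Socket::tcp_connect`, and the resulting hook would be passed as `tcp_connect => $hook`. Given the note above about the persistent connection cache, combining it with `persistent => 0` may be the safer choice.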
339=item persistent => $boolean 348=item persistent => $boolean
340 349
341Try to create/reuse a persistent connection. When this flag is set 350Try to create/reuse a persistent connection. When this flag is set
342(default: true for idempotent requests, false for all others), then 351(default: true for idempotent requests, false for all others), then
343C<http_request> tries to re-use an existing (previously-created) 352C<http_request> tries to re-use an existing (previously-created)
344persistent connection to the host and, failing that, tries to create a new 353persistent connection to the same host (i.e. identical URL scheme, hostname,
345one. 354port and session) and, failing that, tries to create a new one.
346 355
347Requests failing in certain ways will be automatically retried once, which 356Requests failing in certain ways will be automatically retried once, which
348is dangerous for non-idempotent requests, which is why it defaults to off 357is dangerous for non-idempotent requests, which is why it defaults to off
349for them. The reason for this is because the bozos who designed HTTP/1.1 358for them. The reason for this is because the bozos who designed HTTP/1.1
350made it impossible to distinguish between a fatal error and a normal 359made it impossible to distinguish between a fatal error and a normal
451 460
452# expire cookies 461# expire cookies
453sub cookie_jar_expire($;$) { 462sub cookie_jar_expire($;$) {
454 my ($jar, $session_end) = @_; 463 my ($jar, $session_end) = @_;
455 464
456 %$jar = () if $jar->{version} != 1; 465 %$jar = () if $jar->{version} != 2;
457 466
458 my $anow = AE::now; 467 my $anow = AE::now;
459 468
460 while (my ($chost, $paths) = each %$jar) { 469 while (my ($chost, $paths) = each %$jar) {
461 next unless ref $paths; 470 next unless ref $paths;
481 490
482# extract cookies from jar 491# extract cookies from jar
483sub cookie_jar_extract($$$$) { 492sub cookie_jar_extract($$$$) {
484 my ($jar, $scheme, $host, $path) = @_; 493 my ($jar, $scheme, $host, $path) = @_;
485 494
486 %$jar = () if $jar->{version} != 1; 495 %$jar = () if $jar->{version} != 2;
496
497 $host = AnyEvent::Util::idn_to_ascii $host
498 if $host =~ /[^\x00-\x7f]/;
487 499
488 my @cookies; 500 my @cookies;
489 501
490 while (my ($chost, $paths) = each %$jar) { 502 while (my ($chost, $paths) = each %$jar) {
491 next unless ref $paths; 503 next unless ref $paths;
492 504
493 if ($chost =~ /^\./) { 505 # exact match or suffix including . match
494 next unless $chost eq substr $host, -length $chost; 506 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
495 } elsif ($chost =~ /\./) {
496 next unless $chost eq $host;
497 } else {
498 next; 507 or next;
499 }
500 508
501 while (my ($cpath, $cookies) = each %$paths) { 509 while (my ($cpath, $cookies) = each %$paths) {
502 next unless $cpath eq substr $path, 0, length $cpath; 510 next unless $cpath eq substr $path, 0, length $cpath;
503 511
504 while (my ($cookie, $kv) = each %$cookies) { 512 while (my ($cookie, $kv) = each %$cookies) {
525} 533}
526 534
527# parse set_cookie header into jar 535# parse set_cookie header into jar
528sub cookie_jar_set_cookie($$$$) { 536sub cookie_jar_set_cookie($$$$) {
529 my ($jar, $set_cookie, $host, $date) = @_; 537 my ($jar, $set_cookie, $host, $date) = @_;
538
539 %$jar = () if $jar->{version} != 2;
530 540
531 my $anow = int AE::now; 541 my $anow = int AE::now;
532 my $snow; # server-now 542 my $snow; # server-now
533 543
534 for ($set_cookie) { 544 for ($set_cookie) {
580 590
581 my $cdom; 591 my $cdom;
582 my $cpath = (delete $kv{path}) || "/"; 592 my $cpath = (delete $kv{path}) || "/";
583 593
584 if (exists $kv{domain}) { 594 if (exists $kv{domain}) {
585 $cdom = delete $kv{domain}; 595 $cdom = $kv{domain};
586 596
587 $cdom =~ s/^\.?/./; # make sure it starts with a "." 597 $cdom =~ s/^\.?/./; # make sure it starts with a "."
588 598
589 next if $cdom =~ /\.$/; 599 next if $cdom =~ /\.$/;
590 600
591 # this is not rfc-like and not netscape-like. go figure. 601 # this is not rfc-like and not netscape-like. go figure.
592 my $ndots = $cdom =~ y/.//; 602 my $ndots = $cdom =~ y/.//;
593 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 603 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
604
605 $cdom = substr $cdom, 1; # remove initial .
594 } else { 606 } else {
595 $cdom = $host; 607 $cdom = $host;
596 } 608 }
597 609
598 # store it 610 # store it
599 $jar->{version} = 1; 611 $jar->{version} = 2;
600 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 612 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
601 613
602 redo if /\G\s*,/gc; 614 redo if /\G\s*,/gc;
603 } 615 }
604} 616}
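The new matching rule in `cookie_jar_extract` (stored domains no longer carry a leading dot, and a cookie matches on exact host or on a dot-separated suffix) can be sketched in isolation. The function name `cookie_domain_matches` is hypothetical, and the length guard is a defensive addition that is not part of the diff.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the version-2 jar matching rule:
# a stored cookie host matches either exactly, or as a suffix of the
# request host that is preceded by a ".".
sub cookie_domain_matches {
    my ($chost, $host) = @_;

    return 1 if $chost eq $host;

    # Defensive guard (not in the diff): the request host must be longer
    # than the cookie host for a suffix match to be possible.
    return 0 unless length($host) > length($chost);

    return ".$chost" eq substr($host, -1 - length $chost) ? 1 : 0;
}

my @results = map { cookie_domain_matches("example.com", $_) }
    qw(example.com www.example.com badexample.com);
```

The leading-dot comparison is what keeps `badexample.com` from matching a cookie stored for `example.com`.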
770 782
771 my $uport = $uscheme eq "http" ? 80 783 my $uport = $uscheme eq "http" ? 80
772 : $uscheme eq "https" ? 443 784 : $uscheme eq "https" ? 443
773 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 785 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
774 786
775 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 787 $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
776 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 788 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
777 789
778 my $uhost = lc $1; 790 my $uhost = lc $1;
779 $uport = $2 if defined $2; 791 $uport = $2 if defined $2;
780 792
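One visible effect of the changed authority pattern is that hosts containing colons, such as bracketed IPv6 literals, now parse instead of producing "Unparsable URL". The sketch below applies the new regex in a hypothetical helper, `parse_authority`; the motivation for the change is not stated in the diff, so this is only an observation about the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper applying the revision 1.134 authority regex.
# Returns (host, port) or an empty list if the authority is unparsable.
sub parse_authority {
    my ($authority, $default_port) = @_;

    # optional userinfo, non-greedy host, optional numeric port
    $authority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
        or return;

    return (lc $1, defined $2 ? $2 : $default_port);
}

my ($host, $port)   = parse_authority('user:pw@[::1]:8080', 80);
my ($host2, $port2) = parse_authority('Example.COM', 80);
```

The non-greedy `+?` matters here: it lets the trailing `:port` group claim the final colon and digits while earlier colons stay in the host.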
892 # we give our best and fall back to URI if available. 904 # we give our best and fall back to URI if available.
893 if (exists $hdr{location}) { 905 if (exists $hdr{location}) {
894 my $loc = $hdr{location}; 906 my $loc = $hdr{location};
895 907
896 if ($loc =~ m%^//%) { # // 908 if ($loc =~ m%^//%) { # //
897 $loc = "$rscheme:$loc"; 909 $loc = "$uscheme:$loc";
898 910
899 } elsif ($loc eq "") { 911 } elsif ($loc eq "") {
900 $loc = $url; 912 $loc = $url;
901 913
902 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple" 914 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
903 $loc =~ s/^\.\/+//; 915 $loc =~ s/^\.\/+//;
904 916
905 if ($loc !~ m%^[.?#]%) { 917 if ($loc !~ m%^[.?#]%) {
906 my $prefix = "$rscheme://$uhost:$uport"; 918 my $prefix = "$uscheme://$uauthority";
907 919
908 unless ($loc =~ s/^\///) { 920 unless ($loc =~ s/^\///) {
909 $prefix .= $upath; 921 $prefix .= $upath;
910 $prefix =~ s/\/[^\/]*$//; 922 $prefix =~ s/\/[^\/]*$//;
911 } 923 }
1028 $finish->(delete $state{handle}); 1040 $finish->(delete $state{handle});
1029 1041
1030 } elsif ($chunked) { 1042 } elsif ($chunked) {
1031 my $cl = 0; 1043 my $cl = 0;
1032 my $body = ""; 1044 my $body = "";
1033 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1045 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
1034 1046
1035 $state{read_chunk} = sub { 1047 $state{read_chunk} = sub {
1036 $_[1] =~ /^([0-9a-fA-F]+)/ 1048 $_[1] =~ /^([0-9a-fA-F]+)/
1037 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1049 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
1038 1050
1071 } 1083 }
1072 }; 1084 };
1073 1085
1074 $_[0]->push_read (line => $state{read_chunk}); 1086 $_[0]->push_read (line => $state{read_chunk});
1075 1087
1076 } elsif ($arg{on_body}) { 1088 } elsif (!$redirect && $arg{on_body}) {
1077 if (defined $len) { 1089 if (defined $len) {
1078 $_[0]->on_read (sub { 1090 $_[0]->on_read (sub {
1079 $len -= length $_[0]{rbuf}; 1091 $len -= length $_[0]{rbuf};
1080 1092
1081 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1093 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
1120 _destroy_state %state; 1132 _destroy_state %state;
1121 1133
1122 %state = (); 1134 %state = ();
1123 $state{recurse} = 1135 $state{recurse} =
1124 http_request ( 1136 http_request (
1125 $method => $url, 1137 $method => $url,
1126 %arg, 1138 %arg,
1127 recurse => $recurse - 1, 1139 recurse => $recurse - 1,
1128 keepalive => 0, 1140 persistent => 0,
1129 sub { 1141 sub {
1130 %state = (); 1142 %state = ();
1131 &$cb 1143 &$cb
1132 } 1144 }
1133 ); 1145 );
1214 # on a keepalive request (in theory, this should be a separate config option). 1226 # on a keepalive request (in theory, this should be a separate config option).
1215 if ($persistent && $KA_CACHE{$ka_key}) { 1227 if ($persistent && $KA_CACHE{$ka_key}) {
1216 $was_persistent = 1; 1228 $was_persistent = 1;
1217 1229
1218 $state{handle} = ka_fetch $ka_key; 1230 $state{handle} = ka_fetch $ka_key;
1219 $state{handle}->destroyed 1231# $state{handle}->destroyed
1220 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d# 1232# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1221 $prepare_handle->(); 1233 $prepare_handle->();
1222 $state{handle}->destroyed 1234# $state{handle}->destroyed
1223 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d# 1235# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1224 $handle_actual_request->(); 1236 $handle_actual_request->();
1225 1237
1226 } else { 1238 } else {
1227 my $tcp_connect = $arg{tcp_connect} 1239 my $tcp_connect = $arg{tcp_connect}
1228 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1240 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1288function from time to time. 1300function from time to time.
1289 1301
1290A cookie jar is initially an empty hash-reference that is managed by this 1302A cookie jar is initially an empty hash-reference that is managed by this
1291module. Its format is subject to change, but currently it is as follows: 1303module. Its format is subject to change, but currently it is as follows:
1292 1304
1293The key C<version> has to contain C<1>, otherwise the hash gets 1305The key C<version> has to contain C<2>, otherwise the hash gets
1294emptied. All other keys are hostnames or IP addresses pointing to 1306cleared. All other keys are hostnames or IP addresses pointing to
1295hash-references. The key for these inner hash references is the 1307hash-references. The key for these inner hash references is the
1296server path for which this cookie is meant, and the values are again 1308server path for which this cookie is meant, and the values are again
1297hash-references. Each key of those hash-references is a cookie name, and 1309hash-references. Each key of those hash-references is a cookie name, and
1298the value, you guessed it, is another hash-reference, this time with the 1310the value, you guessed it, is another hash-reference, this time with the
1299key-value pairs from the cookie, except for C<expires> and C<max-age>, 1311key-value pairs from the cookie, except for C<expires> and C<max-age>,
1303 1315
1304Here is an example of a cookie jar with a single cookie, so you have a 1316Here is an example of a cookie jar with a single cookie, so you have a
1305chance of understanding the above paragraph: 1317chance of understanding the above paragraph:
1306 1318
1307 { 1319 {
1308 version => 1, 1320 version => 2,
1309 "10.0.0.1" => { 1321 "10.0.0.1" => {
1310 "/" => { 1322 "/" => {
1311 "mythweb_id" => { 1323 "mythweb_id" => {
1312 _expires => 1293917923, 1324 _expires => 1293917923,
1313 value => "ooRung9dThee3ooyXooM1Ohm", 1325 value => "ooRung9dThee3ooyXooM1Ohm",
1458 or die "$file: $!"; 1470 or die "$file: $!";
1459 1471
1460 my %hdr; 1472 my %hdr;
1461 my $ofs = 0; 1473 my $ofs = 0;
1462 1474
1463 warn stat $fh;
1464 warn -s _;
1465 if (stat $fh and -s _) { 1475 if (stat $fh and -s _) {
1466 $ofs = -s _; 1476 $ofs = -s _;
1467 warn "-s is ", $ofs; 1477 warn "-s is ", $ofs;
1468 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1478 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1469 $hdr{"range"} = "bytes=$ofs-"; 1479 $hdr{"range"} = "bytes=$ofs-";
1497 my (undef, $hdr) = @_; 1507 my (undef, $hdr) = @_;
1498 1508
1499 my $status = $hdr->{Status}; 1509 my $status = $hdr->{Status};
1500 1510
1501 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { 1511 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1502 utime $fh, $time, $time; 1512 utime $time, $time, $fh;
1503 } 1513 }
1504 1514
1505 if ($status == 200 || $status == 206 || $status == 416) { 1515 if ($status == 200 || $status == 206 || $status == 416) {
1506 # download ok || resume ok || file already fully downloaded 1516 # download ok || resume ok || file already fully downloaded
1507 $cb->(1, $hdr); 1517 $cb->(1, $hdr);
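The `utime` change in this hunk corrects the argument order: Perl's `utime` takes the access time, then the modification time, then the list of files. A minimal sketch of the corrected order follows; it uses a temporary file path rather than a filehandle for portability, and the timestamp is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file; it is removed automatically at program exit.
my ($fh, $path) = tempfile(UNLINK => 1);

my $time = 1293917923;    # arbitrary example timestamp

# Correct order: atime, mtime, then the file(s).
utime $time, $time, $path
    or die "utime $path: $!";

my $mtime = (stat $path)[9];    # element 9 of stat is the modification time
```

With the old order, the handle would have been interpreted as the access time and the second timestamp as a file name, so the file's times would not have been updated.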
