Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.120 by root, Sun Jun 8 23:36:36 2014 UTC vs.
Revision 1.134 by root, Fri Sep 7 22:11:31 2018 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = 2.21; 51our $VERSION = 2.24;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
195C<$scheme> must be either missing or must be C<http> for HTTP. 195C<$scheme> must be either missing or must be C<http> for HTTP.
196 196
197If not specified, then the default proxy is used (see 197If not specified, then the default proxy is used (see
198C<AnyEvent::HTTP::set_proxy>). 198C<AnyEvent::HTTP::set_proxy>).
199 199
200Currently, if your proxy requires authorization, you have to specify an
201appropriate "Proxy-Authorization" header in every request.
202
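The added note can be illustrated with a minimal sketch, assuming the proxy accepts HTTP Basic authentication. The helper names `proxy_basic_auth` and `get_via_proxy`, the proxy address and the credentials are hypothetical; only `http_request` and its `proxy` and `headers` arguments come from the module itself.

```perl
use strict;
use warnings;
use MIME::Base64 ();

# Hypothetical helper: build a Proxy-Authorization value for Basic
# authentication (an assumption, your proxy may require another scheme).
sub proxy_basic_auth {
   my ($user, $pass) = @_;

   # second argument "" suppresses the trailing newline
   "Basic " . MIME::Base64::encode_base64 ("$user:$pass", "")
}

# Hypothetical wrapper: the header has to be supplied on every request.
sub get_via_proxy {
   my ($url, $cb) = @_;

   require AnyEvent::HTTP; # loaded lazily so the helper above works standalone

   AnyEvent::HTTP::http_request (
      GET     => $url,
      proxy   => ["proxy.example.net", 3128], # placeholder host and port
      headers => { "Proxy-Authorization" => proxy_basic_auth ("user", "secret") },
      $cb,
   );
}
```

Per the CONNECT hunk further down, revision 1.134 forwards this header to the proxy for https requests and drops it when no proxy is in use.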
200=item body => $string 203=item body => $string
201 204
202The request body, usually empty. Will be sent as-is (future versions of 205The request body, usually empty. Will be sent as-is (future versions of
203this module might offer more options). 206this module might offer more options).
204 207
260In even rarer cases you want total control over how AnyEvent::HTTP 263In even rarer cases you want total control over how AnyEvent::HTTP
261establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 264establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
262to do this, but you can provide your own C<tcp_connect> function - 265to do this, but you can provide your own C<tcp_connect> function -
263obviously, it has to follow the same calling conventions, except that it 266obviously, it has to follow the same calling conventions, except that it
264may always return a connection guard object. 267may always return a connection guard object.
268
269The connections made by this hook will be treated as equivalent to
270connections made the built-in way; specifically, they will be put into
271and taken from the persistent connection cache. If your C<$tcp_connect>
272function is incompatible with this kind of re-use, consider switching off
273C<persistent> connections and/or providing a C<session> identifier.
265 274
266There are probably lots of weird uses for this function, starting from 275There are probably lots of weird uses for this function, starting from
267tracing the hosts C<http_request> actually tries to connect, to (inexact 276tracing the hosts C<http_request> actually tries to connect, to (inexact
268but fast) host => IP address caching or even socks protocol support. 277but fast) host => IP address caching or even socks protocol support.
269 278
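As a sketch of the tracing use case mentioned above, a wrapper can record each connection attempt and then delegate to the real connect function. `make_tracing_connect` and its arguments are hypothetical names for illustration; the four-argument calling convention is the one of `AnyEvent::Socket::tcp_connect`.

```perl
use strict;
use warnings;

# Hypothetical factory: returns a tcp_connect replacement that records
# "host:port" in @$seen and then delegates to the real connect function.
sub make_tracing_connect {
   my ($real_connect, $seen) = @_;

   sub {
      my ($host, $port, $connect_cb, $prepare_cb) = @_;

      push @$seen, "$host:$port";

      # pass the guard object (if any) back to the caller unchanged
      $real_connect->($host, $port, $connect_cb, $prepare_cb)
   }
}
```

It might be used as `tcp_connect => make_tracing_connect (\&AnyEvent::Socket::tcp_connect, \my @seen)`. If the hook is not safe for connection re-use, the added paragraph suggests combining it with `persistent => 0` or a distinct `session` value.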
339=item persistent => $boolean 348=item persistent => $boolean
340 349
341Try to create/reuse a persistent connection. When this flag is set 350Try to create/reuse a persistent connection. When this flag is set
342(default: true for idempotent requests, false for all others), then 351(default: true for idempotent requests, false for all others), then
343C<http_request> tries to re-use an existing (previously-created) 352C<http_request> tries to re-use an existing (previously-created)
344persistent connection to the host and, failing that, tries to create a new 353persistent connection to the same host (i.e. identical URL scheme, hostname,
345one. 354port and session) and, failing that, tries to create a new one.
346 355
347Requests failing in certain ways will be automatically retried once, which 356Requests failing in certain ways will be automatically retried once, which
348is dangerous for non-idempotent requests, which is why it defaults to off 357is dangerous for non-idempotent requests, which is why it defaults to off
349for them. The reason for this is because the bozos who designed HTTP/1.1 358for them. The reason for this is because the bozos who designed HTTP/1.1
350made it impossible to distinguish between a fatal error and a normal 359made it impossible to distinguish between a fatal error and a normal
451 460
452# expire cookies 461# expire cookies
453sub cookie_jar_expire($;$) { 462sub cookie_jar_expire($;$) {
454 my ($jar, $session_end) = @_; 463 my ($jar, $session_end) = @_;
455 464
456 %$jar = () if $jar->{version} != 1; 465 %$jar = () if $jar->{version} != 2;
457 466
458 my $anow = AE::now; 467 my $anow = AE::now;
459 468
460 while (my ($chost, $paths) = each %$jar) { 469 while (my ($chost, $paths) = each %$jar) {
461 next unless ref $paths; 470 next unless ref $paths;
481 490
482# extract cookies from jar 491# extract cookies from jar
483sub cookie_jar_extract($$$$) { 492sub cookie_jar_extract($$$$) {
484 my ($jar, $scheme, $host, $path) = @_; 493 my ($jar, $scheme, $host, $path) = @_;
485 494
486 %$jar = () if $jar->{version} != 1; 495 %$jar = () if $jar->{version} != 2;
496
497 $host = AnyEvent::Util::idn_to_ascii $host
498 if $host =~ /[^\x00-\x7f]/;
487 499
488 my @cookies; 500 my @cookies;
489 501
490 while (my ($chost, $paths) = each %$jar) { 502 while (my ($chost, $paths) = each %$jar) {
491 next unless ref $paths; 503 next unless ref $paths;
492 504
493 if ($chost =~ /^\./) { 505 # exact match or suffix including . match
494 next unless $chost eq substr $host, -length $chost; 506 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
495 } elsif ($chost =~ /\./) {
496 next unless $chost eq $host;
497 } else {
498 next; 507 or next;
499 }
500 508
501 while (my ($cpath, $cookies) = each %$paths) { 509 while (my ($cpath, $cookies) = each %$paths) {
502 next unless $cpath eq substr $path, 0, length $cpath; 510 next unless $cpath eq substr $path, 0, length $cpath;
503 511
504 while (my ($cookie, $kv) = each %$cookies) { 512 while (my ($cookie, $kv) = each %$cookies) {
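The new host matching rule in `cookie_jar_extract` can be sketched in isolation as follows. The function name `cookie_host_matches` is hypothetical, and the length guard is an addition of this sketch rather than part of the revision.

```perl
use strict;
use warnings;

# Sketch of the rule in revision 1.134: a jar key matches when it equals
# the request host, or when the host ends in "." followed by the key.
sub cookie_host_matches {
   my ($chost, $host) = @_;

   return 1 if $chost eq $host;

   # the length guard is specific to this sketch; it avoids calling
   # substr with an offset that lies outside of the string
   length ($host) > length ($chost)
      and ".$chost" eq substr ($host, -1 - length ($chost))
}
```

Because jar keys no longer carry a leading dot (see the `cookie_jar_set_cookie` hunk), `example.com` matches both `example.com` and `www.example.com`, but not `badexample.com`.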
525} 533}
526 534
527# parse set_cookie header into jar 535# parse set_cookie header into jar
528sub cookie_jar_set_cookie($$$$) { 536sub cookie_jar_set_cookie($$$$) {
529 my ($jar, $set_cookie, $host, $date) = @_; 537 my ($jar, $set_cookie, $host, $date) = @_;
538
539 %$jar = () if $jar->{version} != 2;
530 540
531 my $anow = int AE::now; 541 my $anow = int AE::now;
532 my $snow; # server-now 542 my $snow; # server-now
533 543
534 for ($set_cookie) { 544 for ($set_cookie) {
580 590
581 my $cdom; 591 my $cdom;
582 my $cpath = (delete $kv{path}) || "/"; 592 my $cpath = (delete $kv{path}) || "/";
583 593
584 if (exists $kv{domain}) { 594 if (exists $kv{domain}) {
585 $cdom = delete $kv{domain}; 595 $cdom = $kv{domain};
586 596
587 $cdom =~ s/^\.?/./; # make sure it starts with a "." 597 $cdom =~ s/^\.?/./; # make sure it starts with a "."
588 598
589 next if $cdom =~ /\.$/; 599 next if $cdom =~ /\.$/;
590 600
591 # this is not rfc-like and not netscape-like. go figure. 601 # this is not rfc-like and not netscape-like. go figure.
592 my $ndots = $cdom =~ y/.//; 602 my $ndots = $cdom =~ y/.//;
593 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 603 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
604
605 $cdom = substr $cdom, 1; # remove initial .
594 } else { 606 } else {
595 $cdom = $host; 607 $cdom = $host;
596 } 608 }
597 609
598 # store it 610 # store it
599 $jar->{version} = 1; 611 $jar->{version} = 2;
600 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 612 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
601 613
602 redo if /\G\s*,/gc; 614 redo if /\G\s*,/gc;
603 } 615 }
604} 616}
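The domain handling in the hunk above can be sketched as a standalone function. `normalize_cookie_domain` is a hypothetical name, and returning `undef` stands in for the `next` statements of the original loop.

```perl
use strict;
use warnings;

# Sketch: validate a cookie "domain" attribute the way revision 1.134 does
# and return the jar key (lowercased, without leading dot), or undef.
sub normalize_cookie_domain {
   my ($cdom) = @_;

   $cdom =~ s/^\.?/./;             # make sure it starts with a "."

   return undef if $cdom =~ /\.$/; # reject a trailing dot

   # require at least two dots, or three for names ending in ".xx.yy"
   my $ndots = $cdom =~ y/.//;
   return undef if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);

   lc substr ($cdom, 1)            # remove the initial "." again
}
```

Since the stored key format changes, the jar `version` is bumped from 1 to 2, and a jar with any other version is emptied on the next access.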
770 782
771 my $uport = $uscheme eq "http" ? 80 783 my $uport = $uscheme eq "http" ? 80
772 : $uscheme eq "https" ? 443 784 : $uscheme eq "https" ? 443
773 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 785 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
774 786
775 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 787 $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
776 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 788 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
777 789
778 my $uhost = lc $1; 790 my $uhost = lc $1;
779 $uport = $2 if defined $2; 791 $uport = $2 if defined $2;
780 792
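A minimal sketch of what the changed pattern accepts, using a hypothetical `split_authority` helper. One visible effect is that the host part may now contain colons, for example a bracketed IPv6 literal; whether that was the motivation for the change is an assumption.

```perl
use strict;
use warnings;

# Sketch: split a URL authority into (host, port) with the 1.134 pattern.
# Returns an empty list when the authority cannot be parsed.
sub split_authority {
   my ($uauthority) = @_;

   $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
      or return;

   (lc ($1), $2) # port is undef when not given
}
```

With the old pattern `([^\@:]+)`, an authority such as `[::1]:8080` would not have matched and the request would have failed with "Unparsable URL".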
892 # we give our best and fall back to URI if available. 904 # we give our best and fall back to URI if available.
893 if (exists $hdr{location}) { 905 if (exists $hdr{location}) {
894 my $loc = $hdr{location}; 906 my $loc = $hdr{location};
895 907
896 if ($loc =~ m%^//%) { # // 908 if ($loc =~ m%^//%) { # //
897 $loc = "$rscheme:$loc"; 909 $loc = "$uscheme:$loc";
898 910
899 } elsif ($loc eq "") { 911 } elsif ($loc eq "") {
900 $loc = $url; 912 $loc = $url;
901 913
902 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple" 914 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
903 $loc =~ s/^\.\/+//; 915 $loc =~ s/^\.\/+//;
904 916
905 if ($loc !~ m%^[.?#]%) { 917 if ($loc !~ m%^[.?#]%) {
906 my $prefix = "$rscheme://$uhost:$uport"; 918 my $prefix = "$uscheme://$uauthority";
907 919
908 unless ($loc =~ s/^\///) { 920 unless ($loc =~ s/^\///) {
909 $prefix .= $upath; 921 $prefix .= $upath;
910 $prefix =~ s/\/[^\/]*$//; 922 $prefix =~ s/\/[^\/]*$//;
911 } 923 }
934 # industry standard is to redirect POST as GET for 946 # industry standard is to redirect POST as GET for
935 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1. 947 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
936 # also, the UA should ask the user for 301 and 307 and POST, 948 # also, the UA should ask the user for 301 and 307 and POST,
937 # industry standard seems to be to simply follow. 949 # industry standard seems to be to simply follow.
938 # we go with the industry standard. 308 is defined 950 # we go with the industry standard. 308 is defined
939 # by rfc7238 951 # by rfc7538
940 if ($status == 301 or $status == 302 or $status == 303) { 952 if ($status == 301 or $status == 302 or $status == 303) {
953 $redirect = 1;
941 # HTTP/1.1 is unclear on how to mutate the method 954 # HTTP/1.1 is unclear on how to mutate the method
942 $method = "GET" unless $method eq "HEAD"; 955 unless ($method eq "HEAD") {
943 $redirect = 1; 956 $method = "GET";
957 delete $arg{body};
958 }
944 } elsif ($status == 307 or $status == 308) { 959 } elsif ($status == 307 or $status == 308) {
945 $redirect = 1; 960 $redirect = 1;
946 } 961 }
947 } 962 }
948 963
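The redirect handling after this change can be sketched as a small pure function. `redirect_plan` is a hypothetical name; the surrounding conditions of the real code (such as the recursion limit) are left out.

```perl
use strict;
use warnings;

# Sketch: decide whether to follow a redirect and with which method.
# $arg is the request argument hash; its body is dropped when the
# method is rewritten to GET, as in revision 1.134.
sub redirect_plan {
   my ($status, $method, $arg) = @_;

   my $redirect = 0;

   if ($status == 301 or $status == 302 or $status == 303) {
      $redirect = 1;

      unless ($method eq "HEAD") {
         $method = "GET";
         delete $arg->{body};
      }
   } elsif ($status == 307 or $status == 308) {
      $redirect = 1; # method and body are kept
   }

   ($redirect, $method)
}
```

In revision 1.120 the body of a redirected POST was still passed along to the follow-up GET request; deleting it avoids that.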
1025 $finish->(delete $state{handle}); 1040 $finish->(delete $state{handle});
1026 1041
1027 } elsif ($chunked) { 1042 } elsif ($chunked) {
1028 my $cl = 0; 1043 my $cl = 0;
1029 my $body = ""; 1044 my $body = "";
1030 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1045 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
1031 1046
1032 $state{read_chunk} = sub { 1047 $state{read_chunk} = sub {
1033 $_[1] =~ /^([0-9a-fA-F]+)/ 1048 $_[1] =~ /^([0-9a-fA-F]+)/
1034 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1049 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
1035 1050
1068 } 1083 }
1069 }; 1084 };
1070 1085
1071 $_[0]->push_read (line => $state{read_chunk}); 1086 $_[0]->push_read (line => $state{read_chunk});
1072 1087
1073 } elsif ($arg{on_body}) { 1088 } elsif (!$redirect && $arg{on_body}) {
1074 if (defined $len) { 1089 if (defined $len) {
1075 $_[0]->on_read (sub { 1090 $_[0]->on_read (sub {
1076 $len -= length $_[0]{rbuf}; 1091 $len -= length $_[0]{rbuf};
1077 1092
1078 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1093 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
1117 _destroy_state %state; 1132 _destroy_state %state;
1118 1133
1119 %state = (); 1134 %state = ();
1120 $state{recurse} = 1135 $state{recurse} =
1121 http_request ( 1136 http_request (
1122 $method => $url, 1137 $method => $url,
1123 %arg, 1138 %arg,
1124 recurse => $recurse - 1, 1139 recurse => $recurse - 1,
1125 keepalive => 0, 1140 persistent => 0,
1126 sub { 1141 sub {
1127 %state = (); 1142 %state = ();
1128 &$cb 1143 &$cb
1129 } 1144 }
1130 ); 1145 );
1176 1191
1177 # now handle proxy-CONNECT method 1192 # now handle proxy-CONNECT method
1178 if ($proxy && $uscheme eq "https") { 1193 if ($proxy && $uscheme eq "https") {
1179 # oh dear, we have to wrap it into a connect request 1194 # oh dear, we have to wrap it into a connect request
1180 1195
1196 my $auth = exists $hdr{"proxy-authorization"}
1197 ? "proxy-authorization: " . (delete $hdr{"proxy-authorization"}) . "\015\012"
1198 : "";
1199
1181 # maybe re-use $uauthority with patched port? 1200 # maybe re-use $uauthority with patched port?
1182 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1201 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012");
1183 $state{handle}->push_read (line => $qr_nlnl, sub { 1202 $state{handle}->push_read (line => $qr_nlnl, sub {
1184 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1203 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
1185 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }; 1204 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
1186 1205
1187 if ($2 == 200) { 1206 if ($2 == 200) {
1190 } else { 1209 } else {
1191 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 }; 1210 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
1192 } 1211 }
1193 }); 1212 });
1194 } else { 1213 } else {
1214 delete $hdr{"proxy-authorization"} unless $proxy;
1215
1195 $handle_actual_request->(); 1216 $handle_actual_request->();
1196 } 1217 }
1197 }; 1218 };
1198 1219
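The CONNECT change can be sketched as a function that builds the request text. `connect_request` is a hypothetical name; as in the hunk above, the header is removed from the hash so that it is sent to the proxy only and not to the origin server inside the tunnel.

```perl
use strict;
use warnings;

# Sketch: build the proxy CONNECT request, moving a proxy-authorization
# header (if present) out of %$hdr and into the CONNECT request.
sub connect_request {
   my ($uhost, $uport, $hdr) = @_;

   my $auth = exists $hdr->{"proxy-authorization"}
      ? "proxy-authorization: " . (delete $hdr->{"proxy-authorization"}) . "\015\012"
      : "";

   "CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012"
}
```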
1199 _get_slot $uhost, sub { 1220 _get_slot $uhost, sub {
1205 # on a keepalive request (in theory, this should be a separate config option). 1226 # on a keepalive request (in theory, this should be a separate config option).
1206 if ($persistent && $KA_CACHE{$ka_key}) { 1227 if ($persistent && $KA_CACHE{$ka_key}) {
1207 $was_persistent = 1; 1228 $was_persistent = 1;
1208 1229
1209 $state{handle} = ka_fetch $ka_key; 1230 $state{handle} = ka_fetch $ka_key;
1210 $state{handle}->destroyed 1231# $state{handle}->destroyed
1211 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d# 1232# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1212 $prepare_handle->(); 1233 $prepare_handle->();
1213 $state{handle}->destroyed 1234# $state{handle}->destroyed
1214 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d# 1235# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1215 $handle_actual_request->(); 1236 $handle_actual_request->();
1216 1237
1217 } else { 1238 } else {
1218 my $tcp_connect = $arg{tcp_connect} 1239 my $tcp_connect = $arg{tcp_connect}
1219 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1240 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1277save cookies to disk, and you should call this function after loading them 1298save cookies to disk, and you should call this function after loading them
1278again. If you have a long-running program you can additionally call this 1299again. If you have a long-running program you can additionally call this
1279function from time to time. 1300function from time to time.
1280 1301
1281A cookie jar is initially an empty hash-reference that is managed by this 1302A cookie jar is initially an empty hash-reference that is managed by this
1282module. It's format is subject to change, but currently it is like this: 1303module. Its format is subject to change, but currently it is as follows:
1283 1304
1284The key C<version> has to contain C<1>, otherwise the hash gets 1305The key C<version> has to contain C<2>, otherwise the hash gets
1285emptied. All other keys are hostnames or IP addresses pointing to 1306cleared. All other keys are hostnames or IP addresses pointing to
1286hash-references. The key for these inner hash references is the 1307hash-references. The key for these inner hash references is the
1287server path for which this cookie is meant, and the values are again 1308server path for which this cookie is meant, and the values are again
1288hash-references. Each key of those hash-references is a cookie name, and 1309hash-references. Each key of those hash-references is a cookie name, and
1289the value, you guessed it, is another hash-reference, this time with the 1310the value, you guessed it, is another hash-reference, this time with the
1290key-value pairs from the cookie, except for C<expires> and C<max-age>, 1311key-value pairs from the cookie, except for C<expires> and C<max-age>,
1294 1315
1295Here is an example of a cookie jar with a single cookie, so you have a 1316Here is an example of a cookie jar with a single cookie, so you have a
1296chance of understanding the above paragraph: 1317chance of understanding the above paragraph:
1297 1318
1298 { 1319 {
1299 version => 1, 1320 version => 2,
1300 "10.0.0.1" => { 1321 "10.0.0.1" => {
1301 "/" => { 1322 "/" => {
1302 "mythweb_id" => { 1323 "mythweb_id" => {
1303 _expires => 1293917923, 1324 _expires => 1293917923,
1304 value => "ooRung9dThee3ooyXooM1Ohm", 1325 value => "ooRung9dThee3ooyXooM1Ohm",
1332C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1353C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1333 1354
1334=item $AnyEvent::HTTP::MAX_PER_HOST 1355=item $AnyEvent::HTTP::MAX_PER_HOST
1335 1356
1336The maximum number of concurrent connections to the same host (identified 1357The maximum number of concurrent connections to the same host (identified
1337by the hostname). If the limit is exceeded, then the additional requests 1358by the hostname). If the limit is exceeded, then additional requests
1338are queued until previous connections are closed. Both persistent and 1359are queued until previous connections are closed. Both persistent and
1339non-persistent connections are counted in this limit. 1360non-persistent connections are counted in this limit.
1340 1361
1341The default value for this is C<4>, and it is highly advisable to not 1362The default value for this is C<4>, and it is highly advisable to not
1342increase it much. 1363increase it much.
1449 or die "$file: $!"; 1470 or die "$file: $!";
1450 1471
1451 my %hdr; 1472 my %hdr;
1452 my $ofs = 0; 1473 my $ofs = 0;
1453 1474
1454 warn stat $fh;
1455 warn -s _;
1456 if (stat $fh and -s _) { 1475 if (stat $fh and -s _) {
1457 $ofs = -s _; 1476 $ofs = -s _;
1458 warn "-s is ", $ofs; 1477 warn "-s is ", $ofs;
1459 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1478 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1460 $hdr{"range"} = "bytes=$ofs-"; 1479 $hdr{"range"} = "bytes=$ofs-";
1488 my (undef, $hdr) = @_; 1507 my (undef, $hdr) = @_;
1489 1508
1490 my $status = $hdr->{Status}; 1509 my $status = $hdr->{Status};
1491 1510
1492 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { 1511 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1493 utime $fh, $time, $time; 1512 utime $time, $time, $fh;
1494 } 1513 }
1495 1514
1496 if ($status == 200 || $status == 206 || $status == 416) { 1515 if ($status == 200 || $status == 206 || $status == 416) {
1497 # download ok || resume ok || file already fully downloaded 1516 # download ok || resume ok || file already fully downloaded
1498 $cb->(1, $hdr); 1517 $cb->(1, $hdr);
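The `utime` fix in this hunk corrects the argument order: Perl expects the access time and the modification time first, followed by the list of files or file handles. A minimal sketch, assuming a platform where `utime` works on file handles; `set_mtime` is a hypothetical helper and the timestamp is an arbitrary example value.

```perl
use strict;
use warnings;
use File::Temp ();

# Sketch: set access and modification time of an open file handle.
# Returns the number of files changed (1 on success, 0 on failure).
sub set_mtime {
   my ($fh, $time) = @_;

   utime $time, $time, $fh
}

# usage on a temporary file that is removed at exit
my ($fh, $name) = File::Temp::tempfile (UNLINK => 1);
set_mtime ($fh, 1293917923)
   or warn "utime failed: $!";
```

With the old order `utime $fh, $time, $time`, the handle was taken as the access time and the second timestamp as a file name, so the downloaded file never received the server's modification time.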
