ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.126 by root, Thu Jan 7 13:14:16 2016 UTC vs.
Revision 1.136 by root, Wed Oct 16 01:20:02 2019 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = 2.22; 51our $VERSION = 2.24;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
198C<AnyEvent::HTTP::set_proxy>). 198C<AnyEvent::HTTP::set_proxy>).
199 199
200Currently, if your proxy requires authorization, you have to specify an 200Currently, if your proxy requires authorization, you have to specify an
201appropriate "Proxy-Authorization" header in every request. 201appropriate "Proxy-Authorization" header in every request.
202 202
203Note that this module will prefer an existing persistent connection,
204even if that connection was made using another proxy. If you need to
205ensure that a new connection is made in this case, you can either force
206C<persistent> to false or e.g. use the proxy address in your C<sessionid>.
207
203=item body => $string 208=item body => $string
204 209
205The request body, usually empty. Will be sent as-is (future versions of 210The request body, usually empty. Will be sent as-is (future versions of
206this module might offer more options). 211this module might offer more options).
207 212
239The default for this option is C<low>, which could be interpreted as "give 244The default for this option is C<low>, which could be interpreted as "give
240me the page, no matter what". 245me the page, no matter what".
241 246
242See also the C<sessionid> parameter. 247See also the C<sessionid> parameter.
243 248
244=item session => $string 249=item sessionid => $string
245 250
246The module might reuse connections to the same host internally. Sometimes 251The module might reuse connections to the same host internally (regardless
247(e.g. when using TLS), you do not want to reuse connections from other 252of other settings, such as C<tcp_connect> or C<proxy>). Sometimes (e.g.
253when using TLS or a specfic proxy), you do not want to reuse connections
248sessions. This can be achieved by setting this parameter to some unique 254from other sessions. This can be achieved by setting this parameter to
249ID (such as the address of an object storing your state data, or the TLS 255some unique ID (such as the address of an object storing your state data
250context) - only connections using the same unique ID will be reused. 256or the TLS context, or the proxy IP) - only connections using the same
257unique ID will be reused.
251 258
252=item on_prepare => $callback->($fh) 259=item on_prepare => $callback->($fh)
253 260
254In rare cases you need to "tune" the socket before it is used to 261In rare cases you need to "tune" the socket before it is used to
255connect (for example, to bind it on a given IP address). This parameter 262connect (for example, to bind it on a given IP address). This parameter
263In even rarer cases you want total control over how AnyEvent::HTTP 270In even rarer cases you want total control over how AnyEvent::HTTP
264establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 271establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
265to do this, but you can provide your own C<tcp_connect> function - 272to do this, but you can provide your own C<tcp_connect> function -
266obviously, it has to follow the same calling conventions, except that it 273obviously, it has to follow the same calling conventions, except that it
267may always return a connection guard object. 274may always return a connection guard object.
275
276The connections made by this hook will be treated as equivalent to
277connections made the built-in way, specifically, they will be put into
278and taken from the persistent connection cache. If your C<$tcp_connect>
279function is incompatible with this kind of re-use, consider switching off
280C<persistent> connections and/or providing a C<sessionid> identifier.
268 281
269There are probably lots of weird uses for this function, starting from 282There are probably lots of weird uses for this function, starting from
270tracing the hosts C<http_request> actually tries to connect, to (inexact 283tracing the hosts C<http_request> actually tries to connect, to (inexact
271but fast) host => IP address caching or even socks protocol support. 284but fast) host => IP address caching or even socks protocol support.
272 285
342=item persistent => $boolean 355=item persistent => $boolean
343 356
344Try to create/reuse a persistent connection. When this flag is set 357Try to create/reuse a persistent connection. When this flag is set
345(default: true for idempotent requests, false for all others), then 358(default: true for idempotent requests, false for all others), then
346C<http_request> tries to re-use an existing (previously-created) 359C<http_request> tries to re-use an existing (previously-created)
347persistent connection to the host and, failing that, tries to create a new 360persistent connection to same host (i.e. identical URL scheme, hostname,
348one. 361port and sessionid) and, failing that, tries to create a new one.
349 362
350Requests failing in certain ways will be automatically retried once, which 363Requests failing in certain ways will be automatically retried once, which
351is dangerous for non-idempotent requests, which is why it defaults to off 364is dangerous for non-idempotent requests, which is why it defaults to off
352for them. The reason for this is because the bozos who designed HTTP/1.1 365for them. The reason for this is because the bozos who designed HTTP/1.1
353made it impossible to distinguish between a fatal error and a normal 366made it impossible to distinguish between a fatal error and a normal
354connection timeout, so you never know whether there was a problem with 367connection timeout, so you never know whether there was a problem with
355your request or not. 368your request or not.
356 369
357When reusing an existent connection, many parameters (such as TLS context) 370When reusing an existent connection, many parameters (such as TLS context)
358will be ignored. See the C<session> parameter for a workaround. 371will be ignored. See the C<sessionid> parameter for a workaround.
359 372
360=item keepalive => $boolean 373=item keepalive => $boolean
361 374
362Only used when C<persistent> is also true. This parameter decides whether 375Only used when C<persistent> is also true. This parameter decides whether
363C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection 376C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
454 467
455# expire cookies 468# expire cookies
456sub cookie_jar_expire($;$) { 469sub cookie_jar_expire($;$) {
457 my ($jar, $session_end) = @_; 470 my ($jar, $session_end) = @_;
458 471
459 %$jar = () if $jar->{version} != 1; 472 %$jar = () if $jar->{version} != 2;
460 473
461 my $anow = AE::now; 474 my $anow = AE::now;
462 475
463 while (my ($chost, $paths) = each %$jar) { 476 while (my ($chost, $paths) = each %$jar) {
464 next unless ref $paths; 477 next unless ref $paths;
484 497
485# extract cookies from jar 498# extract cookies from jar
486sub cookie_jar_extract($$$$) { 499sub cookie_jar_extract($$$$) {
487 my ($jar, $scheme, $host, $path) = @_; 500 my ($jar, $scheme, $host, $path) = @_;
488 501
489 %$jar = () if $jar->{version} != 1; 502 %$jar = () if $jar->{version} != 2;
503
504 $host = AnyEvent::Util::idn_to_ascii $host
505 if $host =~ /[^\x00-\x7f]/;
490 506
491 my @cookies; 507 my @cookies;
492 508
493 while (my ($chost, $paths) = each %$jar) { 509 while (my ($chost, $paths) = each %$jar) {
494 next unless ref $paths; 510 next unless ref $paths;
495 511
496 if ($chost =~ /^\./) { 512 # exact match or suffix including . match
497 next unless $chost eq substr $host, -length $chost; 513 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
498 } elsif ($chost =~ /\./) {
499 next unless $chost eq $host;
500 } else {
501 next; 514 or next;
502 }
503 515
504 while (my ($cpath, $cookies) = each %$paths) { 516 while (my ($cpath, $cookies) = each %$paths) {
505 next unless $cpath eq substr $path, 0, length $cpath; 517 next unless $cpath eq substr $path, 0, length $cpath;
506 518
507 while (my ($cookie, $kv) = each %$cookies) { 519 while (my ($cookie, $kv) = each %$cookies) {
528} 540}
529 541
530# parse set_cookie header into jar 542# parse set_cookie header into jar
531sub cookie_jar_set_cookie($$$$) { 543sub cookie_jar_set_cookie($$$$) {
532 my ($jar, $set_cookie, $host, $date) = @_; 544 my ($jar, $set_cookie, $host, $date) = @_;
545
546 %$jar = () if $jar->{version} != 2;
533 547
534 my $anow = int AE::now; 548 my $anow = int AE::now;
535 my $snow; # server-now 549 my $snow; # server-now
536 550
537 for ($set_cookie) { 551 for ($set_cookie) {
583 597
584 my $cdom; 598 my $cdom;
585 my $cpath = (delete $kv{path}) || "/"; 599 my $cpath = (delete $kv{path}) || "/";
586 600
587 if (exists $kv{domain}) { 601 if (exists $kv{domain}) {
588 $cdom = delete $kv{domain}; 602 $cdom = $kv{domain};
589 603
590 $cdom =~ s/^\.?/./; # make sure it starts with a "." 604 $cdom =~ s/^\.?/./; # make sure it starts with a "."
591 605
592 next if $cdom =~ /\.$/; 606 next if $cdom =~ /\.$/;
593 607
594 # this is not rfc-like and not netscape-like. go figure. 608 # this is not rfc-like and not netscape-like. go figure.
595 my $ndots = $cdom =~ y/.//; 609 my $ndots = $cdom =~ y/.//;
596 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 610 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
611
612 $cdom = substr $cdom, 1; # remove initial .
597 } else { 613 } else {
598 $cdom = $host; 614 $cdom = $host;
599 } 615 }
600 616
601 # store it 617 # store it
602 $jar->{version} = 1; 618 $jar->{version} = 2;
603 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 619 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
604 620
605 redo if /\G\s*,/gc; 621 redo if /\G\s*,/gc;
606 } 622 }
607} 623}
904 920
905 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple" 921 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
906 $loc =~ s/^\.\/+//; 922 $loc =~ s/^\.\/+//;
907 923
908 if ($loc !~ m%^[.?#]%) { 924 if ($loc !~ m%^[.?#]%) {
909 my $prefix = "$uscheme://$uhost:$uport"; 925 my $prefix = "$uscheme://$uauthority";
910 926
911 unless ($loc =~ s/^\///) { 927 unless ($loc =~ s/^\///) {
912 $prefix .= $upath; 928 $prefix .= $upath;
913 $prefix =~ s/\/[^\/]*$//; 929 $prefix =~ s/\/[^\/]*$//;
914 } 930 }
1031 $finish->(delete $state{handle}); 1047 $finish->(delete $state{handle});
1032 1048
1033 } elsif ($chunked) { 1049 } elsif ($chunked) {
1034 my $cl = 0; 1050 my $cl = 0;
1035 my $body = ""; 1051 my $body = "";
1036 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1052 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
1037 1053
1038 $state{read_chunk} = sub { 1054 $state{read_chunk} = sub {
1039 $_[1] =~ /^([0-9a-fA-F]+)/ 1055 $_[1] =~ /^([0-9a-fA-F]+)/
1040 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1056 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
1041 1057
1074 } 1090 }
1075 }; 1091 };
1076 1092
1077 $_[0]->push_read (line => $state{read_chunk}); 1093 $_[0]->push_read (line => $state{read_chunk});
1078 1094
1079 } elsif ($arg{on_body}) { 1095 } elsif (!$redirect && $arg{on_body}) {
1080 if (defined $len) { 1096 if (defined $len) {
1081 $_[0]->on_read (sub { 1097 $_[0]->on_read (sub {
1082 $len -= length $_[0]{rbuf}; 1098 $len -= length $_[0]{rbuf};
1083 1099
1084 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1100 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
1217 # on a keepalive request (in theory, this should be a separate config option). 1233 # on a keepalive request (in theory, this should be a separate config option).
1218 if ($persistent && $KA_CACHE{$ka_key}) { 1234 if ($persistent && $KA_CACHE{$ka_key}) {
1219 $was_persistent = 1; 1235 $was_persistent = 1;
1220 1236
1221 $state{handle} = ka_fetch $ka_key; 1237 $state{handle} = ka_fetch $ka_key;
1222 $state{handle}->destroyed 1238# $state{handle}->destroyed
1223 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d# 1239# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1224 $prepare_handle->(); 1240 $prepare_handle->();
1225 $state{handle}->destroyed 1241# $state{handle}->destroyed
1226 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d# 1242# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1227 $handle_actual_request->(); 1243 $handle_actual_request->();
1228 1244
1229 } else { 1245 } else {
1230 my $tcp_connect = $arg{tcp_connect} 1246 my $tcp_connect = $arg{tcp_connect}
1231 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1247 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1291function from time to time. 1307function from time to time.
1292 1308
1293A cookie jar is initially an empty hash-reference that is managed by this 1309A cookie jar is initially an empty hash-reference that is managed by this
1294module. Its format is subject to change, but currently it is as follows: 1310module. Its format is subject to change, but currently it is as follows:
1295 1311
1296The key C<version> has to contain C<1>, otherwise the hash gets 1312The key C<version> has to contain C<2>, otherwise the hash gets
1297emptied. All other keys are hostnames or IP addresses pointing to 1313cleared. All other keys are hostnames or IP addresses pointing to
1298hash-references. The key for these inner hash references is the 1314hash-references. The key for these inner hash references is the
1299server path for which this cookie is meant, and the values are again 1315server path for which this cookie is meant, and the values are again
1300hash-references. Each key of those hash-references is a cookie name, and 1316hash-references. Each key of those hash-references is a cookie name, and
1301the value, you guessed it, is another hash-reference, this time with the 1317the value, you guessed it, is another hash-reference, this time with the
1302key-value pairs from the cookie, except for C<expires> and C<max-age>, 1318key-value pairs from the cookie, except for C<expires> and C<max-age>,
1306 1322
1307Here is an example of a cookie jar with a single cookie, so you have a 1323Here is an example of a cookie jar with a single cookie, so you have a
1308chance of understanding the above paragraph: 1324chance of understanding the above paragraph:
1309 1325
1310 { 1326 {
1311 version => 1, 1327 version => 2,
1312 "10.0.0.1" => { 1328 "10.0.0.1" => {
1313 "/" => { 1329 "/" => {
1314 "mythweb_id" => { 1330 "mythweb_id" => {
1315 _expires => 1293917923, 1331 _expires => 1293917923,
1316 value => "ooRung9dThee3ooyXooM1Ohm", 1332 value => "ooRung9dThee3ooyXooM1Ohm",
1461 or die "$file: $!"; 1477 or die "$file: $!";
1462 1478
1463 my %hdr; 1479 my %hdr;
1464 my $ofs = 0; 1480 my $ofs = 0;
1465 1481
1466 warn stat $fh;
1467 warn -s _;
1468 if (stat $fh and -s _) { 1482 if (stat $fh and -s _) {
1469 $ofs = -s _; 1483 $ofs = -s _;
1470 warn "-s is ", $ofs; 1484 warn "-s is ", $ofs;
1471 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1485 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1472 $hdr{"range"} = "bytes=$ofs-"; 1486 $hdr{"range"} = "bytes=$ofs-";
1500 my (undef, $hdr) = @_; 1514 my (undef, $hdr) = @_;
1501 1515
1502 my $status = $hdr->{Status}; 1516 my $status = $hdr->{Status};
1503 1517
1504 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { 1518 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1505 utime $fh, $time, $time; 1519 utime $time, $time, $fh;
1506 } 1520 }
1507 1521
1508 if ($status == 200 || $status == 206 || $status == 416) { 1522 if ($status == 200 || $status == 206 || $status == 416) {
1509 # download ok || resume ok || file already fully downloaded 1523 # download ok || resume ok || file already fully downloaded
1510 $cb->(1, $hdr); 1524 $cb->(1, $hdr);

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines