ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.122 by root, Fri May 8 17:28:39 2015 UTC vs.
Revision 1.140 by root, Wed Mar 6 19:29:18 2024 UTC

4 4
5=head1 SYNOPSIS 5=head1 SYNOPSIS
6 6
7 use AnyEvent::HTTP; 7 use AnyEvent::HTTP;
8 8
9 http_get "http://www.nethype.de/", sub { print $_[1] }; 9 http_get "http://www.nethype.de/", sub {
10 my ($body, $hdr) = @_;
11 print "$hdr->{URL} Status: $hdr->{Status}\n";
12 print $body;
13 };
10 14
11 # ... do something else here 15 # ... do something else here
12 16
13=head1 DESCRIPTION 17=head1 DESCRIPTION
14 18
46use AnyEvent::Util (); 50use AnyEvent::Util ();
47use AnyEvent::Handle (); 51use AnyEvent::Handle ();
48 52
49use base Exporter::; 53use base Exporter::;
50 54
51our $VERSION = 2.21; 55our $VERSION = 2.25;
52 56
53our @EXPORT = qw(http_get http_post http_head http_request); 57our @EXPORT = qw(http_get http_post http_head http_request);
54 58
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 59our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 60our $MAX_RECURSE = 10;
195C<$scheme> must be either missing or must be C<http> for HTTP. 199C<$scheme> must be either missing or must be C<http> for HTTP.
196 200
197If not specified, then the default proxy is used (see 201If not specified, then the default proxy is used (see
198C<AnyEvent::HTTP::set_proxy>). 202C<AnyEvent::HTTP::set_proxy>).
199 203
204Currently, if your proxy requires authorization, you have to specify an
205appropriate "Proxy-Authorization" header in every request.
206
207Note that this module will prefer an existing persistent connection,
208even if that connection was made using another proxy. If you need to
209ensure that a new connection is made in this case, you can either force
210C<persistent> to false or e.g. use the proxy address in your C<sessionid>.
211
200=item body => $string 212=item body => $string
201 213
202The request body, usually empty. Will be sent as-is (future versions of 214The request body, usually empty. Will be sent as-is (future versions of
203this module might offer more options). 215this module might offer more options).
204 216
236The default for this option is C<low>, which could be interpreted as "give 248The default for this option is C<low>, which could be interpreted as "give
237me the page, no matter what". 249me the page, no matter what".
238 250
239See also the C<sessionid> parameter. 251See also the C<sessionid> parameter.
240 252
241=item session => $string 253=item sessionid => $string
242 254
243The module might reuse connections to the same host internally. Sometimes 255The module might reuse connections to the same host internally (regardless
244(e.g. when using TLS), you do not want to reuse connections from other 256of other settings, such as C<tcp_connect> or C<proxy>). Sometimes (e.g.
257when using TLS or a specfic proxy), you do not want to reuse connections
245sessions. This can be achieved by setting this parameter to some unique 258from other sessions. This can be achieved by setting this parameter to
246ID (such as the address of an object storing your state data, or the TLS 259some unique ID (such as the address of an object storing your state data
247context) - only connections using the same unique ID will be reused. 260or the TLS context, or the proxy IP) - only connections using the same
261unique ID will be reused.
248 262
249=item on_prepare => $callback->($fh) 263=item on_prepare => $callback->($fh)
250 264
251In rare cases you need to "tune" the socket before it is used to 265In rare cases you need to "tune" the socket before it is used to
252connect (for example, to bind it on a given IP address). This parameter 266connect (for example, to bind it on a given IP address). This parameter
260In even rarer cases you want total control over how AnyEvent::HTTP 274In even rarer cases you want total control over how AnyEvent::HTTP
261establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect> 275establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
262to do this, but you can provide your own C<tcp_connect> function - 276to do this, but you can provide your own C<tcp_connect> function -
263obviously, it has to follow the same calling conventions, except that it 277obviously, it has to follow the same calling conventions, except that it
264may always return a connection guard object. 278may always return a connection guard object.
279
280The connections made by this hook will be treated as equivalent to
281connections made the built-in way, specifically, they will be put into
282and taken from the persistent connection cache. If your C<$tcp_connect>
283function is incompatible with this kind of re-use, consider switching off
284C<persistent> connections and/or providing a C<sessionid> identifier.
265 285
266There are probably lots of weird uses for this function, starting from 286There are probably lots of weird uses for this function, starting from
267tracing the hosts C<http_request> actually tries to connect, to (inexact 287tracing the hosts C<http_request> actually tries to connect, to (inexact
268but fast) host => IP address caching or even socks protocol support. 288but fast) host => IP address caching or even socks protocol support.
269 289
339=item persistent => $boolean 359=item persistent => $boolean
340 360
341Try to create/reuse a persistent connection. When this flag is set 361Try to create/reuse a persistent connection. When this flag is set
342(default: true for idempotent requests, false for all others), then 362(default: true for idempotent requests, false for all others), then
343C<http_request> tries to re-use an existing (previously-created) 363C<http_request> tries to re-use an existing (previously-created)
344persistent connection to the host and, failing that, tries to create a new 364persistent connection to same host (i.e. identical URL scheme, hostname,
345one. 365port and sessionid) and, failing that, tries to create a new one.
346 366
347Requests failing in certain ways will be automatically retried once, which 367Requests failing in certain ways will be automatically retried once, which
348is dangerous for non-idempotent requests, which is why it defaults to off 368is dangerous for non-idempotent requests, which is why it defaults to off
349for them. The reason for this is because the bozos who designed HTTP/1.1 369for them. The reason for this is because the bozos who designed HTTP/1.1
350made it impossible to distinguish between a fatal error and a normal 370made it impossible to distinguish between a fatal error and a normal
351connection timeout, so you never know whether there was a problem with 371connection timeout, so you never know whether there was a problem with
352your request or not. 372your request or not.
353 373
354When reusing an existent connection, many parameters (such as TLS context) 374When reusing an existent connection, many parameters (such as TLS context)
355will be ignored. See the C<session> parameter for a workaround. 375will be ignored. See the C<sessionid> parameter for a workaround.
356 376
357=item keepalive => $boolean 377=item keepalive => $boolean
358 378
359Only used when C<persistent> is also true. This parameter decides whether 379Only used when C<persistent> is also true. This parameter decides whether
360C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection 380C<http_request> tries to handshake a HTTP/1.0-style keep-alive connection
451 471
452# expire cookies 472# expire cookies
453sub cookie_jar_expire($;$) { 473sub cookie_jar_expire($;$) {
454 my ($jar, $session_end) = @_; 474 my ($jar, $session_end) = @_;
455 475
456 %$jar = () if $jar->{version} != 1; 476 %$jar = () if $jar->{version} != 2;
457 477
458 my $anow = AE::now; 478 my $anow = AE::now;
459 479
460 while (my ($chost, $paths) = each %$jar) { 480 while (my ($chost, $paths) = each %$jar) {
461 next unless ref $paths; 481 next unless ref $paths;
481 501
482# extract cookies from jar 502# extract cookies from jar
483sub cookie_jar_extract($$$$) { 503sub cookie_jar_extract($$$$) {
484 my ($jar, $scheme, $host, $path) = @_; 504 my ($jar, $scheme, $host, $path) = @_;
485 505
486 %$jar = () if $jar->{version} != 1; 506 %$jar = () if $jar->{version} != 2;
507
508 $host = AnyEvent::Util::idn_to_ascii $host
509 if $host =~ /[^\x00-\x7f]/;
487 510
488 my @cookies; 511 my @cookies;
489 512
490 while (my ($chost, $paths) = each %$jar) { 513 while (my ($chost, $paths) = each %$jar) {
491 next unless ref $paths; 514 next unless ref $paths;
492 515
493 if ($chost =~ /^\./) { 516 # exact match or suffix including . match
494 next unless $chost eq substr $host, -length $chost; 517 $chost eq $host or ".$chost" eq substr $host, -1 - length $chost
495 } elsif ($chost =~ /\./) {
496 next unless $chost eq $host;
497 } else {
498 next; 518 or next;
499 }
500 519
501 while (my ($cpath, $cookies) = each %$paths) { 520 while (my ($cpath, $cookies) = each %$paths) {
502 next unless $cpath eq substr $path, 0, length $cpath; 521 next unless $cpath eq substr $path, 0, length $cpath;
503 522
504 while (my ($cookie, $kv) = each %$cookies) { 523 while (my ($cookie, $kv) = each %$cookies) {
525} 544}
526 545
527# parse set_cookie header into jar 546# parse set_cookie header into jar
528sub cookie_jar_set_cookie($$$$) { 547sub cookie_jar_set_cookie($$$$) {
529 my ($jar, $set_cookie, $host, $date) = @_; 548 my ($jar, $set_cookie, $host, $date) = @_;
549
550 %$jar = () if $jar->{version} != 2;
530 551
531 my $anow = int AE::now; 552 my $anow = int AE::now;
532 my $snow; # server-now 553 my $snow; # server-now
533 554
534 for ($set_cookie) { 555 for ($set_cookie) {
580 601
581 my $cdom; 602 my $cdom;
582 my $cpath = (delete $kv{path}) || "/"; 603 my $cpath = (delete $kv{path}) || "/";
583 604
584 if (exists $kv{domain}) { 605 if (exists $kv{domain}) {
585 $cdom = delete $kv{domain}; 606 $cdom = $kv{domain};
586 607
587 $cdom =~ s/^\.?/./; # make sure it starts with a "." 608 $cdom =~ s/^\.?/./; # make sure it starts with a "."
588 609
589 next if $cdom =~ /\.$/; 610 next if $cdom =~ /\.$/;
590 611
591 # this is not rfc-like and not netscape-like. go figure. 612 # this is not rfc-like and not netscape-like. go figure.
592 my $ndots = $cdom =~ y/.//; 613 my $ndots = $cdom =~ y/.//;
593 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2); 614 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
615
616 $cdom = substr $cdom, 1; # remove initial .
594 } else { 617 } else {
595 $cdom = $host; 618 $cdom = $host;
596 } 619 }
597 620
598 # store it 621 # store it
599 $jar->{version} = 1; 622 $jar->{version} = 2;
600 $jar->{lc $cdom}{$cpath}{$name} = \%kv; 623 $jar->{lc $cdom}{$cpath}{$name} = \%kv;
601 624
602 redo if /\G\s*,/gc; 625 redo if /\G\s*,/gc;
603 } 626 }
604} 627}
697} 720}
698 721
699our %IDEMPOTENT = ( 722our %IDEMPOTENT = (
700 DELETE => 1, 723 DELETE => 1,
701 GET => 1, 724 GET => 1,
725 QUERY => 1,
702 HEAD => 1, 726 HEAD => 1,
703 OPTIONS => 1, 727 OPTIONS => 1,
704 PUT => 1, 728 PUT => 1,
705 TRACE => 1, 729 TRACE => 1,
706 730
718 MKCOL => 1, 742 MKCOL => 1,
719 MKREDIRECTREF => 1, 743 MKREDIRECTREF => 1,
720 MKWORKSPACE => 1, 744 MKWORKSPACE => 1,
721 MOVE => 1, 745 MOVE => 1,
722 ORDERPATCH => 1, 746 ORDERPATCH => 1,
747 PRI => 1,
723 PROPFIND => 1, 748 PROPFIND => 1,
724 PROPPATCH => 1, 749 PROPPATCH => 1,
725 REBIND => 1, 750 REBIND => 1,
726 REPORT => 1, 751 REPORT => 1,
727 SEARCH => 1, 752 SEARCH => 1,
770 795
771 my $uport = $uscheme eq "http" ? 80 796 my $uport = $uscheme eq "http" ? 80
772 : $uscheme eq "https" ? 443 797 : $uscheme eq "https" ? 443
773 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" }); 798 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
774 799
775 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 800 $uauthority =~ /^(?: .*\@ )? ([^\@]+?) (?: : (\d+) )?$/x
776 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" }); 801 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
777 802
778 my $uhost = lc $1; 803 my $uhost = lc $1;
779 $uport = $2 if defined $2; 804 $uport = $2 if defined $2;
780 805
892 # we give our best and fall back to URI if available. 917 # we give our best and fall back to URI if available.
893 if (exists $hdr{location}) { 918 if (exists $hdr{location}) {
894 my $loc = $hdr{location}; 919 my $loc = $hdr{location};
895 920
896 if ($loc =~ m%^//%) { # // 921 if ($loc =~ m%^//%) { # //
897 $loc = "$rscheme:$loc"; 922 $loc = "$uscheme:$loc";
898 923
899 } elsif ($loc eq "") { 924 } elsif ($loc eq "") {
900 $loc = $url; 925 $loc = $url;
901 926
902 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple" 927 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
903 $loc =~ s/^\.\/+//; 928 $loc =~ s/^\.\/+//;
904 929
905 if ($loc !~ m%^[.?#]%) { 930 if ($loc !~ m%^[.?#]%) {
906 my $prefix = "$rscheme://$uhost:$uport"; 931 my $prefix = "$uscheme://$uauthority";
907 932
908 unless ($loc =~ s/^\///) { 933 unless ($loc =~ s/^\///) {
909 $prefix .= $upath; 934 $prefix .= $upath;
910 $prefix =~ s/\/[^\/]*$//; 935 $prefix =~ s/\/[^\/]*$//;
911 } 936 }
1028 $finish->(delete $state{handle}); 1053 $finish->(delete $state{handle});
1029 1054
1030 } elsif ($chunked) { 1055 } elsif ($chunked) {
1031 my $cl = 0; 1056 my $cl = 0;
1032 my $body = ""; 1057 my $body = "";
1033 my $on_body = $arg{on_body} || sub { $body .= shift; 1 }; 1058 my $on_body = (!$redirect && $arg{on_body}) || sub { $body .= shift; 1 };
1034 1059
1035 $state{read_chunk} = sub { 1060 $state{read_chunk} = sub {
1036 $_[1] =~ /^([0-9a-fA-F]+)/ 1061 $_[1] =~ /^([0-9a-fA-F]+)/
1037 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding"); 1062 or return $finish->(undef, $ae_error => "Garbled chunked transfer encoding");
1038 1063
1071 } 1096 }
1072 }; 1097 };
1073 1098
1074 $_[0]->push_read (line => $state{read_chunk}); 1099 $_[0]->push_read (line => $state{read_chunk});
1075 1100
1076 } elsif ($arg{on_body}) { 1101 } elsif (!$redirect && $arg{on_body}) {
1077 if (defined $len) { 1102 if (defined $len) {
1078 $_[0]->on_read (sub { 1103 $_[0]->on_read (sub {
1079 $len -= length $_[0]{rbuf}; 1104 $len -= length $_[0]{rbuf};
1080 1105
1081 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 1106 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
1120 _destroy_state %state; 1145 _destroy_state %state;
1121 1146
1122 %state = (); 1147 %state = ();
1123 $state{recurse} = 1148 $state{recurse} =
1124 http_request ( 1149 http_request (
1125 $method => $url, 1150 $method => $url,
1126 %arg, 1151 %arg,
1127 recurse => $recurse - 1, 1152 recurse => $recurse - 1,
1128 keepalive => 0, 1153 persistent => 0,
1129 sub { 1154 sub {
1130 %state = (); 1155 %state = ();
1131 &$cb 1156 &$cb
1132 } 1157 }
1133 ); 1158 );
1214 # on a keepalive request (in theory, this should be a separate config option). 1239 # on a keepalive request (in theory, this should be a separate config option).
1215 if ($persistent && $KA_CACHE{$ka_key}) { 1240 if ($persistent && $KA_CACHE{$ka_key}) {
1216 $was_persistent = 1; 1241 $was_persistent = 1;
1217 1242
1218 $state{handle} = ka_fetch $ka_key; 1243 $state{handle} = ka_fetch $ka_key;
1219 $state{handle}->destroyed 1244# $state{handle}->destroyed
1220 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d# 1245# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (1), please report.";#d#
1221 $prepare_handle->(); 1246 $prepare_handle->();
1222 $state{handle}->destroyed 1247# $state{handle}->destroyed
1223 and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d# 1248# and die "AnyEvent::HTTP: unexpectedly got a destructed handle (2), please report.";#d#
1249 $rpath = $upath;
1224 $handle_actual_request->(); 1250 $handle_actual_request->();
1225 1251
1226 } else { 1252 } else {
1227 my $tcp_connect = $arg{tcp_connect} 1253 my $tcp_connect = $arg{tcp_connect}
1228 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect }; 1254 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
1288function from time to time. 1314function from time to time.
1289 1315
1290A cookie jar is initially an empty hash-reference that is managed by this 1316A cookie jar is initially an empty hash-reference that is managed by this
1291module. Its format is subject to change, but currently it is as follows: 1317module. Its format is subject to change, but currently it is as follows:
1292 1318
1293The key C<version> has to contain C<1>, otherwise the hash gets 1319The key C<version> has to contain C<2>, otherwise the hash gets
1294emptied. All other keys are hostnames or IP addresses pointing to 1320cleared. All other keys are hostnames or IP addresses pointing to
1295hash-references. The key for these inner hash references is the 1321hash-references. The key for these inner hash references is the
1296server path for which this cookie is meant, and the values are again 1322server path for which this cookie is meant, and the values are again
1297hash-references. Each key of those hash-references is a cookie name, and 1323hash-references. Each key of those hash-references is a cookie name, and
1298the value, you guessed it, is another hash-reference, this time with the 1324the value, you guessed it, is another hash-reference, this time with the
1299key-value pairs from the cookie, except for C<expires> and C<max-age>, 1325key-value pairs from the cookie, except for C<expires> and C<max-age>,
1303 1329
1304Here is an example of a cookie jar with a single cookie, so you have a 1330Here is an example of a cookie jar with a single cookie, so you have a
1305chance of understanding the above paragraph: 1331chance of understanding the above paragraph:
1306 1332
1307 { 1333 {
1308 version => 1, 1334 version => 2,
1309 "10.0.0.1" => { 1335 "10.0.0.1" => {
1310 "/" => { 1336 "/" => {
1311 "mythweb_id" => { 1337 "mythweb_id" => {
1312 _expires => 1293917923, 1338 _expires => 1293917923,
1313 value => "ooRung9dThee3ooyXooM1Ohm", 1339 value => "ooRung9dThee3ooyXooM1Ohm",
1458 or die "$file: $!"; 1484 or die "$file: $!";
1459 1485
1460 my %hdr; 1486 my %hdr;
1461 my $ofs = 0; 1487 my $ofs = 0;
1462 1488
1463 warn stat $fh;
1464 warn -s _;
1465 if (stat $fh and -s _) { 1489 if (stat $fh and -s _) {
1466 $ofs = -s _; 1490 $ofs = -s _;
1467 warn "-s is ", $ofs; 1491 warn "-s is ", $ofs;
1468 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9]; 1492 $hdr{"if-unmodified-since"} = AnyEvent::HTTP::format_date +(stat _)[9];
1469 $hdr{"range"} = "bytes=$ofs-"; 1493 $hdr{"range"} = "bytes=$ofs-";
1497 my (undef, $hdr) = @_; 1521 my (undef, $hdr) = @_;
1498 1522
1499 my $status = $hdr->{Status}; 1523 my $status = $hdr->{Status};
1500 1524
1501 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) { 1525 if (my $time = AnyEvent::HTTP::parse_date $hdr->{"last-modified"}) {
1502 utime $fh, $time, $time; 1526 utime $time, $time, $fh;
1503 } 1527 }
1504 1528
1505 if ($status == 200 || $status == 206 || $status == 416) { 1529 if ($status == 200 || $status == 206 || $status == 416) {
1506 # download ok || resume ok || file already fully downloaded 1530 # download ok || resume ok || file already fully downloaded
1507 $cb->(1, $hdr); 1531 $cb->(1, $hdr);

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines