Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.114 by root, Mon Jan 14 21:36:26 2013 UTC vs.
Revision 1.123 by root, Fri May 8 17:34:35 2015 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '2.15'; 51our $VERSION = 2.21;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
89C<http_request> returns a "cancellation guard" - you have to keep the 89C<http_request> returns a "cancellation guard" - you have to keep the
90object at least alive until the callback get called. If the object gets 90object at least alive until the callback get called. If the object gets
91destroyed before the callback is called, the request will be cancelled. 91destroyed before the callback is called, the request will be cancelled.
92 92
93The callback will be called with the response body data as first argument 93The callback will be called with the response body data as first argument
94(or C<undef> if an error occured), and a hash-ref with response headers 94(or C<undef> if an error occurred), and a hash-ref with response headers
95(and trailers) as second argument. 95(and trailers) as second argument.
96 96
97All the headers in that hash are lowercased. In addition to the response 97All the headers in that hash are lowercased. In addition to the response
98headers, the "pseudo-headers" (uppercase to avoid clashing with possible 98headers, the "pseudo-headers" (uppercase to avoid clashing with possible
99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
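
The callback contract described in this hunk can be illustrated with a small stand-alone handler. This is only a sketch: `summarise_response` is a hypothetical name, and the hash-refs passed to it below are hand-made stand-ins for what the module would supply.

```perl
use strict;
use warnings;

# Hypothetical callback body: summarise one response.
# Per the documentation above, $body is undef on error, and $hdr
# carries the "Status" and "Reason" pseudo-headers.
sub summarise_response {
   my ($body, $hdr) = @_;

   return "error $hdr->{Status}: $hdr->{Reason}"
      unless defined $body;

   sprintf "status %s, %d bytes", $hdr->{Status}, length $body
}

# Hand-made arguments, standing in for a real response.
my $ok  = summarise_response ("hello", { Status => 200, Reason => "OK" });
my $err = summarise_response (undef, { Status => 595, Reason => "Connection refused" });

print "$ok\n$err\n";
```

In real use such a handler would be passed as the last argument to `http_get` or `http_request`, with the returned guard kept alive until the callback has run.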
157=item recurse => $count (default: $MAX_RECURSE) 157=item recurse => $count (default: $MAX_RECURSE)
158 158
159Whether to recurse requests or not, e.g. on redirects, authentication and 159Whether to recurse requests or not, e.g. on redirects, authentication and
160other retries and so on, and how often to do so. 160other retries and so on, and how often to do so.
161 161
162Only redirects to http and https URLs are supported. While most common
163redirection forms are handled entirely within this module, some require
164the use of the optional L<URI> module. If it is required but missing, then
165the request will fail with an error.
166
162=item headers => hashref 167=item headers => hashref
163 168
164The request headers to use. Currently, C<http_request> may provide its own 169The request headers to use. Currently, C<http_request> may provide its own
165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 170C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
166will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:> 171will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
189 194
190C<$scheme> must be either missing or must be C<http> for HTTP. 195C<$scheme> must be either missing or must be C<http> for HTTP.
191 196
192If not specified, then the default proxy is used (see 197If not specified, then the default proxy is used (see
193C<AnyEvent::HTTP::set_proxy>). 198C<AnyEvent::HTTP::set_proxy>).
199
200Currently, if your proxy requires authorization, you have to specify an
201appropriate "Proxy-Authorization" header in every request.
194 202
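
Since the new paragraph leaves proxy authorization to the caller, a header value has to be built by hand. The sketch below assumes the proxy accepts the HTTP Basic scheme (other schemes need different values); `proxy_basic_auth` is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use MIME::Base64 ();

# Hypothetical helper: build a Basic credential suitable for a
# "Proxy-Authorization" request header. Assumes the proxy uses Basic auth.
sub proxy_basic_auth {
   my ($user, $pass) = @_;

   # second argument "" suppresses the trailing newline encode_base64 adds by default
   "Basic " . MIME::Base64::encode_base64 ("$user:$pass", "")
}

# A headers hash as it could be passed via the "headers" option.
my %headers = ("Proxy-Authorization" => proxy_basic_auth ("user", "pass"));
```

The resulting hash would then be supplied as `headers => \%headers` on every request that goes through the proxy.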
195=item body => $string 203=item body => $string
196 204
197The request body, usually empty. Will be sent as-is (future versions of 205The request body, usually empty. Will be sent as-is (future versions of
198this module might offer more options). 206this module might offer more options).
242context) - only connections using the same unique ID will be reused. 250context) - only connections using the same unique ID will be reused.
243 251
244=item on_prepare => $callback->($fh) 252=item on_prepare => $callback->($fh)
245 253
246In rare cases you need to "tune" the socket before it is used to 254In rare cases you need to "tune" the socket before it is used to
247connect (for exmaple, to bind it on a given IP address). This parameter 255connect (for example, to bind it on a given IP address). This parameter
248overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 256overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
249and behaves exactly the same way (e.g. it has to provide a 257and behaves exactly the same way (e.g. it has to provide a
250timeout). See the description for the C<$prepare_cb> argument of 258timeout). See the description for the C<$prepare_cb> argument of
251C<AnyEvent::Socket::tcp_connect> for details. 259C<AnyEvent::Socket::tcp_connect> for details.
252 260
689 697
690 $cb->(undef, $hdr); 698 $cb->(undef, $hdr);
691 () 699 ()
692} 700}
693 701
702our %IDEMPOTENT = (
703 DELETE => 1,
704 GET => 1,
705 HEAD => 1,
706 OPTIONS => 1,
707 PUT => 1,
708 TRACE => 1,
709
710 ACL => 1,
711 "BASELINE-CONTROL" => 1,
712 BIND => 1,
713 CHECKIN => 1,
714 CHECKOUT => 1,
715 COPY => 1,
716 LABEL => 1,
717 LINK => 1,
718 MERGE => 1,
719 MKACTIVITY => 1,
720 MKCALENDAR => 1,
721 MKCOL => 1,
722 MKREDIRECTREF => 1,
723 MKWORKSPACE => 1,
724 MOVE => 1,
725 ORDERPATCH => 1,
726 PROPFIND => 1,
727 PROPPATCH => 1,
728 REBIND => 1,
729 REPORT => 1,
730 SEARCH => 1,
731 UNBIND => 1,
732 UNCHECKOUT => 1,
733 UNLINK => 1,
734 UNLOCK => 1,
735 UPDATE => 1,
736 UPDATEREDIRECTREF => 1,
737 "VERSION-CONTROL" => 1,
738);
739
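
The effect of replacing the method regex with a lookup table can be sketched in isolation. The table below is an abbreviated copy, and `default_persistent` is a hypothetical helper that mirrors how the diff derives the default for `persistent`; neither is meant as the module's actual code.

```perl
use strict;
use warnings;

# Abbreviated copy of the table; the real %IDEMPOTENT lists many more methods.
my %IDEMPOTENT = map { $_ => 1 } qw(DELETE GET HEAD OPTIONS PUT TRACE PROPFIND REPORT);

# Hypothetical helper: an explicit "persistent" argument wins,
# otherwise the default is true only for idempotent methods.
sub default_persistent {
   my ($method, %arg) = @_;

   exists $arg{persistent} ? !!$arg{persistent} : !!$IDEMPOTENT{$method}
}
```

With the old regex, a WebDAV method such as `PROPFIND` was treated as non-idempotent; with the table it defaults to a persistent connection. The lookup is case-sensitive, so method names must be given in uppercase.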
694sub http_request($$@) { 740sub http_request($$@) {
695 my $cb = pop; 741 my $cb = pop;
696 my ($method, $url, %arg) = @_; 742 my ($method, $url, %arg) = @_;
697 743
698 my %hdr; 744 my %hdr;
773 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 819 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
774 820
775 $hdr{"content-length"} = length $arg{body} 821 $hdr{"content-length"} = length $arg{body}
776 if length $arg{body} || $method ne "GET"; 822 if length $arg{body} || $method ne "GET";
777 823
778 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 824 my $idempotent = $IDEMPOTENT{$method};
779 825
780 # default value for keepalive is true iff the request is for an idempotent method 826 # default value for keepalive is true iff the request is for an idempotent method
781 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent; 827 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
782 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy; 828 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
783 my $was_persistent; # true if this is actually a recycled connection 829 my $was_persistent; # true if this is actually a recycled connection
784 830
785 # the key to use in the keepalive cache 831 # the key to use in the keepalive cache
786 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}"; 832 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
787 833
788 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive " : "" : "close ") . "Te"; #1.1 834 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"; #1.1
789 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 835 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
790 836
791 my %state = (connect_guard => 1); 837 my %state = (connect_guard => 1);
792 838
793 my $ae_error = 595; # connecting 839 my $ae_error = 595; # connecting
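
The `Connection` header change in this hunk adds the comma that separates connection tokens. A stand-alone sketch of the three possible values follows; `connection_header` is a hypothetical wrapper around the same expression.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression from the diff:
# the tokens in a Connection header are comma-separated.
sub connection_header {
   my ($persistent, $keepalive) = @_;

   ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"
}

for my $case ([1, 1], [1, 0], [0, 1]) {
   printf "persistent=%d keepalive=%d => %s\n", @$case, connection_header (@$case);
}
```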
803 # send request 849 # send request
804 $hdl->push_write ( 850 $hdl->push_write (
805 "$method $rpath HTTP/1.1\015\012" 851 "$method $rpath HTTP/1.1\015\012"
806 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 852 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
807 . "\015\012" 853 . "\015\012"
808 . (delete $arg{body}) 854 . $arg{body}
809 ); 855 );
810 856
811 # return if error occured during push_write() 857 # return if error occurred during push_write()
812 return unless %state; 858 return unless %state;
813 859
814 # reduce memory usage, save a kitten, also re-use it for the response headers. 860 # reduce memory usage, save a kitten, also re-use it for the response headers.
815 %hdr = (); 861 %hdr = ();
816 862
843 889
844 %hdr = (%$hdr, @pseudo); 890 %hdr = (%$hdr, @pseudo);
845 } 891 }
846 892
847 # redirect handling 893 # redirect handling
848 # microsoft and other shitheads don't give a shit for following standards, 894 # relative uri handling forced by microsoft and other shitheads.
849 # try to support some common forms of broken Location headers. 895 # we give our best and fall back to URI if available.
850 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 896 if (exists $hdr{location}) {
897 my $loc = $hdr{location};
898
899 if ($loc =~ m%^//%) { # //
900 $loc = "$rscheme:$loc";
901
902 } elsif ($loc eq "") {
903 $loc = $url;
904
905 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
851 $hdr{location} =~ s/^\.\/+//; 906 $loc =~ s/^\.\/+//;
852 907
908 if ($loc !~ m%^[.?#]%) {
853 my $url = "$rscheme://$uhost:$uport"; 909 my $prefix = "$rscheme://$uhost:$uport";
854 910
855 unless ($hdr{location} =~ s/^\///) { 911 unless ($loc =~ s/^\///) {
856 $url .= $upath; 912 $prefix .= $upath;
857 $url =~ s/\/[^\/]*$//; 913 $prefix =~ s/\/[^\/]*$//;
914 }
915
916 $loc = "$prefix/$loc";
917
918 } elsif (eval { require URI }) { # uri
919 $loc = URI->new_abs ($loc, $url)->as_string;
920
921 } else {
922 return _error %state, $cb, { @pseudo, Status => 599, Reason => "Cannot parse Location (URI module missing)" };
923 #$hdr{Status} = 599;
924 #$hdr{Reason} = "Unparsable Redirect (URI module missing)";
925 #$recurse = 0;
926 }
858 } 927 }
859 928
860 $hdr{location} = "$url/$hdr{location}"; 929 $hdr{location} = $loc;
861 } 930 }
862 931
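
The new Location handling above can be read as a small pure function. The following is a sketch under assumptions: `resolve_location` is a hypothetical name, its parameters mirror the variables used in the diff, and it returns `undef` where the module would report a 599 error.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the redirect target resolution.
# $url is the current request URL, the remaining arguments are its parts.
sub resolve_location {
   my ($loc, $url, $rscheme, $uhost, $uport, $upath) = @_;

   if ($loc =~ m%^//%) {                                 # scheme-relative
      return "$rscheme:$loc";

   } elsif ($loc eq "") {                                # empty: same URL again
      return $url;

   } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) {        # no scheme: relative
      $loc =~ s/^\.\/+//;

      if ($loc !~ m%^[.?#]%) {                           # "simple" path
         my $prefix = "$rscheme://$uhost:$uport";

         unless ($loc =~ s/^\///) {                      # not absolute-path:
            $prefix .= $upath;                           # resolve against the
            $prefix =~ s/\/[^\/]*$//;                    # current directory
         }

         return "$prefix/$loc";

      } elsif (eval { require URI; 1 }) {                # "../x", "?q", "#f"
         return URI->new_abs ($loc, $url)->as_string;

      } else {
         return undef;                                   # URI module missing
      }
   }

   $loc                                                  # already absolute
}
```

Only the last relative branch depends on the optional URI module, which matches the documentation added earlier in this diff.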
863 my $redirect; 932 my $redirect;
864 933
865 if ($recurse) { 934 if ($recurse) {
867 936
868 # industry standard is to redirect POST as GET for 937 # industry standard is to redirect POST as GET for
869 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1. 938 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
870 # also, the UA should ask the user for 301 and 307 and POST, 939 # also, the UA should ask the user for 301 and 307 and POST,
871 # industry standard seems to be to simply follow. 940 # industry standard seems to be to simply follow.
872 # we go with the industry standard. 941 # we go with the industry standard. 308 is defined
942 # by rfc7538
873 if ($status == 301 or $status == 302 or $status == 303) { 943 if ($status == 301 or $status == 302 or $status == 303) {
944 $redirect = 1;
874 # HTTP/1.1 is unclear on how to mutate the method 945 # HTTP/1.1 is unclear on how to mutate the method
875 $method = "GET" unless $method eq "HEAD"; 946 unless ($method eq "HEAD") {
876 $redirect = 1; 947 $method = "GET";
948 delete $arg{body};
949 }
877 } elsif ($status == 307) { 950 } elsif ($status == 307 or $status == 308) {
878 $redirect = 1; 951 $redirect = 1;
879 } 952 }
880 } 953 }
881 954
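
The status handling in this hunk can be summarised as a decision function. This is a sketch, not the module's code: `redirect_plan` is a hypothetical name, and the real logic additionally runs only while `$recurse` is non-zero.

```perl
use strict;
use warnings;

# Hypothetical helper: decide how to follow a redirect response.
# Returns (follow?, method, body).
sub redirect_plan {
   my ($status, $method, $body) = @_;

   if ($status == 301 or $status == 302 or $status == 303) {
      # rewritten to GET (HEAD stays HEAD); the body is dropped with the rewrite
      ($method, $body) = ("GET", undef) unless $method eq "HEAD";
      return (1, $method, $body);

   } elsif ($status == 307 or $status == 308) {
      # method and body are preserved
      return (1, $method, $body);
   }

   (0, $method, $body)
}
```

Two behaviours change relative to revision 1.114: a POST that is rewritten to GET no longer carries its body along, and 308 is followed like 307.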
882 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent]) 955 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent])
1109 1182
1110 # now handle proxy-CONNECT method 1183 # now handle proxy-CONNECT method
1111 if ($proxy && $uscheme eq "https") { 1184 if ($proxy && $uscheme eq "https") {
1112 # oh dear, we have to wrap it into a connect request 1185 # oh dear, we have to wrap it into a connect request
1113 1186
1187 my $auth = exists $hdr{"proxy-authorization"}
1188 ? "proxy-authorization: " . (delete $hdr{"proxy-authorization"}) . "\015\012"
1189 : "";
1190
1114 # maybe re-use $uauthority with patched port? 1191 # maybe re-use $uauthority with patched port?
1115 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012\015\012"); 1192 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012");
1116 $state{handle}->push_read (line => $qr_nlnl, sub { 1193 $state{handle}->push_read (line => $qr_nlnl, sub {
1117 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 1194 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
1118 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }; 1195 or return _error %state, $cb, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" };
1119 1196
1120 if ($2 == 200) { 1197 if ($2 == 200) {
1123 } else { 1200 } else {
1124 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 }; 1201 _error %state, $cb, { @pseudo, Status => $2, Reason => $3 };
1125 } 1202 }
1126 }); 1203 });
1127 } else { 1204 } else {
1205 delete $hdr{"proxy-authorization"} unless $proxy;
1206
1128 $handle_actual_request->(); 1207 $handle_actual_request->();
1129 } 1208 }
1130 }; 1209 };
1131 1210
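
For HTTPS through a proxy, the change above moves a caller-supplied `proxy-authorization` header out of the tunnelled request and into the CONNECT request. A minimal sketch of that string construction, with `connect_request` as a hypothetical helper operating on a lowercased header hash:

```perl
use strict;
use warnings;

# Hypothetical helper: build the CONNECT request and remove the
# proxy credentials from %$hdr so they are not sent to the origin server.
sub connect_request {
   my ($uhost, $uport, $hdr) = @_;

   my $auth = exists $hdr->{"proxy-authorization"}
      ? "proxy-authorization: " . (delete $hdr->{"proxy-authorization"}) . "\015\012"
      : "";

   "CONNECT $uhost:$uport HTTP/1.0\015\012$auth\015\012"
}

my %hdr = ("proxy-authorization" => "Basic dXNlcjpwYXNz", "user-agent" => "example");
my $req = connect_request ("example.com", 443, \%hdr);
```

After the call, `%hdr` contains only the headers meant for the origin server. In the non-CONNECT branch the diff likewise deletes the header when no proxy is in use.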
1132 _get_slot $uhost, sub { 1211 _get_slot $uhost, sub {
1206C<$session_end> is given and true, then additionally remove all session 1285C<$session_end> is given and true, then additionally remove all session
1207cookies. 1286cookies.
1208 1287
1209You should call this function (with a true C<$session_end>) before you 1288You should call this function (with a true C<$session_end>) before you
1210save cookies to disk, and you should call this function after loading them 1289save cookies to disk, and you should call this function after loading them
1211again. If you have a long-running program you can additonally call this 1290again. If you have a long-running program you can additionally call this
1212function from time to time. 1291function from time to time.
1213 1292
1214A cookie jar is initially an empty hash-reference that is managed by this 1293A cookie jar is initially an empty hash-reference that is managed by this
1215module. It's format is subject to change, but currently it is like this: 1294module. Its format is subject to change, but currently it is as follows:
1216 1295
1217The key C<version> has to contain C<1>, otherwise the hash gets 1296The key C<version> has to contain C<1>, otherwise the hash gets
1218emptied. All other keys are hostnames or IP addresses pointing to 1297emptied. All other keys are hostnames or IP addresses pointing to
1219hash-references. The key for these inner hash references is the 1298hash-references. The key for these inner hash references is the
1220server path for which this cookie is meant, and the values are again 1299server path for which this cookie is meant, and the values are again
1221hash-references. The keys of those hash-references is the cookie name, and 1300hash-references. Each key of those hash-references is a cookie name, and
1222the value, you guessed it, is another hash-reference, this time with the 1301the value, you guessed it, is another hash-reference, this time with the
1223key-value pairs from the cookie, except for C<expires> and C<max-age>, 1302key-value pairs from the cookie, except for C<expires> and C<max-age>,
1224which have been replaced by a C<_expires> key that contains the cookie 1303which have been replaced by a C<_expires> key that contains the cookie
1225expiry timestamp. 1304expiry timestamp. Session cookies are indicated by not having an
1305C<_expires> key.
1226 1306
1227Here is an example of a cookie jar with a single cookie, so you have a 1307Here is an example of a cookie jar with a single cookie, so you have a
1228chance of understanding the above paragraph: 1308chance of understanding the above paragraph:
1229 1309
1230 { 1310 {
1264C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 1344C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
1265 1345
1266=item $AnyEvent::HTTP::MAX_PER_HOST 1346=item $AnyEvent::HTTP::MAX_PER_HOST
1267 1347
1268The maximum number of concurrent connections to the same host (identified 1348The maximum number of concurrent connections to the same host (identified
1269by the hostname). If the limit is exceeded, then the additional requests 1349by the hostname). If the limit is exceeded, then additional requests
1270are queued until previous connections are closed. Both persistent and 1350are queued until previous connections are closed. Both persistent and
1271non-persistent connections are counted in this limit. 1351non-persistent connections are counted in this limit.
1272 1352
1273The default value for this is C<4>, and it is highly advisable to not 1353The default value for this is C<4>, and it is highly advisable to not
1274increase it much. 1354increase it much.
1275 1355
1276For comparison: the RFC's recommend 4 non-persistent or 2 persistent 1356For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1277connections, older browsers used 2, newers (such as firefox 3) typically 1357connections, older browsers used 2, newer ones (such as firefox 3)
1278use 6, and Opera uses 8 because like, they have the fastest browser and 1358typically use 6, and Opera uses 8 because like, they have the fastest
1279give a shit for everybody else on the planet. 1359browser and give a shit for everybody else on the planet.
1280 1360
1281=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 1361=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1282 1362
1283The time after which idle persistent conenctions get closed by 1363The time after which idle persistent connections get closed by
1284AnyEvent::HTTP (default: C<3>). 1364AnyEvent::HTTP (default: C<3>).
1285 1365
1286=item $AnyEvent::HTTP::ACTIVE 1366=item $AnyEvent::HTTP::ACTIVE
1287 1367
1288The number of active connections. This is not the number of currently 1368The number of active connections. This is not the number of currently
1353 set_proxy $ENV{http_proxy}; 1433 set_proxy $ENV{http_proxy};
1354}; 1434};
1355 1435
1356=head2 SHOWCASE 1436=head2 SHOWCASE
1357 1437
1358This section contaisn some more elaborate "real-world" examples or code 1438This section contains some more elaborate "real-world" examples or code
1359snippets. 1439snippets.
1360 1440
1361=head2 HTTP/1.1 FILE DOWNLOAD 1441=head2 HTTP/1.1 FILE DOWNLOAD
1362 1442
1363Downloading files with HTTP can be quite tricky, especially when something 1443Downloading files with HTTP can be quite tricky, especially when something
1367last modified time to check for file content changes, and works with many 1447last modified time to check for file content changes, and works with many
1368HTTP/1.0 servers as well, and usually falls back to a complete re-download 1448HTTP/1.0 servers as well, and usually falls back to a complete re-download
1369on older servers. 1449on older servers.
1370 1450
1371It calls the completion callback with either C<undef>, which means a 1451It calls the completion callback with either C<undef>, which means a
1372nonretryable error occured, C<0> when the download was partial and should 1452nonretryable error occurred, C<0> when the download was partial and should
1373be retried, and C<1> if it was successful. 1453be retried, and C<1> if it was successful.
1374 1454
1375 use AnyEvent::HTTP; 1455 use AnyEvent::HTTP;
1376 1456
1377 sub download($$$) { 1457 sub download($$$) {
