
Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.111 by root, Sun Apr 22 12:58:38 2012 UTC vs.
Revision 1.120 by root, Sun Jun 8 23:36:36 2014 UTC

46use AnyEvent::Util (); 46use AnyEvent::Util ();
47use AnyEvent::Handle (); 47use AnyEvent::Handle ();
48 48
49use base Exporter::; 49use base Exporter::;
50 50
51our $VERSION = '2.14'; 51our $VERSION = 2.21;
52 52
53our @EXPORT = qw(http_get http_post http_head http_request); 53our @EXPORT = qw(http_get http_post http_head http_request);
54 54
55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 55our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
56our $MAX_RECURSE = 10; 56our $MAX_RECURSE = 10;
89C<http_request> returns a "cancellation guard" - you have to keep the 89C<http_request> returns a "cancellation guard" - you have to keep the
90object at least alive until the callback get called. If the object gets 90object at least alive until the callback get called. If the object gets
91destroyed before the callback is called, the request will be cancelled. 91destroyed before the callback is called, the request will be cancelled.
92 92
93The callback will be called with the response body data as first argument 93The callback will be called with the response body data as first argument
94(or C<undef> if an error occured), and a hash-ref with response headers 94(or C<undef> if an error occurred), and a hash-ref with response headers
95(and trailers) as second argument. 95(and trailers) as second argument.
96 96
97All the headers in that hash are lowercased. In addition to the response 97All the headers in that hash are lowercased. In addition to the response
98headers, the "pseudo-headers" (uppercase to avoid clashing with possible 98headers, the "pseudo-headers" (uppercase to avoid clashing with possible
99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 99response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
123C<590>-C<599> and the C<Reason> pseudo-header will contain an error 123C<590>-C<599> and the C<Reason> pseudo-header will contain an error
124message. Currently the following status codes are used: 124message. Currently the following status codes are used:
125 125
126=over 4 126=over 4
127 127
128=item 595 - errors during connection etsbalishment, proxy handshake. 128=item 595 - errors during connection establishment, proxy handshake.
129 129
130=item 596 - errors during TLS negotiation, request sending and header processing. 130=item 596 - errors during TLS negotiation, request sending and header processing.
131 131
132=item 597 - errors during body receiving or processing. 132=item 597 - errors during body receiving or processing.
133 133
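As a rough illustration of how a caller might act on these pseudo-status codes, the sketch below maps the three codes listed above to the phase that failed. The helper name error_phase and the sample header hashes are hypothetical and not part of AnyEvent::HTTP. Only the Status and Reason pseudo-headers and the 590-599 range come from the documentation in this diff.

```perl
use strict;
use warnings;

# Hypothetical lookup table (not part of AnyEvent::HTTP), built from the
# three status codes documented above.
my %PHASE = (
   595 => "connect",   # connection establishment, proxy handshake
   596 => "request",   # TLS negotiation, request sending, header processing
   597 => "body",      # body receiving or processing
);

# Returns the failed phase for an internal error status, "other" for the
# remaining 59x codes, and undef for ordinary HTTP statuses.
sub error_phase {
   my ($hdr) = @_;
   my $status = $hdr->{Status};

   return undef unless $status >= 590 && $status <= 599;

   $PHASE{$status} || "other"
}

# These hashes stand in for the second argument passed to the callback.
my $phase = error_phase ({ Status => 595, Reason => "Connection refused" });
print "failed during: $phase\n";

my $ok = error_phase ({ Status => 200, Reason => "OK" });
print defined $ok ? "internal error\n" : "no internal error\n";
```

In a real callback the same check would be applied to the header hash that http_request passes as its second argument.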
154 154
155=over 4 155=over 4
156 156
157=item recurse => $count (default: $MAX_RECURSE) 157=item recurse => $count (default: $MAX_RECURSE)
158 158
159Whether to recurse requests or not, e.g. on redirects, authentication 159Whether to recurse requests or not, e.g. on redirects, authentication and
160retries and so on, and how often to do so. 160other retries and so on, and how often to do so.
161
162Only redirects to http and https URLs are supported. While most common
163redirection forms are handled entirely within this module, some require
164the use of the optional L<URI> module. If it is required but missing, then
165the request will fail with an error.
161 166
162=item headers => hashref 167=item headers => hashref
163 168
164The request headers to use. Currently, C<http_request> may provide its own 169The request headers to use. Currently, C<http_request> may provide its own
165C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and 170C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
242context) - only connections using the same unique ID will be reused. 247context) - only connections using the same unique ID will be reused.
243 248
244=item on_prepare => $callback->($fh) 249=item on_prepare => $callback->($fh)
245 250
246In rare cases you need to "tune" the socket before it is used to 251In rare cases you need to "tune" the socket before it is used to
247connect (for exmaple, to bind it on a given IP address). This parameter 252connect (for example, to bind it on a given IP address). This parameter
248overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 253overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
249and behaves exactly the same way (e.g. it has to provide a 254and behaves exactly the same way (e.g. it has to provide a
250timeout). See the description for the C<$prepare_cb> argument of 255timeout). See the description for the C<$prepare_cb> argument of
251C<AnyEvent::Socket::tcp_connect> for details. 256C<AnyEvent::Socket::tcp_connect> for details.
252 257
689 694
690 $cb->(undef, $hdr); 695 $cb->(undef, $hdr);
691 () 696 ()
692} 697}
693 698
699our %IDEMPOTENT = (
700 DELETE => 1,
701 GET => 1,
702 HEAD => 1,
703 OPTIONS => 1,
704 PUT => 1,
705 TRACE => 1,
706
707 ACL => 1,
708 "BASELINE-CONTROL" => 1,
709 BIND => 1,
710 CHECKIN => 1,
711 CHECKOUT => 1,
712 COPY => 1,
713 LABEL => 1,
714 LINK => 1,
715 MERGE => 1,
716 MKACTIVITY => 1,
717 MKCALENDAR => 1,
718 MKCOL => 1,
719 MKREDIRECTREF => 1,
720 MKWORKSPACE => 1,
721 MOVE => 1,
722 ORDERPATCH => 1,
723 PROPFIND => 1,
724 PROPPATCH => 1,
725 REBIND => 1,
726 REPORT => 1,
727 SEARCH => 1,
728 UNBIND => 1,
729 UNCHECKOUT => 1,
730 UNLINK => 1,
731 UNLOCK => 1,
732 UPDATE => 1,
733 UPDATEREDIRECTREF => 1,
734 "VERSION-CONTROL" => 1,
735);
736
694sub http_request($$@) { 737sub http_request($$@) {
695 my $cb = pop; 738 my $cb = pop;
696 my ($method, $url, %arg) = @_; 739 my ($method, $url, %arg) = @_;
697 740
698 my %hdr; 741 my %hdr;
773 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"}; 816 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
774 817
775 $hdr{"content-length"} = length $arg{body} 818 $hdr{"content-length"} = length $arg{body}
776 if length $arg{body} || $method ne "GET"; 819 if length $arg{body} || $method ne "GET";
777 820
778 my $idempotent = $method =~ /^(?:GET|HEAD|PUT|DELETE|OPTIONS|TRACE)$/; 821 my $idempotent = $IDEMPOTENT{$method};
779 822
780 # default value for keepalive is true iff the request is for an idempotent method 823 # default value for keepalive is true iff the request is for an idempotent method
781 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent; 824 my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
782 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy; 825 my $keepalive = exists $arg{keepalive} ? !!$arg{keepalive} : !$proxy;
783 my $was_persistent; # true if this is actually a recycled connection 826 my $was_persistent; # true if this is actually a recycled connection
784 827
785 # the key to use in the keepalive cache 828 # the key to use in the keepalive cache
786 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}"; 829 my $ka_key = "$uscheme\x00$uhost\x00$uport\x00$arg{sessionid}";
787 830
788 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive " : "" : "close ") . "Te"; #1.1 831 $hdr{connection} = ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"; #1.1
789 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1 832 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
790 833
791 my %state = (connect_guard => 1); 834 my %state = (connect_guard => 1);
792 835
793 my $ae_error = 595; # connecting 836 my $ae_error = 595; # connecting
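To make the effect of the changed lines in this hunk easier to see, here is a minimal standalone sketch of how the revised code derives the Connection header. The function name connection_header is hypothetical, the %IDEMPOTENT table is abbreviated to the six methods the old regex matched, and $proxy is reduced to a plain boolean. The expressions themselves are copied from the new revision.

```perl
use strict;
use warnings;

# Abbreviated stand-in for the %IDEMPOTENT table added in revision 1.120.
my %IDEMPOTENT = map { $_ => 1 } qw(DELETE GET HEAD OPTIONS PUT TRACE);

# Hypothetical helper: compute the Connection header value from the
# method, a boolean "uses a proxy" flag and the caller's arguments.
sub connection_header {
   my ($method, $proxy, %arg) = @_;

   my $idempotent = $IDEMPOTENT{$method};

   # persistent defaults to true only for idempotent methods
   my $persistent = exists $arg{persistent} ? !!$arg{persistent} : $idempotent;
   my $keepalive  = exists $arg{keepalive}  ? !!$arg{keepalive}  : !$proxy;

   # the new revision separates the tokens with ", " instead of a space
   ($persistent ? $keepalive ? "keep-alive, " : "" : "close, ") . "Te"
}

print connection_header ("GET",  0), "\n";
print connection_header ("POST", 0), "\n";
print connection_header ("GET",  1), "\n";
```

Under these assumptions a GET without a proxy yields "keep-alive, Te", a POST yields "close, Te", and a GET through a proxy yields just "Te".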
803 # send request 846 # send request
804 $hdl->push_write ( 847 $hdl->push_write (
805 "$method $rpath HTTP/1.1\015\012" 848 "$method $rpath HTTP/1.1\015\012"
806 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 849 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
807 . "\015\012" 850 . "\015\012"
808 . (delete $arg{body}) 851 . $arg{body}
809 ); 852 );
810 853
811 # return if error occured during push_write() 854 # return if error occurred during push_write()
812 return unless %state; 855 return unless %state;
813 856
814 # reduce memory usage, save a kitten, also re-use it for the response headers. 857 # reduce memory usage, save a kitten, also re-use it for the response headers.
815 %hdr = (); 858 %hdr = ();
816 859
843 886
844 %hdr = (%$hdr, @pseudo); 887 %hdr = (%$hdr, @pseudo);
845 } 888 }
846 889
847 # redirect handling 890 # redirect handling
848 # microsoft and other shitheads don't give a shit for following standards, 891 # relative uri handling forced by microsoft and other shitheads.
849 # try to support some common forms of broken Location headers. 892 # we give our best and fall back to URI if available.
850 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 893 if (exists $hdr{location}) {
894 my $loc = $hdr{location};
895
896 if ($loc =~ m%^//%) { # //
897 $loc = "$rscheme:$loc";
898
899 } elsif ($loc eq "") {
900 $loc = $url;
901
902 } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # anything "simple"
851 $hdr{location} =~ s/^\.\/+//; 903 $loc =~ s/^\.\/+//;
852 904
905 if ($loc !~ m%^[.?#]%) {
853 my $url = "$rscheme://$uhost:$uport"; 906 my $prefix = "$rscheme://$uhost:$uport";
854 907
855 unless ($hdr{location} =~ s/^\///) { 908 unless ($loc =~ s/^\///) {
856 $url .= $upath; 909 $prefix .= $upath;
857 $url =~ s/\/[^\/]*$//; 910 $prefix =~ s/\/[^\/]*$//;
911 }
912
913 $loc = "$prefix/$loc";
914
915 } elsif (eval { require URI }) { # uri
916 $loc = URI->new_abs ($loc, $url)->as_string;
917
918 } else {
919 return _error %state, $cb, { @pseudo, Status => 599, Reason => "Cannot parse Location (URI module missing)" };
920 #$hdr{Status} = 599;
921 #$hdr{Reason} = "Unparsable Redirect (URI module missing)";
922 #$recurse = 0;
923 }
858 } 924 }
859 925
860 $hdr{location} = "$url/$hdr{location}"; 926 $hdr{location} = $loc;
861 } 927 }
862 928
863 my $redirect; 929 my $redirect;
864 930
865 if ($recurse) { 931 if ($recurse) {
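The new Location handling in the hunk above can be read as a small resolution function. The following sketch restates it in isolation. The name resolve_location and its argument list are hypothetical, and it returns undef where the module reports a 599 error. It assumes that $upath is the path of the original request and starts with a slash.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the redirect resolution shown in
# the diff; returns the absolute URL, or undef if URI is needed but missing.
sub resolve_location {
   my ($loc, $url, $rscheme, $uhost, $uport, $upath) = @_;

   if ($loc =~ m%^//%) {                      # scheme-relative: //host/path
      return "$rscheme:$loc";

   } elsif ($loc eq "") {                     # empty: same URL again
      return $url;

   } elsif ($loc !~ /^(?: $ | [^:\/?\#]+ : )/x) { # no scheme: relative
      $loc =~ s/^\.\/+//;                     # drop a leading "./"

      if ($loc !~ m%^[.?#]%) {                # simple path, handled inline
         my $prefix = "$rscheme://$uhost:$uport";

         unless ($loc =~ s/^\///) {           # not absolute: keep directory
            $prefix .= $upath;
            $prefix =~ s/\/[^\/]*$//;
         }

         return "$prefix/$loc";

      } elsif (eval { require URI }) {        # "../x", "?q", "#f": use URI
         return URI->new_abs ($loc, $url)->as_string;

      } else {
         return undef;                        # the module fails with 599 here
      }
   }

   $loc                                       # already absolute
}

my @base = ("http://example.com:80/a/b.html", "http", "example.com", 80, "/a/b.html");

print resolve_location ("/new",   @base), "\n";
print resolve_location ("c.html", @base), "\n";
print resolve_location ("//other.example/x", @base), "\n";
```

Only redirect targets that start with ".", "?" or "#" after the leading "./" is stripped reach the URI branch, which matches the documentation's statement that most common forms are handled without the optional module.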
867 933
868 # industry standard is to redirect POST as GET for 934 # industry standard is to redirect POST as GET for
869 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1. 935 # 301, 302 and 303, in contrast to HTTP/1.0 and 1.1.
870 # also, the UA should ask the user for 301 and 307 and POST, 936 # also, the UA should ask the user for 301 and 307 and POST,
871 # industry standard seems to be to simply follow. 937 # industry standard seems to be to simply follow.
872 # we go with the industry standard. 938 # we go with the industry standard. 308 is defined
939 # by rfc7238
873 if ($status == 301 or $status == 302 or $status == 303) { 940 if ($status == 301 or $status == 302 or $status == 303) {
874 # HTTP/1.1 is unclear on how to mutate the method 941 # HTTP/1.1 is unclear on how to mutate the method
875 $method = "GET" unless $method eq "HEAD"; 942 $method = "GET" unless $method eq "HEAD";
876 $redirect = 1; 943 $redirect = 1;
877 } elsif ($status == 307) { 944 } elsif ($status == 307 or $status == 308) {
878 $redirect = 1; 945 $redirect = 1;
879 } 946 }
880 } 947 }
881 948
882 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent]) 949 my $finish = sub { # ($data, $err_status, $err_reason[, $persistent])
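The status handling in the preceding hunk, including the newly added 308, can be summarised by a small function. This is only a sketch: the name redirect_method is hypothetical, and it returns the method to use for the follow-up request, or undef when the status is not treated as a redirect.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the status checks in the diff:
# 301/302/303 turn everything except HEAD into GET, while
# 307/308 repeat the request with the original method.
sub redirect_method {
   my ($status, $method) = @_;

   if ($status == 301 or $status == 302 or $status == 303) {
      return $method eq "HEAD" ? "HEAD" : "GET";
   } elsif ($status == 307 or $status == 308) {
      return $method;
   }

   undef   # not a redirect this code follows
}

for my $case ([302, "POST"], [308, "POST"], [303, "HEAD"]) {
   my ($status, $method) = @$case;
   printf "%d %s -> %s\n", $status, $method, redirect_method ($status, $method);
}
```

In the module itself this decision is only made while the remaining recurse count is non-zero, and the follow-up request is issued with that count reduced by one.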
1050 _destroy_state %state; 1117 _destroy_state %state;
1051 1118
1052 %state = (); 1119 %state = ();
1053 $state{recurse} = 1120 $state{recurse} =
1054 http_request ( 1121 http_request (
1055 $method => $url, 1122 $method => $url,
1056 %arg, 1123 %arg,
1124 recurse => $recurse - 1,
1057 keepalive => 0, 1125 keepalive => 0,
1058 sub { 1126 sub {
1059 %state = (); 1127 %state = ();
1060 &$cb 1128 &$cb
1061 } 1129 }
1193Sets the default proxy server to use. The proxy-url must begin with a 1261Sets the default proxy server to use. The proxy-url must begin with a
1194string of the form C<http://host:port>, croaks otherwise. 1262string of the form C<http://host:port>, croaks otherwise.
1195 1263
1196To clear an already-set proxy, use C<undef>. 1264To clear an already-set proxy, use C<undef>.
1197 1265
1198When AnyEvent::HTTP is laoded for the first time it will query the 1266When AnyEvent::HTTP is loaded for the first time it will query the
1199default proxy from the operating system, currently by looking at 1267default proxy from the operating system, currently by looking at
1200C<$ENV{http_proxy>}. 1268C<$ENV{http_proxy>}.
1201 1269
1202=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end] 1270=item AnyEvent::HTTP::cookie_jar_expire $jar[, $session_end]
1203 1271
1205C<$session_end> is given and true, then additionally remove all session 1273C<$session_end> is given and true, then additionally remove all session
1206cookies. 1274cookies.
1207 1275
1208You should call this function (with a true C<$session_end>) before you 1276You should call this function (with a true C<$session_end>) before you
1209save cookies to disk, and you should call this function after loading them 1277save cookies to disk, and you should call this function after loading them
1210again. If you have a long-running program you can additonally call this 1278again. If you have a long-running program you can additionally call this
1211function from time to time. 1279function from time to time.
1212 1280
1213A cookie jar is initially an empty hash-reference that is managed by this 1281A cookie jar is initially an empty hash-reference that is managed by this
1214module. It's format is subject to change, but currently it is like this: 1282module. It's format is subject to change, but currently it is like this:
1215 1283
1216The key C<version> has to contain C<1>, otherwise the hash gets 1284The key C<version> has to contain C<1>, otherwise the hash gets
1217emptied. All other keys are hostnames or IP addresses pointing to 1285emptied. All other keys are hostnames or IP addresses pointing to
1218hash-references. The key for these inner hash references is the 1286hash-references. The key for these inner hash references is the
1219server path for which this cookie is meant, and the values are again 1287server path for which this cookie is meant, and the values are again
1220hash-references. The keys of those hash-references is the cookie name, and 1288hash-references. Each key of those hash-references is a cookie name, and
1221the value, you guessed it, is another hash-reference, this time with the 1289the value, you guessed it, is another hash-reference, this time with the
1222key-value pairs from the cookie, except for C<expires> and C<max-age>, 1290key-value pairs from the cookie, except for C<expires> and C<max-age>,
1223which have been replaced by a C<_expires> key that contains the cookie 1291which have been replaced by a C<_expires> key that contains the cookie
1224expiry timestamp. 1292expiry timestamp. Session cookies are indicated by not having an
1293C<_expires> key.
1225 1294
1226Here is an example of a cookie jar with a single cookie, so you have a 1295Here is an example of a cookie jar with a single cookie, so you have a
1227chance of understanding the above paragraph: 1296chance of understanding the above paragraph:
1228 1297
1229 { 1298 {
1271 1340
1272The default value for this is C<4>, and it is highly advisable to not 1341The default value for this is C<4>, and it is highly advisable to not
1273increase it much. 1342increase it much.
1274 1343
1275For comparison: the RFC's recommend 4 non-persistent or 2 persistent 1344For comparison: the RFC's recommend 4 non-persistent or 2 persistent
1276connections, older browsers used 2, newers (such as firefox 3) typically 1345connections, older browsers used 2, newer ones (such as firefox 3)
1277use 6, and Opera uses 8 because like, they have the fastest browser and 1346typically use 6, and Opera uses 8 because like, they have the fastest
1278give a shit for everybody else on the planet. 1347browser and give a shit for everybody else on the planet.
1279 1348
1280=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 1349=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
1281 1350
1282The time after which idle persistent conenctions get closed by 1351The time after which idle persistent connections get closed by
1283AnyEvent::HTTP (default: C<3>). 1352AnyEvent::HTTP (default: C<3>).
1284 1353
1285=item $AnyEvent::HTTP::ACTIVE 1354=item $AnyEvent::HTTP::ACTIVE
1286 1355
1287The number of active connections. This is not the number of currently 1356The number of active connections. This is not the number of currently
1352 set_proxy $ENV{http_proxy}; 1421 set_proxy $ENV{http_proxy};
1353}; 1422};
1354 1423
1355=head2 SHOWCASE 1424=head2 SHOWCASE
1356 1425
1357This section contaisn some more elaborate "real-world" examples or code 1426This section contains some more elaborate "real-world" examples or code
1358snippets. 1427snippets.
1359 1428
1360=head2 HTTP/1.1 FILE DOWNLOAD 1429=head2 HTTP/1.1 FILE DOWNLOAD
1361 1430
1362Downloading files with HTTP can be quite tricky, especially when something 1431Downloading files with HTTP can be quite tricky, especially when something
1366last modified time to check for file content changes, and works with many 1435last modified time to check for file content changes, and works with many
1367HTTP/1.0 servers as well, and usually falls back to a complete re-download 1436HTTP/1.0 servers as well, and usually falls back to a complete re-download
1368on older servers. 1437on older servers.
1369 1438
1370It calls the completion callback with either C<undef>, which means a 1439It calls the completion callback with either C<undef>, which means a
1371nonretryable error occured, C<0> when the download was partial and should 1440nonretryable error occurred, C<0> when the download was partial and should
1372be retried, and C<1> if it was successful. 1441be retried, and C<1> if it was successful.
1373 1442
1374 use AnyEvent::HTTP; 1443 use AnyEvent::HTTP;
1375 1444
1376 sub download($$$) { 1445 sub download($$$) {
