ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.47
Committed: Wed Aug 5 16:07:48 2009 UTC (14 years, 9 months ago) by root
Branch: MAIN
Changes since 1.46: +12 -7 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.44 use AnyEvent 4.8 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.46 our $VERSION = '1.41';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143 root 1.47 and will provide defaults for C<User-Agent:> and C<Referer:> (this can be
144     suppressed by using C<undef> for these headers in which case they won't be
145     send at all).
146 root 1.1
147     =item timeout => $seconds
148    
149     The time-out to use for various stages - each connect attempt will reset
150 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
151    
152     =item proxy => [$host, $port[, $scheme]] or undef
153    
154     Use the given http proxy for all requests. If not specified, then the
155     default proxy (as specified by C<$ENV{http_proxy}>) is used.
156    
157 root 1.47 C<$scheme> must be either missing, C<http> for HTTP or C<https> for
158 root 1.2 HTTPS.
159 root 1.1
160 root 1.3 =item body => $string
161    
162     The request body, usually empty. Will be-sent as-is (future versions of
163     this module might offer more options).
164    
165 root 1.10 =item cookie_jar => $hash_ref
166    
167     Passing this parameter enables (simplified) cookie-processing, loosely
168     based on the original netscape specification.
169    
170     The C<$hash_ref> must be an (initially empty) hash reference which will
171     get updated automatically. It is possible to save the cookie_jar to
172     persistent storage with something like JSON or Storable, but this is not
173 root 1.40 recommended, as expiry times are currently being ignored.
174 root 1.10
175     Note that this cookie implementation is not of very high quality, nor
176     meant to be complete. If you want complete cookie management you have to
177     do that on your own. C<cookie_jar> is meant as a quick fix to get some
178     cookie-using sites working. Cookies are a privacy disaster, do not use
179     them unless required to.
180    
181 root 1.40 =item tls_ctx => $scheme | $tls_ctx
182    
183     Specifies the AnyEvent::TLS context to be used for https connections. This
184     parameter follows the same rules as the C<tls_ctx> parameter to
185     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
186     C<high> can be specified, which give you a predefined low-security (no
187     verification, highest compatibility) and high-security (CA and common-name
188     verification) TLS context.
189    
190     The default for this option is C<low>, which could be interpreted as "give
191     me the page, no matter what".
192    
193 root 1.42 =item on_header => $callback->($headers)
194 root 1.41
195     When specified, this callback will be called with the header hash as soon
196     as headers have been successfully received from the remote server (not on
197     locally-generated errors).
198    
199     It has to return either true (in which case AnyEvent::HTTP will continue),
200     or false, in which case AnyEvent::HTTP will cancel the download (and call
201     the finish callback with an error code of C<598>).
202    
203     This callback is useful, among other things, to quickly reject unwanted
204     content, which, if it is supposed to be rare, can be faster than first
205     doing a C<HEAD> request.
206    
207 root 1.42 Example: cancel the request unless the content-type is "text/html".
208 root 1.41
209 root 1.42 on_header => sub {
210     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
211     },
212 root 1.41
213 root 1.42 =item on_body => $callback->($partial_body, $headers)
214 root 1.41
215 root 1.42 When specified, all body data will be passed to this callback instead of
216     to the completion callback. The completion callback will get the empty
217     string instead of the body data.
218 root 1.41
219 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
220     or false, in which case AnyEvent::HTTP will cancel the download (and call
221     the completion callback with an error code of C<598>).
222    
223     This callback is useful when the data is too large to be held in memory
224     (so the callback writes it to a file) or when only some information should
225     be extracted, or when the body should be processed incrementally.
226 root 1.41
227     It is usually preferred over doing your own body handling via
228 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
229     only used to create a connection, C<want_body_handle> is the better
230     alternative, as it allows you to install your own event handler, reducing
231     resource usage.
232 root 1.41
233     =item want_body_handle => $enable
234    
235     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
236     changes considerably: after parsing the headers, and instead of
237     downloading the body (if any), the completion callback will be
238     called. Instead of the C<$body> argument containing the body data, the
239     callback will receive the L<AnyEvent::Handle> object associated with the
240     connection. In error cases, C<undef> will be passed. When there is no body
241     (e.g. status C<304>), the empty string will be passed.
242    
243     The handle object might or might not be in TLS mode, might be connected to
244     a proxy, be a persistent connection etc., and configured in unspecified
245     ways. The user is responsible for this handle (it will not be used by this
246     module anymore).
247    
248     This is useful with some push-type services, where, after the initial
249     headers, an interactive protocol is used (typical example would be the
250     push-style twitter API which starts a JSON/XML stream).
251    
252     If you think you need this, first have a look at C<on_body>, to see if
253 root 1.45 that doesn't solve your problem in a better way.
254 root 1.41
255 root 1.1 =back
256    
257 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
258    
259     http_request GET => "http://www.nethype.de/", sub {
260     my ($body, $hdr) = @_;
261     print "$body\n";
262     };
263    
264     Example: make a HTTP HEAD request on https://www.google.com/, use a
265     timeout of 30 seconds.
266    
267     http_request
268     GET => "https://www.google.com",
269     timeout => 30,
270     sub {
271     my ($body, $hdr) = @_;
272     use Data::Dumper;
273     print Dumper $hdr;
274     }
275     ;
276 root 1.1
277 root 1.29 Example: make another simple HTTP GET request, but immediately try to
278     cancel it.
279    
280     my $request = http_request GET => "http://www.nethype.de/", sub {
281     my ($body, $hdr) = @_;
282     print "$body\n";
283     };
284    
285     undef $request;
286    
287 root 1.1 =cut
288    
289 root 1.12 sub _slot_schedule;
290 root 1.11 sub _slot_schedule($) {
291     my $host = shift;
292    
293     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
294     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
295 root 1.12 # somebody wants that slot
296 root 1.11 ++$CO_SLOT{$host}[0];
297 root 1.14 ++$ACTIVE;
298 root 1.11
299     $cb->(AnyEvent::Util::guard {
300 root 1.14 --$ACTIVE;
301 root 1.11 --$CO_SLOT{$host}[0];
302     _slot_schedule $host;
303     });
304     } else {
305     # nobody wants the slot, maybe we can forget about it
306     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
307     last;
308     }
309     }
310     }
311    
312     # wait for a free slot on host, call callback
313     sub _get_slot($$) {
314     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
315    
316     _slot_schedule $_[0];
317     }
318    
319 root 1.46 our $qr_nl = qr{\015?\012};
320     our $qr_nlnl = qr{(?<![^\012])\015?\012};
321 root 1.34
322 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
323     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
324 root 1.40
325 elmex 1.15 sub http_request($$@) {
326 root 1.1 my $cb = pop;
327     my ($method, $url, %arg) = @_;
328    
329     my %hdr;
330    
331 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
332     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
333    
334 root 1.3 $method = uc $method;
335    
336 root 1.8 if (my $hdr = $arg{headers}) {
337 root 1.1 while (my ($k, $v) = each %$hdr) {
338     $hdr{lc $k} = $v;
339     }
340     }
341    
342 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
343 root 1.8
344 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
345 root 1.8 if $recurse < 0;
346    
347 root 1.2 my $proxy = $arg{proxy} || $PROXY;
348 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
349    
350 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
351 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
352 root 1.2
353 root 1.31 $uscheme = lc $uscheme;
354 root 1.1
355 root 1.31 my $uport = $uscheme eq "http" ? 80
356     : $uscheme eq "https" ? 443
357 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
358 root 1.13
359 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
360 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
361 root 1.10
362     my $uhost = $1;
363     $uport = $2 if defined $2;
364    
365 root 1.43 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost";
366    
367 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
368     $upath .= "?$query" if length $query;
369    
370     $upath =~ s%^/?%/%;
371    
372     # cookie processing
373     if (my $jar = $arg{cookie_jar}) {
374 root 1.31 %$jar = () if $jar->{version} != 1;
375 root 1.10
376     my @cookie;
377    
378     while (my ($chost, $v) = each %$jar) {
379 root 1.30 if ($chost =~ /^\./) {
380     next unless $chost eq substr $uhost, -length $chost;
381     } elsif ($chost =~ /\./) {
382     next unless $chost eq $uhost;
383     } else {
384     next;
385     }
386 root 1.10
387     while (my ($cpath, $v) = each %$v) {
388     next unless $cpath eq substr $upath, 0, length $cpath;
389    
390     while (my ($k, $v) = each %$v) {
391 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
392     my $value = $v->{value};
393     $value =~ s/([\\"])/\\$1/g;
394     push @cookie, "$k=\"$value\"";
395 root 1.10 }
396     }
397     }
398    
399     $hdr{cookie} = join "; ", @cookie
400     if @cookie;
401     }
402 root 1.1
403 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
404 root 1.2
405 root 1.10 if ($proxy) {
406 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
407 root 1.31
408 root 1.47 $rscheme = "http" unless defined $rscheme;
409    
410 root 1.31 # don't support https requests over https-proxy transport,
411 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
412 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
413 root 1.10 } else {
414 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
415 root 1.2 }
416    
417 root 1.47 # leave out fragment and query string, just a heuristic
418     $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
419     $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
420 root 1.41
421 root 1.10 $hdr{"content-length"} = length $arg{body};
422 root 1.1
423 root 1.11 my %state = (connect_guard => 1);
424    
425     _get_slot $uhost, sub {
426     $state{slot_guard} = shift;
427 root 1.1
428 root 1.11 return unless $state{connect_guard};
429 root 1.1
430 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
431     $state{fh} = shift
432 root 1.44 or do {
433     my $err = "$!";
434     %state = ();
435     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
436     };
437    
438 root 1.34 pop; # free memory, save a tree
439 root 1.11
440 root 1.34 return unless delete $state{connect_guard};
441 root 1.11
442     # get handle
443     $state{handle} = new AnyEvent::Handle
444 root 1.40 fh => $state{fh},
445     timeout => $timeout,
446     peername => $rhost,
447     tls_ctx => $arg{tls_ctx};
448 root 1.11
449     # limit the number of persistent connections
450 root 1.34 # keepalive not yet supported
451 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
452     ++$KA_COUNT{$_[1]};
453 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
454     --$KA_COUNT{$_[1]}
455     };
456 root 1.11 $hdr{connection} = "keep-alive";
457     } else {
458     delete $hdr{connection};
459     }
460 root 1.1
461 root 1.11 # (re-)configure handle
462     $state{handle}->on_error (sub {
463     %state = ();
464 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
465 root 1.11 });
466     $state{handle}->on_eof (sub {
467     %state = ();
468 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
469 root 1.11 });
470 root 1.1
471 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
472    
473     # handle actual, non-tunneled, request
474     my $handle_actual_request = sub {
475 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
476 root 1.31
477     # send request
478     $state{handle}->push_write (
479     "$method $rpath HTTP/1.0\015\012"
480 root 1.47 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
481 root 1.31 . "\015\012"
482     . (delete $arg{body})
483 root 1.11 );
484    
485 root 1.31 %hdr = (); # reduce memory usage, save a kitten
486 root 1.10
487 root 1.31 # status line
488 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
489 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
490 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
491 root 1.31
492     my %hdr = ( # response headers
493     HTTPVersion => ",$1",
494     Status => ",$2",
495     Reason => ",$3",
496     URL => ",$url"
497     );
498    
499     # headers, could be optimized a bit
500 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
501 root 1.46 for ("$_[1]") {
502 root 1.31 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
503    
504 root 1.40 # things seen, not parsed:
505     # p3pP="NON CUR OTPi OUR NOR UNI"
506    
507 root 1.31 $hdr{lc $1} .= ",$2"
508     while /\G
509 root 1.43 ([^:\000-\037]*):
510 root 1.31 [\011\040]*
511     ((?: [^\012]+ | \012[\011\040] )*)
512     \012
513     /gxc;
514    
515     /\G$/
516 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
517 root 1.31 }
518    
519     substr $_, 0, 1, ""
520     for values %hdr;
521    
522 root 1.41 # redirect handling
523     # microsoft and other shitheads don't give a shit for following standards,
524     # try to support some common forms of broken Location headers.
525     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
526     $hdr{location} =~ s/^\.\/+//;
527    
528     my $url = "$rscheme://$uhost:$uport";
529    
530     unless ($hdr{location} =~ s/^\///) {
531     $url .= $upath;
532     $url =~ s/\/[^\/]*$//;
533     }
534    
535     $hdr{location} = "$url/$hdr{location}";
536     }
537    
538     my $redirect;
539    
540     if ($recurse) {
541     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
542     # apparently, mozilla et al. just change POST to GET here
543     # more research is needed before we do the same
544     $redirect = 1;
545     } elsif ($hdr{Status} == 303) {
546     # even http/1.1 is unclear on how to mutate the method
547     $method = "GET" unless $method eq "HEAD";
548     $redirect = 1;
549     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
550     $redirect = 1;
551     }
552     }
553    
554 root 1.31 my $finish = sub {
555 root 1.41 $state{handle}->destroy if $state{handle};
556 root 1.31 %state = ();
557    
558     # set-cookie processing
559     if ($arg{cookie_jar}) {
560 root 1.41 for ($_[1]{"set-cookie"}) {
561 root 1.31 # parse NAME=VALUE
562     my @kv;
563    
564     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
565     my $name = $1;
566     my $value = $3;
567    
568     unless ($value) {
569     $value = $2;
570     $value =~ s/\\(.)/$1/gs;
571     }
572    
573     push @kv, $name => $value;
574    
575     last unless /\G\s*;/gc;
576     }
577    
578     last unless @kv;
579 root 1.10
580 root 1.31 my $name = shift @kv;
581     my %kv = (value => shift @kv, @kv);
582 root 1.11
583 root 1.31 my $cdom;
584     my $cpath = (delete $kv{path}) || "/";
585 root 1.10
586 root 1.31 if (exists $kv{domain}) {
587     $cdom = delete $kv{domain};
588    
589     $cdom =~ s/^\.?/./; # make sure it starts with a "."
590 root 1.11
591 root 1.31 next if $cdom =~ /\.$/;
592    
593     # this is not rfc-like and not netscape-like. go figure.
594     my $ndots = $cdom =~ y/.//;
595     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
596     } else {
597     $cdom = $uhost;
598     }
599 root 1.30
600 root 1.31 # store it
601     $arg{cookie_jar}{version} = 1;
602     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
603    
604     redo if /\G\s*,/gc;
605 root 1.30 }
606 root 1.11 }
607 root 1.8
608 root 1.47 if ($redirect && exists $hdr{location}) {
609 root 1.41 # we ignore any errors, as it is very common to receive
610     # Content-Length != 0 but no actual body
611     # we also access %hdr, as $_[1] might be an erro
612     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
613 root 1.31 } else {
614     $cb->($_[0], $_[1]);
615     }
616     };
617 root 1.24
618 root 1.41 my $len = $hdr{"content-length"};
619    
620     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
621     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
622     } elsif (
623 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
624 root 1.41 or $method eq "HEAD"
625     or (defined $len && !$len)
626     ) {
627     # no body
628     $finish->("", \%hdr);
629 root 1.11 } else {
630 root 1.41 # body handling, four different code paths
631     # for want_body_handle, on_body (2x), normal (2x)
632     # we might read too much here, but it does not matter yet (no pers. connections)
633     if (!$redirect && $arg{want_body_handle}) {
634     $_[0]->on_eof (undef);
635     $_[0]->on_error (undef);
636     $_[0]->on_read (undef);
637    
638     $finish->(delete $state{handle}, \%hdr);
639 root 1.31
640 root 1.41 } elsif ($arg{on_body}) {
641     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
642     if ($len) {
643     $_[0]->on_eof (undef);
644     $_[0]->on_read (sub {
645     $len -= length $_[0]{rbuf};
646    
647     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
648     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
649    
650     $len > 0
651     or $finish->("", \%hdr);
652     });
653     } else {
654     $_[0]->on_eof (sub {
655     $finish->("", \%hdr);
656     });
657     $_[0]->on_read (sub {
658     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
659     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
660     });
661     }
662 root 1.31 } else {
663     $_[0]->on_eof (undef);
664 root 1.41
665     if ($len) {
666     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
667     $_[0]->on_read (sub {
668     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
669     if $len <= length $_[0]{rbuf};
670     });
671     } else {
672     $_[0]->on_error (sub {
673 root 1.43 $! == Errno::EPIPE || !$!
674 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
675     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
676     });
677     $_[0]->on_read (sub { });
678     }
679 root 1.31 }
680 root 1.11 }
681 root 1.31 });
682     });
683     };
684 root 1.3
685 root 1.31 # now handle proxy-CONNECT method
686     if ($proxy && $uscheme eq "https") {
687     # oh dear, we have to wrap it into a connect request
688    
689     # maybe re-use $uauthority with patched port?
690     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
691 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
692 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
693 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
694 root 1.31
695     if ($2 == 200) {
696     $rpath = $upath;
697     &$handle_actual_request;
698 root 1.3 } else {
699 root 1.31 %state = ();
700     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
701 root 1.3 }
702 root 1.11 });
703 root 1.31 } else {
704     &$handle_actual_request;
705     }
706    
707 root 1.11 }, sub {
708     $timeout
709     };
710 root 1.1 };
711    
712     defined wantarray && AnyEvent::Util::guard { %state = () }
713     }
714    
715 elmex 1.15 sub http_get($@) {
716 root 1.1 unshift @_, "GET";
717     &http_request
718     }
719    
720 elmex 1.15 sub http_head($@) {
721 root 1.4 unshift @_, "HEAD";
722     &http_request
723     }
724    
725 elmex 1.15 sub http_post($$@) {
726 root 1.22 my $url = shift;
727     unshift @_, "POST", $url, "body";
728 root 1.3 &http_request
729     }
730    
731 root 1.9 =back
732    
733 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
734 root 1.1
735     =over 4
736    
737 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
738    
739     Sets the default proxy server to use. The proxy-url must begin with a
740     string of the form C<http://host:port> (optionally C<https:...>).
741    
742 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
743 root 1.1
744 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
745 root 1.1
746     =item $AnyEvent::HTTP::USERAGENT
747    
748     The default value for the C<User-Agent> header (the default is
749 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
750 root 1.1
751 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
752 root 1.1
753 root 1.47 The maximum number of concurrent connections to the same host (identified
754 root 1.43 by the hostname). If the limit is exceeded, then the additional requests
755     are queued until previous connections are closed.
756 root 1.1
757 root 1.43 The default value for this is C<4>, and it is highly advisable to not
758     increase it.
759 root 1.3
760 root 1.14 =item $AnyEvent::HTTP::ACTIVE
761    
762     The number of active connections. This is not the number of currently
763     running requests, but the number of currently open and non-idle TCP
764     connections. This number of can be useful for load-leveling.
765    
766 root 1.1 =back
767    
768     =cut
769    
770 root 1.2 sub set_proxy($) {
771     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
772     }
773    
774     # initialise proxy from environment
775     set_proxy $ENV{http_proxy};
776    
777 root 1.1 =head1 SEE ALSO
778    
779     L<AnyEvent>.
780    
781     =head1 AUTHOR
782    
783 root 1.18 Marc Lehmann <schmorp@schmorp.de>
784     http://home.schmorp.de/
785 root 1.1
786 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
787     testcases and bugreports.
788    
789 root 1.1 =cut
790    
791     1
792