ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.44
Committed: Tue Jul 7 00:15:32 2009 UTC (14 years, 10 months ago) by root
Branch: MAIN
CVS Tags: rel-1_4
Changes since 1.43: +8 -3 lines
Log Message:
1.4

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.44 use AnyEvent 4.8 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.44 our $VERSION = '1.4';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143     and will provide defaults for C<User-Agent:> and C<Referer:>.
144 root 1.1
145     =item timeout => $seconds
146    
147     The time-out to use for various stages - each connect attempt will reset
148 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
149    
150     =item proxy => [$host, $port[, $scheme]] or undef
151    
152     Use the given http proxy for all requests. If not specified, then the
153     default proxy (as specified by C<$ENV{http_proxy}>) is used.
154    
155     C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
156     HTTPS.
157 root 1.1
158 root 1.3 =item body => $string
159    
160     The request body, usually empty. Will be-sent as-is (future versions of
161     this module might offer more options).
162    
163 root 1.10 =item cookie_jar => $hash_ref
164    
165     Passing this parameter enables (simplified) cookie-processing, loosely
166     based on the original netscape specification.
167    
168     The C<$hash_ref> must be an (initially empty) hash reference which will
169     get updated automatically. It is possible to save the cookie_jar to
170     persistent storage with something like JSON or Storable, but this is not
171 root 1.40 recommended, as expiry times are currently being ignored.
172 root 1.10
173     Note that this cookie implementation is not of very high quality, nor
174     meant to be complete. If you want complete cookie management you have to
175     do that on your own. C<cookie_jar> is meant as a quick fix to get some
176     cookie-using sites working. Cookies are a privacy disaster, do not use
177     them unless required to.
178    
179 root 1.40 =item tls_ctx => $scheme | $tls_ctx
180    
181     Specifies the AnyEvent::TLS context to be used for https connections. This
182     parameter follows the same rules as the C<tls_ctx> parameter to
183     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
184     C<high> can be specified, which give you a predefined low-security (no
185     verification, highest compatibility) and high-security (CA and common-name
186     verification) TLS context.
187    
188     The default for this option is C<low>, which could be interpreted as "give
189     me the page, no matter what".
190    
191 root 1.42 =item on_header => $callback->($headers)
192 root 1.41
193     When specified, this callback will be called with the header hash as soon
194     as headers have been successfully received from the remote server (not on
195     locally-generated errors).
196    
197     It has to return either true (in which case AnyEvent::HTTP will continue),
198     or false, in which case AnyEvent::HTTP will cancel the download (and call
199     the finish callback with an error code of C<598>).
200    
201     This callback is useful, among other things, to quickly reject unwanted
202     content, which, if it is supposed to be rare, can be faster than first
203     doing a C<HEAD> request.
204    
205 root 1.42 Example: cancel the request unless the content-type is "text/html".
206 root 1.41
207 root 1.42 on_header => sub {
208     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
209     },
210 root 1.41
211 root 1.42 =item on_body => $callback->($partial_body, $headers)
212 root 1.41
213 root 1.42 When specified, all body data will be passed to this callback instead of
214     to the completion callback. The completion callback will get the empty
215     string instead of the body data.
216 root 1.41
217 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
218     or false, in which case AnyEvent::HTTP will cancel the download (and call
219     the completion callback with an error code of C<598>).
220    
221     This callback is useful when the data is too large to be held in memory
222     (so the callback writes it to a file) or when only some information should
223     be extracted, or when the body should be processed incrementally.
224 root 1.41
225     It is usually preferred over doing your own body handling via
226     C<want_body_handle>.
227    
228     =item want_body_handle => $enable
229    
230     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
231     changes considerably: after parsing the headers, and instead of
232     downloading the body (if any), the completion callback will be
233     called. Instead of the C<$body> argument containing the body data, the
234     callback will receive the L<AnyEvent::Handle> object associated with the
235     connection. In error cases, C<undef> will be passed. When there is no body
236     (e.g. status C<304>), the empty string will be passed.
237    
238     The handle object might or might not be in TLS mode, might be connected to
239     a proxy, be a persistent connection etc., and configured in unspecified
240     ways. The user is responsible for this handle (it will not be used by this
241     module anymore).
242    
243     This is useful with some push-type services, where, after the initial
244     headers, an interactive protocol is used (typical example would be the
245     push-style twitter API which starts a JSON/XML stream).
246    
247     If you think you need this, first have a look at C<on_body>, to see if
248     that doesn'T solve your problem in a better way.
249    
250 root 1.1 =back
251    
252 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
253    
254     http_request GET => "http://www.nethype.de/", sub {
255     my ($body, $hdr) = @_;
256     print "$body\n";
257     };
258    
259     Example: make a HTTP HEAD request on https://www.google.com/, use a
260     timeout of 30 seconds.
261    
262     http_request
263     GET => "https://www.google.com",
264     timeout => 30,
265     sub {
266     my ($body, $hdr) = @_;
267     use Data::Dumper;
268     print Dumper $hdr;
269     }
270     ;
271 root 1.1
272 root 1.29 Example: make another simple HTTP GET request, but immediately try to
273     cancel it.
274    
275     my $request = http_request GET => "http://www.nethype.de/", sub {
276     my ($body, $hdr) = @_;
277     print "$body\n";
278     };
279    
280     undef $request;
281    
282 root 1.1 =cut
283    
284 root 1.12 sub _slot_schedule;
285 root 1.11 sub _slot_schedule($) {
286     my $host = shift;
287    
288     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
289     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
290 root 1.12 # somebody wants that slot
291 root 1.11 ++$CO_SLOT{$host}[0];
292 root 1.14 ++$ACTIVE;
293 root 1.11
294     $cb->(AnyEvent::Util::guard {
295 root 1.14 --$ACTIVE;
296 root 1.11 --$CO_SLOT{$host}[0];
297     _slot_schedule $host;
298     });
299     } else {
300     # nobody wants the slot, maybe we can forget about it
301     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
302     last;
303     }
304     }
305     }
306    
307     # wait for a free slot on host, call callback
308     sub _get_slot($$) {
309     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
310    
311     _slot_schedule $_[0];
312     }
313    
314 root 1.34 our $qr_nl = qr<\015?\012>;
315     our $qr_nlnl = qr<\015?\012\015?\012>;
316    
317 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
318     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
319 root 1.40
320 elmex 1.15 sub http_request($$@) {
321 root 1.1 my $cb = pop;
322     my ($method, $url, %arg) = @_;
323    
324     my %hdr;
325    
326 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
327     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
328    
329 root 1.3 $method = uc $method;
330    
331 root 1.8 if (my $hdr = $arg{headers}) {
332 root 1.1 while (my ($k, $v) = each %$hdr) {
333     $hdr{lc $k} = $v;
334     }
335     }
336    
337 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
338 root 1.8
339 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
340 root 1.8 if $recurse < 0;
341    
342 root 1.2 my $proxy = $arg{proxy} || $PROXY;
343 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
344    
345 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
346 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
347 root 1.2
348 root 1.31 $uscheme = lc $uscheme;
349 root 1.1
350 root 1.31 my $uport = $uscheme eq "http" ? 80
351     : $uscheme eq "https" ? 443
352 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
353 root 1.13
354 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
355 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
356 root 1.10
357     my $uhost = $1;
358     $uport = $2 if defined $2;
359    
360 root 1.43 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost";
361    
362 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
363     $upath .= "?$query" if length $query;
364    
365     $upath =~ s%^/?%/%;
366    
367     # cookie processing
368     if (my $jar = $arg{cookie_jar}) {
369 root 1.31 %$jar = () if $jar->{version} != 1;
370 root 1.10
371     my @cookie;
372    
373     while (my ($chost, $v) = each %$jar) {
374 root 1.30 if ($chost =~ /^\./) {
375     next unless $chost eq substr $uhost, -length $chost;
376     } elsif ($chost =~ /\./) {
377     next unless $chost eq $uhost;
378     } else {
379     next;
380     }
381 root 1.10
382     while (my ($cpath, $v) = each %$v) {
383     next unless $cpath eq substr $upath, 0, length $cpath;
384    
385     while (my ($k, $v) = each %$v) {
386 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
387     my $value = $v->{value};
388     $value =~ s/([\\"])/\\$1/g;
389     push @cookie, "$k=\"$value\"";
390 root 1.10 }
391     }
392     }
393    
394     $hdr{cookie} = join "; ", @cookie
395     if @cookie;
396     }
397 root 1.1
398 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
399 root 1.2
400 root 1.10 if ($proxy) {
401 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
402 root 1.31
403     # don't support https requests over https-proxy transport,
404 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
405 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
406 root 1.10 } else {
407 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
408 root 1.2 }
409    
410 root 1.41 $hdr{"user-agent"} ||= $USERAGENT;
411     $hdr{referer} ||= "$uscheme://$uauthority$upath"; # leave out fragment and query string, just a heuristic
412    
413 root 1.10 $hdr{"content-length"} = length $arg{body};
414 root 1.1
415 root 1.11 my %state = (connect_guard => 1);
416    
417     _get_slot $uhost, sub {
418     $state{slot_guard} = shift;
419 root 1.1
420 root 1.11 return unless $state{connect_guard};
421 root 1.1
422 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
423     $state{fh} = shift
424 root 1.44 or do {
425     my $err = "$!";
426     %state = ();
427     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
428     };
429    
430 root 1.34 pop; # free memory, save a tree
431 root 1.11
432 root 1.34 return unless delete $state{connect_guard};
433 root 1.11
434     # get handle
435     $state{handle} = new AnyEvent::Handle
436 root 1.40 fh => $state{fh},
437     timeout => $timeout,
438     peername => $rhost,
439     tls_ctx => $arg{tls_ctx};
440 root 1.11
441     # limit the number of persistent connections
442 root 1.34 # keepalive not yet supported
443 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
444     ++$KA_COUNT{$_[1]};
445 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
446     --$KA_COUNT{$_[1]}
447     };
448 root 1.11 $hdr{connection} = "keep-alive";
449     } else {
450     delete $hdr{connection};
451     }
452 root 1.1
453 root 1.11 # (re-)configure handle
454     $state{handle}->on_error (sub {
455     %state = ();
456 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
457 root 1.11 });
458     $state{handle}->on_eof (sub {
459     %state = ();
460 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
461 root 1.11 });
462 root 1.1
463 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
464    
465     # handle actual, non-tunneled, request
466     my $handle_actual_request = sub {
467 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
468 root 1.31
469     # send request
470     $state{handle}->push_write (
471     "$method $rpath HTTP/1.0\015\012"
472     . (join "", map "\u$_: $hdr{$_}\015\012", keys %hdr)
473     . "\015\012"
474     . (delete $arg{body})
475 root 1.11 );
476    
477 root 1.31 %hdr = (); # reduce memory usage, save a kitten
478 root 1.10
479 root 1.31 # status line
480 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
481 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
482 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
483 root 1.31
484     my %hdr = ( # response headers
485     HTTPVersion => ",$1",
486     Status => ",$2",
487     Reason => ",$3",
488     URL => ",$url"
489     );
490    
491     # headers, could be optimized a bit
492 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
493 root 1.31 for ("$_[1]\012") {
494     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
495    
496 root 1.40 # things seen, not parsed:
497     # p3pP="NON CUR OTPi OUR NOR UNI"
498    
499 root 1.31 $hdr{lc $1} .= ",$2"
500     while /\G
501 root 1.43 ([^:\000-\037]*):
502 root 1.31 [\011\040]*
503     ((?: [^\012]+ | \012[\011\040] )*)
504     \012
505     /gxc;
506    
507     /\G$/
508 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
509 root 1.31 }
510    
511     substr $_, 0, 1, ""
512     for values %hdr;
513    
514 root 1.41 # redirect handling
515     # microsoft and other shitheads don't give a shit for following standards,
516     # try to support some common forms of broken Location headers.
517     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
518     $hdr{location} =~ s/^\.\/+//;
519    
520     my $url = "$rscheme://$uhost:$uport";
521    
522     unless ($hdr{location} =~ s/^\///) {
523     $url .= $upath;
524     $url =~ s/\/[^\/]*$//;
525     }
526    
527     $hdr{location} = "$url/$hdr{location}";
528     }
529    
530     my $redirect;
531    
532     if ($recurse) {
533     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
534     # apparently, mozilla et al. just change POST to GET here
535     # more research is needed before we do the same
536     $redirect = 1;
537     } elsif ($hdr{Status} == 303) {
538     # even http/1.1 is unclear on how to mutate the method
539     $method = "GET" unless $method eq "HEAD";
540     $redirect = 1;
541     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
542     $redirect = 1;
543     }
544     }
545    
546 root 1.31 my $finish = sub {
547 root 1.41 $state{handle}->destroy if $state{handle};
548 root 1.31 %state = ();
549    
550     # set-cookie processing
551     if ($arg{cookie_jar}) {
552 root 1.41 for ($_[1]{"set-cookie"}) {
553 root 1.31 # parse NAME=VALUE
554     my @kv;
555    
556     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
557     my $name = $1;
558     my $value = $3;
559    
560     unless ($value) {
561     $value = $2;
562     $value =~ s/\\(.)/$1/gs;
563     }
564    
565     push @kv, $name => $value;
566    
567     last unless /\G\s*;/gc;
568     }
569    
570     last unless @kv;
571 root 1.10
572 root 1.31 my $name = shift @kv;
573     my %kv = (value => shift @kv, @kv);
574 root 1.11
575 root 1.31 my $cdom;
576     my $cpath = (delete $kv{path}) || "/";
577 root 1.10
578 root 1.31 if (exists $kv{domain}) {
579     $cdom = delete $kv{domain};
580    
581     $cdom =~ s/^\.?/./; # make sure it starts with a "."
582 root 1.11
583 root 1.31 next if $cdom =~ /\.$/;
584    
585     # this is not rfc-like and not netscape-like. go figure.
586     my $ndots = $cdom =~ y/.//;
587     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
588     } else {
589     $cdom = $uhost;
590     }
591 root 1.30
592 root 1.31 # store it
593     $arg{cookie_jar}{version} = 1;
594     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
595    
596     redo if /\G\s*,/gc;
597 root 1.30 }
598 root 1.11 }
599 root 1.8
600 root 1.41 if ($redirect) {
601     # we ignore any errors, as it is very common to receive
602     # Content-Length != 0 but no actual body
603     # we also access %hdr, as $_[1] might be an erro
604     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
605 root 1.31 } else {
606     $cb->($_[0], $_[1]);
607     }
608     };
609 root 1.24
610 root 1.41 my $len = $hdr{"content-length"};
611    
612     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
613     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
614     } elsif (
615 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
616 root 1.41 or $method eq "HEAD"
617     or (defined $len && !$len)
618     ) {
619     # no body
620     $finish->("", \%hdr);
621 root 1.11 } else {
622 root 1.41 # body handling, four different code paths
623     # for want_body_handle, on_body (2x), normal (2x)
624     # we might read too much here, but it does not matter yet (no pers. connections)
625     if (!$redirect && $arg{want_body_handle}) {
626     $_[0]->on_eof (undef);
627     $_[0]->on_error (undef);
628     $_[0]->on_read (undef);
629    
630     $finish->(delete $state{handle}, \%hdr);
631 root 1.31
632 root 1.41 } elsif ($arg{on_body}) {
633     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
634     if ($len) {
635     $_[0]->on_eof (undef);
636     $_[0]->on_read (sub {
637     $len -= length $_[0]{rbuf};
638    
639     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
640     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
641    
642     $len > 0
643     or $finish->("", \%hdr);
644     });
645     } else {
646     $_[0]->on_eof (sub {
647     $finish->("", \%hdr);
648     });
649     $_[0]->on_read (sub {
650     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
651     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
652     });
653     }
654 root 1.31 } else {
655     $_[0]->on_eof (undef);
656 root 1.41
657     if ($len) {
658     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
659     $_[0]->on_read (sub {
660     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
661     if $len <= length $_[0]{rbuf};
662     });
663     } else {
664     $_[0]->on_error (sub {
665 root 1.43 $! == Errno::EPIPE || !$!
666 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
667     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
668     });
669     $_[0]->on_read (sub { });
670     }
671 root 1.31 }
672 root 1.11 }
673 root 1.31 });
674     });
675     };
676 root 1.3
677 root 1.31 # now handle proxy-CONNECT method
678     if ($proxy && $uscheme eq "https") {
679     # oh dear, we have to wrap it into a connect request
680    
681     # maybe re-use $uauthority with patched port?
682     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
683 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
684 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
685 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
686 root 1.31
687     if ($2 == 200) {
688     $rpath = $upath;
689     &$handle_actual_request;
690 root 1.3 } else {
691 root 1.31 %state = ();
692     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
693 root 1.3 }
694 root 1.11 });
695 root 1.31 } else {
696     &$handle_actual_request;
697     }
698    
699 root 1.11 }, sub {
700     $timeout
701     };
702 root 1.1 };
703    
704     defined wantarray && AnyEvent::Util::guard { %state = () }
705     }
706    
707 elmex 1.15 sub http_get($@) {
708 root 1.1 unshift @_, "GET";
709     &http_request
710     }
711    
712 elmex 1.15 sub http_head($@) {
713 root 1.4 unshift @_, "HEAD";
714     &http_request
715     }
716    
717 elmex 1.15 sub http_post($$@) {
718 root 1.22 my $url = shift;
719     unshift @_, "POST", $url, "body";
720 root 1.3 &http_request
721     }
722    
723 root 1.9 =back
724    
725 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
726 root 1.1
727     =over 4
728    
729 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
730    
731     Sets the default proxy server to use. The proxy-url must begin with a
732     string of the form C<http://host:port> (optionally C<https:...>).
733    
734 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
735 root 1.1
736 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
737 root 1.1
738     =item $AnyEvent::HTTP::USERAGENT
739    
740     The default value for the C<User-Agent> header (the default is
741 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
742 root 1.1
743 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
744 root 1.1
745 root 1.43 The maximum number of concurrent conenctions to the same host (identified
746     by the hostname). If the limit is exceeded, then the additional requests
747     are queued until previous connections are closed.
748 root 1.1
749 root 1.43 The default value for this is C<4>, and it is highly advisable to not
750     increase it.
751 root 1.3
752 root 1.14 =item $AnyEvent::HTTP::ACTIVE
753    
754     The number of active connections. This is not the number of currently
755     running requests, but the number of currently open and non-idle TCP
756     connections. This number of can be useful for load-leveling.
757    
758 root 1.1 =back
759    
760     =cut
761    
762 root 1.2 sub set_proxy($) {
763     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
764     }
765    
766     # initialise proxy from environment
767     set_proxy $ENV{http_proxy};
768    
769 root 1.1 =head1 SEE ALSO
770    
771     L<AnyEvent>.
772    
773     =head1 AUTHOR
774    
775 root 1.18 Marc Lehmann <schmorp@schmorp.de>
776     http://home.schmorp.de/
777 root 1.1
778 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
779     testcases and bugreports.
780    
781 root 1.1 =cut
782    
783     1
784