ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.43
Committed: Mon Jul 6 03:03:12 2009 UTC (14 years, 10 months ago) by root
Branch: MAIN
Changes since 1.42: +11 -13 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.40 use AnyEvent 4.452 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.39 our $VERSION = '1.12';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143     and will provide defaults for C<User-Agent:> and C<Referer:>.
144 root 1.1
145     =item timeout => $seconds
146    
147     The time-out to use for various stages - each connect attempt will reset
148 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
149    
150     =item proxy => [$host, $port[, $scheme]] or undef
151    
152     Use the given http proxy for all requests. If not specified, then the
153     default proxy (as specified by C<$ENV{http_proxy}>) is used.
154    
155     C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
156     HTTPS.
157 root 1.1
158 root 1.3 =item body => $string
159    
160     The request body, usually empty. Will be-sent as-is (future versions of
161     this module might offer more options).
162    
163 root 1.10 =item cookie_jar => $hash_ref
164    
165     Passing this parameter enables (simplified) cookie-processing, loosely
166     based on the original netscape specification.
167    
168     The C<$hash_ref> must be an (initially empty) hash reference which will
169     get updated automatically. It is possible to save the cookie_jar to
170     persistent storage with something like JSON or Storable, but this is not
171 root 1.40 recommended, as expiry times are currently being ignored.
172 root 1.10
173     Note that this cookie implementation is not of very high quality, nor
174     meant to be complete. If you want complete cookie management you have to
175     do that on your own. C<cookie_jar> is meant as a quick fix to get some
176     cookie-using sites working. Cookies are a privacy disaster, do not use
177     them unless required to.
178    
179 root 1.40 =item tls_ctx => $scheme | $tls_ctx
180    
181     Specifies the AnyEvent::TLS context to be used for https connections. This
182     parameter follows the same rules as the C<tls_ctx> parameter to
183     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
184     C<high> can be specified, which give you a predefined low-security (no
185     verification, highest compatibility) and high-security (CA and common-name
186     verification) TLS context.
187    
188     The default for this option is C<low>, which could be interpreted as "give
189     me the page, no matter what".
190    
191 root 1.42 =item on_header => $callback->($headers)
192 root 1.41
193     When specified, this callback will be called with the header hash as soon
194     as headers have been successfully received from the remote server (not on
195     locally-generated errors).
196    
197     It has to return either true (in which case AnyEvent::HTTP will continue),
198     or false, in which case AnyEvent::HTTP will cancel the download (and call
199     the finish callback with an error code of C<598>).
200    
201     This callback is useful, among other things, to quickly reject unwanted
202     content, which, if it is supposed to be rare, can be faster than first
203     doing a C<HEAD> request.
204    
205 root 1.42 Example: cancel the request unless the content-type is "text/html".
206 root 1.41
207 root 1.42 on_header => sub {
208     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
209     },
210 root 1.41
211 root 1.42 =item on_body => $callback->($partial_body, $headers)
212 root 1.41
213 root 1.42 When specified, all body data will be passed to this callback instead of
214     to the completion callback. The completion callback will get the empty
215     string instead of the body data.
216 root 1.41
217 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
218     or false, in which case AnyEvent::HTTP will cancel the download (and call
219     the completion callback with an error code of C<598>).
220    
221     This callback is useful when the data is too large to be held in memory
222     (so the callback writes it to a file) or when only some information should
223     be extracted, or when the body should be processed incrementally.
224 root 1.41
225     It is usually preferred over doing your own body handling via
226     C<want_body_handle>.
227    
228     =item want_body_handle => $enable
229    
230     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
231     changes considerably: after parsing the headers, and instead of
232     downloading the body (if any), the completion callback will be
233     called. Instead of the C<$body> argument containing the body data, the
234     callback will receive the L<AnyEvent::Handle> object associated with the
235     connection. In error cases, C<undef> will be passed. When there is no body
236     (e.g. status C<304>), the empty string will be passed.
237    
238     The handle object might or might not be in TLS mode, might be connected to
239     a proxy, be a persistent connection etc., and configured in unspecified
240     ways. The user is responsible for this handle (it will not be used by this
241     module anymore).
242    
243     This is useful with some push-type services, where, after the initial
244     headers, an interactive protocol is used (typical example would be the
245     push-style twitter API which starts a JSON/XML stream).
246    
247     If you think you need this, first have a look at C<on_body>, to see if
248     that doesn'T solve your problem in a better way.
249    
250 root 1.1 =back
251    
252 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
253    
254     http_request GET => "http://www.nethype.de/", sub {
255     my ($body, $hdr) = @_;
256     print "$body\n";
257     };
258    
259     Example: make a HTTP HEAD request on https://www.google.com/, use a
260     timeout of 30 seconds.
261    
262     http_request
263     GET => "https://www.google.com",
264     timeout => 30,
265     sub {
266     my ($body, $hdr) = @_;
267     use Data::Dumper;
268     print Dumper $hdr;
269     }
270     ;
271 root 1.1
272 root 1.29 Example: make another simple HTTP GET request, but immediately try to
273     cancel it.
274    
275     my $request = http_request GET => "http://www.nethype.de/", sub {
276     my ($body, $hdr) = @_;
277     print "$body\n";
278     };
279    
280     undef $request;
281    
282 root 1.1 =cut
283    
284 root 1.12 sub _slot_schedule;
285 root 1.11 sub _slot_schedule($) {
286     my $host = shift;
287    
288     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
289     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
290 root 1.12 # somebody wants that slot
291 root 1.11 ++$CO_SLOT{$host}[0];
292 root 1.14 ++$ACTIVE;
293 root 1.11
294     $cb->(AnyEvent::Util::guard {
295 root 1.14 --$ACTIVE;
296 root 1.11 --$CO_SLOT{$host}[0];
297     _slot_schedule $host;
298     });
299     } else {
300     # nobody wants the slot, maybe we can forget about it
301     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
302     last;
303     }
304     }
305     }
306    
307     # wait for a free slot on host, call callback
308     sub _get_slot($$) {
309     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
310    
311     _slot_schedule $_[0];
312     }
313    
314 root 1.34 our $qr_nl = qr<\015?\012>;
315     our $qr_nlnl = qr<\015?\012\015?\012>;
316    
317 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
318     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
319 root 1.40
320 elmex 1.15 sub http_request($$@) {
321 root 1.1 my $cb = pop;
322     my ($method, $url, %arg) = @_;
323    
324     my %hdr;
325    
326 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
327     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
328    
329 root 1.3 $method = uc $method;
330    
331 root 1.8 if (my $hdr = $arg{headers}) {
332 root 1.1 while (my ($k, $v) = each %$hdr) {
333     $hdr{lc $k} = $v;
334     }
335     }
336    
337 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
338 root 1.8
339 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
340 root 1.8 if $recurse < 0;
341    
342 root 1.2 my $proxy = $arg{proxy} || $PROXY;
343 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
344    
345 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
346 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
347 root 1.2
348 root 1.31 $uscheme = lc $uscheme;
349 root 1.1
350 root 1.31 my $uport = $uscheme eq "http" ? 80
351     : $uscheme eq "https" ? 443
352 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
353 root 1.13
354 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
355 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
356 root 1.10
357     my $uhost = $1;
358     $uport = $2 if defined $2;
359    
360 root 1.43 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost";
361    
362 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
363     $upath .= "?$query" if length $query;
364    
365     $upath =~ s%^/?%/%;
366    
367     # cookie processing
368     if (my $jar = $arg{cookie_jar}) {
369 root 1.31 %$jar = () if $jar->{version} != 1;
370 root 1.10
371     my @cookie;
372    
373     while (my ($chost, $v) = each %$jar) {
374 root 1.30 if ($chost =~ /^\./) {
375     next unless $chost eq substr $uhost, -length $chost;
376     } elsif ($chost =~ /\./) {
377     next unless $chost eq $uhost;
378     } else {
379     next;
380     }
381 root 1.10
382     while (my ($cpath, $v) = each %$v) {
383     next unless $cpath eq substr $upath, 0, length $cpath;
384    
385     while (my ($k, $v) = each %$v) {
386 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
387     my $value = $v->{value};
388     $value =~ s/([\\"])/\\$1/g;
389     push @cookie, "$k=\"$value\"";
390 root 1.10 }
391     }
392     }
393    
394     $hdr{cookie} = join "; ", @cookie
395     if @cookie;
396     }
397 root 1.1
398 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
399 root 1.2
400 root 1.10 if ($proxy) {
401 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
402 root 1.31
403     # don't support https requests over https-proxy transport,
404 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
405 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
406 root 1.10 } else {
407 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
408 root 1.2 }
409    
410 root 1.41 $hdr{"user-agent"} ||= $USERAGENT;
411     $hdr{referer} ||= "$uscheme://$uauthority$upath"; # leave out fragment and query string, just a heuristic
412    
413 root 1.10 $hdr{"content-length"} = length $arg{body};
414 root 1.1
415 root 1.11 my %state = (connect_guard => 1);
416    
417     _get_slot $uhost, sub {
418     $state{slot_guard} = shift;
419 root 1.1
420 root 1.11 return unless $state{connect_guard};
421 root 1.1
422 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
423     $state{fh} = shift
424 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "$!", URL => $url }));
425 root 1.34 pop; # free memory, save a tree
426 root 1.11
427 root 1.34 return unless delete $state{connect_guard};
428 root 1.11
429     # get handle
430     $state{handle} = new AnyEvent::Handle
431 root 1.40 fh => $state{fh},
432     timeout => $timeout,
433     peername => $rhost,
434     tls_ctx => $arg{tls_ctx};
435 root 1.11
436     # limit the number of persistent connections
437 root 1.34 # keepalive not yet supported
438 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
439     ++$KA_COUNT{$_[1]};
440 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
441     --$KA_COUNT{$_[1]}
442     };
443 root 1.11 $hdr{connection} = "keep-alive";
444     } else {
445     delete $hdr{connection};
446     }
447 root 1.1
448 root 1.11 # (re-)configure handle
449     $state{handle}->on_error (sub {
450     %state = ();
451 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
452 root 1.11 });
453     $state{handle}->on_eof (sub {
454     %state = ();
455 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
456 root 1.11 });
457 root 1.1
458 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
459    
460     # handle actual, non-tunneled, request
461     my $handle_actual_request = sub {
462 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
463 root 1.31
464     # send request
465     $state{handle}->push_write (
466     "$method $rpath HTTP/1.0\015\012"
467     . (join "", map "\u$_: $hdr{$_}\015\012", keys %hdr)
468     . "\015\012"
469     . (delete $arg{body})
470 root 1.11 );
471    
472 root 1.31 %hdr = (); # reduce memory usage, save a kitten
473 root 1.10
474 root 1.31 # status line
475 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
476 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
477 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
478 root 1.31
479     my %hdr = ( # response headers
480     HTTPVersion => ",$1",
481     Status => ",$2",
482     Reason => ",$3",
483     URL => ",$url"
484     );
485    
486     # headers, could be optimized a bit
487 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
488 root 1.31 for ("$_[1]\012") {
489     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
490    
491 root 1.40 # things seen, not parsed:
492     # p3pP="NON CUR OTPi OUR NOR UNI"
493    
494 root 1.31 $hdr{lc $1} .= ",$2"
495     while /\G
496 root 1.43 ([^:\000-\037]*):
497 root 1.31 [\011\040]*
498     ((?: [^\012]+ | \012[\011\040] )*)
499     \012
500     /gxc;
501    
502     /\G$/
503 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
504 root 1.31 }
505    
506     substr $_, 0, 1, ""
507     for values %hdr;
508    
509 root 1.41 # redirect handling
510     # microsoft and other shitheads don't give a shit for following standards,
511     # try to support some common forms of broken Location headers.
512     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
513     $hdr{location} =~ s/^\.\/+//;
514    
515     my $url = "$rscheme://$uhost:$uport";
516    
517     unless ($hdr{location} =~ s/^\///) {
518     $url .= $upath;
519     $url =~ s/\/[^\/]*$//;
520     }
521    
522     $hdr{location} = "$url/$hdr{location}";
523     }
524    
525     my $redirect;
526    
527     if ($recurse) {
528     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
529     # apparently, mozilla et al. just change POST to GET here
530     # more research is needed before we do the same
531     $redirect = 1;
532     } elsif ($hdr{Status} == 303) {
533     # even http/1.1 is unclear on how to mutate the method
534     $method = "GET" unless $method eq "HEAD";
535     $redirect = 1;
536     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
537     $redirect = 1;
538     }
539     }
540    
541 root 1.31 my $finish = sub {
542 root 1.41 $state{handle}->destroy if $state{handle};
543 root 1.31 %state = ();
544    
545     # set-cookie processing
546     if ($arg{cookie_jar}) {
547 root 1.41 for ($_[1]{"set-cookie"}) {
548 root 1.31 # parse NAME=VALUE
549     my @kv;
550    
551     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
552     my $name = $1;
553     my $value = $3;
554    
555     unless ($value) {
556     $value = $2;
557     $value =~ s/\\(.)/$1/gs;
558     }
559    
560     push @kv, $name => $value;
561    
562     last unless /\G\s*;/gc;
563     }
564    
565     last unless @kv;
566 root 1.10
567 root 1.31 my $name = shift @kv;
568     my %kv = (value => shift @kv, @kv);
569 root 1.11
570 root 1.31 my $cdom;
571     my $cpath = (delete $kv{path}) || "/";
572 root 1.10
573 root 1.31 if (exists $kv{domain}) {
574     $cdom = delete $kv{domain};
575    
576     $cdom =~ s/^\.?/./; # make sure it starts with a "."
577 root 1.11
578 root 1.31 next if $cdom =~ /\.$/;
579    
580     # this is not rfc-like and not netscape-like. go figure.
581     my $ndots = $cdom =~ y/.//;
582     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
583     } else {
584     $cdom = $uhost;
585     }
586 root 1.30
587 root 1.31 # store it
588     $arg{cookie_jar}{version} = 1;
589     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
590    
591     redo if /\G\s*,/gc;
592 root 1.30 }
593 root 1.11 }
594 root 1.8
595 root 1.41 if ($redirect) {
596     # we ignore any errors, as it is very common to receive
597     # Content-Length != 0 but no actual body
598     # we also access %hdr, as $_[1] might be an erro
599     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
600 root 1.31 } else {
601     $cb->($_[0], $_[1]);
602     }
603     };
604 root 1.24
605 root 1.41 my $len = $hdr{"content-length"};
606    
607     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
608     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
609     } elsif (
610 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
611 root 1.41 or $method eq "HEAD"
612     or (defined $len && !$len)
613     ) {
614     # no body
615     $finish->("", \%hdr);
616 root 1.11 } else {
617 root 1.41 # body handling, four different code paths
618     # for want_body_handle, on_body (2x), normal (2x)
619     # we might read too much here, but it does not matter yet (no pers. connections)
620     if (!$redirect && $arg{want_body_handle}) {
621     $_[0]->on_eof (undef);
622     $_[0]->on_error (undef);
623     $_[0]->on_read (undef);
624    
625     $finish->(delete $state{handle}, \%hdr);
626 root 1.31
627 root 1.41 } elsif ($arg{on_body}) {
628     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
629     if ($len) {
630     $_[0]->on_eof (undef);
631     $_[0]->on_read (sub {
632     $len -= length $_[0]{rbuf};
633    
634     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
635     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
636    
637     $len > 0
638     or $finish->("", \%hdr);
639     });
640     } else {
641     $_[0]->on_eof (sub {
642     $finish->("", \%hdr);
643     });
644     $_[0]->on_read (sub {
645     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
646     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
647     });
648     }
649 root 1.31 } else {
650     $_[0]->on_eof (undef);
651 root 1.41
652     if ($len) {
653     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
654     $_[0]->on_read (sub {
655     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
656     if $len <= length $_[0]{rbuf};
657     });
658     } else {
659     $_[0]->on_error (sub {
660 root 1.43 $! == Errno::EPIPE || !$!
661 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
662     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
663     });
664     $_[0]->on_read (sub { });
665     }
666 root 1.31 }
667 root 1.11 }
668 root 1.31 });
669     });
670     };
671 root 1.3
672 root 1.31 # now handle proxy-CONNECT method
673     if ($proxy && $uscheme eq "https") {
674     # oh dear, we have to wrap it into a connect request
675    
676     # maybe re-use $uauthority with patched port?
677     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
678 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
679 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
680 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
681 root 1.31
682     if ($2 == 200) {
683     $rpath = $upath;
684     &$handle_actual_request;
685 root 1.3 } else {
686 root 1.31 %state = ();
687     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
688 root 1.3 }
689 root 1.11 });
690 root 1.31 } else {
691     &$handle_actual_request;
692     }
693    
694 root 1.11 }, sub {
695     $timeout
696     };
697 root 1.1 };
698    
699     defined wantarray && AnyEvent::Util::guard { %state = () }
700     }
701    
702 elmex 1.15 sub http_get($@) {
703 root 1.1 unshift @_, "GET";
704     &http_request
705     }
706    
707 elmex 1.15 sub http_head($@) {
708 root 1.4 unshift @_, "HEAD";
709     &http_request
710     }
711    
712 elmex 1.15 sub http_post($$@) {
713 root 1.22 my $url = shift;
714     unshift @_, "POST", $url, "body";
715 root 1.3 &http_request
716     }
717    
718 root 1.9 =back
719    
720 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
721 root 1.1
722     =over 4
723    
724 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
725    
726     Sets the default proxy server to use. The proxy-url must begin with a
727     string of the form C<http://host:port> (optionally C<https:...>).
728    
729 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
730 root 1.1
731 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
732 root 1.1
733     =item $AnyEvent::HTTP::USERAGENT
734    
735     The default value for the C<User-Agent> header (the default is
736 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
737 root 1.1
738 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
739 root 1.1
740 root 1.43 The maximum number of concurrent conenctions to the same host (identified
741     by the hostname). If the limit is exceeded, then the additional requests
742     are queued until previous connections are closed.
743 root 1.1
744 root 1.43 The default value for this is C<4>, and it is highly advisable to not
745     increase it.
746 root 1.3
747 root 1.14 =item $AnyEvent::HTTP::ACTIVE
748    
749     The number of active connections. This is not the number of currently
750     running requests, but the number of currently open and non-idle TCP
751     connections. This number of can be useful for load-leveling.
752    
753 root 1.1 =back
754    
755     =cut
756    
757 root 1.2 sub set_proxy($) {
758     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
759     }
760    
761     # initialise proxy from environment
762     set_proxy $ENV{http_proxy};
763    
764 root 1.1 =head1 SEE ALSO
765    
766     L<AnyEvent>.
767    
768     =head1 AUTHOR
769    
770 root 1.18 Marc Lehmann <schmorp@schmorp.de>
771     http://home.schmorp.de/
772 root 1.1
773 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
774     testcases and bugreports.
775    
776 root 1.1 =cut
777    
778     1
779