ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.42
Committed: Mon Jul 6 00:08:16 2009 UTC (14 years, 10 months ago) by root
Branch: MAIN
Changes since 1.41: +20 -13 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.40 use AnyEvent 4.452 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.39 our $VERSION = '1.12';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64     our $MAX_PERSISTENT_PER_HOST = 2;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143     and will provide defaults for C<User-Agent:> and C<Referer:>.
144 root 1.1
145     =item timeout => $seconds
146    
147     The time-out to use for various stages - each connect attempt will reset
148 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
149    
150     =item proxy => [$host, $port[, $scheme]] or undef
151    
152     Use the given http proxy for all requests. If not specified, then the
153     default proxy (as specified by C<$ENV{http_proxy}>) is used.
154    
155     C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
156     HTTPS.
157 root 1.1
158 root 1.3 =item body => $string
159    
160     The request body, usually empty. Will be-sent as-is (future versions of
161     this module might offer more options).
162    
163 root 1.10 =item cookie_jar => $hash_ref
164    
165     Passing this parameter enables (simplified) cookie-processing, loosely
166     based on the original netscape specification.
167    
168     The C<$hash_ref> must be an (initially empty) hash reference which will
169     get updated automatically. It is possible to save the cookie_jar to
170     persistent storage with something like JSON or Storable, but this is not
171 root 1.40 recommended, as expiry times are currently being ignored.
172 root 1.10
173     Note that this cookie implementation is not of very high quality, nor
174     meant to be complete. If you want complete cookie management you have to
175     do that on your own. C<cookie_jar> is meant as a quick fix to get some
176     cookie-using sites working. Cookies are a privacy disaster, do not use
177     them unless required to.
178    
179 root 1.40 =item tls_ctx => $scheme | $tls_ctx
180    
181     Specifies the AnyEvent::TLS context to be used for https connections. This
182     parameter follows the same rules as the C<tls_ctx> parameter to
183     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
184     C<high> can be specified, which give you a predefined low-security (no
185     verification, highest compatibility) and high-security (CA and common-name
186     verification) TLS context.
187    
188     The default for this option is C<low>, which could be interpreted as "give
189     me the page, no matter what".
190    
191 root 1.42 =item on_header => $callback->($headers)
192 root 1.41
193     When specified, this callback will be called with the header hash as soon
194     as headers have been successfully received from the remote server (not on
195     locally-generated errors).
196    
197     It has to return either true (in which case AnyEvent::HTTP will continue),
198     or false, in which case AnyEvent::HTTP will cancel the download (and call
199     the finish callback with an error code of C<598>).
200    
201     This callback is useful, among other things, to quickly reject unwanted
202     content, which, if it is supposed to be rare, can be faster than first
203     doing a C<HEAD> request.
204    
205 root 1.42 Example: cancel the request unless the content-type is "text/html".
206 root 1.41
207 root 1.42 on_header => sub {
208     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
209     },
210 root 1.41
211 root 1.42 =item on_body => $callback->($partial_body, $headers)
212 root 1.41
213 root 1.42 When specified, all body data will be passed to this callback instead of
214     to the completion callback. The completion callback will get the empty
215     string instead of the body data.
216 root 1.41
217 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
218     or false, in which case AnyEvent::HTTP will cancel the download (and call
219     the completion callback with an error code of C<598>).
220    
221     This callback is useful when the data is too large to be held in memory
222     (so the callback writes it to a file) or when only some information should
223     be extracted, or when the body should be processed incrementally.
224 root 1.41
225     It is usually preferred over doing your own body handling via
226     C<want_body_handle>.
227    
228     =item want_body_handle => $enable
229    
230     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
231     changes considerably: after parsing the headers, and instead of
232     downloading the body (if any), the completion callback will be
233     called. Instead of the C<$body> argument containing the body data, the
234     callback will receive the L<AnyEvent::Handle> object associated with the
235     connection. In error cases, C<undef> will be passed. When there is no body
236     (e.g. status C<304>), the empty string will be passed.
237    
238     The handle object might or might not be in TLS mode, might be connected to
239     a proxy, be a persistent connection etc., and configured in unspecified
240     ways. The user is responsible for this handle (it will not be used by this
241     module anymore).
242    
243     This is useful with some push-type services, where, after the initial
244     headers, an interactive protocol is used (typical example would be the
245     push-style twitter API which starts a JSON/XML stream).
246    
247     If you think you need this, first have a look at C<on_body>, to see if
248     that doesn'T solve your problem in a better way.
249    
250 root 1.1 =back
251    
252 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
253    
254     http_request GET => "http://www.nethype.de/", sub {
255     my ($body, $hdr) = @_;
256     print "$body\n";
257     };
258    
259     Example: make a HTTP HEAD request on https://www.google.com/, use a
260     timeout of 30 seconds.
261    
262     http_request
263     GET => "https://www.google.com",
264     timeout => 30,
265     sub {
266     my ($body, $hdr) = @_;
267     use Data::Dumper;
268     print Dumper $hdr;
269     }
270     ;
271 root 1.1
272 root 1.29 Example: make another simple HTTP GET request, but immediately try to
273     cancel it.
274    
275     my $request = http_request GET => "http://www.nethype.de/", sub {
276     my ($body, $hdr) = @_;
277     print "$body\n";
278     };
279    
280     undef $request;
281    
282 root 1.1 =cut
283    
284 root 1.12 sub _slot_schedule;
285 root 1.11 sub _slot_schedule($) {
286     my $host = shift;
287    
288     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
289     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
290 root 1.12 # somebody wants that slot
291 root 1.11 ++$CO_SLOT{$host}[0];
292 root 1.14 ++$ACTIVE;
293 root 1.11
294     $cb->(AnyEvent::Util::guard {
295 root 1.14 --$ACTIVE;
296 root 1.11 --$CO_SLOT{$host}[0];
297     _slot_schedule $host;
298     });
299     } else {
300     # nobody wants the slot, maybe we can forget about it
301     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
302     last;
303     }
304     }
305     }
306    
307     # wait for a free slot on host, call callback
308     sub _get_slot($$) {
309     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
310    
311     _slot_schedule $_[0];
312     }
313    
314 root 1.34 our $qr_nl = qr<\015?\012>;
315     our $qr_nlnl = qr<\015?\012\015?\012>;
316    
317 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
318     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
319 root 1.40
320 elmex 1.15 sub http_request($$@) {
321 root 1.1 my $cb = pop;
322     my ($method, $url, %arg) = @_;
323    
324     my %hdr;
325    
326 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
327     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
328    
329 root 1.3 $method = uc $method;
330    
331 root 1.8 if (my $hdr = $arg{headers}) {
332 root 1.1 while (my ($k, $v) = each %$hdr) {
333     $hdr{lc $k} = $v;
334     }
335     }
336    
337 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
338 root 1.8
339 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
340 root 1.8 if $recurse < 0;
341    
342 root 1.2 my $proxy = $arg{proxy} || $PROXY;
343 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
344    
345 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
346 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
347 root 1.2
348 root 1.31 $uscheme = lc $uscheme;
349 root 1.1
350 root 1.31 my $uport = $uscheme eq "http" ? 80
351     : $uscheme eq "https" ? 443
352 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
353 root 1.13
354 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
355 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
356 root 1.10
357     my $uhost = $1;
358     $uport = $2 if defined $2;
359    
360     $uhost =~ s/^\[(.*)\]$/$1/;
361     $upath .= "?$query" if length $query;
362    
363     $upath =~ s%^/?%/%;
364    
365     # cookie processing
366     if (my $jar = $arg{cookie_jar}) {
367 root 1.31 %$jar = () if $jar->{version} != 1;
368 root 1.10
369     my @cookie;
370    
371     while (my ($chost, $v) = each %$jar) {
372 root 1.30 if ($chost =~ /^\./) {
373     next unless $chost eq substr $uhost, -length $chost;
374     } elsif ($chost =~ /\./) {
375     next unless $chost eq $uhost;
376     } else {
377     next;
378     }
379 root 1.10
380     while (my ($cpath, $v) = each %$v) {
381     next unless $cpath eq substr $upath, 0, length $cpath;
382    
383     while (my ($k, $v) = each %$v) {
384 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
385     my $value = $v->{value};
386     $value =~ s/([\\"])/\\$1/g;
387     push @cookie, "$k=\"$value\"";
388 root 1.10 }
389     }
390     }
391    
392     $hdr{cookie} = join "; ", @cookie
393     if @cookie;
394     }
395 root 1.1
396 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
397 root 1.2
398 root 1.10 if ($proxy) {
399 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
400 root 1.31
401     # don't support https requests over https-proxy transport,
402 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
403 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
404 root 1.10 } else {
405 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
406 root 1.2 }
407    
408 root 1.41 $hdr{"user-agent"} ||= $USERAGENT;
409     $hdr{referer} ||= "$uscheme://$uauthority$upath"; # leave out fragment and query string, just a heuristic
410    
411     $hdr{host} = "$uhost:$uport";
412 root 1.10 $hdr{"content-length"} = length $arg{body};
413 root 1.1
414 root 1.11 my %state = (connect_guard => 1);
415    
416     _get_slot $uhost, sub {
417     $state{slot_guard} = shift;
418 root 1.1
419 root 1.11 return unless $state{connect_guard};
420 root 1.1
421 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
422     $state{fh} = shift
423 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "$!", URL => $url }));
424 root 1.34 pop; # free memory, save a tree
425 root 1.11
426 root 1.34 return unless delete $state{connect_guard};
427 root 1.11
428     # get handle
429     $state{handle} = new AnyEvent::Handle
430 root 1.40 fh => $state{fh},
431     timeout => $timeout,
432     peername => $rhost,
433     tls_ctx => $arg{tls_ctx};
434 root 1.11
435     # limit the number of persistent connections
436 root 1.34 # keepalive not yet supported
437 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
438     ++$KA_COUNT{$_[1]};
439 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
440     --$KA_COUNT{$_[1]}
441     };
442 root 1.11 $hdr{connection} = "keep-alive";
443     } else {
444     delete $hdr{connection};
445     }
446 root 1.1
447 root 1.11 # (re-)configure handle
448     $state{handle}->on_error (sub {
449     %state = ();
450 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
451 root 1.11 });
452     $state{handle}->on_eof (sub {
453     %state = ();
454 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
455 root 1.11 });
456 root 1.1
457 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
458    
459     # handle actual, non-tunneled, request
460     my $handle_actual_request = sub {
461 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
462 root 1.31
463     # send request
464     $state{handle}->push_write (
465     "$method $rpath HTTP/1.0\015\012"
466     . (join "", map "\u$_: $hdr{$_}\015\012", keys %hdr)
467     . "\015\012"
468     . (delete $arg{body})
469 root 1.11 );
470    
471 root 1.31 %hdr = (); # reduce memory usage, save a kitten
472 root 1.10
473 root 1.31 # status line
474 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
475 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
476 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
477 root 1.31
478     my %hdr = ( # response headers
479     HTTPVersion => ",$1",
480     Status => ",$2",
481     Reason => ",$3",
482     URL => ",$url"
483     );
484    
485     # headers, could be optimized a bit
486 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
487 root 1.31 for ("$_[1]\012") {
488     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
489    
490 root 1.40 # things seen, not parsed:
491     # p3pP="NON CUR OTPi OUR NOR UNI"
492    
493 root 1.31 $hdr{lc $1} .= ",$2"
494     while /\G
495     ([^:\000-\037]+):
496     [\011\040]*
497     ((?: [^\012]+ | \012[\011\040] )*)
498     \012
499     /gxc;
500    
501     /\G$/
502 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
503 root 1.31 }
504    
505     substr $_, 0, 1, ""
506     for values %hdr;
507    
508 root 1.41 # redirect handling
509     # microsoft and other shitheads don't give a shit for following standards,
510     # try to support some common forms of broken Location headers.
511     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
512     $hdr{location} =~ s/^\.\/+//;
513    
514     my $url = "$rscheme://$uhost:$uport";
515    
516     unless ($hdr{location} =~ s/^\///) {
517     $url .= $upath;
518     $url =~ s/\/[^\/]*$//;
519     }
520    
521     $hdr{location} = "$url/$hdr{location}";
522     }
523    
524     my $redirect;
525    
526     if ($recurse) {
527     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
528     # apparently, mozilla et al. just change POST to GET here
529     # more research is needed before we do the same
530     $redirect = 1;
531     } elsif ($hdr{Status} == 303) {
532     # even http/1.1 is unclear on how to mutate the method
533     $method = "GET" unless $method eq "HEAD";
534     $redirect = 1;
535     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
536     $redirect = 1;
537     }
538     }
539    
540 root 1.31 my $finish = sub {
541 root 1.41 $state{handle}->destroy if $state{handle};
542 root 1.31 %state = ();
543    
544     # set-cookie processing
545     if ($arg{cookie_jar}) {
546 root 1.41 for ($_[1]{"set-cookie"}) {
547 root 1.31 # parse NAME=VALUE
548     my @kv;
549    
550     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
551     my $name = $1;
552     my $value = $3;
553    
554     unless ($value) {
555     $value = $2;
556     $value =~ s/\\(.)/$1/gs;
557     }
558    
559     push @kv, $name => $value;
560    
561     last unless /\G\s*;/gc;
562     }
563    
564     last unless @kv;
565 root 1.10
566 root 1.31 my $name = shift @kv;
567     my %kv = (value => shift @kv, @kv);
568 root 1.11
569 root 1.31 my $cdom;
570     my $cpath = (delete $kv{path}) || "/";
571 root 1.10
572 root 1.31 if (exists $kv{domain}) {
573     $cdom = delete $kv{domain};
574    
575     $cdom =~ s/^\.?/./; # make sure it starts with a "."
576 root 1.11
577 root 1.31 next if $cdom =~ /\.$/;
578    
579     # this is not rfc-like and not netscape-like. go figure.
580     my $ndots = $cdom =~ y/.//;
581     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
582     } else {
583     $cdom = $uhost;
584     }
585 root 1.30
586 root 1.31 # store it
587     $arg{cookie_jar}{version} = 1;
588     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
589    
590     redo if /\G\s*,/gc;
591 root 1.30 }
592 root 1.11 }
593 root 1.8
594 root 1.41 if ($redirect) {
595     # we ignore any errors, as it is very common to receive
596     # Content-Length != 0 but no actual body
597     # we also access %hdr, as $_[1] might be an erro
598     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
599 root 1.31 } else {
600     $cb->($_[0], $_[1]);
601     }
602     };
603 root 1.24
604 root 1.41 my $len = $hdr{"content-length"};
605    
606     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
607     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
608     } elsif (
609 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
610 root 1.41 or $method eq "HEAD"
611     or (defined $len && !$len)
612     ) {
613     # no body
614     $finish->("", \%hdr);
615 root 1.11 } else {
616 root 1.41 # body handling, four different code paths
617     # for want_body_handle, on_body (2x), normal (2x)
618     # we might read too much here, but it does not matter yet (no pers. connections)
619     if (!$redirect && $arg{want_body_handle}) {
620     $_[0]->on_eof (undef);
621     $_[0]->on_error (undef);
622     $_[0]->on_read (undef);
623    
624     $finish->(delete $state{handle}, \%hdr);
625 root 1.31
626 root 1.41 } elsif ($arg{on_body}) {
627     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
628     if ($len) {
629     $_[0]->on_eof (undef);
630     $_[0]->on_read (sub {
631     $len -= length $_[0]{rbuf};
632    
633     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
634     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
635    
636     $len > 0
637     or $finish->("", \%hdr);
638     });
639     } else {
640     $_[0]->on_eof (sub {
641     $finish->("", \%hdr);
642     });
643     $_[0]->on_read (sub {
644     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
645     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
646     });
647     }
648 root 1.31 } else {
649     $_[0]->on_eof (undef);
650 root 1.41
651     if ($len) {
652     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
653     $_[0]->on_read (sub {
654     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
655     if $len <= length $_[0]{rbuf};
656     });
657     } else {
658     $_[0]->on_error (sub {
659     $! == Errno::EPIPE
660     ? $finish->(delete $_[0]{rbuf}, \%hdr)
661     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
662     });
663     $_[0]->on_read (sub { });
664     }
665 root 1.31 }
666 root 1.11 }
667 root 1.31 });
668     });
669     };
670 root 1.3
671 root 1.31 # now handle proxy-CONNECT method
672     if ($proxy && $uscheme eq "https") {
673     # oh dear, we have to wrap it into a connect request
674    
675     # maybe re-use $uauthority with patched port?
676     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
677 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
678 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
679 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
680 root 1.31
681     if ($2 == 200) {
682     $rpath = $upath;
683     &$handle_actual_request;
684 root 1.3 } else {
685 root 1.31 %state = ();
686     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
687 root 1.3 }
688 root 1.11 });
689 root 1.31 } else {
690     &$handle_actual_request;
691     }
692    
693 root 1.11 }, sub {
694     $timeout
695     };
696 root 1.1 };
697    
698     defined wantarray && AnyEvent::Util::guard { %state = () }
699     }
700    
701 elmex 1.15 sub http_get($@) {
702 root 1.1 unshift @_, "GET";
703     &http_request
704     }
705    
706 elmex 1.15 sub http_head($@) {
707 root 1.4 unshift @_, "HEAD";
708     &http_request
709     }
710    
711 elmex 1.15 sub http_post($$@) {
712 root 1.22 my $url = shift;
713     unshift @_, "POST", $url, "body";
714 root 1.3 &http_request
715     }
716    
717 root 1.9 =back
718    
719 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
720 root 1.1
721     =over 4
722    
723 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
724    
725     Sets the default proxy server to use. The proxy-url must begin with a
726     string of the form C<http://host:port> (optionally C<https:...>).
727    
728 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
729 root 1.1
730 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
731 root 1.1
732     =item $AnyEvent::HTTP::USERAGENT
733    
734     The default value for the C<User-Agent> header (the default is
735 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
736 root 1.1
737     =item $AnyEvent::HTTP::MAX_PERSISTENT
738    
739     The maximum number of persistent connections to keep open (default: 8).
740    
741 root 1.3 Not implemented currently.
742    
743 root 1.1 =item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
744    
745 root 1.2 The maximum time to cache a persistent connection, in seconds (default: 2).
746 root 1.1
747 root 1.3 Not implemented currently.
748    
749 root 1.14 =item $AnyEvent::HTTP::ACTIVE
750    
751     The number of active connections. This is not the number of currently
752     running requests, but the number of currently open and non-idle TCP
753     connections. This number of can be useful for load-leveling.
754    
755 root 1.1 =back
756    
757     =cut
758    
759 root 1.2 sub set_proxy($) {
760     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
761     }
762    
763     # initialise proxy from environment
764     set_proxy $ENV{http_proxy};
765    
766 root 1.1 =head1 SEE ALSO
767    
768     L<AnyEvent>.
769    
770     =head1 AUTHOR
771    
772 root 1.18 Marc Lehmann <schmorp@schmorp.de>
773     http://home.schmorp.de/
774 root 1.1
775 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
776     testcases and bugreports.
777    
778 root 1.1 =cut
779    
780     1
781