ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.45
Committed: Fri Jul 17 16:56:36 2009 UTC (14 years, 10 months ago) by root
Branch: MAIN
Changes since 1.44: +5 -2 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.44 use AnyEvent 4.8 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.44 our $VERSION = '1.4';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143     and will provide defaults for C<User-Agent:> and C<Referer:>.
144 root 1.1
145     =item timeout => $seconds
146    
147     The time-out to use for various stages - each connect attempt will reset
148 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
149    
150     =item proxy => [$host, $port[, $scheme]] or undef
151    
152     Use the given http proxy for all requests. If not specified, then the
153     default proxy (as specified by C<$ENV{http_proxy}>) is used.
154    
155     C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
156     HTTPS.
157 root 1.1
158 root 1.3 =item body => $string
159    
160     The request body, usually empty. Will be-sent as-is (future versions of
161     this module might offer more options).
162    
163 root 1.10 =item cookie_jar => $hash_ref
164    
165     Passing this parameter enables (simplified) cookie-processing, loosely
166     based on the original netscape specification.
167    
168     The C<$hash_ref> must be an (initially empty) hash reference which will
169     get updated automatically. It is possible to save the cookie_jar to
170     persistent storage with something like JSON or Storable, but this is not
171 root 1.40 recommended, as expiry times are currently being ignored.
172 root 1.10
173     Note that this cookie implementation is not of very high quality, nor
174     meant to be complete. If you want complete cookie management you have to
175     do that on your own. C<cookie_jar> is meant as a quick fix to get some
176     cookie-using sites working. Cookies are a privacy disaster, do not use
177     them unless required to.
178    
179 root 1.40 =item tls_ctx => $scheme | $tls_ctx
180    
181     Specifies the AnyEvent::TLS context to be used for https connections. This
182     parameter follows the same rules as the C<tls_ctx> parameter to
183     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
184     C<high> can be specified, which give you a predefined low-security (no
185     verification, highest compatibility) and high-security (CA and common-name
186     verification) TLS context.
187    
188     The default for this option is C<low>, which could be interpreted as "give
189     me the page, no matter what".
190    
191 root 1.42 =item on_header => $callback->($headers)
192 root 1.41
193     When specified, this callback will be called with the header hash as soon
194     as headers have been successfully received from the remote server (not on
195     locally-generated errors).
196    
197     It has to return either true (in which case AnyEvent::HTTP will continue),
198     or false, in which case AnyEvent::HTTP will cancel the download (and call
199     the finish callback with an error code of C<598>).
200    
201     This callback is useful, among other things, to quickly reject unwanted
202     content, which, if it is supposed to be rare, can be faster than first
203     doing a C<HEAD> request.
204    
205 root 1.42 Example: cancel the request unless the content-type is "text/html".
206 root 1.41
207 root 1.42 on_header => sub {
208     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
209     },
210 root 1.41
211 root 1.42 =item on_body => $callback->($partial_body, $headers)
212 root 1.41
213 root 1.42 When specified, all body data will be passed to this callback instead of
214     to the completion callback. The completion callback will get the empty
215     string instead of the body data.
216 root 1.41
217 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
218     or false, in which case AnyEvent::HTTP will cancel the download (and call
219     the completion callback with an error code of C<598>).
220    
221     This callback is useful when the data is too large to be held in memory
222     (so the callback writes it to a file) or when only some information should
223     be extracted, or when the body should be processed incrementally.
224 root 1.41
225     It is usually preferred over doing your own body handling via
226 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
227     only used to create a connection, C<want_body_handle> is the better
228     alternative, as it allows you to install your own event handler, reducing
229     resource usage.
230 root 1.41
231     =item want_body_handle => $enable
232    
233     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
234     changes considerably: after parsing the headers, and instead of
235     downloading the body (if any), the completion callback will be
236     called. Instead of the C<$body> argument containing the body data, the
237     callback will receive the L<AnyEvent::Handle> object associated with the
238     connection. In error cases, C<undef> will be passed. When there is no body
239     (e.g. status C<304>), the empty string will be passed.
240    
241     The handle object might or might not be in TLS mode, might be connected to
242     a proxy, be a persistent connection etc., and configured in unspecified
243     ways. The user is responsible for this handle (it will not be used by this
244     module anymore).
245    
246     This is useful with some push-type services, where, after the initial
247     headers, an interactive protocol is used (typical example would be the
248     push-style twitter API which starts a JSON/XML stream).
249    
250     If you think you need this, first have a look at C<on_body>, to see if
251 root 1.45 that doesn't solve your problem in a better way.
252 root 1.41
253 root 1.1 =back
254    
255 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
256    
257     http_request GET => "http://www.nethype.de/", sub {
258     my ($body, $hdr) = @_;
259     print "$body\n";
260     };
261    
262     Example: make a HTTP HEAD request on https://www.google.com/, use a
263     timeout of 30 seconds.
264    
265     http_request
266     GET => "https://www.google.com",
267     timeout => 30,
268     sub {
269     my ($body, $hdr) = @_;
270     use Data::Dumper;
271     print Dumper $hdr;
272     }
273     ;
274 root 1.1
275 root 1.29 Example: make another simple HTTP GET request, but immediately try to
276     cancel it.
277    
278     my $request = http_request GET => "http://www.nethype.de/", sub {
279     my ($body, $hdr) = @_;
280     print "$body\n";
281     };
282    
283     undef $request;
284    
285 root 1.1 =cut
286    
287 root 1.12 sub _slot_schedule;
288 root 1.11 sub _slot_schedule($) {
289     my $host = shift;
290    
291     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
292     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
293 root 1.12 # somebody wants that slot
294 root 1.11 ++$CO_SLOT{$host}[0];
295 root 1.14 ++$ACTIVE;
296 root 1.11
297     $cb->(AnyEvent::Util::guard {
298 root 1.14 --$ACTIVE;
299 root 1.11 --$CO_SLOT{$host}[0];
300     _slot_schedule $host;
301     });
302     } else {
303     # nobody wants the slot, maybe we can forget about it
304     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
305     last;
306     }
307     }
308     }
309    
310     # wait for a free slot on host, call callback
311     sub _get_slot($$) {
312     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
313    
314     _slot_schedule $_[0];
315     }
316    
317 root 1.34 our $qr_nl = qr<\015?\012>;
318     our $qr_nlnl = qr<\015?\012\015?\012>;
319    
320 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
321     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
322 root 1.40
323 elmex 1.15 sub http_request($$@) {
324 root 1.1 my $cb = pop;
325     my ($method, $url, %arg) = @_;
326    
327     my %hdr;
328    
329 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
330     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
331    
332 root 1.3 $method = uc $method;
333    
334 root 1.8 if (my $hdr = $arg{headers}) {
335 root 1.1 while (my ($k, $v) = each %$hdr) {
336     $hdr{lc $k} = $v;
337     }
338     }
339    
340 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
341 root 1.8
342 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
343 root 1.8 if $recurse < 0;
344    
345 root 1.2 my $proxy = $arg{proxy} || $PROXY;
346 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
347    
348 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
349 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
350 root 1.2
351 root 1.31 $uscheme = lc $uscheme;
352 root 1.1
353 root 1.31 my $uport = $uscheme eq "http" ? 80
354     : $uscheme eq "https" ? 443
355 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
356 root 1.13
357 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
358 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
359 root 1.10
360     my $uhost = $1;
361     $uport = $2 if defined $2;
362    
363 root 1.43 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost";
364    
365 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
366     $upath .= "?$query" if length $query;
367    
368     $upath =~ s%^/?%/%;
369    
370     # cookie processing
371     if (my $jar = $arg{cookie_jar}) {
372 root 1.31 %$jar = () if $jar->{version} != 1;
373 root 1.10
374     my @cookie;
375    
376     while (my ($chost, $v) = each %$jar) {
377 root 1.30 if ($chost =~ /^\./) {
378     next unless $chost eq substr $uhost, -length $chost;
379     } elsif ($chost =~ /\./) {
380     next unless $chost eq $uhost;
381     } else {
382     next;
383     }
384 root 1.10
385     while (my ($cpath, $v) = each %$v) {
386     next unless $cpath eq substr $upath, 0, length $cpath;
387    
388     while (my ($k, $v) = each %$v) {
389 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
390     my $value = $v->{value};
391     $value =~ s/([\\"])/\\$1/g;
392     push @cookie, "$k=\"$value\"";
393 root 1.10 }
394     }
395     }
396    
397     $hdr{cookie} = join "; ", @cookie
398     if @cookie;
399     }
400 root 1.1
401 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
402 root 1.2
403 root 1.10 if ($proxy) {
404 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
405 root 1.31
406     # don't support https requests over https-proxy transport,
407 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
408 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
409 root 1.10 } else {
410 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
411 root 1.2 }
412    
413 root 1.41 $hdr{"user-agent"} ||= $USERAGENT;
414     $hdr{referer} ||= "$uscheme://$uauthority$upath"; # leave out fragment and query string, just a heuristic
415    
416 root 1.10 $hdr{"content-length"} = length $arg{body};
417 root 1.1
418 root 1.11 my %state = (connect_guard => 1);
419    
420     _get_slot $uhost, sub {
421     $state{slot_guard} = shift;
422 root 1.1
423 root 1.11 return unless $state{connect_guard};
424 root 1.1
425 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
426     $state{fh} = shift
427 root 1.44 or do {
428     my $err = "$!";
429     %state = ();
430     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
431     };
432    
433 root 1.34 pop; # free memory, save a tree
434 root 1.11
435 root 1.34 return unless delete $state{connect_guard};
436 root 1.11
437     # get handle
438     $state{handle} = new AnyEvent::Handle
439 root 1.40 fh => $state{fh},
440     timeout => $timeout,
441     peername => $rhost,
442     tls_ctx => $arg{tls_ctx};
443 root 1.11
444     # limit the number of persistent connections
445 root 1.34 # keepalive not yet supported
446 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
447     ++$KA_COUNT{$_[1]};
448 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
449     --$KA_COUNT{$_[1]}
450     };
451 root 1.11 $hdr{connection} = "keep-alive";
452     } else {
453     delete $hdr{connection};
454     }
455 root 1.1
456 root 1.11 # (re-)configure handle
457     $state{handle}->on_error (sub {
458     %state = ();
459 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
460 root 1.11 });
461     $state{handle}->on_eof (sub {
462     %state = ();
463 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
464 root 1.11 });
465 root 1.1
466 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
467    
468     # handle actual, non-tunneled, request
469     my $handle_actual_request = sub {
470 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
471 root 1.31
472     # send request
473     $state{handle}->push_write (
474     "$method $rpath HTTP/1.0\015\012"
475     . (join "", map "\u$_: $hdr{$_}\015\012", keys %hdr)
476     . "\015\012"
477     . (delete $arg{body})
478 root 1.11 );
479    
480 root 1.31 %hdr = (); # reduce memory usage, save a kitten
481 root 1.10
482 root 1.31 # status line
483 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
484 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
485 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
486 root 1.31
487     my %hdr = ( # response headers
488     HTTPVersion => ",$1",
489     Status => ",$2",
490     Reason => ",$3",
491     URL => ",$url"
492     );
493    
494     # headers, could be optimized a bit
495 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
496 root 1.31 for ("$_[1]\012") {
497     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
498    
499 root 1.40 # things seen, not parsed:
500     # p3pP="NON CUR OTPi OUR NOR UNI"
501    
502 root 1.31 $hdr{lc $1} .= ",$2"
503     while /\G
504 root 1.43 ([^:\000-\037]*):
505 root 1.31 [\011\040]*
506     ((?: [^\012]+ | \012[\011\040] )*)
507     \012
508     /gxc;
509    
510     /\G$/
511 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
512 root 1.31 }
513    
514     substr $_, 0, 1, ""
515     for values %hdr;
516    
517 root 1.41 # redirect handling
518     # microsoft and other shitheads don't give a shit for following standards,
519     # try to support some common forms of broken Location headers.
520     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
521     $hdr{location} =~ s/^\.\/+//;
522    
523     my $url = "$rscheme://$uhost:$uport";
524    
525     unless ($hdr{location} =~ s/^\///) {
526     $url .= $upath;
527     $url =~ s/\/[^\/]*$//;
528     }
529    
530     $hdr{location} = "$url/$hdr{location}";
531     }
532    
533     my $redirect;
534    
535     if ($recurse) {
536     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
537     # apparently, mozilla et al. just change POST to GET here
538     # more research is needed before we do the same
539     $redirect = 1;
540     } elsif ($hdr{Status} == 303) {
541     # even http/1.1 is unclear on how to mutate the method
542     $method = "GET" unless $method eq "HEAD";
543     $redirect = 1;
544     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
545     $redirect = 1;
546     }
547     }
548    
549 root 1.31 my $finish = sub {
550 root 1.41 $state{handle}->destroy if $state{handle};
551 root 1.31 %state = ();
552    
553     # set-cookie processing
554     if ($arg{cookie_jar}) {
555 root 1.41 for ($_[1]{"set-cookie"}) {
556 root 1.31 # parse NAME=VALUE
557     my @kv;
558    
559     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
560     my $name = $1;
561     my $value = $3;
562    
563     unless ($value) {
564     $value = $2;
565     $value =~ s/\\(.)/$1/gs;
566     }
567    
568     push @kv, $name => $value;
569    
570     last unless /\G\s*;/gc;
571     }
572    
573     last unless @kv;
574 root 1.10
575 root 1.31 my $name = shift @kv;
576     my %kv = (value => shift @kv, @kv);
577 root 1.11
578 root 1.31 my $cdom;
579     my $cpath = (delete $kv{path}) || "/";
580 root 1.10
581 root 1.31 if (exists $kv{domain}) {
582     $cdom = delete $kv{domain};
583    
584     $cdom =~ s/^\.?/./; # make sure it starts with a "."
585 root 1.11
586 root 1.31 next if $cdom =~ /\.$/;
587    
588     # this is not rfc-like and not netscape-like. go figure.
589     my $ndots = $cdom =~ y/.//;
590     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
591     } else {
592     $cdom = $uhost;
593     }
594 root 1.30
595 root 1.31 # store it
596     $arg{cookie_jar}{version} = 1;
597     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
598    
599     redo if /\G\s*,/gc;
600 root 1.30 }
601 root 1.11 }
602 root 1.8
603 root 1.41 if ($redirect) {
604     # we ignore any errors, as it is very common to receive
605     # Content-Length != 0 but no actual body
606     # we also access %hdr, as $_[1] might be an erro
607     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
608 root 1.31 } else {
609     $cb->($_[0], $_[1]);
610     }
611     };
612 root 1.24
613 root 1.41 my $len = $hdr{"content-length"};
614    
615     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
616     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
617     } elsif (
618 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
619 root 1.41 or $method eq "HEAD"
620     or (defined $len && !$len)
621     ) {
622     # no body
623     $finish->("", \%hdr);
624 root 1.11 } else {
625 root 1.41 # body handling, four different code paths
626     # for want_body_handle, on_body (2x), normal (2x)
627     # we might read too much here, but it does not matter yet (no pers. connections)
628     if (!$redirect && $arg{want_body_handle}) {
629     $_[0]->on_eof (undef);
630     $_[0]->on_error (undef);
631     $_[0]->on_read (undef);
632    
633     $finish->(delete $state{handle}, \%hdr);
634 root 1.31
635 root 1.41 } elsif ($arg{on_body}) {
636     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
637     if ($len) {
638     $_[0]->on_eof (undef);
639     $_[0]->on_read (sub {
640     $len -= length $_[0]{rbuf};
641    
642     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
643     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
644    
645     $len > 0
646     or $finish->("", \%hdr);
647     });
648     } else {
649     $_[0]->on_eof (sub {
650     $finish->("", \%hdr);
651     });
652     $_[0]->on_read (sub {
653     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
654     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
655     });
656     }
657 root 1.31 } else {
658     $_[0]->on_eof (undef);
659 root 1.41
660     if ($len) {
661     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
662     $_[0]->on_read (sub {
663     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
664     if $len <= length $_[0]{rbuf};
665     });
666     } else {
667     $_[0]->on_error (sub {
668 root 1.43 $! == Errno::EPIPE || !$!
669 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
670     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
671     });
672     $_[0]->on_read (sub { });
673     }
674 root 1.31 }
675 root 1.11 }
676 root 1.31 });
677     });
678     };
679 root 1.3
680 root 1.31 # now handle proxy-CONNECT method
681     if ($proxy && $uscheme eq "https") {
682     # oh dear, we have to wrap it into a connect request
683    
684     # maybe re-use $uauthority with patched port?
685     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
686 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
687 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
688 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
689 root 1.31
690     if ($2 == 200) {
691     $rpath = $upath;
692     &$handle_actual_request;
693 root 1.3 } else {
694 root 1.31 %state = ();
695     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
696 root 1.3 }
697 root 1.11 });
698 root 1.31 } else {
699     &$handle_actual_request;
700     }
701    
702 root 1.11 }, sub {
703     $timeout
704     };
705 root 1.1 };
706    
707     defined wantarray && AnyEvent::Util::guard { %state = () }
708     }
709    
710 elmex 1.15 sub http_get($@) {
711 root 1.1 unshift @_, "GET";
712     &http_request
713     }
714    
715 elmex 1.15 sub http_head($@) {
716 root 1.4 unshift @_, "HEAD";
717     &http_request
718     }
719    
720 elmex 1.15 sub http_post($$@) {
721 root 1.22 my $url = shift;
722     unshift @_, "POST", $url, "body";
723 root 1.3 &http_request
724     }
725    
726 root 1.9 =back
727    
728 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
729 root 1.1
730     =over 4
731    
732 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
733    
734     Sets the default proxy server to use. The proxy-url must begin with a
735     string of the form C<http://host:port> (optionally C<https:...>).
736    
737 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
738 root 1.1
739 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
740 root 1.1
741     =item $AnyEvent::HTTP::USERAGENT
742    
743     The default value for the C<User-Agent> header (the default is
744 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
745 root 1.1
746 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
747 root 1.1
748 root 1.43 The maximum number of concurrent conenctions to the same host (identified
749     by the hostname). If the limit is exceeded, then the additional requests
750     are queued until previous connections are closed.
751 root 1.1
752 root 1.43 The default value for this is C<4>, and it is highly advisable to not
753     increase it.
754 root 1.3
755 root 1.14 =item $AnyEvent::HTTP::ACTIVE
756    
757     The number of active connections. This is not the number of currently
758     running requests, but the number of currently open and non-idle TCP
759     connections. This number of can be useful for load-leveling.
760    
761 root 1.1 =back
762    
763     =cut
764    
765 root 1.2 sub set_proxy($) {
766     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
767     }
768    
769     # initialise proxy from environment
770     set_proxy $ENV{http_proxy};
771    
772 root 1.1 =head1 SEE ALSO
773    
774     L<AnyEvent>.
775    
776     =head1 AUTHOR
777    
778 root 1.18 Marc Lehmann <schmorp@schmorp.de>
779     http://home.schmorp.de/
780 root 1.1
781 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
782     testcases and bugreports.
783    
784 root 1.1 =cut
785    
786     1
787