ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.41
Committed: Sun Jul 5 23:50:59 2009 UTC (14 years, 10 months ago) by root
Branch: MAIN
Changes since 1.40: +159 -56 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.40 use AnyEvent 4.452 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.39 our $VERSION = '1.12';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64     our $MAX_PERSISTENT_PER_HOST = 2;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.2 The callback will be called with the response data as first argument
100     (or C<undef> if it wasn't available due to errors), and a hash-ref with
101     response headers as second argument.
102    
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143     and will provide defaults for C<User-Agent:> and C<Referer:>.
144 root 1.1
145     =item timeout => $seconds
146    
147     The time-out to use for various stages - each connect attempt will reset
148 root 1.2 the timeout, as will read or write activity. Default timeout is 5 minutes.
149    
150     =item proxy => [$host, $port[, $scheme]] or undef
151    
152     Use the given http proxy for all requests. If not specified, then the
153     default proxy (as specified by C<$ENV{http_proxy}>) is used.
154    
155     C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
156     HTTPS.
157 root 1.1
158 root 1.3 =item body => $string
159    
160     The request body, usually empty. Will be-sent as-is (future versions of
161     this module might offer more options).
162    
163 root 1.10 =item cookie_jar => $hash_ref
164    
165     Passing this parameter enables (simplified) cookie-processing, loosely
166     based on the original netscape specification.
167    
168     The C<$hash_ref> must be an (initially empty) hash reference which will
169     get updated automatically. It is possible to save the cookie_jar to
170     persistent storage with something like JSON or Storable, but this is not
171 root 1.40 recommended, as expiry times are currently being ignored.
172 root 1.10
173     Note that this cookie implementation is not of very high quality, nor
174     meant to be complete. If you want complete cookie management you have to
175     do that on your own. C<cookie_jar> is meant as a quick fix to get some
176     cookie-using sites working. Cookies are a privacy disaster, do not use
177     them unless required to.
178    
179 root 1.40 =item tls_ctx => $scheme | $tls_ctx
180    
181     Specifies the AnyEvent::TLS context to be used for https connections. This
182     parameter follows the same rules as the C<tls_ctx> parameter to
183     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
184     C<high> can be specified, which give you a predefined low-security (no
185     verification, highest compatibility) and high-security (CA and common-name
186     verification) TLS context.
187    
188     The default for this option is C<low>, which could be interpreted as "give
189     me the page, no matter what".
190    
191 root 1.41 =item on_header => $callback->($hdr)
192    
193     When specified, this callback will be called with the header hash as soon
194     as headers have been successfully received from the remote server (not on
195     locally-generated errors).
196    
197     It has to return either true (in which case AnyEvent::HTTP will continue),
198     or false, in which case AnyEvent::HTTP will cancel the download (and call
199     the finish callback with an error code of C<598>).
200    
201     This callback is useful, among other things, to quickly reject unwanted
202     content, which, if it is supposed to be rare, can be faster than first
203     doing a C<HEAD> request.
204    
205     =item on_body => $callback->($data, $hdr)
206    
207     When specified, all body data will be "filtered" through this callback.
208    
209     The callback will incrementally receive body data, and is supposed to
210     return it or a modified version of it (empty strings are valid returns).
211    
212     If the callback returns C<undef>, then the request will be cancelled.
213    
214     This callback is useful when you want to do some processing on the data,
215     or the data is too large to be held in memory (so the callback writes it
216     to a file and returns the empty string) and so on.
217    
218     It is usually preferred over doing your own body handling via
219     C<want_body_handle>.
220    
221     =item want_body_handle => $enable
222    
223     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
224     changes considerably: after parsing the headers, and instead of
225     downloading the body (if any), the completion callback will be
226     called. Instead of the C<$body> argument containing the body data, the
227     callback will receive the L<AnyEvent::Handle> object associated with the
228     connection. In error cases, C<undef> will be passed. When there is no body
229     (e.g. status C<304>), the empty string will be passed.
230    
231     The handle object might or might not be in TLS mode, might be connected to
232     a proxy, be a persistent connection etc., and configured in unspecified
233     ways. The user is responsible for this handle (it will not be used by this
234     module anymore).
235    
236     This is useful with some push-type services, where, after the initial
237     headers, an interactive protocol is used (typical example would be the
238     push-style twitter API which starts a JSON/XML stream).
239    
240     If you think you need this, first have a look at C<on_body>, to see if
241     that doesn'T solve your problem in a better way.
242    
243 root 1.1 =back
244    
245 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
246    
247     http_request GET => "http://www.nethype.de/", sub {
248     my ($body, $hdr) = @_;
249     print "$body\n";
250     };
251    
252     Example: make a HTTP HEAD request on https://www.google.com/, use a
253     timeout of 30 seconds.
254    
255     http_request
256     GET => "https://www.google.com",
257     timeout => 30,
258     sub {
259     my ($body, $hdr) = @_;
260     use Data::Dumper;
261     print Dumper $hdr;
262     }
263     ;
264 root 1.1
265 root 1.29 Example: make another simple HTTP GET request, but immediately try to
266     cancel it.
267    
268     my $request = http_request GET => "http://www.nethype.de/", sub {
269     my ($body, $hdr) = @_;
270     print "$body\n";
271     };
272    
273     undef $request;
274    
275 root 1.1 =cut
276    
277 root 1.12 sub _slot_schedule;
278 root 1.11 sub _slot_schedule($) {
279     my $host = shift;
280    
281     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
282     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
283 root 1.12 # somebody wants that slot
284 root 1.11 ++$CO_SLOT{$host}[0];
285 root 1.14 ++$ACTIVE;
286 root 1.11
287     $cb->(AnyEvent::Util::guard {
288 root 1.14 --$ACTIVE;
289 root 1.11 --$CO_SLOT{$host}[0];
290     _slot_schedule $host;
291     });
292     } else {
293     # nobody wants the slot, maybe we can forget about it
294     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
295     last;
296     }
297     }
298     }
299    
300     # wait for a free slot on host, call callback
301     sub _get_slot($$) {
302     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
303    
304     _slot_schedule $_[0];
305     }
306    
307 root 1.34 our $qr_nl = qr<\015?\012>;
308     our $qr_nlnl = qr<\015?\012\015?\012>;
309    
310 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
311     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
312 root 1.40
313 elmex 1.15 sub http_request($$@) {
314 root 1.1 my $cb = pop;
315     my ($method, $url, %arg) = @_;
316    
317     my %hdr;
318    
319 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
320     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
321    
322 root 1.3 $method = uc $method;
323    
324 root 1.8 if (my $hdr = $arg{headers}) {
325 root 1.1 while (my ($k, $v) = each %$hdr) {
326     $hdr{lc $k} = $v;
327     }
328     }
329    
330 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
331 root 1.8
332 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
333 root 1.8 if $recurse < 0;
334    
335 root 1.2 my $proxy = $arg{proxy} || $PROXY;
336 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
337    
338 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
339 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
340 root 1.2
341 root 1.31 $uscheme = lc $uscheme;
342 root 1.1
343 root 1.31 my $uport = $uscheme eq "http" ? 80
344     : $uscheme eq "https" ? 443
345 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
346 root 1.13
347 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
348 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
349 root 1.10
350     my $uhost = $1;
351     $uport = $2 if defined $2;
352    
353     $uhost =~ s/^\[(.*)\]$/$1/;
354     $upath .= "?$query" if length $query;
355    
356     $upath =~ s%^/?%/%;
357    
358     # cookie processing
359     if (my $jar = $arg{cookie_jar}) {
360 root 1.31 %$jar = () if $jar->{version} != 1;
361 root 1.10
362     my @cookie;
363    
364     while (my ($chost, $v) = each %$jar) {
365 root 1.30 if ($chost =~ /^\./) {
366     next unless $chost eq substr $uhost, -length $chost;
367     } elsif ($chost =~ /\./) {
368     next unless $chost eq $uhost;
369     } else {
370     next;
371     }
372 root 1.10
373     while (my ($cpath, $v) = each %$v) {
374     next unless $cpath eq substr $upath, 0, length $cpath;
375    
376     while (my ($k, $v) = each %$v) {
377 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
378     my $value = $v->{value};
379     $value =~ s/([\\"])/\\$1/g;
380     push @cookie, "$k=\"$value\"";
381 root 1.10 }
382     }
383     }
384    
385     $hdr{cookie} = join "; ", @cookie
386     if @cookie;
387     }
388 root 1.1
389 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
390 root 1.2
391 root 1.10 if ($proxy) {
392 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
393 root 1.31
394     # don't support https requests over https-proxy transport,
395 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
396 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
397 root 1.10 } else {
398 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
399 root 1.2 }
400    
401 root 1.41 $hdr{"user-agent"} ||= $USERAGENT;
402     $hdr{referer} ||= "$uscheme://$uauthority$upath"; # leave out fragment and query string, just a heuristic
403    
404     $hdr{host} = "$uhost:$uport";
405 root 1.10 $hdr{"content-length"} = length $arg{body};
406 root 1.1
407 root 1.11 my %state = (connect_guard => 1);
408    
409     _get_slot $uhost, sub {
410     $state{slot_guard} = shift;
411 root 1.1
412 root 1.11 return unless $state{connect_guard};
413 root 1.1
414 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
415     $state{fh} = shift
416 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "$!", URL => $url }));
417 root 1.34 pop; # free memory, save a tree
418 root 1.11
419 root 1.34 return unless delete $state{connect_guard};
420 root 1.11
421     # get handle
422     $state{handle} = new AnyEvent::Handle
423 root 1.40 fh => $state{fh},
424     timeout => $timeout,
425     peername => $rhost,
426     tls_ctx => $arg{tls_ctx};
427 root 1.11
428     # limit the number of persistent connections
429 root 1.34 # keepalive not yet supported
430 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
431     ++$KA_COUNT{$_[1]};
432 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
433     --$KA_COUNT{$_[1]}
434     };
435 root 1.11 $hdr{connection} = "keep-alive";
436     } else {
437     delete $hdr{connection};
438     }
439 root 1.1
440 root 1.11 # (re-)configure handle
441     $state{handle}->on_error (sub {
442     %state = ();
443 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
444 root 1.11 });
445     $state{handle}->on_eof (sub {
446     %state = ();
447 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
448 root 1.11 });
449 root 1.1
450 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
451    
452     # handle actual, non-tunneled, request
453     my $handle_actual_request = sub {
454 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
455 root 1.31
456     # send request
457     $state{handle}->push_write (
458     "$method $rpath HTTP/1.0\015\012"
459     . (join "", map "\u$_: $hdr{$_}\015\012", keys %hdr)
460     . "\015\012"
461     . (delete $arg{body})
462 root 1.11 );
463    
464 root 1.31 %hdr = (); # reduce memory usage, save a kitten
465 root 1.10
466 root 1.31 # status line
467 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
468 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
469 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
470 root 1.31
471     my %hdr = ( # response headers
472     HTTPVersion => ",$1",
473     Status => ",$2",
474     Reason => ",$3",
475     URL => ",$url"
476     );
477    
478     # headers, could be optimized a bit
479 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
480 root 1.31 for ("$_[1]\012") {
481     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
482    
483 root 1.40 # things seen, not parsed:
484     # p3pP="NON CUR OTPi OUR NOR UNI"
485    
486 root 1.31 $hdr{lc $1} .= ",$2"
487     while /\G
488     ([^:\000-\037]+):
489     [\011\040]*
490     ((?: [^\012]+ | \012[\011\040] )*)
491     \012
492     /gxc;
493    
494     /\G$/
495 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
496 root 1.31 }
497    
498     substr $_, 0, 1, ""
499     for values %hdr;
500    
501 root 1.41 # redirect handling
502     # microsoft and other shitheads don't give a shit for following standards,
503     # try to support some common forms of broken Location headers.
504     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
505     $hdr{location} =~ s/^\.\/+//;
506    
507     my $url = "$rscheme://$uhost:$uport";
508    
509     unless ($hdr{location} =~ s/^\///) {
510     $url .= $upath;
511     $url =~ s/\/[^\/]*$//;
512     }
513    
514     $hdr{location} = "$url/$hdr{location}";
515     }
516    
517     my $redirect;
518    
519     if ($recurse) {
520     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
521     # apparently, mozilla et al. just change POST to GET here
522     # more research is needed before we do the same
523     $redirect = 1;
524     } elsif ($hdr{Status} == 303) {
525     # even http/1.1 is unclear on how to mutate the method
526     $method = "GET" unless $method eq "HEAD";
527     $redirect = 1;
528     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
529     $redirect = 1;
530     }
531     }
532    
533 root 1.31 my $finish = sub {
534 root 1.41 $state{handle}->destroy if $state{handle};
535 root 1.31 %state = ();
536    
537     # set-cookie processing
538     if ($arg{cookie_jar}) {
539 root 1.41 for ($_[1]{"set-cookie"}) {
540 root 1.31 # parse NAME=VALUE
541     my @kv;
542    
543     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
544     my $name = $1;
545     my $value = $3;
546    
547     unless ($value) {
548     $value = $2;
549     $value =~ s/\\(.)/$1/gs;
550     }
551    
552     push @kv, $name => $value;
553    
554     last unless /\G\s*;/gc;
555     }
556    
557     last unless @kv;
558 root 1.10
559 root 1.31 my $name = shift @kv;
560     my %kv = (value => shift @kv, @kv);
561 root 1.11
562 root 1.31 my $cdom;
563     my $cpath = (delete $kv{path}) || "/";
564 root 1.10
565 root 1.31 if (exists $kv{domain}) {
566     $cdom = delete $kv{domain};
567    
568     $cdom =~ s/^\.?/./; # make sure it starts with a "."
569 root 1.11
570 root 1.31 next if $cdom =~ /\.$/;
571    
572     # this is not rfc-like and not netscape-like. go figure.
573     my $ndots = $cdom =~ y/.//;
574     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
575     } else {
576     $cdom = $uhost;
577     }
578 root 1.30
579 root 1.31 # store it
580     $arg{cookie_jar}{version} = 1;
581     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
582    
583     redo if /\G\s*,/gc;
584 root 1.30 }
585 root 1.11 }
586 root 1.8
587 root 1.41 if ($redirect) {
588     # we ignore any errors, as it is very common to receive
589     # Content-Length != 0 but no actual body
590     # we also access %hdr, as $_[1] might be an erro
591     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
592 root 1.31 } else {
593     $cb->($_[0], $_[1]);
594     }
595     };
596 root 1.24
597 root 1.41 my $len = $hdr{"content-length"};
598    
599     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
600     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
601     } elsif (
602     $hdr{Status} =~ /^(?:1..|204|304)$/
603     or $method eq "HEAD"
604     or (defined $len && !$len)
605     ) {
606     # no body
607     $finish->("", \%hdr);
608 root 1.11 } else {
609 root 1.41 # body handling, four different code paths
610     # for want_body_handle, on_body (2x), normal (2x)
611     # we might read too much here, but it does not matter yet (no pers. connections)
612     if (!$redirect && $arg{want_body_handle}) {
613     $_[0]->on_eof (undef);
614     $_[0]->on_error (undef);
615     $_[0]->on_read (undef);
616    
617     $finish->(delete $state{handle}, \%hdr);
618 root 1.31
619 root 1.41 } elsif ($arg{on_body}) {
620     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
621     if ($len) {
622     $_[0]->on_eof (undef);
623     $_[0]->on_read (sub {
624     $len -= length $_[0]{rbuf};
625    
626     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
627     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
628    
629     $len > 0
630     or $finish->("", \%hdr);
631     });
632     } else {
633     $_[0]->on_eof (sub {
634     $finish->("", \%hdr);
635     });
636     $_[0]->on_read (sub {
637     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
638     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
639     });
640     }
641 root 1.31 } else {
642     $_[0]->on_eof (undef);
643 root 1.41
644     if ($len) {
645     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
646     $_[0]->on_read (sub {
647     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
648     if $len <= length $_[0]{rbuf};
649     });
650     } else {
651     $_[0]->on_error (sub {
652     $! == Errno::EPIPE
653     ? $finish->(delete $_[0]{rbuf}, \%hdr)
654     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
655     });
656     $_[0]->on_read (sub { });
657     }
658 root 1.31 }
659 root 1.11 }
660 root 1.31 });
661     });
662     };
663 root 1.3
664 root 1.31 # now handle proxy-CONNECT method
665     if ($proxy && $uscheme eq "https") {
666     # oh dear, we have to wrap it into a connect request
667    
668     # maybe re-use $uauthority with patched port?
669     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
670 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
671 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
672 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
673 root 1.31
674     if ($2 == 200) {
675     $rpath = $upath;
676     &$handle_actual_request;
677 root 1.3 } else {
678 root 1.31 %state = ();
679     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
680 root 1.3 }
681 root 1.11 });
682 root 1.31 } else {
683     &$handle_actual_request;
684     }
685    
686 root 1.11 }, sub {
687     $timeout
688     };
689 root 1.1 };
690    
691     defined wantarray && AnyEvent::Util::guard { %state = () }
692     }
693    
694 elmex 1.15 sub http_get($@) {
695 root 1.1 unshift @_, "GET";
696     &http_request
697     }
698    
699 elmex 1.15 sub http_head($@) {
700 root 1.4 unshift @_, "HEAD";
701     &http_request
702     }
703    
704 elmex 1.15 sub http_post($$@) {
705 root 1.22 my $url = shift;
706     unshift @_, "POST", $url, "body";
707 root 1.3 &http_request
708     }
709    
710 root 1.9 =back
711    
712 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
713 root 1.1
714     =over 4
715    
716 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
717    
718     Sets the default proxy server to use. The proxy-url must begin with a
719     string of the form C<http://host:port> (optionally C<https:...>).
720    
721 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
722 root 1.1
723 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
724 root 1.1
725     =item $AnyEvent::HTTP::USERAGENT
726    
727     The default value for the C<User-Agent> header (the default is
728 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
729 root 1.1
730     =item $AnyEvent::HTTP::MAX_PERSISTENT
731    
732     The maximum number of persistent connections to keep open (default: 8).
733    
734 root 1.3 Not implemented currently.
735    
736 root 1.1 =item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
737    
738 root 1.2 The maximum time to cache a persistent connection, in seconds (default: 2).
739 root 1.1
740 root 1.3 Not implemented currently.
741    
742 root 1.14 =item $AnyEvent::HTTP::ACTIVE
743    
744     The number of active connections. This is not the number of currently
745     running requests, but the number of currently open and non-idle TCP
746     connections. This number of can be useful for load-leveling.
747    
748 root 1.1 =back
749    
750     =cut
751    
752 root 1.2 sub set_proxy($) {
753     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
754     }
755    
756     # initialise proxy from environment
757     set_proxy $ENV{http_proxy};
758    
759 root 1.1 =head1 SEE ALSO
760    
761     L<AnyEvent>.
762    
763     =head1 AUTHOR
764    
765 root 1.18 Marc Lehmann <schmorp@schmorp.de>
766     http://home.schmorp.de/
767 root 1.1
768 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
769     testcases and bugreports.
770    
771 root 1.1 =cut
772    
773     1
774