ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.53
Committed: Sat Dec 5 15:37:07 2009 UTC (14 years, 5 months ago) by root
Branch: MAIN
CVS Tags: rel-1_44
Changes since 1.52: +5 -3 lines
Log Message:
1.44

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.51 use AnyEvent 5.0 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.53 our $VERSION = '1.44';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143 root 1.47 and will provide defaults for C<User-Agent:> and C<Referer:> (this can be
144     suppressed by using C<undef> for these headers in which case they won't be
145 root 1.49 sent at all).
146 root 1.1
147     =item timeout => $seconds
148    
149     The time-out to use for various stages - each connect attempt will reset
150 root 1.51 the timeout, as will read or write activity, i.e. this is not an overall
151     timeout.
152    
153     Default timeout is 5 minutes.
154 root 1.2
155     =item proxy => [$host, $port[, $scheme]] or undef
156    
157     Use the given http proxy for all requests. If not specified, then the
158     default proxy (as specified by C<$ENV{http_proxy}>) is used.
159    
160 root 1.47 C<$scheme> must be either missing, C<http> for HTTP or C<https> for
161 root 1.2 HTTPS.
162 root 1.1
163 root 1.3 =item body => $string
164    
165     The request body, usually empty. Will be-sent as-is (future versions of
166     this module might offer more options).
167    
168 root 1.10 =item cookie_jar => $hash_ref
169    
170     Passing this parameter enables (simplified) cookie-processing, loosely
171     based on the original netscape specification.
172    
173     The C<$hash_ref> must be an (initially empty) hash reference which will
174     get updated automatically. It is possible to save the cookie_jar to
175     persistent storage with something like JSON or Storable, but this is not
176 root 1.40 recommended, as expiry times are currently being ignored.
177 root 1.10
178     Note that this cookie implementation is not of very high quality, nor
179     meant to be complete. If you want complete cookie management you have to
180     do that on your own. C<cookie_jar> is meant as a quick fix to get some
181     cookie-using sites working. Cookies are a privacy disaster, do not use
182     them unless required to.
183    
184 root 1.40 =item tls_ctx => $scheme | $tls_ctx
185    
186     Specifies the AnyEvent::TLS context to be used for https connections. This
187     parameter follows the same rules as the C<tls_ctx> parameter to
188     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
189     C<high> can be specified, which give you a predefined low-security (no
190     verification, highest compatibility) and high-security (CA and common-name
191     verification) TLS context.
192    
193     The default for this option is C<low>, which could be interpreted as "give
194     me the page, no matter what".
195    
196 root 1.51 =item on_prepare => $callback->($fh)
197    
198     In rare cases you need to "tune" the socket before it is used to
199     connect (for exmaple, to bind it on a given IP address). This parameter
200     overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
201     and behaves exactly the same way (e.g. it has to provide a
202     timeout). See the description for the C<$prepare_cb> argument of
203     C<AnyEvent::Socket::tcp_connect> for details.
204    
205 root 1.42 =item on_header => $callback->($headers)
206 root 1.41
207     When specified, this callback will be called with the header hash as soon
208     as headers have been successfully received from the remote server (not on
209     locally-generated errors).
210    
211     It has to return either true (in which case AnyEvent::HTTP will continue),
212     or false, in which case AnyEvent::HTTP will cancel the download (and call
213     the finish callback with an error code of C<598>).
214    
215     This callback is useful, among other things, to quickly reject unwanted
216     content, which, if it is supposed to be rare, can be faster than first
217     doing a C<HEAD> request.
218    
219 root 1.42 Example: cancel the request unless the content-type is "text/html".
220 root 1.41
221 root 1.42 on_header => sub {
222     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
223     },
224 root 1.41
225 root 1.42 =item on_body => $callback->($partial_body, $headers)
226 root 1.41
227 root 1.42 When specified, all body data will be passed to this callback instead of
228     to the completion callback. The completion callback will get the empty
229     string instead of the body data.
230 root 1.41
231 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
232     or false, in which case AnyEvent::HTTP will cancel the download (and call
233     the completion callback with an error code of C<598>).
234    
235     This callback is useful when the data is too large to be held in memory
236     (so the callback writes it to a file) or when only some information should
237     be extracted, or when the body should be processed incrementally.
238 root 1.41
239     It is usually preferred over doing your own body handling via
240 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
241     only used to create a connection, C<want_body_handle> is the better
242     alternative, as it allows you to install your own event handler, reducing
243     resource usage.
244 root 1.41
245     =item want_body_handle => $enable
246    
247     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
248     changes considerably: after parsing the headers, and instead of
249     downloading the body (if any), the completion callback will be
250     called. Instead of the C<$body> argument containing the body data, the
251     callback will receive the L<AnyEvent::Handle> object associated with the
252     connection. In error cases, C<undef> will be passed. When there is no body
253     (e.g. status C<304>), the empty string will be passed.
254    
255     The handle object might or might not be in TLS mode, might be connected to
256     a proxy, be a persistent connection etc., and configured in unspecified
257     ways. The user is responsible for this handle (it will not be used by this
258     module anymore).
259    
260     This is useful with some push-type services, where, after the initial
261     headers, an interactive protocol is used (typical example would be the
262     push-style twitter API which starts a JSON/XML stream).
263    
264     If you think you need this, first have a look at C<on_body>, to see if
265 root 1.45 that doesn't solve your problem in a better way.
266 root 1.41
267 root 1.1 =back
268    
269 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
270    
271     http_request GET => "http://www.nethype.de/", sub {
272     my ($body, $hdr) = @_;
273     print "$body\n";
274     };
275    
276     Example: make a HTTP HEAD request on https://www.google.com/, use a
277     timeout of 30 seconds.
278    
279     http_request
280     GET => "https://www.google.com",
281     timeout => 30,
282     sub {
283     my ($body, $hdr) = @_;
284     use Data::Dumper;
285     print Dumper $hdr;
286     }
287     ;
288 root 1.1
289 root 1.29 Example: make another simple HTTP GET request, but immediately try to
290     cancel it.
291    
292     my $request = http_request GET => "http://www.nethype.de/", sub {
293     my ($body, $hdr) = @_;
294     print "$body\n";
295     };
296    
297     undef $request;
298    
299 root 1.1 =cut
300    
301 root 1.12 sub _slot_schedule;
302 root 1.11 sub _slot_schedule($) {
303     my $host = shift;
304    
305     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
306     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
307 root 1.12 # somebody wants that slot
308 root 1.11 ++$CO_SLOT{$host}[0];
309 root 1.14 ++$ACTIVE;
310 root 1.11
311     $cb->(AnyEvent::Util::guard {
312 root 1.14 --$ACTIVE;
313 root 1.11 --$CO_SLOT{$host}[0];
314     _slot_schedule $host;
315     });
316     } else {
317     # nobody wants the slot, maybe we can forget about it
318     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
319     last;
320     }
321     }
322     }
323    
324     # wait for a free slot on host, call callback
325     sub _get_slot($$) {
326     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
327    
328     _slot_schedule $_[0];
329     }
330    
331 root 1.46 our $qr_nl = qr{\015?\012};
332     our $qr_nlnl = qr{(?<![^\012])\015?\012};
333 root 1.34
334 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
335     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
336 root 1.40
337 elmex 1.15 sub http_request($$@) {
338 root 1.1 my $cb = pop;
339     my ($method, $url, %arg) = @_;
340    
341     my %hdr;
342    
343 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
344     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
345    
346 root 1.3 $method = uc $method;
347    
348 root 1.8 if (my $hdr = $arg{headers}) {
349 root 1.1 while (my ($k, $v) = each %$hdr) {
350     $hdr{lc $k} = $v;
351     }
352     }
353    
354 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
355 root 1.8
356 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
357 root 1.8 if $recurse < 0;
358    
359 root 1.2 my $proxy = $arg{proxy} || $PROXY;
360 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
361    
362 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
363 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
364 root 1.2
365 root 1.31 $uscheme = lc $uscheme;
366 root 1.1
367 root 1.31 my $uport = $uscheme eq "http" ? 80
368     : $uscheme eq "https" ? 443
369 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
370 root 1.13
371 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
372 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
373 root 1.10
374     my $uhost = $1;
375     $uport = $2 if defined $2;
376    
377 root 1.53 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
378     unless exists $hdr{host};
379 root 1.43
380 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
381     $upath .= "?$query" if length $query;
382    
383     $upath =~ s%^/?%/%;
384    
385     # cookie processing
386     if (my $jar = $arg{cookie_jar}) {
387 root 1.31 %$jar = () if $jar->{version} != 1;
388 root 1.10
389     my @cookie;
390    
391     while (my ($chost, $v) = each %$jar) {
392 root 1.30 if ($chost =~ /^\./) {
393     next unless $chost eq substr $uhost, -length $chost;
394     } elsif ($chost =~ /\./) {
395     next unless $chost eq $uhost;
396     } else {
397     next;
398     }
399 root 1.10
400     while (my ($cpath, $v) = each %$v) {
401     next unless $cpath eq substr $upath, 0, length $cpath;
402    
403     while (my ($k, $v) = each %$v) {
404 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
405     my $value = $v->{value};
406     $value =~ s/([\\"])/\\$1/g;
407     push @cookie, "$k=\"$value\"";
408 root 1.10 }
409     }
410     }
411    
412     $hdr{cookie} = join "; ", @cookie
413     if @cookie;
414     }
415 root 1.1
416 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
417 root 1.2
418 root 1.10 if ($proxy) {
419 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
420 root 1.31
421 root 1.47 $rscheme = "http" unless defined $rscheme;
422    
423 root 1.31 # don't support https requests over https-proxy transport,
424 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
425 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
426 root 1.10 } else {
427 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
428 root 1.2 }
429    
430 root 1.47 # leave out fragment and query string, just a heuristic
431     $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
432     $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
433 root 1.41
434 root 1.53 $hdr{"content-length"} = length $arg{body}
435     if length $arg{body} || $method ne "GET";
436 root 1.1
437 root 1.11 my %state = (connect_guard => 1);
438    
439     _get_slot $uhost, sub {
440     $state{slot_guard} = shift;
441 root 1.1
442 root 1.11 return unless $state{connect_guard};
443 root 1.1
444 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
445     $state{fh} = shift
446 root 1.44 or do {
447     my $err = "$!";
448     %state = ();
449     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
450     };
451    
452 root 1.34 pop; # free memory, save a tree
453 root 1.11
454 root 1.34 return unless delete $state{connect_guard};
455 root 1.11
456     # get handle
457     $state{handle} = new AnyEvent::Handle
458 root 1.40 fh => $state{fh},
459     timeout => $timeout,
460     peername => $rhost,
461     tls_ctx => $arg{tls_ctx};
462 root 1.11
463     # limit the number of persistent connections
464 root 1.34 # keepalive not yet supported
465 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
466     ++$KA_COUNT{$_[1]};
467 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
468     --$KA_COUNT{$_[1]}
469     };
470 root 1.11 $hdr{connection} = "keep-alive";
471     } else {
472     delete $hdr{connection};
473     }
474 root 1.1
475 root 1.11 # (re-)configure handle
476     $state{handle}->on_error (sub {
477     %state = ();
478 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
479 root 1.11 });
480     $state{handle}->on_eof (sub {
481     %state = ();
482 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
483 root 1.11 });
484 root 1.1
485 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
486    
487     # handle actual, non-tunneled, request
488     my $handle_actual_request = sub {
489 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
490 root 1.31
491     # send request
492     $state{handle}->push_write (
493     "$method $rpath HTTP/1.0\015\012"
494 root 1.47 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
495 root 1.31 . "\015\012"
496     . (delete $arg{body})
497 root 1.11 );
498    
499 root 1.31 %hdr = (); # reduce memory usage, save a kitten
500 root 1.10
501 root 1.31 # status line
502 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
503 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
504 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
505 root 1.31
506     my %hdr = ( # response headers
507     HTTPVersion => ",$1",
508     Status => ",$2",
509     Reason => ",$3",
510     URL => ",$url"
511     );
512    
513     # headers, could be optimized a bit
514 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
515 root 1.46 for ("$_[1]") {
516 root 1.31 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
517    
518 root 1.40 # things seen, not parsed:
519     # p3pP="NON CUR OTPi OUR NOR UNI"
520    
521 root 1.31 $hdr{lc $1} .= ",$2"
522     while /\G
523 root 1.43 ([^:\000-\037]*):
524 root 1.31 [\011\040]*
525     ((?: [^\012]+ | \012[\011\040] )*)
526     \012
527     /gxc;
528    
529     /\G$/
530 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
531 root 1.31 }
532    
533     substr $_, 0, 1, ""
534     for values %hdr;
535    
536 root 1.41 # redirect handling
537     # microsoft and other shitheads don't give a shit for following standards,
538     # try to support some common forms of broken Location headers.
539     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
540     $hdr{location} =~ s/^\.\/+//;
541    
542     my $url = "$rscheme://$uhost:$uport";
543    
544     unless ($hdr{location} =~ s/^\///) {
545     $url .= $upath;
546     $url =~ s/\/[^\/]*$//;
547     }
548    
549     $hdr{location} = "$url/$hdr{location}";
550     }
551    
552     my $redirect;
553    
554     if ($recurse) {
555     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
556     # apparently, mozilla et al. just change POST to GET here
557     # more research is needed before we do the same
558     $redirect = 1;
559     } elsif ($hdr{Status} == 303) {
560     # even http/1.1 is unclear on how to mutate the method
561     $method = "GET" unless $method eq "HEAD";
562     $redirect = 1;
563     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
564     $redirect = 1;
565     }
566     }
567    
568 root 1.31 my $finish = sub {
569 root 1.41 $state{handle}->destroy if $state{handle};
570 root 1.31 %state = ();
571    
572     # set-cookie processing
573     if ($arg{cookie_jar}) {
574 root 1.41 for ($_[1]{"set-cookie"}) {
575 root 1.31 # parse NAME=VALUE
576     my @kv;
577    
578     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
579     my $name = $1;
580     my $value = $3;
581    
582     unless ($value) {
583     $value = $2;
584     $value =~ s/\\(.)/$1/gs;
585     }
586    
587     push @kv, $name => $value;
588    
589     last unless /\G\s*;/gc;
590     }
591    
592     last unless @kv;
593 root 1.10
594 root 1.31 my $name = shift @kv;
595     my %kv = (value => shift @kv, @kv);
596 root 1.11
597 root 1.31 my $cdom;
598     my $cpath = (delete $kv{path}) || "/";
599 root 1.10
600 root 1.31 if (exists $kv{domain}) {
601     $cdom = delete $kv{domain};
602    
603     $cdom =~ s/^\.?/./; # make sure it starts with a "."
604 root 1.11
605 root 1.31 next if $cdom =~ /\.$/;
606    
607     # this is not rfc-like and not netscape-like. go figure.
608     my $ndots = $cdom =~ y/.//;
609     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
610     } else {
611     $cdom = $uhost;
612     }
613 root 1.30
614 root 1.31 # store it
615     $arg{cookie_jar}{version} = 1;
616     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
617    
618     redo if /\G\s*,/gc;
619 root 1.30 }
620 root 1.11 }
621 root 1.8
622 root 1.47 if ($redirect && exists $hdr{location}) {
623 root 1.41 # we ignore any errors, as it is very common to receive
624     # Content-Length != 0 but no actual body
625     # we also access %hdr, as $_[1] might be an erro
626     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
627 root 1.31 } else {
628     $cb->($_[0], $_[1]);
629     }
630     };
631 root 1.24
632 root 1.41 my $len = $hdr{"content-length"};
633    
634     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
635     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
636     } elsif (
637 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
638 root 1.41 or $method eq "HEAD"
639     or (defined $len && !$len)
640     ) {
641     # no body
642     $finish->("", \%hdr);
643 root 1.11 } else {
644 root 1.41 # body handling, four different code paths
645     # for want_body_handle, on_body (2x), normal (2x)
646     # we might read too much here, but it does not matter yet (no pers. connections)
647     if (!$redirect && $arg{want_body_handle}) {
648     $_[0]->on_eof (undef);
649     $_[0]->on_error (undef);
650     $_[0]->on_read (undef);
651    
652     $finish->(delete $state{handle}, \%hdr);
653 root 1.31
654 root 1.41 } elsif ($arg{on_body}) {
655     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
656     if ($len) {
657     $_[0]->on_eof (undef);
658     $_[0]->on_read (sub {
659     $len -= length $_[0]{rbuf};
660    
661     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
662     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
663    
664     $len > 0
665     or $finish->("", \%hdr);
666     });
667     } else {
668     $_[0]->on_eof (sub {
669     $finish->("", \%hdr);
670     });
671     $_[0]->on_read (sub {
672     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
673     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
674     });
675     }
676 root 1.31 } else {
677     $_[0]->on_eof (undef);
678 root 1.41
679     if ($len) {
680     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
681     $_[0]->on_read (sub {
682     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
683     if $len <= length $_[0]{rbuf};
684     });
685     } else {
686     $_[0]->on_error (sub {
687 root 1.43 $! == Errno::EPIPE || !$!
688 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
689     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
690     });
691     $_[0]->on_read (sub { });
692     }
693 root 1.31 }
694 root 1.11 }
695 root 1.31 });
696     });
697     };
698 root 1.3
699 root 1.31 # now handle proxy-CONNECT method
700     if ($proxy && $uscheme eq "https") {
701     # oh dear, we have to wrap it into a connect request
702    
703     # maybe re-use $uauthority with patched port?
704     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
705 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
706 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
707 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
708 root 1.31
709     if ($2 == 200) {
710     $rpath = $upath;
711     &$handle_actual_request;
712 root 1.3 } else {
713 root 1.31 %state = ();
714     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
715 root 1.3 }
716 root 1.11 });
717 root 1.31 } else {
718     &$handle_actual_request;
719     }
720    
721 root 1.50 }, $arg{on_prepare} || sub { $timeout };
722 root 1.1 };
723    
724     defined wantarray && AnyEvent::Util::guard { %state = () }
725     }
726    
727 elmex 1.15 sub http_get($@) {
728 root 1.1 unshift @_, "GET";
729     &http_request
730     }
731    
732 elmex 1.15 sub http_head($@) {
733 root 1.4 unshift @_, "HEAD";
734     &http_request
735     }
736    
737 elmex 1.15 sub http_post($$@) {
738 root 1.22 my $url = shift;
739     unshift @_, "POST", $url, "body";
740 root 1.3 &http_request
741     }
742    
743 root 1.9 =back
744    
745 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
746 root 1.1
747     =over 4
748    
749 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
750    
751     Sets the default proxy server to use. The proxy-url must begin with a
752 root 1.52 string of the form C<http://host:port> (optionally C<https:...>), croaks
753     otherwise.
754    
755     To clear an already-set proxy, use C<undef>.
756 root 1.2
757 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
758 root 1.1
759 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
760 root 1.1
761     =item $AnyEvent::HTTP::USERAGENT
762    
763     The default value for the C<User-Agent> header (the default is
764 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
765 root 1.1
766 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
767 root 1.1
768 root 1.47 The maximum number of concurrent connections to the same host (identified
769 root 1.43 by the hostname). If the limit is exceeded, then the additional requests
770     are queued until previous connections are closed.
771 root 1.1
772 root 1.43 The default value for this is C<4>, and it is highly advisable to not
773     increase it.
774 root 1.3
775 root 1.14 =item $AnyEvent::HTTP::ACTIVE
776    
777     The number of active connections. This is not the number of currently
778     running requests, but the number of currently open and non-idle TCP
779     connections. This number of can be useful for load-leveling.
780    
781 root 1.1 =back
782    
783     =cut
784    
785 root 1.2 sub set_proxy($) {
786 root 1.52 if (length $_[0]) {
787     $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
788     or Carp::croak "$_[0]: invalid proxy URL";
789     $PROXY = [$2, $3 || 3128, $1]
790     } else {
791     undef $PROXY;
792     }
793 root 1.2 }
794    
795     # initialise proxy from environment
796 root 1.52 eval {
797     set_proxy $ENV{http_proxy};
798     };
799 root 1.2
800 root 1.1 =head1 SEE ALSO
801    
802     L<AnyEvent>.
803    
804     =head1 AUTHOR
805    
806 root 1.18 Marc Lehmann <schmorp@schmorp.de>
807     http://home.schmorp.de/
808 root 1.1
809 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
810     testcases and bugreports.
811    
812 root 1.1 =cut
813    
814     1
815