ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.51
Committed: Fri Aug 14 15:21:33 2009 UTC (14 years, 9 months ago) by root
Branch: MAIN
CVS Tags: rel-1_43
Changes since 1.50: +15 -3 lines
Log Message:
1.43

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.51 use AnyEvent 5.0 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.51 our $VERSION = '1.43';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143 root 1.47 and will provide defaults for C<User-Agent:> and C<Referer:> (this can be
144     suppressed by using C<undef> for these headers in which case they won't be
145 root 1.49 sent at all).
146 root 1.1
147     =item timeout => $seconds
148    
149     The time-out to use for various stages - each connect attempt will reset
150 root 1.51 the timeout, as will read or write activity, i.e. this is not an overall
151     timeout.
152    
153     Default timeout is 5 minutes.
154 root 1.2
155     =item proxy => [$host, $port[, $scheme]] or undef
156    
157     Use the given http proxy for all requests. If not specified, then the
158     default proxy (as specified by C<$ENV{http_proxy}>) is used.
159    
160 root 1.47 C<$scheme> must be either missing, C<http> for HTTP or C<https> for
161 root 1.2 HTTPS.
162 root 1.1
163 root 1.3 =item body => $string
164    
165     The request body, usually empty. Will be-sent as-is (future versions of
166     this module might offer more options).
167    
168 root 1.10 =item cookie_jar => $hash_ref
169    
170     Passing this parameter enables (simplified) cookie-processing, loosely
171     based on the original netscape specification.
172    
173     The C<$hash_ref> must be an (initially empty) hash reference which will
174     get updated automatically. It is possible to save the cookie_jar to
175     persistent storage with something like JSON or Storable, but this is not
176 root 1.40 recommended, as expiry times are currently being ignored.
177 root 1.10
178     Note that this cookie implementation is not of very high quality, nor
179     meant to be complete. If you want complete cookie management you have to
180     do that on your own. C<cookie_jar> is meant as a quick fix to get some
181     cookie-using sites working. Cookies are a privacy disaster, do not use
182     them unless required to.
183    
184 root 1.40 =item tls_ctx => $scheme | $tls_ctx
185    
186     Specifies the AnyEvent::TLS context to be used for https connections. This
187     parameter follows the same rules as the C<tls_ctx> parameter to
188     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
189     C<high> can be specified, which give you a predefined low-security (no
190     verification, highest compatibility) and high-security (CA and common-name
191     verification) TLS context.
192    
193     The default for this option is C<low>, which could be interpreted as "give
194     me the page, no matter what".
195    
196 root 1.51 =item on_prepare => $callback->($fh)
197    
198     In rare cases you need to "tune" the socket before it is used to
199     connect (for exmaple, to bind it on a given IP address). This parameter
200     overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
201     and behaves exactly the same way (e.g. it has to provide a
202     timeout). See the description for the C<$prepare_cb> argument of
203     C<AnyEvent::Socket::tcp_connect> for details.
204    
205 root 1.42 =item on_header => $callback->($headers)
206 root 1.41
207     When specified, this callback will be called with the header hash as soon
208     as headers have been successfully received from the remote server (not on
209     locally-generated errors).
210    
211     It has to return either true (in which case AnyEvent::HTTP will continue),
212     or false, in which case AnyEvent::HTTP will cancel the download (and call
213     the finish callback with an error code of C<598>).
214    
215     This callback is useful, among other things, to quickly reject unwanted
216     content, which, if it is supposed to be rare, can be faster than first
217     doing a C<HEAD> request.
218    
219 root 1.42 Example: cancel the request unless the content-type is "text/html".
220 root 1.41
221 root 1.42 on_header => sub {
222     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
223     },
224 root 1.41
225 root 1.42 =item on_body => $callback->($partial_body, $headers)
226 root 1.41
227 root 1.42 When specified, all body data will be passed to this callback instead of
228     to the completion callback. The completion callback will get the empty
229     string instead of the body data.
230 root 1.41
231 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
232     or false, in which case AnyEvent::HTTP will cancel the download (and call
233     the completion callback with an error code of C<598>).
234    
235     This callback is useful when the data is too large to be held in memory
236     (so the callback writes it to a file) or when only some information should
237     be extracted, or when the body should be processed incrementally.
238 root 1.41
239     It is usually preferred over doing your own body handling via
240 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
241     only used to create a connection, C<want_body_handle> is the better
242     alternative, as it allows you to install your own event handler, reducing
243     resource usage.
244 root 1.41
245     =item want_body_handle => $enable
246    
247     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
248     changes considerably: after parsing the headers, and instead of
249     downloading the body (if any), the completion callback will be
250     called. Instead of the C<$body> argument containing the body data, the
251     callback will receive the L<AnyEvent::Handle> object associated with the
252     connection. In error cases, C<undef> will be passed. When there is no body
253     (e.g. status C<304>), the empty string will be passed.
254    
255     The handle object might or might not be in TLS mode, might be connected to
256     a proxy, be a persistent connection etc., and configured in unspecified
257     ways. The user is responsible for this handle (it will not be used by this
258     module anymore).
259    
260     This is useful with some push-type services, where, after the initial
261     headers, an interactive protocol is used (typical example would be the
262     push-style twitter API which starts a JSON/XML stream).
263    
264     If you think you need this, first have a look at C<on_body>, to see if
265 root 1.45 that doesn't solve your problem in a better way.
266 root 1.41
267 root 1.1 =back
268    
269 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
270    
271     http_request GET => "http://www.nethype.de/", sub {
272     my ($body, $hdr) = @_;
273     print "$body\n";
274     };
275    
276     Example: make a HTTP HEAD request on https://www.google.com/, use a
277     timeout of 30 seconds.
278    
279     http_request
280     GET => "https://www.google.com",
281     timeout => 30,
282     sub {
283     my ($body, $hdr) = @_;
284     use Data::Dumper;
285     print Dumper $hdr;
286     }
287     ;
288 root 1.1
289 root 1.29 Example: make another simple HTTP GET request, but immediately try to
290     cancel it.
291    
292     my $request = http_request GET => "http://www.nethype.de/", sub {
293     my ($body, $hdr) = @_;
294     print "$body\n";
295     };
296    
297     undef $request;
298    
299 root 1.1 =cut
300    
301 root 1.12 sub _slot_schedule;
302 root 1.11 sub _slot_schedule($) {
303     my $host = shift;
304    
305     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
306     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
307 root 1.12 # somebody wants that slot
308 root 1.11 ++$CO_SLOT{$host}[0];
309 root 1.14 ++$ACTIVE;
310 root 1.11
311     $cb->(AnyEvent::Util::guard {
312 root 1.14 --$ACTIVE;
313 root 1.11 --$CO_SLOT{$host}[0];
314     _slot_schedule $host;
315     });
316     } else {
317     # nobody wants the slot, maybe we can forget about it
318     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
319     last;
320     }
321     }
322     }
323    
324     # wait for a free slot on host, call callback
325     sub _get_slot($$) {
326     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
327    
328     _slot_schedule $_[0];
329     }
330    
331 root 1.46 our $qr_nl = qr{\015?\012};
332     our $qr_nlnl = qr{(?<![^\012])\015?\012};
333 root 1.34
334 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
335     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
336 root 1.40
337 elmex 1.15 sub http_request($$@) {
338 root 1.1 my $cb = pop;
339     my ($method, $url, %arg) = @_;
340    
341     my %hdr;
342    
343 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
344     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
345    
346 root 1.3 $method = uc $method;
347    
348 root 1.8 if (my $hdr = $arg{headers}) {
349 root 1.1 while (my ($k, $v) = each %$hdr) {
350     $hdr{lc $k} = $v;
351     }
352     }
353    
354 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
355 root 1.8
356 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
357 root 1.8 if $recurse < 0;
358    
359 root 1.2 my $proxy = $arg{proxy} || $PROXY;
360 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
361    
362 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
363 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
364 root 1.2
365 root 1.31 $uscheme = lc $uscheme;
366 root 1.1
367 root 1.31 my $uport = $uscheme eq "http" ? 80
368     : $uscheme eq "https" ? 443
369 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
370 root 1.13
371 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
372 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
373 root 1.10
374     my $uhost = $1;
375     $uport = $2 if defined $2;
376    
377 root 1.43 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost";
378    
379 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
380     $upath .= "?$query" if length $query;
381    
382     $upath =~ s%^/?%/%;
383    
384     # cookie processing
385     if (my $jar = $arg{cookie_jar}) {
386 root 1.31 %$jar = () if $jar->{version} != 1;
387 root 1.10
388     my @cookie;
389    
390     while (my ($chost, $v) = each %$jar) {
391 root 1.30 if ($chost =~ /^\./) {
392     next unless $chost eq substr $uhost, -length $chost;
393     } elsif ($chost =~ /\./) {
394     next unless $chost eq $uhost;
395     } else {
396     next;
397     }
398 root 1.10
399     while (my ($cpath, $v) = each %$v) {
400     next unless $cpath eq substr $upath, 0, length $cpath;
401    
402     while (my ($k, $v) = each %$v) {
403 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
404     my $value = $v->{value};
405     $value =~ s/([\\"])/\\$1/g;
406     push @cookie, "$k=\"$value\"";
407 root 1.10 }
408     }
409     }
410    
411     $hdr{cookie} = join "; ", @cookie
412     if @cookie;
413     }
414 root 1.1
415 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
416 root 1.2
417 root 1.10 if ($proxy) {
418 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
419 root 1.31
420 root 1.47 $rscheme = "http" unless defined $rscheme;
421    
422 root 1.31 # don't support https requests over https-proxy transport,
423 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
424 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
425 root 1.10 } else {
426 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
427 root 1.2 }
428    
429 root 1.47 # leave out fragment and query string, just a heuristic
430     $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
431     $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
432 root 1.41
433 root 1.10 $hdr{"content-length"} = length $arg{body};
434 root 1.1
435 root 1.11 my %state = (connect_guard => 1);
436    
437     _get_slot $uhost, sub {
438     $state{slot_guard} = shift;
439 root 1.1
440 root 1.11 return unless $state{connect_guard};
441 root 1.1
442 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
443     $state{fh} = shift
444 root 1.44 or do {
445     my $err = "$!";
446     %state = ();
447     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
448     };
449    
450 root 1.34 pop; # free memory, save a tree
451 root 1.11
452 root 1.34 return unless delete $state{connect_guard};
453 root 1.11
454     # get handle
455     $state{handle} = new AnyEvent::Handle
456 root 1.40 fh => $state{fh},
457     timeout => $timeout,
458     peername => $rhost,
459     tls_ctx => $arg{tls_ctx};
460 root 1.11
461     # limit the number of persistent connections
462 root 1.34 # keepalive not yet supported
463 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
464     ++$KA_COUNT{$_[1]};
465 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
466     --$KA_COUNT{$_[1]}
467     };
468 root 1.11 $hdr{connection} = "keep-alive";
469     } else {
470     delete $hdr{connection};
471     }
472 root 1.1
473 root 1.11 # (re-)configure handle
474     $state{handle}->on_error (sub {
475     %state = ();
476 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
477 root 1.11 });
478     $state{handle}->on_eof (sub {
479     %state = ();
480 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
481 root 1.11 });
482 root 1.1
483 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
484    
485     # handle actual, non-tunneled, request
486     my $handle_actual_request = sub {
487 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
488 root 1.31
489     # send request
490     $state{handle}->push_write (
491     "$method $rpath HTTP/1.0\015\012"
492 root 1.47 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
493 root 1.31 . "\015\012"
494     . (delete $arg{body})
495 root 1.11 );
496    
497 root 1.31 %hdr = (); # reduce memory usage, save a kitten
498 root 1.10
499 root 1.31 # status line
500 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
501 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
502 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
503 root 1.31
504     my %hdr = ( # response headers
505     HTTPVersion => ",$1",
506     Status => ",$2",
507     Reason => ",$3",
508     URL => ",$url"
509     );
510    
511     # headers, could be optimized a bit
512 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
513 root 1.46 for ("$_[1]") {
514 root 1.31 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
515    
516 root 1.40 # things seen, not parsed:
517     # p3pP="NON CUR OTPi OUR NOR UNI"
518    
519 root 1.31 $hdr{lc $1} .= ",$2"
520     while /\G
521 root 1.43 ([^:\000-\037]*):
522 root 1.31 [\011\040]*
523     ((?: [^\012]+ | \012[\011\040] )*)
524     \012
525     /gxc;
526    
527     /\G$/
528 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
529 root 1.31 }
530    
531     substr $_, 0, 1, ""
532     for values %hdr;
533    
534 root 1.41 # redirect handling
535     # microsoft and other shitheads don't give a shit for following standards,
536     # try to support some common forms of broken Location headers.
537     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
538     $hdr{location} =~ s/^\.\/+//;
539    
540     my $url = "$rscheme://$uhost:$uport";
541    
542     unless ($hdr{location} =~ s/^\///) {
543     $url .= $upath;
544     $url =~ s/\/[^\/]*$//;
545     }
546    
547     $hdr{location} = "$url/$hdr{location}";
548     }
549    
550     my $redirect;
551    
552     if ($recurse) {
553     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
554     # apparently, mozilla et al. just change POST to GET here
555     # more research is needed before we do the same
556     $redirect = 1;
557     } elsif ($hdr{Status} == 303) {
558     # even http/1.1 is unclear on how to mutate the method
559     $method = "GET" unless $method eq "HEAD";
560     $redirect = 1;
561     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
562     $redirect = 1;
563     }
564     }
565    
566 root 1.31 my $finish = sub {
567 root 1.41 $state{handle}->destroy if $state{handle};
568 root 1.31 %state = ();
569    
570     # set-cookie processing
571     if ($arg{cookie_jar}) {
572 root 1.41 for ($_[1]{"set-cookie"}) {
573 root 1.31 # parse NAME=VALUE
574     my @kv;
575    
576     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
577     my $name = $1;
578     my $value = $3;
579    
580     unless ($value) {
581     $value = $2;
582     $value =~ s/\\(.)/$1/gs;
583     }
584    
585     push @kv, $name => $value;
586    
587     last unless /\G\s*;/gc;
588     }
589    
590     last unless @kv;
591 root 1.10
592 root 1.31 my $name = shift @kv;
593     my %kv = (value => shift @kv, @kv);
594 root 1.11
595 root 1.31 my $cdom;
596     my $cpath = (delete $kv{path}) || "/";
597 root 1.10
598 root 1.31 if (exists $kv{domain}) {
599     $cdom = delete $kv{domain};
600    
601     $cdom =~ s/^\.?/./; # make sure it starts with a "."
602 root 1.11
603 root 1.31 next if $cdom =~ /\.$/;
604    
605     # this is not rfc-like and not netscape-like. go figure.
606     my $ndots = $cdom =~ y/.//;
607     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
608     } else {
609     $cdom = $uhost;
610     }
611 root 1.30
612 root 1.31 # store it
613     $arg{cookie_jar}{version} = 1;
614     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
615    
616     redo if /\G\s*,/gc;
617 root 1.30 }
618 root 1.11 }
619 root 1.8
620 root 1.47 if ($redirect && exists $hdr{location}) {
621 root 1.41 # we ignore any errors, as it is very common to receive
622     # Content-Length != 0 but no actual body
623     # we also access %hdr, as $_[1] might be an erro
624     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
625 root 1.31 } else {
626     $cb->($_[0], $_[1]);
627     }
628     };
629 root 1.24
630 root 1.41 my $len = $hdr{"content-length"};
631    
632     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
633     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
634     } elsif (
635 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
636 root 1.41 or $method eq "HEAD"
637     or (defined $len && !$len)
638     ) {
639     # no body
640     $finish->("", \%hdr);
641 root 1.11 } else {
642 root 1.41 # body handling, four different code paths
643     # for want_body_handle, on_body (2x), normal (2x)
644     # we might read too much here, but it does not matter yet (no pers. connections)
645     if (!$redirect && $arg{want_body_handle}) {
646     $_[0]->on_eof (undef);
647     $_[0]->on_error (undef);
648     $_[0]->on_read (undef);
649    
650     $finish->(delete $state{handle}, \%hdr);
651 root 1.31
652 root 1.41 } elsif ($arg{on_body}) {
653     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
654     if ($len) {
655     $_[0]->on_eof (undef);
656     $_[0]->on_read (sub {
657     $len -= length $_[0]{rbuf};
658    
659     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
660     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
661    
662     $len > 0
663     or $finish->("", \%hdr);
664     });
665     } else {
666     $_[0]->on_eof (sub {
667     $finish->("", \%hdr);
668     });
669     $_[0]->on_read (sub {
670     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
671     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
672     });
673     }
674 root 1.31 } else {
675     $_[0]->on_eof (undef);
676 root 1.41
677     if ($len) {
678     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
679     $_[0]->on_read (sub {
680     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
681     if $len <= length $_[0]{rbuf};
682     });
683     } else {
684     $_[0]->on_error (sub {
685 root 1.43 $! == Errno::EPIPE || !$!
686 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
687     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
688     });
689     $_[0]->on_read (sub { });
690     }
691 root 1.31 }
692 root 1.11 }
693 root 1.31 });
694     });
695     };
696 root 1.3
697 root 1.31 # now handle proxy-CONNECT method
698     if ($proxy && $uscheme eq "https") {
699     # oh dear, we have to wrap it into a connect request
700    
701     # maybe re-use $uauthority with patched port?
702     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
703 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
704 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
705 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
706 root 1.31
707     if ($2 == 200) {
708     $rpath = $upath;
709     &$handle_actual_request;
710 root 1.3 } else {
711 root 1.31 %state = ();
712     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
713 root 1.3 }
714 root 1.11 });
715 root 1.31 } else {
716     &$handle_actual_request;
717     }
718    
719 root 1.50 }, $arg{on_prepare} || sub { $timeout };
720 root 1.1 };
721    
722     defined wantarray && AnyEvent::Util::guard { %state = () }
723     }
724    
725 elmex 1.15 sub http_get($@) {
726 root 1.1 unshift @_, "GET";
727     &http_request
728     }
729    
730 elmex 1.15 sub http_head($@) {
731 root 1.4 unshift @_, "HEAD";
732     &http_request
733     }
734    
735 elmex 1.15 sub http_post($$@) {
736 root 1.22 my $url = shift;
737     unshift @_, "POST", $url, "body";
738 root 1.3 &http_request
739     }
740    
741 root 1.9 =back
742    
743 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
744 root 1.1
745     =over 4
746    
747 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
748    
749     Sets the default proxy server to use. The proxy-url must begin with a
750     string of the form C<http://host:port> (optionally C<https:...>).
751    
752 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
753 root 1.1
754 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
755 root 1.1
756     =item $AnyEvent::HTTP::USERAGENT
757    
758     The default value for the C<User-Agent> header (the default is
759 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
760 root 1.1
761 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
762 root 1.1
763 root 1.47 The maximum number of concurrent connections to the same host (identified
764 root 1.43 by the hostname). If the limit is exceeded, then the additional requests
765     are queued until previous connections are closed.
766 root 1.1
767 root 1.43 The default value for this is C<4>, and it is highly advisable to not
768     increase it.
769 root 1.3
770 root 1.14 =item $AnyEvent::HTTP::ACTIVE
771    
772     The number of active connections. This is not the number of currently
773     running requests, but the number of currently open and non-idle TCP
774     connections. This number of can be useful for load-leveling.
775    
776 root 1.1 =back
777    
778     =cut
779    
780 root 1.2 sub set_proxy($) {
781     $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
782     }
783    
784     # initialise proxy from environment
785     set_proxy $ENV{http_proxy};
786    
787 root 1.1 =head1 SEE ALSO
788    
789     L<AnyEvent>.
790    
791     =head1 AUTHOR
792    
793 root 1.18 Marc Lehmann <schmorp@schmorp.de>
794     http://home.schmorp.de/
795 root 1.1
796 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
797     testcases and bugreports.
798    
799 root 1.1 =cut
800    
801     1
802