ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.57
Committed: Mon Sep 6 06:31:32 2010 UTC (13 years, 8 months ago) by root
Branch: MAIN
CVS Tags: rel-1_46
Changes since 1.56: +174 -177 lines
Log Message:
1.46

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.51 use AnyEvent 5.0 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.57 our $VERSION = '1.46';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.55 headers, the "pseudo-headers" (uppercase to avoid clashing with possible
105     response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
106     three parts of the HTTP Status-Line of the same name.
107    
108     The pseudo-header C<URL> contains the actual URL (which can differ from
109     the requested URL when following redirects - for example, you might get
110     an error that your URL scheme is not supported even though your URL is a
111     valid http URL because it redirected to an ftp URL, in which case you can
112     look at the URL pseudo header).
113    
114     The pseudo-header C<Redirect> only exists when the request was a result
115     of an internal redirect. In that case it is an array reference with
116     the C<($data, $headers)> from the redirect response. Note that this
117     response could in turn be the result of a redirect itself, and C<<
118     $headers->{Redirect}[1]{Redirect} >> will then contain the original
119     response, and so on.
120 root 1.20
121 root 1.32 If the server sends a header multiple times, then their contents will be
122     joined together with a comma (C<,>), as per the HTTP spec.
123 root 1.2
124     If an internal error occurs, such as not being able to resolve a hostname,
125 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
126     (usually C<599>) and the C<Reason> pseudo-header will contain an error
127     message.
128 root 1.2
129 root 1.6 A typical callback might look like this:
130    
131     sub {
132     my ($body, $hdr) = @_;
133    
134     if ($hdr->{Status} =~ /^2/) {
135     ... everything should be ok
136     } else {
137     print "error, $hdr->{Status} $hdr->{Reason}\n";
138     }
139     }
140    
141 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
142     include:
143    
144     =over 4
145    
146 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
147 root 1.1
148     Whether to recurse requests or not, e.g. on redirects, authentication
149 root 1.3 retries and so on, and how often to do so.
150 root 1.1
151     =item headers => hashref
152    
153 root 1.12 The request headers to use. Currently, C<http_request> may provide its
154     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
155 root 1.47 and will provide defaults for C<User-Agent:> and C<Referer:> (this can be
156     suppressed by using C<undef> for these headers in which case they won't be
157 root 1.49 sent at all).
158 root 1.1
159     =item timeout => $seconds
160    
161     The time-out to use for various stages - each connect attempt will reset
162 root 1.51 the timeout, as will read or write activity, i.e. this is not an overall
163     timeout.
164    
165     Default timeout is 5 minutes.
166 root 1.2
167     =item proxy => [$host, $port[, $scheme]] or undef
168    
169     Use the given http proxy for all requests. If not specified, then the
170     default proxy (as specified by C<$ENV{http_proxy}>) is used.
171    
172 root 1.47 C<$scheme> must be either missing, C<http> for HTTP or C<https> for
173 root 1.2 HTTPS.
174 root 1.1
175 root 1.3 =item body => $string
176    
177     The request body, usually empty. Will be-sent as-is (future versions of
178     this module might offer more options).
179    
180 root 1.10 =item cookie_jar => $hash_ref
181    
182     Passing this parameter enables (simplified) cookie-processing, loosely
183     based on the original netscape specification.
184    
185     The C<$hash_ref> must be an (initially empty) hash reference which will
186     get updated automatically. It is possible to save the cookie_jar to
187     persistent storage with something like JSON or Storable, but this is not
188 root 1.40 recommended, as expiry times are currently being ignored.
189 root 1.10
190     Note that this cookie implementation is not of very high quality, nor
191     meant to be complete. If you want complete cookie management you have to
192     do that on your own. C<cookie_jar> is meant as a quick fix to get some
193     cookie-using sites working. Cookies are a privacy disaster, do not use
194     them unless required to.
195    
196 root 1.40 =item tls_ctx => $scheme | $tls_ctx
197    
198     Specifies the AnyEvent::TLS context to be used for https connections. This
199     parameter follows the same rules as the C<tls_ctx> parameter to
200     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
201     C<high> can be specified, which give you a predefined low-security (no
202     verification, highest compatibility) and high-security (CA and common-name
203     verification) TLS context.
204    
205     The default for this option is C<low>, which could be interpreted as "give
206     me the page, no matter what".
207    
208 root 1.51 =item on_prepare => $callback->($fh)
209    
210     In rare cases you need to "tune" the socket before it is used to
211     connect (for exmaple, to bind it on a given IP address). This parameter
212     overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
213     and behaves exactly the same way (e.g. it has to provide a
214     timeout). See the description for the C<$prepare_cb> argument of
215     C<AnyEvent::Socket::tcp_connect> for details.
216    
217 root 1.42 =item on_header => $callback->($headers)
218 root 1.41
219     When specified, this callback will be called with the header hash as soon
220     as headers have been successfully received from the remote server (not on
221     locally-generated errors).
222    
223     It has to return either true (in which case AnyEvent::HTTP will continue),
224     or false, in which case AnyEvent::HTTP will cancel the download (and call
225     the finish callback with an error code of C<598>).
226    
227     This callback is useful, among other things, to quickly reject unwanted
228     content, which, if it is supposed to be rare, can be faster than first
229     doing a C<HEAD> request.
230    
231 root 1.42 Example: cancel the request unless the content-type is "text/html".
232 root 1.41
233 root 1.42 on_header => sub {
234     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
235     },
236 root 1.41
237 root 1.42 =item on_body => $callback->($partial_body, $headers)
238 root 1.41
239 root 1.42 When specified, all body data will be passed to this callback instead of
240     to the completion callback. The completion callback will get the empty
241     string instead of the body data.
242 root 1.41
243 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
244     or false, in which case AnyEvent::HTTP will cancel the download (and call
245     the completion callback with an error code of C<598>).
246    
247     This callback is useful when the data is too large to be held in memory
248     (so the callback writes it to a file) or when only some information should
249     be extracted, or when the body should be processed incrementally.
250 root 1.41
251     It is usually preferred over doing your own body handling via
252 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
253     only used to create a connection, C<want_body_handle> is the better
254     alternative, as it allows you to install your own event handler, reducing
255     resource usage.
256 root 1.41
257     =item want_body_handle => $enable
258    
259     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
260     changes considerably: after parsing the headers, and instead of
261     downloading the body (if any), the completion callback will be
262     called. Instead of the C<$body> argument containing the body data, the
263     callback will receive the L<AnyEvent::Handle> object associated with the
264     connection. In error cases, C<undef> will be passed. When there is no body
265     (e.g. status C<304>), the empty string will be passed.
266    
267     The handle object might or might not be in TLS mode, might be connected to
268     a proxy, be a persistent connection etc., and configured in unspecified
269     ways. The user is responsible for this handle (it will not be used by this
270     module anymore).
271    
272     This is useful with some push-type services, where, after the initial
273     headers, an interactive protocol is used (typical example would be the
274     push-style twitter API which starts a JSON/XML stream).
275    
276     If you think you need this, first have a look at C<on_body>, to see if
277 root 1.45 that doesn't solve your problem in a better way.
278 root 1.41
279 root 1.1 =back
280    
281 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
282    
283     http_request GET => "http://www.nethype.de/", sub {
284     my ($body, $hdr) = @_;
285     print "$body\n";
286     };
287    
288     Example: make a HTTP HEAD request on https://www.google.com/, use a
289     timeout of 30 seconds.
290    
291     http_request
292     GET => "https://www.google.com",
293     timeout => 30,
294     sub {
295     my ($body, $hdr) = @_;
296     use Data::Dumper;
297     print Dumper $hdr;
298     }
299     ;
300 root 1.1
301 root 1.29 Example: make another simple HTTP GET request, but immediately try to
302     cancel it.
303    
304     my $request = http_request GET => "http://www.nethype.de/", sub {
305     my ($body, $hdr) = @_;
306     print "$body\n";
307     };
308    
309     undef $request;
310    
311 root 1.1 =cut
312    
313 root 1.12 sub _slot_schedule;
314 root 1.11 sub _slot_schedule($) {
315     my $host = shift;
316    
317     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
318     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
319 root 1.12 # somebody wants that slot
320 root 1.11 ++$CO_SLOT{$host}[0];
321 root 1.14 ++$ACTIVE;
322 root 1.11
323     $cb->(AnyEvent::Util::guard {
324 root 1.14 --$ACTIVE;
325 root 1.11 --$CO_SLOT{$host}[0];
326     _slot_schedule $host;
327     });
328     } else {
329     # nobody wants the slot, maybe we can forget about it
330     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
331     last;
332     }
333     }
334     }
335    
336     # wait for a free slot on host, call callback
337     sub _get_slot($$) {
338     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
339    
340     _slot_schedule $_[0];
341     }
342    
343 root 1.46 our $qr_nlnl = qr{(?<![^\012])\015?\012};
344 root 1.34
345 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
346     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
347 root 1.40
348 elmex 1.15 sub http_request($$@) {
349 root 1.1 my $cb = pop;
350     my ($method, $url, %arg) = @_;
351    
352     my %hdr;
353    
354 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
355     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
356    
357 root 1.3 $method = uc $method;
358    
359 root 1.8 if (my $hdr = $arg{headers}) {
360 root 1.1 while (my ($k, $v) = each %$hdr) {
361     $hdr{lc $k} = $v;
362     }
363     }
364    
365 root 1.55 # pseudo headers for all subsequent responses
366     my @pseudo = (URL => $url);
367     push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect};
368    
369 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
370 root 1.8
371 root 1.55 return $cb->(undef, { Status => 599, Reason => "Too many redirections", @pseudo })
372 root 1.8 if $recurse < 0;
373    
374 root 1.2 my $proxy = $arg{proxy} || $PROXY;
375 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
376    
377 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
378 root 1.56 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|;
379 root 1.2
380 root 1.31 $uscheme = lc $uscheme;
381 root 1.1
382 root 1.31 my $uport = $uscheme eq "http" ? 80
383     : $uscheme eq "https" ? 443
384 root 1.55 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", @pseudo });
385 root 1.13
386 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
387 root 1.55 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", @pseudo });
388 root 1.10
389     my $uhost = $1;
390     $uport = $2 if defined $2;
391    
392 root 1.53 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
393     unless exists $hdr{host};
394 root 1.43
395 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
396 root 1.56 $upath .= $query if length $query;
397 root 1.10
398     $upath =~ s%^/?%/%;
399    
400     # cookie processing
401     if (my $jar = $arg{cookie_jar}) {
402 root 1.31 %$jar = () if $jar->{version} != 1;
403 root 1.10
404     my @cookie;
405    
406     while (my ($chost, $v) = each %$jar) {
407 root 1.30 if ($chost =~ /^\./) {
408     next unless $chost eq substr $uhost, -length $chost;
409     } elsif ($chost =~ /\./) {
410     next unless $chost eq $uhost;
411     } else {
412     next;
413     }
414 root 1.10
415     while (my ($cpath, $v) = each %$v) {
416     next unless $cpath eq substr $upath, 0, length $cpath;
417    
418     while (my ($k, $v) = each %$v) {
419 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
420     my $value = $v->{value};
421     $value =~ s/([\\"])/\\$1/g;
422     push @cookie, "$k=\"$value\"";
423 root 1.10 }
424     }
425     }
426    
427     $hdr{cookie} = join "; ", @cookie
428     if @cookie;
429     }
430 root 1.1
431 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
432 root 1.2
433 root 1.10 if ($proxy) {
434 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
435 root 1.31
436 root 1.47 $rscheme = "http" unless defined $rscheme;
437    
438 root 1.31 # don't support https requests over https-proxy transport,
439 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
440 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
441 root 1.10 } else {
442 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
443 root 1.2 }
444    
445 root 1.47 # leave out fragment and query string, just a heuristic
446     $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
447     $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
448 root 1.41
449 root 1.53 $hdr{"content-length"} = length $arg{body}
450     if length $arg{body} || $method ne "GET";
451 root 1.1
452 root 1.11 my %state = (connect_guard => 1);
453    
454     _get_slot $uhost, sub {
455     $state{slot_guard} = shift;
456 root 1.1
457 root 1.11 return unless $state{connect_guard};
458 root 1.1
459 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
460     $state{fh} = shift
461 root 1.44 or do {
462     my $err = "$!";
463     %state = ();
464 root 1.55 return $cb->(undef, { Status => 599, Reason => $err, @pseudo });
465 root 1.44 };
466    
467 root 1.34 pop; # free memory, save a tree
468 root 1.11
469 root 1.34 return unless delete $state{connect_guard};
470 root 1.11
471     # get handle
472     $state{handle} = new AnyEvent::Handle
473 root 1.40 fh => $state{fh},
474 root 1.56 peername => $rhost,
475     tls_ctx => $arg{tls_ctx},
476     # these need to be reconfigured on keepalive handles
477 root 1.40 timeout => $timeout,
478 root 1.56 on_error => sub {
479     %state = ();
480     $cb->(undef, { Status => 599, Reason => $_[2], @pseudo });
481     },
482     on_eof => sub {
483     %state = ();
484     $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", @pseudo });
485     },
486     ;
487 root 1.11
488     # limit the number of persistent connections
489 root 1.34 # keepalive not yet supported
490 root 1.56 # if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
491     # ++$KA_COUNT{$_[1]};
492     # $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
493     # --$KA_COUNT{$_[1]}
494     # };
495     # $hdr{connection} = "keep-alive";
496     # } else {
497 root 1.11 delete $hdr{connection};
498 root 1.56 # }
499 root 1.1
500 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
501    
502     # handle actual, non-tunneled, request
503     my $handle_actual_request = sub {
504 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
505 root 1.31
506     # send request
507     $state{handle}->push_write (
508     "$method $rpath HTTP/1.0\015\012"
509 root 1.47 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
510 root 1.31 . "\015\012"
511     . (delete $arg{body})
512 root 1.11 );
513    
514 root 1.54 # return if error occured during push_write()
515     return unless %state;
516    
517 root 1.55 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
518 root 1.10
519 root 1.57 # status line and headers
520     $state{handle}->push_read (line => $qr_nlnl, sub {
521     for ("$_[1]") {
522     y/\015//d; # weed out any \015, as they show up in the weirdest of places.
523 root 1.31
524 root 1.57 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )? \015?\012/igxc
525     or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response", @pseudo }));
526 root 1.31
527 root 1.57 push @pseudo,
528     HTTPVersion => $1,
529     Status => $2,
530     Reason => $3,
531     ;
532    
533     # things seen, not parsed:
534     # p3pP="NON CUR OTPi OUR NOR UNI"
535    
536     $hdr{lc $1} .= ",$2"
537     while /\G
538     ([^:\000-\037]*):
539     [\011\040]*
540     ((?: [^\012]+ | \012[\011\040] )*)
541     \012
542     /gxc;
543 root 1.31
544 root 1.57 /\G$/
545     or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", @pseudo }));
546     }
547 root 1.41
548 root 1.57 # remove the "," prefix we added to all headers above
549     substr $_, 0, 1, ""
550     for values %hdr;
551    
552     # patch in all pseudo headers
553     %hdr = (%hdr, @pseudo);
554    
555     # redirect handling
556     # microsoft and other shitheads don't give a shit for following standards,
557     # try to support some common forms of broken Location headers.
558     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
559     $hdr{location} =~ s/^\.\/+//;
560    
561     my $url = "$rscheme://$uhost:$uport";
562    
563     unless ($hdr{location} =~ s/^\///) {
564     $url .= $upath;
565     $url =~ s/\/[^\/]*$//;
566 root 1.41 }
567    
568 root 1.57 $hdr{location} = "$url/$hdr{location}";
569     }
570    
571     my $redirect;
572 root 1.41
573 root 1.57 if ($recurse) {
574     my $status = $hdr{Status};
575 root 1.55
576 root 1.57 # industry standard is to redirect POST as GET for
577     # 301, 302 and 303, in contrast to http/1.0 and 1.1.
578     # also, the UA should ask the user for 301 and 307 and POST,
579     # industry standard seems to be to simply follow.
580     # we go with the industry standard.
581     if ($status == 301 or $status == 302 or $status == 303) {
582     # HTTP/1.1 is unclear on how to mutate the method
583     $method = "GET" unless $method eq "HEAD";
584     $redirect = 1;
585     } elsif ($status == 307) {
586     $redirect = 1;
587 root 1.41 }
588 root 1.57 }
589 root 1.41
590 root 1.57 my $finish = sub {
591     $state{handle}->destroy if $state{handle};
592     %state = ();
593 root 1.31
594 root 1.57 # set-cookie processing
595     if ($arg{cookie_jar}) {
596     for ($_[1]{"set-cookie"}) {
597     # parse NAME=VALUE
598     my @kv;
599    
600     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
601     my $name = $1;
602     my $value = $3;
603    
604     unless ($value) {
605     $value = $2;
606     $value =~ s/\\(.)/$1/gs;
607     }
608    
609     push @kv, $name => $value;
610 root 1.31
611 root 1.57 last unless /\G\s*;/gc;
612     }
613 root 1.31
614 root 1.57 last unless @kv;
615 root 1.10
616 root 1.57 my $name = shift @kv;
617     my %kv = (value => shift @kv, @kv);
618 root 1.11
619 root 1.57 my $cdom;
620     my $cpath = (delete $kv{path}) || "/";
621 root 1.10
622 root 1.57 if (exists $kv{domain}) {
623     $cdom = delete $kv{domain};
624    
625     $cdom =~ s/^\.?/./; # make sure it starts with a "."
626 root 1.11
627 root 1.57 next if $cdom =~ /\.$/;
628 root 1.30
629 root 1.57 # this is not rfc-like and not netscape-like. go figure.
630     my $ndots = $cdom =~ y/.//;
631     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
632     } else {
633     $cdom = $uhost;
634     }
635    
636     # store it
637     $arg{cookie_jar}{version} = 1;
638     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
639 root 1.31
640 root 1.57 redo if /\G\s*,/gc;
641 root 1.11 }
642 root 1.57 }
643 root 1.8
644 root 1.57 if ($redirect && exists $hdr{location}) {
645     # we ignore any errors, as it is very common to receive
646     # Content-Length != 0 but no actual body
647     # we also access %hdr, as $_[1] might be an erro
648     http_request (
649     $method => $hdr{location},
650     %arg,
651     recurse => $recurse - 1,
652     Redirect => \@_,
653     $cb);
654     } else {
655     $cb->($_[0], $_[1]);
656     }
657     };
658 root 1.24
659 root 1.57 my $len = $hdr{"content-length"};
660 root 1.41
661 root 1.57 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
662     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", @pseudo });
663     } elsif (
664     $hdr{Status} =~ /^(?:1..|[23]04)$/
665     or $method eq "HEAD"
666     or (defined $len && !$len)
667     ) {
668     # no body
669     $finish->("", \%hdr);
670     } else {
671     # body handling, four different code paths
672     # for want_body_handle, on_body (2x), normal (2x)
673     # we might read too much here, but it does not matter yet (no pers. connections)
674     if (!$redirect && $arg{want_body_handle}) {
675     $_[0]->on_eof (undef);
676     $_[0]->on_error (undef);
677     $_[0]->on_read (undef);
678    
679     $finish->(delete $state{handle}, \%hdr);
680    
681     } elsif ($arg{on_body}) {
682     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
683     if ($len) {
684 root 1.41 $_[0]->on_eof (undef);
685 root 1.57 $_[0]->on_read (sub {
686     $len -= length $_[0]{rbuf};
687 root 1.41
688 root 1.57 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
689     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
690    
691     $len > 0
692     or $finish->("", \%hdr);
693     });
694     } else {
695     $_[0]->on_eof (sub {
696     $finish->("", \%hdr);
697     });
698     $_[0]->on_read (sub {
699     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
700     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
701     });
702     }
703     } else {
704     $_[0]->on_eof (undef);
705 root 1.31
706 root 1.57 if ($len) {
707 root 1.55 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
708 root 1.57 $_[0]->on_read (sub {
709     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
710     if $len <= length $_[0]{rbuf};
711     });
712 root 1.31 } else {
713 root 1.57 $_[0]->on_error (sub {
714     ($! == Errno::EPIPE || !$!)
715     ? $finish->(delete $_[0]{rbuf}, \%hdr)
716     : $finish->(undef, { Status => 599, Reason => $_[2], @pseudo });
717     });
718     $_[0]->on_read (sub { });
719 root 1.31 }
720 root 1.11 }
721 root 1.57 }
722 root 1.31 });
723     };
724 root 1.3
725 root 1.31 # now handle proxy-CONNECT method
726     if ($proxy && $uscheme eq "https") {
727     # oh dear, we have to wrap it into a connect request
728    
729     # maybe re-use $uauthority with patched port?
730     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
731 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
732 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
733 root 1.55 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", @pseudo }));
734 root 1.31
735     if ($2 == 200) {
736     $rpath = $upath;
737     &$handle_actual_request;
738 root 1.3 } else {
739 root 1.31 %state = ();
740 root 1.55 $cb->(undef, { Status => $2, Reason => $3, @pseudo });
741 root 1.3 }
742 root 1.11 });
743 root 1.31 } else {
744     &$handle_actual_request;
745     }
746    
747 root 1.50 }, $arg{on_prepare} || sub { $timeout };
748 root 1.1 };
749    
750     defined wantarray && AnyEvent::Util::guard { %state = () }
751     }
752    
753 elmex 1.15 sub http_get($@) {
754 root 1.1 unshift @_, "GET";
755     &http_request
756     }
757    
758 elmex 1.15 sub http_head($@) {
759 root 1.4 unshift @_, "HEAD";
760     &http_request
761     }
762    
763 elmex 1.15 sub http_post($$@) {
764 root 1.22 my $url = shift;
765     unshift @_, "POST", $url, "body";
766 root 1.3 &http_request
767     }
768    
769 root 1.9 =back
770    
771 root 1.55 =head2 DNS CACHING
772    
773     AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
774     the actual connection, which in turn uses AnyEvent::DNS to resolve
775     hostnames. The latter is a simple stub resolver and does no caching
776     on its own. If you want DNS caching, you currently have to provide
777     your own default resolver (by storing a suitable resolver object in
778     C<$AnyEvent::DNS::RESOLVER>).
779    
780 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
781 root 1.1
782     =over 4
783    
784 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
785    
786     Sets the default proxy server to use. The proxy-url must begin with a
787 root 1.52 string of the form C<http://host:port> (optionally C<https:...>), croaks
788     otherwise.
789    
790     To clear an already-set proxy, use C<undef>.
791 root 1.2
792 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
793 root 1.1
794 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
795 root 1.1
796     =item $AnyEvent::HTTP::USERAGENT
797    
798     The default value for the C<User-Agent> header (the default is
799 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
800 root 1.1
801 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
802 root 1.1
803 root 1.47 The maximum number of concurrent connections to the same host (identified
804 root 1.43 by the hostname). If the limit is exceeded, then the additional requests
805     are queued until previous connections are closed.
806 root 1.1
807 root 1.43 The default value for this is C<4>, and it is highly advisable to not
808     increase it.
809 root 1.3
810 root 1.14 =item $AnyEvent::HTTP::ACTIVE
811    
812     The number of active connections. This is not the number of currently
813     running requests, but the number of currently open and non-idle TCP
814     connections. This number of can be useful for load-leveling.
815    
816 root 1.1 =back
817    
818     =cut
819    
820 root 1.2 sub set_proxy($) {
821 root 1.52 if (length $_[0]) {
822     $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
823     or Carp::croak "$_[0]: invalid proxy URL";
824     $PROXY = [$2, $3 || 3128, $1]
825     } else {
826     undef $PROXY;
827     }
828 root 1.2 }
829    
830     # initialise proxy from environment
831 root 1.52 eval {
832     set_proxy $ENV{http_proxy};
833     };
834 root 1.2
835 root 1.1 =head1 SEE ALSO
836    
837     L<AnyEvent>.
838    
839     =head1 AUTHOR
840    
841 root 1.18 Marc Lehmann <schmorp@schmorp.de>
842     http://home.schmorp.de/
843 root 1.1
844 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
845     testcases and bugreports.
846    
847 root 1.1 =cut
848    
849     1
850