ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
Revision: 1.54
Committed: Wed Jun 16 18:09:52 2010 UTC (13 years, 11 months ago) by root
Branch: MAIN
Changes since 1.53: +3 -0 lines
Log Message:
Don't die on errors in push_write().

During processing of push_write() error may happen (e.g. EPIPE due to
connection close by server).  This will result in callbacks called and
%state cleared.  Avoid doing push_read() on undefined $state{handle} if
it happens.

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4    
5     =head1 SYNOPSIS
6    
7     use AnyEvent::HTTP;
8    
9 root 1.17 http_get "http://www.nethype.de/", sub { print $_[1] };
10    
11     # ... do something else here
12    
13 root 1.1 =head1 DESCRIPTION
14    
15     This module is an L<AnyEvent> user, you need to make sure that you use and
16     run a supported event loop.
17    
18 root 1.11 This module implements a simple, stateless and non-blocking HTTP
19     client. It supports GET, POST and other request methods, cookies and more,
20     all on a very low level. It can follow redirects supports proxies and
21     automatically limits the number of connections to the values specified in
22     the RFC.
23    
24     It should generally be a "good client" that is enough for most HTTP
25     tasks. Simple tasks should be simple, but complex tasks should still be
26     possible as the user retains control over request and response headers.
27    
28     The caller is responsible for authentication management, cookies (if
29     the simplistic implementation in this module doesn't suffice), referer
30     and other high-level protocol details for which this module offers only
31     limited support.
32    
33 root 1.1 =head2 METHODS
34    
35     =over 4
36    
37     =cut
38    
39     package AnyEvent::HTTP;
40    
41     use strict;
42     no warnings;
43    
44 root 1.41 use Errno ();
45 root 1.1
46 root 1.51 use AnyEvent 5.0 ();
47 root 1.1 use AnyEvent::Util ();
48     use AnyEvent::Socket ();
49     use AnyEvent::Handle ();
50    
51     use base Exporter::;
52    
53 root 1.53 our $VERSION = '1.44';
54 root 1.1
55 root 1.17 our @EXPORT = qw(http_get http_post http_head http_request);
56 root 1.1
57 root 1.40 our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58 root 1.3 our $MAX_RECURSE = 10;
59 root 1.2 our $MAX_PERSISTENT = 8;
60     our $PERSISTENT_TIMEOUT = 2;
61     our $TIMEOUT = 300;
62 root 1.1
63     # changing these is evil
64 root 1.43 our $MAX_PERSISTENT_PER_HOST = 0;
65 root 1.11 our $MAX_PER_HOST = 4;
66 root 1.1
67 root 1.2 our $PROXY;
68 root 1.14 our $ACTIVE = 0;
69 root 1.2
70 root 1.1 my %KA_COUNT; # number of open keep-alive connections per host
71 root 1.11 my %CO_SLOT; # number of open connections, and wait queue, per host
72 root 1.1
73     =item http_get $url, key => value..., $cb->($data, $headers)
74    
75     Executes an HTTP-GET request. See the http_request function for details on
76 root 1.29 additional parameters and the return value.
77 root 1.1
78 root 1.5 =item http_head $url, key => value..., $cb->($data, $headers)
79    
80 root 1.29 Executes an HTTP-HEAD request. See the http_request function for details
81     on additional parameters and the return value.
82 root 1.5
83     =item http_post $url, $body, key => value..., $cb->($data, $headers)
84 root 1.3
85 root 1.26 Executes an HTTP-POST request with a request body of C<$body>. See the
86 root 1.29 http_request function for details on additional parameters and the return
87     value.
88 root 1.3
89 root 1.1 =item http_request $method => $url, key => value..., $cb->($data, $headers)
90    
91     Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
92     must be an absolute http or https URL.
93    
94 root 1.29 When called in void context, nothing is returned. In other contexts,
95     C<http_request> returns a "cancellation guard" - you have to keep the
96     object at least alive until the callback get called. If the object gets
97     destroyed before the callbakc is called, the request will be cancelled.
98    
99 root 1.42 The callback will be called with the response body data as first argument
100     (or C<undef> if an error occured), and a hash-ref with response headers as
101     second argument.
102 root 1.2
103 root 1.7 All the headers in that hash are lowercased. In addition to the response
104 root 1.20 headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason>
105     contain the three parts of the HTTP Status-Line of the same name. The
106     pseudo-header C<URL> contains the original URL (which can differ from the
107     requested URL when following redirects).
108    
109 root 1.32 If the server sends a header multiple times, then their contents will be
110     joined together with a comma (C<,>), as per the HTTP spec.
111 root 1.2
112     If an internal error occurs, such as not being able to resolve a hostname,
113 root 1.41 then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<59x>
114     (usually C<599>) and the C<Reason> pseudo-header will contain an error
115     message.
116 root 1.2
117 root 1.6 A typical callback might look like this:
118    
119     sub {
120     my ($body, $hdr) = @_;
121    
122     if ($hdr->{Status} =~ /^2/) {
123     ... everything should be ok
124     } else {
125     print "error, $hdr->{Status} $hdr->{Reason}\n";
126     }
127     }
128    
129 root 1.1 Additional parameters are key-value pairs, and are fully optional. They
130     include:
131    
132     =over 4
133    
134 root 1.3 =item recurse => $count (default: $MAX_RECURSE)
135 root 1.1
136     Whether to recurse requests or not, e.g. on redirects, authentication
137 root 1.3 retries and so on, and how often to do so.
138 root 1.1
139     =item headers => hashref
140    
141 root 1.12 The request headers to use. Currently, C<http_request> may provide its
142     own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
143 root 1.47 and will provide defaults for C<User-Agent:> and C<Referer:> (this can be
144     suppressed by using C<undef> for these headers in which case they won't be
145 root 1.49 sent at all).
146 root 1.1
147     =item timeout => $seconds
148    
149     The time-out to use for various stages - each connect attempt will reset
150 root 1.51 the timeout, as will read or write activity, i.e. this is not an overall
151     timeout.
152    
153     Default timeout is 5 minutes.
154 root 1.2
155     =item proxy => [$host, $port[, $scheme]] or undef
156    
157     Use the given http proxy for all requests. If not specified, then the
158     default proxy (as specified by C<$ENV{http_proxy}>) is used.
159    
160 root 1.47 C<$scheme> must be either missing, C<http> for HTTP or C<https> for
161 root 1.2 HTTPS.
162 root 1.1
163 root 1.3 =item body => $string
164    
165     The request body, usually empty. Will be-sent as-is (future versions of
166     this module might offer more options).
167    
168 root 1.10 =item cookie_jar => $hash_ref
169    
170     Passing this parameter enables (simplified) cookie-processing, loosely
171     based on the original netscape specification.
172    
173     The C<$hash_ref> must be an (initially empty) hash reference which will
174     get updated automatically. It is possible to save the cookie_jar to
175     persistent storage with something like JSON or Storable, but this is not
176 root 1.40 recommended, as expiry times are currently being ignored.
177 root 1.10
178     Note that this cookie implementation is not of very high quality, nor
179     meant to be complete. If you want complete cookie management you have to
180     do that on your own. C<cookie_jar> is meant as a quick fix to get some
181     cookie-using sites working. Cookies are a privacy disaster, do not use
182     them unless required to.
183    
184 root 1.40 =item tls_ctx => $scheme | $tls_ctx
185    
186     Specifies the AnyEvent::TLS context to be used for https connections. This
187     parameter follows the same rules as the C<tls_ctx> parameter to
188     L<AnyEvent::Handle>, but additionally, the two strings C<low> or
189     C<high> can be specified, which give you a predefined low-security (no
190     verification, highest compatibility) and high-security (CA and common-name
191     verification) TLS context.
192    
193     The default for this option is C<low>, which could be interpreted as "give
194     me the page, no matter what".
195    
196 root 1.51 =item on_prepare => $callback->($fh)
197    
198     In rare cases you need to "tune" the socket before it is used to
199     connect (for exmaple, to bind it on a given IP address). This parameter
200     overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
201     and behaves exactly the same way (e.g. it has to provide a
202     timeout). See the description for the C<$prepare_cb> argument of
203     C<AnyEvent::Socket::tcp_connect> for details.
204    
205 root 1.42 =item on_header => $callback->($headers)
206 root 1.41
207     When specified, this callback will be called with the header hash as soon
208     as headers have been successfully received from the remote server (not on
209     locally-generated errors).
210    
211     It has to return either true (in which case AnyEvent::HTTP will continue),
212     or false, in which case AnyEvent::HTTP will cancel the download (and call
213     the finish callback with an error code of C<598>).
214    
215     This callback is useful, among other things, to quickly reject unwanted
216     content, which, if it is supposed to be rare, can be faster than first
217     doing a C<HEAD> request.
218    
219 root 1.42 Example: cancel the request unless the content-type is "text/html".
220 root 1.41
221 root 1.42 on_header => sub {
222     $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
223     },
224 root 1.41
225 root 1.42 =item on_body => $callback->($partial_body, $headers)
226 root 1.41
227 root 1.42 When specified, all body data will be passed to this callback instead of
228     to the completion callback. The completion callback will get the empty
229     string instead of the body data.
230 root 1.41
231 root 1.42 It has to return either true (in which case AnyEvent::HTTP will continue),
232     or false, in which case AnyEvent::HTTP will cancel the download (and call
233     the completion callback with an error code of C<598>).
234    
235     This callback is useful when the data is too large to be held in memory
236     (so the callback writes it to a file) or when only some information should
237     be extracted, or when the body should be processed incrementally.
238 root 1.41
239     It is usually preferred over doing your own body handling via
240 root 1.45 C<want_body_handle>, but in case of streaming APIs, where HTTP is
241     only used to create a connection, C<want_body_handle> is the better
242     alternative, as it allows you to install your own event handler, reducing
243     resource usage.
244 root 1.41
245     =item want_body_handle => $enable
246    
247     When enabled (default is disabled), the behaviour of AnyEvent::HTTP
248     changes considerably: after parsing the headers, and instead of
249     downloading the body (if any), the completion callback will be
250     called. Instead of the C<$body> argument containing the body data, the
251     callback will receive the L<AnyEvent::Handle> object associated with the
252     connection. In error cases, C<undef> will be passed. When there is no body
253     (e.g. status C<304>), the empty string will be passed.
254    
255     The handle object might or might not be in TLS mode, might be connected to
256     a proxy, be a persistent connection etc., and configured in unspecified
257     ways. The user is responsible for this handle (it will not be used by this
258     module anymore).
259    
260     This is useful with some push-type services, where, after the initial
261     headers, an interactive protocol is used (typical example would be the
262     push-style twitter API which starts a JSON/XML stream).
263    
264     If you think you need this, first have a look at C<on_body>, to see if
265 root 1.45 that doesn't solve your problem in a better way.
266 root 1.41
267 root 1.1 =back
268    
269 root 1.9 Example: make a simple HTTP GET request for http://www.nethype.de/
270    
271     http_request GET => "http://www.nethype.de/", sub {
272     my ($body, $hdr) = @_;
273     print "$body\n";
274     };
275    
276     Example: make a HTTP HEAD request on https://www.google.com/, use a
277     timeout of 30 seconds.
278    
279     http_request
280     GET => "https://www.google.com",
281     timeout => 30,
282     sub {
283     my ($body, $hdr) = @_;
284     use Data::Dumper;
285     print Dumper $hdr;
286     }
287     ;
288 root 1.1
289 root 1.29 Example: make another simple HTTP GET request, but immediately try to
290     cancel it.
291    
292     my $request = http_request GET => "http://www.nethype.de/", sub {
293     my ($body, $hdr) = @_;
294     print "$body\n";
295     };
296    
297     undef $request;
298    
299 root 1.1 =cut
300    
301 root 1.12 sub _slot_schedule;
302 root 1.11 sub _slot_schedule($) {
303     my $host = shift;
304    
305     while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
306     if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
307 root 1.12 # somebody wants that slot
308 root 1.11 ++$CO_SLOT{$host}[0];
309 root 1.14 ++$ACTIVE;
310 root 1.11
311     $cb->(AnyEvent::Util::guard {
312 root 1.14 --$ACTIVE;
313 root 1.11 --$CO_SLOT{$host}[0];
314     _slot_schedule $host;
315     });
316     } else {
317     # nobody wants the slot, maybe we can forget about it
318     delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
319     last;
320     }
321     }
322     }
323    
324     # wait for a free slot on host, call callback
325     sub _get_slot($$) {
326     push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
327    
328     _slot_schedule $_[0];
329     }
330    
331 root 1.46 our $qr_nl = qr{\015?\012};
332     our $qr_nlnl = qr{(?<![^\012])\015?\012};
333 root 1.34
334 root 1.41 our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
335     our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
336 root 1.40
337 elmex 1.15 sub http_request($$@) {
338 root 1.1 my $cb = pop;
339     my ($method, $url, %arg) = @_;
340    
341     my %hdr;
342    
343 root 1.40 $arg{tls_ctx} = $TLS_CTX_LOW if $arg{tls_ctx} eq "low" || !exists $arg{tls_ctx};
344     $arg{tls_ctx} = $TLS_CTX_HIGH if $arg{tls_ctx} eq "high";
345    
346 root 1.3 $method = uc $method;
347    
348 root 1.8 if (my $hdr = $arg{headers}) {
349 root 1.1 while (my ($k, $v) = each %$hdr) {
350     $hdr{lc $k} = $v;
351     }
352     }
353    
354 root 1.23 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
355 root 1.8
356 root 1.40 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url })
357 root 1.8 if $recurse < 0;
358    
359 root 1.2 my $proxy = $arg{proxy} || $PROXY;
360 root 1.1 my $timeout = $arg{timeout} || $TIMEOUT;
361    
362 root 1.31 my ($uscheme, $uauthority, $upath, $query, $fragment) =
363 root 1.10 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
364 root 1.2
365 root 1.31 $uscheme = lc $uscheme;
366 root 1.1
367 root 1.31 my $uport = $uscheme eq "http" ? 80
368     : $uscheme eq "https" ? 443
369 root 1.41 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url });
370 root 1.13
371 root 1.31 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
372 root 1.40 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url });
373 root 1.10
374     my $uhost = $1;
375     $uport = $2 if defined $2;
376    
377 root 1.53 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
378     unless exists $hdr{host};
379 root 1.43
380 root 1.10 $uhost =~ s/^\[(.*)\]$/$1/;
381     $upath .= "?$query" if length $query;
382    
383     $upath =~ s%^/?%/%;
384    
385     # cookie processing
386     if (my $jar = $arg{cookie_jar}) {
387 root 1.31 %$jar = () if $jar->{version} != 1;
388 root 1.10
389     my @cookie;
390    
391     while (my ($chost, $v) = each %$jar) {
392 root 1.30 if ($chost =~ /^\./) {
393     next unless $chost eq substr $uhost, -length $chost;
394     } elsif ($chost =~ /\./) {
395     next unless $chost eq $uhost;
396     } else {
397     next;
398     }
399 root 1.10
400     while (my ($cpath, $v) = each %$v) {
401     next unless $cpath eq substr $upath, 0, length $cpath;
402    
403     while (my ($k, $v) = each %$v) {
404 root 1.31 next if $uscheme ne "https" && exists $v->{secure};
405     my $value = $v->{value};
406     $value =~ s/([\\"])/\\$1/g;
407     push @cookie, "$k=\"$value\"";
408 root 1.10 }
409     }
410     }
411    
412     $hdr{cookie} = join "; ", @cookie
413     if @cookie;
414     }
415 root 1.1
416 root 1.31 my ($rhost, $rport, $rscheme, $rpath); # request host, port, path
417 root 1.2
418 root 1.10 if ($proxy) {
419 root 1.38 ($rpath, $rhost, $rport, $rscheme) = ($url, @$proxy);
420 root 1.31
421 root 1.47 $rscheme = "http" unless defined $rscheme;
422    
423 root 1.31 # don't support https requests over https-proxy transport,
424 root 1.38 # can't be done with tls as spec'ed, unless you double-encrypt.
425 root 1.31 $rscheme = "http" if $uscheme eq "https" && $rscheme eq "https";
426 root 1.10 } else {
427 root 1.31 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
428 root 1.2 }
429    
430 root 1.47 # leave out fragment and query string, just a heuristic
431     $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
432     $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
433 root 1.41
434 root 1.53 $hdr{"content-length"} = length $arg{body}
435     if length $arg{body} || $method ne "GET";
436 root 1.1
437 root 1.11 my %state = (connect_guard => 1);
438    
439     _get_slot $uhost, sub {
440     $state{slot_guard} = shift;
441 root 1.1
442 root 1.11 return unless $state{connect_guard};
443 root 1.1
444 root 1.11 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
445     $state{fh} = shift
446 root 1.44 or do {
447     my $err = "$!";
448     %state = ();
449     return $cb->(undef, { Status => 599, Reason => $err, URL => $url });
450     };
451    
452 root 1.34 pop; # free memory, save a tree
453 root 1.11
454 root 1.34 return unless delete $state{connect_guard};
455 root 1.11
456     # get handle
457     $state{handle} = new AnyEvent::Handle
458 root 1.40 fh => $state{fh},
459     timeout => $timeout,
460     peername => $rhost,
461     tls_ctx => $arg{tls_ctx};
462 root 1.11
463     # limit the number of persistent connections
464 root 1.34 # keepalive not yet supported
465 root 1.11 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
466     ++$KA_COUNT{$_[1]};
467 root 1.34 $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
468     --$KA_COUNT{$_[1]}
469     };
470 root 1.11 $hdr{connection} = "keep-alive";
471     } else {
472     delete $hdr{connection};
473     }
474 root 1.1
475 root 1.11 # (re-)configure handle
476     $state{handle}->on_error (sub {
477     %state = ();
478 root 1.40 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
479 root 1.11 });
480     $state{handle}->on_eof (sub {
481     %state = ();
482 root 1.40 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
483 root 1.11 });
484 root 1.1
485 root 1.31 $state{handle}->starttls ("connect") if $rscheme eq "https";
486    
487     # handle actual, non-tunneled, request
488     my $handle_actual_request = sub {
489 root 1.34 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
490 root 1.31
491     # send request
492     $state{handle}->push_write (
493     "$method $rpath HTTP/1.0\015\012"
494 root 1.47 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
495 root 1.31 . "\015\012"
496     . (delete $arg{body})
497 root 1.11 );
498    
499 root 1.54 # return if error occured during push_write()
500     return unless %state;
501    
502 root 1.31 %hdr = (); # reduce memory usage, save a kitten
503 root 1.10
504 root 1.31 # status line
505 root 1.34 $state{handle}->push_read (line => $qr_nl, sub {
506 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
507 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
508 root 1.31
509     my %hdr = ( # response headers
510     HTTPVersion => ",$1",
511     Status => ",$2",
512     Reason => ",$3",
513     URL => ",$url"
514     );
515    
516     # headers, could be optimized a bit
517 root 1.34 $state{handle}->unshift_read (line => $qr_nlnl, sub {
518 root 1.46 for ("$_[1]") {
519 root 1.31 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
520    
521 root 1.40 # things seen, not parsed:
522     # p3pP="NON CUR OTPi OUR NOR UNI"
523    
524 root 1.31 $hdr{lc $1} .= ",$2"
525     while /\G
526 root 1.43 ([^:\000-\037]*):
527 root 1.31 [\011\040]*
528     ((?: [^\012]+ | \012[\011\040] )*)
529     \012
530     /gxc;
531    
532     /\G$/
533 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url }));
534 root 1.31 }
535    
536     substr $_, 0, 1, ""
537     for values %hdr;
538    
539 root 1.41 # redirect handling
540     # microsoft and other shitheads don't give a shit for following standards,
541     # try to support some common forms of broken Location headers.
542     if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
543     $hdr{location} =~ s/^\.\/+//;
544    
545     my $url = "$rscheme://$uhost:$uport";
546    
547     unless ($hdr{location} =~ s/^\///) {
548     $url .= $upath;
549     $url =~ s/\/[^\/]*$//;
550     }
551    
552     $hdr{location} = "$url/$hdr{location}";
553     }
554    
555     my $redirect;
556    
557     if ($recurse) {
558     if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") {
559     # apparently, mozilla et al. just change POST to GET here
560     # more research is needed before we do the same
561     $redirect = 1;
562     } elsif ($hdr{Status} == 303) {
563     # even http/1.1 is unclear on how to mutate the method
564     $method = "GET" unless $method eq "HEAD";
565     $redirect = 1;
566     } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) {
567     $redirect = 1;
568     }
569     }
570    
571 root 1.31 my $finish = sub {
572 root 1.41 $state{handle}->destroy if $state{handle};
573 root 1.31 %state = ();
574    
575     # set-cookie processing
576     if ($arg{cookie_jar}) {
577 root 1.41 for ($_[1]{"set-cookie"}) {
578 root 1.31 # parse NAME=VALUE
579     my @kv;
580    
581     while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
582     my $name = $1;
583     my $value = $3;
584    
585     unless ($value) {
586     $value = $2;
587     $value =~ s/\\(.)/$1/gs;
588     }
589    
590     push @kv, $name => $value;
591    
592     last unless /\G\s*;/gc;
593     }
594    
595     last unless @kv;
596 root 1.10
597 root 1.31 my $name = shift @kv;
598     my %kv = (value => shift @kv, @kv);
599 root 1.11
600 root 1.31 my $cdom;
601     my $cpath = (delete $kv{path}) || "/";
602 root 1.10
603 root 1.31 if (exists $kv{domain}) {
604     $cdom = delete $kv{domain};
605    
606     $cdom =~ s/^\.?/./; # make sure it starts with a "."
607 root 1.11
608 root 1.31 next if $cdom =~ /\.$/;
609    
610     # this is not rfc-like and not netscape-like. go figure.
611     my $ndots = $cdom =~ y/.//;
612     next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
613     } else {
614     $cdom = $uhost;
615     }
616 root 1.30
617 root 1.31 # store it
618     $arg{cookie_jar}{version} = 1;
619     $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
620    
621     redo if /\G\s*,/gc;
622 root 1.30 }
623 root 1.11 }
624 root 1.8
625 root 1.47 if ($redirect && exists $hdr{location}) {
626 root 1.41 # we ignore any errors, as it is very common to receive
627     # Content-Length != 0 but no actual body
628     # we also access %hdr, as $_[1] might be an erro
629     http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb);
630 root 1.31 } else {
631     $cb->($_[0], $_[1]);
632     }
633     };
634 root 1.24
635 root 1.41 my $len = $hdr{"content-length"};
636    
637     if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
638     $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
639     } elsif (
640 root 1.42 $hdr{Status} =~ /^(?:1..|[23]04)$/
641 root 1.41 or $method eq "HEAD"
642     or (defined $len && !$len)
643     ) {
644     # no body
645     $finish->("", \%hdr);
646 root 1.11 } else {
647 root 1.41 # body handling, four different code paths
648     # for want_body_handle, on_body (2x), normal (2x)
649     # we might read too much here, but it does not matter yet (no pers. connections)
650     if (!$redirect && $arg{want_body_handle}) {
651     $_[0]->on_eof (undef);
652     $_[0]->on_error (undef);
653     $_[0]->on_read (undef);
654    
655     $finish->(delete $state{handle}, \%hdr);
656 root 1.31
657 root 1.41 } elsif ($arg{on_body}) {
658     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
659     if ($len) {
660     $_[0]->on_eof (undef);
661     $_[0]->on_read (sub {
662     $len -= length $_[0]{rbuf};
663    
664     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
665     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
666    
667     $len > 0
668     or $finish->("", \%hdr);
669     });
670     } else {
671     $_[0]->on_eof (sub {
672     $finish->("", \%hdr);
673     });
674     $_[0]->on_read (sub {
675     $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
676     or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
677     });
678     }
679 root 1.31 } else {
680     $_[0]->on_eof (undef);
681 root 1.41
682     if ($len) {
683     $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
684     $_[0]->on_read (sub {
685     $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
686     if $len <= length $_[0]{rbuf};
687     });
688     } else {
689     $_[0]->on_error (sub {
690 root 1.43 $! == Errno::EPIPE || !$!
691 root 1.41 ? $finish->(delete $_[0]{rbuf}, \%hdr)
692     : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url });
693     });
694     $_[0]->on_read (sub { });
695     }
696 root 1.31 }
697 root 1.11 }
698 root 1.31 });
699     });
700     };
701 root 1.3
702 root 1.31 # now handle proxy-CONNECT method
703     if ($proxy && $uscheme eq "https") {
704     # oh dear, we have to wrap it into a connect request
705    
706     # maybe re-use $uauthority with patched port?
707     $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
708 root 1.34 $state{handle}->push_read (line => $qr_nlnl, sub {
709 root 1.31 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
710 root 1.40 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url }));
711 root 1.31
712     if ($2 == 200) {
713     $rpath = $upath;
714     &$handle_actual_request;
715 root 1.3 } else {
716 root 1.31 %state = ();
717     $cb->(undef, { Status => $2, Reason => $3, URL => $url });
718 root 1.3 }
719 root 1.11 });
720 root 1.31 } else {
721     &$handle_actual_request;
722     }
723    
724 root 1.50 }, $arg{on_prepare} || sub { $timeout };
725 root 1.1 };
726    
727     defined wantarray && AnyEvent::Util::guard { %state = () }
728     }
729    
730 elmex 1.15 sub http_get($@) {
731 root 1.1 unshift @_, "GET";
732     &http_request
733     }
734    
735 elmex 1.15 sub http_head($@) {
736 root 1.4 unshift @_, "HEAD";
737     &http_request
738     }
739    
740 elmex 1.15 sub http_post($$@) {
741 root 1.22 my $url = shift;
742     unshift @_, "POST", $url, "body";
743 root 1.3 &http_request
744     }
745    
746 root 1.9 =back
747    
748 root 1.2 =head2 GLOBAL FUNCTIONS AND VARIABLES
749 root 1.1
750     =over 4
751    
752 root 1.2 =item AnyEvent::HTTP::set_proxy "proxy-url"
753    
754     Sets the default proxy server to use. The proxy-url must begin with a
755 root 1.52 string of the form C<http://host:port> (optionally C<https:...>), croaks
756     otherwise.
757    
758     To clear an already-set proxy, use C<undef>.
759 root 1.2
760 root 1.3 =item $AnyEvent::HTTP::MAX_RECURSE
761 root 1.1
762 root 1.3 The default value for the C<recurse> request parameter (default: C<10>).
763 root 1.1
764     =item $AnyEvent::HTTP::USERAGENT
765    
766     The default value for the C<User-Agent> header (the default is
767 root 1.40 C<Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
768 root 1.1
769 root 1.43 =item $AnyEvent::HTTP::MAX_PER_HOST
770 root 1.1
771 root 1.47 The maximum number of concurrent connections to the same host (identified
772 root 1.43 by the hostname). If the limit is exceeded, then the additional requests
773     are queued until previous connections are closed.
774 root 1.1
775 root 1.43 The default value for this is C<4>, and it is highly advisable to not
776     increase it.
777 root 1.3
778 root 1.14 =item $AnyEvent::HTTP::ACTIVE
779    
780     The number of active connections. This is not the number of currently
781     running requests, but the number of currently open and non-idle TCP
782     connections. This number of can be useful for load-leveling.
783    
784 root 1.1 =back
785    
786     =cut
787    
788 root 1.2 sub set_proxy($) {
789 root 1.52 if (length $_[0]) {
790     $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
791     or Carp::croak "$_[0]: invalid proxy URL";
792     $PROXY = [$2, $3 || 3128, $1]
793     } else {
794     undef $PROXY;
795     }
796 root 1.2 }
797    
798     # initialise proxy from environment
799 root 1.52 eval {
800     set_proxy $ENV{http_proxy};
801     };
802 root 1.2
803 root 1.1 =head1 SEE ALSO
804    
805     L<AnyEvent>.
806    
807     =head1 AUTHOR
808    
809 root 1.18 Marc Lehmann <schmorp@schmorp.de>
810     http://home.schmorp.de/
811 root 1.1
812 root 1.36 With many thanks to Дмитрий Шалашов, who provided countless
813     testcases and bugreports.
814    
815 root 1.1 =cut
816    
817     1
818