Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.9 by root, Wed Jun 4 13:51:53 2008 UTC vs.
Revision 1.18 by root, Fri Jun 6 16:23:57 2008 UTC

3AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client 3AnyEvent::HTTP - simple but non-blocking HTTP/HTTPS client
4 4
5=head1 SYNOPSIS 5=head1 SYNOPSIS
6 6
7 use AnyEvent::HTTP; 7 use AnyEvent::HTTP;
8
9 http_get "http://www.nethype.de/", sub { print $_[1] };
10
11 # ... do something else here
8 12
9=head1 DESCRIPTION 13=head1 DESCRIPTION
10 14
11This module is an L<AnyEvent> user, you need to make sure that you use and 15This module is an L<AnyEvent> user, you need to make sure that you use and
12run a supported event loop. 16run a supported event loop.
17
18This module implements a simple, stateless and non-blocking HTTP
19client. It supports GET, POST and other request methods, cookies and more,
20all on a very low level. It can follow redirects, supports proxies and
21automatically limits the number of connections to the values specified in
22the RFC.
23
24It should generally be a "good client" that is enough for most HTTP
25tasks. Simple tasks should be simple, but complex tasks should still be
26possible as the user retains control over request and response headers.
27
28The caller is responsible for authentication management, cookies (if
29the simplistic implementation in this module doesn't suffice), referer
30and other high-level protocol details for which this module offers only
31limited support.
13 32
14=head2 METHODS 33=head2 METHODS
15 34
16=over 4 35=over 4
17 36
29use AnyEvent::Socket (); 48use AnyEvent::Socket ();
30use AnyEvent::Handle (); 49use AnyEvent::Handle ();
31 50
32use base Exporter::; 51use base Exporter::;
33 52
34our $VERSION = '1.0'; 53our $VERSION = '1.01';
35 54
36our @EXPORT = qw(http_get http_request); 55our @EXPORT = qw(http_get http_post http_head http_request);
37 56
38our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 57our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
39our $MAX_RECURSE = 10; 58our $MAX_RECURSE = 10;
40our $MAX_PERSISTENT = 8; 59our $MAX_PERSISTENT = 8;
41our $PERSISTENT_TIMEOUT = 2; 60our $PERSISTENT_TIMEOUT = 2;
42our $TIMEOUT = 300; 61our $TIMEOUT = 300;
43 62
44# changing these is evil 63# changing these is evil
45our $MAX_PERSISTENT_PER_HOST = 2; 64our $MAX_PERSISTENT_PER_HOST = 2;
46our $MAX_PER_HOST = 4; # not respected yet :( 65our $MAX_PER_HOST = 4;
47 66
48our $PROXY; 67our $PROXY;
68our $ACTIVE = 0;
49 69
50my %KA_COUNT; # number of open keep-alive connections per host 70my %KA_COUNT; # number of open keep-alive connections per host
71my %CO_SLOT; # number of open connections, and wait queue, per host
51 72
52=item http_get $url, key => value..., $cb->($data, $headers) 73=item http_get $url, key => value..., $cb->($data, $headers)
53 74
54Executes an HTTP-GET request. See the http_request function for details on 75Executes an HTTP-GET request. See the http_request function for details on
55additional parameters. 76additional parameters.
74response headers as second argument. 95response headers as second argument.
75 96
76All the headers in that hash are lowercased. In addition to the response 97All the headers in that hash are lowercased. In addition to the response
77headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and 98headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
78C<Reason> contain the three parts of the HTTP Status-Line of the same 99C<Reason> contain the three parts of the HTTP Status-Line of the same
79name. 100name. If the server sends a header multiple times, then their contents
101will be joined together with C<\x00>.
80 102
81If an internal error occurs, such as not being able to resolve a hostname, 103If an internal error occurs, such as not being able to resolve a hostname,
82then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599> 104then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
83and the C<Reason> pseudo-header will contain an error message. 105and the C<Reason> pseudo-header will contain an error message.
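
The callback conventions described above (lowercased header names, the C<Status> and C<Reason> pseudo-headers, C<\x00>-joined repeated headers, and status C<599> for internal errors) can be tried without any network access. The following is only a sketch: C<describe_response> is a hypothetical helper, not part of AnyEvent::HTTP, and the header values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent::HTTP): summarize what a
# request callback received, following the conventions described above.
sub describe_response {
    my ($data, $headers) = @_;

    # internal errors are reported as undef data plus Status 599
    return "error: $headers->{Reason}"
        if !defined $data && $headers->{Status} == 599;

    # a header sent several times arrives as one string joined by "\x00"
    my @cookies = split /\x00/, $headers->{"set-cookie"} // "";

    return sprintf "%s %s, %d byte(s), %d set-cookie value(s)",
        $headers->{Status}, $headers->{Reason},
        length($data // ""), scalar @cookies;
}

print describe_response(undef, { Status => 599, Reason => "unexpected end-of-file" }), "\n";
print describe_response("hello", {
    Status => 200, Reason => "OK", "set-cookie" => "a=1\x00b=2",
}), "\n";
```

In real use, a function like this would be passed as the callback to C<http_get> or C<http_request>.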
84 106
104Whether to recurse requests or not, e.g. on redirects, authentication 126Whether to recurse requests or not, e.g. on redirects, authentication
105retries and so on, and how often to do so. 127retries and so on, and how often to do so.
106 128
107=item headers => hashref 129=item headers => hashref
108 130
109The request headers to use. 131The request headers to use. Currently, C<http_request> may provide its
132own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
133and will provide defaults for C<User-Agent:> and C<Referer:>.
110 134
111=item timeout => $seconds 135=item timeout => $seconds
112 136
113The time-out to use for various stages - each connect attempt will reset 137The time-out to use for various stages - each connect attempt will reset
114the timeout, as will read or write activity. Default timeout is 5 minutes. 138the timeout, as will read or write activity. Default timeout is 5 minutes.
123 147
124=item body => $string 148=item body => $string
125 149
126The request body, usually empty. Will be sent as-is (future versions of 150The request body, usually empty. Will be sent as-is (future versions of
127this module might offer more options). 151this module might offer more options).
152
153=item cookie_jar => $hash_ref
154
155Passing this parameter enables (simplified) cookie-processing, loosely
156based on the original netscape specification.
157
158The C<$hash_ref> must be an (initially empty) hash reference which will
159get updated automatically. It is possible to save the cookie_jar to
160persistent storage with something like JSON or Storable, but this is not
161recommended, as expire times are currently being ignored.
162
163Note that this cookie implementation is not of very high quality, nor
164meant to be complete. If you want complete cookie management you have to
165do that on your own. C<cookie_jar> is meant as a quick fix to get some
166cookie-using sites working. Cookies are a privacy disaster, do not use
167them unless required to.
128 168
129=back 169=back
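
To make the C<cookie_jar> description more concrete, here is a rough sketch of the jar layout and of the matching rules, inferred from the cookie code in this revision (domain tail match, path prefix match, C<secure> cookies only over https). C<cookie_header> and the sample jar contents are hypothetical and exist only for illustration; the layout may change in later revisions.

```perl
use strict;
use warnings;

# Jar layout as built by the set-cookie code in this revision:
#   $jar->{version}                 format marker
#   $jar->{$domain}{$path}{$name}   hash with "value" and other attributes
my $jar = {
    version        => 1,
    ".example.com" => {
        "/"      => { session => { value => "abc" } },
        "/admin" => { token   => { value => "xyz", secure => undef } },
    },
};

# Hypothetical helper: build the Cookie header value for one request.
sub cookie_header {
    my ($jar, $scheme, $host, $path) = @_;
    my @cookie;

    for my $chost (sort keys %$jar) {
        next unless $chost =~ /^\./;                          # skips "version"
        next unless length($chost) <= length($host);
        next unless $chost eq substr($host, -length($chost)); # domain tail match

        my $paths = $jar->{$chost};
        for my $cpath (sort keys %$paths) {
            # path prefix match
            next unless $cpath eq substr($path, 0, length($cpath));

            my $names = $paths->{$cpath};
            for my $name (sort keys %$names) {
                my $c = $names->{$name};
                # "secure" cookies are only sent over https
                next if $scheme ne "https" && exists $c->{secure};
                push @cookie, "$name=$c->{value}";
            }
        }
    }

    return join "; ", @cookie;
}

print cookie_header($jar, "http",  "www.example.com", "/admin/x"), "\n";
print cookie_header($jar, "https", "www.example.com", "/admin/x"), "\n";
```

Keys are sorted here only to make the output order predictable; the module itself iterates with C<each>.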
130 170
131Example: make a simple HTTP GET request for http://www.nethype.de/ 171Example: make a simple HTTP GET request for http://www.nethype.de/
132 172
148 } 188 }
149 ; 189 ;
150 190
151=cut 191=cut
152 192
193sub _slot_schedule;
194sub _slot_schedule($) {
195 my $host = shift;
196
197 while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
198 if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
199 # somebody wants that slot
200 ++$CO_SLOT{$host}[0];
201 ++$ACTIVE;
202
203 $cb->(AnyEvent::Util::guard {
204 --$ACTIVE;
205 --$CO_SLOT{$host}[0];
206 _slot_schedule $host;
207 });
208 } else {
209 # nobody wants the slot, maybe we can forget about it
210 delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
211 last;
212 }
213 }
214}
215
216# wait for a free slot on host, call callback
217sub _get_slot($$) {
218 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
219
220 _slot_schedule $_[0];
221}
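
The per-host slot scheduling above can be exercised in isolation. The following is a minimal sketch, not the module's code: C<My::Guard> is a stand-in for C<AnyEvent::Util::guard> (assumed here only to avoid the dependency), the C<$ACTIVE> counter is left out, and the limit is lowered to 2 so that queueing becomes visible with three requests.

```perl
use strict;
use warnings;

# Minimal stand-in for AnyEvent::Util::guard (an assumption made to keep
# the sketch dependency-free): runs a code ref when the object is destroyed.
package My::Guard {
    sub new     { my ($class, $cb) = @_; bless { cb => $cb }, $class }
    sub DESTROY { $_[0]{cb}->() }
}

my $MAX_PER_HOST = 2;   # small limit so the queueing is visible
my %CO_SLOT;            # host => [ open count, [ waiting callbacks ] ]

sub slot_schedule {
    my $host = shift;

    while (($CO_SLOT{$host}[0] // 0) < $MAX_PER_HOST) {
        if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
            ++$CO_SLOT{$host}[0];
            # the slot is released, and the queue re-examined, when the
            # guard handed to the callback is destroyed
            $cb->(My::Guard->new(sub {
                --$CO_SLOT{$host}[0];
                slot_schedule($host);
            }));
        } else {
            # nobody is waiting; forget the host if no slot is in use
            delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
            last;
        }
    }
}

sub get_slot {
    my ($host, $cb) = @_;
    push @{ $CO_SLOT{$host}[1] }, $cb;
    slot_schedule($host);
}

my (@held, @started);
for my $n (1 .. 3) {
    get_slot("example.com", sub { push @held, shift; push @started, $n });
}
print "started: @started\n";   # the third request is still queued
shift @held;                   # drop one guard, which frees a slot
print "started: @started\n";   # now the third request has started
@held = ();                    # release the remaining slots
```

Tying the slot to a guard object means a request cannot forget to release it: dropping the request state drops the guard as well.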
222
153sub http_request($$$;@) { 223sub http_request($$@) {
154 my $cb = pop; 224 my $cb = pop;
155 my ($method, $url, %arg) = @_; 225 my ($method, $url, %arg) = @_;
156 226
157 my %hdr; 227 my %hdr;
158 228
172 my $proxy = $arg{proxy} || $PROXY; 242 my $proxy = $arg{proxy} || $PROXY;
173 my $timeout = $arg{timeout} || $TIMEOUT; 243 my $timeout = $arg{timeout} || $TIMEOUT;
174 244
175 $hdr{"user-agent"} ||= $USERAGENT; 245 $hdr{"user-agent"} ||= $USERAGENT;
176 246
177 my ($host, $port, $path, $scheme); 247 my ($scheme, $authority, $upath, $query, $fragment) =
248 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
249
250 $scheme = lc $scheme;
251
252 my $uport = $scheme eq "http" ? 80
253 : $scheme eq "https" ? 443
254 : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
255
256 $hdr{referer} ||= "$scheme://$authority$upath"; # leave out fragment and query string, just a heuristic
257
258 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
259 or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
260
261 my $uhost = $1;
262 $uport = $2 if defined $2;
263
264 $uhost =~ s/^\[(.*)\]$/$1/;
265 $upath .= "?$query" if length $query;
266
267 $upath =~ s%^/?%/%;
268
269 # cookie processing
270 if (my $jar = $arg{cookie_jar}) {
271 %$jar = () if $jar->{version} < 1;
272
273 my @cookie;
274
275 while (my ($chost, $v) = each %$jar) {
276 next unless $chost eq substr $uhost, -length $chost;
277 next unless $chost =~ /^\./;
278
279 while (my ($cpath, $v) = each %$v) {
280 next unless $cpath eq substr $upath, 0, length $cpath;
281
282 while (my ($k, $v) = each %$v) {
283 next if $scheme ne "https" && exists $v->{secure};
284 push @cookie, "$k=$v->{value}";
285 }
286 }
287 }
288
289 $hdr{cookie} = join "; ", @cookie
290 if @cookie;
291 }
292
293 my ($rhost, $rport, $rpath); # request host, port, path
178 294
179 if ($proxy) { 295 if ($proxy) {
180 ($host, $port, $scheme) = @$proxy; 296 ($rhost, $rport, $scheme) = @$proxy;
181 $path = $url; 297 $rpath = $url;
182 } else { 298 } else {
183 ($scheme, my $authority, $path, my $query, my $fragment) = 299 ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
184 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
185
186 $port = $scheme eq "http" ? 80
187 : $scheme eq "https" ? 443
188 : return $cb->(undef, { Status => 599, Reason => "$url: only http and https URLs supported" });
189
190 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
191 or return $cb->(undef, { Status => 599, Reason => "$url: unparsable URL" });
192
193 $host = $1;
194 $port = $2 if defined $2;
195
196 $host =~ s/^\[(.*)\]$/$1/;
197 $path .= "?$query" if length $query;
198
199 $path = "/" unless $path;
200
201 $hdr{host} = $host = lc $host; 300 $hdr{host} = $uhost;
202 } 301 }
203 302
204 $scheme = lc $scheme;
205
206 my %state;
207
208 $hdr{"content-length"} = length $arg{body}; 303 $hdr{"content-length"} = length $arg{body};
209 304
305 my %state = (connect_guard => 1);
306
307 _get_slot $uhost, sub {
308 $state{slot_guard} = shift;
309
310 return unless $state{connect_guard};
311
210 $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub { 312 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
211 $state{fh} = shift 313 $state{fh} = shift
212 or return $cb->(undef, { Status => 599, Reason => "$!" }); 314 or return $cb->(undef, { Status => 599, Reason => "$!" });
213 315
214 delete $state{connect_guard}; # reduce memory usage, save a tree 316 delete $state{connect_guard}; # reduce memory usage, save a tree
215 317
216 # get handle 318 # get handle
217 $state{handle} = new AnyEvent::Handle 319 $state{handle} = new AnyEvent::Handle
218 fh => $state{fh}, 320 fh => $state{fh},
219 ($scheme eq "https" ? (tls => "connect") : ()); 321 ($scheme eq "https" ? (tls => "connect") : ());
220 322
221 # limit the number of persistent connections 323 # limit the number of persistent connections
222 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 324 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
223 ++$KA_COUNT{$_[1]}; 325 ++$KA_COUNT{$_[1]};
224 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} }; 326 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
225 $hdr{connection} = "keep-alive"; 327 $hdr{connection} = "keep-alive";
226 delete $hdr{connection}; # keep-alive not yet supported 328 delete $hdr{connection}; # keep-alive not yet supported
227 } else { 329 } else {
228 delete $hdr{connection}; 330 delete $hdr{connection};
229 } 331 }
230 332
231 # (re-)configure handle 333 # (re-)configure handle
232 $state{handle}->timeout ($timeout); 334 $state{handle}->timeout ($timeout);
233 $state{handle}->on_error (sub { 335 $state{handle}->on_error (sub {
336 my $errno = "$!";
234 %state = (); 337 %state = ();
235 $cb->(undef, { Status => 599, Reason => "$!" }); 338 $cb->(undef, { Status => 599, Reason => $errno });
236 }); 339 });
237 $state{handle}->on_eof (sub { 340 $state{handle}->on_eof (sub {
238 %state = (); 341 %state = ();
239 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" }); 342 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
240 }); 343 });
241 344
242 # send request 345 # send request
243 $state{handle}->push_write ( 346 $state{handle}->push_write (
244 "$method $path HTTP/1.0\015\012" 347 "$method $rpath HTTP/1.0\015\012"
245 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr) 348 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
246 . "\015\012" 349 . "\015\012"
247 . (delete $arg{body}) 350 . (delete $arg{body})
248 );
249
250 %hdr = (); # reduce memory usage, save a kitten
251
252 # status line
253 $state{handle}->push_read (line => qr/\015?\012/, sub {
254 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
255 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
256
257 my %hdr = ( # response headers
258 HTTPVersion => ",$1",
259 Status => ",$2",
260 Reason => ",$3",
261 ); 351 );
262 352
353 %hdr = (); # reduce memory usage, save a kitten
354
355 # status line
356 $state{handle}->push_read (line => qr/\015?\012/, sub {
357 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
358 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
359
360 my %hdr = ( # response headers
361 HTTPVersion => "\x00$1",
362 Status => "\x00$2",
363 Reason => "\x00$3",
364 );
365
263 # headers, could be optimized a bit 366 # headers, could be optimized a bit
264 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub { 367 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
265 for ("$_[1]\012") { 368 for ("$_[1]\012") {
266 # we support spaces in field names, as lotus domino 369 # we support spaces in field names, as lotus domino
267 # creates them. 370 # creates them.
268 $hdr{lc $1} .= ",$2" 371 $hdr{lc $1} .= "\x00$2"
269 while /\G 372 while /\G
270 ([^:\000-\037]+): 373 ([^:\000-\037]+):
271 [\011\040]* 374 [\011\040]*
272 ((?: [^\015\012]+ | \015?\012[\011\040] )*) 375 ((?: [^\015\012]+ | \015?\012[\011\040] )*)
273 \015?\012 376 \015?\012
274 /gxc; 377 /gxc;
275 378
276 /\G$/ 379 /\G$/
277 or return $cb->(undef, { Status => 599, Reason => "garbled response headers" }); 380 or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
278 } 381 }
279 382
280 substr $_, 0, 1, "" 383 substr $_, 0, 1, ""
281 for values %hdr; 384 for values %hdr;
282 385
283 my $finish = sub { 386 my $finish = sub {
387 %state = ();
388
389 # set-cookie processing
390 if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
391 for (split /\x00/, $hdr{"set-cookie"}) {
392 my ($cookie, @arg) = split /;\s*/;
393 my ($name, $value) = split /=/, $cookie, 2;
394 my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
395
396 my $cdom = (delete $kv{domain}) || $uhost;
397 my $cpath = (delete $kv{path}) || "/";
398
399 $cdom =~ s/^.?/./; # make sure it starts with a "."
400
401 next if $cdom =~ /\.$/;
402
403 # this is not rfc-like and not netscape-like. go figure.
404 my $ndots = $cdom =~ y/.//;
405 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
406
407 # store it
408 $arg{cookie_jar}{version} = 1;
409 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
410 }
411 }
412
284 if ($_[1]{Status} =~ /^30[12]$/ && $recurse) { 413 if ($_[1]{Status} =~ /^30[12]$/ && $recurse) {
414 # microsoft and other assholes don't give a shit for following standards,
415 # try to support a common form of broken Location header.
416 $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
417
285 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb); 418 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
419 } else {
420 $cb->($_[0], $_[1]);
421 }
422 };
423
424 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
425 $finish->(undef, \%hdr);
286 } else { 426 } else {
287 $cb->($_[0], $_[1]); 427 if (exists $hdr{"content-length"}) {
428 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
429 # could cache persistent connection now
430 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
431 # but we don't, due to misdesigns, this is annoyingly complex
432 };
433
434 $finish->($_[1], \%hdr);
435 });
436 } else {
437 # too bad, need to read until we get an error or EOF,
438 # no way to detect winged data.
439 $_[0]->on_error (sub {
440 $finish->($_[0]{rbuf}, \%hdr);
441 });
442 $_[0]->on_eof (undef);
443 $_[0]->on_read (sub { });
444 }
288 } 445 }
289 }; 446 });
290
291 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
292 %state = ();
293 $finish->(undef, \%hdr);
294 } else {
295 if (exists $hdr{"content-length"}) {
296 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
297 # could cache persistent connection now
298 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
299 # but we don't, due to misdesigns, this is annoyingly complex
300 };
301
302 %state = ();
303 $finish->($_[1], \%hdr);
304 });
305 } else {
306 # too bad, need to read until we get an error or EOF,
307 # no way to detect winged data.
308 $_[0]->on_error (sub {
309 %state = ();
310 $finish->($_[0]{rbuf}, \%hdr);
311 });
312 $_[0]->on_eof (undef);
313 $_[0]->on_read (sub { });
314 }
315 }
316 }); 447 });
448 }, sub {
449 $timeout
317 }); 450 };
318 }, sub {
319 $timeout
320 }; 451 };
321 452
322 defined wantarray && AnyEvent::Util::guard { %state = () } 453 defined wantarray && AnyEvent::Util::guard { %state = () }
323} 454}
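
The URL handling at the top of the new C<http_request> (generic URI split, default port per scheme, authority parsing, path normalization) can be sketched as a standalone function. C<split_url> is a hypothetical name used only for this illustration; the bracket stripping for IPv6 literals and the referer heuristic are left out, and failures return an empty list instead of invoking a callback.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the URL handling in http_request above;
# returns (scheme, host, port, path) or an empty list on failure.
sub split_url {
    my ($url) = @_;

    # generic URI split into scheme, authority, path, query (fragment ignored)
    my ($scheme, $authority, $path, $query) =
        $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;

    $scheme = lc($scheme // "");

    # default port depends on the scheme; anything else is rejected
    my $port = $scheme eq "http"  ? 80
             : $scheme eq "https" ? 443
             : return;

    # optional userinfo, then host, then optional port
    ($authority // "") =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
        or return;

    my $host = $1;
    $port = $2 if defined $2;

    $path .= "?$query" if defined $query && length $query;
    $path =~ s%^/?%/%;   # make sure the path starts with a slash

    return ($scheme, $host, $port, $path);
}

print join(" ", split_url("HTTP://user\@example.com:8080/a/b?x=1#frag")), "\n";
print join(" ", split_url("https://example.com")), "\n";
```

Parsing the URL before the proxy check, as the newer revision does, is what makes the host, port and path available for cookie matching and slot accounting even when the request itself goes through a proxy.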
324 455
325sub http_get($$;@) { 456sub http_get($@) {
326 unshift @_, "GET"; 457 unshift @_, "GET";
327 &http_request 458 &http_request
328} 459}
329 460
330sub http_head($$;@) { 461sub http_head($@) {
331 unshift @_, "HEAD"; 462 unshift @_, "HEAD";
332 &http_request 463 &http_request
333} 464}
334 465
335sub http_post($$$;@) { 466sub http_post($$@) {
336 unshift @_, "POST", "body"; 467 unshift @_, "POST", "body";
337 &http_request 468 &http_request
338} 469}
339 470
340=back 471=back
367 498
368The maximum time to cache a persistent connection, in seconds (default: 2). 499The maximum time to cache a persistent connection, in seconds (default: 2).
369 500
370Not implemented currently. 501Not implemented currently.
371 502
503=item $AnyEvent::HTTP::ACTIVE
504
505The number of active connections. This is not the number of currently
506running requests, but the number of currently open and non-idle TCP
507connections. This number can be useful for load-leveling.
508
372=back 509=back
373 510
374=cut 511=cut
375 512
376sub set_proxy($) { 513sub set_proxy($) {
384 521
385L<AnyEvent>. 522L<AnyEvent>.
386 523
387=head1 AUTHOR 524=head1 AUTHOR
388 525
389 Marc Lehmann <schmorp@schmorp.de> 526 Marc Lehmann <schmorp@schmorp.de>
390 http://home.schmorp.de/ 527 http://home.schmorp.de/
391 528
392=cut 529=cut
393 530
3941 5311
395 532
