/cvs/AnyEvent-HTTP/HTTP.pm

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.1 by root, Tue Jun 3 16:37:13 2008 UTC vs.
Revision 1.12 by root, Thu Jun 5 16:33:02 2008 UTC

```diff
@@ -8,10 +8,25 @@
 
 =head1 DESCRIPTION
 
 This module is an L<AnyEvent> user, you need to make sure that you use and
 run a supported event loop.
+
+This module implements a simple, stateless and non-blocking HTTP
+client. It supports GET, POST and other request methods, cookies and more,
+all on a very low level. It can follow redirects supports proxies and
+automatically limits the number of connections to the values specified in
+the RFC.
+
+It should generally be a "good client" that is enough for most HTTP
+tasks. Simple tasks should be simple, but complex tasks should still be
+possible as the user retains control over request and response headers.
+
+The caller is responsible for authentication management, cookies (if
+the simplistic implementation in this module doesn't suffice), referer
+and other high-level protocol details for which this module offers only
+limited support.
 
 =head2 METHODS
 
 =over 4
 
```
```diff
@@ -33,27 +48,66 @@
 
 our $VERSION = '1.0';
 
 our @EXPORT = qw(http_get http_request);
 
-our $MAX_REDIRECTS = 10;
 our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
+our $MAX_RECURSE = 10;
 our $MAX_PERSISTENT = 8;
-our $PERSISTENT_TIMEOUT = 15;
-our $TIMEOUT = 60;
+our $PERSISTENT_TIMEOUT = 2;
+our $TIMEOUT = 300;
 
 # changing these is evil
 our $MAX_PERSISTENT_PER_HOST = 2;
-our $MAX_PER_HOST = 4; # not respected yet :(
+our $MAX_PER_HOST = 4;
+
+our $PROXY;
 
 my %KA_COUNT; # number of open keep-alive connections per host
+my %CO_SLOT; # number of open connections, and wait queue, per host
 
 =item http_get $url, key => value..., $cb->($data, $headers)
 
 Executes an HTTP-GET request. See the http_request function for details on
 additional parameters.
 
+=item http_head $url, key => value..., $cb->($data, $headers)
+
+Executes an HTTP-HEAD request. See the http_request function for details on
+additional parameters.
+
+=item http_post $url, $body, key => value..., $cb->($data, $headers)
+
+Executes an HTTP-POST request with a request body of C<$bod>. See the
+http_request function for details on additional parameters.
+
 =item http_request $method => $url, key => value..., $cb->($data, $headers)
 
 Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
 must be an absolute http or https URL.
 
+The callback will be called with the response data as first argument
+(or C<undef> if it wasn't available due to errors), and a hash-ref with
+response headers as second argument.
+
+All the headers in that hash are lowercased. In addition to the response
+headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
+C<Reason> contain the three parts of the HTTP Status-Line of the same
+name. If the server sends a header multiple lines, then their contents
+will be joined together with C<\x00>.
+
+If an internal error occurs, such as not being able to resolve a hostname,
+then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
+and the C<Reason> pseudo-header will contain an error message.
+
+A typical callback might look like this:
+
+   sub {
+      my ($body, $hdr) = @_;
+
+      if ($hdr->{Status} =~ /^2/) {
+         ... everything should be ok
+      } else {
+         print "error, $hdr->{Status} $hdr->{Reason}\n";
+      }
+   }
+
```
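The callback contract documented in revision 1.12 can be exercised without a network connection. The following is a minimal sketch, not part of AnyEvent::HTTP: handle_response is a hypothetical helper, and the header hash is built by hand to mimic what the documentation describes (lowercased keys, the Status and Reason pseudo-headers, repeated headers joined with "\x00", and status 599 for internal errors).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::HTTP: summarise the
# ($data, $headers) pair that http_request passes to its callback.
sub handle_response {
   my ($body, $hdr) = @_;

   # Status 599 marks an internal error (bad URL, connect failure, ...);
   # $body is undef and Reason holds the error message.
   return "internal error: $hdr->{Reason}"
      if $hdr->{Status} == 599;

   # Anything outside 2xx is a regular HTTP-level error.
   return "HTTP error: $hdr->{Status} $hdr->{Reason}"
      unless $hdr->{Status} =~ /^2/;

   # A header sent on several lines arrives joined with "\x00".
   my @cookies = split /\x00/, ($hdr->{"set-cookie"} // "");

   # $body may be undef even on success (e.g. for HEAD requests).
   return sprintf "ok, %d bytes, %d cookie(s)", length ($body // ""), scalar @cookies;
}

# A hand-built header hash shaped like the documentation describes.
my %hdr = (
   HTTPVersion    => "1.0",
   Status         => "200",
   Reason         => "OK",
   "content-type" => "text/html",
   "set-cookie"   => "a=1; path=/\x00b=2; path=/",
);

print handle_response ("<html></html>", \%hdr), "\n";   # prints "ok, 13 bytes, 2 cookie(s)"
print handle_response (undef, { Status => 599, Reason => "unparsable URL" }), "\n";
```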
```diff
@@ -60,24 +114,73 @@
 Additional parameters are key-value pairs, and are fully optional. They
 include:
 
 =over 4
 
-=item recurse => $boolean (default: true)
+=item recurse => $count (default: $MAX_RECURSE)
 
 Whether to recurse requests or not, e.g. on redirects, authentication
-retries and so on.
+retries and so on, and how often to do so.
 
 =item headers => hashref
 
-The request headers to use.
+The request headers to use. Currently, C<http_request> may provide its
+own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers
+and will provide defaults for C<User-Agent:> and C<Referer:>.
 
 =item timeout => $seconds
 
 The time-out to use for various stages - each connect attempt will reset
-the timeout, as will read or write activity.
+the timeout, as will read or write activity. Default timeout is 5 minutes.
+
+=item proxy => [$host, $port[, $scheme]] or undef
+
+Use the given http proxy for all requests. If not specified, then the
+default proxy (as specified by C<$ENV{http_proxy}>) is used.
+
+C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
+HTTPS.
+
+=item body => $string
+
+The request body, usually empty. Will be-sent as-is (future versions of
+this module might offer more options).
+
+=item cookie_jar => $hash_ref
+
+Passing this parameter enables (simplified) cookie-processing, loosely
+based on the original netscape specification.
+
+The C<$hash_ref> must be an (initially empty) hash reference which will
+get updated automatically. It is possible to save the cookie_jar to
+persistent storage with something like JSON or Storable, but this is not
+recommended, as expire times are currently being ignored.
+
+Note that this cookie implementation is not of very high quality, nor
+meant to be complete. If you want complete cookie management you have to
+do that on your own. C<cookie_jar> is meant as a quick fix to get some
+cookie-using sites working. Cookies are a privacy disaster, do not use
+them unless required to.
 
 =back
 
-=back
+Example: make a simple HTTP GET request for http://www.nethype.de/
+
+   http_request GET => "http://www.nethype.de/", sub {
+      my ($body, $hdr) = @_;
+      print "$body\n";
+   };
+
+Example: make a HTTP HEAD request on https://www.google.com/, use a
+timeout of 30 seconds.
+
+   http_request
+      GET => "https://www.google.com",
+      timeout => 30,
+      sub {
+         my ($body, $hdr) = @_;
+         use Data::Dumper;
+         print Dumper $hdr;
+      }
+   ;
 
 =cut
```
```diff
@@ -84,1 +187,29 @@
+
+sub _slot_schedule;
+sub _slot_schedule($) {
+   my $host = shift;
+
+   while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
+      if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
+         # somebody wants that slot
+         ++$CO_SLOT{$host}[0];
+
+         $cb->(AnyEvent::Util::guard {
+            --$CO_SLOT{$host}[0];
+            _slot_schedule $host;
+         });
+      } else {
+         # nobody wants the slot, maybe we can forget about it
+         delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
+         last;
+      }
+   }
+}
+
+# wait for a free slot on host, call callback
+sub _get_slot($$) {
+   push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
+
+   _slot_schedule $_[0];
+}
 
```
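The per-host slot queue added in revision 1.12 (_get_slot and _slot_schedule) can be illustrated in plain Perl. In this sketch, SlotGuard is a hypothetical stand-in for AnyEvent::Util::guard that runs a callback from DESTROY. get_slot and slot_schedule mirror the new helpers. $MAX_PER_HOST is set to 2 instead of 4 only to keep the demonstration short.

```perl
use strict;
use warnings;

# Hypothetical stand-in for AnyEvent::Util::guard: runs the stored
# callback when the last reference to the object goes away.
package SlotGuard;
sub new     { my ($class, $cb) = @_; return bless { cb => $cb }, $class }
sub DESTROY { $_[0]{cb}->() }

package main;

our $MAX_PER_HOST = 2;   # 4 in revision 1.12; 2 keeps the demo short
my %CO_SLOT;             # host => [ active slots, [ waiting callbacks ] ]

sub slot_schedule {
   my ($host) = @_;

   while (($CO_SLOT{$host}[0] // 0) < $MAX_PER_HOST) {
      if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
         # somebody wants that slot: count it and hand out a guard
         ++$CO_SLOT{$host}[0];
         $cb->(SlotGuard->new (sub {
            # guard released: free the slot and wake the next waiter
            --$CO_SLOT{$host}[0];
            slot_schedule ($host);
         }));
      } else {
         # nobody is waiting; forget the host once no slot is in use
         delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
         last;
      }
   }
}

sub get_slot {
   my ($host, $cb) = @_;
   push @{ $CO_SLOT{$host}[1] }, $cb;
   slot_schedule ($host);
}

# Three requests for one host: two run at once, the third waits.
my @held;
get_slot ("example.org", sub { push @held, shift }) for 1 .. 3;
print scalar (@held), " running, ", scalar @{ $CO_SLOT{"example.org"}[1] }, " waiting\n";   # prints "2 running, 1 waiting"

shift @held;   # dropping one guard lets the waiting request run
@held = ();    # dropping the rest frees the host entry
```

Holding the guard is what keeps a slot busy, so a request that finishes (or is cancelled) releases its slot simply by letting its state go out of scope, which is how revision 1.12 stores it in %state.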
```diff
@@ -85,37 +216,46 @@
 sub http_request($$$;@) {
    my $cb = pop;
    my ($method, $url, %arg) = @_;
 
    my %hdr;
 
-   if (my $hdr = delete $arg{headers}) {
+   $method = uc $method;
+
+   if (my $hdr = $arg{headers}) {
       while (my ($k, $v) = each %$hdr) {
          $hdr{lc $k} = $v;
       }
    }
 
+   my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
+
+   return $cb->(undef, { Status => 599, Reason => "recursion limit reached" })
+      if $recurse < 0;
+
+   my $proxy = $arg{proxy} || $PROXY;
    my $timeout = $arg{timeout} || $TIMEOUT;
 
    $hdr{"user-agent"} ||= $USERAGENT;
 
-   my ($scheme, $authority, $path, $query, $fragment) =
+   my ($scheme, $authority, $upath, $query, $fragment) =
       $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
 
    $scheme = lc $scheme;
-   my $port = $scheme eq "http" ? 80
+
+   my $uport = $scheme eq "http" ? 80
             : $scheme eq "https" ? 443
-            : croak "$url: only http and https URLs supported";
+            : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
 
    $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
-      or croak "$authority: unparsable URL";
+      or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
 
-   my $host = $1;
-   $port = $2 if defined $2;
+   my $uhost = $1;
+   $uport = $2 if defined $2;
 
-   $host =~ s/^\[(.*)\]$/$1/;
-   $path .= "?$query" if length $query;
+   $uhost =~ s/^\[(.*)\]$/$1/;
+   $upath .= "?$query" if length $query;
 
-   $hdr{host} = $host = lc $host;
+   $upath =~ s%^/?%/%;
 
-   my %state;
+   $hdr{referer} ||= "$scheme://$authority$upath";
 
```
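The URL handling at the start of http_request can be run in isolation. This is a sketch built around a hypothetical split_url helper that returns a hash ref (or undef where http_request would call the callback with status 599); the regular expressions are copied from revision 1.12.

```perl
use strict;
use warnings;

# Hypothetical helper repeating the URL-splitting steps of http_request
# (revision 1.12); returns undef where http_request would report a 599.
sub split_url {
   my ($url) = @_;

   my ($scheme, $authority, $upath, $query, $fragment) =
      $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;

   $scheme = lc ($scheme // "");

   # default port from the scheme; other schemes are rejected
   my $uport = $scheme eq "http"  ?  80
             : $scheme eq "https" ? 443
             : return;

   # optional userinfo@, then host, then optional :port
   ($authority // "") =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
      or return;

   my $uhost = $1;
   $uport = $2 if defined $2;

   $uhost =~ s/^\[(.*)\]$/$1/;                              # strip [ ] around the host
   $upath .= "?$query" if defined $query && length $query;
   $upath =~ s%^/?%/%;                                      # path always starts with "/"

   return { scheme => $scheme, host => $uhost, port => $uport, path => $upath };
}

my $u = split_url ("HTTP://user\@Example.com:8080/a/b?x=1#top");
print "$u->{scheme} $u->{host} $u->{port} $u->{path}\n";   # prints "http Example.com 8080 /a/b?x=1"
```

Two details fall out of running it. Unlike revision 1.1, the host is no longer lowercased. And because the host pattern [^\@:]+ excludes every colon, a bracketed IPv6 literal such as [::1] fails the authority match before the bracket-stripping step is reached.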
```diff
@@ -122,25 +262,24 @@
-   my $body = "";
-   $state{body} = $body;
-
-   $hdr{"content-length"} = length $body;
-
-   $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub {
-      $state{fh} = shift
-         or return $cb->(undef, { Status => 599, Reason => "$!" });
-
-      delete $state{connect_guard}; # reduce memory usage, save a tree
-
-      # get handle
-      $state{handle} = new AnyEvent::Handle
-         fh => $state{fh},
-         ($scheme eq "https" ? (tls => "connect") : ());
-
-      # limit the number of persistent connections
-      if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
-         ++$KA_COUNT{$_[1]};
-         $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
-         $hdr{connection} = "keep-alive";
-      } else {
-         delete $hdr{connection};
+   # cookie processing
+   if (my $jar = $arg{cookie_jar}) {
+      %$jar = () if $jar->{version} < 1;
+
+      my @cookie;
+
+      while (my ($chost, $v) = each %$jar) {
+         next unless $chost eq substr $uhost, -length $chost;
+         next unless $chost =~ /^\./;
+
+         while (my ($cpath, $v) = each %$v) {
+            next unless $cpath eq substr $upath, 0, length $cpath;
+
+            while (my ($k, $v) = each %$v) {
+               next if $scheme ne "https" && exists $v->{secure};
+               push @cookie, "$k=$v->{value}";
+            }
+         }
       }
+
+      $hdr{cookie} = join "; ", @cookie
+         if @cookie;
+   }
 
```
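The request-side cookie lookup can be sketched as a standalone function. The jar layout ($jar->{".domain"}{"/path"}{name} = { value => ..., secure => ... } plus a version key) is inferred from the set-cookie code in revision 1.12. cookie_header is a hypothetical name. Keys are visited in sorted order because the original uses each, whose order is unspecified. A length check is added before substr so that a domain longer than the host is skipped quietly instead of triggering a substr warning.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the request-side cookie lookup in
# http_request (revision 1.12). Jar layout (inferred from the module):
#   $jar->{version} = 1;
#   $jar->{".domain"}{"/path"}{$name} = { value => ..., secure => ... };
sub cookie_header {
   my ($jar, $scheme, $uhost, $upath) = @_;

   # a jar without version 1 is treated as unknown and emptied
   %$jar = () if ($jar->{version} // 0) < 1;

   my @cookie;

   # sorted for a predictable order; the original iterates with each
   for my $chost (sort keys %$jar) {
      next unless $chost =~ /^\./;                          # also skips "version"
      next unless length $chost <= length $uhost            # added: avoid substr warning
               && $chost eq substr ($uhost, -length $chost); # domain suffix match

      my $paths = $jar->{$chost};
      for my $cpath (sort keys %$paths) {
         next unless $cpath eq substr ($upath, 0, length $cpath);   # path prefix match

         my $cookies = $paths->{$cpath};
         for my $name (sort keys %$cookies) {
            my $c = $cookies->{$name};
            next if $scheme ne "https" && exists $c->{secure};      # secure => https only
            push @cookie, "$name=$c->{value}";
         }
      }
   }

   return @cookie ? join ("; ", @cookie) : undef;
}

my %jar = (
   version        => 1,
   ".example.com" => {
      "/"     => { sid  => { value => "abc" } },
      "/shop" => { cart => { value => "42", secure => undef } },
   },
);

print cookie_header (\%jar, "https", "www.example.com", "/shop/cart"), "\n";   # prints "sid=abc; cart=42"
```

Because only keys starting with "." are considered and the domain test is a plain suffix comparison, a cookie stored under .example.com is sent to www.example.com but not to example.com itself.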
```diff
@@ -147,50 +286,92 @@
+   my ($rhost, $rport, $rpath); # request host, port, path
+
+   if ($proxy) {
+      ($rhost, $rport, $scheme) = @$proxy;
+      $rpath = $url;
+   } else {
+      ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
+      $hdr{host} = $uhost;
+   }
+
+   $hdr{"content-length"} = length $arg{body};
+
+   my %state = (connect_guard => 1);
+
+   _get_slot $uhost, sub {
+      $state{slot_guard} = shift;
+
+      return unless $state{connect_guard};
+
+      $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
+         $state{fh} = shift
+            or return $cb->(undef, { Status => 599, Reason => "$!" });
+
+         delete $state{connect_guard}; # reduce memory usage, save a tree
+
+         # get handle
+         $state{handle} = new AnyEvent::Handle
+            fh => $state{fh},
+            ($scheme eq "https" ? (tls => "connect") : ());
+
+         # limit the number of persistent connections
+         if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
+            ++$KA_COUNT{$_[1]};
+            $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
+            $hdr{connection} = "keep-alive";
+            delete $hdr{connection}; # keep-alive not yet supported
+         } else {
+            delete $hdr{connection};
+         }
+
-      # (re-)configure handle
-      $state{handle}->timeout ($timeout);
-      $state{handle}->on_error (sub {
-         %state = ();
-         $cb->(undef, { Status => 599, Reason => "$!" });
-      });
-      $state{handle}->on_eof (sub {
-         %state = ();
-         $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
-      });
+         # (re-)configure handle
+         $state{handle}->timeout ($timeout);
+         $state{handle}->on_error (sub {
+            %state = ();
+            $cb->(undef, { Status => 599, Reason => "$!" });
+         });
+         $state{handle}->on_eof (sub {
+            %state = ();
+            $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
+         });
 
-      # send request
-      $state{handle}->push_write (
-         "\U$method\E $path HTTP/1.0\015\012"
-         . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
-         . "\015\012"
-         . (delete $state{body})
-      );
-
-      %hdr = (); # reduce memory usage, save a kitten
-
-      # status line
-      $state{handle}->push_read (line => qr/\015?\012/, sub {
-         $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
-            or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
-
-         my %hdr = ( # response headers
-            HTTPVersion => ",$1",
-            Status => ",$2",
-            Reason => ",$3",
+         # send request
+         $state{handle}->push_write (
+            "$method $rpath HTTP/1.0\015\012"
+            . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
+            . "\015\012"
+            . (delete $arg{body})
          );
 
+         %hdr = (); # reduce memory usage, save a kitten
+
+         # status line
+         $state{handle}->push_read (line => qr/\015?\012/, sub {
+            $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
+               or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
+
+            my %hdr = ( # response headers
+               HTTPVersion => "\x00$1",
+               Status => "\x00$2",
+               Reason => "\x00$3",
+            );
+
-         # headers, could be optimized a bit
-         $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
-            for ("$_[1]\012") {
-               $hdr{lc $1} .= ",$2"
-                  while /\G
-                        ([^:\000-\040]+):
-                        [\011\040]*
-                        ((?: [^\015\012]+ | \015?\012[\011\040] )*)
-                        \015?\012
-                     /gxc;
+            # headers, could be optimized a bit
+            $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
+               for ("$_[1]\012") {
+                  # we support spaces in field names, as lotus domino
+                  # creates them.
+                  $hdr{lc $1} .= "\x00$2"
+                     while /\G
+                           ([^:\000-\037]+):
+                           [\011\040]*
+                           ((?: [^\015\012]+ | \015?\012[\011\040] )*)
+                           \015?\012
+                        /gxc;
 
-               /\G$/
-                  or return $cb->(undef, { Status => 599, Reason => "garbled response headers" });
-            }
+                  /\G$/
+                     or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
+               }
 
-            substr $_, 0, 1, ""
-               for values %hdr;
+               substr $_, 0, 1, ""
+                  for values %hdr;
 
```
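The status-line and header parsing can be exercised on a literal string. This sketch wraps the regular expressions from revision 1.12 in a hypothetical parse_response_head function. The header block is passed the way the line read above delivers it, without the terminating blank line. Every value is stored with a leading "\x00", so repeated headers accumulate as "\x00v1\x00v2" and only the first "\x00" is stripped at the end.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the status-line and header parsing
# done inside http_request (revision 1.12). Returns a hash ref, or undef
# where http_request would report status 599.
sub parse_response_head {
   my ($status_line, $block) = @_;

   $status_line =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
      or return;   # "invalid server response"

   my %hdr = (
      HTTPVersion => "\x00$1",
      Status      => "\x00$2",
      Reason      => "\x00$3",
   );

   for ("$block\012") {
      # field names may contain spaces; continuation lines start with
      # a tab or space and are kept in the value
      $hdr{lc $1} .= "\x00$2"
         while /\G
               ([^:\000-\037]+):
               [\011\040]*
               ((?: [^\015\012]+ | \015?\012[\011\040] )*)
               \015?\012
            /gxc;

      /\G$/ or return;   # "garbled response headers"
   }

   # drop the leading "\x00" of every value
   substr $_, 0, 1, "" for values %hdr;

   return \%hdr;
}

my $h = parse_response_head ("HTTP/1.1 200 OK",
   "Content-Type: text/html\015\012Set-Cookie: a=1\015\012Set-Cookie: b=2");
print join (",", split /\x00/, $h->{"set-cookie"}), "\n";   # prints "a=1,b=2"
```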
```diff
@@ -197,20 +378,62 @@
-            if (exists $hdr{"content-length"}) {
-               $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
-                  # could cache persistent connection now
-                  if ($hdr{connection} =~ /\bkeep-alive\b/i) {
-                  };
-
+               my $finish = sub {
                   %state = ();
+
+                  # set-cookie processing
+                  if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
+                     for (split /\x00/, $hdr{"set-cookie"}) {
+                        my ($cookie, @arg) = split /;\s*/;
+                        my ($name, $value) = split /=/, $cookie, 2;
+                        my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
+
+                        my $cdom = (delete $kv{domain}) || $uhost;
+                        my $cpath = (delete $kv{path}) || "/";
+
+                        $cdom =~ s/^.?/./; # make sure it starts with a "."
+
+                        next if $cdom =~ /\.$/;
+
+                        # this is not rfc-like and not netscape-like. go figure.
+                        my $ndots = $cdom =~ y/.//;
+                        next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
+
+                        # store it
+                        $arg{cookie_jar}{version} = 1;
+                        $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
+                     }
+                  }
+
+                  if ($_[1]{Status} =~ /^x30[12]$/ && $recurse) {
+                     # microsoft and other assholes don't give a shit for following standards,
+                     # try to support a common form of broken Location header.
+                     $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
+
+                     http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
+                  } else {
-                  $cb->($_[1], \%hdr);
-               });
-            } else {
-               # too bad, need to read until we get an error or EOF,
-               # no way to detect winged data.
-               $_[0]->on_error (sub {
-                  %state = ();
-                  $cb->($_[0]{rbuf}, \%hdr);
-               });
-               $_[0]->on_eof (undef);
-               $_[0]->on_read (sub { });
-            }
+                     $cb->($_[0], $_[1]);
+                  }
+               };
+
+               if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
+                  $finish->(undef, \%hdr);
+               } else {
+                  if (exists $hdr{"content-length"}) {
+                     $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
+                        # could cache persistent connection now
+                        if ($hdr{connection} =~ /\bkeep-alive\b/i) {
+                           # but we don't, due to misdesigns, this is annoyingly complex
+                        };
+
+                        $finish->($_[1], \%hdr);
+                     });
+                  } else {
+                     # too bad, need to read until we get an error or EOF,
+                     # no way to detect winged data.
+                     $_[0]->on_error (sub {
+                        $finish->($_[0]{rbuf}, \%hdr);
+                     });
+                     $_[0]->on_eof (undef);
+                     $_[0]->on_read (sub { });
+                  }
+               }
+            });
          });
```
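The Set-Cookie handling can be sketched the same way. The sketch makes two deliberate deviations, both labelled in the code.

First, revision 1.12 normalises the domain with s/^.?/./, which replaces the first character whatever it is. A host-only cookie from www.example.com is therefore filed under .ww.example.com and never matches that host again. The sketch uses s/^\.?/./ instead, which is presumably what was meant.

Second, the original's map { split /=/, $_, 2 } yields a single element for flag attributes such as secure, which shifts every key/value pair that follows. The sketch always emits a pair and lowercases the attribute name.

store_cookies is a hypothetical name. Like the original, it does not check that a Domain attribute actually covers the request host.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the Set-Cookie storage step in
# revision 1.12, with two labelled deviations.
sub store_cookies {
   my ($jar, $uhost, $set_cookie) = @_;   # $set_cookie is "\x00"-joined

   for (split /\x00/, $set_cookie) {
      my ($cookie, @arg) = split /;\s*/;
      my ($name, $value) = split /=/, $cookie, 2;

      # deviation: always emit a key/value pair (flags like "secure" get
      # undef) and lowercase attribute names
      my %kv = (value => $value,
                map { my ($k, $v) = split /=/, $_, 2; (lc $k, $v) } @arg);

      my $cdom  = (delete $kv{domain}) || $uhost;
      my $cpath = (delete $kv{path})   || "/";

      $cdom =~ s/^\.?/./;   # deviation: the original uses s/^.?/./
      next if $cdom =~ /\.$/;

      # same dot-count heuristic as the original: at least two dots,
      # three for names ending in ".xx.yy" (e.g. ".co.uk")
      my $ndots = $cdom =~ y/.//;
      next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);

      $jar->{version} = 1;
      $jar->{$cdom}{$cpath}{$name} = \%kv;
   }
}

my %jar;
store_cookies (\%jar, "www.example.com",
   "sid=abc; path=/; secure\x00lang=en; Domain=example.com");
print join (" ", sort grep { /^\./ } keys %jar), "\n";   # prints ".example.com .www.example.com"
```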
```diff
@@ -217,7 +440,7 @@
-      });
-   }, sub {
-      $timeout
+      }, sub {
+         $timeout
+      };
    };
 
    defined wantarray && AnyEvent::Util::guard { %state = () }
 }
```
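The redirect branch in revision 1.12 tests $_[1]{Status} =~ /^x30[12]$/. Status holds a bare three-digit code such as 302, so the literal x means the pattern can never match, and redirects are handed to the callback instead of being followed. The sketch below contrasts that pattern with /^30[12]$/, which appears to be the intent. It wraps the Location fix-up in a hypothetical redirect_target helper. As in the original, only 301 and 302 are considered.

```perl
use strict;
use warnings;

my $as_written = qr/^x30[12]$/;   # the test in revision 1.12
my $intended   = qr/^30[12]$/;    # presumably what was meant

# Hypothetical helper: returns the absolute URL to follow, or undef.
sub redirect_target {
   my ($status, $location, $scheme, $uhost, $uport) = @_;

   return unless defined $location && $status =~ $intended;

   # same fix-up as the original for a Location header that starts with "/"
   (my $abs = $location) =~ s%^/%$scheme://$uhost:$uport/%;
   return $abs;
}

for my $status (qw(301 302 303)) {
   printf "%s: as written %s, intended %s\n", $status,
      ($status =~ $as_written ? "match" : "no match"),
      ($status =~ $intended   ? "match" : "no match");
}
```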
```diff
@@ -224,6 +447,18 @@
 
 sub http_get($$;@) {
    unshift @_, "GET";
    &http_request
 }
 
+sub http_head($$;@) {
+   unshift @_, "HEAD";
+   &http_request
+}
+
+sub http_post($$$;@) {
+   unshift @_, "POST", "body";
+   &http_request
+}
+
+=back
+
```
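The new http_post does unshift @_, "POST", "body". This puts "body" where http_request expects the URL and turns the real URL into a key whose value is the body. A stub that records what http_request would see makes this visible. http_request_stub and http_post_fixed are hypothetical names, and the fix shown is one straightforward way to keep the URL in second position. Note also that @EXPORT is unchanged in this revision, so http_head and http_post are not exported and have to be called as AnyEvent::HTTP::http_head and AnyEvent::HTTP::http_post.

```perl
use strict;
use warnings;

# Hypothetical stub with the same argument handling as http_request:
# callback last, then method, URL and key/value pairs.
my @seen;
sub http_request_stub {
   my $cb = pop;
   my ($method, $url, %arg) = @_;
   @seen = ($method, $url, $arg{body});
}

# The argument shuffling used by http_post in revision 1.12.
sub http_post_as_written {
   unshift @_, "POST", "body";
   &http_request_stub
}

# Hypothetical fix: keep the URL second and turn the body into body => $body.
sub http_post_fixed {
   my $url = shift;
   unshift @_, "POST", $url, "body";
   &http_request_stub
}

http_post_as_written ("http://example.com/", "a=1", sub { });
print "as written: url=$seen[1]\n";                  # prints "as written: url=body"
http_post_fixed ("http://example.com/", "a=1", sub { });
print "fixed: url=$seen[1], body=$seen[2]\n";
```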
```diff
@@ -230,30 +465,45 @@
-=head2 GLOBAL VARIABLES
+=head2 GLOBAL FUNCTIONS AND VARIABLES
 
 =over 4
 
+=item AnyEvent::HTTP::set_proxy "proxy-url"
+
+Sets the default proxy server to use. The proxy-url must begin with a
+string of the form C<http://host:port> (optionally C<https:...>).
+
-=item $AnyEvent::HTTP::MAX_REDIRECTS
+=item $AnyEvent::HTTP::MAX_RECURSE
 
-The default value for the C<max_redirects> request parameter
-(default: C<10>).
+The default value for the C<recurse> request parameter (default: C<10>).
 
 =item $AnyEvent::HTTP::USERAGENT
 
 The default value for the C<User-Agent> header (the default is
 C<Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
 
 =item $AnyEvent::HTTP::MAX_PERSISTENT
 
 The maximum number of persistent connections to keep open (default: 8).
 
+Not implemented currently.
+
 =item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
 
-The maximum time to cache a persistent connection, in seconds (default: 15).
+The maximum time to cache a persistent connection, in seconds (default: 2).
+
+Not implemented currently.
 
 =back
 
 =cut
+
+sub set_proxy($) {
+   $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
+}
+
+# initialise proxy from environment
+set_proxy $ENV{http_proxy};
 
 =head1 SEE ALSO
 
 L<AnyEvent>.
 
```
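set_proxy accepts a URL of the form http://host:port and falls back to port 3128 when no port is given. The sketch below is a hypothetical parse_proxy function that applies the same regular expression. It returns the array ref that set_proxy would store in $PROXY, or undef where set_proxy would leave $PROXY untouched.

```perl
use strict;
use warnings;

# Hypothetical helper returning what set_proxy (revision 1.12) would
# store in $PROXY: [$host, $port, $scheme], or undef otherwise.
sub parse_proxy {
   my ($url) = @_;
   return unless defined $url;   # e.g. $ENV{http_proxy} not set

   $url =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
      or return;

   # a missing or empty port falls back to 3128
   return [$2, $3 || 3128, $1];
}

for my $u ("http://proxy.example:8080/", "http://proxy.example", "socks5://proxy.example:1080") {
   my $p = parse_proxy ($u);
   print "$u: ", ($p ? "@$p" : "not a proxy URL"), "\n";
}
```

One consequence visible in http_request: the proxy is chosen with $arg{proxy} || $PROXY. Passing proxy => undef therefore falls back to the default proxy rather than disabling it, even though the option is documented as accepting "or undef".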
