/cvs/AnyEvent-HTTP/HTTP.pm

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.4 by root, Wed Jun 4 11:59:22 2008 UTC vs.
Revision 1.11 by root, Thu Jun 5 15:34:00 2008 UTC

8 8
9=head1 DESCRIPTION 9=head1 DESCRIPTION
10 10
11This module is an L<AnyEvent> user, you need to make sure that you use and 11This module is an L<AnyEvent> user, you need to make sure that you use and
12run a supported event loop. 12run a supported event loop.
13
14This module implements a simple, stateless and non-blocking HTTP
15client. It supports GET, POST and other request methods, cookies and more,
16all on a very low level. It can follow redirects, supports proxies and
17automatically limits the number of connections to the values specified in
18the RFC.
19
20It should generally be a "good client" that is enough for most HTTP
21tasks. Simple tasks should be simple, but complex tasks should still be
22possible as the user retains control over request and response headers.
23
24The caller is responsible for authentication management, cookies (if
25the simplistic implementation in this module doesn't suffice), referer
26and other high-level protocol details for which this module offers only
27limited support.
13 28
14=head2 METHODS 29=head2 METHODS
15 30
16=over 4 31=over 4
17 32
41our $PERSISTENT_TIMEOUT = 2; 56our $PERSISTENT_TIMEOUT = 2;
42our $TIMEOUT = 300; 57our $TIMEOUT = 300;
43 58
44# changing these is evil 59# changing these is evil
45our $MAX_PERSISTENT_PER_HOST = 2; 60our $MAX_PERSISTENT_PER_HOST = 2;
46our $MAX_PER_HOST = 4; # not respected yet :( 61our $MAX_PER_HOST = 4;
47 62
48our $PROXY; 63our $PROXY;
49 64
50my %KA_COUNT; # number of open keep-alive connections per host 65my %KA_COUNT; # number of open keep-alive connections per host
66my %CO_SLOT; # number of open connections, and wait queue, per host
51 67
52=item http_get $url, key => value..., $cb->($data, $headers) 68=item http_get $url, key => value..., $cb->($data, $headers)
53 69
54Executes an HTTP-GET request. See the http_request function for details on 70Executes an HTTP-GET request. See the http_request function for details on
55additional parameters. 71additional parameters.
56 72
73=item http_head $url, key => value..., $cb->($data, $headers)
74
75Executes an HTTP-HEAD request. See the http_request function for details on
76additional parameters.
77
57=item http_get $url, $body, key => value..., $cb->($data, $headers) 78=item http_post $url, $body, key => value..., $cb->($data, $headers)
58 79
59Executes an HTTP-POST request with a requets body of C<$bod>. See the 80Executes an HTTP-POST request with a request body of C<$body>. See the
60http_request function for details on additional parameters. 81http_request function for details on additional parameters.
61 82
62=item http_request $method => $url, key => value..., $cb->($data, $headers) 83=item http_request $method => $url, key => value..., $cb->($data, $headers)
63 84
64Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL 85Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
66 87
67The callback will be called with the response data as first argument 88The callback will be called with the response data as first argument
68(or C<undef> if it wasn't available due to errors), and a hash-ref with 89(or C<undef> if it wasn't available due to errors), and a hash-ref with
69response headers as second argument. 90response headers as second argument.
70 91
71All the headers in that has are lowercased. In addition to the response 92All the headers in that hash are lowercased. In addition to the response
72headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and 93headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
73C<Reason> contain the three parts of the HTTP Status-Line of the same 94C<Reason> contain the three parts of the HTTP Status-Line of the same
74name. 95name. If the server sends a header multiple times, then their contents
96will be joined together with C<\x00>.
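
As a minimal sketch of what this joining means for a caller: assuming a response in which the server sent C<Set-Cookie> twice, the callback would see a single string that can be split on C<\x00> again. The C<$hdr> hash below is built by hand for illustration and is not produced by the module.

```perl
use strict;
use warnings;

# Hand-built stand-in for the $headers hash-ref passed to the callback
# (hypothetical values; a real one would come from http_request).
my $hdr = {
   Status       => 200,
   "set-cookie" => "a=1; path=/\x00b=2; path=/",
};

# Repeated headers arrive as one string joined with "\x00".
my @set_cookie = split /\x00/, $hdr->{"set-cookie"};

print scalar (@set_cookie), " Set-Cookie values\n";
print "  $_\n" for @set_cookie;
```

Headers that were sent only once contain no C<\x00>, so the same C<split> simply returns a one-element list for them.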
75 97
76If an internal error occurs, such as not being able to resolve a hostname, 98If an internal error occurs, such as not being able to resolve a hostname,
77then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599> 99then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
78and the C<Reason> pseudo-header will contain an error message. 100and the C<Reason> pseudo-header will contain an error message.
79 101
102A typical callback might look like this:
103
104 sub {
105 my ($body, $hdr) = @_;
106
107 if ($hdr->{Status} =~ /^2/) {
108 ... everything should be ok
109 } else {
110 print "error, $hdr->{Status} $hdr->{Reason}\n";
111 }
112 }
113
80Additional parameters are key-value pairs, and are fully optional. They 114Additional parameters are key-value pairs, and are fully optional. They
81include: 115include:
82 116
83=over 4 117=over 4
84 118
107=item body => $string 141=item body => $string
108 142
109The request body, usually empty. Will be sent as-is (future versions of 143The request body, usually empty. Will be sent as-is (future versions of
110this module might offer more options). 144this module might offer more options).
111 145
146=item cookie_jar => $hash_ref
147
148Passing this parameter enables (simplified) cookie-processing, loosely
149based on the original netscape specification.
150
151The C<$hash_ref> must be an (initially empty) hash reference which will
152get updated automatically. It is possible to save the cookie_jar to
153persistent storage with something like JSON or Storable, but this is not
154recommended, as expire times are currently being ignored.
155
156Note that this cookie implementation is not of very high quality, nor
157meant to be complete. If you want complete cookie management you have to
158do that on your own. C<cookie_jar> is meant as a quick fix to get some
159cookie-using sites working. Cookies are a privacy disaster, do not use
160them unless required to.
161
112=back 162=back
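
To make the C<cookie_jar> layout more concrete, here is a rough sketch of how a jar could be turned into a C<Cookie> header. It follows the matching rules visible in this revision of C<http_request> (domain suffix, path prefix, C<secure> only over https). The helper name C<cookie_header>, the host names and the cookie values are assumptions made for illustration and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical jar contents, following the layout the module stores:
# $jar->{$domain}{$path}{$name} = { value => ..., other attributes }.
my $jar = {
   version        => 1,
   ".example.com" => {
      "/"      => { session => { value => "abc123" } },
      "/admin" => { token   => { value => "s3cret", secure => undef } },
   },
};

# Illustrative helper (not part of the module): collect the cookies
# that apply to one request.
sub cookie_header {
   my ($jar, $scheme, $host, $path) = @_;
   my @cookie;

   # only keys starting with "." are cookie domains ("version" is skipped)
   for my $chost (sort grep { /^\./ } keys %$jar) {
      # the cookie domain must be a suffix of the request host
      next if length $chost > length $host;
      next unless $chost eq substr $host, -length $chost;

      for my $cpath (sort keys %{ $jar->{$chost} }) {
         # the cookie path must be a prefix of the request path
         next unless $cpath eq substr $path, 0, length $cpath;

         my $cookies = $jar->{$chost}{$cpath};
         for my $name (sort keys %$cookies) {
            # "secure" cookies are only sent over https
            next if $scheme ne "https" && exists $cookies->{$name}{secure};
            push @cookie, "$name=$cookies->{$name}{value}";
         }
      }
   }

   join "; ", @cookie;
}

my $plain  = cookie_header ($jar, "http",  "www.example.com", "/admin/users");
my $secure = cookie_header ($jar, "https", "www.example.com", "/admin/users");

print "http:  $plain\n";
print "https: $secure\n";
```

The keys are sorted here only to make the output order predictable; the module itself iterates with C<each>, so the order of cookies in a real request is not defined.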
113 163
114=back 164Example: make a simple HTTP GET request for http://www.nethype.de/
165
166 http_request GET => "http://www.nethype.de/", sub {
167 my ($body, $hdr) = @_;
168 print "$body\n";
169 };
170
171Example: make a HTTP HEAD request on https://www.google.com/, use a
172timeout of 30 seconds.
173
174 http_request
175 HEAD => "https://www.google.com",
176 timeout => 30,
177 sub {
178 my ($body, $hdr) = @_;
179 use Data::Dumper;
180 print Dumper $hdr;
181 }
182 ;
115 183
116=cut 184=cut
185
186sub _slot_schedule($) {
187 my $host = shift;
188
189 while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
190 if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
191 # somebody wants that slot
192 ++$CO_SLOT{$host}[0];
193
194 $cb->(AnyEvent::Util::guard {
195 --$CO_SLOT{$host}[0];
196 _slot_schedule $host;
197 });
198 } else {
199 # nobody wants the slot, maybe we can forget about it
200 delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
201 warn "$host deleted" unless $CO_SLOT{$host}[0];#d#
202 last;
203 }
204 }
205}
206
207# wait for a free slot on host, call callback
208sub _get_slot($$) {
209 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
210
211 _slot_schedule $_[0];
212}
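
The slot mechanism added above can be tried in isolation. The following is a simplified sketch, not the module's code: C<My::Guard> is a hypothetical stand-in for C<AnyEvent::Util::guard> so that the example runs without AnyEvent, the limit is lowered to 2, and the names C<%slot>, C<slot_schedule> and C<get_slot> are chosen for the demo.

```perl
use strict;
use warnings;

# Minimal stand-in for AnyEvent::Util::guard (an assumption made so the
# sketch runs without AnyEvent): runs a code ref when the object is freed.
{
   package My::Guard;
   sub new     { my ($class, $cb) = @_; bless { cb => $cb }, $class }
   sub DESTROY { $_[0]{cb}->() }
}

my $MAX_PER_HOST = 2;   # lower than the module default, to keep the demo short
my %slot;               # host => [open connection count, [waiting callbacks]]

sub slot_schedule {
   my ($host) = @_;

   while (($slot{$host}[0] || 0) < $MAX_PER_HOST) {
      my $cb = shift @{ $slot{$host}[1] }
         or last;   # nobody is waiting

      ++$slot{$host}[0];

      # hand the waiter a guard; freeing it releases the slot
      $cb->(My::Guard->new (sub {
         --$slot{$host}[0];
         slot_schedule ($host);
      }));
   }

   # forget hosts that have no open connections and no waiters
   delete $slot{$host}
      unless $slot{$host}[0] || @{ $slot{$host}[1] || [] };
}

sub get_slot {
   my ($host, $cb) = @_;
   push @{ $slot{$host}[1] }, $cb;
   slot_schedule ($host);
}

my (@started, %held);

for my $id (1 .. 3) {
   get_slot ("example.com", sub {
      $held{$id} = shift;   # keep the guard while the "request" is active
      push @started, $id;
   });
}

print "running: @started\n";   # only two requests got a slot so far

delete $held{1};               # request 1 finishes, its guard is freed
print "running: @started\n";   # request 3 has now been started

%held = ();                    # release the remaining slots
print "hosts still tracked: ", scalar (keys %slot), "\n";
```

Tying the release of a slot to the lifetime of a guard object means a request cannot forget to give its slot back: clearing the per-request state is enough.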
117 213
118sub http_request($$$;@) { 214sub http_request($$$;@) {
119 my $cb = pop; 215 my $cb = pop;
120 my ($method, $url, %arg) = @_; 216 my ($method, $url, %arg) = @_;
121 217
122 my %hdr; 218 my %hdr;
123 219
124 $method = uc $method; 220 $method = uc $method;
125 221
126 if (my $hdr = delete $arg{headers}) { 222 if (my $hdr = $arg{headers}) {
127 while (my ($k, $v) = each %$hdr) { 223 while (my ($k, $v) = each %$hdr) {
128 $hdr{lc $k} = $v; 224 $hdr{lc $k} = $v;
129 } 225 }
130 } 226 }
131 227
228 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
229
230 return $cb->(undef, { Status => 599, Reason => "recursion limit reached" })
231 if $recurse < 0;
232
132 my $proxy = $arg{proxy} || $PROXY; 233 my $proxy = $arg{proxy} || $PROXY;
133 my $timeout = $arg{timeout} || $TIMEOUT; 234 my $timeout = $arg{timeout} || $TIMEOUT;
134 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
135 235
136 $hdr{"user-agent"} ||= $USERAGENT; 236 $hdr{"user-agent"} ||= $USERAGENT;
137 237
138 my ($host, $port, $path, $scheme); 238 my ($scheme, $authority, $upath, $query, $fragment) =
239 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
240
241 $scheme = lc $scheme;
242
243 my $uport = $scheme eq "http" ? 80
244 : $scheme eq "https" ? 443
245 : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
246
247 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
248 or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
249
250 my $uhost = $1;
251 $uport = $2 if defined $2;
252
253 $uhost =~ s/^\[(.*)\]$/$1/;
254 $upath .= "?$query" if length $query;
255
256 $upath =~ s%^/?%/%;
257
258 # cookie processing
259 if (my $jar = $arg{cookie_jar}) {
260 %$jar = () if $jar->{version} < 1;
261
262 my @cookie;
263
264 while (my ($chost, $v) = each %$jar) {
265 next unless $chost eq substr $uhost, -length $chost;
266 next unless $chost =~ /^\./;
267
268 while (my ($cpath, $v) = each %$v) {
269 next unless $cpath eq substr $upath, 0, length $cpath;
270
271 while (my ($k, $v) = each %$v) {
272 next if $scheme ne "https" && exists $v->{secure};
273 push @cookie, "$k=$v->{value}";
274 }
275 }
276 }
277
278 $hdr{cookie} = join "; ", @cookie
279 if @cookie;
280 }
281
282 my ($rhost, $rport, $rpath); # request host, port, path
139 283
140 if ($proxy) { 284 if ($proxy) {
141 ($host, $port, $scheme) = @$proxy; 285 ($rhost, $rport, $scheme) = @$proxy;
142 $path = $url; 286 $rpath = $url;
143 } else { 287 } else {
144 ($scheme, my $authority, $path, my $query, my $fragment) = 288 ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
145 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
146
147 $port = $scheme eq "http" ? 80
148 : $scheme eq "https" ? 443
149 : croak "$url: only http and https URLs supported";
150
151 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
152 or croak "$authority: unparsable URL";
153
154 $host = $1;
155 $port = $2 if defined $2;
156
157 $host =~ s/^\[(.*)\]$/$1/;
158 $path .= "?$query" if length $query;
159
160 $path = "/" unless $path;
161
162 $hdr{host} = $host = lc $host; 289 $hdr{host} = $uhost;
163 } 290 }
164 291
165 $scheme = lc $scheme;
166
167 my %state;
168
169 $state{body} = delete $arg{body};
170
171 $hdr{"content-length"} = length $state{body}; 292 $hdr{"content-length"} = length $arg{body};
172 293
294 my %state = (connect_guard => 1);
295
296 _get_slot $uhost, sub {
297 $state{slot_guard} = shift;
298
299 return unless $state{connect_guard};
300
173 $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub { 301 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
174 $state{fh} = shift 302 $state{fh} = shift
175 or return $cb->(undef, { Status => 599, Reason => "$!" }); 303 or return $cb->(undef, { Status => 599, Reason => "$!" });
176 304
177 delete $state{connect_guard}; # reduce memory usage, save a tree 305 delete $state{connect_guard}; # reduce memory usage, save a tree
178 306
179 # get handle 307 # get handle
180 $state{handle} = new AnyEvent::Handle 308 $state{handle} = new AnyEvent::Handle
181 fh => $state{fh}, 309 fh => $state{fh},
182 ($scheme eq "https" ? (tls => "connect") : ()); 310 ($scheme eq "https" ? (tls => "connect") : ());
183 311
184 # limit the number of persistent connections 312 # limit the number of persistent connections
185 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 313 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
186 ++$KA_COUNT{$_[1]}; 314 ++$KA_COUNT{$_[1]};
187 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} }; 315 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
188 $hdr{connection} = "keep-alive"; 316 $hdr{connection} = "keep-alive";
189 delete $hdr{connection}; # keep-alive not yet supported 317 delete $hdr{connection}; # keep-alive not yet supported
190 } else { 318 } else {
191 delete $hdr{connection}; 319 delete $hdr{connection};
192 } 320 }
193 321
194 # (re-)configure handle 322 # (re-)configure handle
195 $state{handle}->timeout ($timeout); 323 $state{handle}->timeout ($timeout);
196 $state{handle}->on_error (sub { 324 $state{handle}->on_error (sub {
197 %state = (); 325 %state = ();
198 $cb->(undef, { Status => 599, Reason => "$!" }); 326 $cb->(undef, { Status => 599, Reason => "$!" });
199 }); 327 });
200 $state{handle}->on_eof (sub { 328 $state{handle}->on_eof (sub {
201 %state = (); 329 %state = ();
202 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" }); 330 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
203 }); 331 });
204 332
205 # send request 333 # send request
206 $state{handle}->push_write ( 334 $state{handle}->push_write (
207 "$method $path HTTP/1.0\015\012" 335 "$method $rpath HTTP/1.0\015\012"
208 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr) 336 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
209 . "\015\012" 337 . "\015\012"
210 . (delete $state{body}) 338 . (delete $arg{body})
211 );
212
213 %hdr = (); # reduce memory usage, save a kitten
214
215 # status line
216 $state{handle}->push_read (line => qr/\015?\012/, sub {
217 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
218 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
219
220 my %hdr = ( # response headers
221 HTTPVersion => ",$1",
222 Status => ",$2",
223 Reason => ",$3",
224 ); 339 );
225 340
341 %hdr = (); # reduce memory usage, save a kitten
342
343 # status line
344 $state{handle}->push_read (line => qr/\015?\012/, sub {
345 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
346 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
347
348 my %hdr = ( # response headers
349 HTTPVersion => "\x00$1",
350 Status => "\x00$2",
351 Reason => "\x00$3",
352 );
353
226 # headers, could be optimized a bit 354 # headers, could be optimized a bit
227 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub { 355 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
228 for ("$_[1]\012") { 356 for ("$_[1]\012") {
229 # we support spaces in field names, as lotus domino 357 # we support spaces in field names, as lotus domino
230 # creates them. 358 # creates them.
231 $hdr{lc $1} .= ",$2" 359 $hdr{lc $1} .= "\x00$2"
232 while /\G 360 while /\G
233 ([^:\000-\037]+): 361 ([^:\000-\037]+):
234 [\011\040]* 362 [\011\040]*
235 ((?: [^\015\012]+ | \015?\012[\011\040] )*) 363 ((?: [^\015\012]+ | \015?\012[\011\040] )*)
236 \015?\012 364 \015?\012
237 /gxc; 365 /gxc;
238 366
239 /\G$/ 367 /\G$/
240 or return $cb->(undef, { Status => 599, Reason => "garbled response headers" }); 368 or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
241 } 369 }
242 370
243 substr $_, 0, 1, "" 371 substr $_, 0, 1, ""
244 for values %hdr; 372 for values %hdr;
245 373
246 if ($method eq "HEAD") { 374 my $finish = sub {
247 %state = (); 375 %state = ();
248 $cb->(undef, \%hdr); 376
249 } else { 377 # set-cookie processing
250 if (exists $hdr{"content-length"}) { 378 if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
251 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub { 379 for (split /\x00/, $hdr{"set-cookie"}) {
252 # could cache persistent connection now 380 my ($cookie, @arg) = split /;\s*/;
253 if ($hdr{connection} =~ /\bkeep-alive\b/i) { 381 my ($name, $value) = split /=/, $cookie, 2;
254 # but we don't, due to misdesigns, this is annoyingly complex 382 my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
383
384 my $cdom = (delete $kv{domain}) || $uhost;
385 my $cpath = (delete $kv{path}) || "/";
386
387 $cdom =~ s/^.?/./; # make sure it starts with a "."
388
389 next if $cdom =~ /\.$/;
390
391 # this is not rfc-like and not netscape-like. go figure.
392 my $ndots = $cdom =~ y/.//;
393 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
394
395 # store it
396 $arg{cookie_jar}{version} = 1;
397 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
255 }; 398 }
256
257 %state = ();
258 $cb->($_[1], \%hdr);
259 }); 399 }
400
401 if ($_[1]{Status} =~ /^x30[12]$/ && $recurse) {
402 # microsoft and other assholes don't give a shit for following standards,
403 # try to support a common form of broken Location header.
404 $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
405
406 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
407 } else {
408 $cb->($_[0], $_[1]);
409 }
410 };
411
412 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
413 $finish->(undef, \%hdr);
260 } else { 414 } else {
415 if (exists $hdr{"content-length"}) {
416 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
417 # could cache persistent connection now
418 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
419 # but we don't, due to misdesigns, this is annoyingly complex
420 };
421
422 $finish->($_[1], \%hdr);
423 });
424 } else {
261 # too bad, need to read until we get an error or EOF, 425 # too bad, need to read until we get an error or EOF,
262 # no way to detect winged data. 426 # no way to detect winged data.
263 $_[0]->on_error (sub { 427 $_[0]->on_error (sub {
264 %state = ();
265 $cb->($_[0]{rbuf}, \%hdr); 428 $finish->($_[0]{rbuf}, \%hdr);
266 }); 429 });
267 $_[0]->on_eof (undef); 430 $_[0]->on_eof (undef);
268 $_[0]->on_read (sub { }); 431 $_[0]->on_read (sub { });
432 }
269 } 433 }
270 } 434 });
271 }); 435 });
436 }, sub {
437 $timeout
272 }); 438 };
273 }, sub {
274 $timeout
275 }; 439 };
276 440
277 defined wantarray && AnyEvent::Util::guard { %state = () } 441 defined wantarray && AnyEvent::Util::guard { %state = () }
278} 442}
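
The URL handling in the new revision of C<http_request> can be exercised on its own. The sketch below reuses the two regular expressions from the code above inside a hypothetical helper, C<split_url>, which is not part of the module; it returns an empty list where C<http_request> would invoke the callback with a 599 status.

```perl
use strict;
use warnings;

# Illustrative helper (not part of the module): split a URL in the same
# way and return (scheme, host, port, path), or an empty list if the
# URL is not usable.
sub split_url {
   my ($url) = @_;

   my ($scheme, $authority, $path, $query, $fragment) =
      $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;

   $scheme = lc ($scheme // "");

   my $port = $scheme eq "http"  ?  80
            : $scheme eq "https" ? 443
            : return;   # only http and https are supported

   ($authority // "") =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
      or return;        # unparsable authority part

   my $host = $1;
   $port = $2 if defined $2;

   $path .= "?$query" if defined $query && length $query;
   $path =~ s%^/?%/%;   # make sure the path starts with a "/"

   ($scheme, $host, $port, $path)
}

for my $url (
   'http://www.nethype.de/',
   'HTTPS://user@example.com:8443/a/b?x=1#frag',
   'ftp://example.com/',
) {
   my @parts = split_url ($url);
   print "$url => ", (@parts ? "@parts" : "unsupported"), "\n";
}
```

Note that the fragment is parsed but dropped, and that user information before the C<@> is ignored rather than turned into an authentication header, which matches the statement in the DESCRIPTION that authentication is left to the caller.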
279 443
290sub http_post($$$;@) { 454sub http_post($$$;@) {
291 unshift @_, "POST", "body"; 455 unshift @_, "POST", "body";
292 &http_request 456 &http_request
293} 457}
294 458
459=back
460
295=head2 GLOBAL FUNCTIONS AND VARIABLES 461=head2 GLOBAL FUNCTIONS AND VARIABLES
296 462
297=over 4 463=over 4
298 464
299=item AnyEvent::HTTP::set_proxy "proxy-url" 465=item AnyEvent::HTTP::set_proxy "proxy-url"
