Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.1 by root, Tue Jun 3 16:37:13 2008 UTC vs.
Revision 1.11 by root, Thu Jun 5 15:34:00 2008 UTC

8 8
9=head1 DESCRIPTION 9=head1 DESCRIPTION
10 10
11This module is an L<AnyEvent> user, you need to make sure that you use and 11This module is an L<AnyEvent> user, you need to make sure that you use and
12run a supported event loop. 12run a supported event loop.
13
14This module implements a simple, stateless and non-blocking HTTP
15client. It supports GET, POST and other request methods, cookies and more,
16all on a very low level. It can follow redirects, supports proxies, and
17automatically limits the number of connections to the values specified in
18the RFC.
19
20It should generally be a "good client" that is enough for most HTTP
21tasks. Simple tasks should be simple, but complex tasks should still be
22possible as the user retains control over request and response headers.
23
24The caller is responsible for authentication management, cookies (if
25the simplistic implementation in this module doesn't suffice), referer
26and other high-level protocol details for which this module offers only
27limited support.
13 28
14=head2 METHODS 29=head2 METHODS
15 30
16=over 4 31=over 4
17 32
33 48
34our $VERSION = '1.0'; 49our $VERSION = '1.0';
35 50
36our @EXPORT = qw(http_get http_request); 51our @EXPORT = qw(http_get http_request);
37 52
38our $MAX_REDIRECTS = 10;
39our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 53our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
54our $MAX_RECURSE = 10;
40our $MAX_PERSISTENT = 8; 55our $MAX_PERSISTENT = 8;
41our $PERSISTENT_TIMEOUT = 15; 56our $PERSISTENT_TIMEOUT = 2;
42our $TIMEOUT = 60; 57our $TIMEOUT = 300;
43 58
44# changing these is evil 59# changing these is evil
45our $MAX_PERSISTENT_PER_HOST = 2; 60our $MAX_PERSISTENT_PER_HOST = 2;
46our $MAX_PER_HOST = 4; # not respected yet :( 61our $MAX_PER_HOST = 4;
62
63our $PROXY;
47 64
48my %KA_COUNT; # number of open keep-alive connections per host 65my %KA_COUNT; # number of open keep-alive connections per host
66my %CO_SLOT; # number of open connections, and wait queue, per host
49 67
50=item http_get $url, key => value..., $cb->($data, $headers) 68=item http_get $url, key => value..., $cb->($data, $headers)
51 69
52Executes an HTTP-GET request. See the http_request function for details on 70Executes an HTTP-GET request. See the http_request function for details on
53additional parameters. 71additional parameters.
54 72
73=item http_head $url, key => value..., $cb->($data, $headers)
74
75Executes an HTTP-HEAD request. See the http_request function for details on
76additional parameters.
77
78=item http_post $url, $body, key => value..., $cb->($data, $headers)
79
80Executes an HTTP-POST request with a request body of C<$body>. See the
81http_request function for details on additional parameters.
82
55=item http_request $method => $url, key => value..., $cb->($data, $headers) 83=item http_request $method => $url, key => value..., $cb->($data, $headers)
56 84
57Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL 85Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
58must be an absolute http or https URL. 86must be an absolute http or https URL.
59 87
88The callback will be called with the response data as first argument
89(or C<undef> if it wasn't available due to errors), and a hash-ref with
90response headers as second argument.
91
92All the headers in that hash are lowercased. In addition to the response
93headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
94C<Reason> contain the three parts of the HTTP Status-Line of the same
95name. If the server sends a header multiple times, then the values
96will be joined together with C<\x00>.
97
98If an internal error occurs, such as not being able to resolve a hostname,
99then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
100and the C<Reason> pseudo-header will contain an error message.
101
102A typical callback might look like this:
103
104 sub {
105 my ($body, $hdr) = @_;
106
107 if ($hdr->{Status} =~ /^2/) {
108 ... everything should be ok
109 } else {
110 print "error, $hdr->{Status} $hdr->{Reason}\n";
111 }
112 }
113
60Additional parameters are key-value pairs, and are fully optional. They 114Additional parameters are key-value pairs, and are fully optional. They
61include: 115include:
62 116
63=over 4 117=over 4
64 118
65=item recurse => $boolean (default: true) 119=item recurse => $count (default: $MAX_RECURSE)
66 120
67Whether to recurse requests or not, e.g. on redirects, authentication 121Whether to recurse requests or not, e.g. on redirects, authentication
68retries and so on. 122retries and so on, and how often to do so.
69 123
70=item headers => hashref 124=item headers => hashref
71 125
72The request headers to use. 126The request headers to use.
73 127
74=item timeout => $seconds 128=item timeout => $seconds
75 129
76The time-out to use for various stages - each connect attempt will reset 130The time-out to use for various stages - each connect attempt will reset
77the timeout, as will read or write activity. 131the timeout, as will read or write activity. Default timeout is 5 minutes.
132
133=item proxy => [$host, $port[, $scheme]] or undef
134
135Use the given http proxy for all requests. If not specified, then the
136default proxy (as specified by C<$ENV{http_proxy}>) is used.
137
138C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
139HTTPS.
140
141=item body => $string
142
143The request body, usually empty. Will be sent as-is (future versions of
144this module might offer more options).
145
146=item cookie_jar => $hash_ref
147
148Passing this parameter enables (simplified) cookie-processing, loosely
149based on the original netscape specification.
150
151The C<$hash_ref> must be an (initially empty) hash reference which will
152get updated automatically. It is possible to save the cookie_jar to
153persistent storage with something like JSON or Storable, but this is not
154recommended, as expire times are currently being ignored.
155
156Note that this cookie implementation is not of very high quality, nor
157meant to be complete. If you want complete cookie management you have to
158do that on your own. C<cookie_jar> is meant as a quick fix to get some
159cookie-using sites working. Cookies are a privacy disaster, do not use
160them unless required to.
78 161
79=back 162=back
80 163
81=back 164Example: make a simple HTTP GET request for http://www.nethype.de/
165
166 http_request GET => "http://www.nethype.de/", sub {
167 my ($body, $hdr) = @_;
168 print "$body\n";
169 };
170
171Example: make a HTTP HEAD request on https://www.google.com/, use a
172timeout of 30 seconds.
173
174 http_request
175 HEAD => "https://www.google.com",
176 timeout => 30,
177 sub {
178 my ($body, $hdr) = @_;
179 use Data::Dumper;
180 print Dumper $hdr;
181 }
182 ;
82 183
83=cut 184=cut
185
186sub _slot_schedule($) {
187 my $host = shift;
188
189 while ($CO_SLOT{$host}[0] < $MAX_PER_HOST) {
190 if (my $cb = shift @{ $CO_SLOT{$host}[1] }) {
191 # somebody wants that slot
192 ++$CO_SLOT{$host}[0];
193
194 $cb->(AnyEvent::Util::guard {
195 --$CO_SLOT{$host}[0];
196 _slot_schedule $host;
197 });
198 } else {
199 # nobody wants the slot, maybe we can forget about it
200 delete $CO_SLOT{$host} unless $CO_SLOT{$host}[0];
201 warn "$host deleted" unless $CO_SLOT{$host}[0];#d#
202 last;
203 }
204 }
205}
206
207# wait for a free slot on host, call callback
208sub _get_slot($$) {
209 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
210
211 _slot_schedule $_[0];
212}
84 213
85sub http_request($$$;@) { 214sub http_request($$$;@) {
86 my $cb = pop; 215 my $cb = pop;
87 my ($method, $url, %arg) = @_; 216 my ($method, $url, %arg) = @_;
88 217
89 my %hdr; 218 my %hdr;
90 219
220 $method = uc $method;
221
91 if (my $hdr = delete $arg{headers}) { 222 if (my $hdr = $arg{headers}) {
92 while (my ($k, $v) = each %$hdr) { 223 while (my ($k, $v) = each %$hdr) {
93 $hdr{lc $k} = $v; 224 $hdr{lc $k} = $v;
94 } 225 }
95 } 226 }
96 227
228 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
229
230 return $cb->(undef, { Status => 599, Reason => "recursion limit reached" })
231 if $recurse < 0;
232
233 my $proxy = $arg{proxy} || $PROXY;
97 my $timeout = $arg{timeout} || $TIMEOUT; 234 my $timeout = $arg{timeout} || $TIMEOUT;
98 235
99 $hdr{"user-agent"} ||= $USERAGENT; 236 $hdr{"user-agent"} ||= $USERAGENT;
100 237
101 my ($scheme, $authority, $path, $query, $fragment) = 238 my ($scheme, $authority, $upath, $query, $fragment) =
102 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|; 239 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
103 240
104 $scheme = lc $scheme; 241 $scheme = lc $scheme;
242
105 my $port = $scheme eq "http" ? 80 243 my $uport = $scheme eq "http" ? 80
106 : $scheme eq "https" ? 443 244 : $scheme eq "https" ? 443
107 : croak "$url: only http and https URLs supported"; 245 : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
108 246
109 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 247 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
110 or croak "$authority: unparsable URL"; 248 or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
111 249
112 my $host = $1; 250 my $uhost = $1;
113 $port = $2 if defined $2; 251 $uport = $2 if defined $2;
114 252
115 $host =~ s/^\[(.*)\]$/$1/; 253 $uhost =~ s/^\[(.*)\]$/$1/;
116 $path .= "?$query" if length $query; 254 $upath .= "?$query" if length $query;
117 255
118 $hdr{host} = $host = lc $host; 256 $upath =~ s%^/?%/%;
119 257
120 my %state; 258 # cookie processing
121 259 if (my $jar = $arg{cookie_jar}) {
122 my $body = ""; 260 %$jar = () if $jar->{version} < 1;
123 $state{body} = $body; 261
124 262 my @cookie;
125 $hdr{"content-length"} = length $body; 263
126 264 while (my ($chost, $v) = each %$jar) {
127 $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub { 265 next unless $chost eq substr $uhost, -length $chost;
128 $state{fh} = shift 266 next unless $chost =~ /^\./;
129 or return $cb->(undef, { Status => 599, Reason => "$!" }); 267
130 268 while (my ($cpath, $v) = each %$v) {
131 delete $state{connect_guard}; # reduce memory usage, save a tree 269 next unless $cpath eq substr $upath, 0, length $cpath;
132 270
133 # get handle 271 while (my ($k, $v) = each %$v) {
134 $state{handle} = new AnyEvent::Handle 272 next if $scheme ne "https" && exists $v->{secure};
135 fh => $state{fh}, 273 push @cookie, "$k=$v->{value}";
136 ($scheme eq "https" ? (tls => "connect") : ()); 274 }
137 275 }
138 # limit the number of persistent connections
139 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
140 ++$KA_COUNT{$_[1]};
141 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
142 $hdr{connection} = "keep-alive";
143 } else {
144 delete $hdr{connection};
145 } 276 }
277
278 $hdr{cookie} = join "; ", @cookie
279 if @cookie;
280 }
146 281
282 my ($rhost, $rport, $rpath); # request host, port, path
283
284 if ($proxy) {
285 ($rhost, $rport, $scheme) = @$proxy;
286 $rpath = $url;
287 } else {
288 ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
289 $hdr{host} = $uhost;
290 }
291
292 $hdr{"content-length"} = length $arg{body};
293
294 my %state = (connect_guard => 1);
295
296 _get_slot $uhost, sub {
297 $state{slot_guard} = shift;
298
299 return unless $state{connect_guard};
300
301 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
302 $state{fh} = shift
303 or return $cb->(undef, { Status => 599, Reason => "$!" });
304
305 delete $state{connect_guard}; # reduce memory usage, save a tree
306
307 # get handle
308 $state{handle} = new AnyEvent::Handle
309 fh => $state{fh},
310 ($scheme eq "https" ? (tls => "connect") : ());
311
312 # limit the number of persistent connections
313 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
314 ++$KA_COUNT{$_[1]};
315 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
316 $hdr{connection} = "keep-alive";
317 delete $hdr{connection}; # keep-alive not yet supported
318 } else {
319 delete $hdr{connection};
320 }
321
147 # (re-)configure handle 322 # (re-)configure handle
148 $state{handle}->timeout ($timeout); 323 $state{handle}->timeout ($timeout);
149 $state{handle}->on_error (sub { 324 $state{handle}->on_error (sub {
150 %state = (); 325 %state = ();
151 $cb->(undef, { Status => 599, Reason => "$!" }); 326 $cb->(undef, { Status => 599, Reason => "$!" });
152 }); 327 });
153 $state{handle}->on_eof (sub { 328 $state{handle}->on_eof (sub {
154 %state = (); 329 %state = ();
155 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" }); 330 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
156 }); 331 });
157 332
158 # send request 333 # send request
159 $state{handle}->push_write ( 334 $state{handle}->push_write (
160 "\U$method\E $path HTTP/1.0\015\012" 335 "$method $rpath HTTP/1.0\015\012"
161 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr) 336 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
162 . "\015\012" 337 . "\015\012"
163 . (delete $state{body}) 338 . (delete $arg{body})
164 );
165
166 %hdr = (); # reduce memory usage, save a kitten
167
168 # status line
169 $state{handle}->push_read (line => qr/\015?\012/, sub {
170 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
171 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
172
173 my %hdr = ( # response headers
174 HTTPVersion => ",$1",
175 Status => ",$2",
176 Reason => ",$3",
177 ); 339 );
178 340
341 %hdr = (); # reduce memory usage, save a kitten
342
343 # status line
344 $state{handle}->push_read (line => qr/\015?\012/, sub {
345 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
346 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
347
348 my %hdr = ( # response headers
349 HTTPVersion => "\x00$1",
350 Status => "\x00$2",
351 Reason => "\x00$3",
352 );
353
179 # headers, could be optimized a bit 354 # headers, could be optimized a bit
180 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub { 355 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
181 for ("$_[1]\012") { 356 for ("$_[1]\012") {
357 # we support spaces in field names, as lotus domino
358 # creates them.
182 $hdr{lc $1} .= ",$2" 359 $hdr{lc $1} .= "\x00$2"
183 while /\G 360 while /\G
184 ([^:\000-\040]+): 361 ([^:\000-\037]+):
185 [\011\040]* 362 [\011\040]*
186 ((?: [^\015\012]+ | \015?\012[\011\040] )*) 363 ((?: [^\015\012]+ | \015?\012[\011\040] )*)
187 \015?\012 364 \015?\012
188 /gxc; 365 /gxc;
189 366
190 /\G$/ 367 /\G$/
191 or return $cb->(undef, { Status => 599, Reason => "garbled response headers" }); 368 or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
192 } 369 }
193 370
194 substr $_, 0, 1, "" 371 substr $_, 0, 1, ""
195 for values %hdr; 372 for values %hdr;
196 373
197 if (exists $hdr{"content-length"}) { 374 my $finish = sub {
198 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
199 # could cache persistent connection now
200 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
201 };
202
203 %state = (); 375 %state = ();
376
377 # set-cookie processing
378 if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
379 for (split /\x00/, $hdr{"set-cookie"}) {
380 my ($cookie, @arg) = split /;\s*/;
381 my ($name, $value) = split /=/, $cookie, 2;
382 my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
383
384 my $cdom = (delete $kv{domain}) || $uhost;
385 my $cpath = (delete $kv{path}) || "/";
386
387 $cdom =~ s/^.?/./; # make sure it starts with a "."
388
389 next if $cdom =~ /\.$/;
390
391 # this is not rfc-like and not netscape-like. go figure.
392 my $ndots = $cdom =~ y/.//;
393 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
394
395 # store it
396 $arg{cookie_jar}{version} = 1;
397 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
398 }
399 }
400
401 if ($_[1]{Status} =~ /^30[12]$/ && $recurse) {
402 # microsoft and other assholes don't give a shit for following standards,
403 # try to support a common form of broken Location header.
404 $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
405
406 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
407 } else {
204 $cb->($_[1], \%hdr); 408 $cb->($_[0], $_[1]);
409 }
205 }); 410 };
411
412 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
413 $finish->(undef, \%hdr);
206 } else { 414 } else {
415 if (exists $hdr{"content-length"}) {
416 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
417 # could cache persistent connection now
418 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
419 # but we don't, due to misdesigns, this is annoyingly complex
420 };
421
422 $finish->($_[1], \%hdr);
423 });
424 } else {
207 # too bad, need to read until we get an error or EOF, 425 # too bad, need to read until we get an error or EOF,
208 # no way to detect winged data. 426 # no way to detect winged data.
209 $_[0]->on_error (sub { 427 $_[0]->on_error (sub {
210 %state = ();
211 $cb->($_[0]{rbuf}, \%hdr); 428 $finish->($_[0]{rbuf}, \%hdr);
429 });
430 $_[0]->on_eof (undef);
431 $_[0]->on_read (sub { });
432 }
212 }); 433 }
213 $_[0]->on_eof (undef);
214 $_[0]->on_read (sub { });
215 } 434 });
216 }); 435 });
436 }, sub {
437 $timeout
217 }); 438 };
218 }, sub {
219 $timeout
220 }; 439 };
221 440
222 defined wantarray && AnyEvent::Util::guard { %state = () } 441 defined wantarray && AnyEvent::Util::guard { %state = () }
223} 442}
224 443
225sub http_get($$;@) { 444sub http_get($$;@) {
226 unshift @_, "GET"; 445 unshift @_, "GET";
227 &http_request 446 &http_request
228} 447}
229 448
449sub http_head($$;@) {
450 unshift @_, "HEAD";
451 &http_request
452}
453
454sub http_post($$$;@) {
455 unshift @_, "POST", "body";
456 &http_request
457}
458
459=back
460
230=head2 GLOBAL VARIABLES 461=head2 GLOBAL FUNCTIONS AND VARIABLES
231 462
232=over 4 463=over 4
233 464
465=item AnyEvent::HTTP::set_proxy "proxy-url"
466
467Sets the default proxy server to use. The proxy-url must begin with a
468string of the form C<http://host:port> (optionally C<https:...>).
469
234=item $AnyEvent::HTTP::MAX_REDIRECTS 470=item $AnyEvent::HTTP::MAX_RECURSE
235 471
236The default value for the C<max_redirects> request parameter 472The default value for the C<recurse> request parameter (default: C<10>).
237(default: C<10>).
238 473
239=item $AnyEvent::HTTP::USERAGENT 474=item $AnyEvent::HTTP::USERAGENT
240 475
241The default value for the C<User-Agent> header (the default is 476The default value for the C<User-Agent> header (the default is
242C<Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 477C<Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
243 478
244=item $AnyEvent::HTTP::MAX_PERSISTENT 479=item $AnyEvent::HTTP::MAX_PERSISTENT
245 480
246The maximum number of persistent connections to keep open (default: 8). 481The maximum number of persistent connections to keep open (default: 8).
247 482
483Not implemented currently.
484
248=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 485=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
249 486
250The maximum time to cache a persistent connection, in seconds (default: 15). 487The maximum time to cache a persistent connection, in seconds (default: 2).
488
489Not implemented currently.
251 490
252=back 491=back
253 492
254=cut 493=cut
494
495sub set_proxy($) {
496 $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
497}
498
499# initialise proxy from environment
500set_proxy $ENV{http_proxy};
255 501
256=head1 SEE ALSO 502=head1 SEE ALSO
257 503
258L<AnyEvent>. 504L<AnyEvent>.
259 505
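
The revision 1.11 documentation in the diff above says that the callback receives the body (or C<undef>) plus a hash-ref with lowercased header names, the pseudo-headers C<Status> and C<Reason>, and repeated headers joined with C<\x00>. A minimal sketch of consuming such a result might look like the following. The helper names `header_values` and `describe_response` are hypothetical and not part of AnyEvent::HTTP, and the sample hash only stands in for what a real `http_request` callback would be handed.

```perl
use strict;
use warnings;

# Hypothetical helper: return every value of a header that the server sent
# more than once. Per the POD above, repeated values are joined with "\x00".
sub header_values {
   my ($hdr, $name) = @_;

   my $value = $hdr->{lc $name};   # header names are stored lowercased

   return defined $value ? split (/\x00/, $value) : ();
}

# Hypothetical helper: summarise a response in the spirit of the
# "typical callback" shown in the POD.
sub describe_response {
   my ($body, $hdr) = @_;

   return "ok, " . length ($body) . " bytes"
      if defined $body && $hdr->{Status} =~ /^2/;

   # internal errors are reported as Status 599 with a message in Reason
   return "error, $hdr->{Status} $hdr->{Reason}";
}

# Stand-in data, assumed for illustration only.
my %hdr = (
   Status       => 200,
   Reason       => "OK",
   "set-cookie" => "a=1; path=/\x00b=2; path=/",
);

print describe_response ("hello", \%hdr), "\n";
print "cookie line: $_\n" for header_values (\%hdr, "Set-Cookie");
```

Splitting on C<\x00> only at the point of use keeps the common single-value case a plain string lookup, which appears to be the reason the module joins values instead of storing array references.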

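
The URL handling that revision 1.11 performs inside `http_request` (generic URI split, default port by scheme, authority parsing, path normalisation) can be tried in isolation. The sketch below reuses the regular expressions from the diff, but wraps them in a hypothetical `split_url` helper that returns an empty list on failure; the module itself reports such failures through the callback with Status 599. The defined-or operator is an addition here, so Perl 5.10 or later is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parsing steps of http_request (rev 1.11).
# Returns (scheme, host, port, path), or an empty list if the URL is not a
# usable http/https URL.
sub split_url {
   my ($url) = @_;

   # generic URI split, same pattern as in the diff
   my ($scheme, $authority, $upath, $query, $fragment) =
      $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;

   $scheme = lc ($scheme // "");

   # default port depends on the scheme; anything else is rejected
   my $uport = $scheme eq "http"  ?  80
             : $scheme eq "https" ? 443
             : return;

   # strip optional userinfo, capture host and optional port
   ($authority // "") =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
      or return;

   my $uhost = $1;
   $uport = $2 if defined $2;

   $upath .= "?$query" if defined $query && length $query;
   $upath =~ s%^/?%/%;   # make sure the path starts with a slash

   return ($scheme, $uhost, $uport, $upath);
}

for my $url ("http://www.nethype.de", "https://example.com:8443/a?b=1#frag") {
   my @parts = split_url ($url)
      or next;
   print join (" ", @parts), "\n";
}
```

The fragment is parsed but deliberately dropped, since it is never sent to the server.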