ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.1 by root, Tue Jun 3 16:37:13 2008 UTC vs.
Revision 1.10 by root, Thu Jun 5 13:06:43 2008 UTC

33 33
34our $VERSION = '1.0'; 34our $VERSION = '1.0';
35 35
36our @EXPORT = qw(http_get http_request); 36our @EXPORT = qw(http_get http_request);
37 37
38our $MAX_REDIRECTS = 10;
39our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 38our $USERAGENT = "Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
39our $MAX_RECURSE = 10;
40our $MAX_PERSISTENT = 8; 40our $MAX_PERSISTENT = 8;
41our $PERSISTENT_TIMEOUT = 15; 41our $PERSISTENT_TIMEOUT = 2;
42our $TIMEOUT = 60; 42our $TIMEOUT = 300;
43 43
44# changing these is evil 44# changing these is evil
45our $MAX_PERSISTENT_PER_HOST = 2; 45our $MAX_PERSISTENT_PER_HOST = 2;
46our $MAX_PER_HOST = 4; # not respected yet :( 46our $MAX_PER_HOST = 4; # not respected yet :(
47 47
48our $PROXY;
49
48my %KA_COUNT; # number of open keep-alive connections per host 50my %KA_COUNT; # number of open keep-alive connections per host
49 51
50=item http_get $url, key => value..., $cb->($data, $headers) 52=item http_get $url, key => value..., $cb->($data, $headers)
51 53
52Executes an HTTP-GET request. See the http_request function for details on 54Executes an HTTP-GET request. See the http_request function for details on
53additional parameters. 55additional parameters.
54 56
57=item http_head $url, key => value..., $cb->($data, $headers)
58
59Executes an HTTP-HEAD request. See the http_request function for details on
60additional parameters.
61
62=item http_post $url, $body, key => value..., $cb->($data, $headers)
63
64Executes an HTTP-POST request with a request body of C<$bod>. See the
65http_request function for details on additional parameters.
66
55=item http_request $method => $url, key => value..., $cb->($data, $headers) 67=item http_request $method => $url, key => value..., $cb->($data, $headers)
56 68
57Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL 69Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
58must be an absolute http or https URL. 70must be an absolute http or https URL.
59 71
72The callback will be called with the response data as first argument
73(or C<undef> if it wasn't available due to errors), and a hash-ref with
74response headers as second argument.
75
76All the headers in that hash are lowercased. In addition to the response
77headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
78C<Reason> contain the three parts of the HTTP Status-Line of the same
79name. If the server sends a header multiple lines, then their contents
80will be joined together with C<\x00>.
81
82If an internal error occurs, such as not being able to resolve a hostname,
83then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
84and the C<Reason> pseudo-header will contain an error message.
85
86A typical callback might look like this:
87
88 sub {
89 my ($body, $hdr) = @_;
90
91 if ($hdr->{Status} =~ /^2/) {
92 ... everything should be ok
93 } else {
94 print "error, $hdr->{Status} $hdr->{Reason}\n";
95 }
96 }
97
60Additional parameters are key-value pairs, and are fully optional. They 98Additional parameters are key-value pairs, and are fully optional. They
61include: 99include:
62 100
63=over 4 101=over 4
64 102
65=item recurse => $boolean (default: true) 103=item recurse => $count (default: $MAX_RECURSE)
66 104
67Whether to recurse requests or not, e.g. on redirects, authentication 105Whether to recurse requests or not, e.g. on redirects, authentication
68retries and so on. 106retries and so on, and how often to do so.
69 107
70=item headers => hashref 108=item headers => hashref
71 109
72The request headers to use. 110The request headers to use.
73 111
74=item timeout => $seconds 112=item timeout => $seconds
75 113
76The time-out to use for various stages - each connect attempt will reset 114The time-out to use for various stages - each connect attempt will reset
77the timeout, as will read or write activity. 115the timeout, as will read or write activity. Default timeout is 5 minutes.
116
117=item proxy => [$host, $port[, $scheme]] or undef
118
119Use the given http proxy for all requests. If not specified, then the
120default proxy (as specified by C<$ENV{http_proxy}>) is used.
121
122C<$scheme> must be either missing or C<http> for HTTP, or C<https> for
123HTTPS.
124
125=item body => $string
126
127The request body, usually empty. Will be-sent as-is (future versions of
128this module might offer more options).
129
130=item cookie_jar => $hash_ref
131
132Passing this parameter enables (simplified) cookie-processing, loosely
133based on the original netscape specification.
134
135The C<$hash_ref> must be an (initially empty) hash reference which will
136get updated automatically. It is possible to save the cookie_jar to
137persistent storage with something like JSON or Storable, but this is not
138recommended, as expire times are currently being ignored.
139
140Note that this cookie implementation is not of very high quality, nor
141meant to be complete. If you want complete cookie management you have to
142do that on your own. C<cookie_jar> is meant as a quick fix to get some
143cookie-using sites working. Cookies are a privacy disaster, do not use
144them unless required to.
78 145
79=back 146=back
80 147
81=back 148Example: make a simple HTTP GET request for http://www.nethype.de/
149
150 http_request GET => "http://www.nethype.de/", sub {
151 my ($body, $hdr) = @_;
152 print "$body\n";
153 };
154
155Example: make a HTTP HEAD request on https://www.google.com/, use a
156timeout of 30 seconds.
157
158 http_request
159 GET => "https://www.google.com",
160 timeout => 30,
161 sub {
162 my ($body, $hdr) = @_;
163 use Data::Dumper;
164 print Dumper $hdr;
165 }
166 ;
82 167
83=cut 168=cut
84 169
85sub http_request($$$;@) { 170sub http_request($$$;@) {
86 my $cb = pop; 171 my $cb = pop;
87 my ($method, $url, %arg) = @_; 172 my ($method, $url, %arg) = @_;
88 173
89 my %hdr; 174 my %hdr;
90 175
176 $method = uc $method;
177
91 if (my $hdr = delete $arg{headers}) { 178 if (my $hdr = $arg{headers}) {
92 while (my ($k, $v) = each %$hdr) { 179 while (my ($k, $v) = each %$hdr) {
93 $hdr{lc $k} = $v; 180 $hdr{lc $k} = $v;
94 } 181 }
95 } 182 }
96 183
184 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
185
186 return $cb->(undef, { Status => 599, Reason => "recursion limit reached" })
187 if $recurse < 0;
188
189 my $proxy = $arg{proxy} || $PROXY;
97 my $timeout = $arg{timeout} || $TIMEOUT; 190 my $timeout = $arg{timeout} || $TIMEOUT;
98 191
99 $hdr{"user-agent"} ||= $USERAGENT; 192 $hdr{"user-agent"} ||= $USERAGENT;
100 193
101 my ($scheme, $authority, $path, $query, $fragment) = 194 my ($scheme, $authority, $upath, $query, $fragment) =
102 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|; 195 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
103 196
104 $scheme = lc $scheme; 197 $scheme = lc $scheme;
198
105 my $port = $scheme eq "http" ? 80 199 my $uport = $scheme eq "http" ? 80
106 : $scheme eq "https" ? 443 200 : $scheme eq "https" ? 443
107 : croak "$url: only http and https URLs supported"; 201 : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
108 202
109 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 203 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
110 or croak "$authority: unparsable URL"; 204 or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
111 205
112 my $host = $1; 206 my $uhost = $1;
113 $port = $2 if defined $2; 207 $uport = $2 if defined $2;
114 208
115 $host =~ s/^\[(.*)\]$/$1/; 209 $uhost =~ s/^\[(.*)\]$/$1/;
116 $path .= "?$query" if length $query; 210 $upath .= "?$query" if length $query;
117 211
118 $hdr{host} = $host = lc $host; 212 $upath =~ s%^/?%/%;
213
214 # cookie processing
215 if (my $jar = $arg{cookie_jar}) {
216 %$jar = () if $jar->{version} < 1;
217
218 my @cookie;
219
220 while (my ($chost, $v) = each %$jar) {
221 next unless $chost eq substr $uhost, -length $chost;
222 next unless $chost =~ /^\./;
223
224 while (my ($cpath, $v) = each %$v) {
225 next unless $cpath eq substr $upath, 0, length $cpath;
226
227 while (my ($k, $v) = each %$v) {
228 next if $scheme ne "https" && exists $v->{secure};
229 push @cookie, "$k=$v->{value}";
230 }
231 }
232 }
233
234 $hdr{cookie} = join "; ", @cookie
235 if @cookie;
236 }
237
238 my ($rhost, $rport, $rpath); # request host, port, path
239
240 if ($proxy) {
241 ($rhost, $rport, $scheme) = @$proxy;
242 $rpath = $url;
243 } else {
244 ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
245 $hdr{host} = $uhost;
246 }
247
248 $hdr{"content-length"} = length $arg{body};
119 249
120 my %state; 250 my %state;
121 251
122 my $body = "";
123 $state{body} = $body;
124
125 $hdr{"content-length"} = length $body;
126
127 $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub { 252 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
128 $state{fh} = shift 253 $state{fh} = shift
129 or return $cb->(undef, { Status => 599, Reason => "$!" }); 254 or return $cb->(undef, { Status => 599, Reason => "$!" });
130 255
131 delete $state{connect_guard}; # reduce memory usage, save a tree 256 delete $state{connect_guard}; # reduce memory usage, save a tree
132 257
138 # limit the number of persistent connections 263 # limit the number of persistent connections
139 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 264 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
140 ++$KA_COUNT{$_[1]}; 265 ++$KA_COUNT{$_[1]};
141 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} }; 266 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { --$KA_COUNT{$_[1]} };
142 $hdr{connection} = "keep-alive"; 267 $hdr{connection} = "keep-alive";
268 delete $hdr{connection}; # keep-alive not yet supported
143 } else { 269 } else {
144 delete $hdr{connection}; 270 delete $hdr{connection};
145 } 271 }
146 272
147 # (re-)configure handle 273 # (re-)configure handle
155 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" }); 281 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
156 }); 282 });
157 283
158 # send request 284 # send request
159 $state{handle}->push_write ( 285 $state{handle}->push_write (
160 "\U$method\E $path HTTP/1.0\015\012" 286 "$method $rpath HTTP/1.0\015\012"
161 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr) 287 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
162 . "\015\012" 288 . "\015\012"
163 . (delete $state{body}) 289 . (delete $arg{body})
164 ); 290 );
165 291
166 %hdr = (); # reduce memory usage, save a kitten 292 %hdr = (); # reduce memory usage, save a kitten
167 293
168 # status line 294 # status line
169 $state{handle}->push_read (line => qr/\015?\012/, sub { 295 $state{handle}->push_read (line => qr/\015?\012/, sub {
170 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix 296 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
171 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" })); 297 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
172 298
173 my %hdr = ( # response headers 299 my %hdr = ( # response headers
174 HTTPVersion => ",$1", 300 HTTPVersion => "\x00$1",
175 Status => ",$2", 301 Status => "\x00$2",
176 Reason => ",$3", 302 Reason => "\x00$3",
177 ); 303 );
178 304
179 # headers, could be optimized a bit 305 # headers, could be optimized a bit
180 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub { 306 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
181 for ("$_[1]\012") { 307 for ("$_[1]\012") {
308 # we support spaces in field names, as lotus domino
309 # creates them.
182 $hdr{lc $1} .= ",$2" 310 $hdr{lc $1} .= "\x00$2"
183 while /\G 311 while /\G
184 ([^:\000-\040]+): 312 ([^:\000-\037]+):
185 [\011\040]* 313 [\011\040]*
186 ((?: [^\015\012]+ | \015?\012[\011\040] )*) 314 ((?: [^\015\012]+ | \015?\012[\011\040] )*)
187 \015?\012 315 \015?\012
188 /gxc; 316 /gxc;
189 317
190 /\G$/ 318 /\G$/
191 or return $cb->(undef, { Status => 599, Reason => "garbled response headers" }); 319 or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
192 } 320 }
193 321
194 substr $_, 0, 1, "" 322 substr $_, 0, 1, ""
195 for values %hdr; 323 for values %hdr;
196 324
197 if (exists $hdr{"content-length"}) { 325 my $finish = sub {
198 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub { 326 %state = ();
199 # could cache persistent connection now 327
200 if ($hdr{connection} =~ /\bkeep-alive\b/i) { 328 # set-cookie processing
329 if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
330 for (split /\x00/, $hdr{"set-cookie"}) {
331 my ($cookie, @arg) = split /;\s*/;
332 my ($name, $value) = split /=/, $cookie, 2;
333 my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
334
335 my $cdom = (delete $kv{domain}) || $uhost;
336 my $cpath = (delete $kv{path}) || "/";
337
338 $cdom =~ s/^.?/./; # make sure it starts with a "."
339
340 my $ndots = $cdom =~ y/.//;
341 next if $ndots < ($cdom =~ /[^.]{3}$/ ? 2 : 3);
342
343 # store it
344 $arg{cookie_jar}{version} = 1;
345 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
201 }; 346 }
202
203 %state = ();
204 $cb->($_[1], \%hdr);
205 }); 347 }
348
349 if ($_[1]{Status} =~ /^x30[12]$/ && $recurse) {
350 # microsoft and other assholes don't give a shit for following standards,
351 # try to support a common form of broken Location header.
352 $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
353
354 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
355 } else {
356 $cb->($_[0], $_[1]);
357 }
358 };
359
360 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
361 $finish->(undef, \%hdr);
206 } else { 362 } else {
363 if (exists $hdr{"content-length"}) {
364 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
365 # could cache persistent connection now
366 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
367 # but we don't, due to misdesigns, this is annoyingly complex
368 };
369
370 $finish->($_[1], \%hdr);
371 });
372 } else {
207 # too bad, need to read until we get an error or EOF, 373 # too bad, need to read until we get an error or EOF,
208 # no way to detect winged data. 374 # no way to detect winged data.
209 $_[0]->on_error (sub { 375 $_[0]->on_error (sub {
210 %state = ();
211 $cb->($_[0]{rbuf}, \%hdr); 376 $finish->($_[0]{rbuf}, \%hdr);
212 }); 377 });
213 $_[0]->on_eof (undef); 378 $_[0]->on_eof (undef);
214 $_[0]->on_read (sub { }); 379 $_[0]->on_read (sub { });
380 }
215 } 381 }
216 }); 382 });
217 }); 383 });
218 }, sub { 384 }, sub {
219 $timeout 385 $timeout
225sub http_get($$;@) { 391sub http_get($$;@) {
226 unshift @_, "GET"; 392 unshift @_, "GET";
227 &http_request 393 &http_request
228} 394}
229 395
396sub http_head($$;@) {
397 unshift @_, "HEAD";
398 &http_request
399}
400
401sub http_post($$$;@) {
402 unshift @_, "POST", "body";
403 &http_request
404}
405
406=back
407
230=head2 GLOBAL VARIABLES 408=head2 GLOBAL FUNCTIONS AND VARIABLES
231 409
232=over 4 410=over 4
233 411
412=item AnyEvent::HTTP::set_proxy "proxy-url"
413
414Sets the default proxy server to use. The proxy-url must begin with a
415string of the form C<http://host:port> (optionally C<https:...>).
416
234=item $AnyEvent::HTTP::MAX_REDIRECTS 417=item $AnyEvent::HTTP::MAX_RECURSE
235 418
236The default value for the C<max_redirects> request parameter 419The default value for the C<recurse> request parameter (default: C<10>).
237(default: C<10>).
238 420
239=item $AnyEvent::HTTP::USERAGENT 421=item $AnyEvent::HTTP::USERAGENT
240 422
241The default value for the C<User-Agent> header (the default is 423The default value for the C<User-Agent> header (the default is
242C<Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>). 424C<Mozilla/5.0 (compatible; AnyEvent::HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)>).
243 425
244=item $AnyEvent::HTTP::MAX_PERSISTENT 426=item $AnyEvent::HTTP::MAX_PERSISTENT
245 427
246The maximum number of persistent connections to keep open (default: 8). 428The maximum number of persistent connections to keep open (default: 8).
247 429
430Not implemented currently.
431
248=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT 432=item $AnyEvent::HTTP::PERSISTENT_TIMEOUT
249 433
250The maximum time to cache a persistent connection, in seconds (default: 15). 434The maximum time to cache a persistent connection, in seconds (default: 2).
435
436Not implemented currently.
251 437
252=back 438=back
253 439
254=cut 440=cut
441
442sub set_proxy($) {
443 $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix;
444}
445
446# initialise proxy from environment
447set_proxy $ENV{http_proxy};
255 448
256=head1 SEE ALSO 449=head1 SEE ALSO
257 450
258L<AnyEvent>. 451L<AnyEvent>.
259 452

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines