ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.5 by root, Wed Jun 4 12:03:47 2008 UTC vs.
Revision 1.10 by root, Thu Jun 5 13:06:43 2008 UTC

59Executes an HTTP-HEAD request. See the http_request function for details on 59Executes an HTTP-HEAD request. See the http_request function for details on
60additional parameters. 60additional parameters.
61 61
62=item http_post $url, $body, key => value..., $cb->($data, $headers) 62=item http_post $url, $body, key => value..., $cb->($data, $headers)
63 63
64Executes an HTTP-POST request with a requets body of C<$bod>. See the 64Executes an HTTP-POST request with a request body of C<$bod>. See the
65http_request function for details on additional parameters. 65http_request function for details on additional parameters.
66 66
67=item http_request $method => $url, key => value..., $cb->($data, $headers) 67=item http_request $method => $url, key => value..., $cb->($data, $headers)
68 68
69Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL 69Executes a HTTP request of type C<$method> (e.g. C<GET>, C<POST>). The URL
71 71
72The callback will be called with the response data as first argument 72The callback will be called with the response data as first argument
73(or C<undef> if it wasn't available due to errors), and a hash-ref with 73(or C<undef> if it wasn't available due to errors), and a hash-ref with
74response headers as second argument. 74response headers as second argument.
75 75
76All the headers in that has are lowercased. In addition to the response 76All the headers in that hash are lowercased. In addition to the response
77headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and 77headers, the three "pseudo-headers" C<HTTPVersion>, C<Status> and
78C<Reason> contain the three parts of the HTTP Status-Line of the same 78C<Reason> contain the three parts of the HTTP Status-Line of the same
79name. 79name. If the server sends a header multiple lines, then their contents
80will be joined together with C<\x00>.
80 81
81If an internal error occurs, such as not being able to resolve a hostname, 82If an internal error occurs, such as not being able to resolve a hostname,
82then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599> 83then C<$data> will be C<undef>, C<< $headers->{Status} >> will be C<599>
83and the C<Reason> pseudo-header will contain an error message. 84and the C<Reason> pseudo-header will contain an error message.
84 85
86A typical callback might look like this:
87
88 sub {
89 my ($body, $hdr) = @_;
90
91 if ($hdr->{Status} =~ /^2/) {
92 ... everything should be ok
93 } else {
94 print "error, $hdr->{Status} $hdr->{Reason}\n";
95 }
96 }
97
85Additional parameters are key-value pairs, and are fully optional. They 98Additional parameters are key-value pairs, and are fully optional. They
86include: 99include:
87 100
88=over 4 101=over 4
89 102
112=item body => $string 125=item body => $string
113 126
114The request body, usually empty. Will be-sent as-is (future versions of 127The request body, usually empty. Will be-sent as-is (future versions of
115this module might offer more options). 128this module might offer more options).
116 129
130=item cookie_jar => $hash_ref
131
132Passing this parameter enables (simplified) cookie-processing, loosely
133based on the original netscape specification.
134
135The C<$hash_ref> must be an (initially empty) hash reference which will
136get updated automatically. It is possible to save the cookie_jar to
137persistent storage with something like JSON or Storable, but this is not
138recommended, as expire times are currently being ignored.
139
140Note that this cookie implementation is not of very high quality, nor
141meant to be complete. If you want complete cookie management you have to
142do that on your own. C<cookie_jar> is meant as a quick fix to get some
143cookie-using sites working. Cookies are a privacy disaster, do not use
144them unless required to.
145
117=back 146=back
118 147
119=back 148Example: make a simple HTTP GET request for http://www.nethype.de/
149
150 http_request GET => "http://www.nethype.de/", sub {
151 my ($body, $hdr) = @_;
152 print "$body\n";
153 };
154
155Example: make a HTTP HEAD request on https://www.google.com/, use a
156timeout of 30 seconds.
157
158 http_request
159 GET => "https://www.google.com",
160 timeout => 30,
161 sub {
162 my ($body, $hdr) = @_;
163 use Data::Dumper;
164 print Dumper $hdr;
165 }
166 ;
120 167
121=cut 168=cut
122 169
123sub http_request($$$;@) { 170sub http_request($$$;@) {
124 my $cb = pop; 171 my $cb = pop;
126 173
127 my %hdr; 174 my %hdr;
128 175
129 $method = uc $method; 176 $method = uc $method;
130 177
131 if (my $hdr = delete $arg{headers}) { 178 if (my $hdr = $arg{headers}) {
132 while (my ($k, $v) = each %$hdr) { 179 while (my ($k, $v) = each %$hdr) {
133 $hdr{lc $k} = $v; 180 $hdr{lc $k} = $v;
134 } 181 }
135 } 182 }
136 183
184 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
185
186 return $cb->(undef, { Status => 599, Reason => "recursion limit reached" })
187 if $recurse < 0;
188
137 my $proxy = $arg{proxy} || $PROXY; 189 my $proxy = $arg{proxy} || $PROXY;
138 my $timeout = $arg{timeout} || $TIMEOUT; 190 my $timeout = $arg{timeout} || $TIMEOUT;
139 my $recurse = exists $arg{recurse} ? $arg{recurse} : $MAX_RECURSE;
140 191
141 $hdr{"user-agent"} ||= $USERAGENT; 192 $hdr{"user-agent"} ||= $USERAGENT;
142 193
143 my ($host, $port, $path, $scheme); 194 my ($scheme, $authority, $upath, $query, $fragment) =
195 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
196
197 $scheme = lc $scheme;
198
199 my $uport = $scheme eq "http" ? 80
200 : $scheme eq "https" ? 443
201 : return $cb->(undef, { Status => 599, Reason => "only http and https URL schemes supported" });
202
203 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
204 or return $cb->(undef, { Status => 599, Reason => "unparsable URL" });
205
206 my $uhost = $1;
207 $uport = $2 if defined $2;
208
209 $uhost =~ s/^\[(.*)\]$/$1/;
210 $upath .= "?$query" if length $query;
211
212 $upath =~ s%^/?%/%;
213
214 # cookie processing
215 if (my $jar = $arg{cookie_jar}) {
216 %$jar = () if $jar->{version} < 1;
217
218 my @cookie;
219
220 while (my ($chost, $v) = each %$jar) {
221 next unless $chost eq substr $uhost, -length $chost;
222 next unless $chost =~ /^\./;
223
224 while (my ($cpath, $v) = each %$v) {
225 next unless $cpath eq substr $upath, 0, length $cpath;
226
227 while (my ($k, $v) = each %$v) {
228 next if $scheme ne "https" && exists $v->{secure};
229 push @cookie, "$k=$v->{value}";
230 }
231 }
232 }
233
234 $hdr{cookie} = join "; ", @cookie
235 if @cookie;
236 }
237
238 my ($rhost, $rport, $rpath); # request host, port, path
144 239
145 if ($proxy) { 240 if ($proxy) {
146 ($host, $port, $scheme) = @$proxy; 241 ($rhost, $rport, $scheme) = @$proxy;
147 $path = $url; 242 $rpath = $url;
148 } else { 243 } else {
149 ($scheme, my $authority, $path, my $query, my $fragment) = 244 ($rhost, $rport, $rpath) = ($uhost, $uport, $upath);
150 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
151
152 $port = $scheme eq "http" ? 80
153 : $scheme eq "https" ? 443
154 : croak "$url: only http and https URLs supported";
155
156 $authority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
157 or croak "$authority: unparsable URL";
158
159 $host = $1;
160 $port = $2 if defined $2;
161
162 $host =~ s/^\[(.*)\]$/$1/;
163 $path .= "?$query" if length $query;
164
165 $path = "/" unless $path;
166
167 $hdr{host} = $host = lc $host; 245 $hdr{host} = $uhost;
168 } 246 }
169 247
170 $scheme = lc $scheme; 248 $hdr{"content-length"} = length $arg{body};
171 249
172 my %state; 250 my %state;
173 251
174 $state{body} = delete $arg{body};
175
176 $hdr{"content-length"} = length $state{body};
177
178 $state{connect_guard} = AnyEvent::Socket::tcp_connect $host, $port, sub { 252 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
179 $state{fh} = shift 253 $state{fh} = shift
180 or return $cb->(undef, { Status => 599, Reason => "$!" }); 254 or return $cb->(undef, { Status => 599, Reason => "$!" });
181 255
182 delete $state{connect_guard}; # reduce memory usage, save a tree 256 delete $state{connect_guard}; # reduce memory usage, save a tree
183 257
207 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" }); 281 $cb->(undef, { Status => 599, Reason => "unexpected end-of-file" });
208 }); 282 });
209 283
210 # send request 284 # send request
211 $state{handle}->push_write ( 285 $state{handle}->push_write (
212 "$method $path HTTP/1.0\015\012" 286 "$method $rpath HTTP/1.0\015\012"
213 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr) 287 . (join "", map "$_: $hdr{$_}\015\012", keys %hdr)
214 . "\015\012" 288 . "\015\012"
215 . (delete $state{body}) 289 . (delete $arg{body})
216 ); 290 );
217 291
218 %hdr = (); # reduce memory usage, save a kitten 292 %hdr = (); # reduce memory usage, save a kitten
219 293
220 # status line 294 # status line
221 $state{handle}->push_read (line => qr/\015?\012/, sub { 295 $state{handle}->push_read (line => qr/\015?\012/, sub {
222 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix 296 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) \s+ ([^\015\012]+)/ix
223 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" })); 297 or return (%state = (), $cb->(undef, { Status => 599, Reason => "invalid server response ($_[1])" }));
224 298
225 my %hdr = ( # response headers 299 my %hdr = ( # response headers
226 HTTPVersion => ",$1", 300 HTTPVersion => "\x00$1",
227 Status => ",$2", 301 Status => "\x00$2",
228 Reason => ",$3", 302 Reason => "\x00$3",
229 ); 303 );
230 304
231 # headers, could be optimized a bit 305 # headers, could be optimized a bit
232 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub { 306 $state{handle}->unshift_read (line => qr/\015?\012\015?\012/, sub {
233 for ("$_[1]\012") { 307 for ("$_[1]\012") {
234 # we support spaces in field names, as lotus domino 308 # we support spaces in field names, as lotus domino
235 # creates them. 309 # creates them.
236 $hdr{lc $1} .= ",$2" 310 $hdr{lc $1} .= "\x00$2"
237 while /\G 311 while /\G
238 ([^:\000-\037]+): 312 ([^:\000-\037]+):
239 [\011\040]* 313 [\011\040]*
240 ((?: [^\015\012]+ | \015?\012[\011\040] )*) 314 ((?: [^\015\012]+ | \015?\012[\011\040] )*)
241 \015?\012 315 \015?\012
242 /gxc; 316 /gxc;
243 317
244 /\G$/ 318 /\G$/
245 or return $cb->(undef, { Status => 599, Reason => "garbled response headers" }); 319 or return (%state = (), $cb->(undef, { Status => 599, Reason => "garbled response headers" }));
246 } 320 }
247 321
248 substr $_, 0, 1, "" 322 substr $_, 0, 1, ""
249 for values %hdr; 323 for values %hdr;
250 324
251 if ($method eq "HEAD") { 325 my $finish = sub {
252 %state = (); 326 %state = ();
327
328 # set-cookie processing
329 if ($arg{cookie_jar} && exists $hdr{"set-cookie"}) {
330 for (split /\x00/, $hdr{"set-cookie"}) {
331 my ($cookie, @arg) = split /;\s*/;
332 my ($name, $value) = split /=/, $cookie, 2;
333 my %kv = (value => $value, map { split /=/, $_, 2 } @arg);
334
335 my $cdom = (delete $kv{domain}) || $uhost;
336 my $cpath = (delete $kv{path}) || "/";
337
338 $cdom =~ s/^.?/./; # make sure it starts with a "."
339
340 my $ndots = $cdom =~ y/.//;
341 next if $ndots < ($cdom =~ /[^.]{3}$/ ? 2 : 3);
342
343 # store it
344 $arg{cookie_jar}{version} = 1;
345 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
346 }
347 }
348
349 if ($_[1]{Status} =~ /^x30[12]$/ && $recurse) {
350 # microsoft and other assholes don't give a shit for following standards,
351 # try to support a common form of broken Location header.
352 $_[1]{location} =~ s%^/%$scheme://$uhost:$uport/%;
353
354 http_request ($method, $_[1]{location}, %arg, recurse => $recurse - 1, $cb);
355 } else {
356 $cb->($_[0], $_[1]);
357 }
358 };
359
360 if ($hdr{Status} =~ /^(?:1..|204|304)$/ or $method eq "HEAD") {
253 $cb->(undef, \%hdr); 361 $finish->(undef, \%hdr);
254 } else { 362 } else {
255 if (exists $hdr{"content-length"}) { 363 if (exists $hdr{"content-length"}) {
256 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub { 364 $_[0]->unshift_read (chunk => $hdr{"content-length"}, sub {
257 # could cache persistent connection now 365 # could cache persistent connection now
258 if ($hdr{connection} =~ /\bkeep-alive\b/i) { 366 if ($hdr{connection} =~ /\bkeep-alive\b/i) {
259 # but we don't, due to misdesigns, this is annoyingly complex 367 # but we don't, due to misdesigns, this is annoyingly complex
260 }; 368 };
261 369
262 %state = ();
263 $cb->($_[1], \%hdr); 370 $finish->($_[1], \%hdr);
264 }); 371 });
265 } else { 372 } else {
266 # too bad, need to read until we get an error or EOF, 373 # too bad, need to read until we get an error or EOF,
267 # no way to detect winged data. 374 # no way to detect winged data.
268 $_[0]->on_error (sub { 375 $_[0]->on_error (sub {
269 %state = ();
270 $cb->($_[0]{rbuf}, \%hdr); 376 $finish->($_[0]{rbuf}, \%hdr);
271 }); 377 });
272 $_[0]->on_eof (undef); 378 $_[0]->on_eof (undef);
273 $_[0]->on_read (sub { }); 379 $_[0]->on_read (sub { });
274 } 380 }
275 } 381 }
295sub http_post($$$;@) { 401sub http_post($$$;@) {
296 unshift @_, "POST", "body"; 402 unshift @_, "POST", "body";
297 &http_request 403 &http_request
298} 404}
299 405
406=back
407
300=head2 GLOBAL FUNCTIONS AND VARIABLES 408=head2 GLOBAL FUNCTIONS AND VARIABLES
301 409
302=over 4 410=over 4
303 411
304=item AnyEvent::HTTP::set_proxy "proxy-url" 412=item AnyEvent::HTTP::set_proxy "proxy-url"

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines