/cvs/AnyEvent-HTTP/HTTP.pm

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.58 by root, Sun Nov 14 20:23:00 2010 UTC vs.
Revision 1.69 by root, Fri Dec 31 19:32:47 2010 UTC

43 43
44use Errno (); 44use Errno ();
45 45
46use AnyEvent 5.0 (); 46use AnyEvent 5.0 ();
47use AnyEvent::Util (); 47use AnyEvent::Util ();
48use AnyEvent::Socket ();
49use AnyEvent::Handle (); 48use AnyEvent::Handle ();
50 49
51use base Exporter::; 50use base Exporter::;
52 51
53our $VERSION = '1.46'; 52our $VERSION = '1.5';
54 53
55our @EXPORT = qw(http_get http_post http_head http_request); 54our @EXPORT = qw(http_get http_post http_head http_request);
56 55
57our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 56our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58our $MAX_RECURSE = 10; 57our $MAX_RECURSE = 10;
95C<http_request> returns a "cancellation guard" - you have to keep the 94C<http_request> returns a "cancellation guard" - you have to keep the
96object at least alive until the callback gets called. If the object gets 95object at least alive until the callback gets called. If the object gets
97destroyed before the callback is called, the request will be cancelled. 96destroyed before the callback is called, the request will be cancelled.
98 97
99The callback will be called with the response body data as first argument 98The callback will be called with the response body data as first argument
100(or C<undef> if an error occurred), and a hash-ref with response headers as 99(or C<undef> if an error occurred), and a hash-ref with response headers
101second argument. 100(and trailers) as second argument.
102 101
103All the headers in that hash are lowercased. In addition to the response 102All the headers in that hash are lowercased. In addition to the response
104headers, the "pseudo-headers" (uppercase to avoid clashing with possible 103headers, the "pseudo-headers" (uppercase to avoid clashing with possible
105response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 104response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
106three parts of the HTTP Status-Line of the same name. 105three parts of the HTTP Status-Line of the same name. If an error occurs
106during the body phase of a request, then the original C<Status> and
107C<Reason> values from the header are available as C<OrigStatus> and
108C<OrigReason>.
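As a sketch of how a completion callback might use these pseudo-headers, the helper below (`describe_result` is a hypothetical name, not part of AnyEvent::HTTP) separates a failure that happened after the headers arrived, where `OrigStatus`/`OrigReason` are present, from an ordinary response. It only inspects the header hash, so it runs without a network; the header values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise the ($body, $hdr) pair that the
# http_request completion callback receives.
sub describe_result {
    my ($body, $hdr) = @_;

    # OrigStatus/OrigReason only exist when the request failed after the
    # headers were received; Status/Reason then describe that failure.
    if (exists $hdr->{OrigStatus}) {
        return "failed in body: $hdr->{Status} $hdr->{Reason}"
             . " (server sent $hdr->{OrigStatus} $hdr->{OrigReason})";
    }

    return "$hdr->{Status} $hdr->{Reason}";
}

# Simulated header hashes; in real code this would be something like
#   http_get $url, sub { print describe_result (@_), "\n" };
print describe_result("<html>", { Status => 200, Reason => "OK" }), "\n";
print describe_result(undef, {
    Status     => 599, Reason     => "Unexpected end-of-file",
    OrigStatus => 200, OrigReason => "OK",
}), "\n";
```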
107 109
108The pseudo-header C<URL> contains the actual URL (which can differ from 110The pseudo-header C<URL> contains the actual URL (which can differ from
109the requested URL when following redirects - for example, you might get 111the requested URL when following redirects - for example, you might get
110an error that your URL scheme is not supported even though your URL is a 112an error that your URL scheme is not supported even though your URL is a
111valid http URL because it redirected to an ftp URL, in which case you can 113valid http URL because it redirected to an ftp URL, in which case you can
148Whether to recurse requests or not, e.g. on redirects, authentication 150Whether to recurse requests or not, e.g. on redirects, authentication
149retries and so on, and how often to do so. 151retries and so on, and how often to do so.
150 152
151=item headers => hashref 153=item headers => hashref
152 154
153The request headers to use. Currently, C<http_request> may provide its 155The request headers to use. Currently, C<http_request> may provide its own
154own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers 156C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
155and will provide defaults for C<User-Agent:> and C<Referer:> (this can be 157will provide defaults at least for C<TE:>, C<Referer:> and C<User-Agent:>
156suppressed by using C<undef> for these headers in which case they won't be 158(this can be suppressed by using C<undef> for these headers in which case
157sent at all). 159they won't be sent at all).
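The defaulting rule can be illustrated with a small standalone sketch that mirrors the logic in `http_request`: a default is filled in only when the header key does not exist at all, and headers whose value is `undef` are skipped when the request is written. `build_request_headers` and the default values are hypothetical; the real module also adds `Host:`, `Content-Length:` and `Connection:` headers, which are left out here.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill in defaults only when the caller did not
# mention the header at all (default values here are made up).
sub build_request_headers {
    my (%hdr) = @_;    # caller-supplied headers, lowercase names assumed

    $hdr{referer}      = "http://example.org/" unless exists $hdr{referer};
    $hdr{"user-agent"} = "ExampleAgent/0.1"    unless exists $hdr{"user-agent"};
    $hdr{te}           = "trailers"            unless exists $hdr{te};

    # Headers set to undef are dropped when serialising; the names are
    # sorted here only to make the output deterministic.
    my @names = sort { $a cmp $b } grep { defined $hdr{$_} } keys %hdr;
    return join "", map("\u$_: $hdr{$_}\015\012", @names);
}

# Suppress Referer entirely, override User-Agent, keep the TE default.
print build_request_headers(referer => undef, "user-agent" => "MyBot/1.0");
```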
158 160
159=item timeout => $seconds 161=item timeout => $seconds
160 162
161The time-out to use for various stages - each connect attempt will reset 163The time-out to use for various stages - each connect attempt will reset
162the timeout, as will read or write activity, i.e. this is not an overall 164the timeout, as will read or write activity, i.e. this is not an overall
172C<$scheme> must be either missing, C<http> for HTTP or C<https> for 174C<$scheme> must be either missing, C<http> for HTTP or C<https> for
173HTTPS. 175HTTPS.
174 176
175=item body => $string 177=item body => $string
176 178
177The request body, usually empty. Will be-sent as-is (future versions of 179The request body, usually empty. Will be sent as-is (future versions of
178this module might offer more options). 180this module might offer more options).
179 181
180=item cookie_jar => $hash_ref 182=item cookie_jar => $hash_ref
181 183
182Passing this parameter enables (simplified) cookie-processing, loosely 184Passing this parameter enables (simplified) cookie-processing, loosely
190Note that this cookie implementation is not of very high quality, nor 192Note that this cookie implementation is not of very high quality, nor
191meant to be complete. If you want complete cookie management you have to 193meant to be complete. If you want complete cookie management you have to
192do that on your own. C<cookie_jar> is meant as a quick fix to get some 194do that on your own. C<cookie_jar> is meant as a quick fix to get some
193cookie-using sites working. Cookies are a privacy disaster, do not use 195cookie-using sites working. Cookies are a privacy disaster, do not use
194them unless required to. 196them unless required to.
197
198When cookie processing is enabled, the C<Cookie:> and C<Set-Cookie:>
199headers will be set and handled by this module, otherwise they will be
200left untouched.
195 201
196=item tls_ctx => $scheme | $tls_ctx 202=item tls_ctx => $scheme | $tls_ctx
197 203
198Specifies the AnyEvent::TLS context to be used for https connections. This 204Specifies the AnyEvent::TLS context to be used for https connections. This
199parameter follows the same rules as the C<tls_ctx> parameter to 205parameter follows the same rules as the C<tls_ctx> parameter to
212overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 218overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
213and behaves exactly the same way (e.g. it has to provide a 219and behaves exactly the same way (e.g. it has to provide a
214timeout). See the description for the C<$prepare_cb> argument of 220timeout). See the description for the C<$prepare_cb> argument of
215C<AnyEvent::Socket::tcp_connect> for details. 221C<AnyEvent::Socket::tcp_connect> for details.
216 222
223=item tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)
224
225In even rarer cases you want total control over how AnyEvent::HTTP
226establishes connections. Normally it uses C<AnyEvent::Socket::tcp_connect>
227to do this, but you can provide your own C<tcp_connect> function -
228obviously, it has to follow the same calling conventions, except that it
229may always return a connection guard object.
230
231There are probably lots of weird uses for this function, ranging from
232tracing which hosts C<http_request> actually tries to connect to, to (inexact
233but fast) host => IP address caching or even socks protocol support.
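A minimal sketch of the tracing use case: a wrapper that records each host/port pair and then delegates to another function with the same calling convention. `make_tracing_connect` is a hypothetical name. In real use the inner function would be `\&AnyEvent::Socket::tcp_connect`, but a stub stands in here so the sketch runs without AnyEvent or a network.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a tcp_connect-compatible function so that every
# connection attempt is logged before being passed on unchanged.
sub make_tracing_connect {
    my ($inner, $log) = @_;

    return sub {
        my ($host, $service, $connect_cb, $prepare_cb) = @_;
        push @$log, "$host:$service";
        # delegate, returning whatever guard object the inner function returns
        return $inner->($host, $service, $connect_cb, $prepare_cb);
    };
}

# Stub standing in for AnyEvent::Socket::tcp_connect: it reports failure
# at once by calling the connect callback without a filehandle.
my $stub = sub {
    my (undef, undef, $connect_cb) = @_;
    $connect_cb->();
    return "stub-guard";
};

my @seen;
my $connect = make_tracing_connect($stub, \@seen);
my $guard   = $connect->("www.example.org", 80,
                         sub { print "connect failed\n" unless @_ },
                         sub { 30 });
print "attempted: @seen\n";
```

With AnyEvent::HTTP loaded, the wrapper would be passed as `tcp_connect => make_tracing_connect (\&AnyEvent::Socket::tcp_connect, \@seen)`.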
234
217=item on_header => $callback->($headers) 235=item on_header => $callback->($headers)
218 236
219When specified, this callback will be called with the header hash as soon 237When specified, this callback will be called with the header hash as soon
220as headers have been successfully received from the remote server (not on 238as headers have been successfully received from the remote server (not on
221locally-generated errors). 239locally-generated errors).
226 244
227This callback is useful, among other things, to quickly reject unwanted 245This callback is useful, among other things, to quickly reject unwanted
228content, which, if it is supposed to be rare, can be faster than first 246content, which, if it is supposed to be rare, can be faster than first
229doing a C<HEAD> request. 247doing a C<HEAD> request.
230 248
249The downside is that cancelling the request makes it impossible to re-use
250the connection. Also, the C<on_header> callback will not receive any
251trailer (headers sent after the response body).
252
231Example: cancel the request unless the content-type is "text/html". 253Example: cancel the request unless the content-type is "text/html".
232 254
233 on_header => sub { 255 on_header => sub {
234 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ 256 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
235 }, 257 },
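The content-type test in the example can be checked in isolation. This sketch wraps the same regular expression in a named sub and runs it on a few made-up values. The pattern is anchored and case-sensitive, so `text/html; charset=utf-8` is accepted while `text/htmlx` and `TEXT/HTML` are not.

```perl
use strict;
use warnings;

# The content-type test from the on_header example, as a named sub.
# A missing header is treated as "" to avoid an undef warning.
sub wants_html {
    my ($hdr) = @_;
    return (($hdr->{"content-type"} // "") =~ /^text\/html\s*(?:;|$)/) ? 1 : 0;
}

for my $ct ("text/html", "text/html; charset=utf-8", "text/htmlx", "TEXT/HTML") {
    printf "%-26s %s\n", $ct, wants_html({ "content-type" => $ct }) ? "keep" : "cancel";
}
```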
241string instead of the body data. 263string instead of the body data.
242 264
243It has to return either true (in which case AnyEvent::HTTP will continue), 265It has to return either true (in which case AnyEvent::HTTP will continue),
244or false, in which case AnyEvent::HTTP will cancel the download (and call 266or false, in which case AnyEvent::HTTP will cancel the download (and call
245the completion callback with an error code of C<598>). 267the completion callback with an error code of C<598>).
268
269The downside to cancelling the request is that it makes it impossible to
270re-use the connection.
246 271
247This callback is useful when the data is too large to be held in memory 272This callback is useful when the data is too large to be held in memory
248(so the callback writes it to a file) or when only some information should 273(so the callback writes it to a file) or when only some information should
249be extracted, or when the body should be processed incrementally. 274be extracted, or when the body should be processed incrementally.
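As a sketch of incremental processing, the hypothetical factory below builds an `on_body` callback that accumulates data but returns false once a size limit is exceeded, which per the description above makes AnyEvent::HTTP cancel the download with status 598. The callback is driven by hand here rather than by a real request.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an on_body callback that appends each
# partial body to $$buf and asks for cancellation past $max bytes.
sub make_limited_on_body {
    my ($max, $buf) = @_;
    $$buf = "";

    return sub {
        my ($partial, $hdr) = @_;
        $$buf .= $partial;
        return length($$buf) <= $max;    # false => request is cancelled
    };
}

my $body;
my $on_body = make_limited_on_body(10, \$body);
print $on_body->("hello", {}) ? "continue\n" : "cancel\n";
print $on_body->("world", {}) ? "continue\n" : "cancel\n";
print $on_body->("!",     {}) ? "continue\n" : "cancel\n";
```

In a request it would be passed as `on_body => $on_body`; the limit check happens after appending, so the buffer may hold slightly more than `$max` bytes when cancellation is requested.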
250 275
276If you think you need this, first have a look at C<on_body>, to see if 301If you think you need this, first have a look at C<on_body>, to see if
277that doesn't solve your problem in a better way. 302that doesn't solve your problem in a better way.
278 303
279=back 304=back
280 305
281Example: make a simple HTTP GET request for http://www.nethype.de/ 306Example: do a simple HTTP GET request for http://www.nethype.de/ and print
307the response body.
282 308
283 http_request GET => "http://www.nethype.de/", sub { 309 http_request GET => "http://www.nethype.de/", sub {
284 my ($body, $hdr) = @_; 310 my ($body, $hdr) = @_;
285 print "$body\n"; 311 print "$body\n";
286 }; 312 };
287 313
288Example: make a HTTP HEAD request on https://www.google.com/, use a 314Example: do a HTTP HEAD request on https://www.google.com/, use a
289timeout of 30 seconds. 315timeout of 30 seconds.
290 316
291 http_request 317 http_request
292 GET => "https://www.google.com", 318 GET => "https://www.google.com",
293 timeout => 30, 319 timeout => 30,
296 use Data::Dumper; 322 use Data::Dumper;
297 print Dumper $hdr; 323 print Dumper $hdr;
298 } 324 }
299 ; 325 ;
300 326
301Example: make another simple HTTP GET request, but immediately try to 327Example: do another simple HTTP GET request, but immediately try to
302cancel it. 328cancel it.
303 329
304 my $request = http_request GET => "http://www.nethype.de/", sub { 330 my $request = http_request GET => "http://www.nethype.de/", sub {
305 my ($body, $hdr) = @_; 331 my ($body, $hdr) = @_;
306 print "$body\n"; 332 print "$body\n";
338 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 364 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
339 365
340 _slot_schedule $_[0]; 366 _slot_schedule $_[0];
341} 367}
342 368
369# continue to parse $_ for headers and place them into the arg
370sub parse_hdr() {
371 my %hdr;
372
373 # things seen, not parsed:
374 # p3pP="NON CUR OTPi OUR NOR UNI"
375
376 $hdr{lc $1} .= ",$2"
377 while /\G
378 ([^:\000-\037]*):
379 [\011\040]*
380 ((?: [^\012]+ | \012[\011\040] )*)
381 \012
382 /gxc;
383
384 /\G$/
385 or return;
386
387 # remove the "," prefix we added to all headers above
388 substr $_, 0, 1, ""
389 for values %hdr;
390
391 \%hdr
392}
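To see what `parse_hdr` returns, the standalone sketch below copies the function from this revision unchanged and feeds it a made-up header block, with the `\015` characters already removed as its callers do. Header names come back lowercased, and repeated headers are joined with commas.

```perl
use strict;
use warnings;

# Copied from revision 1.69: parses header lines from $_, starting at pos($_).
sub parse_hdr() {
   my %hdr;

   $hdr{lc $1} .= ",$2"
      while /\G
            ([^:\000-\037]*):
            [\011\040]*
            ((?: [^\012]+ | \012[\011\040] )*)
            \012
         /gxc;

   /\G$/
      or return;

   # remove the "," prefix we added to all headers above
   substr $_, 0, 1, ""
      for values %hdr;

   \%hdr
}

local $_ = "Content-Type: text/html\012Set-Cookie: a=1\012Set-Cookie: b=2\012";
my $hdr = parse_hdr
   or die "garbled headers";

print "$_ => $hdr->{$_}\n" for sort keys %$hdr;
```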
393
343our $qr_nlnl = qr{(?<![^\012])\015?\012}; 394our $qr_nlnl = qr{(?<![^\012])\015?\012};
344 395
345our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 396our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
346our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 397our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
347 398
366 my @pseudo = (URL => $url); 417 my @pseudo = (URL => $url);
367 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect}; 418 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect};
368 419
369 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 420 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
370 421
371 return $cb->(undef, { Status => 599, Reason => "Too many redirections", @pseudo }) 422 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" })
372 if $recurse < 0; 423 if $recurse < 0;
373 424
374 my $proxy = $arg{proxy} || $PROXY; 425 my $proxy = $arg{proxy} || $PROXY;
375 my $timeout = $arg{timeout} || $TIMEOUT; 426 my $timeout = $arg{timeout} || $TIMEOUT;
376 427
379 430
380 $uscheme = lc $uscheme; 431 $uscheme = lc $uscheme;
381 432
382 my $uport = $uscheme eq "http" ? 80 433 my $uport = $uscheme eq "http" ? 80
383 : $uscheme eq "https" ? 443 434 : $uscheme eq "https" ? 443
384 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", @pseudo }); 435 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
385 436
386 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 437 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
387 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", @pseudo }); 438 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
388 439
389 my $uhost = $1; 440 my $uhost = $1;
390 $uport = $2 if defined $2; 441 $uport = $2 if defined $2;
391 442
392 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 443 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
441 } else { 492 } else {
442 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath); 493 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
443 } 494 }
444 495
445 # leave out fragment and query string, just a heuristic 496 # leave out fragment and query string, just a heuristic
446 $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer}; 497 $hdr{referer} = "$uscheme://$uauthority$upath" unless exists $hdr{referer};
447 $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"}; 498 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
448 499
449 $hdr{"content-length"} = length $arg{body} 500 $hdr{"content-length"} = length $arg{body}
450 if length $arg{body} || $method ne "GET"; 501 if length $arg{body} || $method ne "GET";
451 502
503 $hdr{connection} = "close TE"; #1.1
504 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
505
452 my %state = (connect_guard => 1); 506 my %state = (connect_guard => 1);
453 507
454 _get_slot $uhost, sub { 508 _get_slot $uhost, sub {
455 $state{slot_guard} = shift; 509 $state{slot_guard} = shift;
456 510
457 return unless $state{connect_guard}; 511 return unless $state{connect_guard};
458 512
459 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub { 513 my $connect_cb = sub {
460 $state{fh} = shift 514 $state{fh} = shift
461 or do { 515 or do {
462 my $err = "$!"; 516 my $err = "$!";
463 %state = (); 517 %state = ();
464 return $cb->(undef, { Status => 599, Reason => $err, @pseudo }); 518 return $cb->(undef, { @pseudo, Status => 599, Reason => $err });
465 }; 519 };
466 520
467 pop; # free memory, save a tree 521 pop; # free memory, save a tree
468 522
469 return unless delete $state{connect_guard}; 523 return unless delete $state{connect_guard};
475 tls_ctx => $arg{tls_ctx}, 529 tls_ctx => $arg{tls_ctx},
476 # these need to be reconfigured on keepalive handles 530 # these need to be reconfigured on keepalive handles
477 timeout => $timeout, 531 timeout => $timeout,
478 on_error => sub { 532 on_error => sub {
479 %state = (); 533 %state = ();
480 $cb->(undef, { Status => 599, Reason => $_[2], @pseudo }); 534 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] });
481 }, 535 },
482 on_eof => sub { 536 on_eof => sub {
483 %state = (); 537 %state = ();
484 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", @pseudo }); 538 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" });
485 }, 539 },
486 ; 540 ;
487 541
488 # limit the number of persistent connections 542 # limit the number of persistent connections
489 # keepalive not yet supported 543 # keepalive not yet supported
491# ++$KA_COUNT{$_[1]}; 545# ++$KA_COUNT{$_[1]};
492# $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 546# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
493# --$KA_COUNT{$_[1]} 547# --$KA_COUNT{$_[1]}
494# }; 548# };
495# $hdr{connection} = "keep-alive"; 549# $hdr{connection} = "keep-alive";
496# } else {
497 delete $hdr{connection};
498# } 550# }
499 551
500 $state{handle}->starttls ("connect") if $rscheme eq "https"; 552 $state{handle}->starttls ("connect") if $rscheme eq "https";
501 553
502 # handle actual, non-tunneled, request 554 # handle actual, non-tunneled, request
503 my $handle_actual_request = sub { 555 my $handle_actual_request = sub {
504 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 556 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
505 557
506 # send request 558 # send request
507 $state{handle}->push_write ( 559 $state{handle}->push_write (
508 "$method $rpath HTTP/1.0\015\012" 560 "$method $rpath HTTP/1.1\015\012"
509 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 561 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
510 . "\015\012" 562 . "\015\012"
511 . (delete $arg{body}) 563 . (delete $arg{body})
512 ); 564 );
513 565
515 return unless %state; 567 return unless %state;
516 568
517 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use 569 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
518 570
519 # status line and headers 571 # status line and headers
520 $state{handle}->push_read (line => $qr_nlnl, sub { 572 $state{read_response} = sub {
521 for ("$_[1]") { 573 for ("$_[1]") {
522 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 574 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
523 575
524 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )? \015?\012/igxc 576 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/igxc
525 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response", @pseudo })); 577 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
578
579 # 100 Continue handling
580 # should not happen as we don't send expect: 100-continue,
581 # but we handle it just in case.
582 # since we send the request body regardless, if we get an error
583 # we are out of-sync, which we currently do NOT handle correctly.
584 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
585 if $2 eq 100;
526 586
527 push @pseudo, 587 push @pseudo,
528 HTTPVersion => $1, 588 HTTPVersion => $1,
529 Status => $2, 589 Status => $2,
530 Reason => $3, 590 Reason => $3,
531 ; 591 ;
532 592
533 # things seen, not parsed: 593 my $hdr = parse_hdr
534 # p3pP="NON CUR OTPi OUR NOR UNI"
535
536 $hdr{lc $1} .= ",$2"
537 while /\G
538 ([^:\000-\037]*):
539 [\011\040]*
540 ((?: [^\012]+ | \012[\011\040] )*)
541 \012
542 /gxc;
543
544 /\G$/
545 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", @pseudo })); 594 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
595
596 %hdr = (%$hdr, @pseudo);
546 } 597 }
547
548 # remove the "," prefix we added to all headers above
549 substr $_, 0, 1, ""
550 for values %hdr;
551
552 # patch in all pseudo headers
553 %hdr = (%hdr, @pseudo);
554 598
555 # redirect handling 599 # redirect handling
556 # microsoft and other shitheads don't give a shit for following standards, 600 # microsoft and other shitheads don't give a shit for following standards,
557 # try to support some common forms of broken Location headers. 601 # try to support some common forms of broken Location headers.
558 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 602 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
585 } elsif ($status == 307) { 629 } elsif ($status == 307) {
586 $redirect = 1; 630 $redirect = 1;
587 } 631 }
588 } 632 }
589 633
590 my $finish = sub { 634 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
635 my $keepalive = pop;
636
591 $state{handle}->destroy if $state{handle}; 637 $state{handle}->destroy if $state{handle};
592 %state = (); 638 %state = ();
593 639
640 if (defined $_[1]) {
641 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
642 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
643 }
644
594 # set-cookie processing 645 # set-cookie processing
595 if ($arg{cookie_jar}) { 646 if ($arg{cookie_jar}) {
596 for ($_[1]{"set-cookie"}) { 647 for ($hdr{"set-cookie"}) {
597 # parse NAME=VALUE 648 # parse NAME=VALUE
598 my @kv; 649 my @kv;
599 650
600 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) { 651 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
601 my $name = $1; 652 my $name = $1;
647 # we also access %hdr, as $_[1] might be an erro 698 # we also access %hdr, as $_[1] might be an erro
648 http_request ( 699 http_request (
649 $method => $hdr{location}, 700 $method => $hdr{location},
650 %arg, 701 %arg,
651 recurse => $recurse - 1, 702 recurse => $recurse - 1,
652 Redirect => \@_, 703 Redirect => [$_[0], \%hdr],
653 $cb); 704 $cb);
654 } else { 705 } else {
655 $cb->($_[0], $_[1]); 706 $cb->($_[0], \%hdr);
656 } 707 }
657 }; 708 };
658 709
659 my $len = $hdr{"content-length"}; 710 my $len = $hdr{"content-length"};
660 711
661 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) { 712 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
662 $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", @pseudo }); 713 $finish->(undef, 598 => "Request cancelled by on_header");
663 } elsif ( 714 } elsif (
664 $hdr{Status} =~ /^(?:1..|[23]04)$/ 715 $hdr{Status} =~ /^(?:1..|204|205|304)$/
665 or $method eq "HEAD" 716 or $method eq "HEAD"
666 or (defined $len && !$len) 717 or (defined $len && !$len)
667 ) { 718 ) {
668 # no body 719 # no body
669 $finish->("", \%hdr); 720 $finish->("", undef, undef, 1);
670 } else { 721 } else {
671 # body handling, four different code paths 722 # body handling, many different code paths
672 # for want_body_handle, on_body (2x), normal (2x) 723 # - no body expected
673 # we might read too much here, but it does not matter yet (no pers. connections) 724 # - want_body_handle
725 # - te chunked
726 # - 2x length known (with or without on_body)
727 # - 2x length not known (with or without on_body)
674 if (!$redirect && $arg{want_body_handle}) { 728 if (!$redirect && $arg{want_body_handle}) {
675 $_[0]->on_eof (undef); 729 $_[0]->on_eof (undef);
676 $_[0]->on_error (undef); 730 $_[0]->on_error (undef);
677 $_[0]->on_read (undef); 731 $_[0]->on_read (undef);
678 732
679 $finish->(delete $state{handle}, \%hdr); 733 $finish->(delete $state{handle});
734
735 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
736 my $cl = 0;
737 my $body = undef;
738 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
739
740 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
741
742 my $read_chunk; $read_chunk = sub {
743 $_[1] =~ /^([0-9a-fA-F]+)/
744 or $finish->(undef, 599 => "Garbled chunked transfer encoding");
745
746 my $len = hex $1;
747
748 if ($len) {
749 $cl += $len;
750
751 $_[0]->push_read (chunk => $len, sub {
752 $on_body->($_[1], \%hdr)
753 or return $finish->(undef, 598 => "Request cancelled by on_body");
754
755 $_[0]->push_read (line => sub {
756 length $_[1]
757 and return $finish->(undef, 599 => "Garbled chunked transfer encoding");
758 $_[0]->push_read (line => $read_chunk);
759 });
760 });
761 } else {
762 $hdr{"content-length"} ||= $cl;
763
764 $_[0]->push_read (line => $qr_nlnl, sub {
765 if (length $_[1]) {
766 for ("$_[1]") {
767 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
768
769 my $hdr = parse_hdr
770 or return $finish->(undef, 599 => "Garbled response trailers");
771
772 %hdr = (%hdr, %$hdr);
773 }
774 }
775
776 $finish->($body, undef, undef, 1);
777 });
778 }
779 };
780
781 $_[0]->push_read (line => $read_chunk);
680 782
681 } elsif ($arg{on_body}) { 783 } elsif ($arg{on_body}) {
682 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) }); 784 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
785
683 if ($len) { 786 if ($len) {
684 $_[0]->on_eof (undef);
685 $_[0]->on_read (sub { 787 $_[0]->on_read (sub {
686 $len -= length $_[0]{rbuf}; 788 $len -= length $_[0]{rbuf};
687 789
688 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 790 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
689 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo }); 791 or return $finish->(undef, 598 => "Request cancelled by on_body");
690 792
691 $len > 0 793 $len > 0
692 or $finish->("", \%hdr); 794 or $finish->("", undef, undef, 1);
693 }); 795 });
694 } else { 796 } else {
695 $_[0]->on_eof (sub { 797 $_[0]->on_eof (sub {
696 $finish->("", \%hdr); 798 $finish->("");
697 }); 799 });
698 $_[0]->on_read (sub { 800 $_[0]->on_read (sub {
699 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 801 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
700 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo }); 802 or $finish->(undef, 598 => "Request cancelled by on_body");
701 }); 803 });
702 } 804 }
703 } else { 805 } else {
704 $_[0]->on_eof (undef); 806 $_[0]->on_eof (undef);
705 807
706 if ($len) { 808 if ($len) {
707 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) }); 809 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
708 $_[0]->on_read (sub { 810 $_[0]->on_read (sub {
709 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr) 811 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
710 if $len <= length $_[0]{rbuf}; 812 if $len <= length $_[0]{rbuf};
711 }); 813 });
712 } else { 814 } else {
713 $_[0]->on_error (sub { 815 $_[0]->on_error (sub {
714 ($! == Errno::EPIPE || !$!) 816 ($! == Errno::EPIPE || !$!)
715 ? $finish->(delete $_[0]{rbuf}, \%hdr) 817 ? $finish->(delete $_[0]{rbuf})
716 : $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }); 818 : $finish->(undef, 599 => $_[2]);
717 }); 819 });
718 $_[0]->on_read (sub { }); 820 $_[0]->on_read (sub { });
719 } 821 }
720 } 822 }
721 } 823 }
722 }); 824 };
825
826 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
723 }; 827 };
724 828
725 # now handle proxy-CONNECT method 829 # now handle proxy-CONNECT method
726 if ($proxy && $uscheme eq "https") { 830 if ($proxy && $uscheme eq "https") {
727 # oh dear, we have to wrap it into a connect request 831 # oh dear, we have to wrap it into a connect request
728 832
729 # maybe re-use $uauthority with patched port? 833 # maybe re-use $uauthority with patched port?
730 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 834 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
731 $state{handle}->push_read (line => $qr_nlnl, sub { 835 $state{handle}->push_read (line => $qr_nlnl, sub {
732 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 836 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
733 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", @pseudo })); 837 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
734 838
735 if ($2 == 200) { 839 if ($2 == 200) {
736 $rpath = $upath; 840 $rpath = $upath;
737 &$handle_actual_request; 841 &$handle_actual_request;
738 } else { 842 } else {
739 %state = (); 843 %state = ();
740 $cb->(undef, { Status => $2, Reason => $3, @pseudo }); 844 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
741 } 845 }
742 }); 846 });
743 } else { 847 } else {
744 &$handle_actual_request; 848 &$handle_actual_request;
745 } 849 }
850 };
746 851
747 }, $arg{on_prepare} || sub { $timeout }; 852 my $tcp_connect = $arg{tcp_connect}
853 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
854
855 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
856
748 }; 857 };
749 858
750 defined wantarray && AnyEvent::Util::guard { %state = () } 859 defined wantarray && AnyEvent::Util::guard { %state = () }
751} 860}
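The chunked branch added between these revisions reads a hex size line, then that many bytes followed by a line end, repeats until a zero-size chunk, and finally hands any trailers to `parse_hdr`. The same framing can be sketched synchronously on an in-memory string. `decode_chunked` is a hypothetical helper; like the `^([0-9a-fA-F]+)` match above it ignores chunk extensions, and it returns the trailer section unparsed.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a complete chunked body held in memory.
# Returns (body, raw trailer text), or dies on malformed framing.
sub decode_chunked {
    my ($data) = @_;
    my $body = "";

    while (1) {
        # size line: hex digits, optionally followed by ";extensions"
        $data =~ s/^([0-9a-fA-F]+)[^\012]*\012//
            or die "garbled chunked transfer encoding\n";
        my $len = hex $1;

        last unless $len;    # a zero-size chunk ends the body

        length($data) >= $len or die "truncated chunk\n";
        $body .= substr($data, 0, $len, "");

        # each chunk's data is followed by an empty line
        $data =~ s/^\015?\012// or die "garbled chunked transfer encoding\n";
    }

    # whatever remains is the (possibly empty) trailer section
    return ($body, $data);
}

my ($body, $trailer) = decode_chunked(
    "5\015\012hello\015\0126\015\012 world\015\0120\015\012X-Checksum: 42\015\012\015\012"
);
print "$body\n";
```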
752 861
787string of the form C<http://host:port> (optionally C<https:...>), croaks 896string of the form C<http://host:port> (optionally C<https:...>), croaks
788otherwise. 897otherwise.
789 898
790To clear an already-set proxy, use C<undef>. 899To clear an already-set proxy, use C<undef>.
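The URL check in `set_proxy` accepts `http://` or `https://` (case-insensitively), a host without colons or slashes, and an optional port. The sketch below runs that same pattern standalone; `split_proxy_url` is hypothetical, and since this excerpt does not show what port the module uses when none is given, the port is simply left `undef` here.

```perl
use strict;
use warnings;

# Hypothetical helper applying the same pattern as set_proxy.
# Returns (scheme, host, port), or an empty list if the URL is rejected;
# port is undef when the URL does not contain one.
sub split_proxy_url {
    my ($url) = @_;
    $url =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
        or return;
    return ($1, $2, $3);
}

for my $url ("http://proxy.example:3128", "HTTPS://proxy.example", "socks://proxy.example:1080") {
    my ($scheme, $host, $port) = split_proxy_url($url)
        or do { print "$url: rejected\n"; next };
    printf "%s: scheme=%s host=%s port=%s\n", $url, $scheme, $host, $port // "(none)";
}
```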
791 900
901=item $date = AnyEvent::HTTP::format_date $timestamp
902
903Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
904Date (RFC 2616).
905
906=item $timestamp = AnyEvent::HTTP::parse_date $date
907
908Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX
909timestamp, or C<undef> if the date cannot be parsed.
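A standalone sketch of the round trip for the RFC 1123 form, which is the form `format_date` produces, using only core Perl. `fmt_date` and `parse_rfc1123` are hypothetical names; the module's `parse_date` additionally accepts the RFC 850 and asctime forms. The timestamp 784111777 corresponds to the example date used in RFC 2616, Sun, 06 Nov 1994 08:49:37 GMT.

```perl
use strict;
use warnings;
use Time::Local ();

my @month   = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my @weekday = qw(Sun Mon Tue Wed Thu Fri Sat);

# Format a POSIX timestamp in the RFC 1123 form, like format_date.
sub fmt_date {
    my ($S, $M, $H, $mday, $mon, $year, $wday) = gmtime $_[0];
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $weekday[$wday], $mday, $month[$mon], $year + 1900, $H, $M, $S;
}

# Parse only the RFC 1123 form; returns undef for anything else.
sub parse_rfc1123 {
    my ($d, $m, $y, $H, $M, $S) = $_[0] =~
        /^[A-Z][a-z][a-z], ([0-9]{2}) ([A-Z][a-z][a-z]) ([0-9]{4}) ([0-9]{2}):([0-9]{2}):([0-9]{2}) GMT$/
        or return undef;
    my ($mon) = grep { $month[$_] eq $m } 0 .. 11;
    return undef unless defined $mon;
    return Time::Local::timegm($S, $M, $H, $d, $mon, $y);
}

my $t = parse_rfc1123("Sun, 06 Nov 1994 08:49:37 GMT");
print "$t\n";
print fmt_date($t), "\n";
```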
910
792=item $AnyEvent::HTTP::MAX_RECURSE 911=item $AnyEvent::HTTP::MAX_RECURSE
793 912
794The default value for the C<recurse> request parameter (default: C<10>). 913The default value for the C<recurse> request parameter (default: C<10>).
795 914
796=item $AnyEvent::HTTP::USERAGENT 915=item $AnyEvent::HTTP::USERAGENT
814connections. This number can be useful for load-leveling. 933connections. This number can be useful for load-leveling.
815 934
816=back 935=back
817 936
818=cut 937=cut
938
939our @month = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
940our @weekday = qw(Sun Mon Tue Wed Thu Fri Sat);
941
942sub format_date($) {
943 my ($time) = @_;
944
945 # RFC 822/1123 format
946 my ($S, $M, $H, $mday, $mon, $year, $wday, $yday, undef) = gmtime $time;
947
948 sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
949 $weekday[$wday], $mday, $month[$mon], $year + 1900,
950 $H, $M, $S;
951}
952
953sub parse_date($) {
954 my ($date) = @_;
955
956 my ($d, $m, $y, $H, $M, $S);
957
958 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
959 # RFC 822/1123, required by RFC 2616
960 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
961
962 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
963 # RFC 850
964 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
965
966 } elsif ($date =~ /^[A-Z][a-z][a-z] ([A-Z][a-z][a-z]) ([0-9 ][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ([0-9][0-9][0-9][0-9])$/) {
967 # ISO C's asctime
968 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
969 }
970 # other formats fail in the loop below
971
972 for (0..11) {
973 if ($m eq $month[$_]) {
974 require Time::Local;
975 return Time::Local::timegm ($S, $M, $H, $d, $_, $y);
976 }
977 }
978
979 undef
980}
819 981
820sub set_proxy($) { 982sub set_proxy($) {
821 if (length $_[0]) { 983 if (length $_[0]) {
822 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 984 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
823 or Carp::croak "$_[0]: invalid proxy URL"; 985 or Carp::croak "$_[0]: invalid proxy URL";
830# initialise proxy from environment 992# initialise proxy from environment
831eval { 993eval {
832 set_proxy $ENV{http_proxy}; 994 set_proxy $ENV{http_proxy};
833}; 995};
834 996
997=head2 SOCKS PROXIES
998
999Socks proxies are not directly supported by AnyEvent::HTTP. You can
1000compile your perl to support socks, or use an external program such as
1001F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
1002transparently.
1003
1004Alternatively, for AnyEvent::HTTP only, you can use your own
1005C<tcp_connect> function that does the proxy handshake - here is an example
1006that works with socks4a proxies:
1007
1008 use Errno;
1009 use AnyEvent::Util;
1010 use AnyEvent::Socket;
1011 use AnyEvent::Handle;
1012
1013 # host, port and username of/for your socks4a proxy
1014 my $socks_host = "10.0.0.23";
1015 my $socks_port = 9050;
1016 my $socks_user = "";
1017
1018 sub socks4a_connect {
1019 my ($host, $port, $connect_cb, $prepare_cb) = @_;
1020
1021 my $hdl = new AnyEvent::Handle
1022 connect => [$socks_host, $socks_port],
1023 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
1024 on_error => sub { $connect_cb->() },
1025 ;
1026
1027 $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);
1028
1029 $hdl->push_read (chunk => 8, sub {
1030 my ($hdl, $chunk) = @_;
1031 my ($status, $port, $ipn) = unpack "xCna4", $chunk;
1032
1033 if ($status == 0x5a) {
1034 $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
1035 } else {
1036 $! = Errno::ENXIO; $connect_cb->();
1037 }
1038 });
1039
1040 $hdl
1041 }
1042
1043Use C<socks4a_connect> instead of C<tcp_connect> when doing C<http_request>s,
1044possibly after switching off other proxy types:
1045
1046 AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies
1047
1048 http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
1049 my ($data, $headers) = @_;
1050 ...
1051 };
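The byte layout used by `socks4a_connect` can be checked offline. This sketch builds the same request with the same `pack` template and decodes a made-up 8-byte success reply with the same `unpack` template. It uses only core Perl; the reply bytes are fabricated for illustration, with 0x5a being the "request granted" code the example tests for.

```perl
use strict;
use warnings;

# Request: version 4, command 1 (CONNECT), port, IP 0.0.0.1 (the SOCKS4a
# "resolve the host name for me" marker), NUL-terminated user id and host.
my $request = pack "CCnNZ*Z*", 4, 1, 80, 1, "", "www.google.com";
printf "request is %d bytes: %s\n", length($request), unpack("H*", $request);

# Made-up reply: null byte, status 0x5a (granted), port 80, IP 127.0.0.1.
my $reply = "\x00\x5a\x00\x50\x7f\x00\x00\x01";
my ($status, $port, $ipn) = unpack "xCna4", $reply;

printf "status=0x%02x port=%d ip=%s\n", $status, $port, join(".", unpack("C4", $ipn));
```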
1052
835=head1 SEE ALSO 1053=head1 SEE ALSO
836 1054
837L<AnyEvent>. 1055L<AnyEvent>.
838 1056
839=head1 AUTHOR 1057=head1 AUTHOR
