ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.58 by root, Sun Nov 14 20:23:00 2010 UTC vs.
Revision 1.68 by root, Fri Dec 31 19:22:18 2010 UTC

43 43
44use Errno (); 44use Errno ();
45 45
46use AnyEvent 5.0 (); 46use AnyEvent 5.0 ();
47use AnyEvent::Util (); 47use AnyEvent::Util ();
48use AnyEvent::Socket ();
49use AnyEvent::Handle (); 48use AnyEvent::Handle ();
50 49
51use base Exporter::; 50use base Exporter::;
52 51
53our $VERSION = '1.46'; 52our $VERSION = '1.5';
54 53
55our @EXPORT = qw(http_get http_post http_head http_request); 54our @EXPORT = qw(http_get http_post http_head http_request);
56 55
57our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 56our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58our $MAX_RECURSE = 10; 57our $MAX_RECURSE = 10;
95C<http_request> returns a "cancellation guard" - you have to keep the 94C<http_request> returns a "cancellation guard" - you have to keep the
96object at least alive until the callback get called. If the object gets 95object at least alive until the callback get called. If the object gets
97destroyed before the callback is called, the request will be cancelled. 96destroyed before the callback is called, the request will be cancelled.
98 97
99The callback will be called with the response body data as first argument 98The callback will be called with the response body data as first argument
100(or C<undef> if an error occured), and a hash-ref with response headers as 99(or C<undef> if an error occured), and a hash-ref with response headers
101second argument. 100(and trailers) as second argument.
102 101
103All the headers in that hash are lowercased. In addition to the response 102All the headers in that hash are lowercased. In addition to the response
104headers, the "pseudo-headers" (uppercase to avoid clashing with possible 103headers, the "pseudo-headers" (uppercase to avoid clashing with possible
105response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the 104response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
106three parts of the HTTP Status-Line of the same name. 105three parts of the HTTP Status-Line of the same name. If an error occurs
106during the body phase of a request, then the original C<Status> and
107C<Reason> values from the header are available as C<OrigStatus> and
108C<OrigReason>.
107 109
108The pseudo-header C<URL> contains the actual URL (which can differ from 110The pseudo-header C<URL> contains the actual URL (which can differ from
109the requested URL when following redirects - for example, you might get 111the requested URL when following redirects - for example, you might get
110an error that your URL scheme is not supported even though your URL is a 112an error that your URL scheme is not supported even though your URL is a
111valid http URL because it redirected to an ftp URL, in which case you can 113valid http URL because it redirected to an ftp URL, in which case you can
148Whether to recurse requests or not, e.g. on redirects, authentication 150Whether to recurse requests or not, e.g. on redirects, authentication
149retries and so on, and how often to do so. 151retries and so on, and how often to do so.
150 152
151=item headers => hashref 153=item headers => hashref
152 154
153The request headers to use. Currently, C<http_request> may provide its 155The request headers to use. Currently, C<http_request> may provide its own
154own C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers 156C<Host:>, C<Content-Length:>, C<Connection:> and C<Cookie:> headers and
155and will provide defaults for C<User-Agent:> and C<Referer:> (this can be 157will provide defaults for C<TE:>, C<Referer:> and C<User-Agent:> (this can
156suppressed by using C<undef> for these headers in which case they won't be 158be suppressed by using C<undef> for these headers in which case they won't
157sent at all). 159be sent at all).
158 160
159=item timeout => $seconds 161=item timeout => $seconds
160 162
161The time-out to use for various stages - each connect attempt will reset 163The time-out to use for various stages - each connect attempt will reset
162the timeout, as will read or write activity, i.e. this is not an overall 164the timeout, as will read or write activity, i.e. this is not an overall
172C<$scheme> must be either missing, C<http> for HTTP or C<https> for 174C<$scheme> must be either missing, C<http> for HTTP or C<https> for
173HTTPS. 175HTTPS.
174 176
175=item body => $string 177=item body => $string
176 178
177The request body, usually empty. Will be-sent as-is (future versions of 179The request body, usually empty. Will be sent as-is (future versions of
178this module might offer more options). 180this module might offer more options).
179 181
180=item cookie_jar => $hash_ref 182=item cookie_jar => $hash_ref
181 183
182Passing this parameter enables (simplified) cookie-processing, loosely 184Passing this parameter enables (simplified) cookie-processing, loosely
212overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect> 214overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
213and behaves exactly the same way (e.g. it has to provide a 215and behaves exactly the same way (e.g. it has to provide a
214timeout). See the description for the C<$prepare_cb> argument of 216timeout). See the description for the C<$prepare_cb> argument of
215C<AnyEvent::Socket::tcp_connect> for details. 217C<AnyEvent::Socket::tcp_connect> for details.
216 218
219=item tcp_connect => $callback->($host, $service, $connect_cb, $prepare_cb)
220
221In even rarer cases you want total control over how AnyEvent::HTTP
222establishes connections. Normally it uses L<AnyEvent::Socket::tcp_connect>
223to do this, but you can provide your own C<tcp_connect> function -
224obviously, it has to follow the same calling conventions, except that it
225may always return a connection guard object.
226
227There are probably lots of weird uses for this function, starting from
228tracing the hosts C<http_request> actually tries to connect, to (inexact
229but fast) host => IP address caching or even socks protocol support.
230
217=item on_header => $callback->($headers) 231=item on_header => $callback->($headers)
218 232
219When specified, this callback will be called with the header hash as soon 233When specified, this callback will be called with the header hash as soon
220as headers have been successfully received from the remote server (not on 234as headers have been successfully received from the remote server (not on
221locally-generated errors). 235locally-generated errors).
226 240
227This callback is useful, among other things, to quickly reject unwanted 241This callback is useful, among other things, to quickly reject unwanted
228content, which, if it is supposed to be rare, can be faster than first 242content, which, if it is supposed to be rare, can be faster than first
229doing a C<HEAD> request. 243doing a C<HEAD> request.
230 244
245The downside is that cancelling the request makes it impossible to re-use
246the connection. Also, the C<on_header> callback will not receive any
247trailer (headers sent after the response body).
248
231Example: cancel the request unless the content-type is "text/html". 249Example: cancel the request unless the content-type is "text/html".
232 250
233 on_header => sub { 251 on_header => sub {
234 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/ 252 $_[0]{"content-type"} =~ /^text\/html\s*(?:;|$)/
235 }, 253 },
241string instead of the body data. 259string instead of the body data.
242 260
243It has to return either true (in which case AnyEvent::HTTP will continue), 261It has to return either true (in which case AnyEvent::HTTP will continue),
244or false, in which case AnyEvent::HTTP will cancel the download (and call 262or false, in which case AnyEvent::HTTP will cancel the download (and call
245the completion callback with an error code of C<598>). 263the completion callback with an error code of C<598>).
264
265The downside to cancelling the request is that it makes it impossible to
266re-use the connection.
246 267
247This callback is useful when the data is too large to be held in memory 268This callback is useful when the data is too large to be held in memory
248(so the callback writes it to a file) or when only some information should 269(so the callback writes it to a file) or when only some information should
249be extracted, or when the body should be processed incrementally. 270be extracted, or when the body should be processed incrementally.
250 271
276If you think you need this, first have a look at C<on_body>, to see if 297If you think you need this, first have a look at C<on_body>, to see if
277that doesn't solve your problem in a better way. 298that doesn't solve your problem in a better way.
278 299
279=back 300=back
280 301
281Example: make a simple HTTP GET request for http://www.nethype.de/ 302Example: do a simple HTTP GET request for http://www.nethype.de/ and print
303the response body.
282 304
283 http_request GET => "http://www.nethype.de/", sub { 305 http_request GET => "http://www.nethype.de/", sub {
284 my ($body, $hdr) = @_; 306 my ($body, $hdr) = @_;
285 print "$body\n"; 307 print "$body\n";
286 }; 308 };
287 309
288Example: make a HTTP HEAD request on https://www.google.com/, use a 310Example: do a HTTP HEAD request on https://www.google.com/, use a
289timeout of 30 seconds. 311timeout of 30 seconds.
290 312
291 http_request 313 http_request
292 GET => "https://www.google.com", 314 GET => "https://www.google.com",
293 timeout => 30, 315 timeout => 30,
296 use Data::Dumper; 318 use Data::Dumper;
297 print Dumper $hdr; 319 print Dumper $hdr;
298 } 320 }
299 ; 321 ;
300 322
301Example: make another simple HTTP GET request, but immediately try to 323Example: do another simple HTTP GET request, but immediately try to
302cancel it. 324cancel it.
303 325
304 my $request = http_request GET => "http://www.nethype.de/", sub { 326 my $request = http_request GET => "http://www.nethype.de/", sub {
305 my ($body, $hdr) = @_; 327 my ($body, $hdr) = @_;
306 print "$body\n"; 328 print "$body\n";
338 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 360 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
339 361
340 _slot_schedule $_[0]; 362 _slot_schedule $_[0];
341} 363}
342 364
365# continue to parse $_ for headers and place them into the arg
366sub parse_hdr() {
367 my %hdr;
368
369 # things seen, not parsed:
370 # p3pP="NON CUR OTPi OUR NOR UNI"
371
372 $hdr{lc $1} .= ",$2"
373 while /\G
374 ([^:\000-\037]*):
375 [\011\040]*
376 ((?: [^\012]+ | \012[\011\040] )*)
377 \012
378 /gxc;
379
380 /\G$/
381 or return;
382
383 # remove the "," prefix we added to all headers above
384 substr $_, 0, 1, ""
385 for values %hdr;
386
387 \%hdr
388}
389
343our $qr_nlnl = qr{(?<![^\012])\015?\012}; 390our $qr_nlnl = qr{(?<![^\012])\015?\012};
344 391
345our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 392our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
346our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 393our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
347 394
366 my @pseudo = (URL => $url); 413 my @pseudo = (URL => $url);
367 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect}; 414 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect};
368 415
369 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 416 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
370 417
371 return $cb->(undef, { Status => 599, Reason => "Too many redirections", @pseudo }) 418 return $cb->(undef, { @pseudo, Status => 599, Reason => "Too many redirections" })
372 if $recurse < 0; 419 if $recurse < 0;
373 420
374 my $proxy = $arg{proxy} || $PROXY; 421 my $proxy = $arg{proxy} || $PROXY;
375 my $timeout = $arg{timeout} || $TIMEOUT; 422 my $timeout = $arg{timeout} || $TIMEOUT;
376 423
379 426
380 $uscheme = lc $uscheme; 427 $uscheme = lc $uscheme;
381 428
382 my $uport = $uscheme eq "http" ? 80 429 my $uport = $uscheme eq "http" ? 80
383 : $uscheme eq "https" ? 443 430 : $uscheme eq "https" ? 443
384 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", @pseudo }); 431 : return $cb->(undef, { @pseudo, Status => 599, Reason => "Only http and https URL schemes supported" });
385 432
386 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 433 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
387 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", @pseudo }); 434 or return $cb->(undef, { @pseudo, Status => 599, Reason => "Unparsable URL" });
388 435
389 my $uhost = $1; 436 my $uhost = $1;
390 $uport = $2 if defined $2; 437 $uport = $2 if defined $2;
391 438
392 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost" 439 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
441 } else { 488 } else {
442 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath); 489 ($rhost, $rport, $rscheme, $rpath) = ($uhost, $uport, $uscheme, $upath);
443 } 490 }
444 491
445 # leave out fragment and query string, just a heuristic 492 # leave out fragment and query string, just a heuristic
446 $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer}; 493 $hdr{referer} = "$uscheme://$uauthority$upath" unless exists $hdr{referer};
447 $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"}; 494 $hdr{"user-agent"} = $USERAGENT unless exists $hdr{"user-agent"};
448 495
449 $hdr{"content-length"} = length $arg{body} 496 $hdr{"content-length"} = length $arg{body}
450 if length $arg{body} || $method ne "GET"; 497 if length $arg{body} || $method ne "GET";
451 498
499 $hdr{connection} = "close TE"; #1.1
500 $hdr{te} = "trailers" unless exists $hdr{te}; #1.1
501
452 my %state = (connect_guard => 1); 502 my %state = (connect_guard => 1);
453 503
454 _get_slot $uhost, sub { 504 _get_slot $uhost, sub {
455 $state{slot_guard} = shift; 505 $state{slot_guard} = shift;
456 506
457 return unless $state{connect_guard}; 507 return unless $state{connect_guard};
458 508
459 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub { 509 my $connect_cb = sub {
460 $state{fh} = shift 510 $state{fh} = shift
461 or do { 511 or do {
462 my $err = "$!"; 512 my $err = "$!";
463 %state = (); 513 %state = ();
464 return $cb->(undef, { Status => 599, Reason => $err, @pseudo }); 514 return $cb->(undef, { @pseudo, Status => 599, Reason => $err });
465 }; 515 };
466 516
467 pop; # free memory, save a tree 517 pop; # free memory, save a tree
468 518
469 return unless delete $state{connect_guard}; 519 return unless delete $state{connect_guard};
475 tls_ctx => $arg{tls_ctx}, 525 tls_ctx => $arg{tls_ctx},
476 # these need to be reconfigured on keepalive handles 526 # these need to be reconfigured on keepalive handles
477 timeout => $timeout, 527 timeout => $timeout,
478 on_error => sub { 528 on_error => sub {
479 %state = (); 529 %state = ();
480 $cb->(undef, { Status => 599, Reason => $_[2], @pseudo }); 530 $cb->(undef, { @pseudo, Status => 599, Reason => $_[2] });
481 }, 531 },
482 on_eof => sub { 532 on_eof => sub {
483 %state = (); 533 %state = ();
484 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", @pseudo }); 534 $cb->(undef, { @pseudo, Status => 599, Reason => "Unexpected end-of-file" });
485 }, 535 },
486 ; 536 ;
487 537
488 # limit the number of persistent connections 538 # limit the number of persistent connections
489 # keepalive not yet supported 539 # keepalive not yet supported
491# ++$KA_COUNT{$_[1]}; 541# ++$KA_COUNT{$_[1]};
492# $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 542# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
493# --$KA_COUNT{$_[1]} 543# --$KA_COUNT{$_[1]}
494# }; 544# };
495# $hdr{connection} = "keep-alive"; 545# $hdr{connection} = "keep-alive";
496# } else {
497 delete $hdr{connection};
498# } 546# }
499 547
500 $state{handle}->starttls ("connect") if $rscheme eq "https"; 548 $state{handle}->starttls ("connect") if $rscheme eq "https";
501 549
502 # handle actual, non-tunneled, request 550 # handle actual, non-tunneled, request
503 my $handle_actual_request = sub { 551 my $handle_actual_request = sub {
504 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls}; 552 $state{handle}->starttls ("connect") if $uscheme eq "https" && !exists $state{handle}{tls};
505 553
506 # send request 554 # send request
507 $state{handle}->push_write ( 555 $state{handle}->push_write (
508 "$method $rpath HTTP/1.0\015\012" 556 "$method $rpath HTTP/1.1\015\012"
509 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 557 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
510 . "\015\012" 558 . "\015\012"
511 . (delete $arg{body}) 559 . (delete $arg{body})
512 ); 560 );
513 561
515 return unless %state; 563 return unless %state;
516 564
517 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use 565 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
518 566
519 # status line and headers 567 # status line and headers
520 $state{handle}->push_read (line => $qr_nlnl, sub { 568 $state{read_response} = sub {
521 for ("$_[1]") { 569 for ("$_[1]") {
522 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 570 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
523 571
524 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )? \015?\012/igxc 572 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\012]*) )? \012/igxc
525 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response", @pseudo })); 573 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid server response" }));
574
575 # 100 Continue handling
576 # should not happen as we don't send expect: 100-continue,
577 # but we handle it just in case.
578 # since we send the request body regardless, if we get an error
579 # we are out of-sync, which we currently do NOT handle correctly.
580 return $state{handle}->push_read (line => $qr_nlnl, $state{read_response})
581 if $2 eq 100;
526 582
527 push @pseudo, 583 push @pseudo,
528 HTTPVersion => $1, 584 HTTPVersion => $1,
529 Status => $2, 585 Status => $2,
530 Reason => $3, 586 Reason => $3,
531 ; 587 ;
532 588
533 # things seen, not parsed: 589 my $hdr = parse_hdr
534 # p3pP="NON CUR OTPi OUR NOR UNI"
535
536 $hdr{lc $1} .= ",$2"
537 while /\G
538 ([^:\000-\037]*):
539 [\011\040]*
540 ((?: [^\012]+ | \012[\011\040] )*)
541 \012
542 /gxc;
543
544 /\G$/
545 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", @pseudo })); 590 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Garbled response headers" }));
591
592 %hdr = (%$hdr, @pseudo);
546 } 593 }
547
548 # remove the "," prefix we added to all headers above
549 substr $_, 0, 1, ""
550 for values %hdr;
551
552 # patch in all pseudo headers
553 %hdr = (%hdr, @pseudo);
554 594
555 # redirect handling 595 # redirect handling
556 # microsoft and other shitheads don't give a shit for following standards, 596 # microsoft and other shitheads don't give a shit for following standards,
557 # try to support some common forms of broken Location headers. 597 # try to support some common forms of broken Location headers.
558 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) { 598 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
585 } elsif ($status == 307) { 625 } elsif ($status == 307) {
586 $redirect = 1; 626 $redirect = 1;
587 } 627 }
588 } 628 }
589 629
590 my $finish = sub { 630 my $finish = sub { # ($data, $err_status, $err_reason[, $keepalive])
631 my $keepalive = pop;
632
591 $state{handle}->destroy if $state{handle}; 633 $state{handle}->destroy if $state{handle};
592 %state = (); 634 %state = ();
593 635
636 if (defined $_[1]) {
637 $hdr{OrigStatus} = $hdr{Status}; $hdr{Status} = $_[1];
638 $hdr{OrigReason} = $hdr{Reason}; $hdr{Reason} = $_[2];
639 }
640
594 # set-cookie processing 641 # set-cookie processing
595 if ($arg{cookie_jar}) { 642 if ($arg{cookie_jar}) {
596 for ($_[1]{"set-cookie"}) { 643 for ($hdr{"set-cookie"}) {
597 # parse NAME=VALUE 644 # parse NAME=VALUE
598 my @kv; 645 my @kv;
599 646
600 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) { 647 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
601 my $name = $1; 648 my $name = $1;
647 # we also access %hdr, as $_[1] might be an erro 694 # we also access %hdr, as $_[1] might be an erro
648 http_request ( 695 http_request (
649 $method => $hdr{location}, 696 $method => $hdr{location},
650 %arg, 697 %arg,
651 recurse => $recurse - 1, 698 recurse => $recurse - 1,
652 Redirect => \@_, 699 Redirect => [$_[0], \%hdr],
653 $cb); 700 $cb);
654 } else { 701 } else {
655 $cb->($_[0], $_[1]); 702 $cb->($_[0], \%hdr);
656 } 703 }
657 }; 704 };
658 705
659 my $len = $hdr{"content-length"}; 706 my $len = $hdr{"content-length"};
660 707
661 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) { 708 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
662 $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", @pseudo }); 709 $finish->(undef, 598 => "Request cancelled by on_header");
663 } elsif ( 710 } elsif (
664 $hdr{Status} =~ /^(?:1..|[23]04)$/ 711 $hdr{Status} =~ /^(?:1..|204|205|304)$/
665 or $method eq "HEAD" 712 or $method eq "HEAD"
666 or (defined $len && !$len) 713 or (defined $len && !$len)
667 ) { 714 ) {
668 # no body 715 # no body
669 $finish->("", \%hdr); 716 $finish->("", undef, undef, 1);
670 } else { 717 } else {
671 # body handling, four different code paths 718 # body handling, many different code paths
672 # for want_body_handle, on_body (2x), normal (2x) 719 # - no body expected
673 # we might read too much here, but it does not matter yet (no pers. connections) 720 # - want_body_handle
721 # - te chunked
722 # - 2x length known (with or without on_body)
723 # - 2x length not known (with or without on_body)
674 if (!$redirect && $arg{want_body_handle}) { 724 if (!$redirect && $arg{want_body_handle}) {
675 $_[0]->on_eof (undef); 725 $_[0]->on_eof (undef);
676 $_[0]->on_error (undef); 726 $_[0]->on_error (undef);
677 $_[0]->on_read (undef); 727 $_[0]->on_read (undef);
678 728
679 $finish->(delete $state{handle}, \%hdr); 729 $finish->(delete $state{handle});
730
731 } elsif ($hdr{"transfer-encoding"} =~ /\bchunked\b/i) {
732 my $cl = 0;
733 my $body = undef;
734 my $on_body = $arg{on_body} || sub { $body .= shift; 1 };
735
736 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
737
738 my $read_chunk; $read_chunk = sub {
739 $_[1] =~ /^([0-9a-fA-F]+)/
740 or $finish->(undef, 599 => "Garbled chunked transfer encoding");
741
742 my $len = hex $1;
743
744 if ($len) {
745 $cl += $len;
746
747 $_[0]->push_read (chunk => $len, sub {
748 $on_body->($_[1], \%hdr)
749 or return $finish->(undef, 598 => "Request cancelled by on_body");
750
751 $_[0]->push_read (line => sub {
752 length $_[1]
753 and return $finish->(undef, 599 => "Garbled chunked transfer encoding");
754 $_[0]->push_read (line => $read_chunk);
755 });
756 });
757 } else {
758 $hdr{"content-length"} ||= $cl;
759
760 $_[0]->push_read (line => $qr_nlnl, sub {
761 if (length $_[1]) {
762 for ("$_[1]") {
763 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
764
765 my $hdr = parse_hdr
766 or return $finish->(undef, 599 => "Garbled response trailers");
767
768 %hdr = (%hdr, %$hdr);
769 }
770 }
771
772 $finish->($body, undef, undef, 1);
773 });
774 }
775 };
776
777 $_[0]->push_read (line => $read_chunk);
680 778
681 } elsif ($arg{on_body}) { 779 } elsif ($arg{on_body}) {
682 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) }); 780 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
781
683 if ($len) { 782 if ($len) {
684 $_[0]->on_eof (undef);
685 $_[0]->on_read (sub { 783 $_[0]->on_read (sub {
686 $len -= length $_[0]{rbuf}; 784 $len -= length $_[0]{rbuf};
687 785
688 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 786 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
689 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo }); 787 or return $finish->(undef, 598 => "Request cancelled by on_body");
690 788
691 $len > 0 789 $len > 0
692 or $finish->("", \%hdr); 790 or $finish->("", undef, undef, 1);
693 }); 791 });
694 } else { 792 } else {
695 $_[0]->on_eof (sub { 793 $_[0]->on_eof (sub {
696 $finish->("", \%hdr); 794 $finish->("");
697 }); 795 });
698 $_[0]->on_read (sub { 796 $_[0]->on_read (sub {
699 $arg{on_body}(delete $_[0]{rbuf}, \%hdr) 797 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
700 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo }); 798 or $finish->(undef, 598 => "Request cancelled by on_body");
701 }); 799 });
702 } 800 }
703 } else { 801 } else {
704 $_[0]->on_eof (undef); 802 $_[0]->on_eof (undef);
705 803
706 if ($len) { 804 if ($len) {
707 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) }); 805 $_[0]->on_error (sub { $finish->(undef, 599 => $_[2]) });
708 $_[0]->on_read (sub { 806 $_[0]->on_read (sub {
709 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr) 807 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), undef, undef, 1)
710 if $len <= length $_[0]{rbuf}; 808 if $len <= length $_[0]{rbuf};
711 }); 809 });
712 } else { 810 } else {
713 $_[0]->on_error (sub { 811 $_[0]->on_error (sub {
714 ($! == Errno::EPIPE || !$!) 812 ($! == Errno::EPIPE || !$!)
715 ? $finish->(delete $_[0]{rbuf}, \%hdr) 813 ? $finish->(delete $_[0]{rbuf})
716 : $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }); 814 : $finish->(undef, 599 => $_[2]);
717 }); 815 });
718 $_[0]->on_read (sub { }); 816 $_[0]->on_read (sub { });
719 } 817 }
720 } 818 }
721 } 819 }
722 }); 820 };
821
822 $state{handle}->push_read (line => $qr_nlnl, $state{read_response});
723 }; 823 };
724 824
725 # now handle proxy-CONNECT method 825 # now handle proxy-CONNECT method
726 if ($proxy && $uscheme eq "https") { 826 if ($proxy && $uscheme eq "https") {
727 # oh dear, we have to wrap it into a connect request 827 # oh dear, we have to wrap it into a connect request
728 828
729 # maybe re-use $uauthority with patched port? 829 # maybe re-use $uauthority with patched port?
730 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 830 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
731 $state{handle}->push_read (line => $qr_nlnl, sub { 831 $state{handle}->push_read (line => $qr_nlnl, sub {
732 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 832 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
733 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", @pseudo })); 833 or return (%state = (), $cb->(undef, { @pseudo, Status => 599, Reason => "Invalid proxy connect response ($_[1])" }));
734 834
735 if ($2 == 200) { 835 if ($2 == 200) {
736 $rpath = $upath; 836 $rpath = $upath;
737 &$handle_actual_request; 837 &$handle_actual_request;
738 } else { 838 } else {
739 %state = (); 839 %state = ();
740 $cb->(undef, { Status => $2, Reason => $3, @pseudo }); 840 $cb->(undef, { @pseudo, Status => $2, Reason => $3 });
741 } 841 }
742 }); 842 });
743 } else { 843 } else {
744 &$handle_actual_request; 844 &$handle_actual_request;
745 } 845 }
846 };
746 847
747 }, $arg{on_prepare} || sub { $timeout }; 848 my $tcp_connect = $arg{tcp_connect}
849 || do { require AnyEvent::Socket; \&AnyEvent::Socket::tcp_connect };
850
851 $state{connect_guard} = $tcp_connect->($rhost, $rport, $connect_cb, $arg{on_prepare} || sub { $timeout });
852
748 }; 853 };
749 854
750 defined wantarray && AnyEvent::Util::guard { %state = () } 855 defined wantarray && AnyEvent::Util::guard { %state = () }
751} 856}
752 857
787string of the form C<http://host:port> (optionally C<https:...>), croaks 892string of the form C<http://host:port> (optionally C<https:...>), croaks
788otherwise. 893otherwise.
789 894
790To clear an already-set proxy, use C<undef>. 895To clear an already-set proxy, use C<undef>.
791 896
897=item $date = AnyEvent::HTTP::format_date $timestamp
898
899Takes a POSIX timestamp (seconds since the epoch) and formats it as a HTTP
900Date (RFC 2616).
901
902=item $timestamp = AnyEvent::HTTP::parse_date $date
903
904Takes a HTTP Date (RFC 2616) and returns the corresponding POSIX
905timestamp, or C<undef> if the date cannot be parsed.
906
792=item $AnyEvent::HTTP::MAX_RECURSE 907=item $AnyEvent::HTTP::MAX_RECURSE
793 908
794The default value for the C<recurse> request parameter (default: C<10>). 909The default value for the C<recurse> request parameter (default: C<10>).
795 910
796=item $AnyEvent::HTTP::USERAGENT 911=item $AnyEvent::HTTP::USERAGENT
814connections. This number of can be useful for load-leveling. 929connections. This number of can be useful for load-leveling.
815 930
816=back 931=back
817 932
818=cut 933=cut
934
935our @month = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
936our @weekday = qw(Sun Mon Tue Wed Thu Fri Sat);
937
938sub format_date($) {
939 my ($time) = @_;
940
941 # RFC 822/1123 format
942 my ($S, $M, $H, $mday, $mon, $year, $wday, $yday, undef) = gmtime $time;
943
944 sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
945 $weekday[$wday], $mday, $month[$mon], $year + 1900,
946 $H, $M, $S;
947}
948
949sub parse_date($) {
950 my ($date) = @_;
951
952 my ($d, $m, $y, $H, $M, $S);
953
954 if ($date =~ /^[A-Z][a-z][a-z], ([0-9][0-9]) ([A-Z][a-z][a-z]) ([0-9][0-9][0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
955 # RFC 822/1123, required by RFC 2616
956 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
957
958 } elsif ($date =~ /^[A-Z][a-z]+, ([0-9][0-9])-([A-Z][a-z][a-z])-([0-9][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) GMT$/) {
959 # RFC 850
960 ($d, $m, $y, $H, $M, $S) = ($1, $2, $3 < 69 ? $3 + 2000 : $3 + 1900, $4, $5, $6);
961
962 } elsif ($date =~ /^[A-Z][a-z][a-z] ([A-Z][a-z][a-z]) ([0-9 ][0-9]) ([0-9][0-9]):([0-9][0-9]):([0-9][0-9]) ([0-9][0-9][0-9][0-9])$/) {
963 # ISO C's asctime
964 ($d, $m, $y, $H, $M, $S) = ($2, $1, $6, $3, $4, $5);
965 }
966 # other formats fail in the loop below
967
968 for (0..11) {
969 if ($m eq $month[$_]) {
970 require Time::Local;
971 return Time::Local::timegm ($S, $M, $H, $d, $_, $y);
972 }
973 }
974
975 undef
976}
819 977
820sub set_proxy($) { 978sub set_proxy($) {
821 if (length $_[0]) { 979 if (length $_[0]) {
822 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix 980 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
823 or Carp::croak "$_[0]: invalid proxy URL"; 981 or Carp::croak "$_[0]: invalid proxy URL";
830# initialise proxy from environment 988# initialise proxy from environment
831eval { 989eval {
832 set_proxy $ENV{http_proxy}; 990 set_proxy $ENV{http_proxy};
833}; 991};
834 992
993=head2 SOCKS PROXIES
994
995Socks proxies are not directly supported by AnyEvent::HTTP. You can
996compile your perl to support socks, or use an external program such as
997F<socksify> (dante) or F<tsocks> to make your program use a socks proxy
998transparently.
999
1000Alternatively, for AnyEvent::HTTP only, you can use your own
1001C<tcp_connect> function that does the proxy handshake - here is an example
1002that works with socks4a proxies:
1003
1004 use Errno;
1005 use AnyEvent::Util;
1006 use AnyEvent::Socket;
1007 use AnyEvent::Handle;
1008
1009 # host, port and username of/for your socks4a proxy
1010 my $socks_host = "10.0.0.23";
1011 my $socks_port = 9050;
1012 my $socks_user = "";
1013
1014 sub socks4a_connect {
1015 my ($host, $port, $connect_cb, $prepare_cb) = @_;
1016
1017 my $hdl = new AnyEvent::Handle
1018 connect => [$socks_host, $socks_port],
1019 on_prepare => sub { $prepare_cb->($_[0]{fh}) },
1020 on_error => sub { $connect_cb->() },
1021 ;
1022
1023 $hdl->push_write (pack "CCnNZ*Z*", 4, 1, $port, 1, $socks_user, $host);
1024
1025 $hdl->push_read (chunk => 8, sub {
1026 my ($hdl, $chunk) = @_;
1027 my ($status, $port, $ipn) = unpack "xCna4", $chunk;
1028
1029 if ($status == 0x5a) {
1030 $connect_cb->($hdl->{fh}, (format_address $ipn) . ":$port");
1031 } else {
1032 $! = Errno::ENXIO; $connect_cb->();
1033 }
1034 });
1035
1036 $hdl
1037 }
1038
1039Use C<socks4a_connect> instead of C<tcp_connect> when doing C<http_request>s,
1040possibly after switching off other proxy types:
1041
1042 AnyEvent::HTTP::set_proxy undef; # usually you do not want other proxies
1043
1044 http_get 'http://www.google.com', tcp_connect => \&socks4a_connect, sub {
1045 my ($data, $headers) = @_;
1046 ...
1047 };
1048
835=head1 SEE ALSO 1049=head1 SEE ALSO
836 1050
837L<AnyEvent>. 1051L<AnyEvent>.
838 1052
839=head1 AUTHOR 1053=head1 AUTHOR

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines