ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-HTTP/HTTP.pm
(Generate patch)

Comparing AnyEvent-HTTP/HTTP.pm (file contents):
Revision 1.49 by root, Wed Aug 5 16:23:48 2009 UTC vs.
Revision 1.58 by root, Sun Nov 14 20:23:00 2010 UTC

41use strict; 41use strict;
42no warnings; 42no warnings;
43 43
44use Errno (); 44use Errno ();
45 45
46use AnyEvent 4.8 (); 46use AnyEvent 5.0 ();
47use AnyEvent::Util (); 47use AnyEvent::Util ();
48use AnyEvent::Socket (); 48use AnyEvent::Socket ();
49use AnyEvent::Handle (); 49use AnyEvent::Handle ();
50 50
51use base Exporter::; 51use base Exporter::;
52 52
53our $VERSION = '1.42'; 53our $VERSION = '1.46';
54 54
55our @EXPORT = qw(http_get http_post http_head http_request); 55our @EXPORT = qw(http_get http_post http_head http_request);
56 56
57our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)"; 57our $USERAGENT = "Mozilla/5.0 (compatible; U; AnyEvent-HTTP/$VERSION; +http://software.schmorp.de/pkg/AnyEvent)";
58our $MAX_RECURSE = 10; 58our $MAX_RECURSE = 10;
92must be an absolute http or https URL. 92must be an absolute http or https URL.
93 93
94When called in void context, nothing is returned. In other contexts, 94When called in void context, nothing is returned. In other contexts,
95C<http_request> returns a "cancellation guard" - you have to keep the 95C<http_request> returns a "cancellation guard" - you have to keep the
96object at least alive until the callback get called. If the object gets 96object at least alive until the callback get called. If the object gets
97destroyed before the callbakc is called, the request will be cancelled. 97destroyed before the callback is called, the request will be cancelled.
98 98
99The callback will be called with the response body data as first argument 99The callback will be called with the response body data as first argument
100(or C<undef> if an error occured), and a hash-ref with response headers as 100(or C<undef> if an error occured), and a hash-ref with response headers as
101second argument. 101second argument.
102 102
103All the headers in that hash are lowercased. In addition to the response 103All the headers in that hash are lowercased. In addition to the response
104headers, the "pseudo-headers" C<HTTPVersion>, C<Status> and C<Reason> 104headers, the "pseudo-headers" (uppercase to avoid clashing with possible
105response headers) C<HTTPVersion>, C<Status> and C<Reason> contain the
105contain the three parts of the HTTP Status-Line of the same name. The 106three parts of the HTTP Status-Line of the same name.
107
106pseudo-header C<URL> contains the original URL (which can differ from the 108The pseudo-header C<URL> contains the actual URL (which can differ from
107requested URL when following redirects). 109the requested URL when following redirects - for example, you might get
110an error that your URL scheme is not supported even though your URL is a
111valid http URL because it redirected to an ftp URL, in which case you can
112look at the URL pseudo header).
113
114The pseudo-header C<Redirect> only exists when the request was a result
115of an internal redirect. In that case it is an array reference with
116the C<($data, $headers)> from the redirect response. Note that this
117response could in turn be the result of a redirect itself, and C<<
118$headers->{Redirect}[1]{Redirect} >> will then contain the original
119response, and so on.
108 120
109If the server sends a header multiple times, then their contents will be 121If the server sends a header multiple times, then their contents will be
110joined together with a comma (C<,>), as per the HTTP spec. 122joined together with a comma (C<,>), as per the HTTP spec.
111 123
112If an internal error occurs, such as not being able to resolve a hostname, 124If an internal error occurs, such as not being able to resolve a hostname,
145sent at all). 157sent at all).
146 158
147=item timeout => $seconds 159=item timeout => $seconds
148 160
149The time-out to use for various stages - each connect attempt will reset 161The time-out to use for various stages - each connect attempt will reset
150the timeout, as will read or write activity. Default timeout is 5 minutes. 162the timeout, as will read or write activity, i.e. this is not an overall
163timeout.
164
165Default timeout is 5 minutes.
151 166
152=item proxy => [$host, $port[, $scheme]] or undef 167=item proxy => [$host, $port[, $scheme]] or undef
153 168
154Use the given http proxy for all requests. If not specified, then the 169Use the given http proxy for all requests. If not specified, then the
155default proxy (as specified by C<$ENV{http_proxy}>) is used. 170default proxy (as specified by C<$ENV{http_proxy}>) is used.
187verification, highest compatibility) and high-security (CA and common-name 202verification, highest compatibility) and high-security (CA and common-name
188verification) TLS context. 203verification) TLS context.
189 204
190The default for this option is C<low>, which could be interpreted as "give 205The default for this option is C<low>, which could be interpreted as "give
191me the page, no matter what". 206me the page, no matter what".
207
208=item on_prepare => $callback->($fh)
209
210In rare cases you need to "tune" the socket before it is used to
211connect (for exmaple, to bind it on a given IP address). This parameter
212overrides the prepare callback passed to C<AnyEvent::Socket::tcp_connect>
213and behaves exactly the same way (e.g. it has to provide a
214timeout). See the description for the C<$prepare_cb> argument of
215C<AnyEvent::Socket::tcp_connect> for details.
192 216
193=item on_header => $callback->($headers) 217=item on_header => $callback->($headers)
194 218
195When specified, this callback will be called with the header hash as soon 219When specified, this callback will be called with the header hash as soon
196as headers have been successfully received from the remote server (not on 220as headers have been successfully received from the remote server (not on
314 push @{ $CO_SLOT{$_[0]}[1] }, $_[1]; 338 push @{ $CO_SLOT{$_[0]}[1] }, $_[1];
315 339
316 _slot_schedule $_[0]; 340 _slot_schedule $_[0];
317} 341}
318 342
319our $qr_nl = qr{\015?\012};
320our $qr_nlnl = qr{(?<![^\012])\015?\012}; 343our $qr_nlnl = qr{(?<![^\012])\015?\012};
321 344
322our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 }; 345our $TLS_CTX_LOW = { cache => 1, sslv2 => 1 };
323our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" }; 346our $TLS_CTX_HIGH = { cache => 1, verify => 1, verify_peername => "https" };
324 347
337 while (my ($k, $v) = each %$hdr) { 360 while (my ($k, $v) = each %$hdr) {
338 $hdr{lc $k} = $v; 361 $hdr{lc $k} = $v;
339 } 362 }
340 } 363 }
341 364
365 # pseudo headers for all subsequent responses
366 my @pseudo = (URL => $url);
367 push @pseudo, Redirect => delete $arg{Redirect} if exists $arg{Redirect};
368
342 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE; 369 my $recurse = exists $arg{recurse} ? delete $arg{recurse} : $MAX_RECURSE;
343 370
344 return $cb->(undef, { Status => 599, Reason => "Too many redirections", URL => $url }) 371 return $cb->(undef, { Status => 599, Reason => "Too many redirections", @pseudo })
345 if $recurse < 0; 372 if $recurse < 0;
346 373
347 my $proxy = $arg{proxy} || $PROXY; 374 my $proxy = $arg{proxy} || $PROXY;
348 my $timeout = $arg{timeout} || $TIMEOUT; 375 my $timeout = $arg{timeout} || $TIMEOUT;
349 376
350 my ($uscheme, $uauthority, $upath, $query, $fragment) = 377 my ($uscheme, $uauthority, $upath, $query, $fragment) =
351 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|; 378 $url =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:(\?[^#]*))?(?:#(.*))?|;
352 379
353 $uscheme = lc $uscheme; 380 $uscheme = lc $uscheme;
354 381
355 my $uport = $uscheme eq "http" ? 80 382 my $uport = $uscheme eq "http" ? 80
356 : $uscheme eq "https" ? 443 383 : $uscheme eq "https" ? 443
357 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", URL => $url }); 384 : return $cb->(undef, { Status => 599, Reason => "Only http and https URL schemes supported", @pseudo });
358 385
359 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x 386 $uauthority =~ /^(?: .*\@ )? ([^\@:]+) (?: : (\d+) )?$/x
360 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", URL => $url }); 387 or return $cb->(undef, { Status => 599, Reason => "Unparsable URL", @pseudo });
361 388
362 my $uhost = $1; 389 my $uhost = $1;
363 $uport = $2 if defined $2; 390 $uport = $2 if defined $2;
364 391
365 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"; 392 $hdr{host} = defined $2 ? "$uhost:$2" : "$uhost"
393 unless exists $hdr{host};
366 394
367 $uhost =~ s/^\[(.*)\]$/$1/; 395 $uhost =~ s/^\[(.*)\]$/$1/;
368 $upath .= "?$query" if length $query; 396 $upath .= $query if length $query;
369 397
370 $upath =~ s%^/?%/%; 398 $upath =~ s%^/?%/%;
371 399
372 # cookie processing 400 # cookie processing
373 if (my $jar = $arg{cookie_jar}) { 401 if (my $jar = $arg{cookie_jar}) {
416 444
417 # leave out fragment and query string, just a heuristic 445 # leave out fragment and query string, just a heuristic
418 $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer}; 446 $hdr{referer} ||= "$uscheme://$uauthority$upath" unless exists $hdr{referer};
419 $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"}; 447 $hdr{"user-agent"} ||= $USERAGENT unless exists $hdr{"user-agent"};
420 448
421 $hdr{"content-length"} = length $arg{body}; 449 $hdr{"content-length"} = length $arg{body}
450 if length $arg{body} || $method ne "GET";
422 451
423 my %state = (connect_guard => 1); 452 my %state = (connect_guard => 1);
424 453
425 _get_slot $uhost, sub { 454 _get_slot $uhost, sub {
426 $state{slot_guard} = shift; 455 $state{slot_guard} = shift;
430 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub { 459 $state{connect_guard} = AnyEvent::Socket::tcp_connect $rhost, $rport, sub {
431 $state{fh} = shift 460 $state{fh} = shift
432 or do { 461 or do {
433 my $err = "$!"; 462 my $err = "$!";
434 %state = (); 463 %state = ();
435 return $cb->(undef, { Status => 599, Reason => $err, URL => $url }); 464 return $cb->(undef, { Status => 599, Reason => $err, @pseudo });
436 }; 465 };
437 466
438 pop; # free memory, save a tree 467 pop; # free memory, save a tree
439 468
440 return unless delete $state{connect_guard}; 469 return unless delete $state{connect_guard};
441 470
442 # get handle 471 # get handle
443 $state{handle} = new AnyEvent::Handle 472 $state{handle} = new AnyEvent::Handle
444 fh => $state{fh}, 473 fh => $state{fh},
474 peername => $rhost,
475 tls_ctx => $arg{tls_ctx},
476 # these need to be reconfigured on keepalive handles
445 timeout => $timeout, 477 timeout => $timeout,
446 peername => $rhost, 478 on_error => sub {
447 tls_ctx => $arg{tls_ctx}; 479 %state = ();
480 $cb->(undef, { Status => 599, Reason => $_[2], @pseudo });
481 },
482 on_eof => sub {
483 %state = ();
484 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", @pseudo });
485 },
486 ;
448 487
449 # limit the number of persistent connections 488 # limit the number of persistent connections
450 # keepalive not yet supported 489 # keepalive not yet supported
451 if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) { 490# if ($KA_COUNT{$_[1]} < $MAX_PERSISTENT_PER_HOST) {
452 ++$KA_COUNT{$_[1]}; 491# ++$KA_COUNT{$_[1]};
453 $state{handle}{ka_count_guard} = AnyEvent::Util::guard { 492# $state{handle}{ka_count_guard} = AnyEvent::Util::guard {
454 --$KA_COUNT{$_[1]} 493# --$KA_COUNT{$_[1]}
455 }; 494# };
456 $hdr{connection} = "keep-alive"; 495# $hdr{connection} = "keep-alive";
457 } else { 496# } else {
458 delete $hdr{connection}; 497 delete $hdr{connection};
459 } 498# }
460
461 # (re-)configure handle
462 $state{handle}->on_error (sub {
463 %state = ();
464 $cb->(undef, { Status => 599, Reason => $_[2], URL => $url });
465 });
466 $state{handle}->on_eof (sub {
467 %state = ();
468 $cb->(undef, { Status => 599, Reason => "Unexpected end-of-file", URL => $url });
469 });
470 499
471 $state{handle}->starttls ("connect") if $rscheme eq "https"; 500 $state{handle}->starttls ("connect") if $rscheme eq "https";
472 501
473 # handle actual, non-tunneled, request 502 # handle actual, non-tunneled, request
474 my $handle_actual_request = sub { 503 my $handle_actual_request = sub {
480 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr) 509 . (join "", map "\u$_: $hdr{$_}\015\012", grep defined $hdr{$_}, keys %hdr)
481 . "\015\012" 510 . "\015\012"
482 . (delete $arg{body}) 511 . (delete $arg{body})
483 ); 512 );
484 513
514 # return if error occured during push_write()
515 return unless %state;
516
485 %hdr = (); # reduce memory usage, save a kitten 517 %hdr = (); # reduce memory usage, save a kitten, also make it possible to re-use
486 518
487 # status line 519 # status line and headers
488 $state{handle}->push_read (line => $qr_nl, sub { 520 $state{handle}->push_read (line => $qr_nlnl, sub {
489 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
490 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response ($_[1])", URL => $url }));
491
492 my %hdr = ( # response headers
493 HTTPVersion => ",$1",
494 Status => ",$2",
495 Reason => ",$3",
496 URL => ",$url"
497 );
498
499 # headers, could be optimized a bit
500 $state{handle}->unshift_read (line => $qr_nlnl, sub {
501 for ("$_[1]") { 521 for ("$_[1]") {
502 y/\015//d; # weed out any \015, as they show up in the weirdest of places. 522 y/\015//d; # weed out any \015, as they show up in the weirdest of places.
503 523
524 /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )? \015?\012/igxc
525 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid server response", @pseudo }));
526
527 push @pseudo,
528 HTTPVersion => $1,
529 Status => $2,
530 Reason => $3,
531 ;
532
504 # things seen, not parsed: 533 # things seen, not parsed:
505 # p3pP="NON CUR OTPi OUR NOR UNI" 534 # p3pP="NON CUR OTPi OUR NOR UNI"
506 535
507 $hdr{lc $1} .= ",$2" 536 $hdr{lc $1} .= ",$2"
508 while /\G 537 while /\G
509 ([^:\000-\037]*): 538 ([^:\000-\037]*):
510 [\011\040]* 539 [\011\040]*
511 ((?: [^\012]+ | \012[\011\040] )*) 540 ((?: [^\012]+ | \012[\011\040] )*)
512 \012 541 \012
513 /gxc; 542 /gxc;
514 543
515 /\G$/ 544 /\G$/
516 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", URL => $url })); 545 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Garbled response headers", @pseudo }));
546 }
547
548 # remove the "," prefix we added to all headers above
549 substr $_, 0, 1, ""
550 for values %hdr;
551
552 # patch in all pseudo headers
553 %hdr = (%hdr, @pseudo);
554
555 # redirect handling
556 # microsoft and other shitheads don't give a shit for following standards,
557 # try to support some common forms of broken Location headers.
558 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
559 $hdr{location} =~ s/^\.\/+//;
560
561 my $url = "$rscheme://$uhost:$uport";
562
563 unless ($hdr{location} =~ s/^\///) {
564 $url .= $upath;
565 $url =~ s/\/[^\/]*$//;
517 } 566 }
518 567
519 substr $_, 0, 1, ""
520 for values %hdr;
521
522 # redirect handling
523 # microsoft and other shitheads don't give a shit for following standards,
524 # try to support some common forms of broken Location headers.
525 if ($hdr{location} !~ /^(?: $ | [^:\/?\#]+ : )/x) {
526 $hdr{location} =~ s/^\.\/+//;
527
528 my $url = "$rscheme://$uhost:$uport";
529
530 unless ($hdr{location} =~ s/^\///) {
531 $url .= $upath;
532 $url =~ s/\/[^\/]*$//;
533 }
534
535 $hdr{location} = "$url/$hdr{location}"; 568 $hdr{location} = "$url/$hdr{location}";
569 }
570
571 my $redirect;
572
573 if ($recurse) {
574 my $status = $hdr{Status};
575
576 # industry standard is to redirect POST as GET for
577 # 301, 302 and 303, in contrast to http/1.0 and 1.1.
578 # also, the UA should ask the user for 301 and 307 and POST,
579 # industry standard seems to be to simply follow.
580 # we go with the industry standard.
581 if ($status == 301 or $status == 302 or $status == 303) {
582 # HTTP/1.1 is unclear on how to mutate the method
583 $method = "GET" unless $method eq "HEAD";
584 $redirect = 1;
585 } elsif ($status == 307) {
586 $redirect = 1;
536 } 587 }
588 }
537 589
538 my $redirect; 590 my $finish = sub {
591 $state{handle}->destroy if $state{handle};
592 %state = ();
539 593
594 # set-cookie processing
540 if ($recurse) { 595 if ($arg{cookie_jar}) {
541 if ($hdr{Status} =~ /^30[12]$/ && $method ne "POST") { 596 for ($_[1]{"set-cookie"}) {
542 # apparently, mozilla et al. just change POST to GET here 597 # parse NAME=VALUE
543 # more research is needed before we do the same 598 my @kv;
599
600 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
544 $redirect = 1; 601 my $name = $1;
545 } elsif ($hdr{Status} == 303) { 602 my $value = $3;
546 # even http/1.1 is unclear on how to mutate the method 603
547 $method = "GET" unless $method eq "HEAD"; 604 unless ($value) {
605 $value = $2;
606 $value =~ s/\\(.)/$1/gs;
607 }
608
609 push @kv, $name => $value;
610
611 last unless /\G\s*;/gc;
612 }
613
614 last unless @kv;
615
616 my $name = shift @kv;
617 my %kv = (value => shift @kv, @kv);
618
619 my $cdom;
620 my $cpath = (delete $kv{path}) || "/";
621
622 if (exists $kv{domain}) {
623 $cdom = delete $kv{domain};
624
625 $cdom =~ s/^\.?/./; # make sure it starts with a "."
626
627 next if $cdom =~ /\.$/;
628
629 # this is not rfc-like and not netscape-like. go figure.
630 my $ndots = $cdom =~ y/.//;
631 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
632 } else {
633 $cdom = $uhost;
634 }
635
548 $redirect = 1; 636 # store it
549 } elsif ($hdr{Status} == 307 && $method =~ /^(?:GET|HEAD)$/) { 637 $arg{cookie_jar}{version} = 1;
550 $redirect = 1; 638 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
639
640 redo if /\G\s*,/gc;
551 } 641 }
552 } 642 }
553 643
554 my $finish = sub { 644 if ($redirect && exists $hdr{location}) {
555 $state{handle}->destroy if $state{handle}; 645 # we ignore any errors, as it is very common to receive
556 %state = (); 646 # Content-Length != 0 but no actual body
557 647 # we also access %hdr, as $_[1] might be an erro
558 # set-cookie processing 648 http_request (
559 if ($arg{cookie_jar}) { 649 $method => $hdr{location},
560 for ($_[1]{"set-cookie"}) {
561 # parse NAME=VALUE
562 my @kv;
563
564 while (/\G\s* ([^=;,[:space:]]+) \s*=\s* (?: "((?:[^\\"]+|\\.)*)" | ([^=;,[:space:]]*) )/gcxs) {
565 my $name = $1;
566 my $value = $3;
567
568 unless ($value) {
569 $value = $2;
570 $value =~ s/\\(.)/$1/gs;
571 } 650 %arg,
651 recurse => $recurse - 1,
652 Redirect => \@_,
653 $cb);
654 } else {
655 $cb->($_[0], $_[1]);
656 }
657 };
572 658
573 push @kv, $name => $value; 659 my $len = $hdr{"content-length"};
574 660
575 last unless /\G\s*;/gc; 661 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
662 $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", @pseudo });
663 } elsif (
664 $hdr{Status} =~ /^(?:1..|[23]04)$/
665 or $method eq "HEAD"
666 or (defined $len && !$len)
667 ) {
668 # no body
669 $finish->("", \%hdr);
670 } else {
671 # body handling, four different code paths
672 # for want_body_handle, on_body (2x), normal (2x)
673 # we might read too much here, but it does not matter yet (no pers. connections)
674 if (!$redirect && $arg{want_body_handle}) {
675 $_[0]->on_eof (undef);
676 $_[0]->on_error (undef);
677 $_[0]->on_read (undef);
678
679 $finish->(delete $state{handle}, \%hdr);
680
681 } elsif ($arg{on_body}) {
682 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
683 if ($len) {
684 $_[0]->on_eof (undef);
685 $_[0]->on_read (sub {
686 $len -= length $_[0]{rbuf};
687
688 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
689 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
690
576 } 691 $len > 0
577 692 or $finish->("", \%hdr);
578 last unless @kv;
579
580 my $name = shift @kv;
581 my %kv = (value => shift @kv, @kv);
582
583 my $cdom;
584 my $cpath = (delete $kv{path}) || "/";
585
586 if (exists $kv{domain}) {
587 $cdom = delete $kv{domain};
588
589 $cdom =~ s/^\.?/./; # make sure it starts with a "."
590
591 next if $cdom =~ /\.$/;
592
593 # this is not rfc-like and not netscape-like. go figure.
594 my $ndots = $cdom =~ y/.//;
595 next if $ndots < ($cdom =~ /\.[^.][^.]\.[^.][^.]$/ ? 3 : 2);
596 } else {
597 $cdom = $uhost;
598 }
599
600 # store it
601 $arg{cookie_jar}{version} = 1;
602 $arg{cookie_jar}{$cdom}{$cpath}{$name} = \%kv;
603
604 redo if /\G\s*,/gc;
605 } 693 });
694 } else {
695 $_[0]->on_eof (sub {
696 $finish->("", \%hdr);
697 });
698 $_[0]->on_read (sub {
699 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
700 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", @pseudo });
701 });
606 } 702 }
703 } else {
704 $_[0]->on_eof (undef);
607 705
608 if ($redirect && exists $hdr{location}) { 706 if ($len) {
609 # we ignore any errors, as it is very common to receive 707 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], @pseudo }) });
610 # Content-Length != 0 but no actual body 708 $_[0]->on_read (sub {
611 # we also access %hdr, as $_[1] might be an erro 709 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
612 http_request ($method => $hdr{location}, %arg, recurse => $recurse - 1, $cb); 710 if $len <= length $_[0]{rbuf};
711 });
613 } else { 712 } else {
614 $cb->($_[0], $_[1]);
615 }
616 };
617
618 my $len = $hdr{"content-length"};
619
620 if (!$redirect && $arg{on_header} && !$arg{on_header}(\%hdr)) {
621 $finish->(undef, { Status => 598, Reason => "Request cancelled by on_header", URL => $url });
622 } elsif (
623 $hdr{Status} =~ /^(?:1..|[23]04)$/
624 or $method eq "HEAD"
625 or (defined $len && !$len)
626 ) {
627 # no body
628 $finish->("", \%hdr);
629 } else {
630 # body handling, four different code paths
631 # for want_body_handle, on_body (2x), normal (2x)
632 # we might read too much here, but it does not matter yet (no pers. connections)
633 if (!$redirect && $arg{want_body_handle}) {
634 $_[0]->on_eof (undef);
635 $_[0]->on_error (undef);
636 $_[0]->on_read (undef);
637
638 $finish->(delete $state{handle}, \%hdr);
639
640 } elsif ($arg{on_body}) {
641 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
642 if ($len) {
643 $_[0]->on_eof (undef);
644 $_[0]->on_read (sub {
645 $len -= length $_[0]{rbuf};
646
647 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
648 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
649
650 $len > 0
651 or $finish->("", \%hdr);
652 });
653 } else {
654 $_[0]->on_eof (sub {
655 $finish->("", \%hdr);
656 });
657 $_[0]->on_read (sub {
658 $arg{on_body}(delete $_[0]{rbuf}, \%hdr)
659 or $finish->(undef, { Status => 598, Reason => "Request cancelled by on_body", URL => $url });
660 });
661 }
662 } else {
663 $_[0]->on_eof (undef);
664
665 if ($len) {
666 $_[0]->on_error (sub { $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }) });
667 $_[0]->on_read (sub {
668 $finish->((substr delete $_[0]{rbuf}, 0, $len, ""), \%hdr)
669 if $len <= length $_[0]{rbuf};
670 });
671 } else {
672 $_[0]->on_error (sub { 713 $_[0]->on_error (sub {
673 $! == Errno::EPIPE || !$! 714 ($! == Errno::EPIPE || !$!)
674 ? $finish->(delete $_[0]{rbuf}, \%hdr) 715 ? $finish->(delete $_[0]{rbuf}, \%hdr)
675 : $finish->(undef, { Status => 599, Reason => $_[2], URL => $url }); 716 : $finish->(undef, { Status => 599, Reason => $_[2], @pseudo });
676 }); 717 });
677 $_[0]->on_read (sub { }); 718 $_[0]->on_read (sub { });
678 }
679 } 719 }
680 } 720 }
681 }); 721 }
682 }); 722 });
683 }; 723 };
684 724
685 # now handle proxy-CONNECT method 725 # now handle proxy-CONNECT method
686 if ($proxy && $uscheme eq "https") { 726 if ($proxy && $uscheme eq "https") {
688 728
689 # maybe re-use $uauthority with patched port? 729 # maybe re-use $uauthority with patched port?
690 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012"); 730 $state{handle}->push_write ("CONNECT $uhost:$uport HTTP/1.0\015\012Host: $uhost\015\012\015\012");
691 $state{handle}->push_read (line => $qr_nlnl, sub { 731 $state{handle}->push_read (line => $qr_nlnl, sub {
692 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix 732 $_[1] =~ /^HTTP\/([0-9\.]+) \s+ ([0-9]{3}) (?: \s+ ([^\015\012]*) )?/ix
693 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", URL => $url })); 733 or return (%state = (), $cb->(undef, { Status => 599, Reason => "Invalid proxy connect response ($_[1])", @pseudo }));
694 734
695 if ($2 == 200) { 735 if ($2 == 200) {
696 $rpath = $upath; 736 $rpath = $upath;
697 &$handle_actual_request; 737 &$handle_actual_request;
698 } else { 738 } else {
699 %state = (); 739 %state = ();
700 $cb->(undef, { Status => $2, Reason => $3, URL => $url }); 740 $cb->(undef, { Status => $2, Reason => $3, @pseudo });
701 } 741 }
702 }); 742 });
703 } else { 743 } else {
704 &$handle_actual_request; 744 &$handle_actual_request;
705 } 745 }
706 746
707 }, sub { 747 }, $arg{on_prepare} || sub { $timeout };
708 $timeout
709 };
710 }; 748 };
711 749
712 defined wantarray && AnyEvent::Util::guard { %state = () } 750 defined wantarray && AnyEvent::Util::guard { %state = () }
713} 751}
714 752
728 &http_request 766 &http_request
729} 767}
730 768
731=back 769=back
732 770
771=head2 DNS CACHING
772
773AnyEvent::HTTP uses the AnyEvent::Socket::tcp_connect function for
774the actual connection, which in turn uses AnyEvent::DNS to resolve
775hostnames. The latter is a simple stub resolver and does no caching
776on its own. If you want DNS caching, you currently have to provide
777your own default resolver (by storing a suitable resolver object in
778C<$AnyEvent::DNS::RESOLVER>).
779
733=head2 GLOBAL FUNCTIONS AND VARIABLES 780=head2 GLOBAL FUNCTIONS AND VARIABLES
734 781
735=over 4 782=over 4
736 783
737=item AnyEvent::HTTP::set_proxy "proxy-url" 784=item AnyEvent::HTTP::set_proxy "proxy-url"
738 785
739Sets the default proxy server to use. The proxy-url must begin with a 786Sets the default proxy server to use. The proxy-url must begin with a
740string of the form C<http://host:port> (optionally C<https:...>). 787string of the form C<http://host:port> (optionally C<https:...>), croaks
788otherwise.
789
790To clear an already-set proxy, use C<undef>.
741 791
742=item $AnyEvent::HTTP::MAX_RECURSE 792=item $AnyEvent::HTTP::MAX_RECURSE
743 793
744The default value for the C<recurse> request parameter (default: C<10>). 794The default value for the C<recurse> request parameter (default: C<10>).
745 795
766=back 816=back
767 817
768=cut 818=cut
769 819
770sub set_proxy($) { 820sub set_proxy($) {
821 if (length $_[0]) {
771 $PROXY = [$2, $3 || 3128, $1] if $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix; 822 $_[0] =~ m%^(https?):// ([^:/]+) (?: : (\d*) )?%ix
823 or Carp::croak "$_[0]: invalid proxy URL";
824 $PROXY = [$2, $3 || 3128, $1]
825 } else {
826 undef $PROXY;
827 }
772} 828}
773 829
774# initialise proxy from environment 830# initialise proxy from environment
831eval {
775set_proxy $ENV{http_proxy}; 832 set_proxy $ENV{http_proxy};
833};
776 834
777=head1 SEE ALSO 835=head1 SEE ALSO
778 836
779L<AnyEvent>. 837L<AnyEvent>.
780 838

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines