/cvs/AnyEvent/lib/AnyEvent/Intro.pod

Comparing AnyEvent/lib/AnyEvent/Intro.pod (file contents):
Revision 1.10 by root, Mon Jun 2 06:04:08 2008 UTC vs.
Revision 1.11 by root, Mon Jun 2 09:10:38 2008 UTC

52 52
53L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either 53L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
54 54
55=over 4 55=over 4
56 56
57=item write their own event loop (because AnyEvent guarantees to offer one 57=item - write their own event loop (because AnyEvent guarantees to offer one
58everywhere - even on windows). 58everywhere - even on windows).
59 59
60=item choose one fixed event loop (because AnyEvent works with all 60=item - choose one fixed event loop (because AnyEvent works with all
61important event loops available for Perl, and adding others is trivial). 61important event loops available for Perl, and adding others is trivial).
62 62
63=back 63=back
64 64
65If the module author uses L<AnyEvent> for all his event needs (IO events, 65If the module author uses L<AnyEvent> for all his event needs (IO events,
442successful? That unsuccessful TCP connects might never be reported back 442successful? That unsuccessful TCP connects might never be reported back
443to your program? That C<WSAEINPROGRESS> means your C<connect> call was 443to your program? That C<WSAEINPROGRESS> means your C<connect> call was
444ignored instead of being in progress? AnyEvent::Socket works around all of 444ignored instead of being in progress? AnyEvent::Socket works around all of
445these Windows/Perl bugs for you). 445these Windows/Perl bugs for you).
446 446
447=head2 First experiments with non-blocking connects: a parallel finger 447=head2 Implementing a parallel finger client with non-blocking connects
448client.
449 448
450The finger protocol is one of the simplest protocols in use on the 449The finger protocol is one of the simplest protocols in use on the
451internet. Or in use in the past, as almost nobody uses it anymore. 450internet. Or in use in the past, as almost nobody uses it anymore.
452 451
453It works by connecting to the finger port on another host, writing a 452It works by connecting to the finger port on another host, writing a
466 Now on the web: 465 Now on the web:
467 [...] 466 [...]
468 467
469 Connection closed by foreign host. 468 Connection closed by foreign host.
470 469
471Yeah, I<was> used indeed, but at least the finger daemon still works, so 470"Now on the web..." yeah, I<was> used indeed, but at least the finger
472let's write a little AnyEvent function that makes a finger request: 471daemon still works, so let's write a little AnyEvent function that makes a
472finger request:
473 473
474 use AnyEvent; 474 use AnyEvent;
475 use AnyEvent::Socket; 475 use AnyEvent::Socket;
476 476
477 sub finger($$) { 477 sub finger($$) {
509 509
510 # pass $cv to the caller 510 # pass $cv to the caller
511 $cv 511 $cv
512 } 512 }
513 513
514That's a mouthful! Let's dissect this function a bit, first the overall function: 514That's a mouthful! Let's dissect this function a bit, first the overall
515function and execution flow:
515 516
516 sub finger($$) { 517 sub finger($$) {
517 my ($user, $host) = @_; 518 my ($user, $host) = @_;
518 519
519 # use a condvar to return results 520 # use a condvar to return results
525 }; 526 };
526 527
527 $cv 528 $cv
528 } 529 }
529 530
530This isn't too complicated, just a function with two parameters, which 531This isn't too complicated, just a function with two parameters, that
531creates a condition variable, returns it, and while it does that, 532creates a condition variable, returns it, and while it does that,
532initiates a TCP connect to C<$host>. The condition variable 533initiates a TCP connect to C<$host>. The condition variable will be used
533will be used by the caller to receive the finger response. 534by the caller to receive the finger response, but one could equally well
535pass a third argument, a callback, to the function.
534 536
535Since we are event-based programmers, we do not wait for the connect to 537Since we are programming event'ish, we do not wait for the connect to
536finish - it could block your program for a minute or longer! Instead, 538finish - it could block the program for a minute or longer!
539
537we pass the callback it should invoke when the connect is done to 540Instead, we pass the callback it should invoke when the connect is done to
538C<tcp_connect>. If it is successful, our callback gets called with the 541C<tcp_connect>. If it is successful, that callback gets called with the
539socket handle as first argument, otherwise, nothing will be passed to our 542socket handle as first argument, otherwise, nothing will be passed to our
540callback. 543callback. The important point is that it will always be called as soon as
544the outcome of the TCP connect is known.
541 545
546This style of programming is also called "continuation style": the
547"continuation" is simply the way the program continues - normally, a
548program continues at the next line after some statement (the exception
549is loops or things like C<return>). When we are interested in events,
550however, we instead specify the "continuation" of our program by passing a
551closure, which makes that closure the "continuation" of the program. The
552C<tcp_connect> call is like saying "return now, and when the connection is
553established or it failed, continue there".
554
542Let's look at our callback in more detail: 555Now let's look at the callback/closure in more detail:
543 556
544 # the callback gets the socket handle - or nothing 557 # the callback receives the socket handle - or nothing
545 my ($fh) = @_ 558 my ($fh) = @_
546 or return $cv->send; 559 or return $cv->send;
547 560
548The first thing the callback does is indeed save the socket handle in 561The first thing the callback does is indeed save the socket handle in
549C<$fh>. When there was an error (no arguments), then our instinct as 562C<$fh>. When there was an error (no arguments), then our instinct as
550expert Perl programmers would tell us to die: 563expert Perl programmers would tell us to C<die>:
551 564
552 my ($fh) = @_ 565 my ($fh) = @_
553 or die "$host: $!"; 566 or die "$host: $!";
554 567
555While this would give good feedback to the user, our program would 568While this would give good feedback to the user (if he happens to watch
556probably freeze here, as we never report the results to anybody, certainly 569standard error), our program would probably stop working here, as we never
557not the caller of our C<finger> function! 570report the results to anybody, certainly not the caller of our C<finger>
571function, and most event loops continue even after a C<die>!
558 572
559This is why we instead return, but also call C<< $cv->send >> without any 573This is why we instead C<return>, but also call C<< $cv->send >> without
560arguments to signal to our consumer that something bad has happened. The 574any arguments to signal to the condvar consumer that something bad has
561return value of C<< $cv->send >> is irrelevant, as is the return value of 575happened. The return value of C<< $cv->send >> is irrelevant, as is the
562our callback. The return statement is simply used for the side effect of, 576return value of our callback. The return statement is simply used for the
563well, returning immediately from the callback. 577side effect of, well, returning immediately from the callback. Checking
578for errors and handling them this way is very common, which is why this
579compact idiom is so handy.
564 580
565As the next step in the finger protocol, we send the username to the 581As the next step in the finger protocol, we send the username to the
566finger daemon on the other side of our connection: 582finger daemon on the other side of our connection:
567 583
568 syswrite $fh, "$user\015\012"; 584 syswrite $fh, "$user\015\012";
569 585
570Note that this isn't 100% clean - the socket could, for whatever reasons, 586Note that this isn't 100% clean socket programming - the socket could,
571not accept our data. When writing a small amount of data like in this 587for whatever reasons, not accept our data. When writing a small amount
572example it doesn't matter, but for real-world cases you might need to 588of data like in this example it doesn't matter, as a socket buffer is
573implement some kind of write buffering - or use L<AnyEvent::Handle>, which 589almost always big enough for a mere "username", but for real-world
574handles these matters for you. 590cases you might need to implement some kind of write buffering - or use
591L<AnyEvent::Handle>, which handles these matters for you, as shown in the
592next section.
575 593
576What we do have to do is to implement our own read buffer - the response 594What we I<do> have to do is to implement our own read buffer - the response
577data could arrive late or in multiple chunks, and we cannot just wait for 595data could arrive late or in multiple chunks, and we cannot just wait for
578it (event-based programming, you know?). 596it (event-based programming, you know?).
579 597
580To do that, we register a read watcher on the socket which waits for data: 598To do that, we register a read watcher on the socket which waits for data:
581 599
591To avoid that, we C<undef>ine the variable in the watcher callback. This 609To avoid that, we C<undef>ine the variable in the watcher callback. This
592means that, when the C<tcp_connect> callback returns, that perl thinks 610means that, when the C<tcp_connect> callback returns, that perl thinks
593(quite correctly) that the read watcher is still in use - namely in the 611(quite correctly) that the read watcher is still in use - namely in the
594callback. 612callback.
595 613
614The trick, however, is that instead of:
615
616 my $read_watcher = AnyEvent->io (...
617
618The program does:
619
620 my $read_watcher; $read_watcher = AnyEvent->io (...
621
622The reason for this is a quirk in the way Perl works: variable names
623declared with C<my> are only visible in the I<next> statement. If the
624whole C<< AnyEvent->io >> call, including the callback, would be done in
625a single statement, the callback could not refer to the C<$read_watcher>
626variable to undefine it, so it is done in two statements.
627
628Whether you'd want to format it like this is of course a matter of style;
629this way emphasizes that the declaration and assignment really are one
630logical statement.
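The scoping quirk described above can be checked in isolation. This is a small sketch in plain Perl with made-up variable names (C<$watcher>, C<$guard>), not code from the finger example: the single-statement form is compiled through a string C<eval> so the resulting compile error can be inspected instead of aborting the script.

```perl
use strict;
use warnings;

# One statement: the closure is compiled before $watcher is in scope,
# so under "use strict" this does not even compile.
my $ok = eval q{
   my $watcher = sub { undef $watcher };
   1
};
my $error = $@;
print "single statement: ", ($ok ? "compiled" : "failed to compile"), "\n";

# Two statements: the closure sees the lexical and can release it.
my $guard; $guard = sub { undef $guard };
$guard->();
print "two statements: guard is ", (defined $guard ? "still set" : "released"), "\n";
```

Without C<use strict> the single-statement form would compile, but the closure would silently refer to an unrelated global variable of the same name, which is arguably worse.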
631
596The callback itself calls C<sysread> for as many times as necessary, until 632The callback itself calls C<sysread> for as many times as necessary, until
597C<sysread> returns an error or end-of-file: 633C<sysread> returns either an error or end-of-file:
598 634
599 cb => sub { 635 cb => sub {
600 my $len = sysread $fh, $response, 1024, length $response; 636 my $len = sysread $fh, $response, 1024, length $response;
601 637
602 if ($len <= 0) { 638 if ($len <= 0) {
603 639
604Note that C<sysread> has the ability to append data it reads to a scalar, 640Note that C<sysread> has the ability to append data it reads to a scalar,
605which is what we make good use of in this example. 641by specifying an offset, which is what we make good use of in this
642example.
606 643
607When C<sysread> indicates we are done, the callback C<undef>ines 644When C<sysread> indicates we are done, the callback C<undef>ines
608the watcher and then C<send>'s the response data to the condition 645the watcher and then C<send>'s the response data to the condition
609variable. All this has the following effects: 646variable. All this has the following effects:
610 647
631 print "trouble ticket #1736:\n", $f2->recv, "\n"; 668 print "trouble ticket #1736:\n", $f2->recv, "\n";
632 print "john carmacks finger file: ", $f3->recv, "\n"; 669 print "john carmacks finger file: ", $f3->recv, "\n";
633 670
634It doesn't look like it, but in fact all three requests run in 671It doesn't look like it, but in fact all three requests run in
635parallel. The code waits for the first finger request to finish first, but 672parallel. The code waits for the first finger request to finish first, but
636that doesn't keep it from executing in parallel, because when the first 673that doesn't keep it from executing them in parallel: when the first C<recv>
637C<recv> call sees that the data isn't ready yet, it serves events for all 674call sees that the data isn't ready yet, it serves events for all three
638three requests automatically. 675requests automatically, until the first request has finished.
676
677The second C<recv> call might either find the data is already there, or it
678will continue handling events until that is the case, and so on.
639 679
640By taking advantage of network latencies, which allows us to serve other 680By taking advantage of network latencies, which allows us to serve other
641requests and events while we wait for an event on one socket, the overall 681requests and events while we wait for an event on one socket, the overall
642time to do these three requests will be greatly reduces, typically all 682time to do these three requests will be greatly reduced, typically all
643three are done in the same time as the slowest of them would use. 683three are done in the same time as the slowest of them would need to finish.
644 684
645By the way, you do not actually have to wait in the C<recv> method on an 685By the way, you do not actually have to wait in the C<recv> method on an
646AnyEvent condition variable, you can also register a callback: 686AnyEvent condition variable - after all, waiting is evil - you can also
687register a callback:
647 688
648 $cv->cb (sub { 689 $cv->cb (sub {
649 my $response = shift->recv; 690 my $response = shift->recv;
650 # ... 691 # ...
651 }); 692 });
656response: 697response:
657 698
658 sub finger($$$) { 699 sub finger($$$) {
659 my ($user, $host, $cb) = @_; 700 my ($user, $host, $cb) = @_;
660 701
661What you use is a matter of taste - if you expect your function to be 702How you implement it is a matter of taste - if you expect your function to
662used mainly in an event-based program you would normally prefer to pass a 703be used mainly in an event-based program you would normally prefer to pass
663callback directly. 704a callback directly. If you write a module and expect your users to use
705it "synchronously" often (for example, a simple http-get script would not
706really care much for events), then you would use a condition variable and
707tell them "simply ->recv the data".
664 708
665=head3 Criticism and fix 709=head3 Problems with the implementation and how to fix them
666 710
667To make this example more real-world-ready, we would not only implement 711To make this example more real-world-ready, we would not only implement
668some write buffering (for the paranoid), but we would also have to handle 712some write buffering (for the paranoid), but we would also have to handle
669timeouts and maybe protocol errors. 713timeouts and maybe protocol errors.
670 714
671This quickly gets unwieldy, which is why we introduce L<AnyEvent::Handle> 715Doing this quickly gets unwieldy, which is why we introduce
672in the next section, which takes care of all these details for us. 716L<AnyEvent::Handle> in the next section, which takes care of all these
717details for you and lets you concentrate on the actual protocol.
673 718
674 719
675=head2 First experiments with AnyEvent::Handle 720=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
721
722The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
723see what it really offers.
724
725As finger is such a simple protocol, let's try something slightly more
726complicated: HTTP/1.0.
727
728An HTTP GET request works by sending a single request line that indicates
729what you want the server to do and the URI you want it to act on, followed
730by as many "header" lines (C<Header: data>, same as e-mail headers) as
731required for the request, ended by an empty line.
732
733The response is formatted very similarly, first a line with the response
734status, then again as many header lines as required, then an empty line,
735followed by any data that the server might send.
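The message layout described above can be sketched in a few lines of plain Perl. The helper names C<build_get_request> and C<split_response> are hypothetical and exist only for this illustration. The sketch also assumes the complete response is already in memory, which is not how the event-based code works.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Hypothetical helper: a request line, optional headers, an empty line.
sub build_get_request {
   my ($uri, %headers) = @_;
   join "", "GET $uri HTTP/1.0$CRLF",
            (map "${_}: $headers{$_}$CRLF", sort keys %headers),
            $CRLF;
}

# Hypothetical helper: split a complete response into its three parts.
sub split_response {
   my ($raw) = @_;
   my ($head, $body) = split /\015\012\015\012/, $raw, 2;
   my ($status, $headers) = split /\015\012/, $head, 2;
   ($status, $headers // "", $body // "");
}

my $request = build_get_request "/test", Host => "www.example.com";

my ($status, $headers, $body) = split_response
   "HTTP/1.0 404 Not Found${CRLF}Content-Type: text/html$CRLF$CRLF<html>";

print "$status\n";
```

The first empty line separates the headers from the body in both directions, which is why a single C<split> with a limit of 2 is enough to take a response apart.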
736
737Again, let's try it out with C<telnet> (I condensed the output a bit - if
738you want to see the full response, do it yourself).
739
740 # telnet www.google.com 80
741 Trying 209.85.135.99...
742 Connected to www.google.com (209.85.135.99).
743 Escape character is '^]'.
744 GET /test HTTP/1.0
745
746 HTTP/1.0 404 Not Found
747 Date: Mon, 02 Jun 2008 07:05:54 GMT
748 Content-Type: text/html; charset=UTF-8
749
750 <html><head>
751 [...]
752 Connection closed by foreign host.
753
754The C<GET ...> and the empty line were entered manually, the rest of the
755telnet output is Google's response, in this case a C<404 Not Found> one.
756
757So, here is how you would do it with C<AnyEvent::Handle>:
758
759###TODO
760
761And now let's go through it step by step. First, as usual, the overall
762C<http_get> function structure:
763
764 sub http_get {
765 my ($host, $uri, $cb) = @_;
766
767 tcp_connect $host, "http", sub {
768 ...
769 };
770 }
771
772Unlike in the finger example, this time the caller has to pass a callback
773to C<http_get>. Also, instead of passing a URL as one would expect, the
774caller has to provide the hostname and URI - normally you would use the
775C<URI> module to parse a URL and separate it into those parts, but that is
776left to the inspired reader :)
777
778Since everything else is left to the caller, all C<http_get> does is
779initiate the connection with C<tcp_connect> and leave everything else to
780its callback.
781
782The first thing the callback does is check for connection errors and
783declare some variables:
784
785 my ($fh) = @_
786 or return $cb->("HTTP/1.0 500 $!");
787
788 my ($response, $header, $body);
789
790Instead of having an extra mechanism to signal errors, connection errors
791are signalled by crafting a special "response status line", like this:
792
793 HTTP/1.0 500 Connection refused
794
795This means the caller cannot distinguish (easily) between
796locally-generated errors and server errors, but it simplifies error
797handling for the caller a lot.
798
799The next step finally involves L<AnyEvent::Handle>, namely it creates the
800handle object:
801
802 my $handle; $handle = new AnyEvent::Handle
803 fh => $fh,
804 on_error => sub {
805 undef $handle;
806 $cb->("HTTP/1.0 500 $!");
807 },
808 on_eof => sub {
809 undef $handle; # keep it alive till eof
810 $cb->($response, $header, $body);
811 };
812
813The constructor expects a file handle, which gets passed via the C<fh>
814argument.
815
816The remaining two argument pairs specify two callbacks to be called on
817any errors (C<on_error>) and in the case of a normal connection close
818(C<on_eof>).
819
820In the first case, we C<undef>ine the handle object and pass the error to
821the callback provided by the caller - done.
822
823In the second case we assume everything went fine and pass the results
824gobbled up so far to the caller-provided callback. This is not quite
825perfect, as when the server "cleanly" closes the connection in the middle
826of sending headers we might wrongly report this as an "OK" to the caller,
827but then, HTTP doesn't support a perfect mechanism that would detect such
828problems in all cases, so we don't bother either.
829
830=head3 The write queue
831
832The next line sends the actual request:
833
834 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
835
836No headers will be sent (this is fine for simple requests), so the whole
837request is just a single line followed by an empty line to signal the end
838of the headers to the server.
839
840The more interesting question is why the method is called C<push_write>
841and not just C<write>. The reason is that you can I<always> add some write
842data without blocking, and to do this, AnyEvent::Handle needs some write
843queue internally - and C<push_write> simply pushes some data at the end of
844that queue, just like Perl's C<push> pushes data at the end of an array.
845
846The deeper reason is that at some point in the future, there might
847be C<unshift_write> as well, and in any case, we will shortly meet
848C<push_read> and C<unshift_read>, and it's usually easiest if all those
849functions have some symmetry in their name.
850
851If C<push_write> is called with more than one argument, then you can even
852do I<formatted> I/O, which simply means your data will be transformed in
853some ways. For example, this would JSON-encode your data before pushing it
854to the write queue:
855
856 $handle->push_write (json => [1, 2, 3]);
857
858Apart from that, this pretty much summarises the write queue, there is
859little else to it.
860
861Reading the response is far more interesting:
862
863=head3 The read queue
864
865The response consists of three parts: a single line of response status, a
866single paragraph of headers ended by an empty line, and the response body,
867which is simply the remaining data on that connection.
868
869For the first two, we push two read requests onto the read queue:
870
871 # now fetch response status line
872 $handle->push_read (line => sub {
873 my ($handle, $line) = @_;
874 $response = $line;
875 });
876
877 # then the headers
878 $handle->push_read (line => "\015\012\015\012", sub {
879 my ($handle, $line) = @_;
880 $header = $line;
881 });
882
883While one can simply push a single callback to the queue, I<formatted> I/O
884really comes to our advantage here, as there is a ready-made "read line"
885read type. The first read expects a single line, ended by C<\015\012> (the
886standard end-of-line marker in internet protocols).
887
888The second "line" is actually a single paragraph - instead of reading it
889line by line we tell C<push_read> that the end-of-line marker is really
890C<\015\012\015\012>, which is an empty line. The result is that the whole
891header paragraph will be treated as a single line and read. The word
892"line" is interpreted very freely, much like Perl itself does it.
893
894Note that the read requests are pushed immediately after creating the
895handle object - since AnyEvent::Handle provides a queue we can push as
896many requests as we want, and AnyEvent::Handle will handle them in order.
897
898There is, however, no read type for "the remaining data". For that, we
899install our own C<on_read> callback:
900
901 # and finally handle any remaining data as body
902 $handle->on_read (sub {
903 $body .= $_[0]->rbuf;
904 $_[0]->rbuf = "";
905 });
906
907This callback is invoked every time data arrives and the read queue is
908empty - which in this example will only be the case when both response and
909header have been read.
910
911
912#############################################################################
676 913
677Now let's start with something simple: a program that reads from standard 914Now let's start with something simple: a program that reads from standard
678input in a non-blocking way, that is, in a way that lets your program do 915input in a non-blocking way, that is, in a way that lets your program do
679other things while it is waiting for input. 916other things while it is waiting for input.
680 917
