/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.18
Committed: Fri Jun 6 07:29:45 2008 UTC (16 years ago) by root
Branch: MAIN
CVS Tags: rel-4_151, rel-4_152, rel-4_15, rel-4_161, rel-4_160
Changes since 1.17: +54 -8 lines

1 root 1.2 =head1 Introduction to AnyEvent
2 root 1.1
3 root 1.2 This is a tutorial that will introduce you to the features of AnyEvent.
4 root 1.1
5 root 1.2 The first part introduces the core AnyEvent module (after swamping you a
6 root 1.17 bit in evangelism), which might already provide all you ever need. If you
7     are only interested in AnyEvent's event handling capabilities, read no
8     further.
9 root 1.1
10 root 1.2 The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.
13 root 1.2
14    
15     =head1 What is AnyEvent?
16    
17 root 1.10 If you don't care for the whys and want to see code, skip this section!
18 root 1.2
19     AnyEvent is first of all just a framework to do event-based
20     programming. Typically such frameworks are an all-or-nothing thing: If you
21     use one such framework, you can't (easily, or even at all) use another in
22     the same program.
23    
24     AnyEvent is different - it is a thin abstraction layer above all kinds
25     of event loops. Its main purpose is to move the choice of the underlying
26     framework (the event loop) from the module author to the program author
27     using the module.
28    
29     That means you can write code that uses events to control what it
30     does, without forcing other code in the same program to use the same
31     underlying framework as you do - i.e. you can create a Perl module
32     that is event-based using AnyEvent, and users of that module can still
33     choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
34     all: AnyEvent comes with its own event loop implementation, so your
35     code works regardless of other modules that might or might not be
installed. The latter is important, as AnyEvent does not have any
dependencies on other modules, which makes it easy to install, for
example, when you lack a C compiler.
39    
40     A typical problem with Perl modules such as L<Net::IRC> is that they
come with their own event loop: In L<Net::IRC>, the program that uses it
42     needs to start the event loop of L<Net::IRC>. That means that one cannot
43     integrate this module into a L<Gtk2> GUI for instance, as that module,
44     too, enforces the use of its own event loop (namely L<Glib>).
45 root 1.1
46     Another example is L<LWP>: it provides no event interface at all. It's a
47     pure blocking HTTP (and FTP etc.) client library, which usually means that
you either have to start a thread or have to fork for an HTTP request, or
49     use L<Coro::LWP>, if you want to do something else while waiting for the
50     request to finish.
51    
52     The motivation behind these designs is often that a module doesn't want to
53     depend on some complicated XS-module (Net::IRC), or that it doesn't want
54 root 1.2 to force the user to use some specific event loop at all (LWP).
55 root 1.1
56 root 1.2 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
57 root 1.1
58     =over 4
59    
=item - write their own event loop (because AnyEvent guarantees to offer one
everywhere - even on Windows).
62 root 1.1
63 root 1.11 =item - choose one fixed event loop (because AnyEvent works with all
64 root 1.2 important event loops available for Perl, and adding others is trivial).
65 root 1.1
66     =back
67    
68 root 1.2 If the module author uses L<AnyEvent> for all his event needs (IO events,
69     timers, signals, ...) then all other modules can just use his module and
70     don't have to choose an event loop or adapt to his event loop. The choice
71     of the event loop is ultimately made by the program author who uses all
72     the modules and writes the main program. And even there he doesn't have to
73     choose, he can just let L<AnyEvent> choose the best available event loop
74     for him.
75 root 1.1
76     Read more about this in the main documentation of the L<AnyEvent> module.
77    
78    
79 root 1.2 =head1 Introduction to Event-Based Programming
80    
81     So what exactly is programming using events? It quite simply means that
82     instead of your code actively waiting for something, such as the user
83     entering something on STDIN:
84    
85     $| = 1; print "enter your name> ";
86    
87     my $name = <STDIN>;
88    
89     You instead tell your event framework to notify you in the event of some
90     data being available on STDIN, by using a callback mechanism:
91    
92     use AnyEvent;
93    
94     $| = 1; print "enter your name> ";
95    
96     my $name;
97    
98     my $wait_for_input = AnyEvent->io (
99     fh => \*STDIN, # which file handle to check
100     poll => "r", # which event to wait for ("r"ead data)
101     cb => sub { # what callback to execute
102     $name = <STDIN>; # read it
103     }
104     );
105    
106     # do something else here
107    
108     Looks more complicated, and surely is, but the advantage of using events
109     is that your program can do something else instead of waiting for
110     input. Waiting as in the first example is also called "blocking" because
111     you "block" your process from executing anything else while you do so.
112    
113     The second example avoids blocking, by only registering interest in a read
114     event, which is fast and doesn't block your process. Only when read data
115     is available will the callback be called, which can then proceed to read
116     the data.
117    
118     The "interest" is represented by an object returned by C<< AnyEvent->io
119     >> called a "watcher" object - called like that because it "watches" your
120     file handle (or other event sources) for the event you are interested in.
121    
122     In the example above, we create an I/O watcher by calling the C<<
123     AnyEvent->io >> method. Disinterest in some event is simply expressed by
124     forgetting about the watcher, for example, by C<undef>'ing the variable it
125     is stored in. AnyEvent will automatically clean up the watcher if it is no
126     longer used, much like Perl closes your file handles if you no longer use
127     them anywhere.
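The following is a minimal sketch of "forgetting" a watcher. It uses timer
watchers and a condition variable, both of which are only introduced later
in this tutorial, and the variable names (C<$unwanted>, C<$finish>,
C<$fired>) are made up for this illustration. The first watcher is
C<undef>'ed before it can fire, so its callback should never run:

```perl
use strict;
use warnings;
use AnyEvent;

my $fired = 0;
my $done  = AnyEvent->condvar;

# a watcher we lose interest in before it gets a chance to fire
my $unwanted = AnyEvent->timer (
   after => 0.1,
   cb    => sub { $fired = 1 },
);

# a second watcher that ends this demonstration a bit later
my $finish = AnyEvent->timer (
   after => 0.3,
   cb    => sub { $done->send },
);

undef $unwanted; # forget the watcher - this is all it takes to cancel it

$done->recv;

print $fired ? "callback ran\n" : "callback never ran\n";
```

Nothing has to be unregistered explicitly: dropping the last reference to
the watcher object is the whole "API" for losing interest.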
128    
129 root 1.18 =head3 A short note on callbacks
130    
131     A common issue that hits people is the problem of passing parameters
132     to callbacks. Programmers used to languages such as C or C++ are often
133     used to a style where one passes the address of a function (a function
134     reference) and some data value, e.g.:
135    
136     sub callback {
137     my ($arg) = @_;
138    
139     $arg->method;
140     }
141    
142     my $arg = ...;
143    
144     call_me_back_later \&callback, $arg;
145    
146     This is clumsy, as the place where behaviour is specified (when the
147     callback is registered) is often far away from the place where behaviour
148     is implemented. It also doesn't use Perl syntax to invoke the code. There
149     is also an abstraction penalty to pay as one has to I<name> the callback,
150     which often is unnecessary and leads to nonsensical or duplicated names.
151    
152     In Perl, one can specify behaviour much more directly by using
153     I<closures>. Closures are code blocks that take a reference to the
154     enclosing scope(s) when they are created. This means lexical variables in scope at the time
155     of creating the closure can simply be used inside the closure:
156    
157     my $arg = ...;
158    
159     call_me_back_later sub { $arg->method };
160    
161     Under most circumstances, closures are faster, use less resources and
result in much clearer code than the traditional approach. Faster,
163     because parameter passing and storing them in local variables in Perl
164     is relatively slow. Less resources, because closures take references to
165     existing variables without having to create new ones, and clearer code
166     because it is immediately obvious that the second example calls the
167     C<method> method when the callback is invoked.
168    
169     Apart from these, the strongest argument for using closures with AnyEvent
170     is that AnyEvent does not allow passing parameters to the callback, so
171     closures are the only way to achieve that in most cases :->
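To make this concrete without needing any event library, here is a small
sketch in plain Perl. The C<call_me_back_later> function is a hypothetical
stand-in (it only stores the callbacks in an array), not part of AnyEvent.
Each closure captures its own copy of C<$name>, so no arguments need to be
passed when the callbacks are finally invoked:

```perl
use strict;
use warnings;

# hypothetical stand-in for an event library: it merely stores callbacks
my @pending;
sub call_me_back_later { my ($cb) = @_; push @pending, $cb }

my @log;

for my $name (qw(alice bob)) {
   # each closure captures its own $name - no parameter passing needed
   call_me_back_later (sub { push @log, "hello, $name" });
}

# later: invoke the callbacks without passing any arguments
$_->() for @pending;

print "$_\n" for @log;
```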
172    
173    
174 root 1.2 =head2 Condition Variables
175    
Back to the I/O watcher example: The code is not yet a fully working
program, and will not work as-is. The reason is that your callback will
not be invoked out of the blue, you have to run the event loop. Also,
event-based programs sometimes have to block, too: when there simply is
nothing else to do and everything waits for some events, the program needs
to block the process as well.
182 root 1.2
183     In AnyEvent, this is done using condition variables. Condition variables
184     are named "condition variables" because they represent a condition that is
185     initially false and needs to be fulfilled.
186    
187 root 1.10 You can also call them "merge points", "sync points", "rendezvous ports"
188     or even callbacks and many other things (and they are often called like
189     this in other frameworks). The important point is that you can create them
190     freely and later wait for them to become true.
191 root 1.2
192     Condition variables have two sides - one side is the "producer" of the
193 root 1.18 condition (whatever code detects and flags the condition), the other side
194     is the "consumer" (the code that waits for that condition).
195 root 1.2
196     In our example in the previous section, the producer is the event callback
197 root 1.18 and there is no consumer yet - let's change that right now:
198 root 1.2
199     use AnyEvent;
200    
201     $| = 1; print "enter your name> ";
202    
203     my $name;
204    
205     my $name_ready = AnyEvent->condvar;
206    
207     my $wait_for_input = AnyEvent->io (
208     fh => \*STDIN,
209     poll => "r",
210     cb => sub {
211     $name = <STDIN>;
212     $name_ready->send;
213     }
214     );
215    
216     # do something else here
217    
218     # now wait until the name is available:
219     $name_ready->recv;
220    
   undef $wait_for_input; # watcher no longer needed
222    
223     print "your name is $name\n";
224    
225     This program creates an AnyEvent condvar by calling the C<<
226     AnyEvent->condvar >> method. It then creates a watcher as usual, but
227     inside the callback it C<send>'s the C<$name_ready> condition variable,
228     which causes anybody waiting on it to continue.
229    
230     The "anybody" in this case is the code that follows, which calls C<<
231     $name_ready->recv >>: The producer calls C<send>, the consumer calls
232     C<recv>.
233    
234     If there is no C<$name> available yet, then the call to C<<
235     $name_ready->recv >> will halt your program until the condition becomes
236     true.
237    
238     As the names C<send> and C<recv> imply, you can actually send and receive
239     data using this, for example, the above code could also be written like
240     this, without an extra variable to store the name in:
241    
242     use AnyEvent;
243    
244     $| = 1; print "enter your name> ";
245    
246     my $name_ready = AnyEvent->condvar;
247    
248     my $wait_for_input = AnyEvent->io (
249     fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
251     );
252    
253     # do something else here
254    
255     # now wait and fetch the name
256     my $name = $name_ready->recv;
257    
   undef $wait_for_input; # watcher no longer needed
259    
260     print "your name is $name\n";
261    
You can pass any number of arguments to C<send>, and every call to
C<recv> will return them.
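A small sketch of sending more than one value, using a timer as the
producer so that it runs without any input (the values and variable names
are made up for this illustration):

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# producer: a timer callback that sends two values
my $w = AnyEvent->timer (
   after => 0.1,
   cb    => sub { $cv->send ("Alice", 42) },
);

# consumer: recv in list context returns everything passed to send
my ($name, $age) = $cv->recv;

# a further recv on the same condvar returns the same values again
my @again = $cv->recv;

print "$name is $age\n";
```

Note that the second C<recv> does not wait again: once the condition is
true, it stays true.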
264    
265     =head2 The "main loop"
266    
267     Most event-based frameworks have something called a "main loop" or "event
268     loop run function" or something similar.
269    
Just like C<recv> in AnyEvent, these functions need to be called
271     eventually so that your event loop has a chance of actually looking for
272     those events you are interested in.
273    
274     For example, in a L<Gtk2> program, the above example could also be written
275     like this:
276    
277     use Gtk2 -init;
278     use AnyEvent;
279    
280     ############################################
281     # create a window and some label
282    
283     my $window = new Gtk2::Window "toplevel";
284     $window->add (my $label = new Gtk2::Label "soon replaced by name");
285    
286     $window->show_all;
287    
288     ############################################
289     # do our AnyEvent stuff
290    
291     $| = 1; print "enter your name> ";
292    
293     my $name_ready = AnyEvent->condvar;
294    
295     my $wait_for_input = AnyEvent->io (
296     fh => \*STDIN, poll => "r",
297     cb => sub {
298     # set the label
299     $label->set_text (scalar <STDIN>);
300     print "enter another name> ";
301     }
302     );
303    
304     ############################################
305     # Now enter Gtk2's event loop
306    
307     main Gtk2;
308    
309     No condition variable anywhere in sight - instead, we just read a line
310     from STDIN and replace the text in the label. In fact, since nobody
311     C<undef>'s C<$wait_for_input> you can enter multiple lines.
312    
313     Instead of waiting for a condition variable, the program enters the Gtk2
314     main loop by calling C<< Gtk2->main >>, which will block the program and
315     wait for events to arrive.
316    
317     This also shows that AnyEvent is quite flexible - you didn't have anything
318     to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
319     worked.
320    
Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application? But imagine that instead of
doing that, you would make an HTTP request in the background and display
its results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.
327    
328     In the next part you will see how to do just that - by implementing an
329     HTTP request, on our own, with the utility modules AnyEvent comes with.
330    
Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

In the example using condition variables, we did exactly that, and in
fact, this is the solution:
337    
338     my $quit_program = AnyEvent->condvar;
339    
340     # create AnyEvent watchers (or not) here
341    
342     $quit_program->recv;
343    
344     If any of your watcher callbacks decide to quit, they can simply call
345     C<< $quit_program->send >>. Of course, they could also decide not to and
346     simply call C<exit> instead, or they could decide not to quit, ever (e.g.
347     in a long-running daemon program).
348    
349     In that case, you can simply use:
350    
351     AnyEvent->condvar->recv;
352    
353     And this is, in fact, closest to the idea of a main loop run function that
354     AnyEvent offers.
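As a complete (if artificial) sketch of this pattern, the following program
uses a pipe as its event source, so that it runs without user input. The
pipe and the names C<$reader>, C<$writer> and C<$received> are assumptions
made for this illustration, and a Unix-like system is assumed. The I/O
watcher callback decides to quit once it sees end-of-file:

```perl
use strict;
use warnings;
use AnyEvent;

pipe (my $reader, my $writer) or die "pipe: $!";

# pretend some other party writes to us and then hangs up
syswrite $writer, "one\ntwo\n";
close $writer;

my $quit_program = AnyEvent->condvar;
my $received     = "";

my $watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      my $len = sysread $reader, $received, 1024, length $received;

      # end of file (0) or error (undef): leave the "main loop"
      $quit_program->send unless $len;
   },
);

$quit_program->recv; # serves events until somebody calls send

print $received;
```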
355    
356     =head2 Timers and other event sources
357    
358     So far, we have only used I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
360 root 1.2 operating systems this also works for console windows/terminals (typically
361     on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes Windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on Windows).
366    
367 root 1.10 However, I/O is not everything - the second most important event source is
368 root 1.2 the clock. For example when doing an HTTP request you might want to time
369 root 1.10 out when the server doesn't answer within some predefined amount of time.
370 root 1.2
371     In AnyEvent, timer event watchers are created by calling the C<<
372     AnyEvent->timer >> method:
373    
374     use AnyEvent;
375    
376     my $cv = AnyEvent->condvar;
377    
378     my $wait_one_and_a_half_seconds = AnyEvent->timer (
379     after => 1.5, # after how many seconds to invoke the cb?
380     cb => sub { # the callback to invoke
381     $cv->send;
382     },
383     );
384    
385 root 1.10 # can do something else here
386 root 1.2
387     # now wait till our time has come
388     $cv->recv;
389    
Unlike I/O watchers, timers are only interested in the number of seconds
391     they have to wait. When that amount of time has passed, AnyEvent will
392     invoke your callback.
393    
394     Unlike I/O watchers, which will call your callback as many times as there
395     is data available, timers are one-shot: after they have "fired" once and
396     invoked your callback, they are dead and no longer do anything.
397    
398     To get a repeating timer, such as a timer firing roughly once per second,
399     you have to recreate it:
400    
401     use AnyEvent;
402    
403     my $time_watcher;
404    
405     sub once_per_second {
406     print "tick\n";
407    
408     # (re-)create the watcher
409     $time_watcher = AnyEvent->timer (
410     after => 1,
411     cb => \&once_per_second,
412     );
413     }
414    
415     # now start the timer
416     once_per_second;
417    
418     Having to recreate your timer is a restriction put on AnyEvent that is
419     present in most event libraries it uses. It is so annoying that some
420 root 1.10 future version might work around this limitation, but right now, it's the
421 root 1.2 only way to do repeating timers.
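Stopping such a timer is the mirror image of re-creating it: you simply do
not create a new watcher. The following sketch (the C<tick> function, the
C<$ticks> counter and the limit of three ticks are made up for this
illustration) ticks three times and then ends:

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $ticks = 0;
my $time_watcher;

sub tick {
   $ticks++;

   if ($ticks >= 3) {
      undef $time_watcher; # stop: do not re-create the watcher
      $done->send;
      return;
   }

   # (re-)create the one-shot watcher for the next tick
   $time_watcher = AnyEvent->timer (after => 0.1, cb => \&tick);
}

tick (); # the first tick happens immediately
$done->recv;

print "ticked $ticks times\n";
```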
422    
423     Fortunately most timers aren't really repeating but specify timeouts of
424     some sort.
425    
426     =head3 More esoteric sources
427    
428     AnyEvent also has some other, more esoteric event sources you can tap
429     into: signal and child watchers.
430    
Signal watchers can be used to wait for "signal events", which simply
means your process got sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child watchers wait for a child process to exit. They are useful when
you fork a separate process and need to know when it exits, but do not
want to wait for that by blocking.
437    
438     Both watcher types are described in detail in the main L<AnyEvent> manual
439     page.
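As a rough sketch only (the main L<AnyEvent> manual page remains the
authority on the details), the following program forks a child and uses
C<< AnyEvent->child >> to learn its exit code, while a C<<
AnyEvent->signal >> watcher stands by for C<SIGTERM>. The exit code 3 and
the variable names are made up, and creating the signal watcher before
forking is a precaution taken here, not a requirement stated in this
tutorial:

```perl
use strict;
use warnings;
use AnyEvent;
use POSIX ();

my $done = AnyEvent->condvar;

# react to SIGTERM while we wait (created before the fork as a precaution)
my $term_watcher = AnyEvent->signal (
   signal => "TERM",
   cb     => sub { $done->send ("terminated") },
);

my $pid = fork;
die "fork: $!" unless defined $pid;

# child process: pretend to work, then leave with exit code 3
POSIX::_exit (3) if $pid == 0;

# parent: get notified when the child exits, without blocking in waitpid
my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      $done->send ($status >> 8); # extract the exit code from the wait status
   },
);

my $result = $done->recv;

print "result: $result\n";
```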
440    
441    
442     =head1 Network programming and AnyEvent
443    
444 root 1.3 So far you have seen how to register event watchers and handle events.
445 root 1.1
446 root 1.5 This is a great foundation to write network clients and servers, and might be
447 root 1.3 all that your module (or program) ever requires, but writing your own I/O
448     buffering again and again becomes tedious, not to mention that it attracts
449     errors.
450    
451     While the core L<AnyEvent> module is still small and self-contained,
452     the distribution comes with some very useful utility modules such as
453     L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.
455    
Here is a quick overview of these three modules:
457    
458     =head2 L<AnyEvent::DNS>
459    
460     This module allows fully asynchronous DNS resolution. It is used mainly by
461     L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
462     a great way to do other DNS resolution tasks, such as reverse lookups of
463     IP addresses for log files.
464 root 1.1
465 root 1.2 =head2 L<AnyEvent::Handle>
466 root 1.1
467     This module handles non-blocking IO on file handles in an event based
468     manner. It provides a wrapper object around your file handle that provides
469     queueing and buffering of incoming and outgoing data for you.
470    
471 root 1.4 It also implements the most common data formats, such as text lines, or
472     fixed and variable-width data blocks.
473 root 1.1
474 root 1.2 =head2 L<AnyEvent::Socket>
475 root 1.1
476     This module provides you with functions that handle socket creation
477     and IP address magic. The two main functions are C<tcp_connect> and
478     C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
480     connections.
481    
482     This module also comes with transparent IPv6 support, this means: If you
483     write your programs with this module, you will be IPv6 ready without doing
484 root 1.4 anything special.
485 root 1.1
It also works around a lot of portability quirks (especially on the
Windows platform), which makes it even easier to write your programs in a
portable way (did you know that Windows uses different error codes for all
489     socket functions and that Perl does not know about these? That "Unknown
490     error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
491     successful? That unsuccessful TCP connects might never be reported back
492     to your program? That C<WSAEINPROGRESS> means your C<connect> call was
493     ignored instead of being in progress? AnyEvent::Socket works around all of
494     these Windows/Perl bugs for you).
495    
496 root 1.11 =head2 Implementing a parallel finger client with non-blocking connects
497 root 1.16 and AnyEvent::Socket
498 root 1.4
499     The finger protocol is one of the simplest protocols in use on the
500     internet. Or in use in the past, as almost nobody uses it anymore.
501    
502     It works by connecting to the finger port on another host, writing a
503     single line with a user name and then reading the finger response, as
504     specified by that user. OK, RFC 1288 specifies a vastly more complex
505     protocol, but it basically boils down to this:
506    
507     # telnet idsoftware.com finger
508     Trying 192.246.40.37...
509     Connected to idsoftware.com (192.246.40.37).
510     Escape character is '^]'.
511     johnc
512     Welcome to id Software's Finger Service V1.5!
513    
514     [...]
515     Now on the web:
516     [...]
517    
518     Connection closed by foreign host.
519    
520 root 1.11 "Now on the web..." yeah, I<was> used indeed, but at least the finger
521     daemon still works, so let's write a little AnyEvent function that makes a
522     finger request:
523 root 1.4
524     use AnyEvent;
525     use AnyEvent::Socket;
526    
527     sub finger($$) {
528     my ($user, $host) = @_;
529    
530     # use a condvar to return results
531     my $cv = AnyEvent->condvar;
532    
533     # first, connect to the host
534     tcp_connect $host, "finger", sub {
535 root 1.8 # the callback receives the socket handle - or nothing
536 root 1.4 my ($fh) = @_
537     or return $cv->send;
538    
539     # now write the username
540     syswrite $fh, "$user\015\012";
541    
542     my $response;
543    
544     # register a read watcher
545     my $read_watcher; $read_watcher = AnyEvent->io (
546     fh => $fh,
547     poll => "r",
548     cb => sub {
549     my $len = sysread $fh, $response, 1024, length $response;
550    
551     if ($len <= 0) {
               # we are done, or an error occurred, let's ignore the latter
553     undef $read_watcher; # no longer interested
554     $cv->send ($response); # send results
555     }
556     },
557     );
558     };
559    
560     # pass $cv to the caller
561     $cv
562     }
563    
564 root 1.11 That's a mouthful! Let's dissect this function a bit, first the overall
565     function and execution flow:
566 root 1.4
567     sub finger($$) {
568     my ($user, $host) = @_;
569    
570     # use a condvar to return results
571     my $cv = AnyEvent->condvar;
572    
573     # first, connect to the host
574     tcp_connect $host, "finger", sub {
575     ...
576     };
577    
578     $cv
579     }
580    
581 root 1.11 This isn't too complicated, just a function with two parameters, that
582 root 1.4 creates a condition variable, returns it, and while it does that,
583 root 1.11 initiates a TCP connect to C<$host>. The condition variable will be used
584     by the caller to receive the finger response, but one could equally well
585     pass a third argument, a callback, to the function.
586 root 1.4
587 root 1.11 Since we are programming event'ish, we do not wait for the connect to
588     finish - it could block the program for a minute or longer!
589    
590     Instead, we pass the callback it should invoke when the connect is done to
591     C<tcp_connect>. If it is successful, that callback gets called with the
592 root 1.5 socket handle as first argument, otherwise, nothing will be passed to our
593 root 1.11 callback. The important point is that it will always be called as soon as
594     the outcome of the TCP connect is known.
595    
596     This style of programming is also called "continuation style": the
597     "continuation" is simply the way the program continues - normally, a
598     program continues at the next line after some statement (the exception
599     is loops or things like C<return>). When we are interested in events,
600     however, we instead specify the "continuation" of our program by passing a
601     closure, which makes that closure the "continuation" of the program. The
602     C<tcp_connect> call is like saying "return now, and when the connection is
603     established or it failed, continue there".
604 root 1.4
605 root 1.11 Now let's look at the callback/closure in more detail:
606 root 1.4
607 root 1.11 # the callback receives the socket handle - or nothing
608 root 1.4 my ($fh) = @_
609     or return $cv->send;
610    
611 root 1.5 The first thing the callback does is indeed save the socket handle in
612     C<$fh>. When there was an error (no arguments), then our instinct as
613 root 1.11 expert Perl programmers would tell us to C<die>:
614 root 1.4
615     my ($fh) = @_
616     or die "$host: $!";
617 root 1.1
618 root 1.11 While this would give good feedback to the user (if he happens to watch
619     standard error), our program would probably stop working here, as we never
620     report the results to anybody, certainly not the caller of our C<finger>
621     function, and most event loops continue even after a C<die>!
622    
623     This is why we instead C<return>, but also call C<< $cv->send >> without
624     any arguments to signal to the condvar consumer that something bad has
625     happened. The return value of C<< $cv->send >> is irrelevant, as is the
626     return value of our callback. The return statement is simply used for the
627     side effect of, well, returning immediately from the callback. Checking
628     for errors and handling them this way is very common, which is why this
629     compact idiom is so handy.
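The consumer side of this convention can be sketched as follows. The
C<lookup> function is a hypothetical producer invented for this
illustration (it "fails" when told to, using a timer instead of a real
network connection). In scalar context, C<recv> returns C<undef> when
C<send> was called without arguments, which is how the caller can tell
failure from success:

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical producer: "fails" when $ok is false
sub lookup {
   my ($ok) = @_;

   my $cv = AnyEvent->condvar;

   my $w; $w = AnyEvent->timer (after => 0.05, cb => sub {
      undef $w;                  # no longer interested
      $ok or return $cv->send;   # error: send nothing
      $cv->send ("some result"); # success: send the result
   });

   $cv
}

my @out;

for my $ok (1, 0) {
   my $result = lookup ($ok)->recv;
   push @out, defined $result ? "got: $result" : "request failed";
}

print "$_\n" for @out;
```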
630 root 1.4
631     As the next step in the finger protocol, we send the username to the
632     finger daemon on the other side of our connection:
633    
634     syswrite $fh, "$user\015\012";
635    
636 root 1.11 Note that this isn't 100% clean socket programming - the socket could,
637     for whatever reasons, not accept our data. When writing a small amount
638     of data like in this example it doesn't matter, as a socket buffer is
639     almost always big enough for a mere "username", but for real-world
640     cases you might need to implement some kind of write buffering - or use
641     L<AnyEvent::Handle>, which handles these matters for you, as shown in the
642     next section.
643 root 1.4
644 root 1.11 What we I<do> have to do is to implement our own read buffer - the response
645 root 1.4 data could arrive late or in multiple chunks, and we cannot just wait for
646     it (event-based programming, you know?).
647    
648     To do that, we register a read watcher on the socket which waits for data:
649    
650     my $read_watcher; $read_watcher = AnyEvent->io (
651     fh => $fh,
652     poll => "r",
653    
654     There is a trick here, however: the read watcher isn't stored in a global
655     variable, but in a local one - if the callback returns, it would normally
656     destroy the variable and its contents, which would in turn unregister our
657 root 1.5 watcher.
658 root 1.4
To avoid that, we refer to the variable inside the watcher callback (where
we C<undef>ine it once we are done). This means that, when the
C<tcp_connect> callback returns, Perl thinks (quite correctly) that the
read watcher is still in use - namely in the callback.
663    
664 root 1.11 The trick, however, is that instead of:
665    
666     my $read_watcher = AnyEvent->io (...
667    
668     The program does:
669    
670     my $read_watcher; $read_watcher = AnyEvent->io (...
671    
672     The reason for this is a quirk in the way Perl works: variable names
673     declared with C<my> are only visible in the I<next> statement. If the
674     whole C<< AnyEvent->io >> call, including the callback, would be done in
675     a single statement, the callback could not refer to the C<$read_watcher>
676     variable to undefine it, so it is done in two statements.
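This quirk can be observed in plain Perl, without AnyEvent. In the sketch
below (the variable names are made up for this illustration), the
one-statement form is compiled with a string C<eval> so that the failure
can be reported instead of aborting the program. Under C<use strict>, it
does not compile, because the variable is not yet declared inside the
C<sub>. Without C<use strict>, it would silently refer to a global
variable of the same name instead:

```perl
use strict;
use warnings;

# one statement: inside the sub, $first is not declared yet
my $one_statement = eval q{
   my $first = sub { $first };
   1
};

print "one statement: ", ($one_statement ? "compiles" : "does not compile"), "\n";

# two statements: the closure can see the variable it is stored in
my $second; $second = sub { defined $second ? 'can see $second' : 'cannot see $second' };

my $result = $second->();

print "two statements: $result\n";
```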
677    
678     Whether you'd want to format it like this is of course a matter of style,
679     this way emphasizes that the declaration and assignment really are one
680     logical statement.
681    
682 root 1.4 The callback itself calls C<sysread> for as many times as necessary, until
683 root 1.11 C<sysread> returns either an error or end-of-file:
684 root 1.4
685     cb => sub {
686     my $len = sysread $fh, $response, 1024, length $response;
687    
688     if ($len <= 0) {
689    
690     Note that C<sysread> has the ability to append data it reads to a scalar,
691 root 1.11 by specifying an offset, which is what we make good use of in this
692     example.
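The appending behaviour of C<sysread> can be tried in isolation. This
sketch uses a pipe and a deliberately tiny read size of 5 bytes (both
chosen only for this illustration) and a plain blocking loop instead of a
watcher, so that the data has to be collected from several reads:

```perl
use strict;
use warnings;

pipe (my $reader, my $writer) or die "pipe: $!";
syswrite $writer, "hello, world";
close $writer;

my $response = "";
my @chunks;

while (1) {
   # read at most 5 bytes and place them at the end of $response
   my $len = sysread $reader, $response, 5, length $response;

   die "read error: $!" unless defined $len;
   last if $len == 0; # end of file

   push @chunks, $len;
}

print "$response (read in ", scalar @chunks, " chunks)\n";
```

Without the offset argument, each C<sysread> would overwrite C<$response>
and only the last chunk would survive.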
693 root 1.4
694     When C<sysread> indicates we are done, the callback C<undef>ines
695     the watcher and then C<send>'s the response data to the condition
696     variable. All this has the following effects:
697    
698     Undefining the watcher destroys it, as our callback was the only one still
699     having a reference to it. When the watcher gets destroyed, it destroys the
700     callback, which in turn means the C<$fh> handle is no longer used, so that
701     gets destroyed as well. The result is that all resources will be nicely
702     cleaned up by perl for us.
703    
704     =head3 Using the finger client
705    
706 root 1.5 Now, we could probably write the same finger client in a simpler way if
707     we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
708     ignored IPv6 and a few other things that C<tcp_connect> handles for us.
709 root 1.4
710     But the main advantage is that we can not only run this finger function in
711     the background, we even can run multiple sessions in parallel, like this:
712    
713     my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
714     my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
715     my $f3 = finger "johnc", "idsoftware.com"; # finger john
716    
717     print "trouble tickets:\n", $f1->recv, "\n";
718     print "trouble ticket #1736:\n", $f2->recv, "\n";
719     print "john carmacks finger file: ", $f3->recv, "\n";
720    
721     It doesn't look like it, but in fact all three requests run in
722 root 1.9 parallel. The code waits for the first finger request to finish first, but
723 root 1.11 that doesn't keep it from executing them in parallel: when the first C<recv>
724     call sees that the data isn't ready yet, it serves events for all three
725     requests automatically, until the first request has finished.
726    
727     The second C<recv> call might either find the data is already there, or it
728     will continue handling events until that is the case, and so on.
729 root 1.9
730     By taking advantage of network latencies, which allow us to serve other
731     requests and events while we wait for an event on one socket, the overall
732 root 1.11 time to do these three requests will be greatly reduced: typically all
733     three are done in the time the slowest of them needs to finish.
734 root 1.5
735     By the way, you do not actually have to wait in the C<recv> method on an
736 root 1.11 AnyEvent condition variable - after all, waiting is evil - you can also
737     register a callback:
738 root 1.5
739     $cv->cb (sub {
740     my $response = shift->recv;
741     # ...
742     });
743    
744     The callback will only be invoked once C<send> has been called. In fact,
745     instead of returning a condition variable you could also pass a third
746     parameter to your finger function, the callback to invoke with the
747     response:
748    
749     sub finger($$$) {
750     my ($user, $host, $cb) = @_;
751    
752 root 1.11 How you implement it is a matter of taste - if you expect your function to
753     be used mainly in an event-based program you would normally prefer to pass
754     a callback directly. If you write a module and expect your users to use
755     it "synchronously" often (for example, a simple http-get script would not
756     really care much for events), then you would use a condition variable and
757     tell them "simply ->recv the data".
758 root 1.4
759 root 1.11 =head3 Problems with the implementation and how to fix them
760 root 1.4
761     To make this example more real-world-ready, we would not only implement
762     some write buffering (for the paranoid), but we would also have to handle
763     timeouts and maybe protocol errors.
764    
765 root 1.11 Doing this quickly gets unwieldy, which is why we introduce
766     L<AnyEvent::Handle> in the next section, which takes care of all these
767     details for you and lets you concentrate on the actual protocol.
768    
769    
770     =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
771    
772     The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
773     see what it really offers.
774    
775     As finger is such a simple protocol, let's try something slightly more
776     complicated: HTTP/1.0.
777    
778     An HTTP GET request works by sending a single request line that indicates
779     what you want the server to do and the URI you want to act on, followed
780     by as many "header" lines (C<Header: data>, same as e-mail headers) as
781     required for the request, ended by an empty line.
782    
783     The response is formatted very similarly, first a line with the response
784     status, then again as many header lines as required, then an empty line,
785     followed by any data that the server might send.
786    
787     Again, let's try it out with C<telnet> (I condensed the output a bit - if
788     you want to see the full response, do it yourself).
789    
790     # telnet www.google.com 80
791     Trying 209.85.135.99...
792     Connected to www.google.com (209.85.135.99).
793     Escape character is '^]'.
794     GET /test HTTP/1.0
795    
796     HTTP/1.0 404 Not Found
797     Date: Mon, 02 Jun 2008 07:05:54 GMT
798     Content-Type: text/html; charset=UTF-8
799    
800     <html><head>
801     [...]
802     Connection closed by foreign host.
803    
804     The C<GET ...> and the empty line were entered manually, the rest of the
805     telnet output is Google's response, in this case a C<404 Not Found>.
806    
807     So, here is how you would do it with C<AnyEvent::Handle>:
808    
809 root 1.12    sub http_get {
810           my ($host, $uri, $cb) = @_;
811
812 root 1.13       tcp_connect $host, "http", sub {
813 root 1.12          my ($fh) = @_
814                 or return $cb->("HTTP/1.0 500 $!"); # stop here on connect errors
815
816              # store results here
817              my ($response, $header, $body);
818
819              my $handle; $handle = new AnyEvent::Handle
820                 fh       => $fh,
821                 on_error => sub {
822                    undef $handle;
823                    $cb->("HTTP/1.0 500 $!");
824                 },
825                 on_eof   => sub {
826                    undef $handle; # keep it alive till eof
827                    $cb->($response, $header, $body);
828                 };
829
830              $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
831
832              # now fetch response status line
833              $handle->push_read (line => sub {
834                 my ($handle, $line) = @_;
835                 $response = $line;
836              });
837
838              # then the headers
839              $handle->push_read (line => "\015\012\015\012", sub {
840                 my ($handle, $line) = @_;
841                 $header = $line;
842              });
843
844              # and finally handle any remaining data as body
845              $handle->on_read (sub {
846                 $body .= $_[0]->rbuf;
847                 $_[0]->rbuf = "";
848              });
849           };
850        }
851 root 1.11
852     And now let's go through it step by step. First, as usual, the overall
853     C<http_get> function structure:
854    
855     sub http_get {
856     my ($host, $uri, $cb) = @_;
857    
858     tcp_connect $host, "http", sub {
859     ...
860     };
861     }
862    
863     Unlike in the finger example, this time the caller has to pass a callback
864     to C<http_get>. Also, instead of passing a URL as one would expect, the
865     caller has to provide the hostname and URI - normally you would use the
866     C<URI> module to parse a URL and separate it into those parts, but that is
867     left to the inspired reader :)
868    
869     Since everything else is left to the caller, all C<http_get> does is
870     initiate the connection with C<tcp_connect> and leave everything else to
871     its callback.
872    
873     The first thing the callback does is check for connection errors and
874     declare some variables:
875    
876        my ($fh) = @_
877           or return $cb->("HTTP/1.0 500 $!");
878
879        my ($response, $header, $body);
880    
881     Instead of having an extra mechanism to signal errors, connection errors
882     are signalled by crafting a special "response status line", like this:
883    
884     HTTP/1.0 500 Connection refused
885    
886     This means the caller cannot distinguish (easily) between
887     locally-generated errors and server errors, but it simplifies error
888     handling for the caller a lot.
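
Because both kinds of errors arrive as a status line, the caller can handle
them with one piece of code. The following is a minimal sketch of that idea;
the helper name C<parse_status> and the sample lines are assumptions made for
illustration, not part of AnyEvent:

```perl
use strict;
use warnings;

# hypothetical helper: split a status line into version, code and message
sub parse_status {
   my ($line) = @_;

   my ($version, $code, $message) = $line =~ m{^HTTP/(\d+\.\d+) (\d{3}) ?(.*)$}
      or return;

   ($version, $code, $message)
}

# one line as sent by a server, one crafted locally after a failed connect
for my $line ("HTTP/1.0 404 Not Found", "HTTP/1.0 500 Connection refused") {
   my ($version, $code, $message) = parse_status ($line)
      or die "malformed status line: $line";

   printf "%s: %s\n", $code >= 400 ? "error" : "ok", $message;
}
```

Both lines take the same path through the caller's code, which is exactly
the simplification described above.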
889    
890     The next step finally involves L<AnyEvent::Handle>, namely it creates the
891     handle object:
892    
893     my $handle; $handle = new AnyEvent::Handle
894     fh => $fh,
895     on_error => sub {
896     undef $handle;
897     $cb->("HTTP/1.0 500 $!");
898     },
899     on_eof => sub {
900     undef $handle; # keep it alive till eof
901     $cb->($response, $header, $body);
902     };
903    
904     The constructor expects a file handle, which gets passed via the C<fh>
905     argument.
906    
907     The remaining two argument pairs specify two callbacks to be called on
908     any errors (C<on_error>) and in the case of a normal connection close
909     (C<on_eof>).
910    
911     In the first case, we C<undef>ine the handle object and pass the error to
912     the callback provided by the caller - done.
913    
914     In the second case we assume everything went fine and pass the results
915     gobbled up so far to the caller-provided callback. This is not quite
916     perfect, as when the server "cleanly" closes the connection in the middle
917     of sending headers we might wrongly report this as an "OK" to the caller,
918     but then, HTTP doesn't support a perfect mechanism that would detect such
919     problems in all cases, so we don't bother either.
920    
921     =head3 The write queue
922    
923     The next line sends the actual request:
924    
925     $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
926    
927     No headers will be sent (this is fine for simple requests), so the whole
928     request is just a single line followed by an empty line to signal the end
929     of the headers to the server.
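
Should a request need headers after all (many servers that host several
sites expect at least a C<Host> header), the request string grows by one
line per header. The following is a minimal sketch; the helper name
C<build_request> is an assumption made for illustration:

```perl
use strict;
use warnings;

# hypothetical helper: build an HTTP/1.0 GET request with optional headers
sub build_request {
   my ($uri, @headers) = @_; # headers as a flat list of name => value pairs

   my $request = "GET $uri HTTP/1.0\015\012";

   while (my ($name, $value) = splice @headers, 0, 2) {
      $request .= "$name: $value\015\012";
   }

   # the empty line ends the header section
   $request . "\015\012"
}

my $plain     = build_request ("/");
my $with_host = build_request ("/test", Host => "www.google.com");

print $with_host;
```

Without headers the result is the same string that is passed to
C<push_write> above, so the helper could be dropped in as
C<< $handle->push_write (build_request ($uri, Host => $host)) >>.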
930    
931     The more interesting question is why the method is called C<push_write>
932     and not just C<write>. The reason is that you can I<always> add some write
933     data without blocking, and to do this, AnyEvent::Handle needs some write
934     queue internally - and C<push_write> simply pushes some data at the end of
935     that queue, just like Perl's C<push> pushes data at the end of an array.
936    
937     The deeper reason is that at some point in the future, there might
938     be C<unshift_write> as well, and in any case, we will shortly meet
939     C<push_read> and C<unshift_read>, and it's usually easiest if all those
940     functions have some symmetry in their name.
941    
942     If C<push_write> is called with more than one argument, then you can even
943     do I<formatted> I/O, which simply means your data will be transformed in
944     some ways. For example, this would JSON-encode your data before pushing it
945     to the write queue:
946    
947     $handle->push_write (json => [1, 2, 3]);
948    
949     Apart from that, this pretty much summarises the write queue, there is
950     little else to it.
951    
952     Reading the response is far more interesting:
953    
954     =head3 The read queue
955    
956 root 1.12 The response consists of three parts: a single line of response status, a
957 root 1.11 single paragraph of headers ended by an empty line, and the response body,
958     which is simply the remaining data on that connection.
959    
960     For the first two, we push two read requests onto the read queue:
961    
962     # now fetch response status line
963     $handle->push_read (line => sub {
964     my ($handle, $line) = @_;
965     $response = $line;
966     });
967    
968     # then the headers
969     $handle->push_read (line => "\015\012\015\012", sub {
970     my ($handle, $line) = @_;
971     $header = $line;
972     });
973    
974     While one can simply push a single callback to the queue, I<formatted> I/O
975     really comes to our advantage here, as there is a ready-made "read line"
976     read type. The first read expects a single line, ended by C<\015\012> (the
977     standard end-of-line marker in internet protocols).
978    
979     The second "line" is actually a single paragraph - instead of reading it
980     line by line we tell C<push_read> that the end-of-line marker is really
981     C<\015\012\015\012>, which is an empty line. The result is that the whole
982     header paragraph will be treated as a single line and read. The word
983     "line" is interpreted very freely, much like Perl itself does it.
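
To see what the two different end-of-line markers do to the data, here is a
minimal sketch that mimics the effect with a plain regex on a string. It only
illustrates the idea and is not how AnyEvent::Handle is implemented; the
helper C<take_line> and the sample buffer are assumptions:

```perl
use strict;
use warnings;

# what a server might send, already collected in a buffer (sample data)
my $rbuf = "HTTP/1.0 404 Not Found\015\012"
         . "Date: Mon, 02 Jun 2008 07:05:54 GMT\015\012"
         . "Content-Type: text/html; charset=UTF-8\015\012"
         . "\015\012"
         . "<html><head>";

# hypothetical helper: remove and return everything up to the given
# end-of-line marker, or nothing if the marker has not arrived yet
sub take_line {
   my ($eol) = @_;

   $rbuf =~ s/^(.*?)\Q$eol\E//s
      or return;

   $1
}

my $response = take_line ("\015\012");         # one ordinary line
my $header   = take_line ("\015\012\015\012"); # the whole header paragraph
my $body     = $rbuf;                          # whatever is left

print "status: $response\n";
```

The second call returns both header lines as one string, with a single
C<\015\012> still separating them, because only the empty line counts as
the end of that "line".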
984    
985     Note that push read requests are pushed immediately after creating the
986     handle object - since AnyEvent::Handle provides a queue we can push as
987     many requests as we want, and AnyEvent::Handle will handle them in order.
988    
989     There is, however, no read type for "the remaining data". For that, we
990     install our own C<on_read> callback:
991    
992     # and finally handle any remaining data as body
993     $handle->on_read (sub {
994     $body .= $_[0]->rbuf;
995     $_[0]->rbuf = "";
996     });
997    
998     This callback is invoked every time data arrives and the read queue is
999     empty - which in this example will only be the case when both response and
1000 root 1.12 header have been read. The C<on_read> callback could actually have been
1001     specified when constructing the object, but doing it this way preserves
1002     logical ordering.
1003 root 1.1
1004 root 1.12 The read callback simply appends the current read buffer to the C<$body>
1005     variable and, most importantly, I<empties> the buffer by assigning the
1006     empty string to it.
1007 root 1.1
1008 root 1.12 After AnyEvent::Handle has been so instructed, it will now handle incoming
1009     data according to these instructions - if all goes well, the callback will
1010     be invoked with the response data, if not, it will get an error.
1011 root 1.1
1012 root 1.13 In general, you get pipelining very easily with AnyEvent::Handle: If
1013     you have a protocol with a request/response structure, your request
1014     methods/functions will all look like this (simplified):
1015    
1016     sub request {
1017    
1018     # send the request to the server
1019     $handle->push_write (...);
1020    
1021     # push some response handlers
1022     $handle->push_read (...);
1023     }
1024    
1025 root 1.12 =head3 Using it
1026 root 1.1
1027 root 1.12 And here is how you would use it:
1028 root 1.1
1029 root 1.12 http_get "www.google.com", "/", sub {
1030     my ($response, $header, $body) = @_;
1031 root 1.1
1032 root 1.12 print
1033     $response, "\n",
1034     $body;
1035     };
1036 root 1.1
1037 root 1.12 And of course, you can run as many of these requests in parallel as you
1038     want (and your memory supports).
1039 root 1.1
1040 root 1.13 =head3 HTTPS
1041    
1042     Now, as promised, let's implement the same thing for HTTPS, or more
1043     correctly, let's change our C<http_get> function into a function that
1044     speaks HTTPS instead.
1045    
1046     HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1047     B<S>ecurity is the official name for what most people refer to as C<SSL>)
1048     that contains standard HTTP protocol exchanges. The other difference to
1049     HTTP is that it uses port C<443> instead of port C<80>.
1050    
1051     To implement these two differences we need two tiny changes. First, in the
1052     C<tcp_connect> call, we replace C<http> by C<https>:
1053    
1054     tcp_connect $host, "https", sub { ...
1055    
1056     The other change deals with TLS, which is something L<AnyEvent::Handle>
1057     does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
1058     around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
1059     C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1060    
1061     tls => "connect",
1062    
1063     Specifying C<tls> enables TLS, and the argument specifies whether
1064     AnyEvent::Handle is the server side ("accept") or the client side
1065     ("connect") for the TLS connection, as unlike TCP, there is a clear
1066     server/client relationship in TLS.
1067    
1068 root 1.14 That's all.
1069    
1070     Of course, all this should be handled transparently by C<http_get> after
1071     parsing the URL. See the part about exercising your inspiration earlier in
1072     this document.
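
As a starting point for that exercise, here is a deliberately simplistic
sketch that splits a URL with a regex instead of the C<URI> module. The
helper name C<split_url> is an assumption, and ports, user information and
fragments are not supported, so a real implementation should still prefer
C<URI>:

```perl
use strict;
use warnings;

# hypothetical helper: split a URL into the pieces http_get needs
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $path) = $url =~ m{^(https?)://([^/:?#]+)(/[^#]*)?$}i
      or return;

   $scheme = lc $scheme;

   (
      $host,
      $scheme,                                      # service name for tcp_connect
      defined $path ? $path : "/",                  # request URI
      $scheme eq "https" ? (tls => "connect") : (), # extra handle arguments
   )
}

my ($host, $service, $uri, @handle_args)
   = split_url ("https://www.google.com/test");

print "connect to $host, service $service, request $uri\n";
```

The service name can be passed to C<tcp_connect> as it is, and
C<@handle_args> is either empty or contains the C<tls> parameter for the
C<AnyEvent::Handle> constructor.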
1073 root 1.13
1074 root 1.12 =head3 The read queue - revisited
1075 root 1.1
1076 root 1.13 HTTP always uses the same structure in its responses, but many protocols
1077     require parsing responses differently depending on the response itself.
1078    
1079     For example, in SMTP, you normally get a single response line:
1080    
1081     220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1082    
1083     But SMTP also supports multi-line responses:
1084    
1085     220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1086     220-hey guys
1087     220 my response is longer than yours
1088    
1089     To handle this, we need C<unshift_read>. As the name (hopefully) implies,
1090     C<unshift_read> will not append your read request to the end of the read
1091     queue, but instead it will prepend it to the queue.
1092    
1093     This is useful in this situation: You push your response-line read
1094     request when sending the SMTP command, and when handling it, you look at
1095     the line to see if more is to come, and C<unshift_read> another reader,
1096     like this:
1097    
1098        my $response; # response lines end up in here
1099
1100        my $read_response; $read_response = sub {
1101           my ($handle, $line) = @_;
1102
1103           $response .= "$line\n";
1104
1105           # check for continuation lines ("-" as 4th character)
1106           if ($line =~ /^...-/) {
1107              # if yes, then unshift another line read
1108              $handle->unshift_read (line => $read_response);
1109
1110           } else {
1111              # otherwise we are done
1112
1113              # free callback
1114              undef $read_response;
1115
1116              print "we are done reading: $response\n";
1117           }
1118        };
1119
1120        $handle->push_read (line => $read_response);
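
Why the reader has to be I<unshifted> rather than pushed becomes visible as
soon as a second request is queued. The following is a toy model in plain
Perl: the array C<@queue> and the functions C<push_read>, C<unshift_read> and
C<response_reader> are stand-ins written for this sketch, not the
AnyEvent::Handle methods:

```perl
use strict;
use warnings;

my @queue; # toy stand-in for the read queue: a list of line handlers
my @done;  # completed responses end up here

sub push_read    { push    @queue, $_[0] }
sub unshift_read { unshift @queue, $_[0] }

# returns a handler that collects one (possibly multi-line) response
sub response_reader {
   my $response = "";

   my $reader; $reader = sub {
      my ($line) = @_;

      $response .= "$line\n";

      if ($line =~ /^...-/) {
         # continuation line: the next line belongs to us as well
         unshift_read ($reader);
      } else {
         push @done, $response;
         undef $reader; # free callback
      }
   };

   $reader
}

# two commands were sent, so two response readers are queued
push_read (response_reader ()) for 1 .. 2;

for my $line (
   '220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>',
   '220-hey guys',
   '220 my response is longer than yours',
   '250 ok',
) {
   my $handler = shift @queue; # the first handler gets the next line
   $handler->($line);
}

print scalar @done, " responses read\n";
```

Had the first reader pushed itself to the end of the queue instead, the
second reader would have received the C<220-hey guys> line.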
1121 root 1.1
1122 root 1.13 This recipe can be used for all similar parsing problems, for example in
1123     NNTP, the response code to some commands indicates that more data will be
1124 root 1.14 sent:
1125    
1126        $handle->push_write ("article 42\015\012");
1127
1128        # read response line
1129        $handle->push_read (line => sub {
1130           my ($handle, $status) = @_;
1131
1132           # article data following?
1133           if ($status =~ /^2/) {
1134              # yes, read article body
1135
1136              $handle->unshift_read (line => "\012.\015\012", sub {
1137                 my ($handle, $body) = @_;
1138
1139                 $finish->($status, $body);
1140              });
1141
1142           } else {
1143              # some error occurred, no article data
1144
1145              $finish->($status);
1146           }
1147        });
1148    
1149     =head3 Your own read queue handler
1150    
1151     Sometimes, your protocol doesn't play nice and doesn't use lines or chunks of
1152     data, in which case you have to implement your own read parser.
1153    
1154     To make up a contorted example, imagine you are looking for an even
1155     number of characters followed by a colon (":"). Also imagine that
1156     AnyEvent::Handle had no C<regex> read type which could be used, so you'd
1157     have to do it manually.
1158    
1159     To implement this, you would C<push_read> (or C<unshift_read>) just a
1160     single code reference.
1161    
1162     This code reference will then be called each time there is (new) data
1163     available in the read buffer, and is expected to either eat/consume some
1164     of that data (and return true) or to return false to indicate that it
1165     wants to be called again.
1166    
1167     If the code reference returns true, then it will be removed from the read
1168     queue, otherwise it stays in front of it.
1169    
1170     The example above could be coded like this:
1171    
1172     $handle->push_read (sub {
1173     my ($handle) = @_;
1174    
1175     # check for even number of characters + ":"
1176     # and remove the data if a match is found.
1177     # if not, return false (actually nothing)
1178    
1179     $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1180     or return;
1181    
1182     # we got some data in $1, pass it to whoever wants it
1183     $finish->($1);
1184    
1185     # and return true to indicate we are done
1186     1
1187     });
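
The "return false until there is enough data" protocol can be tried out
without a handle. In this minimal sketch a plain scalar C<$rbuf> stands in
for C<< $handle->{rbuf} >> and a loop plays the role of the event loop; the
chunk boundaries and the C<@found> array are assumptions chosen for
illustration:

```perl
use strict;
use warnings;

my $rbuf = ""; # stands in for $handle->{rbuf}
my @found;     # stands in for whatever $finish would do

# same logic as the read callback above, minus the handle object
my $parser = sub {
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
};

my $finished;

# data arrives in arbitrary pieces
for my $chunk ("ab", "c", "d:re", "st") {
   $rbuf .= $chunk;

   next if $finished; # a finished reader is no longer in the queue
   $finished = $parser->();
}

print "found: @found, left in buffer: $rbuf\n";
```

The parser returns false for the first two chunks, as neither C<ab> nor
C<abc> is followed by a colon. With the third chunk the match succeeds, the
matched data is removed, and the rest stays in the buffer for the next
reader.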
1188    
1189 root 1.1
1190 root 1.15 =head1 Authors
1191 root 1.6
1192     Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1193