/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.19
Committed: Wed Jul 9 11:53:40 2008 UTC (15 years, 11 months ago) by root
Branch: MAIN
CVS Tags: rel-4_23, rel-4_21, rel-4_231, rel-4_233, rel-4_232, rel-4_234, rel-4_22, rel-4_2, rel-4_3, rel-4_31
Changes since 1.18: +19 -0 lines
Log Message:
document AnyEvent::Strict

1 root 1.2 =head1 Introduction to AnyEvent
2 root 1.1
3 root 1.2 This is a tutorial that will introduce you to the features of AnyEvent.
4 root 1.1
5 root 1.2 The first part introduces the core AnyEvent module (after swamping you a
6 root 1.17 bit in evangelism), which might already provide all you ever need. If you
7     are only interested in AnyEvent's event handling capabilities, read no
8     further.
9 root 1.1
The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.
13 root 1.2
14    
15     =head1 What is AnyEvent?
16    
17 root 1.10 If you don't care for the whys and want to see code, skip this section!
18 root 1.2
19     AnyEvent is first of all just a framework to do event-based
20     programming. Typically such frameworks are an all-or-nothing thing: If you
21     use one such framework, you can't (easily, or even at all) use another in
22     the same program.
23    
24     AnyEvent is different - it is a thin abstraction layer above all kinds
25     of event loops. Its main purpose is to move the choice of the underlying
26     framework (the event loop) from the module author to the program author
27     using the module.
28    
That means you can write code that uses events to control what it
does, without forcing other code in the same program to use the same
underlying framework as you do - i.e. you can create a Perl module
that is event-based using AnyEvent, and users of that module can still
choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
all: AnyEvent comes with its own event loop implementation, so your
code works regardless of other modules that might or might not be
installed. The latter is important, as AnyEvent does not have any
dependencies on other modules, which makes it easy to install, for
example, when you lack a C compiler.
39    
A typical problem with Perl modules such as L<Net::IRC> is that they
come with their own event loop: In L<Net::IRC>, the program that uses it
needs to start the event loop of L<Net::IRC>. That means that one cannot
integrate this module into a L<Gtk2> GUI for instance, as that module,
too, enforces the use of its own event loop (namely L<Glib>).
45 root 1.1
Another example is L<LWP>: it provides no event interface at all. It's a
pure blocking HTTP (and FTP etc.) client library, which usually means that
you either have to start a thread or have to fork for an HTTP request, or
use L<Coro::LWP>, if you want to do something else while waiting for the
request to finish.
51    
52     The motivation behind these designs is often that a module doesn't want to
53     depend on some complicated XS-module (Net::IRC), or that it doesn't want
54 root 1.2 to force the user to use some specific event loop at all (LWP).
55 root 1.1
56 root 1.2 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
57 root 1.1
58     =over 4
59    
=item - write their own event loop (because AnyEvent guarantees to offer one
everywhere - even on windows).
62 root 1.1
63 root 1.11 =item - choose one fixed event loop (because AnyEvent works with all
64 root 1.2 important event loops available for Perl, and adding others is trivial).
65 root 1.1
66     =back
67    
68 root 1.2 If the module author uses L<AnyEvent> for all his event needs (IO events,
69     timers, signals, ...) then all other modules can just use his module and
70     don't have to choose an event loop or adapt to his event loop. The choice
71     of the event loop is ultimately made by the program author who uses all
72     the modules and writes the main program. And even there he doesn't have to
73     choose, he can just let L<AnyEvent> choose the best available event loop
74     for him.
75 root 1.1
76     Read more about this in the main documentation of the L<AnyEvent> module.
77    
78    
79 root 1.2 =head1 Introduction to Event-Based Programming
80    
81     So what exactly is programming using events? It quite simply means that
82     instead of your code actively waiting for something, such as the user
83     entering something on STDIN:
84    
85     $| = 1; print "enter your name> ";
86    
87     my $name = <STDIN>;
88    
89     You instead tell your event framework to notify you in the event of some
90     data being available on STDIN, by using a callback mechanism:
91    
92     use AnyEvent;
93    
94     $| = 1; print "enter your name> ";
95    
96     my $name;
97    
98     my $wait_for_input = AnyEvent->io (
99     fh => \*STDIN, # which file handle to check
100     poll => "r", # which event to wait for ("r"ead data)
101     cb => sub { # what callback to execute
102     $name = <STDIN>; # read it
103     }
104     );
105    
106     # do something else here
107    
108     Looks more complicated, and surely is, but the advantage of using events
109     is that your program can do something else instead of waiting for
110     input. Waiting as in the first example is also called "blocking" because
111     you "block" your process from executing anything else while you do so.
112    
113     The second example avoids blocking, by only registering interest in a read
114     event, which is fast and doesn't block your process. Only when read data
115     is available will the callback be called, which can then proceed to read
116     the data.
117    
118     The "interest" is represented by an object returned by C<< AnyEvent->io
119     >> called a "watcher" object - called like that because it "watches" your
120     file handle (or other event sources) for the event you are interested in.
121    
122     In the example above, we create an I/O watcher by calling the C<<
123     AnyEvent->io >> method. Disinterest in some event is simply expressed by
124     forgetting about the watcher, for example, by C<undef>'ing the variable it
125     is stored in. AnyEvent will automatically clean up the watcher if it is no
126     longer used, much like Perl closes your file handles if you no longer use
127     them anywhere.
128    
129 root 1.18 =head3 A short note on callbacks
130    
131     A common issue that hits people is the problem of passing parameters
132     to callbacks. Programmers used to languages such as C or C++ are often
133     used to a style where one passes the address of a function (a function
134     reference) and some data value, e.g.:
135    
136     sub callback {
137     my ($arg) = @_;
138    
139     $arg->method;
140     }
141    
142     my $arg = ...;
143    
144     call_me_back_later \&callback, $arg;
145    
146     This is clumsy, as the place where behaviour is specified (when the
147     callback is registered) is often far away from the place where behaviour
148     is implemented. It also doesn't use Perl syntax to invoke the code. There
149     is also an abstraction penalty to pay as one has to I<name> the callback,
150     which often is unnecessary and leads to nonsensical or duplicated names.
151    
152     In Perl, one can specify behaviour much more directly by using
153     I<closures>. Closures are code blocks that take a reference to the
154     enclosing scope(s) when they are created. This means lexical variables in scope at the time
155     of creating the closure can simply be used inside the closure:
156    
157     my $arg = ...;
158    
159     call_me_back_later sub { $arg->method };
160    
Under most circumstances, closures are faster, use fewer resources and
result in much clearer code than the traditional approach. Faster,
because parameter passing and storing them in local variables in Perl
is relatively slow. Fewer resources, because closures take references to
existing variables without having to create new ones, and clearer code
because it is immediately obvious that the second example calls the
C<method> method when the callback is invoked.
168    
169     Apart from these, the strongest argument for using closures with AnyEvent
170     is that AnyEvent does not allow passing parameters to the callback, so
171     closures are the only way to achieve that in most cases :->
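
To make this concrete, here is a minimal sketch of closures used as
parameterless callbacks. The C<call_me_back_later> function is the
hypothetical registration function from the snippets above, implemented
here as a plain queue purely for illustration (it is not an AnyEvent API):

```perl
use strict;
use warnings;

# Hypothetical stand-in for an event framework: it only stores callbacks
# and invokes them later, without passing any arguments to them.
my @pending;
sub call_me_back_later { push @pending, $_[0] }

my @log;

for my $name (qw(alice bob)) {
   # each closure captures the lexical $name of its own loop iteration
   call_me_back_later (sub { push @log, "hello, $name" });
}

# "later": run the stored callbacks - no parameters are passed
$_->() for @pending;

print "$_\n" for @log;
```

Each callback remembers "its" C<$name> without the framework having to
store or pass any data on its behalf.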
172    
173    
174 root 1.19 =head3 A hint on debugging
175    
AnyEvent does, by default, not do any argument checking. This can lead to
strange and unexpected results, especially if you are still learning your
way around AnyEvent.
179    
180     AnyEvent supports a special "strict" mode, off by default, which does very
181     strict argument checking, at the expense of being somewhat slower. When
182     developing, however, this mode is very useful.
183    
184     You can enable this strict mode either by having an environment variable
185     C<PERL_ANYEVENT_STRICT> with a true value in your environment:
186    
187     PERL_ANYEVENT_STRICT=1 perl test.pl
188    
189     Or you can write C<use AnyEvent::Strict> in your program, which has the
190     same effect (do not do this in production, however).
191    
192    
193 root 1.2 =head2 Condition Variables
194    
Back to the I/O watcher example: The code is not yet a fully working
program, and will not work as-is. The reason is that your callback will
not be invoked out of the blue, you have to run the event loop. Also,
event-based programs sometimes have to block, too: when there simply is
nothing else to do and everything waits for some events, the program needs
to block the process as well.
201 root 1.2
202     In AnyEvent, this is done using condition variables. Condition variables
203     are named "condition variables" because they represent a condition that is
204     initially false and needs to be fulfilled.
205    
206 root 1.10 You can also call them "merge points", "sync points", "rendezvous ports"
207     or even callbacks and many other things (and they are often called like
208     this in other frameworks). The important point is that you can create them
209     freely and later wait for them to become true.
210 root 1.2
211     Condition variables have two sides - one side is the "producer" of the
212 root 1.18 condition (whatever code detects and flags the condition), the other side
213     is the "consumer" (the code that waits for that condition).
214 root 1.2
215     In our example in the previous section, the producer is the event callback
216 root 1.18 and there is no consumer yet - let's change that right now:
217 root 1.2
   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN,
      poll => "r",
      cb   => sub {
         $name = <STDIN>;
         $name_ready->send;
      }
   );

   # do something else here

   # now wait until the name is available:
   $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";
243    
244     This program creates an AnyEvent condvar by calling the C<<
245     AnyEvent->condvar >> method. It then creates a watcher as usual, but
246     inside the callback it C<send>'s the C<$name_ready> condition variable,
247     which causes anybody waiting on it to continue.
248    
249     The "anybody" in this case is the code that follows, which calls C<<
250     $name_ready->recv >>: The producer calls C<send>, the consumer calls
251     C<recv>.
252    
253     If there is no C<$name> available yet, then the call to C<<
254     $name_ready->recv >> will halt your program until the condition becomes
255     true.
256    
257     As the names C<send> and C<recv> imply, you can actually send and receive
258     data using this, for example, the above code could also be written like
259     this, without an extra variable to store the name in:
260    
   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
   );

   # do something else here

   # now wait and fetch the name
   my $name = $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";
280    
You can pass any number of arguments to C<send>, and every call to
C<recv> will return them.
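
As a small sketch of that behaviour (the values are made up for
illustration): the condition is sent I<before> anybody waits, so C<recv>
returns immediately and no event loop is involved. In list context
C<recv> returns all values, in scalar context only the first one:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# producer side: send two (made-up) values
$cv->send ("Alice", 42);

# consumer side: the condition is already true, so recv does not block
my ($name, $age) = $cv->recv;   # list context: all values
my $first        = $cv->recv;   # scalar context: only the first value

print "$name is $age, first value is $first\n";
```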
283    
284     =head2 The "main loop"
285    
286     Most event-based frameworks have something called a "main loop" or "event
287     loop run function" or something similar.
288    
Just like C<recv> in AnyEvent, these functions need to be called
eventually so that your event loop has a chance of actually looking for
those events you are interested in.
292    
293     For example, in a L<Gtk2> program, the above example could also be written
294     like this:
295    
296     use Gtk2 -init;
297     use AnyEvent;
298    
299     ############################################
300     # create a window and some label
301    
302     my $window = new Gtk2::Window "toplevel";
303     $window->add (my $label = new Gtk2::Label "soon replaced by name");
304    
305     $window->show_all;
306    
307     ############################################
308     # do our AnyEvent stuff
309    
310     $| = 1; print "enter your name> ";
311    
312     my $name_ready = AnyEvent->condvar;
313    
314     my $wait_for_input = AnyEvent->io (
315     fh => \*STDIN, poll => "r",
316     cb => sub {
317     # set the label
318     $label->set_text (scalar <STDIN>);
319     print "enter another name> ";
320     }
321     );
322    
323     ############################################
324     # Now enter Gtk2's event loop
325    
326     main Gtk2;
327    
328     No condition variable anywhere in sight - instead, we just read a line
329     from STDIN and replace the text in the label. In fact, since nobody
330     C<undef>'s C<$wait_for_input> you can enter multiple lines.
331    
332     Instead of waiting for a condition variable, the program enters the Gtk2
333     main loop by calling C<< Gtk2->main >>, which will block the program and
334     wait for events to arrive.
335    
336     This also shows that AnyEvent is quite flexible - you didn't have anything
337     to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
338     worked.
339    
Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application? But imagine that instead of
doing that, you would make an HTTP request in the background and display
its results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.
346    
347     In the next part you will see how to do just that - by implementing an
348     HTTP request, on our own, with the utility modules AnyEvent comes with.
349    
Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

The example using condition variables already did exactly that: it waited
on a condition variable with C<recv>. And in fact, this is the solution:
356    
357     my $quit_program = AnyEvent->condvar;
358    
359     # create AnyEvent watchers (or not) here
360    
361     $quit_program->recv;
362    
363     If any of your watcher callbacks decide to quit, they can simply call
364     C<< $quit_program->send >>. Of course, they could also decide not to and
365     simply call C<exit> instead, or they could decide not to quit, ever (e.g.
366     in a long-running daemon program).
367    
368     In that case, you can simply use:
369    
370     AnyEvent->condvar->recv;
371    
372     And this is, in fact, closest to the idea of a main loop run function that
373     AnyEvent offers.
374    
375     =head2 Timers and other event sources
376    
So far, we have only used I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
operating systems this also works for console windows/terminals (typically
on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on windows).
385    
386 root 1.10 However, I/O is not everything - the second most important event source is
387 root 1.2 the clock. For example when doing an HTTP request you might want to time
388 root 1.10 out when the server doesn't answer within some predefined amount of time.
389 root 1.2
390     In AnyEvent, timer event watchers are created by calling the C<<
391     AnyEvent->timer >> method:
392    
393     use AnyEvent;
394    
395     my $cv = AnyEvent->condvar;
396    
397     my $wait_one_and_a_half_seconds = AnyEvent->timer (
398     after => 1.5, # after how many seconds to invoke the cb?
399     cb => sub { # the callback to invoke
400     $cv->send;
401     },
402     );
403    
404 root 1.10 # can do something else here
405 root 1.2
406     # now wait till our time has come
407     $cv->recv;
408    
Unlike I/O watchers, timers are only interested in the number of seconds
they have to wait. When that amount of time has passed, AnyEvent will
invoke your callback.
412    
413     Unlike I/O watchers, which will call your callback as many times as there
414     is data available, timers are one-shot: after they have "fired" once and
415     invoked your callback, they are dead and no longer do anything.
416    
417     To get a repeating timer, such as a timer firing roughly once per second,
418     you have to recreate it:
419    
420     use AnyEvent;
421    
422     my $time_watcher;
423    
424     sub once_per_second {
425     print "tick\n";
426    
427     # (re-)create the watcher
428     $time_watcher = AnyEvent->timer (
429     after => 1,
430     cb => \&once_per_second,
431     );
432     }
433    
434     # now start the timer
435     once_per_second;
436    
437     Having to recreate your timer is a restriction put on AnyEvent that is
438     present in most event libraries it uses. It is so annoying that some
439 root 1.10 future version might work around this limitation, but right now, it's the
440 root 1.2 only way to do repeating timers.
441    
442     Fortunately most timers aren't really repeating but specify timeouts of
443     some sort.
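
A timeout can be sketched by letting an I/O watcher and a timer race for
the same condition variable. In this illustration a pipe that never
receives data stands in for a server that doesn't answer (an assumption
made to keep the example self-contained), and the 0.2 second limit is
arbitrary:

```perl
use strict;
use warnings;
use AnyEvent;

# stand-in for a silent server: nothing is ever written to this pipe
pipe (my $reader, my $writer) or die "pipe: $!";

my $done = AnyEvent->condvar;

# would fire if data arrived
my $io = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      sysread $reader, my $buf, 1024;
      $done->send ("data: $buf");
   },
);

# fires when we have waited long enough
my $timeout = AnyEvent->timer (
   after => 0.2,
   cb    => sub { $done->send ("timeout") },
);

my $result = $done->recv;

undef $io;      # neither watcher is needed any longer
undef $timeout;

print "$result\n";
```

Whichever watcher fires first decides the outcome. Undefining both
watchers afterwards makes sure the loser can no longer fire.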
444    
445     =head3 More esoteric sources
446    
447     AnyEvent also has some other, more esoteric event sources you can tap
448     into: signal and child watchers.
449    
Signal watchers can be used to wait for "signal events", which simply
means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child watchers wait for a child process to exit. They are useful when
you fork a separate process and need to know when it exits, but do not
want to block while waiting for that.
456    
457     Both watcher types are described in detail in the main L<AnyEvent> manual
458     page.
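
As a rough sketch of a signal watcher, assuming a Unix-like system where
C<SIGUSR1> exists: the program sends the signal to itself and waits for
the watcher to report it. The extra timer is only a safety net for this
illustration, so the program cannot hang if the signal never arrives:

```perl
use strict;
use warnings;
use AnyEvent;

my $got = AnyEvent->condvar;

# watch for SIGUSR1 (signal names are given without the "SIG" prefix)
my $sig_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got->send ("got SIGUSR1") },
);

# safety net for this example only
my $guard = AnyEvent->timer (
   after => 5,
   cb    => sub { $got->send ("timed out") },
);

# send the signal to ourselves
kill USR1 => $$;

my $outcome = $got->recv;

undef $sig_watcher;
undef $guard;

print "$outcome\n";
```

Child watchers follow the same pattern: C<< AnyEvent->child >> takes a
C<pid> and a callback, which receives the process ID and exit status.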
459    
460    
461     =head1 Network programming and AnyEvent
462    
463 root 1.3 So far you have seen how to register event watchers and handle events.
464 root 1.1
465 root 1.5 This is a great foundation to write network clients and servers, and might be
466 root 1.3 all that your module (or program) ever requires, but writing your own I/O
467     buffering again and again becomes tedious, not to mention that it attracts
468     errors.
469    
While the core L<AnyEvent> module is still small and self-contained,
the distribution comes with some very useful utility modules such as
L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.
474    
Here is a quick overview of these three modules:
476    
477     =head2 L<AnyEvent::DNS>
478    
479     This module allows fully asynchronous DNS resolution. It is used mainly by
480     L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
481     a great way to do other DNS resolution tasks, such as reverse lookups of
482     IP addresses for log files.
483 root 1.1
484 root 1.2 =head2 L<AnyEvent::Handle>
485 root 1.1
486     This module handles non-blocking IO on file handles in an event based
487     manner. It provides a wrapper object around your file handle that provides
488     queueing and buffering of incoming and outgoing data for you.
489    
490 root 1.4 It also implements the most common data formats, such as text lines, or
491     fixed and variable-width data blocks.
492 root 1.1
493 root 1.2 =head2 L<AnyEvent::Socket>
494 root 1.1
This module provides you with functions that handle socket creation
and IP address magic. The two main functions are C<tcp_connect> and
C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
connections.
500    
501     This module also comes with transparent IPv6 support, this means: If you
502     write your programs with this module, you will be IPv6 ready without doing
503 root 1.4 anything special.
504 root 1.1
505     It also works around a lot of portability quirks (especially on the
506     windows platform), which makes it even easier to write your programs in a
507 root 1.4 portable way (did you know that windows uses different error codes for all
508     socket functions and that Perl does not know about these? That "Unknown
509     error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
510     successful? That unsuccessful TCP connects might never be reported back
511     to your program? That C<WSAEINPROGRESS> means your C<connect> call was
512     ignored instead of being in progress? AnyEvent::Socket works around all of
513     these Windows/Perl bugs for you).
514    
515 root 1.11 =head2 Implementing a parallel finger client with non-blocking connects
516 root 1.16 and AnyEvent::Socket
517 root 1.4
518     The finger protocol is one of the simplest protocols in use on the
519     internet. Or in use in the past, as almost nobody uses it anymore.
520    
521     It works by connecting to the finger port on another host, writing a
522     single line with a user name and then reading the finger response, as
523     specified by that user. OK, RFC 1288 specifies a vastly more complex
524     protocol, but it basically boils down to this:
525    
526     # telnet idsoftware.com finger
527     Trying 192.246.40.37...
528     Connected to idsoftware.com (192.246.40.37).
529     Escape character is '^]'.
530     johnc
531     Welcome to id Software's Finger Service V1.5!
532    
533     [...]
534     Now on the web:
535     [...]
536    
537     Connection closed by foreign host.
538    
539 root 1.11 "Now on the web..." yeah, I<was> used indeed, but at least the finger
540     daemon still works, so let's write a little AnyEvent function that makes a
541     finger request:
542 root 1.4
   use AnyEvent;
   use AnyEvent::Socket;

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         # the callback receives the socket handle - or nothing
         my ($fh) = @_
            or return $cv->send;

         # now write the username
         syswrite $fh, "$user\015\012";

         my $response;

         # register a read watcher
         my $read_watcher; $read_watcher = AnyEvent->io (
            fh   => $fh,
            poll => "r",
            cb   => sub {
               my $len = sysread $fh, $response, 1024, length $response;

               if ($len <= 0) {
                  # we are done, or an error occurred, let's ignore the latter
                  undef $read_watcher; # no longer interested
                  $cv->send ($response); # send results
               }
            },
         );
      };

      # pass $cv to the caller
      $cv
   }
582    
583 root 1.11 That's a mouthful! Let's dissect this function a bit, first the overall
584     function and execution flow:
585 root 1.4
586     sub finger($$) {
587     my ($user, $host) = @_;
588    
589     # use a condvar to return results
590     my $cv = AnyEvent->condvar;
591    
592     # first, connect to the host
593     tcp_connect $host, "finger", sub {
594     ...
595     };
596    
597     $cv
598     }
599    
600 root 1.11 This isn't too complicated, just a function with two parameters, that
601 root 1.4 creates a condition variable, returns it, and while it does that,
602 root 1.11 initiates a TCP connect to C<$host>. The condition variable will be used
603     by the caller to receive the finger response, but one could equally well
604     pass a third argument, a callback, to the function.
605 root 1.4
606 root 1.11 Since we are programming event'ish, we do not wait for the connect to
607     finish - it could block the program for a minute or longer!
608    
609     Instead, we pass the callback it should invoke when the connect is done to
610     C<tcp_connect>. If it is successful, that callback gets called with the
611 root 1.5 socket handle as first argument, otherwise, nothing will be passed to our
612 root 1.11 callback. The important point is that it will always be called as soon as
613     the outcome of the TCP connect is known.
614    
615     This style of programming is also called "continuation style": the
616     "continuation" is simply the way the program continues - normally, a
617     program continues at the next line after some statement (the exception
618     is loops or things like C<return>). When we are interested in events,
619     however, we instead specify the "continuation" of our program by passing a
620     closure, which makes that closure the "continuation" of the program. The
621     C<tcp_connect> call is like saying "return now, and when the connection is
622     established or it failed, continue there".
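
The following sketch illustrates continuation style without any
networking. C<delayed_double> is a hypothetical function invented for
this example: it "computes" its result after a short delay and hands it
to the closure it was given, instead of returning it:

```perl
use strict;
use warnings;
use AnyEvent;

# Hypothetical asynchronous function: passes 2 * $n to the continuation
# $cb after a short delay, and returns to its caller immediately.
sub delayed_double {
   my ($n, $cb) = @_;

   my $timer; $timer = AnyEvent->timer (
      after => 0.05,
      cb    => sub {
         undef $timer;     # one-shot timer, no longer needed
         $cb->($n * 2);    # continue the program "there"
      },
   );

   return;
}

my $done = AnyEvent->condvar;
my @steps;

delayed_double (4, sub {
   my ($result) = @_;
   push @steps, $result;

   # the continuation can itself start the next asynchronous step
   delayed_double ($result, sub {
      push @steps, $_[0];
      $done->send;
   });
});

# this runs before any continuation, as delayed_double returns at once
push @steps, "registered";

$done->recv;

print "@steps\n";
```

Note how the order of execution differs from the order in the source
text: the line after the first C<delayed_double> call runs before either
continuation does.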
623 root 1.4
624 root 1.11 Now let's look at the callback/closure in more detail:
625 root 1.4
626 root 1.11 # the callback receives the socket handle - or nothing
627 root 1.4 my ($fh) = @_
628     or return $cv->send;
629    
630 root 1.5 The first thing the callback does is indeed save the socket handle in
631     C<$fh>. When there was an error (no arguments), then our instinct as
632 root 1.11 expert Perl programmers would tell us to C<die>:
633 root 1.4
634     my ($fh) = @_
635     or die "$host: $!";
636 root 1.1
637 root 1.11 While this would give good feedback to the user (if he happens to watch
638     standard error), our program would probably stop working here, as we never
639     report the results to anybody, certainly not the caller of our C<finger>
640     function, and most event loops continue even after a C<die>!
641    
642     This is why we instead C<return>, but also call C<< $cv->send >> without
643     any arguments to signal to the condvar consumer that something bad has
644     happened. The return value of C<< $cv->send >> is irrelevant, as is the
645     return value of our callback. The return statement is simply used for the
646     side effect of, well, returning immediately from the callback. Checking
647     for errors and handling them this way is very common, which is why this
648     compact idiom is so handy.
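
Seen from the consumer side, this convention might look as follows.
C<fake_request> is a made-up stand-in for C<finger> that invokes a
C<tcp_connect>-style callback right away, either with a fake "handle" or
with no arguments at all, so the sketch runs without a network:

```perl
use strict;
use warnings;
use AnyEvent;

# Hypothetical stand-in for finger, for illustration only.
sub fake_request {
   my ($succeed) = @_;

   my $cv = AnyEvent->condvar;

   my $on_connect = sub {
      # same idiom as in the finger function
      my ($fh) = @_
         or return $cv->send;      # failure: send no values

      $cv->send ("response via $fh");
   };

   # pretend the connect attempt has finished
   $on_connect->($succeed ? ("fake-handle") : ());

   $cv
}

my $ok     = fake_request (1)->recv;
my $failed = fake_request (0)->recv;

print "ok: $ok\n";
print defined $failed ? "unexpected response\n" : "request failed\n";
```

The consumer never hangs: it either gets the response, or an undefined
value it can test for.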
649 root 1.4
650     As the next step in the finger protocol, we send the username to the
651     finger daemon on the other side of our connection:
652    
653     syswrite $fh, "$user\015\012";
654    
655 root 1.11 Note that this isn't 100% clean socket programming - the socket could,
656     for whatever reasons, not accept our data. When writing a small amount
657     of data like in this example it doesn't matter, as a socket buffer is
658     almost always big enough for a mere "username", but for real-world
659     cases you might need to implement some kind of write buffering - or use
660     L<AnyEvent::Handle>, which handles these matters for you, as shown in the
661     next section.
662 root 1.4
663 root 1.11 What we I<do> have to do is to implement our own read buffer - the response
664 root 1.4 data could arrive late or in multiple chunks, and we cannot just wait for
665     it (event-based programming, you know?).
666    
667     To do that, we register a read watcher on the socket which waits for data:
668    
669     my $read_watcher; $read_watcher = AnyEvent->io (
670     fh => $fh,
671     poll => "r",
672    
673     There is a trick here, however: the read watcher isn't stored in a global
674     variable, but in a local one - if the callback returns, it would normally
675     destroy the variable and its contents, which would in turn unregister our
676 root 1.5 watcher.
677 root 1.4
To avoid that, we C<undef>ine the variable in the watcher callback. This
means that, when the C<tcp_connect> callback returns, perl thinks
(quite correctly) that the read watcher is still in use - namely in the
callback.
682    
683 root 1.11 The trick, however, is that instead of:
684    
685     my $read_watcher = AnyEvent->io (...
686    
687     The program does:
688    
689     my $read_watcher; $read_watcher = AnyEvent->io (...
690    
691     The reason for this is a quirk in the way Perl works: variable names
692     declared with C<my> are only visible in the I<next> statement. If the
693     whole C<< AnyEvent->io >> call, including the callback, would be done in
694     a single statement, the callback could not refer to the C<$read_watcher>
695     variable to undefine it, so it is done in two statements.
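
You can check this behaviour with plain Perl. In the following sketch
(variable names are invented for illustration), the two-statement form
lets the closure see its own variable, while the one-statement form
fails to compile under C<use strict>, which is tested here via a string
C<eval>:

```perl
use strict;
use warnings;

# two statements: $self_ref is already declared when the closure is compiled
my $self_ref; $self_ref = sub { defined $self_ref ? "visible" : "not visible" };

my $two_stmt = $self_ref->();

# one statement: inside the sub, $one_stmt is not yet in scope,
# so strict refuses to compile it
my $compiled = eval 'use strict; my $one_stmt = sub { $one_stmt }; 1';
my $error    = $@;

# the closure refers to the variable that holds it (a reference cycle),
# so break the cycle explicitly, just like the finger callback does
undef $self_ref;

print "two statements: $two_stmt\n";
print "one statement:  ", ($compiled ? "compiled" : "failed to compile"), "\n";
```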
696    
697     Whether you'd want to format it like this is of course a matter of style,
698     this way emphasizes that the declaration and assignment really are one
699     logical statement.
700    
701 root 1.4 The callback itself calls C<sysread> for as many times as necessary, until
702 root 1.11 C<sysread> returns either an error or end-of-file:
703 root 1.4
704     cb => sub {
705     my $len = sysread $fh, $response, 1024, length $response;
706    
707     if ($len <= 0) {
708    
709     Note that C<sysread> has the ability to append data it reads to a scalar,
710 root 1.11 by specifying an offset, which is what we make good use of in this
711     example.
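
Here is a small stand-alone sketch of that appending read, using a pipe
instead of a socket and a deliberately tiny read size of 4 bytes, so that
several C<sysread> calls are needed. Unlike the finger function, it
initialises the buffer to an empty string and treats errors separately:

```perl
use strict;
use warnings;

pipe (my $reader, my $writer) or die "pipe: $!";

# write the data in two chunks, then signal end-of-file
syswrite $writer, "hello, ";
syswrite $writer, "world";
close $writer;

my $response = "";

while (1) {
   # read at most 4 bytes and store them at the end of $response
   my $len = sysread $reader, $response, 4, length $response;

   die "sysread: $!" unless defined $len;   # error
   last if $len == 0;                       # end-of-file
}

print "$response\n";
```

Without the offset argument, each C<sysread> call would overwrite
C<$response> and only the last chunk would survive.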
712 root 1.4
When C<sysread> indicates we are done, the callback C<undef>ines
the watcher and then C<send>s the response data to the condition
variable. All this has the following effects:

Undefining the watcher destroys it, as our callback was the only one still
holding a reference to it. When the watcher gets destroyed, it destroys the
callback, which in turn means the C<$fh> handle is no longer used, so that
gets destroyed as well. The result is that all resources will be nicely
cleaned up by perl for us.

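The following sketch imitates this chain of destruction with a toy class.
Nothing in it is part of AnyEvent: C<Toy::Watcher>, its C<fire> method and
the C<$event_loop> variable are invented here, and the weak reference merely
stands in for the event loop, which can reach a watcher without owning it.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

{
   package Toy::Watcher;             # hypothetical stand-in for a watcher

   sub new     { my ($class, %arg) = @_; bless {%arg}, $class }
   sub fire    { my $cb = $_[0]{cb}; $cb->() }   # no strong copy of the object
   sub DESTROY { push @main::log, "watcher destroyed" }
}

our @log;
my $event_loop;                      # weak reference, does not keep it alive

sub start {
   my $watcher; $watcher = Toy::Watcher->new (cb => sub {
      push @log, "callback ran";
      undef $watcher;                # last strong reference goes away here
   });

   $event_loop = $watcher;
   weaken $event_loop;
}

start ();
push @log, "start returned";         # the watcher survived the return

$event_loop->fire;
push @log, defined $event_loop ? "still alive" : "weak reference cleared";

print "$_\n" for @log;
```

The watcher survives the return from C<start> only because the callback
still refers to C<$watcher>; once the callback undefines it, everything is
freed.
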
=head3 Using the finger client

Now, we could probably write the same finger client in a simpler way if
we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
ignored IPv6 and a few other things that C<tcp_connect> handles for us.

But the main advantage is that we can not only run this finger function in
the background, we can even run multiple sessions in parallel, like this:

   my $f1 = finger "trouble", "noc.dfn.de";     # check for trouble tickets
   my $f2 = finger "1736"   , "noc.dfn.de";     # fetch ticket 1736
   my $f3 = finger "johnc"  , "idsoftware.com"; # finger john

   print "trouble tickets:\n", $f1->recv, "\n";
   print "trouble ticket #1736:\n", $f2->recv, "\n";
   print "john carmack's finger file: ", $f3->recv, "\n";

It doesn't look like it, but in fact all three requests run in
parallel. The code waits for the first finger request to finish first, but
that doesn't keep it from executing them in parallel: when the first C<recv>
call sees that the data isn't ready yet, it serves events for all three
requests automatically, until the first request has finished.

The second C<recv> call might either find the data is already there, or it
will continue handling events until that is the case, and so on.

By taking advantage of network latencies, which allows us to serve other
requests and events while we wait for an event on one socket, the overall
time to do these three requests will be greatly reduced: typically all
three are done in the same time as the slowest of them would need to finish.

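If you have no finger servers at hand, the effect can be reproduced with
timers. In this sketch, C<slow_answer> is a hypothetical stand-in for
C<finger> that simply "answers" after a given delay; the delays are
arbitrary:

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes qw(time);

# hypothetical stand-in for finger: delivers its answer after $delay seconds
sub slow_answer {
   my ($name, $delay) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (after => $delay, cb => sub {
      undef $timer;
      $cv->send ("$name done");
   });

   $cv
}

my $start = time;

# all three "requests" are started before we wait for any of them
my $r1 = slow_answer "first" , 0.6;
my $r2 = slow_answer "second", 0.4;
my $r3 = slow_answer "third" , 0.2;

my @answers = ($r1->recv, $r2->recv, $r3->recv);
my $elapsed = time - $start;

printf "%s (%.2f seconds)\n", (join ", ", @answers), $elapsed;
```

Run one after the other, the three waits would add up to about 1.2
seconds; here the total should be close to the slowest one.
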
By the way, you do not actually have to wait in the C<recv> method on an
AnyEvent condition variable - after all, waiting is evil - you can also
register a callback:

   $cv->cb (sub {
      my $response = shift->recv;
      # ...
   });

The callback will only be invoked when C<send> was called. In fact,
instead of returning a condition variable you could also pass a third
parameter to your finger function, the callback to invoke with the
response:

   sub finger($$$) {
      my ($user, $host, $cb) = @_;

How you implement it is a matter of taste - if you expect your function to
be used mainly in an event-based program you would normally prefer to pass
a callback directly. If you write a module and expect your users to use
it "synchronously" often (for example, a simple http-get script would not
really care much for events), then you would use a condition variable and
tell them "simply ->recv the data".

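Nothing stops you from offering both styles at once. The sketch below
assumes a made-up helper, C<delayed_value>, that always returns a condition
variable and additionally accepts an optional callback:

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical helper: delivers $value after $delay seconds
sub delayed_value {
   my ($value, $delay, $cb) = @_;

   my $cv = AnyEvent->condvar;

   # optional callback: invoked once send has been called
   $cv->cb (sub { $cb->(shift->recv) }) if $cb;

   my $timer; $timer = AnyEvent->timer (after => $delay, cb => sub {
      undef $timer;
      $cv->send ($value);
   });

   $cv
}

# event-based style: pass a callback, ignore the return value
my $got_by_callback;
delayed_value "via callback", 0.1, sub { $got_by_callback = shift };

# "synchronous" style: simply ->recv the data
my $got_by_recv = delayed_value ("via recv", 0.2)->recv;

print "$got_by_callback\n$got_by_recv\n";
```

The callback of the first request runs while the second one is waiting in
C<recv>, because C<recv> serves all pending events.
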
=head3 Problems with the implementation and how to fix them

To make this example more real-world-ready, we would not only implement
some write buffering (for the paranoid), but we would also have to handle
timeouts and maybe protocol errors.

Doing this quickly gets unwieldy, which is why we introduce
L<AnyEvent::Handle> in the next section, which takes care of all these
details for you and lets you concentrate on the actual protocol.


=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle

The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
see what it really offers.

As finger is such a simple protocol, let's try something slightly more
complicated: HTTP/1.0.

An HTTP GET request works by sending a single request line that indicates
what you want the server to do and the URI you want it to act on, followed
by as many "header" lines (C<Header: data>, same as e-mail headers) as
required for the request, ended by an empty line.

The response is formatted very similarly: first a line with the response
status, then again as many header lines as required, then an empty line,
followed by any data that the server might send.

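The format is simple enough to take apart with a few lines of plain Perl.
The following sketch builds a request and splits a canned response into its
three parts; the C<Host> header and the response text are sample data chosen
for this illustration:

```perl
use strict;
use warnings;

# request line, one (illustrative) header, then the empty line
my $request = join "\015\012",
   "GET /test HTTP/1.0",
   "Host: www.example.com",
   "", "";

# a canned response, as a server might send it
my $raw = "HTTP/1.0 404 Not Found\015\012"
        . "Date: Mon, 02 Jun 2008 07:05:54 GMT\015\012"
        . "Content-Type: text/html; charset=UTF-8\015\012"
        . "\015\012"
        . "<html><head>";

# the first empty line separates status and headers from the data
my ($head, $body) = split /\015\012\015\012/, $raw, 2;

# the first line is the status, the rest are "Header: data" lines
my ($status, @header_lines) = split /\015\012/, $head;
my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @header_lines;

print "status: $status\n";
print "type:   $header{'content-type'}\n";
print "body:   $body\n";
```
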
Again, let's try it out with C<telnet> (I condensed the output a bit - if
you want to see the full response, do it yourself).

   # telnet www.google.com 80
   Trying 209.85.135.99...
   Connected to www.google.com (209.85.135.99).
   Escape character is '^]'.
   GET /test HTTP/1.0

   HTTP/1.0 404 Not Found
   Date: Mon, 02 Jun 2008 07:05:54 GMT
   Content-Type: text/html; charset=UTF-8

   <html><head>
   [...]
   Connection closed by foreign host.

The C<GET ...> and the empty line were entered manually, the rest of the
telnet output is google's response, in this case a C<404 Not Found> one.

So, here is how you would do it with C<AnyEvent::Handle>:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         # no arguments means the connect failed: report it and stop here
         my ($fh) = @_
            or return $cb->("HTTP/1.0 500 $!");

         # store results here
         my ($response, $header, $body);

         my $handle; $handle = new AnyEvent::Handle
            fh       => $fh,
            on_error => sub {
               undef $handle;
               $cb->("HTTP/1.0 500 $!");
            },
            on_eof   => sub {
               undef $handle; # keep it alive till eof
               $cb->($response, $header, $body);
            };

         $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

         # now fetch response status line
         $handle->push_read (line => sub {
            my ($handle, $line) = @_;
            $response = $line;
         });

         # then the headers
         $handle->push_read (line => "\015\012\015\012", sub {
            my ($handle, $line) = @_;
            $header = $line;
         });

         # and finally handle any remaining data as body
         $handle->on_read (sub {
            $body .= $_[0]->rbuf;
            $_[0]->rbuf = "";
         });
      };
   }

And now let's go through it step by step. First, as usual, the overall
C<http_get> function structure:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         ...
      };
   }

Unlike in the finger example, this time the caller has to pass a callback
to C<http_get>. Also, instead of passing a URL as one would expect, the
caller has to provide the hostname and URI - normally you would use the
C<URI> module to parse a URL and separate it into those parts, but that is
left to the inspired reader :)

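For the less inspired, here is one possible starting point that avoids the
C<URI> module. It is only a sketch: C<split_url> is a made-up helper, and the
regex assumes plain C<http> or C<https> URLs without user names, passwords or
IPv6 address literals.

```perl
use strict;
use warnings;

# hypothetical helper: returns (scheme, host, service, uri) or an empty list
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $port, $uri) =
      $url =~ m{^(https?)://([^/:?#]+)(?::(\d+))?([^#]*)}i
         or return;

   $scheme = lc $scheme;
   $uri    = "/$uri" unless $uri =~ m{^/};   # "" becomes "/", "?q" becomes "/?q"

   # without an explicit port, the scheme name doubles as the service name
   ($scheme, $host, defined $port ? $port : $scheme, $uri)
}

my @parts = split_url "http://www.google.com/test";

print join (" ", @parts), "\n";
```

The host and the URI could then be handed to C<http_get>, and the service
to C<tcp_connect>.
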
Since everything else is left to the caller, all C<http_get> does is
initiate the connection with C<tcp_connect> and leave everything else to
its callback.

The first thing the callback does is check for connection errors and
declare some variables:

   my ($fh) = @_
      or return $cb->("HTTP/1.0 500 $!");

   my ($response, $header, $body);

Instead of having an extra mechanism to signal errors, connection errors
are signalled by crafting a special "response status line", like this:

   HTTP/1.0 500 Connection refused

This means the caller cannot distinguish (easily) between
locally-generated errors and server errors, but it simplifies error
handling for the caller a lot.

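On the caller's side, a single piece of code can then handle both kinds of
status line. A minimal sketch, with C<parse_status> being a made-up helper
name:

```perl
use strict;
use warnings;

# hypothetical helper: returns (version, code, reason) or an empty list
sub parse_status {
   my ($line) = @_;

   my ($version, $code, $reason) =
      $line =~ m{^HTTP/(\d+\.\d+) \s+ (\d{3}) (?: \s+ (.*) )? $}x
         or return;

   ($version, $code, defined $reason ? $reason : "")
}

for my $line ("HTTP/1.0 500 Connection refused", "HTTP/1.0 404 Not Found") {
   my ($version, $code, $reason) = parse_status $line
      or next;

   # the caller treats local and remote failures alike
   print $code >= 400 ? "failed" : "ok", ": $code $reason\n";
}
```
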
The next step finally involves L<AnyEvent::Handle>, namely it creates the
handle object:

   my $handle; $handle = new AnyEvent::Handle
      fh       => $fh,
      on_error => sub {
         undef $handle;
         $cb->("HTTP/1.0 500 $!");
      },
      on_eof   => sub {
         undef $handle; # keep it alive till eof
         $cb->($response, $header, $body);
      };

The constructor expects a file handle, which gets passed via the C<fh>
argument.

The remaining two argument pairs specify two callbacks to be called on
any errors (C<on_error>) and in the case of a normal connection close
(C<on_eof>).

In the first case, we C<undef>ine the handle object and pass the error to
the callback provided by the caller - done.

In the second case we assume everything went fine and pass the results
gobbled up so far to the caller-provided callback. This is not quite
perfect, as when the server "cleanly" closes the connection in the middle
of sending headers we might wrongly report this as an "OK" to the caller,
but then, HTTP doesn't support a perfect mechanism that would detect such
problems in all cases, so we don't bother either.

=head3 The write queue

The next line sends the actual request:

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

No headers will be sent (this is fine for simple requests), so the whole
request is just a single line followed by an empty line to signal the end
of the headers to the server.

The more interesting question is why the method is called C<push_write>
and not just C<write>. The reason is that you can I<always> add some write
data without blocking, and to do this, AnyEvent::Handle needs some write
queue internally - and C<push_write> simply pushes some data at the end of
that queue, just like Perl's C<push> pushes data at the end of an array.

The deeper reason is that at some point in the future, there might
be C<unshift_write> as well, and in any case, we will shortly meet
C<push_read> and C<unshift_read>, and it's usually easiest if all those
functions have some symmetry in their name.

If C<push_write> is called with more than one argument, then you can even
do I<formatted> I/O, which simply means your data will be transformed in
some ways. For example, this would JSON-encode your data before pushing it
to the write queue:

   $handle->push_write (json => [1, 2, 3]);

Apart from that, this pretty much summarises the write queue, there is
little else to it.

Reading the response is far more interesting:

=head3 The read queue

The response consists of three parts: a single line of response status, a
single paragraph of headers ended by an empty line, and the response body,
which is simply the remaining data on that connection.

For the first two, we push two read requests onto the read queue:

   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });

While one can simply push a single callback to the queue, I<formatted> I/O
really comes to our advantage here, as there is a ready-made "read line"
read type. The first read expects a single line, ended by C<\015\012> (the
standard end-of-line marker in internet protocols).

The second "line" is actually a single paragraph - instead of reading it
line by line we tell C<push_read> that the end-of-line marker is really
C<\015\012\015\012>, which is an empty line. The result is that the whole
header paragraph will be treated as a single line and read. The word
"line" is interpreted very freely, much like Perl itself does it.

Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.

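You can try the two read requests without a web server by connecting two
handles through a socket pair. This is a sketch under a few assumptions: it
uses C<portable_socketpair> from L<AnyEvent::Util>, a canned response written
by a second handle that plays the server, and the three-argument form of the
C<on_error> callback found in current versions of AnyEvent::Handle.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util qw(portable_socketpair);

my ($client_fh, $server_fh) = portable_socketpair ()
   or die "socketpair failed: $!";

my $done = AnyEvent->condvar;
my ($status, $header);

# the "server" end only writes a canned response
my $server = AnyEvent::Handle->new (
   fh       => $server_fh,
   on_error => sub { $done->croak ("server: $_[2]") },
);

my $client = AnyEvent::Handle->new (
   fh       => $client_fh,
   on_error => sub { $done->croak ("client: $_[2]") },
);

$server->push_write (
   "HTTP/1.0 200 OK\015\012"
   . "Content-Type: text/plain\015\012"
   . "X-Demo: 1\015\012"
   . "\015\012"
   . "body"
);

# both requests are queued right away and served in order
$client->push_read (line => sub { $status = $_[1] });
$client->push_read (line => "\015\012\015\012", sub {
   $header = $_[1];
   $done->send;
});

$done->recv;

print "status: $status\n";
print "header paragraph:\n$header\n";
```

The end-of-line markers themselves are not part of the data passed to the
callbacks, so C<$header> contains the two header lines without the trailing
empty line.
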
There is, however, no read type for "the remaining data". For that, we
install our own C<on_read> callback:

   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });

This callback is invoked every time data arrives and the read queue is
empty - which in this example will only be the case when both response and
header have been read. The C<on_read> callback could actually have been
specified when constructing the object, but doing it this way preserves
logical ordering.

The read callback simply adds the current read buffer to its C<$body>
variable and, most importantly, I<empties> it by assigning the empty string
to it.

After AnyEvent::Handle has been so instructed, it will now handle incoming
data according to these instructions - if all goes well, the callback will
be invoked with the response data, if not, it will get an error.

In general, you get pipelining very easily with AnyEvent::Handle: If
you have a protocol with a request/response structure, your request
methods/functions will all look like this (simplified):

   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }

=head3 Using it

And here is how you would use it:

   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };

And of course, you can run as many of these requests in parallel as you
want (and your memory supports).

=head3 HTTPS

Now, as promised, let's implement the same thing for HTTPS, or more
correctly, let's change our C<http_get> function into a function that
speaks HTTPS instead.

HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
B<S>ecurity is the official name for what most people refer to as C<SSL>)
that contains standard HTTP protocol exchanges. The other difference to
HTTP is that it uses port C<443> instead of port C<80>.

To implement these two differences we need two tiny changes. First, in the
C<tcp_connect> call we replace C<http> by C<https>:

   tcp_connect $host, "https", sub { ...

The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
C<tls> parameter to the call to C<AnyEvent::Handle::new>:

   tls => "connect",

Specifying C<tls> enables TLS, and the argument specifies whether
AnyEvent::Handle is the server side ("accept") or the client side
("connect") for the TLS connection, as unlike TCP, there is a clear
server/client relationship in TLS.

That's all.

Of course, all this should be handled transparently by C<http_get> after
parsing the URL. See the part about exercising your inspiration earlier in
this document.

=head3 The read queue - revisited

HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.

For example, in SMTP, you normally get a single response line:

   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>

But SMTP also supports multi-line responses:

   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours

To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.

This is useful for this situation: You push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader,
like this:

   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);

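The continuation test itself can be tried without any network connection.
The sketch below feeds a few sample lines (the last one is made up for this
illustration) through the same C</^...-/> check and groups them into
complete responses:

```perl
use strict;
use warnings;

# single quotes, so that @example is not interpolated
my @lines = (
   '220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>',
   '220-hey guys',
   '220 my response is longer than yours',
   '250 ok',
);

my @responses;
my $current = "";

for my $line (@lines) {
   $current .= "$line\n";

   # "-" as 4th character: more lines of the same response follow
   next if $line =~ /^...-/;

   # otherwise the response is complete
   push @responses, $current;
   $current = "";
}

printf "%d responses, the first one has %d lines\n",
   scalar @responses, $responses[0] =~ tr/\n//;
```
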
This recipe can be used for all similar parsing problems, for example in
NNTP, the response code to some commands indicates that more data will be
sent:

   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body);
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });

=head3 Your own read queue handler

Sometimes your protocol doesn't play nice and doesn't use lines or chunks
of data in a form the ready-made read types can handle, in which case you
have to implement your own read parser.

To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.

To implement this, you would C<push_read> (or C<unshift_read>) just a
single code reference.

This code reference will then be called each time there is (new) data
available in the read buffer, and is expected to either eat/consume some
of that data (and return true) or to return false to indicate that it
wants to be called again.

If the code reference returns true, then it will be removed from the read
queue, otherwise it stays in front of it.

The example above could be coded like this:

   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });


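To watch such a handler cope with data that arrives in pieces, here is the
same regex applied to a plain string that grows chunk by chunk. The
variables C<$rbuf> and C<@found> and the function C<try_parse> are made up
for this sketch and merely imitate the read buffer and the handler:

```perl
use strict;
use warnings;

my $rbuf = "";      # stands in for the handle's read buffer
my @found;

sub try_parse {
   # even number of characters + ":", removed from the buffer on a match
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;    # not there yet, call me again

   push @found, $1;

   1                # done
}

my @results;

for my $chunk ("ab", "c", "d:rest") {
   $rbuf .= $chunk;
   push @results, try_parse () ? "done" : "wait";
}

print "results: @results\n";
print "found: @found, left in buffer: $rbuf\n";
```

The first two calls return false because the colon has not arrived yet;
the third one consumes C<abcd:> and leaves the rest in the buffer for
whatever read request comes next.
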
=head1 Authors

Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
