ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.25
Committed: Sat Jul 25 02:46:37 2009 UTC (14 years, 11 months ago) by root
Branch: MAIN
CVS Tags: rel-4_91, rel-5_112, rel-5_21, rel-5_22, rel-5_23, rel-5_1, rel-5_0, rel-5_2, rel-5_201, rel-5_202, rel-5_111, rel-4_881, rel-4_9, rel-5_01, rel-4_88, rel-5_11, rel-5_12
Changes since 1.24: +45 -39 lines
Log Message:
yeah

File Contents

# Content
1 =head1 NAME
2
3 AnyEvent::Intro - an introductory tutorial to AnyEvent
4
5 =head1 Introduction to AnyEvent
6
7 This is a tutorial that will introduce you to the features of AnyEvent.
8
9 The first part introduces the core AnyEvent module (after swamping you a
10 bit in evangelism), which might already provide all you ever need: If you
11 are only interested in AnyEvent's event handling capabilities, read no
12 further.
13
14 The second part focuses on network programming using sockets, for which
15 AnyEvent offers a lot of support you can use, and a lot of workarounds
16 around portability quirks.
17
18
19 =head1 What is AnyEvent?
20
21 If you don't care for the whys and want to see code, skip this section!
22
23 AnyEvent is first of all just a framework to do event-based
24 programming. Typically such frameworks are an all-or-nothing thing: If you
25 use one such framework, you can't (easily, or even at all) use another in
26 the same program.
27
28 AnyEvent is different - it is a thin abstraction layer on top of other of
29 event loops, just like DBI is an abstraction of many different database
30 APIs. Its main purpose is to move the choice of the underlying framework
31 (the event loop) from the module author to the program author using the
32 module.
33
34 That means you can write code that uses events to control what it
35 does, without forcing other code in the same program to use the same
36 underlying framework as you do - i.e. you can create a Perl module
37 that is event-based using AnyEvent, and users of that module can still
38 choose between using L<Gtk2>, L<Tk>, L<Event> (or run inside Irssi or
39 rxvt-unicode) or any other supported event loop. AnyEvent even comes with
40 its own pure-perl event loop implementation, so your code works regardless
41 of other modules that might or might not be installed. The latter is
42 important, as AnyEvent does not have any hard dependencies to other
43 modules, which makes it easy to install, for example, when you lack a C
44 compiler. No mater what environment, AnyEvent will just cope with it.
45
46 A typical limitation of existing Perl modules such as L<Net::IRC> is that
47 they come with their own event loop: In L<Net::IRC>, the program who uses
48 it needs to start the event loop of L<Net::IRC>. That means that one
49 cannot integrate this module into a L<Gtk2> GUI for instance, as that
50 module, too, enforces the use of its own event loop (namely L<Glib>).
51
52 Another example is L<LWP>: it provides no event interface at all. It's
53 a pure blocking HTTP (and FTP etc.) client library, which usually means
54 that you either have to start another process or have to fork for a HTTP
55 request, or use threads (e.g. L<Coro::LWP>), if you want to do something
56 else while waiting for the request to finish.
57
58 The motivation behind these designs is often that a module doesn't want
59 to depend on some complicated XS-module (Net::IRC), or that it doesn't
60 want to force the user to use some specific event loop at all (LWP), out
61 of fear of severly limiting the usefulness of the module: If your module
62 requires Glib, it will not run in a Tk program.
63
64 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to
65 either:
66
67 =over 4
68
69 =item - write their own event loop (because it guarantees the availability
70 of an event loop everywhere - even on windows with no extra modules
71 installed).
72
73 =item - choose one specific event loop (because AnyEvent works with most
74 event loops available for Perl).
75
76 =back
77
78 If the module author uses L<AnyEvent> for all his (or her) event needs
79 (IO events, timers, signals, ...) then all other modules can just use
80 his module and don't have to choose an event loop or adapt to his event
81 loop. The choice of the event loop is ultimately made by the program
82 author who uses all the modules and writes the main program. And even
83 there he doesn't have to choose, he can just let L<AnyEvent> choose the
84 most efficient event loop available on the system.
85
86 Read more about this in the main documentation of the L<AnyEvent> module.
87
88
89 =head1 Introduction to Event-Based Programming
90
91 So what exactly is programming using events? It quite simply means that
92 instead of your code actively waiting for something, such as the user
93 entering something on STDIN:
94
95 $| = 1; print "enter your name> ";
96
97 my $name = <STDIN>;
98
99 You instead tell your event framework to notify you in the event of some
100 data being available on STDIN, by using a callback mechanism:
101
102 use AnyEvent;
103
104 $| = 1; print "enter your name> ";
105
106 my $name;
107
108 my $wait_for_input = AnyEvent->io (
109 fh => \*STDIN, # which file handle to check
110 poll => "r", # which event to wait for ("r"ead data)
111 cb => sub { # what callback to execute
112 $name = <STDIN>; # read it
113 }
114 );
115
116 # do something else here
117
118 Looks more complicated, and surely is, but the advantage of using events
119 is that your program can do something else instead of waiting for input
120 (side note: combining AnyEvent with a thread package such as Coro can
121 recoup much of the simplicity, effectively getting the best of two
122 worlds).
123
124 Waiting as done in the first example is also called "blocking" the process
125 because you "block"/keep your process from executing anything else while
126 you do so.
127
128 The second example avoids blocking by only registering interest in a read
129 event, which is fast and doesn't block your process. Only when read data
130 is available will the callback be called, which can then proceed to read
131 the data.
132
133 The "interest" is represented by an object returned by C<< AnyEvent->io
134 >> called a "watcher" object - called like that because it "watches" your
135 file handle (or other event sources) for the event you are interested in.
136
137 In the example above, we create an I/O watcher by calling the C<<
138 AnyEvent->io >> method. Disinterest in some event is simply expressed
139 by forgetting about the watcher, for example, by C<undef>'ing the only
140 variable it is stored in. AnyEvent will automatically clean up the watcher
141 if it is no longer used, much like Perl closes your file handles if you no
142 longer use them anywhere.
143
144 =head3 A short note on callbacks
145
146 A common issue that hits people is the problem of passing parameters
147 to callbacks. Programmers used to languages such as C or C++ are often
148 used to a style where one passes the address of a function (a function
149 reference) and some data value, e.g.:
150
151 sub callback {
152 my ($arg) = @_;
153
154 $arg->method;
155 }
156
157 my $arg = ...;
158
159 call_me_back_later \&callback, $arg;
160
161 This is clumsy, as the place where behaviour is specified (when the
162 callback is registered) is often far away from the place where behaviour
163 is implemented. It also doesn't use Perl syntax to invoke the code. There
164 is also an abstraction penalty to pay as one has to I<name> the callback,
165 which often is unnecessary and leads to nonsensical or duplicated names.
166
167 In Perl, one can specify behaviour much more directly by using
168 I<closures>. Closures are code blocks that take a reference to the
169 enclosing scope(s) when they are created. This means lexical variables in
170 scope at the time of creating the closure can simply be used inside the
171 closure:
172
173 my $arg = ...;
174
175 call_me_back_later sub { $arg->method };
176
177 Under most circumstances, closures are faster, use fewer resources and
178 result in much clearer code then the traditional approach. Faster,
179 because parameter passing and storing them in local variables in Perl
180 is relatively slow. Fewer resources, because closures take references
181 to existing variables without having to create new ones, and clearer
182 code because it is immediately obvious that the second example calls the
183 C<method> method when the callback is invoked.
184
185 Apart from these, the strongest argument for using closures with AnyEvent
186 is that AnyEvent does not allow passing parameters to the callback, so
187 closures are the only way to achieve that in most cases :->
188
189
190 =head3 A hint on debugging
191
192 AnyEvent does, by default, not do any argument checking. This can lead to
193 strange and unexpected results especially if you are trying to learn your
194 ways with AnyEvent.
195
196 AnyEvent supports a special "strict" mode - off by default - which does very
197 strict argument checking, at the expense of being somewhat slower. During
198 development, however, this mode is very useful.
199
200 You can enable this strict mode either by having an environment variable
201 C<PERL_ANYEVENT_STRICT> with a true value in your environment:
202
203 PERL_ANYEVENT_STRICT=1 perl test.pl
204
205 Or you can write C<use AnyEvent::Strict> in your program, which has the
206 same effect (do not do this in production, however).
207
208
209 =head2 Condition Variables
210
211 Back to the I/O watcher example: The code is not yet a fully working
212 program, and will not work as-is. The reason is that your callback will
213 not be invoked out of the blue, you have to run the event loop. Also,
214 event-based programs sometimes have to block, too, as when there simply is
215 nothing else to do and everything waits for some events, it needs to block
216 the process as well until new events arrive.
217
218 In AnyEvent, this is done using condition variables. Condition variables
219 are named "condition variables" because they represent a condition that is
220 initially false and needs to be fulfilled.
221
222 You can also call them "merge points", "sync points", "rendezvous ports"
223 or even callbacks and many other things (and they are often called like
224 this in other frameworks). The important point is that you can create them
225 freely and later wait for them to become true.
226
227 Condition variables have two sides - one side is the "producer" of the
228 condition (whatever code detects and flags the condition), the other side
229 is the "consumer" (the code that waits for that condition).
230
231 In our example in the previous section, the producer is the event callback
232 and there is no consumer yet - let's change that right now:
233
234 use AnyEvent;
235
236 $| = 1; print "enter your name> ";
237
238 my $name;
239
240 my $name_ready = AnyEvent->condvar;
241
242 my $wait_for_input = AnyEvent->io (
243 fh => \*STDIN,
244 poll => "r",
245 cb => sub {
246 $name = <STDIN>;
247 $name_ready->send;
248 }
249 );
250
251 # do something else here
252
253 # now wait until the name is available:
254 $name_ready->recv;
255
256 undef $wait_for_input; # watche rno longer needed
257
258 print "your name is $name\n";
259
260 This program creates an AnyEvent condvar by calling the C<<
261 AnyEvent->condvar >> method. It then creates a watcher as usual, but
262 inside the callback it C<send>'s the C<$name_ready> condition variable,
263 which causes whoever is waiting on it to continue.
264
265 The "whoever" in this case is the code that follows, which calls C<<
266 $name_ready->recv >>: The producer calls C<send>, the consumer calls
267 C<recv>.
268
269 If there is no C<$name> available yet, then the call to C<<
270 $name_ready->recv >> will halt your program until the condition becomes
271 true.
272
273 As the names C<send> and C<recv> imply, you can actually send and receive
274 data using this, for example, the above code could also be written like
275 this, without an extra variable to store the name in:
276
277 use AnyEvent;
278
279 $| = 1; print "enter your name> ";
280
281 my $name_ready = AnyEvent->condvar;
282
283 my $wait_for_input = AnyEvent->io (
284 fh => \*STDIN, poll => "r",
285 cb => sub { $name_ready->send (scalar <STDIN>) }
286 );
287
288 # do something else here
289
290 # now wait and fetch the name
291 my $name = $name_ready->recv;
292
293 undef $wait_for_input; # watche rno longer needed
294
295 print "your name is $name\n";
296
297 You can pass any number of arguments to C<send>, and everybody call to
298 C<recv> will return them.
299
300 =head2 The "main loop"
301
302 Most event-based frameworks have something called a "main loop" or "event
303 loop run function" or something similar.
304
305 Just like in C<recv> AnyEvent, these functions need to be called
306 eventually so that your event loop has a chance of actually looking for
307 those events you are interested in.
308
309 For example, in a L<Gtk2> program, the above example could also be written
310 like this:
311
312 use Gtk2 -init;
313 use AnyEvent;
314
315 ############################################
316 # create a window and some label
317
318 my $window = new Gtk2::Window "toplevel";
319 $window->add (my $label = new Gtk2::Label "soon replaced by name");
320
321 $window->show_all;
322
323 ############################################
324 # do our AnyEvent stuff
325
326 $| = 1; print "enter your name> ";
327
328 my $name_ready = AnyEvent->condvar;
329
330 my $wait_for_input = AnyEvent->io (
331 fh => \*STDIN, poll => "r",
332 cb => sub {
333 # set the label
334 $label->set_text (scalar <STDIN>);
335 print "enter another name> ";
336 }
337 );
338
339 ############################################
340 # Now enter Gtk2's event loop
341
342 main Gtk2;
343
344 No condition variable anywhere in sight - instead, we just read a line
345 from STDIN and replace the text in the label. In fact, since nobody
346 C<undef>'s C<$wait_for_input> you can enter multiple lines.
347
348 Instead of waiting for a condition variable, the program enters the Gtk2
349 main loop by calling C<< Gtk2->main >>, which will block the program and
350 wait for events to arrive.
351
352 This also shows that AnyEvent is quite flexible - you didn't have anything
353 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
354 worked.
355
356 Admittedly, the example is a bit silly - who would want to read names
357 from standard input in a Gtk+ application. But imagine that instead of
358 doing that, you would make a HTTP request in the background and display
359 it's results. In fact, with event-based programming you can make many
360 http-requests in parallel in your program and still provide feedback to
361 the user and stay interactive.
362
363 And in the next part you will see how to do just that - by implementing an
364 HTTP request, on our own, with the utility modules AnyEvent comes with.
365
366 Before that, however, let's briefly look at how you would write your
367 program with using only AnyEvent, without ever calling some other event
368 loop's run function.
369
370 In the example using condition variables, we used those to start waiting
371 for events, and in fact, condition variables are the solution:
372
373 my $quit_program = AnyEvent->condvar;
374
375 # create AnyEvent watchers (or not) here
376
377 $quit_program->recv;
378
379 If any of your watcher callbacks decide to quit (this is often
380 called an "unloop" in other frameworks), they can simply call C<<
381 $quit_program->send >>. Of course, they could also decide not to and
382 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
383 in a long-running daemon program).
384
385 If you don't need some clean quit functionality and just want to run the
386 event loop, you can simply do this:
387
388 AnyEvent->condvar->recv;
389
390 And this is, in fact, closest to the idea of a main loop run function that
391 AnyEvent offers.
392
393 =head2 Timers and other event sources
394
395 So far, we have only used I/O watchers. These are useful mainly to find
396 out whether a Socket has data to read, or space to write more data. On sane
397 operating systems this also works for console windows/terminals (typically
398 on standard input), serial lines, all sorts of other devices, basically
399 almost everything that has a file descriptor but isn't a file itself. (As
400 usual, "sane" excludes windows - on that platform you would need different
401 functions for all of these, complicating code immensely - think "socket
402 only" on windows).
403
404 However, I/O is not everything - the second most important event source is
405 the clock. For example when doing an HTTP request you might want to time
406 out when the server doesn't answer within some predefined amount of time.
407
408 In AnyEvent, timer event watchers are created by calling the C<<
409 AnyEvent->timer >> method:
410
411 use AnyEvent;
412
413 my $cv = AnyEvent->condvar;
414
415 my $wait_one_and_a_half_seconds = AnyEvent->timer (
416 after => 1.5, # after how many seconds to invoke the cb?
417 cb => sub { # the callback to invoke
418 $cv->send;
419 },
420 );
421
422 # can do something else here
423
424 # now wait till our time has come
425 $cv->recv;
426
427 Unlike I/O watchers, timers are only interested in the amount of seconds
428 they have to wait. When (at least) that amount of time has passed,
429 AnyEvent will invoke your callback.
430
431 Unlike I/O watchers, which will call your callback as many times as there
432 is data available, timers are normally one-shot: after they have "fired"
433 once and invoked your callback, they are dead and no longer do anything.
434
435 To get a repeating timer, such as a timer firing roughly once per second,
436 you can specify an C<interval> parameter:
437
438 my $once_per_second = AnyEvent->timer (
439 after => 0, # first invoke ASAP
440 interval => 1, # then invoke every second
441 cb => sub { # the callback to invoke
442 $cv->send;
443 },
444 );
445
446 =head3 More esoteric sources
447
448 AnyEvent also has some other, more esoteric event sources you can tap
449 into: signal, child and idle watchers.
450
451 Signal watchers can be used to wait for "signal events", which simply
452 means your process got send a signal (such as C<SIGTERM> or C<SIGUSR1>).
453
454 Child-process watchers wait for a child process to exit. They are useful
455 when you fork a separate process and need to know when it exits, but you
456 do not wait for that by blocking.
457
458 Idle watchers invoke their callback when the event loop has handled all
459 outstanding events, polled for new events and didn't find any, i.e., when
460 your process is otherwise idle. They are useful if you want to do some
461 non-trivial data processing that can be done when your program doesn't
462 have anything better to do.
463
464 All these watcher types are described in detail in the main L<AnyEvent>
465 manual page.
466
467 Sometimes you also need to know what the current time is: C<<
468 AnyEvent->now >> returns the time the event toolkit uses to schedule
469 relative timers, and is usually what you want. It is often cached (which
470 means it can be a bit outdated). In that case, you can use the more costly
471 C<< AnyEvent->time >> method which will ask your operating system for the
472 current time, which is slower, but also more up to date.
473
474 =head1 Network programming and AnyEvent
475
476 So far you have seen how to register event watchers and handle events.
477
478 This is a great foundation to write network clients and servers, and might
479 be all that your module (or program) ever requires, but writing your own
480 I/O buffering again and again becomes tedious, not to mention that it
481 attracts errors.
482
483 While the core L<AnyEvent> module is still small and self-contained,
484 the distribution comes with some very useful utility modules such as
485 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
486 make your life as non-blocking network programmer a lot easier.
487
488 Here is a quick overview over these three modules:
489
490 =head2 L<AnyEvent::DNS>
491
492 This module allows fully asynchronous DNS resolution. It is used mainly by
493 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
494 a great way to do other DNS resolution tasks, such as reverse lookups of
495 IP addresses for log files.
496
497 =head2 L<AnyEvent::Handle>
498
499 This module handles non-blocking IO on (socket-, pipe- etc.) file handles
500 in an event based manner. It provides a wrapper object around your file
501 handle that provides queueing and buffering of incoming and outgoing data
502 for you.
503
504 It also implements the most common data formats, such as text lines, or
505 fixed and variable-width data blocks.
506
507 =head2 L<AnyEvent::Socket>
508
509 This module provides you with functions that handle socket creation
510 and IP address magic. The two main functions are C<tcp_connect> and
511 C<tcp_server>. The former will connect a (streaming) socket to an internet
512 host for you and the later will make a server socket for you, to accept
513 connections.
514
515 This module also comes with transparent IPv6 support, this means: If you
516 write your programs with this module, you will be IPv6 ready without doing
517 anything special.
518
519 It also works around a lot of portability quirks (especially on the
520 windows platform), which makes it even easier to write your programs in a
521 portable way (did you know that windows uses different error codes for all
522 socket functions and that Perl does not know about these? That "Unknown
523 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
524 successful? That unsuccessful TCP connects might never be reported back
525 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
526 ignored instead of being in progress? AnyEvent::Socket works around all of
527 these Windows/Perl bugs for you).
528
529 =head2 Implementing a parallel finger client with non-blocking connects
530 and AnyEvent::Socket
531
532 The finger protocol is one of the simplest protocols in use on the
533 internet. Or in use in the past, as almost nobody uses it anymore.
534
535 It works by connecting to the finger port on another host, writing a
536 single line with a user name and then reading the finger response, as
537 specified by that user. OK, RFC 1288 specifies a vastly more complex
538 protocol, but it basically boils down to this:
539
540 # telnet kernel.org finger
541 Trying 204.152.191.37...
542 Connected to kernel.org (204.152.191.37).
543 Escape character is '^]'.
544
545 The latest stable version of the Linux kernel is: [...]
546 Connection closed by foreign host.
547
548 So let's write a little AnyEvent function that makes a finger request:
549
550 use AnyEvent;
551 use AnyEvent::Socket;
552
553 sub finger($$) {
554 my ($user, $host) = @_;
555
556 # use a condvar to return results
557 my $cv = AnyEvent->condvar;
558
559 # first, connect to the host
560 tcp_connect $host, "finger", sub {
561 # the callback receives the socket handle - or nothing
562 my ($fh) = @_
563 or return $cv->send;
564
565 # now write the username
566 syswrite $fh, "$user\015\012";
567
568 my $response;
569
570 # register a read watcher
571 my $read_watcher; $read_watcher = AnyEvent->io (
572 fh => $fh,
573 poll => "r",
574 cb => sub {
575 my $len = sysread $fh, $response, 1024, length $response;
576
577 if ($len <= 0) {
578 # we are done, or an error occured, lets ignore the latter
579 undef $read_watcher; # no longer interested
580 $cv->send ($response); # send results
581 }
582 },
583 );
584 };
585
586 # pass $cv to the caller
587 $cv
588 }
589
590 That's a mouthful! Let's dissect this function a bit, first the overall
591 function and execution flow:
592
593 sub finger($$) {
594 my ($user, $host) = @_;
595
596 # use a condvar to return results
597 my $cv = AnyEvent->condvar;
598
599 # first, connect to the host
600 tcp_connect $host, "finger", sub {
601 ...
602 };
603
604 $cv
605 }
606
607 This isn't too complicated, just a function with two parameters, that
608 creates a condition variable, returns it, and while it does that,
609 initiates a TCP connect to C<$host>. The condition variable will be used
610 by the caller to receive the finger response, but one could equally well
611 pass a third argument, a callback, to the function.
612
613 Since we are programming event'ish, we do not wait for the connect to
614 finish - it could block the program for a minute or longer!
615
616 Instead, we pass the callback it should invoke when the connect is done to
617 C<tcp_connect>. If it is successful, that callback gets called with the
618 socket handle as first argument, otherwise, nothing will be passed to our
619 callback. The important point is that it will always be called as soon as
620 the outcome of the TCP connect is known.
621
622 This style of programming is also called "continuation style": the
623 "continuation" is simply the way the program continues - normally at the
624 next line after some statement (the exception is loops or things like
625 C<return>). When we are interested in events, however, we instead specify
626 the "continuation" of our program by passing a closure, which makes that
627 closure the "continuation" of the program.
628
629 The C<tcp_connect> call is like saying "return now, and when the
630 connection is established or it failed, continue there".
631
632 Now let's look at the callback/closure in more detail:
633
634 # the callback receives the socket handle - or nothing
635 my ($fh) = @_
636 or return $cv->send;
637
638 The first thing the callback does is indeed save the socket handle in
639 C<$fh>. When there was an error (no arguments), then our instinct as
640 expert Perl programmers would tell us to C<die>:
641
642 my ($fh) = @_
643 or die "$host: $!";
644
645 While this would give good feedback to the user (if he happens to watch
646 standard error), our program would probably stop working here, as we never
647 report the results to anybody, certainly not the caller of our C<finger>
648 function, and most event loops continue even after a C<die>!
649
650 This is why we instead C<return>, but also call C<< $cv->send >> without
651 any arguments to signal to the condvar consumer that something bad has
652 happened. The return value of C<< $cv->send >> is irrelevant, as is
653 the return value of our callback. The C<return> statement is simply
654 used for the side effect of, well, returning immediately from the
655 callback. Checking for errors and handling them this way is very common,
656 which is why this compact idiom is so handy.
657
658 As the next step in the finger protocol, we send the username to the
659 finger daemon on the other side of our connection (the kernel.org finger
660 service doesn't actually wait for a username, but the net is running out
661 of finger servers fast):
662
663 syswrite $fh, "$user\015\012";
664
665 Note that this isn't 100% clean socket programming - the socket could,
666 for whatever reasons, not accept our data. When writing a small amount
667 of data like in this example it doesn't matter, as a socket buffer is
668 almost always big enough for a mere "username", but for real-world
669 cases you might need to implement some kind of write buffering - or use
670 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
671 next section.
672
673 What we I<do> have to do is to implement our own read buffer - the response
674 data could arrive late or in multiple chunks, and we cannot just wait for
675 it (event-based programming, you know?).
676
677 To do that, we register a read watcher on the socket which waits for data:
678
679 my $read_watcher; $read_watcher = AnyEvent->io (
680 fh => $fh,
681 poll => "r",
682
683 There is a trick here, however: the read watcher isn't stored in a global
684 variable, but in a local one - if the callback returns, it would normally
685 destroy the variable and its contents, which would in turn unregister our
686 watcher.
687
688 To avoid that, we C<undef>ine the variable in the watcher callback. This
689 means that, when the C<tcp_connect> callback returns, perl thinks (quite
690 correctly) that the read watcher is still in use - namely in the callback,
691 and thus keeps it alive even if nothing else in the program refers to it
692 anymore (it is much like Baron Münchhausen keeping himself from dying by
693 pulling himself out of a swamp).
694
695 The trick, however, is that instead of:
696
697 my $read_watcher = AnyEvent->io (...
698
699 The program does:
700
701 my $read_watcher; $read_watcher = AnyEvent->io (...
702
703 The reason for this is a quirk in the way Perl works: variable names
704 declared with C<my> are only visible in the I<next> statement. If the
705 whole C<< AnyEvent->io >> call, including the callback, would be done in
706 a single statement, the callback could not refer to the C<$read_watcher>
707 variable to undefine it, so it is done in two statements.
708
709 Whether you'd want to format it like this is of course a matter of style,
710 this way emphasizes that the declaration and assignment really are one
711 logical statement.
712
713 The callback itself calls C<sysread> for as many times as necessary, until
714 C<sysread> returns either an error or end-of-file:
715
716 cb => sub {
717 my $len = sysread $fh, $response, 1024, length $response;
718
719 if ($len <= 0) {
720
721 Note that C<sysread> has the ability to append data it reads to a scalar,
722 by specifying an offset, a feature of which we make good use of in this
723 example.
724
725 When C<sysread> indicates we are done, the callback C<undef>ines
726 the watcher and then C<send>'s the response data to the condition
727 variable. All this has the following effects:
728
729 Undefining the watcher destroys it, as our callback was the only one still
730 having a reference to it. When the watcher gets destroyed, it destroys the
731 callback, which in turn means the C<$fh> handle is no longer used, so that
732 gets destroyed as well. The result is that all resources will be nicely
733 cleaned up by perl for us.
734
735 =head3 Using the finger client
736
737 Now, we could probably write the same finger client in a simpler way if
738 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
739 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
740
741 But the main advantage is that we can not only run this finger function in
742 the background, we even can run multiple sessions in parallel, like this:
743
744 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
745 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
746 my $f3 = finger "hpa" , "kernel.org"; # finger hpa
747
748 print "trouble tickets:\n" , $f1->recv, "\n";
749 print "trouble ticket #1736:\n", $f2->recv, "\n";
750 print "kernel release info: " , $f3->recv, "\n";
751
752 It doesn't look like it, but in fact all three requests run in
753 parallel. The code waits for the first finger request to finish first, but
754 that doesn't keep it from executing them parallel: when the first C<recv>
755 call sees that the data isn't ready yet, it serves events for all three
756 requests automatically, until the first request has finished.
757
758 The second C<recv> call might either find the data is already there, or it
759 will continue handling events until that is the case, and so on.
760
761 By taking advantage of network latencies, which allows us to serve other
762 requests and events while we wait for an event on one socket, the overall
763 time to do these three requests will be greatly reduced, typically all
764 three are done in the same time as the slowest of them would need to finish.
765
766 By the way, you do not actually have to wait in the C<recv> method on an
767 AnyEvent condition variable - after all, waiting is evil - you can also
768 register a callback:
769
770 $cv->cb (sub {
771 my $response = shift->recv;
772 # ...
773 });
774
775 The callback will only be invoked when C<send> was called. In fact,
776 instead of returning a condition variable you could also pass a third
777 parameter to your finger function, the callback to invoke with the
778 response:
779
780 sub finger($$$) {
781 my ($user, $host, $cb) = @_;
782
783 How you implement it is a matter of taste - if you expect your function to
784 be used mainly in an event-based program you would normally prefer to pass
785 a callback directly. If you write a module and expect your users to use
786 it "synchronously" often (for example, a simple http-get script would not
787 really care much for events), then you would use a condition variable and
788 tell them "simply C<< ->recv >> the data".
789
790 =head3 Problems with the implementation and how to fix them
791
792 To make this example more real-world-ready, we would not only implement
793 some write buffering (for the paranoid, or maybe denial-of-service aware
794 security expert), but we would also have to handle timeouts and maybe
795 protocol errors.
796
797 Doing this quickly gets unwieldy, which is why we introduce
798 L<AnyEvent::Handle> in the next section, which takes care of all these
799 details for you and let's you concentrate on the actual protocol.
800
801
802 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
803
804 The L<AnyEvent::Handle> module has been hyped quite a bit in this document
805 so far, so let's see what it really offers.
806
807 As finger is such a simple protocol, let's try something slightly more
808 complicated: HTTP/1.0.
809
810 An HTTP GET request works by sending a single request line that indicates
811 what you want the server to do and the URI you want to act it on, followed
812 by as many "header" lines (C<Header: data>, same as e-mail headers) as
813 required for the request, ended by an empty line.
814
815 The response is formatted very similarly, first a line with the response
816 status, then again as many header lines as required, then an empty line,
817 followed by any data that the server might send.
818
819 Again, let's try it out with C<telnet> (I condensed the output a bit - if
820 you want to see the full response, do it yourself).
821
822 # telnet www.google.com 80
823 Trying 209.85.135.99...
824 Connected to www.google.com (209.85.135.99).
825 Escape character is '^]'.
826 GET /test HTTP/1.0
827
828 HTTP/1.0 404 Not Found
829 Date: Mon, 02 Jun 2008 07:05:54 GMT
830 Content-Type: text/html; charset=UTF-8
831
832 <html><head>
833 [...]
834 Connection closed by foreign host.
835
836 The C<GET ...> and the empty line were entered manually, the rest of the
837 telnet output is google's response, in which case a C<404 not found> one.
838
839 So, here is how you would do it with C<AnyEvent::Handle>:
840
841 sub http_get {
842 my ($host, $uri, $cb) = @_;
843
844 # store results here
845 my ($response, $header, $body);
846
847 my $handle; $handle = new AnyEvent::Handle
848 connect => [$host => 'http'],
849 on_error => sub {
850 $cb->("HTTP/1.0 500 $!");
851 $handle->destroy; # explicitly destroy handle
852 },
853 on_eof => sub {
854 $cb->($response, $header, $body);
855 $handle->destroy; # explicitly destroy handle
856 };
857
858 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
859
860 # now fetch response status line
861 $handle->push_read (line => sub {
862 my ($handle, $line) = @_;
863 $response = $line;
864 });
865
866 # then the headers
867 $handle->push_read (line => "\015\012\015\012", sub {
868 my ($handle, $line) = @_;
869 $header = $line;
870 });
871
872 # and finally handle any remaining data as body
873 $handle->on_read (sub {
874 $body .= $_[0]->rbuf;
875 $_[0]->rbuf = "";
876 });
877 }
878
879 And now let's go through it step by step. First, as usual, the overall
880 C<http_get> function structure:
881
882 sub http_get {
883 my ($host, $uri, $cb) = @_;
884
885 # store results here
886 my ($response, $header, $body);
887
888 my $handle; $handle = new AnyEvent::Handle
889 ... create handle object
890
891 ... push data to write
892
893 ... push what to expect to read queue
894 }
895
896 Unlike in the finger example, this time the caller has to pass a callback
897 to C<http_get>. Also, instead of passing a URL as one would expect, the
898 caller has to provide the hostname and URI - normally you would use the
899 C<URI> module to parse a URL and separate it into those parts, but that is
900 left to the inspired reader :)
901
902 Since everything else is left to the caller, all C<http_get> does it to
903 initiate the connection by creating the AnyEvent::Handle object (which
904 calls C<tcp_connect> for us) and leave everything else to it's callback.
905
906 The handle object is created, unsurprisingly, by calling the C<new>
907 method of L<AnyEvent::Handle>:
908
909 my $handle; $handle = new AnyEvent::Handle
910 connect => [$host => 'http'],
911 on_error => sub {
912 $cb->("HTTP/1.0 500 $!");
913 $handle->destroy; # explicitly destroy handle
914 },
915 on_eof => sub {
916 $cb->($response, $header, $body);
917 $handle->destroy; # explicitly destroy handle
918 };
919
920 The C<connect> argument tells AnyEvent::Handle to call C<tcp_connect> for
921 the specified host and service/port.
922
923 The C<on_error> callback will be called on any unexpected error, such as a
924 refused connection, or unexpected connection while reading the header.
925
926 Instead of having an extra mechanism to signal errors, connection errors
927 are signalled by crafting a special "response status line", like this:
928
929 HTTP/1.0 500 Connection refused
930
931 This means the caller cannot distinguish (easily) between
932 locally-generated errors and server errors, but it simplifies error
933 handling for the caller a lot.
934
935 The error callback also destroys the handle explicitly, because we are not
936 interested in continuing after any errors. In AnyEvent::Handle callbacks
937 you have to call C<destroy> explicitly to destroy a handle. Outside of
938 those callbacks you cna just forget the object reference and it will be
939 automatically cleaned up.
940
941 Last not least, we set an C<on_eof> callback that is called when the
942 other side indicates it has stopped writing data, which we will use to
943 gracefully shut down the handle and report the results. This callback is
944 only called when the read queue is empty - if the read queue expects some
945 data and the handle gets an EOF from the other side this will be an error
946 - after all, you did expect more to come.
947
948 If you wanted to write a server using AnyEvent::Handle, you would use
949 C<tcp_accept> and then create the AnyEvent::Handle with the C<fh>
950 argument.
951
952 =head3 The write queue
953
954 The next line sends the actual request:
955
956 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
957
958 No headers will be sent (this is fine for simple requests), so the whole
959 request is just a single line followed by an empty line to signal the end
960 of the headers to the server.
961
962 The more interesting question is why the method is called C<push_write>
963 and not just write. The reason is that you can I<always> add some write
964 data without blocking, and to do this, AnyEvent::Handle needs some write
965 queue internally - and C<push_write> simply pushes some data onto the end
966 of that queue, just like Perl's C<push> pushes data onto the end of an
967 array.
968
969 The deeper reason is that at some point in the future, there might
970 be C<unshift_write> as well, and in any case, we will shortly meet
971 C<push_read> and C<unshift_read>, and it's usually easiest to remember if
972 all those functions have some symmetry in their name. So C<push> is used
973 as the opposite of C<unshift> in AnyEvent::Handle, not as the opposite of
974 C<pull> - just like in Perl.
975
976 Note that we call C<push_write> right after creating the AnyEvent::Handle
977 object, before it has had time to actually connect to the server. This is
978 fine, pushing the read and write requests will simply queue them in the
979 handle object until the connection has been established. Alternatively, we
980 could do this "on demand" in the C<on_connect> callback.
981
982 If C<push_write> is called with more than one argument, then you can even
983 do I<formatted> I/O, which simply means your data will be transformed in
984 some ways. For example, this would JSON-encode your data before pushing it
985 to the write queue:
986
987 $handle->push_write (json => [1, 2, 3]);
988
989 Apart from that, this pretty much summarises the write queue, there is
990 little else to it.
991
992 Reading the response is far more interesting, because it involves the more
993 powerful and complex I<read queue>:
994
995 =head3 The read queue
996
997 The response consists of three parts: a single line with the response
998 status, a single paragraph of headers ended by an empty line, and the
999 request body, which is simply the remaining data on that connection.
1000
1001 For the first two, we push two read requests onto the read queue:
1002
1003 # now fetch response status line
1004 $handle->push_read (line => sub {
1005 my ($handle, $line) = @_;
1006 $response = $line;
1007 });
1008
1009 # then the headers
1010 $handle->push_read (line => "\015\012\015\012", sub {
1011 my ($handle, $line) = @_;
1012 $header = $line;
1013 });
1014
1015 While one can simply push a single callback to parse the data the
1016 queue, I<formatted> I/O really comes to our advantage here, as there
1017 is a ready-made "read line" read type. The first read expects a single
1018 line, ended by C<\015\012> (the standard end-of-line marker in internet
1019 protocols).
1020
1021 The second "line" is actually a single paragraph - instead of reading it
1022 line by line we tell C<push_read> that the end-of-line marker is really
1023 C<\015\012\015\012>, which is an empty line. The result is that the whole
1024 header paragraph will be treated as a single line and read. The word
1025 "line" is interpreted very freely, much like Perl itself does it.
1026
1027 Note that push read requests are pushed immediately after creating the
1028 handle object - since AnyEvent::Handle provides a queue we can push as
1029 many requests as we want, and AnyEvent::Handle will handle them in order.
1030
1031 There is, however, no read type for "the remaining data". For that, we
1032 install our own C<on_read> callback:
1033
1034 # and finally handle any remaining data as body
1035 $handle->on_read (sub {
1036 $body .= $_[0]->rbuf;
1037 $_[0]->rbuf = "";
1038 });
1039
1040 This callback is invoked every time data arrives and the read queue is
1041 empty - which in this example will only be the case when both response and
1042 header have been read. The C<on_read> callback could actually have been
1043 specified when constructing the object, but doing it this way preserves
1044 logical ordering.
1045
1046 The read callback simply adds the current read buffer to it's C<$body>
1047 variable and, most importantly, I<empties> the buffer by assigning the
1048 empty string to it.
1049
1050 After AnyEvent::Handle has been so instructed, it will handle incoming
1051 data according to these instructions - if all goes well, the callback will
1052 be invoked with the response data, if not, it will get an error.
1053
1054 In general, you can implement pipelining (a semi-advanced feature of many
1055 protocols) very easy with AnyEvent::Handle: If you have a protocol with a
1056 request/response structure, your request methods/functions will all look
1057 like this (simplified):
1058
1059 sub request {
1060
1061 # send the request to the server
1062 $handle->push_write (...);
1063
1064 # push some response handlers
1065 $handle->push_read (...);
1066 }
1067
1068 This means you can queue as many requests as you want, and while
1069 AnyEvent::Handle goes through its read queue to handle the response data,
1070 the other side can work on the next request - queueing the request just
1071 appends some data to the write queue and installs a handler to be called
1072 later.
1073
1074 You might ask yourself how to handle decisions you can only make I<after>
1075 you have received some data (such as handling a short error response or a
1076 long and differently-formatted response). The answer to this problem is
1077 C<unshift_read>, which we will introduce together with an example in the
1078 coming sections.
1079
1080 =head3 Using C<http_get>
1081
1082 Finally, here is how you would use C<http_get>:
1083
1084 http_get "www.google.com", "/", sub {
1085 my ($response, $header, $body) = @_;
1086
1087 print
1088 $response, "\n",
1089 $body;
1090 };
1091
1092 And of course, you can run as many of these requests in parallel as you
1093 want (and your memory supports).
1094
1095 =head3 HTTPS
1096
1097 Now, as promised, let's implement the same thing for HTTPS, or more
1098 correctly, let's change our C<http_get> function into a function that
1099 speaks HTTPS instead.
1100
1101 HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1102 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1103 that contains standard HTTP protocol exchanges. The only other difference
1104 to HTTP is that by default it uses port C<443> instead of port C<80>.
1105
1106 To implement these two differences we need two tiny changes, first, in the
1107 C<connect> parameter, we replace C<http> by C<https> to connect to the
1108 https port:
1109
1110 connect => [$host => 'https'],
1111
1112 The other change deals with TLS, which is something L<AnyEvent::Handle>
1113 does for us, as long as I<you> made sure that the L<Net::SSLeay> module
1114 is around. To enable TLS with L<AnyEvent::Handle>, we simply pass an
1115 additional C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1116
1117 tls => "connect",
1118
1119 Specifying C<tls> enables TLS, and the argument specifies whether
1120 AnyEvent::Handle is the server side ("accept") or the client side
1121 ("connect") for the TLS connection, as unlike TCP, there is a clear
1122 server/client relationship in TLS.
1123
1124 That's all.
1125
1126 Of course, all this should be handled transparently by C<http_get>
1127 after parsing the URL. If you need this, see the part about exercising
1128 your inspiration earlier in this document. You could also use the
1129 L<AnyEvent::HTTP> module from CPAN, which implements all this and works
1130 around a lot of quirks for you, too.
1131
1132 =head3 The read queue - revisited
1133
1134 HTTP always uses the same structure in its responses, but many protocols
1135 require parsing responses differently depending on the response itself.
1136
1137 For example, in SMTP, you normally get a single response line:
1138
1139 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1140
1141 But SMTP also supports multi-line responses:
1142
1143 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1144 220-hey guys
1145 220 my response is longer than yours
1146
1147 To handle this, we need C<unshift_read>. As the name (hopefully) implies,
1148 C<unshift_read> will not append your read request to the end of the read
1149 queue, but instead it will prepend it to the queue.
1150
1151 This is useful in the situation above: Just push your response-line read
1152 request when sending the SMTP command, and when handling it, you look at
1153 the line to see if more is to come, and C<unshift_read> another reader
1154 callback if required, like this:
1155
1156 my $response; # response lines end up in here
1157
1158 my $read_response; $read_response = sub {
1159 my ($handle, $line) = @_;
1160
1161 $response .= "$line\n";
1162
1163 # check for continuation lines ("-" as 4th character")
1164 if ($line =~ /^...-/) {
1165 # if yes, then unshift another line read
1166 $handle->unshift_read (line => $read_response);
1167
1168 } else {
1169 # otherwise we are done
1170
1171 # free callback
1172 undef $read_response;
1173
1174 print "we are don reading: $response\n";
1175 }
1176 };
1177
1178 $handle->push_read (line => $read_response);
1179
1180 This recipe can be used for all similar parsing problems, for example in
1181 NNTP, the response code to some commands indicates that more data will be
1182 sent:
1183
1184 $handle->push_write ("article 42");
1185
1186 # read response line
1187 $handle->push_read (line => sub {
1188 my ($handle, $status) = @_;
1189
1190 # article data following?
1191 if ($status =~ /^2/) {
1192 # yes, read article body
1193
1194 $handle->unshift_read (line => "\012.\015\012", sub {
1195 my ($handle, $body) = @_;
1196
1197 $finish->($status, $body);
1198 });
1199
1200 } else {
1201 # some error occured, no article data
1202
1203 $finish->($status);
1204 }
1205 }
1206
1207 =head3 Your own read queue handler
1208
1209 Sometimes, your protocol doesn't play nice and uses lines or chunks of
1210 data not formatted in a way handled by AnyEvent::Handle out of the box. In
1211 this case you have to implement your own read parser.
1212
1213 To make up a contorted example, imagine you are looking for an even
1214 number of characters followed by a colon (":"). Also imagine that
1215 AnyEvent::Handle had no C<regex> read type which could be used, so you'd
1216 had to do it manually.
1217
1218 To implement a read handler for this, you would C<push_read> (or
1219 C<unshift_read>) just a single code reference.
1220
1221 This code reference will then be called each time there is (new) data
1222 available in the read buffer, and is expected to either successfully
1223 eat/consume some of that data (and return true) or to return false to
1224 indicate that it wants to be called again.
1225
1226 If the code reference returns true, then it will be removed from the
1227 read queue (because it has parsed/consumed whatever it was supposed to
1228 consume), otherwise it stays in the front of it.
1229
1230 The example above could be coded like this:
1231
1232 $handle->push_read (sub {
1233 my ($handle) = @_;
1234
1235 # check for even number of characters + ":"
1236 # and remove the data if a match is found.
1237 # if not, return false (actually nothing)
1238
1239 $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1240 or return;
1241
1242 # we got some data in $1, pass it to whoever wants it
1243 $finish->($1);
1244
1245 # and return true to indicate we are done
1246 1
1247 });
1248
1249 This concludes our little tutorial.
1250
1251 =head1 Where to go from here?
1252
1253 This introduction should have explained the key concepts of L<AnyEvent>
1254 - event watchers and condition variables, L<AnyEvent::Socket> - basic
1255 networking utilities, and L<AnyEvent::Handle> - a nice wrapper around
1256 handles.
1257
1258 You could either start coding stuff right away, look at those manual
1259 pages for the gory details, or roam CPAN for other AnyEvent modules (such
1260 as L<AnyEvent::IRC> or L<AnyEvent::HTTP>) to see more code examples (or
1261 simply to use them).
1262
1263 If you need a protocol that doesn't have an implementation using AnyEvent,
1264 remember that you can mix AnyEvent with one other event framework, such as
1265 L<POE>, so you can always use AnyEvent for your own tasks plus modules of
1266 one other event framework to fill any gaps.
1267
1268 And last not least, you could also look at L<Coro>, especially
1269 L<Coro::AnyEvent>, to see how you can turn event-based programming from
1270 callback style back to the usual imperative style (also called "inversion
1271 of control" - AnyEvent calls I<you>, but Coro lets I<you> call AnyEvent).
1272
1273 =head1 Authors
1274
1275 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1276