/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.22
Committed: Mon Jun 29 20:55:58 2009 UTC (14 years, 11 months ago) by root
Branch: MAIN
Changes since 1.21: +205 -159 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 AnyEvent::Intro - an introductory tutorial to AnyEvent
4
5 =head1 Introduction to AnyEvent
6
7 This is a tutorial that will introduce you to the features of AnyEvent.
8
9 The first part introduces the core AnyEvent module (after swamping you a
10 bit in evangelism), which might already provide all you ever need. If you
11 are only interested in AnyEvent's event handling capabilities, read no
12 further.
13
14 The second part focuses on network programming using sockets, for which
15 AnyEvent offers a lot of support you can use, and a lot of workarounds
16 around portability quirks.
17
18
19 =head1 What is AnyEvent?
20
21 If you don't care for the whys and want to see code, skip this section!
22
23 AnyEvent is first of all just a framework to do event-based
24 programming. Typically such frameworks are an all-or-nothing thing: If you
25 use one such framework, you can't (easily, or even at all) use another in
26 the same program.
27
28 AnyEvent is different - it is a thin abstraction layer above all kinds
29 of event loops. Its main purpose is to move the choice of the underlying
30 framework (the event loop) from the module author to the program author
31 using the module.
32
33 That means you can write code that uses events to control what it
34 does, without forcing other code in the same program to use the same
35 underlying framework as you do - i.e. you can create a Perl module
36 that is event-based using AnyEvent, and users of that module can still
37 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
38 all: AnyEvent comes with its own event loop implementation, so your
39 code works regardless of other modules that might or might not be
40 installed. The latter is important, as AnyEvent does not have any
41 dependencies on other modules, which makes it easy to install, for
42 example, when you lack a C compiler.
43
44 A typical problem with Perl modules such as L<Net::IRC> is that they
45 come with their own event loop: In L<Net::IRC>, the program that uses it
46 needs to start the event loop of L<Net::IRC>. That means that one cannot
47 integrate this module into a L<Gtk2> GUI for instance, as that module,
48 too, enforces the use of its own event loop (namely L<Glib>).
49
50 Another example is L<LWP>: it provides no event interface at all. It's a
51 pure blocking HTTP (and FTP etc.) client library, which usually means that
52 you either have to start a thread or have to fork for an HTTP request, or
53 use L<Coro::LWP>, if you want to do something else while waiting for the
54 request to finish.
55
56 The motivation behind these designs is often that a module doesn't want to
57 depend on some complicated XS-module (Net::IRC), or that it doesn't want
58 to force the user to use some specific event loop at all (LWP).
59
60 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
61
62 =over 4
63
64 =item - write their own event loop (because AnyEvent guarantees to offer
65 one everywhere - even on Windows).
66
67 =item - choose one fixed event loop (because AnyEvent works with all
68 important event loops available for Perl, and adding others is trivial).
69
70 =back
71
72 If the module author uses L<AnyEvent> for all his event needs (IO events,
73 timers, signals, ...) then all other modules can just use his module and
74 don't have to choose an event loop or adapt to his event loop. The choice
75 of the event loop is ultimately made by the program author who uses all
76 the modules and writes the main program. And even there he doesn't have to
77 choose, he can just let L<AnyEvent> choose the best available event loop
78 for him.
79
80 Read more about this in the main documentation of the L<AnyEvent> module.
81
82
83 =head1 Introduction to Event-Based Programming
84
85 So what exactly is programming using events? It quite simply means that
86 instead of your code actively waiting for something, such as the user
87 entering something on STDIN:
88
89 $| = 1; print "enter your name> ";
90
91 my $name = <STDIN>;
92
93 You instead tell your event framework to notify you in the event of some
94 data being available on STDIN, by using a callback mechanism:
95
96 use AnyEvent;
97
98 $| = 1; print "enter your name> ";
99
100 my $name;
101
102 my $wait_for_input = AnyEvent->io (
103 fh => \*STDIN, # which file handle to check
104 poll => "r", # which event to wait for ("r"ead data)
105 cb => sub { # what callback to execute
106 $name = <STDIN>; # read it
107 }
108 );
109
110 # do something else here
111
112 Looks more complicated, and surely is, but the advantage of using events
113 is that your program can do something else instead of waiting for input
114 (side note: combining AnyEvent with a thread package such as Coro can
115 recoup much of the simplicity, effectively getting the best of two
116 worlds).
117
118 Waiting as done in the first example is also called "blocking" the process
119 because you "block"/keep your process from executing anything else while
120 you do so.
121
122 The second example avoids blocking by only registering interest in a read
123 event, which is fast and doesn't block your process. Only when read data
124 is available will the callback be called, which can then proceed to read
125 the data.
126
127 The "interest" is represented by an object returned by C<< AnyEvent->io
128 >> called a "watcher" object - called like that because it "watches" your
129 file handle (or other event sources) for the event you are interested in.
130
131 In the example above, we create an I/O watcher by calling the C<<
132 AnyEvent->io >> method. Disinterest in some event is simply expressed
133 by forgetting about the watcher, for example, by C<undef>'ing the only
134 variable it is stored in. AnyEvent will automatically clean up the watcher
135 if it is no longer used, much like Perl closes your file handles if you no
136 longer use them anywhere.
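
The watcher lifetime rule described above can be tried out with a small sketch. It assumes a POSIX-like system where C<pipe> handles can be polled, and it borrows a timer watcher (covered later in this tutorial) only to run the event loop for a moment. The names C<$reader>, C<$writer> and C<$fired> are made up for this illustration:

```perl
use strict;
use warnings;
use AnyEvent;

pipe my $reader, my $writer or die "pipe: $!";

my $fired = 0;

# register interest in the read end of the pipe ...
my $watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub { $fired++ },
);

# ... and lose interest again before anything has happened
undef $watcher;

# make the pipe readable - no watcher is left to notice it
syswrite $writer, "x";

# run the event loop for a short while (timers are explained later)
my $done  = AnyEvent->condvar;
my $timer = AnyEvent->timer (after => 0.1, cb => sub { $done->send });
$done->recv;

print "callback ran $fired time(s)\n";
```

Since the only reference to the watcher is gone before the event loop runs, the callback should never be invoked, even though data is waiting in the pipe.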
137
138 =head3 A short note on callbacks
139
140 A common issue that hits people is the problem of passing parameters
141 to callbacks. Programmers used to languages such as C or C++ are often
142 used to a style where one passes the address of a function (a function
143 reference) and some data value, e.g.:
144
145 sub callback {
146 my ($arg) = @_;
147
148 $arg->method;
149 }
150
151 my $arg = ...;
152
153 call_me_back_later \&callback, $arg;
154
155 This is clumsy, as the place where behaviour is specified (when the
156 callback is registered) is often far away from the place where behaviour
157 is implemented. It also doesn't use Perl syntax to invoke the code. There
158 is also an abstraction penalty to pay as one has to I<name> the callback,
159 which often is unnecessary and leads to nonsensical or duplicated names.
160
161 In Perl, one can specify behaviour much more directly by using
162 I<closures>. Closures are code blocks that take a reference to the
163 enclosing scope(s) when they are created. This means lexical variables in
164 scope at the time of creating the closure can simply be used inside the
165 closure:
166
167 my $arg = ...;
168
169 call_me_back_later sub { $arg->method };
170
171 Under most circumstances, closures are faster, use fewer resources and
172 result in much clearer code than the traditional approach. Faster,
173 because parameter passing and storing them in local variables in Perl
174 is relatively slow. Fewer resources, because closures take references
175 to existing variables without having to create new ones, and clearer
176 code because it is immediately obvious that the second example calls the
177 C<method> method when the callback is invoked.
178
179 Apart from these, the strongest argument for using closures with AnyEvent
180 is that AnyEvent does not allow passing parameters to the callback, so
181 closures are the only way to achieve that in most cases :->
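
To see closures carry "parameters" without any help from the framework, here is a minimal sketch. The C<call_me_back_later> function is the same hypothetical placeholder used in the snippets above, implemented here as a simple queue, and is not part of AnyEvent:

```perl
use strict;
use warnings;

# hypothetical stand-in for an event framework: it only stores the callback
my @pending;
sub call_me_back_later { push @pending, $_[0] }

my @log;

for my $name (qw(alice bob carol)) {
   # each closure captures its own $name, no parameter passing required
   call_me_back_later (sub { push @log, "hello, $name" });
}

# later, the "framework" invokes the callbacks without any arguments
$_->() for @pending;

print "$_\n" for @log;
```

Each callback remembers the C<$name> that was in scope when it was created, which is exactly what you rely on when handing callbacks to AnyEvent.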
182
183
184 =head3 A hint on debugging
185
186 AnyEvent does, by default, not do any argument checking. This can lead to
187 strange and unexpected results, especially if you are still learning your
188 way around AnyEvent.
189
190 AnyEvent supports a special "strict" mode, off by default, which does very
191 strict argument checking, at the expense of being somewhat slower. During
192 development, however, this mode is very useful.
193
194 You can enable this strict mode either by having an environment variable
195 C<PERL_ANYEVENT_STRICT> with a true value in your environment:
196
197 PERL_ANYEVENT_STRICT=1 perl test.pl
198
199 Or you can write C<use AnyEvent::Strict> in your program, which has the
200 same effect (do not do this in production, however).
201
202
203 =head2 Condition Variables
204
205 Back to the I/O watcher example: The code is not yet a fully working
206 program, and will not work as-is. The reason is that your callback will
207 not be invoked out of the blue, you have to run the event loop. Also,
208 event-based programs sometimes have to block, too: when there is simply
209 nothing else to do and everything waits for some events, the program needs
210 to block the process until new events arrive.
211
212 In AnyEvent, this is done using condition variables. Condition variables
213 are named "condition variables" because they represent a condition that is
214 initially false and needs to be fulfilled.
215
216 You can also call them "merge points", "sync points", "rendezvous ports"
217 or even callbacks and many other things (and they are often called like
218 this in other frameworks). The important point is that you can create them
219 freely and later wait for them to become true.
220
221 Condition variables have two sides - one side is the "producer" of the
222 condition (whatever code detects and flags the condition), the other side
223 is the "consumer" (the code that waits for that condition).
224
225 In our example in the previous section, the producer is the event callback
226 and there is no consumer yet - let's change that right now:
227
228 use AnyEvent;
229
230 $| = 1; print "enter your name> ";
231
232 my $name;
233
234 my $name_ready = AnyEvent->condvar;
235
236 my $wait_for_input = AnyEvent->io (
237 fh => \*STDIN,
238 poll => "r",
239 cb => sub {
240 $name = <STDIN>;
241 $name_ready->send;
242 }
243 );
244
245 # do something else here
246
247 # now wait until the name is available:
248 $name_ready->recv;
249
250 undef $wait_for_input; # watcher no longer needed
251
252 print "your name is $name\n";
253
254 This program creates an AnyEvent condvar by calling the C<<
255 AnyEvent->condvar >> method. It then creates a watcher as usual, but
256 inside the callback it C<send>'s the C<$name_ready> condition variable,
257 which causes whoever is waiting on it to continue.
258
259 The "whoever" in this case is the code that follows, which calls C<<
260 $name_ready->recv >>: The producer calls C<send>, the consumer calls
261 C<recv>.
262
263 If there is no C<$name> available yet, then the call to C<<
264 $name_ready->recv >> will halt your program until the condition becomes
265 true.
266
267 As the names C<send> and C<recv> imply, you can actually send and receive
268 data using this, for example, the above code could also be written like
269 this, without an extra variable to store the name in:
270
271 use AnyEvent;
272
273 $| = 1; print "enter your name> ";
274
275 my $name_ready = AnyEvent->condvar;
276
277 my $wait_for_input = AnyEvent->io (
278 fh => \*STDIN, poll => "r",
279 cb => sub { $name_ready->send (scalar <STDIN>) }
280 );
281
282 # do something else here
283
284 # now wait and fetch the name
285 my $name = $name_ready->recv;
286
287 undef $wait_for_input; # watcher no longer needed
288
289 print "your name is $name\n";
290
291 You can pass any number of arguments to C<send>, and every call to
292 C<recv> will return them.
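
A short sketch of sending several values through a condition variable follows. The values are arbitrary examples. Note that C<recv> follows the usual Perl context rules: in list context it returns everything that was sent, in scalar context only the first value:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# the producer may send before anybody waits, the values are stored
$cv->send ("Ada", "Lovelace", 1815);

# list context: all values
my ($first, $last, $born) = $cv->recv;

# scalar context: only the first value
my $only_first = $cv->recv;

print "$first $last, born $born ($only_first)\n";
```

Because C<send> was already called, neither C<recv> call has to wait.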
293
294 =head2 The "main loop"
295
296 Most event-based frameworks have something called a "main loop" or "event
297 loop run function" or something similar.
298
299 Just like C<recv> in AnyEvent, these functions need to be called
300 eventually so that your event loop has a chance of actually looking for
301 those events you are interested in.
302
303 For example, in a L<Gtk2> program, the above example could also be written
304 like this:
305
306 use Gtk2 -init;
307 use AnyEvent;
308
309 ############################################
310 # create a window and some label
311
312 my $window = new Gtk2::Window "toplevel";
313 $window->add (my $label = new Gtk2::Label "soon replaced by name");
314
315 $window->show_all;
316
317 ############################################
318 # do our AnyEvent stuff
319
320 $| = 1; print "enter your name> ";
321
322 my $name_ready = AnyEvent->condvar;
323
324 my $wait_for_input = AnyEvent->io (
325 fh => \*STDIN, poll => "r",
326 cb => sub {
327 # set the label
328 $label->set_text (scalar <STDIN>);
329 print "enter another name> ";
330 }
331 );
332
333 ############################################
334 # Now enter Gtk2's event loop
335
336 main Gtk2;
337
338 No condition variable anywhere in sight - instead, we just read a line
339 from STDIN and replace the text in the label. In fact, since nobody
340 C<undef>'s C<$wait_for_input> you can enter multiple lines.
341
342 Instead of waiting for a condition variable, the program enters the Gtk2
343 main loop by calling C<< Gtk2->main >>, which will block the program and
344 wait for events to arrive.
345
346 This also shows that AnyEvent is quite flexible - you didn't have anything
347 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
348 worked.
349
350 Admittedly, the example is a bit silly - who would want to read names
351 from standard input in a Gtk+ application? But imagine that instead of
352 doing that, you would make an HTTP request in the background and display
353 its results. In fact, with event-based programming you can make many
354 HTTP requests in parallel in your program and still provide feedback to
355 the user and stay interactive.
356
357 And in the next part you will see how to do just that - by implementing an
358 HTTP request, on our own, with the utility modules AnyEvent comes with.
359
360 Before that, however, let's briefly look at how you would write your
361 program using only AnyEvent, without ever calling some other event
362 loop's run function.
363
364 In the example using condition variables, we used those to start waiting
365 for events, and in fact, condition variables are the solution:
366
367 my $quit_program = AnyEvent->condvar;
368
369 # create AnyEvent watchers (or not) here
370
371 $quit_program->recv;
372
373 If any of your watcher callbacks decide to quit (this is often
374 called an "unloop" in other frameworks), they can simply call C<<
375 $quit_program->send >>. Of course, they could also decide not to and
376 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
377 in a long-running daemon program).
378
379 If you don't need some clean quit functionality and just want to run the
380 event loop, you can simply do this:
381
382 AnyEvent->condvar->recv;
383
384 And this is, in fact, closest to the idea of a main loop run function that
385 AnyEvent offers.
386
387 =head2 Timers and other event sources
388
389 So far, we have only used I/O watchers. These are useful mainly to find
390 out whether a socket has data to read, or space to write more data. On sane
391 operating systems this also works for console windows/terminals (typically
392 on standard input), serial lines, all sorts of other devices, basically
393 almost everything that has a file descriptor but isn't a file itself. (As
394 usual, "sane" excludes Windows - on that platform you would need different
395 functions for all of these, complicating code immensely - think "socket
396 only" on Windows).
397
398 However, I/O is not everything - the second most important event source is
399 the clock. For example when doing an HTTP request you might want to time
400 out when the server doesn't answer within some predefined amount of time.
401
402 In AnyEvent, timer event watchers are created by calling the C<<
403 AnyEvent->timer >> method:
404
405 use AnyEvent;
406
407 my $cv = AnyEvent->condvar;
408
409 my $wait_one_and_a_half_seconds = AnyEvent->timer (
410 after => 1.5, # after how many seconds to invoke the cb?
411 cb => sub { # the callback to invoke
412 $cv->send;
413 },
414 );
415
416 # can do something else here
417
418 # now wait till our time has come
419 $cv->recv;
420
421 Unlike I/O watchers, timers are only interested in the number of seconds
422 they have to wait. When (at least) that amount of time has passed,
423 AnyEvent will invoke your callback.
424
425 Unlike I/O watchers, which will call your callback as many times as there
426 is data available, timers are normally one-shot: after they have "fired"
427 once and invoked your callback, they are dead and no longer do anything.
428
429 To get a repeating timer, such as a timer firing roughly once per second,
430 you can specify an C<interval> parameter:
431
432 my $once_per_second = AnyEvent->timer (
433 after => 0, # first invoke ASAP
434 interval => 1, # then invoke every second
435 cb => sub { # the callback to invoke
436 $cv->send;
437 },
438 );
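
As a complete, runnable variation on the repeating timer, the following sketch counts three ticks and then stops. The interval of 0.05 seconds and the variable names are arbitrary choices for this illustration. The two-statement C<my $ticker; $ticker = ...> form is needed so that the callback can refer to the watcher; this quirk is explained in the finger example later on:

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $ticks = 0;

my $ticker; $ticker = AnyEvent->timer (
   after    => 0,      # first invocation as soon as possible
   interval => 0.05,   # then repeat every 50 milliseconds
   cb       => sub {
      $ticks++;

      if ($ticks == 3) {
         undef $ticker;   # destroying the watcher stops the timer
         $done->send;
      }
   },
);

$done->recv;

print "timer fired $ticks times\n";
```

A repeating timer keeps firing until its watcher is destroyed, so the callback itself decides when enough is enough.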
439
440 =head3 More esoteric sources
441
442 AnyEvent also has some other, more esoteric event sources you can tap
443 into: signal, child and idle watchers.
444
445 Signal watchers can be used to wait for "signal events", which simply
446 means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
447
448 Child-process watchers wait for a child process to exit. They are useful
449 when you fork a separate process and need to know when it exits, but you
450 do not wait for that by blocking.
451
452 Idle watchers invoke their callback when the event loop has handled all
453 outstanding events, polled for new events and didn't find any, i.e., when
454 your process is otherwise idle. They are useful if you want to do some
455 non-trivial data processing that can be done when your program doesn't
456 have anything better to do.
457
458 All these watcher types are described in detail in the main L<AnyEvent>
459 manual page.
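
As one illustration of these watcher types, here is a minimal sketch of a child-process watcher. It assumes a platform with a working C<fork>, and the exit status of 5 is an arbitrary value chosen for the demonstration:

```perl
use strict;
use warnings;
use AnyEvent;

my $done = AnyEvent->condvar;

# the child exits immediately with status 5, the parent continues below
my $pid = fork;
defined $pid or die "fork: $!";
exit 5 unless $pid;

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;

      # $status has the same format as $?, the exit code is in the high byte
      $done->send ($status >> 8);
   },
);

my $exit_code = $done->recv;

print "child $pid exited with code $exit_code\n";
```

The parent never calls C<waitpid> itself: it could handle other events while the child runs, and is merely told when the child is gone.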
460
461 Sometimes you also need to know what the current time is: C<<
462 AnyEvent->now >> returns the time the event toolkit uses to schedule
463 relative timers, and is usually what you want. It is often cached, which
464 means it can be a bit outdated. If that matters, you can use the
465 C<< AnyEvent->time >> method instead, which asks your operating system for
466 the current time: this is slower, but also more up to date.
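
The two time sources can be compared with a few lines. This is only a sketch: how far the two values differ depends on the event loop in use and on how long ago it last updated its cached time:

```perl
use strict;
use warnings;
use AnyEvent;

my $loop_time = AnyEvent->now;    # the event loop's (possibly cached) time
my $wall_time = AnyEvent->time;   # asks the operating system every time

printf "event loop time: %.3f\n", $loop_time;
printf "current time:    %.3f\n", $wall_time;
printf "difference:      %.3f seconds\n", $wall_time - $loop_time;
```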
467
468 =head1 Network programming and AnyEvent
469
470 So far you have seen how to register event watchers and handle events.
471
472 This is a great foundation to write network clients and servers, and might
473 be all that your module (or program) ever requires, but writing your own
474 I/O buffering again and again becomes tedious, not to mention that it
475 attracts errors.
476
477 While the core L<AnyEvent> module is still small and self-contained,
478 the distribution comes with some very useful utility modules such as
479 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
480 make your life as a non-blocking network programmer a lot easier.
481
482 Here is a quick overview of these three modules:
483
484 =head2 L<AnyEvent::DNS>
485
486 This module allows fully asynchronous DNS resolution. It is used mainly by
487 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
488 a great way to do other DNS resolution tasks, such as reverse lookups of
489 IP addresses for log files.
490
491 =head2 L<AnyEvent::Handle>
492
493 This module handles non-blocking IO on (socket-, pipe- etc.) file handles
494 in an event based manner. It provides a wrapper object around your file
495 handle that provides queueing and buffering of incoming and outgoing data
496 for you.
497
498 It also implements the most common data formats, such as text lines, or
499 fixed and variable-width data blocks.
500
501 =head2 L<AnyEvent::Socket>
502
503 This module provides you with functions that handle socket creation
504 and IP address magic. The two main functions are C<tcp_connect> and
505 C<tcp_server>. The former will connect a (streaming) socket to an internet
506 host for you and the latter will make a server socket for you, to accept
507 connections.
508
509 This module also comes with transparent IPv6 support, this means: If you
510 write your programs with this module, you will be IPv6 ready without doing
511 anything special.
512
513 It also works around a lot of portability quirks (especially on the
514 Windows platform), which makes it even easier to write your programs in a
515 portable way (did you know that Windows uses different error codes for all
516 socket functions and that Perl does not know about these? That "Unknown
517 error 10022" (which is C<WSAEINVAL>) can mean that your C<connect> call was
518 successful? That unsuccessful TCP connects might never be reported back
519 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
520 ignored instead of being in progress? AnyEvent::Socket works around all of
521 these Windows/Perl bugs for you).
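
To give a first taste of C<tcp_server> and C<tcp_connect> working together, here is a small self-contained sketch that talks to itself over the loopback interface. It assumes that binding to C<127.0.0.1> is permitted, lets the operating system pick a free port, and uses a made-up one-line "protocol" (the server says hello and hangs up):

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;

# server side: greet every client, then close the connection
my $guard = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;

   syswrite $fh, "hello\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   # called just before listen, tells us the port that was chosen
   my ($listen_fh, $host, $bound_port) = @_;

   $port = $bound_port;
   0 # use the default listen queue length
};

# client side: connect and read until the server closes the connection
tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $done->send;

   my $buffer = "";

   my $reader; $reader = AnyEvent->io (fh => $fh, poll => "r", cb => sub {
      my $len = sysread $fh, $buffer, 1024, length $buffer;

      unless ($len) {
         # end of file (or an error, which this sketch treats the same)
         undef $reader;
         $done->send ($buffer);
      }
   });
};

my $greeting = $done->recv;

print "server said: $greeting";
```

The server keeps accepting connections for as long as C<$guard> is alive. The client part follows the same pattern as the finger client developed in the next section.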
522
523 =head2 Implementing a parallel finger client with non-blocking connects
524 and AnyEvent::Socket
525
526 The finger protocol is one of the simplest protocols in use on the
527 internet. Or in use in the past, as almost nobody uses it anymore.
528
529 It works by connecting to the finger port on another host, writing a
530 single line with a user name and then reading the finger response, as
531 specified by that user. OK, RFC 1288 specifies a vastly more complex
532 protocol, but it basically boils down to this:
533
534 # telnet kernel.org finger
535 Trying 204.152.191.37...
536 Connected to kernel.org (204.152.191.37).
537 Escape character is '^]'.
538
539 The latest stable version of the Linux kernel is: [...]
540 Connection closed by foreign host.
541
542 So let's write a little AnyEvent function that makes a finger request:
543
544 use AnyEvent;
545 use AnyEvent::Socket;
546
547 sub finger($$) {
548 my ($user, $host) = @_;
549
550 # use a condvar to return results
551 my $cv = AnyEvent->condvar;
552
553 # first, connect to the host
554 tcp_connect $host, "finger", sub {
555 # the callback receives the socket handle - or nothing
556 my ($fh) = @_
557 or return $cv->send;
558
559 # now write the username
560 syswrite $fh, "$user\015\012";
561
562 my $response;
563
564 # register a read watcher
565 my $read_watcher; $read_watcher = AnyEvent->io (
566 fh => $fh,
567 poll => "r",
568 cb => sub {
569 my $len = sysread $fh, $response, 1024, length $response;
570
571 if ($len <= 0) {
572 # we are done, or an error occurred, let's ignore the latter
573 undef $read_watcher; # no longer interested
574 $cv->send ($response); # send results
575 }
576 },
577 );
578 };
579
580 # pass $cv to the caller
581 $cv
582 }
583
584 That's a mouthful! Let's dissect this function a bit, first the overall
585 function and execution flow:
586
587 sub finger($$) {
588 my ($user, $host) = @_;
589
590 # use a condvar to return results
591 my $cv = AnyEvent->condvar;
592
593 # first, connect to the host
594 tcp_connect $host, "finger", sub {
595 ...
596 };
597
598 $cv
599 }
600
601 This isn't too complicated, just a function with two parameters, that
602 creates a condition variable, returns it, and while it does that,
603 initiates a TCP connect to C<$host>. The condition variable will be used
604 by the caller to receive the finger response, but one could equally well
605 pass a third argument, a callback, to the function.
606
607 Since we are programming event'ish, we do not wait for the connect to
608 finish - it could block the program for a minute or longer!
609
610 Instead, we pass the callback it should invoke when the connect is done to
611 C<tcp_connect>. If it is successful, that callback gets called with the
612 socket handle as first argument, otherwise, nothing will be passed to our
613 callback. The important point is that it will always be called as soon as
614 the outcome of the TCP connect is known.
615
616 This style of programming is also called "continuation style": the
617 "continuation" is simply the way the program continues - normally at the
618 next line after some statement (the exception is loops or things like
619 C<return>). When we are interested in events, however, we instead specify
620 the "continuation" of our program by passing a closure, which makes that
621 closure the "continuation" of the program.
622
623 The C<tcp_connect> call is like saying "return now, and when the
624 connection is established or it failed, continue there".
625
626 Now let's look at the callback/closure in more detail:
627
628 # the callback receives the socket handle - or nothing
629 my ($fh) = @_
630 or return $cv->send;
631
632 The first thing the callback does is indeed save the socket handle in
633 C<$fh>. When there was an error (no arguments), then our instinct as
634 expert Perl programmers would tell us to C<die>:
635
636 my ($fh) = @_
637 or die "$host: $!";
638
639 While this would give good feedback to the user (if he happens to watch
640 standard error), our program would probably stop working here, as we never
641 report the results to anybody, certainly not the caller of our C<finger>
642 function, and most event loops continue even after a C<die>!
643
644 This is why we instead C<return>, but also call C<< $cv->send >> without
645 any arguments to signal to the condvar consumer that something bad has
646 happened. The return value of C<< $cv->send >> is irrelevant, as is
647 the return value of our callback. The C<return> statement is simply
648 used for the side effect of, well, returning immediately from the
649 callback. Checking for errors and handling them this way is very common,
650 which is why this compact idiom is so handy.
651
652 As the next step in the finger protocol, we send the username to the
653 finger daemon on the other side of our connection (the kernel.org finger
654 service doesn't actually wait for a username, but the net is running out
655 of finger servers fast):
656
657 syswrite $fh, "$user\015\012";
658
659 Note that this isn't 100% clean socket programming - the socket could,
660 for whatever reasons, not accept our data. When writing a small amount
661 of data like in this example it doesn't matter, as a socket buffer is
662 almost always big enough for a mere "username", but for real-world
663 cases you might need to implement some kind of write buffering - or use
664 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
665 next section.
666
667 What we I<do> have to do is to implement our own read buffer - the response
668 data could arrive late or in multiple chunks, and we cannot just wait for
669 it (event-based programming, you know?).
670
671 To do that, we register a read watcher on the socket which waits for data:
672
673 my $read_watcher; $read_watcher = AnyEvent->io (
674 fh => $fh,
675 poll => "r",
676
677 There is a trick here, however: the read watcher isn't stored in a global
678 variable, but in a local one - if the callback returns, it would normally
679 destroy the variable and its contents, which would in turn unregister our
680 watcher.
681
682 To avoid that, we refer to the variable inside the watcher callback (which
683 eventually C<undef>ines it). This means that, when the C<tcp_connect>
684 callback returns, perl thinks (quite correctly) that the read watcher is
685 still in use - namely in the callback - and thus keeps it alive even if
686 nothing else in the program refers to it anymore (it is much like Baron
687 Münchhausen keeping himself from dying by pulling himself out of a swamp).
688
689 The trick, however, is that instead of:
690
691 my $read_watcher = AnyEvent->io (...
692
693 The program does:
694
695 my $read_watcher; $read_watcher = AnyEvent->io (...
696
697 The reason for this is a quirk in the way Perl works: variable names
698 declared with C<my> are only visible in the I<next> statement. If the
699 whole C<< AnyEvent->io >> call, including the callback, were done in
700 a single statement, the callback could not refer to the C<$read_watcher>
701 variable to undefine it, so it is done in two statements.
702
703 Whether you'd want to format it like this is of course a matter of style.
704 This way emphasizes that the declaration and assignment really are one
705 logical statement.
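
The scoping quirk is not specific to AnyEvent. The following plain-Perl sketch shows the same two-statement form with a code reference that needs to refer to itself. The C<$countdown> name is made up for the illustration:

```perl
use strict;
use warnings;

# A single statement would not compile under "use strict":
#    my $countdown = sub { ... $countdown->(...) ... };
# because $countdown is not yet visible inside the sub.

my $countdown; $countdown = sub {
   my ($n) = @_;

   return "liftoff" if $n == 0;

   "$n " . $countdown->($n - 1)
};

my $result = $countdown->(3);

undef $countdown; # break the self-reference so perl can free the closure

print "$result\n";
```

Just as with C<$read_watcher>, the closure keeps itself alive through the variable it refers to, and undefining that variable is what finally releases it.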
706
707 The callback itself calls C<sysread> as many times as necessary, until
708 C<sysread> returns either an error or end-of-file:
709
710 cb => sub {
711 my $len = sysread $fh, $response, 1024, length $response;
712
713 if ($len <= 0) {
714
715 Note that C<sysread> has the ability to append data it reads to a scalar,
716 by specifying an offset, a feature we make good use of in this
717 example.
718
719 When C<sysread> indicates we are done, the callback C<undef>ines
720 the watcher and then C<send>'s the response data to the condition
721 variable. All this has the following effects:
722
723 Undefining the watcher destroys it, as our callback was the only one still
724 having a reference to it. When the watcher gets destroyed, it destroys the
725 callback, which in turn means the C<$fh> handle is no longer used, so that
726 gets destroyed as well. The result is that all resources will be nicely
727 cleaned up by perl for us.
728
729 =head3 Using the finger client
730
731 Now, we could probably write the same finger client in a simpler way if
732 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
733 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
734
735 But the main advantage is that we can not only run this finger function in
736 the background, we even can run multiple sessions in parallel, like this:
737
738 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
739 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
740 my $f3 = finger "hpa" , "kernel.org"; # finger hpa
741
742 print "trouble tickets:\n" , $f1->recv, "\n";
743 print "trouble ticket #1736:\n", $f2->recv, "\n";
744 print "kernel release info: " , $f3->recv, "\n";
745
746 It doesn't look like it, but in fact all three requests run in
747 parallel. The code waits for the first finger request to finish first, but
748 that doesn't keep it from executing them in parallel: when the first C<recv>
749 call sees that the data isn't ready yet, it serves events for all three
750 requests automatically, until the first request has finished.
751
752 The second C<recv> call might either find the data is already there, or it
753 will continue handling events until that is the case, and so on.
754
755 By taking advantage of network latencies, which allow us to serve other
756 requests and events while we wait for an event on one socket, the overall
757 time to do these three requests will be greatly reduced: typically, all
758 three are done in the time the slowest of them needs to finish.
759
760 By the way, you do not actually have to wait in the C<recv> method on an
761 AnyEvent condition variable - after all, waiting is evil - you can also
762 register a callback:
763
764 $cv->cb (sub {
765 my $response = shift->recv;
766 # ...
767 });
768
769 The callback will only be invoked when C<send> was called. In fact,
770 instead of returning a condition variable you could also pass a third
771 parameter to your finger function, the callback to invoke with the
772 response:
773
774 sub finger($$$) {
775 my ($user, $host, $cb) = @_;
776
777 How you implement it is a matter of taste - if you expect your function to
778 be used mainly in an event-based program you would normally prefer to pass
779 a callback directly. If you write a module and expect your users to use
780 it "synchronously" often (for example, a simple http-get script would not
781 really care much for events), then you would use a condition variable and
782 tell them "simply C<< ->recv >> the data".
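
The two styles do not exclude each other. As a minimal sketch (not the tutorial's C<finger> itself), the hypothetical function C<delayed_answer> below uses a timer in place of a network request; it always returns a condition variable and additionally registers a callback on it when one is passed:

```perl
use strict;
use warnings;
use AnyEvent;

# Hypothetical stand-in for finger: "answers" after a short delay.
# If a callback is given it is registered on the condition variable;
# the condition variable is returned in either case.
sub delayed_answer {
   my ($answer, $cb) = @_;

   my $cv = AnyEvent->condvar;
   $cv->cb (sub { $cb->(shift->recv) }) if $cb;

   my $timer; $timer = AnyEvent->timer (after => 0.1, cb => sub {
      undef $timer;          # the timer has done its job
      $cv->send ($answer);   # wakes up ->recv and triggers ->cb
   });

   $cv
}

# "synchronous" style: simply ->recv the data
my $sync = delayed_answer ("hello")->recv;

# event-based style: the callback receives the data
my $async;
my $done = AnyEvent->condvar;
delayed_answer ("world", sub { $async = shift; $done->send });
$done->recv;
```

Whether offering both at once is worth the extra code is, again, a matter of taste.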
783
784 =head3 Problems with the implementation and how to fix them
785
786 To make this example more real-world-ready, we would not only implement
787 some write buffering (for the paranoid, or maybe denial-of-service aware
788 security expert), but we would also have to handle timeouts and maybe
789 protocol errors.
790
791 Doing this quickly gets unwieldy, which is why we introduce
792 L<AnyEvent::Handle> in the next section, which takes care of all these
793 details for you and lets you concentrate on the actual protocol.
794
795
796 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
797
798 The L<AnyEvent::Handle> module has been hyped quite a bit in this document
799 so far, so let's see what it really offers.
800
801 As finger is such a simple protocol, let's try something slightly more
802 complicated: HTTP/1.0.
803
804 An HTTP GET request works by sending a single request line that indicates
805 what you want the server to do and the URI you want to act on, followed
806 by as many "header" lines (C<Header: data>, same as e-mail headers) as
807 required for the request, ended by an empty line.
808
809 The response is formatted very similarly, first a line with the response
810 status, then again as many header lines as required, then an empty line,
811 followed by any data that the server might send.
812
813 Again, let's try it out with C<telnet> (I condensed the output a bit - if
814 you want to see the full response, do it yourself).
815
816 # telnet www.google.com 80
817 Trying 209.85.135.99...
818 Connected to www.google.com (209.85.135.99).
819 Escape character is '^]'.
820 GET /test HTTP/1.0
821
822 HTTP/1.0 404 Not Found
823 Date: Mon, 02 Jun 2008 07:05:54 GMT
824 Content-Type: text/html; charset=UTF-8
825
826 <html><head>
827 [...]
828 Connection closed by foreign host.
829
830 The C<GET ...> and the empty line were entered manually, the rest of the
831 telnet output is Google's response, in this case a C<404 Not Found> one.
832
833 So, here is how you would do it with C<AnyEvent::Handle>:
834
835 sub http_get {
836 my ($host, $uri, $cb) = @_;
837
838 tcp_connect $host, "http", sub {
839 my ($fh) = @_
840 or return $cb->("HTTP/1.0 500 $!");
841
842 # store results here
843 my ($response, $header, $body);
844
845 my $handle; $handle = new AnyEvent::Handle
846 fh => $fh,
847 on_error => sub {
848 undef $handle;
849 $cb->("HTTP/1.0 500 $!");
850 },
851 on_eof => sub {
852 undef $handle; # keep it alive till eof
853 $cb->($response, $header, $body);
854 };
855
856 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
857
858 # now fetch response status line
859 $handle->push_read (line => sub {
860 my ($handle, $line) = @_;
861 $response = $line;
862 });
863
864 # then the headers
865 $handle->push_read (line => "\015\012\015\012", sub {
866 my ($handle, $line) = @_;
867 $header = $line;
868 });
869
870 # and finally handle any remaining data as body
871 $handle->on_read (sub {
872 $body .= $_[0]->rbuf;
873 $_[0]->rbuf = "";
874 });
875 };
876 }
877
878 And now let's go through it step by step. First, as usual, the overall
879 C<http_get> function structure:
880
881 sub http_get {
882 my ($host, $uri, $cb) = @_;
883
884 tcp_connect $host, "http", sub {
885 ...
886 };
887 }
888
889 Unlike in the finger example, this time the caller has to pass a callback
890 to C<http_get>. Also, instead of passing a URL as one would expect, the
891 caller has to provide the hostname and URI - normally you would use the
892 C<URI> module to parse a URL and separate it into those parts, but that is
893 left to the inspired reader :)
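
For readers who want a head start, here is one possible sketch of that URL splitting. It assumes the C<URI> module from CPAN is installed; C<split_url> is a hypothetical helper name and not part of AnyEvent:

```perl
use strict;
use warnings;
use URI;

# Split a URL into the pieces that tcp_connect and http_get need.
# split_url is a hypothetical helper, not part of AnyEvent.
sub split_url {
   my ($url) = @_;

   my $u = URI->new ($url);

   # the scheme doubles as the service name ("http" or "https"),
   # path_query is what goes into the request line ("/" if empty)
   ($u->scheme, $u->host, $u->port, $u->path_query || "/")
}

my ($scheme, $host, $port, $uri) = split_url ("http://www.google.com/test?q=1");
```

With these values, C<$host> and C<$uri> could be passed to C<http_get> unchanged.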
894
895 Since everything else is left to the caller, all C<http_get> does is
896 initiate the connection with C<tcp_connect> and leave everything else to
897 its callback.
898
899 The first thing the callback does is check for connection errors and
900 declare some variables:
901
902 my ($fh) = @_
903 or return $cb->("HTTP/1.0 500 $!");
904
905 my ($response, $header, $body);
906
907 Instead of having an extra mechanism to signal errors, connection errors
908 are signalled by crafting a special "response status line", like this:
909
910 HTTP/1.0 500 Connection refused
911
912 This means the caller cannot distinguish (easily) between
913 locally-generated errors and server errors, but it simplifies error
914 handling for the caller a lot.
915
916 The next step finally involves L<AnyEvent::Handle>, namely it creates the
917 handle object:
918
919 my $handle; $handle = new AnyEvent::Handle
920 fh => $fh,
921 on_error => sub {
922 undef $handle;
923 $cb->("HTTP/1.0 500 $!");
924 },
925 on_eof => sub {
926 undef $handle; # keep it alive till eof
927 $cb->($response, $header, $body);
928 };
929
930 The constructor expects a file handle, which gets passed via the C<fh>
931 argument.
932
933 The remaining two argument pairs specify two callbacks to be called on
934 any errors (C<on_error>) and in the case of a normal connection close
935 (C<on_eof>).
936
937 In the first case, we C<undef>ine the handle object and pass the error to
938 the callback provided by the caller - done.
939
940 In the second case we assume everything went fine and pass the results
941 gobbled up so far to the caller-provided callback. This is not quite
942 perfect, as when the server "cleanly" closes the connection in the middle
943 of sending headers we might wrongly report this as an "OK" to the caller,
944 but then, HTTP doesn't support a perfect mechanism that would detect such
945 problems in all cases, so we don't bother either.
946
947 =head3 The write queue
948
949 The next line sends the actual request:
950
951 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
952
953 No headers will be sent (this is fine for simple requests), so the whole
954 request is just a single line followed by an empty line to signal the end
955 of the headers to the server.
956
957 The more interesting question is why the method is called C<push_write>
958 and not just C<write>. The reason is that you can I<always> add some write
959 data without blocking, and to do this, AnyEvent::Handle needs some write
960 queue internally - and C<push_write> simply pushes some data onto the end
961 of that queue, just like Perl's C<push> pushes data onto the end of an
962 array.
963
964 The deeper reason is that at some point in the future, there might
965 be C<unshift_write> as well, and in any case, we will shortly meet
966 C<push_read> and C<unshift_read>, and it's usually easiest to remember if
967 all those functions have some symmetry in their name.
968
969 If C<push_write> is called with more than one argument, then you can even
970 do I<formatted> I/O, which simply means your data will be transformed in
971 some ways. For example, this would JSON-encode your data before pushing it
972 to the write queue:
973
974 $handle->push_write (json => [1, 2, 3]);
975
976 Apart from that, this pretty much summarises the write queue, there is
977 little else to it.
978
979 Reading the response is far more interesting, because it involves the more
980 powerful and complex I<read queue>:
981
982 =head3 The read queue
983
984 The response consists of three parts: a single line with the response
985 status, a single paragraph of headers ended by an empty line, and the
986 response body, which is simply the remaining data on that connection.
987
988 For the first two, we push two read requests onto the read queue:
989
990 # now fetch response status line
991 $handle->push_read (line => sub {
992 my ($handle, $line) = @_;
993 $response = $line;
994 });
995
996 # then the headers
997 $handle->push_read (line => "\015\012\015\012", sub {
998 my ($handle, $line) = @_;
999 $header = $line;
1000 });
1001
1002 While one can simply push a single callback onto the queue to parse the
1003 data, I<formatted> I/O really comes to our advantage here, as there
1004 is a ready-made "read line" read type. The first read expects a single
1005 line, ended by C<\015\012> (the standard end-of-line marker in internet
1006 protocols).
1007
1008 The second "line" is actually a single paragraph - instead of reading it
1009 line by line we tell C<push_read> that the end-of-line marker is really
1010 C<\015\012\015\012>, which is an empty line. The result is that the whole
1011 header paragraph will be treated as a single line and read. The word
1012 "line" is interpreted very freely, much like Perl itself does it.
1013
1014 Note that the read requests are pushed immediately after creating the
1015 handle object - since AnyEvent::Handle provides a queue we can push as
1016 many requests as we want, and AnyEvent::Handle will handle them in order.
1017
1018 There is, however, no read type for "the remaining data". For that, we
1019 install our own C<on_read> callback:
1020
1021 # and finally handle any remaining data as body
1022 $handle->on_read (sub {
1023 $body .= $_[0]->rbuf;
1024 $_[0]->rbuf = "";
1025 });
1026
1027 This callback is invoked every time data arrives and the read queue is
1028 empty - which in this example will only be the case when both response and
1029 header have been read. The C<on_read> callback could actually have been
1030 specified when constructing the object, but doing it this way preserves
1031 logical ordering.
1032
1033 The read callback simply adds the current read buffer to its C<$body>
1034 variable and, most importantly, I<empties> the buffer by assigning the
1035 empty string to it.
1036
1037 After AnyEvent::Handle has been so instructed, it will handle incoming
1038 data according to these instructions - if all goes well, the callback will
1039 be invoked with the response data, if not, it will get an error.
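
What the caller receives are raw strings: the status line without its end-of-line marker, and the header paragraph as one block. A minimal sketch of how a caller might pick them apart follows. The name C<parse_response> is hypothetical, and the parsing is deliberately simplified: repeated headers overwrite each other and continuation lines are skipped.

```perl
use strict;
use warnings;

# parse_response is a hypothetical helper: it takes the $response and
# $header strings as passed to the http_get callback and returns
# (status code, reason phrase, hashref keyed by lowercased header name).
sub parse_response {
   my ($response, $header) = @_;

   my ($code, $reason) = $response =~ m{^HTTP/\d+\.\d+ \s+ (\d{3}) \s* (.*)$}x
      or return; # not an HTTP status line

   my %hdr;
   for (split /\015?\012/, $header) {
      my ($name, $value) = /^([^:\s]+) \s* : \s* (.*)$/x
         or next; # skip lines we do not understand
      $hdr{lc $name} = $value;
   }

   ($code, $reason, \%hdr)
}

my ($code, $reason, $hdr) = parse_response (
   "HTTP/1.0 404 Not Found",
   "Date: Mon, 02 Jun 2008 07:05:54 GMT\015\012Content-Type: text/html; charset=UTF-8",
);
```

Since locally generated errors are reported as C<HTTP/1.0 500 ...> as well, the same code handles them without special cases.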
1040
1041 In general, you can implement pipelining (a semi-advanced feature of many
1042 protocols) very easily with AnyEvent::Handle: If you have a protocol with a
1043 request/response structure, your request methods/functions will all look
1044 like this (simplified):
1045
1046 sub request {
1047
1048 # send the request to the server
1049 $handle->push_write (...);
1050
1051 # push some response handlers
1052 $handle->push_read (...);
1053 }
1054
1055 This means you can queue as many requests as you want, and while
1056 AnyEvent::Handle goes through its read queue to handle the response data,
1057 the other side can work on the next request - queueing the request just
1058 appends some data to the write queue and installs a handler to be called
1059 later.
1060
1061 You might ask yourself how to handle decisions you can only make I<after>
1062 you have received some data (such as handling a short error response or a
1063 long and differently-formatted response). The answer to this problem is
1064 C<unshift_read>, which we will introduce together with an example in the
1065 coming sections.
1066
1067 =head3 Using C<http_get>
1068
1069 Finally, here is how you would use C<http_get>:
1070
1071 http_get "www.google.com", "/", sub {
1072 my ($response, $header, $body) = @_;
1073
1074 print
1075 $response, "\n",
1076 $body;
1077 };
1078
1079 And of course, you can run as many of these requests in parallel as you
1080 want (and your memory supports).
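
One way to wait until all of them have finished is the C<begin>/C<end> counting of condition variables. The sketch below is self-contained only because it replaces C<http_get> with C<fake_get>, a hypothetical timer-based stand-in that has the same calling convention but needs no network:

```perl
use strict;
use warnings;
use AnyEvent;

# fake_get is a hypothetical stand-in with the same calling convention
# as http_get, so that this sketch runs without a network connection.
sub fake_get {
   my ($host, $uri, $cb) = @_;

   my $timer; $timer = AnyEvent->timer (after => 0.05, cb => sub {
      undef $timer;
      $cb->("HTTP/1.0 200 OK", "", "body from $host$uri");
   });
}

my %body;
my $all_done = AnyEvent->condvar;

for my $host ("www.example.com", "www.example.net", "www.example.org") {
   $all_done->begin;      # one more outstanding request
   fake_get $host, "/", sub {
      my ($response, $header, $body) = @_;
      $body{$host} = $body;
      $all_done->end;     # the last ->end calls ->send for us
   };
}

$all_done->recv;          # returns once all three callbacks have run
```

Replacing C<fake_get> by the real C<http_get> should not require any other change.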
1081
1082 =head3 HTTPS
1083
1084 Now, as promised, let's implement the same thing for HTTPS, or more
1085 correctly, let's change our C<http_get> function into a function that
1086 speaks HTTPS instead.
1087
1088 HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1089 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1090 that contains standard HTTP protocol exchanges. The only other difference
1091 to HTTP is that by default it uses port C<443> instead of port C<80>.
1092
1093 To implement these two differences we need two tiny changes. First, in the
1094 C<tcp_connect> call we replace C<http> by C<https>:
1095
1096 tcp_connect $host, "https", sub { ...
1097
1098 The other change deals with TLS, which is something L<AnyEvent::Handle>
1099 does for us, as long as I<you> made sure that the L<Net::SSLeay> module
1100 is around. To enable TLS with L<AnyEvent::Handle>, we simply pass an
1101 additional C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1102
1103 tls => "connect",
1104
1105 Specifying C<tls> enables TLS, and the argument specifies whether
1106 AnyEvent::Handle is the server side ("accept") or the client side
1107 ("connect") for the TLS connection, as unlike TCP, there is a clear
1108 server/client relationship in TLS.
1109
1110 That's all.
1111
1112 Of course, all this should be handled transparently by C<http_get>
1113 after parsing the URL. If you need this, see the part about exercising
1114 your inspiration earlier in this document. You could also use the
1115 L<AnyEvent::HTTP> module from CPAN, which implements all this and works
1116 around a lot of quirks for you, too.
1117
1118 =head3 The read queue - revisited
1119
1120 HTTP always uses the same structure in its responses, but many protocols
1121 require parsing responses differently depending on the response itself.
1122
1123 For example, in SMTP, you normally get a single response line:
1124
1125 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1126
1127 But SMTP also supports multi-line responses:
1128
1129 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1130 220-hey guys
1131 220 my response is longer than yours
1132
1133 To handle this, we need C<unshift_read>. As the name (hopefully) implies,
1134 C<unshift_read> will not append your read request to the end of the read
1135 queue, but instead it will prepend it to the queue.
1136
1137 This is useful in the situation above: Just push your response-line read
1138 request when sending the SMTP command, and when handling it, you look at
1139 the line to see if more is to come, and C<unshift_read> another reader
1140 callback if required, like this:
1141
1142 my $response; # response lines end up in here
1143
1144 my $read_response; $read_response = sub {
1145 my ($handle, $line) = @_;
1146
1147 $response .= "$line\n";
1148
1149 # check for continuation lines ("-" as 4th character)
1150 if ($line =~ /^...-/) {
1151 # if yes, then unshift another line read
1152 $handle->unshift_read (line => $read_response);
1153
1154 } else {
1155 # otherwise we are done
1156
1157 # free callback
1158 undef $read_response;
1159
1160 print "we are done reading: $response\n";
1161 }
1162 };
1163
1164 $handle->push_read (line => $read_response);
1165
1166 This recipe can be used for all similar parsing problems, for example in
1167 NNTP, the response code to some commands indicates that more data will be
1168 sent:
1169
1170 $handle->push_write ("article 42\015\012");
1171
1172 # read response line
1173 $handle->push_read (line => sub {
1174 my ($handle, $status) = @_;
1175
1176 # article data following?
1177 if ($status =~ /^2/) {
1178 # yes, read article body
1179
1180 $handle->unshift_read (line => "\012.\015\012", sub {
1181 my ($handle, $body) = @_;
1182
1183 $finish->($status, $body);
1184 });
1185
1186 } else {
1187 # some error occurred, no article data
1188
1189 $finish->($status);
1190 }
1191 });
1192
1193 =head3 Your own read queue handler
1194
1195 Sometimes, your protocol doesn't play nice and uses lines or chunks of
1196 data not formatted in a way handled by AnyEvent::Handle out of the box. In
1197 this case you have to implement your own read parser.
1198
1199 To make up a contorted example, imagine you are looking for an even
1200 number of characters followed by a colon (":"). Also imagine that
1201 AnyEvent::Handle had no C<regex> read type which could be used, so you'd
1202 have to do it manually.
1203
1204 To implement a read handler for this, you would C<push_read> (or
1205 C<unshift_read>) just a single code reference.
1206
1207 This code reference will then be called each time there is (new) data
1208 available in the read buffer, and is expected to either successfully
1209 eat/consume some of that data (and return true) or to return false to
1210 indicate that it wants to be called again.
1211
1212 If the code reference returns true, then it will be removed from the
1213 read queue (because it has parsed/consumed whatever it was supposed to
1214 consume), otherwise it stays in the front of it.
1215
1216 The example above could be coded like this:
1217
1218 $handle->push_read (sub {
1219 my ($handle) = @_;
1220
1221 # check for even number of characters + ":"
1222 # and remove the data if a match is found.
1223 # if not, return false (actually nothing)
1224
1225 $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1226 or return;
1227
1228 # we got some data in $1, pass it to whoever wants it
1229 $finish->($1);
1230
1231 # and return true to indicate we are done
1232 1
1233 });
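
To see the "return false until there is enough data" behaviour without a network connection, the same regex can be run against a plain string that grows chunk by chunk. This is only a simulation: C<$rbuf> stands in for C<< $handle->{rbuf} >>, C<@found> for whatever C<$finish> would do, and C<try_parse> is a hypothetical name.

```perl
use strict;
use warnings;

my $rbuf = "";   # stands in for $handle->{rbuf}
my @found;       # stands in for whatever $finish does with the data

# same logic as the read handler: returns true once data was consumed
sub try_parse {
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
}

# feed the data in arbitrary chunks, as the network might deliver it
my @results;
for my $chunk ("ab", "cd", ":rest") {
   $rbuf .= $chunk;
   push @results, try_parse () ? "done" : "again";
}

print "@results\n";   # prints "again again done"
```

After the last chunk, C<@found> contains C<abcd> and C<rest> is left in the buffer for the next read request in the queue.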
1234
1235 This concludes our little tutorial.
1236
1237 =head1 Where to go from here?
1238
1239 This introduction should have explained the key concepts behind
1240 L<AnyEvent>, namely event watchers and condition variables,
1241 L<AnyEvent::Socket>, for your basic networking needs, and
1242 L<AnyEvent::Handle>, a nice wrapper around handles.
1243
1244 You could either start coding stuff right away, look at those manual
1245 pages for the gory details, or roam CPAN for other AnyEvent modules (such
1246 as L<AnyEvent::IRC> or L<AnyEvent::HTTP>) to see more code examples (or
1247 simply to use them).
1248
1249 If you need a protocol that doesn't have an implementation using AnyEvent,
1250 remember that you can mix AnyEvent with one other event framework, such as
1251 L<POE>, so you can always use AnyEvent for your own tasks plus modules of
1252 one other event framework to fill any gaps.
1253
1254 Last but not least, you could also look at L<Coro>, especially
1255 L<Coro::AnyEvent>, to see how you can turn event-based programming from
1256 callback style back to the usual imperative style (also called "inversion
1257 of control" - AnyEvent calls I<you>, but Coro lets I<you> call AnyEvent).
1258
1259 =head1 Authors
1260
1261 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann C<< <schmorp@schmorp.de> >>.
1262