/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.28
Committed: Tue Aug 31 01:02:03 2010 UTC (13 years, 9 months ago) by root
Branch: MAIN
CVS Tags: rel-5_28
Changes since 1.27: +116 -121 lines
Log Message:
Message-ID: <20100714112109.GA31222@toroid.org>

File Contents

# Content
1 =head1 NAME
2
3 AnyEvent::Intro - an introductory tutorial to AnyEvent
4
5 =head1 Introduction to AnyEvent
6
7 This is a tutorial that will introduce you to the features of AnyEvent.
8
9 The first part introduces the core AnyEvent module (after swamping you a
10 bit in evangelism), which might already provide all you ever need: If you
11 are only interested in AnyEvent's event handling capabilities, read no
12 further.
13
14 The second part focuses on network programming using sockets, for which
15 AnyEvent offers a lot of support you can use, and a lot of workarounds
16 for portability quirks.
17
18
19 =head1 What is AnyEvent?
20
21 If you don't care for the whys and want to see code, skip this section!
22
23 AnyEvent is first of all just a framework to do event-based
24 programming. Typically such frameworks are an all-or-nothing thing: If you
25 use one such framework, you can't (easily, or even at all) use another in
26 the same program.
27
28 AnyEvent is different - it is a thin abstraction layer on top of other
29 event loops, just like DBI is an abstraction of many different database
30 APIs. Its main purpose is to move the choice of the underlying framework
31 (the event loop) from the module author to the program author using the
32 module.
33
34 That means you can write code that uses events to control what it
35 does, without forcing other code in the same program to use the same
36 underlying framework as you do - i.e. you can create a Perl module
37 that is event-based using AnyEvent, and users of that module can still
38 choose between using L<Gtk2>, L<Tk>, L<Event> (or run inside Irssi or
39 rxvt-unicode) or any other supported event loop. AnyEvent even comes with
40 its own pure-perl event loop implementation, so your code works regardless
41 of other modules that might or might not be installed. The latter is
42 important, as AnyEvent does not have any hard dependencies on other
43 modules, which makes it easy to install, for example, when you lack a C
44 compiler. No matter what environment, AnyEvent will just cope with it.
45
46 A typical limitation of existing Perl modules such as L<Net::IRC> is that
47 they come with their own event loop: In L<Net::IRC>, a program which uses
48 it needs to start the event loop of L<Net::IRC>. That means that one
49 cannot integrate this module into a L<Gtk2> GUI for instance, as that
50 module, too, enforces the use of its own event loop (namely L<Glib>).
51
52 Another example is L<LWP>: it provides no event interface at all. It's
53 a pure blocking HTTP (and FTP etc.) client library, which usually means
54 that you either have to start another process or have to fork for an HTTP
55 request, or use threads (e.g. L<Coro::LWP>), if you want to do something
56 else while waiting for the request to finish.
57
58 The motivation behind these designs is often that a module doesn't want
59 to depend on some complicated XS-module (Net::IRC), or that it doesn't
60 want to force the user to use some specific event loop at all (LWP), out
61 of fear of severely limiting the usefulness of the module: If your module
62 requires Glib, it will not run in a Tk program.
63
64 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to
65 either:
66
67 =over 4
68
69 =item - write their own event loop (because it guarantees the availability
70 of an event loop everywhere - even on windows with no extra modules
71 installed).
72
73 =item - choose one specific event loop (because AnyEvent works with most
74 event loops available for Perl).
75
76 =back
77
78 If the module author uses L<AnyEvent> for all his (or her) event needs
79 (IO events, timers, signals, ...) then all other modules can just use
80 his module and don't have to choose an event loop or adapt to his event
81 loop. The choice of the event loop is ultimately made by the program
82 author who uses all the modules and writes the main program. And even
83 there he doesn't have to choose, he can just let L<AnyEvent> choose the
84 most efficient event loop available on the system.
85
86 Read more about this in the main documentation of the L<AnyEvent> module.
87
88
89 =head1 Introduction to Event-Based Programming
90
91 So what exactly is programming using events? It quite simply means that
92 instead of your code actively waiting for something, such as the user
93 entering something on STDIN:
94
95 $| = 1; print "enter your name> ";
96
97 my $name = <STDIN>;
98
99 You instead tell your event framework to notify you in the event of some
100 data being available on STDIN, by using a callback mechanism:
101
102 use AnyEvent;
103
104 $| = 1; print "enter your name> ";
105
106 my $name;
107
108 my $wait_for_input = AnyEvent->io (
109 fh => \*STDIN, # which file handle to check
110 poll => "r", # which event to wait for ("r"ead data)
111 cb => sub { # what callback to execute
112 $name = <STDIN>; # read it
113 }
114 );
115
116 # do something else here
117
118 Looks more complicated, and surely is, but the advantage of using events
119 is that your program can do something else instead of waiting for input
120 (side note: combining AnyEvent with a thread package such as Coro can
121 recoup much of the simplicity, effectively getting the best of two
122 worlds).
123
124 Waiting as done in the first example is also called "blocking" the process
125 because you "block"/keep your process from executing anything else while
126 you do so.
127
128 The second example avoids blocking by only registering interest in a read
129 event, which is fast and doesn't block your process. The callback will
130 be called only when data is available and can be read without blocking.
131
132 The "interest" is represented by an object returned by C<< AnyEvent->io
133 >> called a "watcher" object - thus named because it "watches" your
134 file handle (or other event sources) for the event you are interested in.
135
136 In the example above, we create an I/O watcher by calling the C<<
137 AnyEvent->io >> method. A lack of further interest in some event is
138 expressed by simply forgetting about its watcher, for example by
139 C<undef>-ing the only variable it is stored in. AnyEvent will
140 automatically clean up the watcher if it is no longer used, much like
141 Perl closes your file handles if you no longer use them anywhere.
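
To see that forgetting a watcher really cancels it, here is a minimal sketch. It borrows timers and condition variables, which are only introduced later in this tutorial, and the variable names and delays (C<$doomed>, C<$guard>, 0.1 and 0.3 seconds) are made up for illustration:

```perl
use strict;
use warnings;
use AnyEvent;

my $fired = 0;

# a timer watcher that would invoke its callback after 0.1 seconds
my $doomed = AnyEvent->timer (after => 0.1, cb => sub { $fired++ });

# forgetting the watcher cancels it before it gets a chance to fire
undef $doomed;

# run the event loop for 0.3 seconds to show that nothing happens
my $done  = AnyEvent->condvar;
my $guard = AnyEvent->timer (after => 0.3, cb => sub { $done->send });
$done->recv;

print "callback ran $fired time(s)\n";
```

The same thing happens when the variable holding the watcher simply goes out of scope, which is why watchers have to be stored somewhere for as long as you are interested in their events.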
142
143 =head3 A short note on callbacks
144
145 A common issue that hits people is the problem of passing parameters
146 to callbacks. Programmers used to languages such as C or C++ are often
147 used to a style where one passes the address of a function (a function
148 reference) and some data value, e.g.:
149
150 sub callback {
151 my ($arg) = @_;
152
153 $arg->method;
154 }
155
156 my $arg = ...;
157
158 call_me_back_later \&callback, $arg;
159
160 This is clumsy, as the place where behaviour is specified (when the
161 callback is registered) is often far away from the place where behaviour
162 is implemented. It also doesn't use Perl syntax to invoke the code. There
163 is also an abstraction penalty to pay as one has to I<name> the callback,
164 which often is unnecessary and leads to nonsensical or duplicated names.
165
166 In Perl, one can specify behaviour much more directly by using
167 I<closures>. Closures are code blocks that take a reference to the
168 enclosing scope(s) when they are created. This means lexical variables
169 in scope when a closure is created can be used inside the closure:
170
171 my $arg = ...;
172
173 call_me_back_later sub { $arg->method };
174
175 Under most circumstances, closures are faster, use fewer resources and
176 result in much clearer code than the traditional approach. Faster,
177 because parameter passing and storing them in local variables in Perl
178 is relatively slow. Fewer resources, because closures take references
179 to existing variables without having to create new ones, and clearer
180 code because it is immediately obvious that the second example calls the
181 C<method> method when the callback is invoked.
182
183 Apart from these, the strongest argument for using closures with AnyEvent
184 is that AnyEvent does not allow passing parameters to the callback, so
185 closures are the only way to achieve that in most cases :->
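
As a self-contained illustration of the closure approach, the following sketch needs no event loop at all. The C<call_me_back_later> function here is a hypothetical stand-in that merely stores the callbacks, and the names are invented:

```perl
use strict;
use warnings;

# hypothetical stand-in for an event framework: it only stores callbacks
my @pending;
sub call_me_back_later { push @pending, $_[0] }

for my $name (qw(alice bob)) {
    my $greeting = "hello, $name";    # a fresh lexical in every iteration

    # no parameters are passed, the closure simply refers to $greeting
    call_me_back_later (sub { print "$greeting\n"; $greeting });
}

# "later": the loop is long gone, but each closure still sees its own $greeting
my @results = map { $_->() } @pending;
```

Each callback keeps the variable it refers to alive for as long as the callback itself exists, so there is nothing to pass around and nothing to name.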
186
187
188 =head3 A hint on debugging
189
190 AnyEvent does not, by default, do any argument checking. This can lead to
191 strange and unexpected results especially if you are trying to find your
192 way with AnyEvent.
193
194 AnyEvent supports a special "strict" mode - off by default - which does very
195 strict argument checking, at the expense of being somewhat slower. During
196 development, however, this mode is very useful.
197
198 You can enable this strict mode either by setting the environment variable
199 C<PERL_ANYEVENT_STRICT> to a true value:
200
201 PERL_ANYEVENT_STRICT=1 perl test.pl
202
203 Or you can write C<use AnyEvent::Strict> in your program, which has the
204 same effect (do not do this in production, however).
205
206
207 =head2 Condition Variables
208
209 Back to the I/O watcher example: The code is not yet a fully working
210 program, and will not work as-is. The reason is that your callback will
211 not be invoked out of the blue; you have to run the event loop first.
212 Also, event-based programs need to block sometimes too, such as when
213 there is nothing to do, and everything is waiting for new events to
214 arrive.
215
216 In AnyEvent, this is done using condition variables. Condition variables
217 are named "condition variables" because they represent a condition that is
218 initially false and needs to be fulfilled.
219
220 You can also call them "merge points", "sync points", "rendezvous ports"
221 or even callbacks and many other things (and they are often called these
222 names in other frameworks). The important point is that you can create them
223 freely and later wait for them to become true.
224
225 Condition variables have two sides - one side is the "producer" of the
226 condition (whatever code detects and flags the condition), the other side
227 is the "consumer" (the code that waits for that condition).
228
229 In our example in the previous section, the producer is the event callback
230 and there is no consumer yet - let's change that right now:
231
232 use AnyEvent;
233
234 $| = 1; print "enter your name> ";
235
236 my $name;
237
238 my $name_ready = AnyEvent->condvar;
239
240 my $wait_for_input = AnyEvent->io (
241 fh => \*STDIN,
242 poll => "r",
243 cb => sub {
244 $name = <STDIN>;
245 $name_ready->send;
246 }
247 );
248
249 # do something else here
250
251 # now wait until the name is available:
252 $name_ready->recv;
253
254 undef $wait_for_input; # watcher no longer needed
255
256 print "your name is $name\n";
257
258 This program creates an AnyEvent condvar by calling the C<<
259 AnyEvent->condvar >> method. It then creates a watcher as usual, but
260 inside the callback it C<send>s the C<$name_ready> condition variable,
261 which causes whoever is waiting on it to continue.
262
263 The "whoever" in this case is the code that follows, which calls C<<
264 $name_ready->recv >>: The producer calls C<send>, the consumer calls
265 C<recv>.
266
267 If there is no C<$name> available yet, then the call to C<<
268 $name_ready->recv >> will halt your program until the condition becomes
269 true.
270
271 As the names C<send> and C<recv> imply, you can actually send and receive
272 data using this, for example, the above code could also be written like
273 this, without an extra variable to store the name in:
274
275 use AnyEvent;
276
277 $| = 1; print "enter your name> ";
278
279 my $name_ready = AnyEvent->condvar;
280
281 my $wait_for_input = AnyEvent->io (
282 fh => \*STDIN, poll => "r",
283 cb => sub { $name_ready->send (scalar <STDIN>) }
284 );
285
286 # do something else here
287
288 # now wait and fetch the name
289 my $name = $name_ready->recv;
290
291 undef $wait_for_input; # watcher no longer needed
292
293 print "your name is $name\n";
294
295 You can pass any number of arguments to C<send>, and every subsequent
296 call to C<recv> will return them.
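
A small sketch of sending more than one value follows. The values are made up, and C<send> is called directly instead of from a watcher callback to keep the example short:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# the producer may send several values at once
$cv->send ("Ada", "Lovelace");

# the condition is already true, so recv returns without blocking
my ($first, $last) = $cv->recv;
my @again          = $cv->recv;    # a later call returns the same values

print "$first $last / @again\n";
```

In scalar context, C<recv> returns only the first value, which is what the C<< my $name = $name_ready->recv >> line above relies on.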
297
298 =head2 The "main loop"
299
300 Most event-based frameworks have something called a "main loop" or "event
301 loop run function" or something similar.
302
303 Just like C<recv> in AnyEvent, these functions need to be called
304 eventually so that your event loop has a chance of actually looking for
305 the events you are interested in.
306
307 For example, in a L<Gtk2> program, the above example could also be written
308 like this:
309
310 use Gtk2 -init;
311 use AnyEvent;
312
313 ############################################
314 # create a window and some label
315
316 my $window = new Gtk2::Window "toplevel";
317 $window->add (my $label = new Gtk2::Label "soon replaced by name");
318
319 $window->show_all;
320
321 ############################################
322 # do our AnyEvent stuff
323
324 $| = 1; print "enter your name> ";
325
326 my $name_ready = AnyEvent->condvar;
327
328 my $wait_for_input = AnyEvent->io (
329 fh => \*STDIN, poll => "r",
330 cb => sub {
331 # set the label
332 $label->set_text (scalar <STDIN>);
333 print "enter another name> ";
334 }
335 );
336
337 ############################################
338 # Now enter Gtk2's event loop
339
340 main Gtk2;
341
342 No condition variable anywhere in sight - instead, we just read a line
343 from STDIN and replace the text in the label. In fact, since nobody
344 C<undef>s C<$wait_for_input> you can enter multiple lines.
345
346 Instead of waiting for a condition variable, the program enters the Gtk2
347 main loop by calling C<< Gtk2->main >>, which will block the program and
348 wait for events to arrive.
349
350 This also shows that AnyEvent is quite flexible - you didn't have to do
351 anything to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
352 worked.
353
354 Admittedly, the example is a bit silly - who would want to read names
355 from standard input in a Gtk+ application? But imagine that instead of
356 doing that, you make an HTTP request in the background and display its
357 results. In fact, with event-based programming you can make many
358 HTTP requests in parallel in your program and still provide feedback to
359 the user and stay interactive.
360
361 And in the next part you will see how to do just that - by implementing an
362 HTTP request, on our own, with the utility modules AnyEvent comes with.
363
364 Before that, however, let's briefly look at how you would write your
365 program using only AnyEvent, without ever calling some other event
366 loop's run function.
367
368 In the example using condition variables, we used those to start waiting
369 for events, and in fact, condition variables are the solution:
370
371 my $quit_program = AnyEvent->condvar;
372
373 # create AnyEvent watchers (or not) here
374
375 $quit_program->recv;
376
377 If any of your watcher callbacks decide to quit (this is often
378 called an "unloop" in other frameworks), they can just call C<<
379 $quit_program->send >>. Of course, they could also decide not to and
380 call C<exit> instead, or they could decide never to quit (e.g. in a
381 long-running daemon program).
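
Put together, a complete (if tiny) program of this shape could look like the sketch below. It uses a timer watcher, which is explained in the next section, and the C<$deadline> watcher and its quarter-second delay are assumptions made for the example:

```perl
use strict;
use warnings;
use AnyEvent;

my $quit_program = AnyEvent->condvar;

# hypothetical watcher: decide to quit after a quarter of a second
my $deadline = AnyEvent->timer (
    after => 0.25,
    cb    => sub { $quit_program->send ("deadline reached") },
);

# runs the event loop until some callback calls ->send
my $reason = $quit_program->recv;

print "quitting: $reason\n";
```

Whatever is passed to C<send> ends up as the return value of C<recv>, which is a convenient way to tell the main program why it should stop.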
382
383 If you don't need some clean quit functionality and just want to run the
384 event loop, you can do this:
385
386 AnyEvent->condvar->recv;
387
388 And this is, in fact, the closest to the idea of a main loop run
389 function that AnyEvent offers.
390
391 =head2 Timers and other event sources
392
393 So far, we have used only I/O watchers. These are useful mainly to find
394 out whether a socket has data to read, or space to write more data. On sane
395 operating systems this also works for console windows/terminals (typically
396 on standard input), serial lines, all sorts of other devices, basically
397 almost everything that has a file descriptor but isn't a file itself. (As
398 usual, "sane" excludes windows - on that platform you would need different
399 functions for all of these, complicating code immensely - think "socket
400 only" on windows).
401
402 However, I/O is not everything - the second most important event source is
403 the clock. For example when doing an HTTP request you might want to time
404 out when the server doesn't answer within some predefined amount of time.
405
406 In AnyEvent, timer event watchers are created by calling the C<<
407 AnyEvent->timer >> method:
408
409 use AnyEvent;
410
411 my $cv = AnyEvent->condvar;
412
413 my $wait_one_and_a_half_seconds = AnyEvent->timer (
414 after => 1.5, # after how many seconds to invoke the cb?
415 cb => sub { # the callback to invoke
416 $cv->send;
417 },
418 );
419
420 # can do something else here
421
422 # now wait till our time has come
423 $cv->recv;
424
425 Unlike I/O watchers, timers are only interested in the number of seconds
426 they have to wait. When (at least) that amount of time has passed,
427 AnyEvent will invoke your callback.
428
429 Unlike I/O watchers, which will call your callback as many times as there
430 is data available, timers are normally one-shot: after they have "fired"
431 once and invoked your callback, they are dead and no longer do anything.
432
433 To get a repeating timer, such as a timer firing roughly once per second,
434 you can specify an C<interval> parameter:
435
436 my $once_per_second = AnyEvent->timer (
437 after => 0, # first invoke ASAP
438 interval => 1, # then invoke every second
439 cb => sub { # the callback to invoke
440 $cv->send;
441 },
442 );
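
A repeating timer keeps firing until its watcher is destroyed. The following sketch counts a few invocations and then stops the timer itself; the interval of 0.1 seconds and the limit of three ticks are arbitrary choices for illustration:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv    = AnyEvent->condvar;
my $ticks = 0;

# declared first, so the callback can refer to the watcher variable
my $ticker; $ticker = AnyEvent->timer (
    after    => 0,      # first invoke ASAP
    interval => 0.1,    # then invoke every tenth of a second
    cb       => sub {
        if (++$ticks == 3) {
            undef $ticker;        # stop the repeating timer
            $cv->send ($ticks);   # and report how often it fired
        }
    },
);

my $count = $cv->recv;

print "timer fired $count times\n";
```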
443
444 =head3 More esoteric sources
445
446 AnyEvent also has some other, more esoteric event sources you can tap
447 into: signal, child and idle watchers.
448
449 Signal watchers can be used to wait for "signal events", which means
450 your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
451
452 Child-process watchers wait for a child process to exit. They are useful
453 when you fork a separate process and need to know when it exits, but you
454 do not want to wait for that by blocking.
455
456 Idle watchers invoke their callback when the event loop has handled all
457 outstanding events, polled for new events and didn't find any, i.e., when
458 your process is otherwise idle. They are useful if you want to do some
459 non-trivial data processing that can be done when your program doesn't
460 have anything better to do.
461
462 All these watcher types are described in detail in the main L<AnyEvent>
463 manual page.
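
As a taste of these watcher types, here is a minimal signal watcher sketch. It assumes a POSIX-like system where C<SIGUSR1> exists, and it sends the signal to itself only to keep the example self-contained:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# invoke the callback whenever this process receives SIGUSR1
my $usr1_watcher = AnyEvent->signal (
    signal => "USR1",
    cb     => sub { $cv->send ("caught USR1") },
);

# normally another process would do this, here we signal ourselves
kill USR1 => $$;

my $message = $cv->recv;

print "$message\n";
```

Child and idle watchers follow the same pattern, using C<< AnyEvent->child >> and C<< AnyEvent->idle >> respectively.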
464
465 Sometimes you also need to know what the current time is: C<<
466 AnyEvent->now >> returns the time the event toolkit uses to schedule
467 relative timers, and is usually what you want. It is often cached (which
468 means it can be a bit outdated). In that case, you can use the more costly
469 C<< AnyEvent->time >> method which will ask your operating system for the
470 current time, which is slower, but also more up to date.
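
The difference between the two is easy to look at. This sketch just prints both values; how far apart they are depends on the event loop in use and on when it last updated its idea of the current time:

```perl
use strict;
use warnings;
use AnyEvent;

my $loop_time = AnyEvent->now;     # what the event loop thinks, possibly cached
my $wall_time = AnyEvent->time;    # freshly fetched from the operating system

printf "event loop time: %.3f\n", $loop_time;
printf "system time:     %.3f\n", $wall_time;
printf "difference:      %.6f seconds\n", $wall_time - $loop_time;
```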
471
472 =head1 Network programming and AnyEvent
473
474 So far you have seen how to register event watchers and handle events.
475
476 This is a great foundation to write network clients and servers, and might
477 be all that your module (or program) ever requires, but writing your own
478 I/O buffering again and again becomes tedious, not to mention that it
479 attracts errors.
480
481 While the core L<AnyEvent> module is still small and self-contained,
482 the distribution comes with some very useful utility modules such as
483 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
484 make your life as a non-blocking network programmer a lot easier.
485
486 Here is a quick overview of these three modules:
487
488 =head2 L<AnyEvent::DNS>
489
490 This module allows fully asynchronous DNS resolution. It is used mainly by
491 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
492 a great way to do other DNS resolution tasks, such as reverse lookups of
493 IP addresses for log files.
494
495 =head2 L<AnyEvent::Handle>
496
497 This module handles non-blocking IO on (socket-, pipe- etc.) file handles
498 in an event based manner. It provides a wrapper object around your file
499 handle that provides queueing and buffering of incoming and outgoing data
500 for you.
501
502 It also implements the most common data formats, such as text lines, or
503 fixed and variable-width data blocks.
504
505 =head2 L<AnyEvent::Socket>
506
507 This module provides you with functions that handle socket creation
508 and IP address magic. The two main functions are C<tcp_connect> and
509 C<tcp_server>. The former will connect a (streaming) socket to an internet
510 host for you and the latter will make a server socket for you, to accept
511 connections.
512
513 This module also comes with transparent IPv6 support. This means: If you
514 write your programs with this module, you will be IPv6 ready without doing
515 anything special.
516
517 It also works around a lot of portability quirks (especially on the
518 windows platform), which makes it even easier to write your programs in a
519 portable way (did you know that windows uses different error codes for all
520 socket functions and that Perl does not know about these? That "Unknown
521 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
522 successful? That unsuccessful TCP connects might never be reported back
523 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
524 ignored instead of being in progress? AnyEvent::Socket works around all of
525 these Windows/Perl bugs for you).
526
527 =head2 Implementing a parallel finger client with non-blocking connects
528 and AnyEvent::Socket
529
530 The finger protocol is one of the simplest protocols in use on the
531 internet. Or in use in the past, as almost nobody uses it anymore.
532
533 It works by connecting to the finger port on another host, writing a
534 single line with a user name and then reading the finger response, as
535 specified by that user. OK, RFC 1288 specifies a vastly more complex
536 protocol, but it basically boils down to this:
537
538 # telnet kernel.org finger
539 Trying 204.152.191.37...
540 Connected to kernel.org (204.152.191.37).
541 Escape character is '^]'.
542
543 The latest stable version of the Linux kernel is: [...]
544 Connection closed by foreign host.
545
546 So let's write a little AnyEvent function that makes a finger request:
547
548 use AnyEvent;
549 use AnyEvent::Socket;
550
551 sub finger($$) {
552 my ($user, $host) = @_;
553
554 # use a condvar to return results
555 my $cv = AnyEvent->condvar;
556
557 # first, connect to the host
558 tcp_connect $host, "finger", sub {
559 # the callback receives the socket handle - or nothing
560 my ($fh) = @_
561 or return $cv->send;
562
563 # now write the username
564 syswrite $fh, "$user\015\012";
565
566 my $response;
567
568 # register a read watcher
569 my $read_watcher; $read_watcher = AnyEvent->io (
570 fh => $fh,
571 poll => "r",
572 cb => sub {
573 my $len = sysread $fh, $response, 1024, length $response;
574
575 if ($len <= 0) {
576 # we are done, or an error occurred, let's ignore the latter
577 undef $read_watcher; # no longer interested
578 $cv->send ($response); # send results
579 }
580 },
581 );
582 };
583
584 # pass $cv to the caller
585 $cv
586 }
587
588 That's a mouthful! Let's dissect this function a bit, first the overall
589 function and execution flow:
590
591 sub finger($$) {
592 my ($user, $host) = @_;
593
594 # use a condvar to return results
595 my $cv = AnyEvent->condvar;
596
597 # first, connect to the host
598 tcp_connect $host, "finger", sub {
599 ...
600 };
601
602 $cv
603 }
604
605 This isn't too complicated, just a function with two parameters that
606 creates a condition variable C<$cv>, initiates a TCP connect to
607 C<$host>, and returns C<$cv>. The caller can use the returned C<$cv> to
608 receive the finger response, but one could equally well pass a third
609 argument, a callback, to the function.
610
611 Since we are programming event'ish, we do not wait for the connect to
612 finish - it could block the program for a minute or longer!
613
614 Instead, we pass C<tcp_connect> a callback to invoke when the connect is
615 done. The callback is called with the socket handle as its first
616 argument if the connect succeeds, and no arguments otherwise. The
617 important point is that it will always be called as soon as the outcome
618 of the TCP connect is known.
619
620 This style of programming is also called "continuation style": the
621 "continuation" is simply the way the program continues - normally at the
622 next line after some statement (the exception is loops or things like
623 C<return>). When we are interested in events, however, we instead specify
624 the "continuation" of our program by passing a closure, which makes that
625 closure the "continuation" of the program.
626
627 The C<tcp_connect> call is like saying "return now, and when the
628 connection is established or the attempt failed, continue there".
629
630 Now let's look at the callback/closure in more detail:
631
632 # the callback receives the socket handle - or nothing
633 my ($fh) = @_
634 or return $cv->send;
635
636 The first thing the callback does is to save the socket handle in
637 C<$fh>. When there was an error (no arguments), then our instinct as
638 expert Perl programmers would tell us to C<die>:
639
640 my ($fh) = @_
641 or die "$host: $!";
642
643 While this would give good feedback to the user (if he happens to watch
644 standard error), our program would probably stop working here, as we never
645 report the results to anybody, certainly not the caller of our C<finger>
646 function, and most event loops continue even after a C<die>!
647
648 This is why we instead C<return>, but also call C<< $cv->send >> without
649 any arguments to signal to the condvar consumer that something bad has
650 happened. The return value of C<< $cv->send >> is irrelevant, as is
651 the return value of our callback. The C<return> statement is used for
652 the side effect of, well, returning immediately from the callback.
653 Checking for errors and handling them this way is very common, which is
654 why this compact idiom is so handy.
655
656 As the next step in the finger protocol, we send the username to the
657 finger daemon on the other side of our connection (the kernel.org finger
658 service doesn't actually wait for a username, but the net is running out
659 of finger servers fast):
660
661 syswrite $fh, "$user\015\012";
662
663 Note that this isn't 100% clean socket programming - the socket could,
664 for whatever reasons, not accept our data. When writing a small amount
665 of data like in this example it doesn't matter, as a socket buffer is
666 almost always big enough for a mere "username", but for real-world
667 cases you might need to implement some kind of write buffering - or use
668 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
669 next section.
670
671 What we I<do> have to do is implement our own read buffer - the response
672 data could arrive late or in multiple chunks, and we cannot just wait for
673 it (event-based programming, you know?).
674
675 To do that, we register a read watcher on the socket which waits for data:
676
677 my $read_watcher; $read_watcher = AnyEvent->io (
678 fh => $fh,
679 poll => "r",
680
681 There is a trick here, however: the read watcher isn't stored in a global
682 variable, but in a local one - if the callback returns, it would normally
683 destroy the variable and its contents, which would in turn unregister our
684 watcher.
685
686 To avoid that, we refer to the watcher variable in the watcher callback.
687 This means that, when the C<tcp_connect> callback returns, perl thinks
688 (quite correctly) that the read watcher is still in use - namely inside
689 the inner callback - and thus keeps it alive even if nothing else in the
690 program refers to it anymore (it is much like Baron Münchhausen keeping
691 himself from dying by pulling himself out of a swamp).
692
693 The trick, however, is that instead of:
694
695 my $read_watcher = AnyEvent->io (...
696
697 The program does:
698
699 my $read_watcher; $read_watcher = AnyEvent->io (...
700
701 The reason for this is a quirk in the way Perl works: variable names
702 declared with C<my> are only visible in the I<next> statement. If the
703 whole C<< AnyEvent->io >> call, including the callback, would be done in
704 a single statement, the callback could not refer to the C<$read_watcher>
705 variable to C<undef>ine it, so it is done in two statements.
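
The same quirk can be shown without any event loop. In this sketch a closure needs to refer to the variable it is stored in; the C<$countdown> example is invented purely for illustration:

```perl
use strict;
use warnings;

# one statement: does not compile under "use strict", because $countdown
# is not yet visible inside the sub on the right-hand side
# my $countdown = sub { ... $countdown->(...) ... };

# two statements: the variable exists before the closure is compiled
my $countdown; $countdown = sub {
    my ($n) = @_;
    return $n > 0 ? $countdown->($n - 1) : "liftoff";
};

my $result = $countdown->(3);

print "$result\n";

# break the self-reference so perl can free the closure
undef $countdown;
```

The final C<undef> plays the same role as C<undef $read_watcher> in the finger client: as long as the closure refers to the variable that holds it, neither can be freed.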
706
707 Whether you'd want to format it like this is of course a matter of style.
708 This way emphasizes that the declaration and assignment really are one
709 logical statement.
710
711 The callback itself calls C<sysread> as many times as necessary, until
712 C<sysread> returns either an error or end-of-file:
713
714 cb => sub {
715 my $len = sysread $fh, $response, 1024, length $response;
716
717 if ($len <= 0) {
718
719 Note that C<sysread> has the ability to append data it reads to a scalar
720 if we specify an offset, a feature which we make use of in this example.
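
Here is the appending behaviour in isolation, as a sketch that reads from a local pipe instead of a socket. The chunk size of 4 bytes is deliberately tiny so that several reads are needed:

```perl
use strict;
use warnings;

# a pipe stands in for the socket
pipe my $reader, my $writer or die "pipe: $!";
syswrite $writer, "hello, world";
close $writer;

my $response = "";

while (1) {
    # the offset (length $response) makes sysread append instead of overwrite
    my $len = sysread $reader, $response, 4, length $response;

    die "sysread: $!" unless defined $len;
    last if $len == 0;    # end of file
}

print "$response\n";
```

Unlike the event-based callback, this loop blocks while it waits for data, which is acceptable here only because everything was written to the pipe beforehand.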
721
722 When C<sysread> indicates we are done, the callback C<undef>ines
723 the watcher and then C<send>s the response data to the condition
724 variable. All this has the following effects:
725
726 Undefining the watcher destroys it, as our callback was the only one still
727 having a reference to it. When the watcher gets destroyed, it destroys the
728 callback, which in turn means the C<$fh> handle is no longer used, so that
729 gets destroyed as well. The result is that all resources will be nicely
730 cleaned up by perl for us.
731
732 =head3 Using the finger client
733
734 Now, we could probably write the same finger client in a simpler way if
735 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
736 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
737
738 But the main advantage is that we can not only run this finger function in
739 the background, we even can run multiple sessions in parallel, like this:
740
741 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
742 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
743 my $f3 = finger "hpa" , "kernel.org"; # finger hpa
744
745 print "trouble tickets:\n" , $f1->recv, "\n";
746 print "trouble ticket #1736:\n", $f2->recv, "\n";
747 print "kernel release info: " , $f3->recv, "\n";
748
749 It doesn't look like it, but in fact all three requests run in
750 parallel. The code waits for the first finger request to finish first, but
751 that doesn't keep it from executing them in parallel: when the first C<recv>
752 call sees that the data isn't ready yet, it serves events for all three
753 requests automatically, until the first request has finished.
754
755 The second C<recv> call might either find the data is already there, or it
756 will continue handling events until that is the case, and so on.
757
758 By taking advantage of network latencies, which allow us to serve other
759 requests and events while we wait for an event on one socket, the overall
760 time to do these three requests is greatly reduced: typically, all three
761 are done in about the time the slowest of the three requests takes.
762
763 By the way, you do not actually have to wait in the C<recv> method on an
764 AnyEvent condition variable - after all, waiting is evil - you can also
765 register a callback:
766
767 $f1->cb (sub {
768 my $response = shift->recv;
769 # ...
770 });
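
The following sketch shows the C<cb> method on its own, without any network access. The value is sent directly and is of course made up:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;
my $seen;

# register interest - nothing happens yet
$cv->cb (sub {
    my ($done) = @_;       # the condvar itself is passed as the only argument
    $seen = $done->recv;   # does not block: the value has already been sent
});

# the producer side: this is what triggers the callback
$cv->send ("finger response");

print "callback saw: $seen\n";
```

Calling C<recv> inside the callback is safe because the condition is already true at that point, so there is nothing left to wait for.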
771
772 The callback will be invoked only when C<send> is called. In fact,
773 instead of returning a condition variable you could also pass a third
774 parameter to your finger function, the callback to invoke with the
775 response:
776
777 sub finger($$$) {
778 my ($user, $host, $cb) = @_;
779
780 How you implement it is a matter of taste - if you expect your function to
781 be used mainly in an event-based program you would normally prefer to pass
782 a callback directly. If you write a module and expect your users to use
783 it "synchronously" often (for example, a simple http-get script would not
784 really care much for events), then you would use a condition variable and
785 tell them "simply C<< ->recv >> the data".
786
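Both styles can also be offered side by side. As a rough sketch (the function names C<lookup_cb> and C<lookup_cv> are hypothetical, and a timer stands in for real network activity), the condition variable version can be a thin wrapper around the callback version:

```perl
use strict;
use warnings;
use AnyEvent;

# callback style: hypothetical lookup that "answers" after a short delay
sub lookup_cb {
   my ($key, $cb) = @_;

   my $timer; $timer = AnyEvent->timer (after => 0.1, cb => sub {
      undef $timer;
      $cb->("value for $key");
   });

   ()
}

# condition variable style, layered on top of the callback version
sub lookup_cv {
   my ($key) = @_;

   my $cv = AnyEvent->condvar;
   lookup_cb $key, sub { $cv->send (@_) };

   $cv
}

# a "synchronous" caller simply waits for the result
my $sync = lookup_cv ("a")->recv;

# an event-based caller passes a callback instead
my $done = AnyEvent->condvar;
my $async;
lookup_cb "b", sub { $async = shift; $done->send };
$done->recv; # only here so that this example script does not exit early

print "$sync\n$async\n";
```
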
787 =head3 Problems with the implementation and how to fix them
788
789 To make this example more real-world-ready, we would not only implement
790 some write buffering (for the paranoid, or maybe denial-of-service aware
791 security expert), but we would also have to handle timeouts and maybe
792 protocol errors.
793
794 Doing this quickly gets unwieldy, which is why we introduce
795 L<AnyEvent::Handle> in the next section, which takes care of all these
796 details for you and lets you concentrate on the actual protocol.
797
798
799 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
800
801 The L<AnyEvent::Handle> module has been hyped quite a bit in this document
802 so far, so let's see what it really offers.
803
804 As finger is such a simple protocol, let's try something slightly more
805 complicated: HTTP/1.0.
806
807 An HTTP GET request works by sending a single request line that indicates
808 what you want the server to do and the URI you want to act on, followed
809 by as many "header" lines (C<Header: data>, same as e-mail headers) as
810 required for the request, followed by an empty line.
811
812 The response is formatted very similarly, first a line with the response
813 status, then again as many header lines as required, then an empty line,
814 followed by any data that the server might send.
815
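To make the message layout concrete, here is a small sketch in plain Perl (no AnyEvent involved) that builds a request and takes a canned response apart. The C<Host> header and the response text are assumptions chosen purely for illustration:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# request: request line, optional header lines, then an empty line
my $request = join $CRLF,
   "GET /test HTTP/1.0",
   "Host: www.example.com",   # assumed header, purely for illustration
   "", "";

# a canned response, as a server might send it
my $raw = join $CRLF,
   "HTTP/1.0 404 Not Found",
   "Content-Type: text/html; charset=UTF-8",
   "",
   "<html><head>";

# the first empty line separates the header paragraph from the body
my ($head, $body)      = split /\015\012\015\012/, $raw, 2;
my ($status, @headers) = split /\015\012/, $head;

print "status: $status\n";
print "header: $_\n" for @headers;
print "body:   $body\n";
```
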
816 Again, let's try it out with C<telnet> (I condensed the output a bit - if
817 you want to see the full response, do it yourself).
818
819 # telnet www.google.com 80
820 Trying 209.85.135.99...
821 Connected to www.google.com (209.85.135.99).
822 Escape character is '^]'.
823 GET /test HTTP/1.0
824
825 HTTP/1.0 404 Not Found
826 Date: Mon, 02 Jun 2008 07:05:54 GMT
827 Content-Type: text/html; charset=UTF-8
828
829 <html><head>
830 [...]
831 Connection closed by foreign host.
832
833 The C<GET ...> and the empty line were entered manually, the rest of the
834 telnet output is Google's response, in this case a C<404 Not Found> one.
835
836 So, here is how you would do it with C<AnyEvent::Handle>:
837
838 sub http_get {
839 my ($host, $uri, $cb) = @_;
840
841 # store results here
842 my ($response, $header, $body);
843
844 my $handle; $handle = new AnyEvent::Handle
845 connect => [$host => 'http'],
846 on_error => sub {
847 $cb->("HTTP/1.0 500 $!");
848 $handle->destroy; # explicitly destroy handle
849 },
850 on_eof => sub {
851 $cb->($response, $header, $body);
852 $handle->destroy; # explicitly destroy handle
853 };
854
855 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
856
857 # now fetch response status line
858 $handle->push_read (line => sub {
859 my ($handle, $line) = @_;
860 $response = $line;
861 });
862
863 # then the headers
864 $handle->push_read (line => "\015\012\015\012", sub {
865 my ($handle, $line) = @_;
866 $header = $line;
867 });
868
869 # and finally handle any remaining data as body
870 $handle->on_read (sub {
871 $body .= $_[0]->rbuf;
872 $_[0]->rbuf = "";
873 });
874 }
875
876 And now let's go through it step by step. First, as usual, the overall
877 C<http_get> function structure:
878
879 sub http_get {
880 my ($host, $uri, $cb) = @_;
881
882 # store results here
883 my ($response, $header, $body);
884
885 my $handle; $handle = new AnyEvent::Handle
886 ... create handle object
887
888 ... push data to write
889
890 ... push what to expect to read queue
891 }
892
893 Unlike in the finger example, this time the caller has to pass a callback
894 to C<http_get>. Also, instead of passing a URL as one would expect, the
895 caller has to provide the hostname and URI - normally you would use the
896 C<URI> module to parse a URL and separate it into those parts, but that is
897 left to the inspired reader :)
898
899 Since everything else is left to the caller, all C<http_get> does is
900 initiate the connection by creating the AnyEvent::Handle object (which
901 calls C<tcp_connect> for us) and leave everything else to its callback.
902
903 The handle object is created, unsurprisingly, by calling the C<new>
904 method of L<AnyEvent::Handle>:
905
906 my $handle; $handle = new AnyEvent::Handle
907 connect => [$host => 'http'],
908 on_error => sub {
909 $cb->("HTTP/1.0 500 $!");
910 $handle->destroy; # explicitly destroy handle
911 },
912 on_eof => sub {
913 $cb->($response, $header, $body);
914 $handle->destroy; # explicitly destroy handle
915 };
916
917 The C<connect> argument tells AnyEvent::Handle to call C<tcp_connect> for
918 the specified host and service/port.
919
920 The C<on_error> callback will be called on any unexpected error, such as a
921 refused connection, or unexpected end-of-file while reading headers.
922
923 Instead of having an extra mechanism to signal errors, connection errors
924 are signalled by crafting a special "response status line", like this:
925
926 HTTP/1.0 500 Connection refused
927
928 This means the caller cannot distinguish (easily) between
929 locally-generated errors and server errors, but it simplifies error
930 handling for the caller a lot.
931
932 The error callback also destroys the handle explicitly, because we are not
933 interested in continuing after any errors. In AnyEvent::Handle callbacks
934 you have to call C<destroy> explicitly to destroy a handle. Outside of
935 those callbacks you can just forget the object reference and it will be
936 automatically cleaned up.
937
938 Last but not least, we set an C<on_eof> callback that is called when the
939 other side indicates it has stopped writing data, which we will use to
940 gracefully shut down the handle and report the results. This callback is
941 only called when the read queue is empty - if the read queue expects
942 some data and the handle gets an EOF from the other side this will be an
943 error - after all, you did expect more to come.
944
945 If you wanted to write a server using AnyEvent::Handle, you would use
946 C<tcp_server> and then create the AnyEvent::Handle with the C<fh>
947 argument.
948
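As a rough sketch of the server side, assuming the listening socket is set up with C<tcp_server> from L<AnyEvent::Socket>, the following toy line-echo server runs on an OS-chosen loopback port and is immediately exercised by a small client. The C<%clients> hash, the "echo:" reply format and all variable names are made up for this example:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;
use AnyEvent::Handle;

my %clients;   # keeps the server-side handles alive
my $port;      # filled in by the prepare callback below

# listen on an OS-chosen port on the loopback interface
my $server = tcp_server "127.0.0.1", undef, sub {
   my ($fh) = @_;

   my $handle; $handle = AnyEvent::Handle->new (
      fh       => $fh,
      on_error => sub { delete $clients{$handle}; $handle->destroy },
      on_eof   => sub { delete $clients{$handle}; $handle->destroy },
      # whenever the read queue is empty, queue another line read
      on_read  => sub {
         $_[0]->push_read (line => sub {
            my ($handle, $line) = @_;
            $handle->push_write ("echo: $line\015\012");
         });
      },
   );

   $clients{$handle} = $handle;
}, sub {
   my ($fh, $host, $bound_port) = @_;
   $port = $bound_port;
   0 # use the default listen queue length
};

# a client to try it out
my $done = AnyEvent->condvar;

my $client; $client = AnyEvent::Handle->new (
   connect  => ["127.0.0.1", $port],
   on_error => sub { $done->send ("error: $_[2]"); $client->destroy },
);

$client->push_write ("hello\015\012");
$client->push_read (line => sub {
   my ($handle, $line) = @_;
   $done->send ($line);
   $handle->destroy;
});

my $reply = $done->recv;
print "$reply\n";
```
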
949 =head3 The write queue
950
951 The next line sends the actual request:
952
953 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
954
955 No headers will be sent (this is fine for simple requests), so the whole
956 request is just a single line followed by an empty line to signal the end
957 of the headers to the server.
958
959 The more interesting question is why the method is called C<push_write>
960 and not just write. The reason is that you can I<always> add some write
961 data without blocking, and to do this, AnyEvent::Handle needs some write
962 queue internally - and C<push_write> pushes some data onto the end of
963 that queue, just like Perl's C<push> pushes data onto the end of an
964 array.
965
966 The deeper reason is that at some point in the future, there might
967 be C<unshift_write> as well, and in any case, we will shortly meet
968 C<push_read> and C<unshift_read>, and it's usually easiest to remember if
969 all those functions have some symmetry in their name. So C<push> is used
970 as the opposite of C<unshift> in AnyEvent::Handle, not as the opposite of
971 C<pull> - just like in Perl.
972
973 Note that we call C<push_write> right after creating the AnyEvent::Handle
974 object, before it has had time to actually connect to the server. This is
975 fine: pushing the read and write requests will queue them in the handle
976 object until the connection has been established. Alternatively, we
977 could do this "on demand" in the C<on_connect> callback.
978
979 If C<push_write> is called with more than one argument, then you can do
980 I<formatted> I/O. For example, this would JSON-encode your data before
981 pushing it to the write queue:
982
983 $handle->push_write (json => [1, 2, 3]);
984
985 This pretty much summarises the write queue; there is little else to it.
986
987 Reading the response is far more interesting, because it involves the more
988 powerful and complex I<read queue>:
989
990 =head3 The read queue
991
992 The response consists of three parts: a single line with the response
993 status, a single paragraph of headers ended by an empty line, and the
994 response body, which is the remaining data on the connection.
995
996 For the first two, we push two read requests onto the read queue:
997
998 # now fetch response status line
999 $handle->push_read (line => sub {
1000 my ($handle, $line) = @_;
1001 $response = $line;
1002 });
1003
1004 # then the headers
1005 $handle->push_read (line => "\015\012\015\012", sub {
1006 my ($handle, $line) = @_;
1007 $header = $line;
1008 });
1009
1010 While one can just push a single callback to parse all the data on the
1011 queue, formatted I/O really comes to our aid here, since there is a
1012 ready-made "read line" read type. The first read expects a single line,
1013 ended by C<\015\012> (the standard end-of-line marker in internet
1014 protocols).
1015
1016 The second "line" is actually a single paragraph - instead of reading it
1017 line by line we tell C<push_read> that the end-of-line marker is really
1018 C<\015\012\015\012>, which is an empty line. The result is that the whole
1019 header paragraph will be treated as a single line and read. The word
1020 "line" is interpreted very freely, much like Perl itself does it.
1021
1022 Note that the read requests are pushed immediately after creating the
1023 handle object - since AnyEvent::Handle provides a queue we can push as
1024 many requests as we want, and AnyEvent::Handle will handle them in order.
1025
1026 There is, however, no read type for "the remaining data". For that, we
1027 install our own C<on_read> callback:
1028
1029 # and finally handle any remaining data as body
1030 $handle->on_read (sub {
1031 $body .= $_[0]->rbuf;
1032 $_[0]->rbuf = "";
1033 });
1034
1035 This callback is invoked every time data arrives and the read queue is
1036 empty - which in this example will only be the case when both response and
1037 header have been read. The C<on_read> callback could actually have been
1038 specified when constructing the object, but doing it this way preserves
1039 logical ordering.
1040
1041 The read callback adds the current read buffer to its C<$body>
1042 variable and, most importantly, I<empties> the buffer by assigning the
1043 empty string to it.
1044
1045 Given these instructions, AnyEvent::Handle will handle incoming data -
1046 if all goes well, the callback will be invoked with the response data;
1047 if not, it will get an error.
1048
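The interplay of the two queued line reads, C<on_read> and C<on_eof> can be observed without a web server. The sketch below is only an illustration: it assumes a local socket pair from C<portable_socketpair> (L<AnyEvent::Util>) in place of a TCP connection, and the "server" handle and its canned response are invented for this example:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Util;    # portable_socketpair
use AnyEvent::Handle;

my ($client_fh, $server_fh) = portable_socketpair ();
die "socketpair failed: $!" unless $client_fh;

# the "server" side: send a canned response, then close the write side
my $server = AnyEvent::Handle->new (fh => $server_fh);
$server->push_write (
   "HTTP/1.0 200 OK\015\012"
   . "Content-Type: text/plain\015\012"
   . "X-Example: 1\015\012"
   . "\015\012"
   . "hello, world"
);
$server->push_shutdown;

my ($response, $header, $body);
my $done = AnyEvent->condvar;

my $client; $client = AnyEvent::Handle->new (
   fh       => $client_fh,
   on_error => sub { $done->send; $client->destroy },
   on_eof   => sub { $done->send; $client->destroy },
);

# status line first ...
$client->push_read (line => sub {
   my ($handle, $line) = @_;
   $response = $line;
});

# ... then the header paragraph as one "line" ...
$client->push_read (line => "\015\012\015\012", sub {
   my ($handle, $line) = @_;
   $header = $line;
});

# ... and everything that arrives after that is body data
$client->on_read (sub {
   $body .= $_[0]->rbuf;
   $_[0]->rbuf = "";
});

$done->recv;

print "response: $response\n";
print "body:     $body\n";
```
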
1049 In general, you can implement pipelining (a semi-advanced feature of many
1050 protocols) very easily with AnyEvent::Handle: If you have a protocol
1051 with a request/response structure, your request methods/functions will
1052 all look like this (simplified):
1053
1054 sub request {
1055
1056 # send the request to the server
1057 $handle->push_write (...);
1058
1059 # push some response handlers
1060 $handle->push_read (...);
1061 }
1062
1063 This means you can queue as many requests as you want, and while
1064 AnyEvent::Handle goes through its read queue to handle the response data,
1065 the other side can work on the next request - queueing the request just
1066 appends some data to the write queue and installs a handler to be called
1067 later.
1068
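Here is a small, self-contained sketch of that pattern. It assumes a made-up line protocol in which a toy "server" handle, connected through C<portable_socketpair>, answers every line with its upper-cased version. All three requests are queued before the first reply has been read:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Util;    # portable_socketpair
use AnyEvent::Handle;

my ($fh1, $fh2) = portable_socketpair ();
die "socketpair failed: $!" unless $fh1;

# toy "server": answers every line with its upper-cased version
my $server = AnyEvent::Handle->new (
   fh       => $fh2,
   on_error => sub { $_[0]->destroy },
   on_eof   => sub { $_[0]->destroy },
   on_read  => sub {
      $_[0]->push_read (line => sub {
         my ($handle, $line) = @_;
         $handle->push_write (uc ($line) . "\015\012");
      });
   },
);

my $client = AnyEvent::Handle->new (
   fh       => $fh1,
   on_error => sub { $_[0]->destroy },
);

sub request {
   my ($command, $cb) = @_;

   # send the request to the server
   $client->push_write ("$command\015\012");

   # push the matching response handler
   $client->push_read (line => sub { $cb->($_[1]) });
}

my @replies;
my $done = AnyEvent->condvar;

# queue all requests right away, replies are matched up in order
for my $command (qw(one two three)) {
   $done->begin;
   request $command, sub { push @replies, shift; $done->end };
}

$done->recv;

print join (",", @replies), "\n";
```

Because the read queue is processed strictly in order, each reply ends up in the callback that was queued together with its request.
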
1069 You might ask yourself how to handle decisions you can only make I<after>
1070 you have received some data (such as handling a short error response or a
1071 long and differently-formatted response). The answer to this problem is
1072 C<unshift_read>, which we will introduce together with an example in the
1073 coming sections.
1074
1075 =head3 Using C<http_get>
1076
1077 Finally, here is how you would use C<http_get>:
1078
1079 http_get "www.google.com", "/", sub {
1080 my ($response, $header, $body) = @_;
1081
1082 print
1083 $response, "\n",
1084 $body;
1085 };
1086
1087 And of course, you can run as many of these requests in parallel as you
1088 want (and your memory supports).
1089
1090 =head3 HTTPS
1091
1092 Now, as promised, let's implement the same thing for HTTPS, or more
1093 correctly, let's change our C<http_get> function into a function that
1094 speaks HTTPS instead.
1095
1096 HTTPS is a standard TLS connection (B<T>ransport B<L>ayer
1097 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1098 that contains standard HTTP protocol exchanges. The only other difference
1099 to HTTP is that by default it uses port C<443> instead of port C<80>.
1100
1101 To implement these two differences we need two tiny changes. First, in the
1102 C<connect> parameter, we replace C<http> by C<https> to connect to the
1103 https port:
1104
1105 connect => [$host => 'https'],
1106
1107 The other change deals with TLS, which is something L<AnyEvent::Handle>
1108 does for us if the L<Net::SSLeay> module is available. To enable TLS
1109 with L<AnyEvent::Handle>, we pass an additional C<tls> parameter
1110 to the call to C<AnyEvent::Handle::new>:
1111
1112 tls => "connect",
1113
1114 Specifying C<tls> enables TLS, and the argument specifies whether
1115 AnyEvent::Handle is the server side ("accept") or the client side
1116 ("connect") for the TLS connection, as unlike TCP, there is a clear
1117 server/client relationship in TLS.
1118
1119 That's all.
1120
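Put together, the two changes might look like the following sketch. It is simply the C<http_get> function from above with the C<connect> and C<tls> parameters changed, renamed to C<https_get> for this illustration. It needs L<Net::SSLeay> and network access to do anything useful, and it sends no request headers, which some servers may not accept:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;

sub https_get {
   my ($host, $uri, $cb) = @_;

   # store results here
   my ($response, $header, $body);

   my $handle; $handle = AnyEvent::Handle->new (
      connect  => [$host => 'https'],  # https port instead of http
      tls      => "connect",           # we are the TLS client side
      on_error => sub {
         $cb->("HTTP/1.0 500 $!");
         $handle->destroy;
      },
      on_eof   => sub {
         $cb->($response, $header, $body);
         $handle->destroy;
      },
   );

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

   # response status line, then the header paragraph
   $handle->push_read (line => sub { $response = $_[1] });
   $handle->push_read (line => "\015\012\015\012", sub { $header = $_[1] });

   # any remaining data is the body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });
}

# usage (needs network access, so it is left commented out here):
#
#   my $done = AnyEvent->condvar;
#   https_get "www.example.com", "/", sub { print "$_[0]\n"; $done->send };
#   $done->recv;
```
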
1121 Of course, all this should be handled transparently by C<http_get>
1122 after parsing the URL. If you need this, see the part about exercising
1123 your inspiration earlier in this document. You could also use the
1124 L<AnyEvent::HTTP> module from CPAN, which implements all this and works
1125 around a lot of quirks for you too.
1126
1127 =head3 The read queue - revisited
1128
1129 HTTP always uses the same structure in its responses, but many protocols
1130 require parsing responses differently depending on the response itself.
1131
1132 For example, in SMTP, you normally get a single response line:
1133
1134 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1135
1136 But SMTP also supports multi-line responses:
1137
1138 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1139 220-hey guys
1140 220 my response is longer than yours
1141
1142 To handle this, we need C<unshift_read>. As the name (we hope) implies,
1143 C<unshift_read> will not append your read request to the end of the read
1144 queue, but will prepend it to the queue instead.
1145
1146 This is useful in the situation above: Just push your response-line read
1147 request when sending the SMTP command, and when handling it, you look at
1148 the line to see if more is to come, and C<unshift_read> another reader
1149 callback if required, like this:
1150
1151 my $response; # response lines end up in here
1152
1153 my $read_response; $read_response = sub {
1154 my ($handle, $line) = @_;
1155
1156 $response .= "$line\n";
1157
1158 # check for continuation lines ("-" as 4th character)
1159 if ($line =~ /^...-/) {
1160 # if yes, then unshift another line read
1161 $handle->unshift_read (line => $read_response);
1162
1163 } else {
1164 # otherwise we are done
1165
1166 # free callback
1167 undef $read_response;
1168
1169 print "we are done reading: $response\n";
1170 }
1171 };
1172
1173 $handle->push_read (line => $read_response);
1174
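To try the recipe without a mail server, the following sketch feeds a canned multi-line greeting through a local socket pair. The C<portable_socketpair> setup, the fake "server" handle and the greeting text are assumptions made for this example. The reader callback itself follows the recipe above:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Util;    # portable_socketpair
use AnyEvent::Handle;

my ($fh1, $fh2) = portable_socketpair ();
die "socketpair failed: $!" unless $fh1;

# fake "server": sends one multi-line greeting
my $server = AnyEvent::Handle->new (fh => $fh2);
$server->push_write (
   "220-mail.example.net ready\015\012"
   . "220-hey guys\015\012"
   . "220 my response is longer than yours\015\012"
);

my $done = AnyEvent->condvar;

my $client = AnyEvent::Handle->new (
   fh       => $fh1,
   on_error => sub { $_[0]->destroy; $done->send },
);

my $response = ""; # response lines end up in here

my $read_response; $read_response = sub {
   my ($handle, $line) = @_;

   $response .= "$line\n";

   if ($line =~ /^...-/) {
      # continuation line: read another line before anything else
      $handle->unshift_read (line => $read_response);
   } else {
      # last line: free the callback and report
      undef $read_response;
      $done->send;
   }
};

$client->push_read (line => $read_response);

$done->recv;

print $response;
```
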
1175 This recipe can be used for all similar parsing problems, for example in
1176 NNTP, the response code to some commands indicates that more data will be
1177 sent:
1178
1179 $handle->push_write ("article 42\015\012");
1180
1181 # read response line
1182 $handle->push_read (line => sub {
1183 my ($handle, $status) = @_;
1184
1185 # article data following?
1186 if ($status =~ /^2/) {
1187 # yes, read article body
1188
1189 $handle->unshift_read (line => "\012.\015\012", sub {
1190 my ($handle, $body) = @_;
1191
1192 $finish->($status, $body);
1193 });
1194
1195 } else {
1196 # some error occurred, no article data
1197
1198 $finish->($status);
1199 }
1200 });
1201
1202 =head3 Your own read queue handler
1203
1204 Sometimes your protocol doesn't play nice, and uses lines or chunks of
1205 data not formatted in a way handled out of the box by AnyEvent::Handle.
1206 In this case you have to implement your own read parser.
1207
1208 To make up a contrived example, imagine you are looking for an even
1209 number of characters followed by a colon (":"). Also imagine that
1210 AnyEvent::Handle has no C<regex> read type which could be used, so you'd
1211 have to do it manually.
1212
1213 To implement a read handler for this, you would C<push_read> (or
1214 C<unshift_read>) a single code reference.
1215
1216 This code reference will then be called each time there is (new) data
1217 available in the read buffer, and is expected to either successfully
1218 eat/consume some of that data (and return true) or to return false to
1219 indicate that it wants to be called again.
1220
1221 If the code reference returns true, then it will be removed from the
1222 read queue (because it has parsed/consumed whatever it was supposed to
1223 consume), otherwise it stays at the front of the queue.
1224
1225 The example above could be coded like this:
1226
1227 $handle->push_read (sub {
1228 my ($handle) = @_;
1229
1230 # check for even number of characters + ":"
1231 # and remove the data if a match is found.
1232 # if not, return false (actually nothing)
1233
1234 $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1235 or return;
1236
1237 # we got some data in $1, pass it to whoever wants it
1238 $finish->($1);
1239
1240 # and return true to indicate we are done
1241 1
1242 });
1243
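A complete, runnable variant of this handler could look like the sketch below. It assumes a local socket pair from C<portable_socketpair> as the data source and a condition variable in place of C<$finish>. The input string C<abcd:rest> is made up for illustration:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Util;    # portable_socketpair
use AnyEvent::Handle;

my ($fh1, $fh2) = portable_socketpair ();
die "socketpair failed: $!" unless $fh1;

# the writer side sends four characters, a colon and some more data
my $writer = AnyEvent::Handle->new (fh => $fh2);
$writer->push_write ("abcd:rest");

my $done = AnyEvent->condvar;

my $reader = AnyEvent::Handle->new (
   fh       => $fh1,
   on_error => sub { $_[0]->destroy; $done->send },
);

$reader->push_read (sub {
   my ($handle) = @_;

   # even number of characters + ":", otherwise wait for more data
   $handle->{rbuf} =~ s/^( (?:..)* ) ://x
      or return;

   my $token = $1;

   # report the token and whatever is left in the read buffer
   $done->send ($token, $handle->{rbuf});

   # return true to indicate we are done
   1
});

my ($token, $rest) = $done->recv;

print "token: $token, still buffered: $rest\n";
```

Data that the handler does not consume stays in the read buffer for whatever read request comes next.
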
1244 This concludes our little tutorial.
1245
1246 =head1 Where to go from here?
1247
1248 This introduction should have explained the key concepts of L<AnyEvent>
1249 - event watchers and condition variables, L<AnyEvent::Socket> - basic
1250 networking utilities, and L<AnyEvent::Handle> - a nice wrapper around
1251 sockets.
1252
1253 You could either start coding stuff right away, look at those manual
1254 pages for the gory details, or roam CPAN for other AnyEvent modules (such
1255 as L<AnyEvent::IRC> or L<AnyEvent::HTTP>) to see more code examples (or
1256 simply to use them).
1257
1258 If you need a protocol that doesn't have an implementation using AnyEvent,
1259 remember that you can mix AnyEvent with one other event framework, such as
1260 L<POE>, so you can always use AnyEvent for your own tasks plus modules of
1261 one other event framework to fill any gaps.
1262
1263 Last but not least, you could also look at L<Coro>, especially
1264 L<Coro::AnyEvent>, to see how you can turn event-based programming from
1265 callback style back to the usual imperative style (also called "inversion
1266 of control" - AnyEvent calls I<you>, but Coro lets I<you> call AnyEvent).
1267
1268 =head1 Authors
1269
1270 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1271