/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.21
Committed: Sat Jun 6 12:04:30 2009 UTC (15 years ago) by root
Branch: MAIN
CVS Tags: rel-4_412, rel-4_411, rel-4_42
Changes since 1.20: +4 -0 lines

File Contents

# Content
1 =head1 NAME
2
3 AnyEvent::Intro - an introductory tutorial to AnyEvent
4
5 =head1 Introduction to AnyEvent
6
7 This is a tutorial that will introduce you to the features of AnyEvent.
8
9 The first part introduces the core AnyEvent module (after swamping you a
10 bit in evangelism), which might already provide all you ever need. If you
11 are only interested in AnyEvent's event handling capabilities, read no
12 further.
13
14 The second part focuses on network programming using sockets, for which
15 AnyEvent offers a lot of support you can use, and a lot of workarounds
16 for portability quirks.
17
18
19 =head1 What is AnyEvent?
20
21 If you don't care for the whys and want to see code, skip this section!
22
23 AnyEvent is first of all just a framework to do event-based
24 programming. Typically such frameworks are an all-or-nothing thing: If you
25 use one such framework, you can't (easily, or even at all) use another in
26 the same program.
27
28 AnyEvent is different - it is a thin abstraction layer above all kinds
29 of event loops. Its main purpose is to move the choice of the underlying
30 framework (the event loop) from the module author to the program author
31 using the module.
32
33 That means you can write code that uses events to control what it
34 does, without forcing other code in the same program to use the same
35 underlying framework as you do - i.e. you can create a Perl module
36 that is event-based using AnyEvent, and users of that module can still
37 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
38 all: AnyEvent comes with its own event loop implementation, so your
39 code works regardless of other modules that might or might not be
40 installed. The latter is important, as AnyEvent does not have any
41 dependencies on other modules, which makes it easy to install, for
42 example, when you lack a C compiler.
43
44 A typical problem with Perl modules such as L<Net::IRC> is that they
45 come with their own event loop: In L<Net::IRC>, the program that uses it
46 needs to start the event loop of L<Net::IRC>. That means that one cannot
47 integrate this module into a L<Gtk2> GUI for instance, as that module,
48 too, enforces the use of its own event loop (namely L<Glib>).
49
50 Another example is L<LWP>: it provides no event interface at all. It's a
51 pure blocking HTTP (and FTP etc.) client library, which usually means that
52 you either have to start a thread or have to fork for an HTTP request, or
53 use L<Coro::LWP>, if you want to do something else while waiting for the
54 request to finish.
55
56 The motivation behind these designs is often that a module doesn't want to
57 depend on some complicated XS-module (Net::IRC), or that it doesn't want
58 to force the user to use some specific event loop at all (LWP).
59
60 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
61
62 =over 4
63
64 =item - write their own event loop (because AnyEvent guarantees to offer one
65 everywhere - even on Windows).
66
67 =item - choose one fixed event loop (because AnyEvent works with all
68 important event loops available for Perl, and adding others is trivial).
69
70 =back
71
72 If the module author uses L<AnyEvent> for all his event needs (IO events,
73 timers, signals, ...) then all other modules can just use his module and
74 don't have to choose an event loop or adapt to his event loop. The choice
75 of the event loop is ultimately made by the program author who uses all
76 the modules and writes the main program. And even there he doesn't have to
77 choose, he can just let L<AnyEvent> choose the best available event loop
78 for him.
79
80 Read more about this in the main documentation of the L<AnyEvent> module.
81
82
83 =head1 Introduction to Event-Based Programming
84
85 So what exactly is programming using events? It quite simply means that
86 instead of your code actively waiting for something, such as the user
87 entering something on STDIN:
88
89 $| = 1; print "enter your name> ";
90
91 my $name = <STDIN>;
92
93 You instead tell your event framework to notify you in the event of some
94 data being available on STDIN, by using a callback mechanism:
95
96 use AnyEvent;
97
98 $| = 1; print "enter your name> ";
99
100 my $name;
101
102 my $wait_for_input = AnyEvent->io (
103 fh => \*STDIN, # which file handle to check
104 poll => "r", # which event to wait for ("r"ead data)
105 cb => sub { # what callback to execute
106 $name = <STDIN>; # read it
107 }
108 );
109
110 # do something else here
111
112 Looks more complicated, and surely is, but the advantage of using events
113 is that your program can do something else instead of waiting for
114 input. Waiting as in the first example is also called "blocking" because
115 you "block" your process from executing anything else while you do so.
116
117 The second example avoids blocking, by only registering interest in a read
118 event, which is fast and doesn't block your process. Only when read data
119 is available will the callback be called, which can then proceed to read
120 the data.
121
122 The "interest" is represented by an object returned by C<< AnyEvent->io
123 >> called a "watcher" object - called like that because it "watches" your
124 file handle (or other event sources) for the event you are interested in.
125
126 In the example above, we create an I/O watcher by calling the C<<
127 AnyEvent->io >> method. Disinterest in some event is simply expressed by
128 forgetting about the watcher, for example, by C<undef>'ing the variable it
129 is stored in. AnyEvent will automatically clean up the watcher if it is no
130 longer used, much like Perl closes your file handles if you no longer use
131 them anywhere.
132
133 =head3 A short note on callbacks
134
135 A common issue that hits people is the problem of passing parameters
136 to callbacks. Programmers used to languages such as C or C++ are often
137 used to a style where one passes the address of a function (a function
138 reference) and some data value, e.g.:
139
140 sub callback {
141 my ($arg) = @_;
142
143 $arg->method;
144 }
145
146 my $arg = ...;
147
148 call_me_back_later \&callback, $arg;
149
150 This is clumsy, as the place where behaviour is specified (when the
151 callback is registered) is often far away from the place where behaviour
152 is implemented. It also doesn't use Perl syntax to invoke the code. There
153 is also an abstraction penalty to pay as one has to I<name> the callback,
154 which often is unnecessary and leads to nonsensical or duplicated names.
155
156 In Perl, one can specify behaviour much more directly by using
157 I<closures>. Closures are code blocks that take a reference to the
158 enclosing scope(s) when they are created. This means lexical variables in scope at the time
159 of creating the closure can simply be used inside the closure:
160
161 my $arg = ...;
162
163 call_me_back_later sub { $arg->method };
164
165 Under most circumstances, closures are faster, use fewer resources and
166 result in much clearer code than the traditional approach. Faster,
167 because parameter passing and storing them in local variables in Perl
168 is relatively slow. Fewer resources, because closures take references to
169 existing variables without having to create new ones, and clearer code
170 because it is immediately obvious that the second example calls the
171 C<method> method when the callback is invoked.
172
173 Apart from these, the strongest argument for using closures with AnyEvent
174 is that AnyEvent does not allow passing parameters to the callback, so
175 closures are the only way to achieve that in most cases :->
176
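To see the closure style at work without any event loop, here is a minimal sketch in plain Perl. The C<call_me_back_later> function is only a stand-in written for this example (it merely queues the callback), and the names are made up:

```perl
use strict;
use warnings;

my @pending;   # callbacks waiting to be invoked

# stand-in for an event framework: just remember the callback
sub call_me_back_later {
   my ($cb) = @_;
   push @pending, $cb;
}

my @log;

for my $name (qw(alice bob)) {
   # each closure captures its own $name - no extra argument is passed
   call_me_back_later (sub { push @log, "hello, $name" });
}

# "later": run all registered callbacks
$_->() for @pending;

print "$_\n" for @log;
```

Each callback still sees the C<$name> that was in scope when it was created, even though the loop has long finished by the time it runs.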
177
178 =head3 A hint on debugging
179
180 AnyEvent does, by default, not do any argument checking. This can lead to
181 strange and unexpected results, especially if you are trying to learn your
182 way around AnyEvent.
183
184 AnyEvent supports a special "strict" mode, off by default, which does very
185 strict argument checking, at the expense of being somewhat slower. When
186 developing, however, this mode is very useful.
187
188 You can enable this strict mode either by having an environment variable
189 C<PERL_ANYEVENT_STRICT> with a true value in your environment:
190
191 PERL_ANYEVENT_STRICT=1 perl test.pl
192
193 Or you can write C<use AnyEvent::Strict> in your program, which has the
194 same effect (do not do this in production, however).
195
196
197 =head2 Condition Variables
198
199 Back to the I/O watcher example: The code is not yet a fully working program,
200 and will not work as-is. The reason is that your callback will not be
201 invoked out of the blue, you have to run the event loop. Also, event-based
202 programs sometimes have to block, too, as when there simply is nothing
203 else to do and everything waits for some events, it needs to block the
204 process as well.
205
206 In AnyEvent, this is done using condition variables. Condition variables
207 are named "condition variables" because they represent a condition that is
208 initially false and needs to be fulfilled.
209
210 You can also call them "merge points", "sync points", "rendezvous ports"
211 or even callbacks and many other things (and they are often called like
212 this in other frameworks). The important point is that you can create them
213 freely and later wait for them to become true.
214
215 Condition variables have two sides - one side is the "producer" of the
216 condition (whatever code detects and flags the condition), the other side
217 is the "consumer" (the code that waits for that condition).
218
219 In our example in the previous section, the producer is the event callback
220 and there is no consumer yet - let's change that right now:
221
222 use AnyEvent;
223
224 $| = 1; print "enter your name> ";
225
226 my $name;
227
228 my $name_ready = AnyEvent->condvar;
229
230 my $wait_for_input = AnyEvent->io (
231 fh => \*STDIN,
232 poll => "r",
233 cb => sub {
234 $name = <STDIN>;
235 $name_ready->send;
236 }
237 );
238
239 # do something else here
240
241 # now wait until the name is available:
242 $name_ready->recv;
243
244 undef $wait_for_input; # watcher no longer needed
245
246 print "your name is $name\n";
247
248 This program creates an AnyEvent condvar by calling the C<<
249 AnyEvent->condvar >> method. It then creates a watcher as usual, but
250 inside the callback it C<send>'s the C<$name_ready> condition variable,
251 which causes anybody waiting on it to continue.
252
253 The "anybody" in this case is the code that follows, which calls C<<
254 $name_ready->recv >>: The producer calls C<send>, the consumer calls
255 C<recv>.
256
257 If there is no C<$name> available yet, then the call to C<<
258 $name_ready->recv >> will halt your program until the condition becomes
259 true.
260
261 As the names C<send> and C<recv> imply, you can actually send and receive
262 data using this, for example, the above code could also be written like
263 this, without an extra variable to store the name in:
264
265 use AnyEvent;
266
267 $| = 1; print "enter your name> ";
268
269 my $name_ready = AnyEvent->condvar;
270
271 my $wait_for_input = AnyEvent->io (
272 fh => \*STDIN, poll => "r",
273 cb => sub { $name_ready->send (scalar <STDIN>) }
274 );
275
276 # do something else here
277
278 # now wait and fetch the name
279 my $name = $name_ready->recv;
280
281 undef $wait_for_input; # watcher no longer needed
282
283 print "your name is $name\n";
284
285 You can pass any number of arguments to C<send>, and every call to
286 C<recv> will return them.
287
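A small sketch of sending more than one value, with no watcher involved at all. The names and values are made up for illustration, and C<send> is called before C<recv> here simply to keep the example short:

```perl
use strict;
use warnings;
use AnyEvent;

my $name_ready = AnyEvent->condvar;

# producer side: send two values at once
$name_ready->send ("Ada", "Lovelace");

# consumer side: in list context, recv returns everything that was sent
my ($first, $last) = $name_ready->recv;

# a second recv does not block again, it returns the same values
my @again = $name_ready->recv;

print "first=$first last=$last (again: @again)\n";
```

Since the condition is already true when C<recv> is called, neither call has to wait.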
288 =head2 The "main loop"
289
290 Most event-based frameworks have something called a "main loop" or "event
291 loop run function" or something similar.
292
293 Just like C<recv> in AnyEvent, these functions need to be called
294 eventually so that your event loop has a chance of actually looking for
295 those events you are interested in.
296
297 For example, in a L<Gtk2> program, the above example could also be written
298 like this:
299
300 use Gtk2 -init;
301 use AnyEvent;
302
303 ############################################
304 # create a window and some label
305
306 my $window = new Gtk2::Window "toplevel";
307 $window->add (my $label = new Gtk2::Label "soon replaced by name");
308
309 $window->show_all;
310
311 ############################################
312 # do our AnyEvent stuff
313
314 $| = 1; print "enter your name> ";
315
316 # (no condition variable needed - Gtk2's main loop does the waiting)
317
318 my $wait_for_input = AnyEvent->io (
319 fh => \*STDIN, poll => "r",
320 cb => sub {
321 # set the label
322 $label->set_text (scalar <STDIN>);
323 print "enter another name> ";
324 }
325 );
326
327 ############################################
328 # Now enter Gtk2's event loop
329
330 main Gtk2;
331
332 No condition variable anywhere in sight - instead, we just read a line
333 from STDIN and replace the text in the label. In fact, since nobody
334 C<undef>'s C<$wait_for_input> you can enter multiple lines.
335
336 Instead of waiting for a condition variable, the program enters the Gtk2
337 main loop by calling C<< Gtk2->main >>, which will block the program and
338 wait for events to arrive.
339
340 This also shows that AnyEvent is quite flexible - you didn't have anything
341 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
342 worked.
343
344 Admittedly, the example is a bit silly - who would want to read names
345 from standard input in a Gtk+ application? But imagine that instead of
346 doing that, you would make an HTTP request in the background and display
347 its results. In fact, with event-based programming you can make many
348 HTTP requests in parallel in your program and still provide feedback to
349 the user and stay interactive.
350
351 In the next part you will see how to do just that - by implementing an
352 HTTP request, on our own, with the utility modules AnyEvent comes with.
353
354 Before that, however, let's briefly look at how you would write your
355 program using only AnyEvent, without ever calling some other event
356 loop's run function.
357
358 The example using condition variables already did exactly that, and in
359 fact, this is the solution:
360
361 my $quit_program = AnyEvent->condvar;
362
363 # create AnyEvent watchers (or not) here
364
365 $quit_program->recv;
366
367 If any of your watcher callbacks decide to quit, they can simply call
368 C<< $quit_program->send >>. Of course, they could also decide not to and
369 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
370 in a long-running daemon program).
371
372 In that case, you can simply use:
373
374 AnyEvent->condvar->recv;
375
376 And this is, in fact, closest to the idea of a main loop run function that
377 AnyEvent offers.
378
379 =head2 Timers and other event sources
380
381 So far, we have only used I/O watchers. These are useful mainly to find
382 out whether a socket has data to read, or space to write more data. On sane
383 operating systems this also works for console windows/terminals (typically
384 on standard input), serial lines, all sorts of other devices, basically
385 almost everything that has a file descriptor but isn't a file itself. (As
386 usual, "sane" excludes Windows - on that platform you would need different
387 functions for all of these, complicating code immensely - think "socket
388 only" on Windows).
389
390 However, I/O is not everything - the second most important event source is
391 the clock. For example when doing an HTTP request you might want to time
392 out when the server doesn't answer within some predefined amount of time.
393
394 In AnyEvent, timer event watchers are created by calling the C<<
395 AnyEvent->timer >> method:
396
397 use AnyEvent;
398
399 my $cv = AnyEvent->condvar;
400
401 my $wait_one_and_a_half_seconds = AnyEvent->timer (
402 after => 1.5, # after how many seconds to invoke the cb?
403 cb => sub { # the callback to invoke
404 $cv->send;
405 },
406 );
407
408 # can do something else here
409
410 # now wait till our time has come
411 $cv->recv;
412
413 Unlike I/O watchers, timers are only interested in the number of seconds
414 they have to wait. When that amount of time has passed, AnyEvent will
415 invoke your callback.
416
417 Unlike I/O watchers, which will call your callback as many times as there
418 is data available, timers are one-shot: after they have "fired" once and
419 invoked your callback, they are dead and no longer do anything.
420
421 To get a repeating timer, such as a timer firing roughly once per second,
422 you have to recreate it:
423
424 use AnyEvent;
425
426 my $time_watcher;
427
428 sub once_per_second {
429 print "tick\n";
430
431 # (re-)create the watcher
432 $time_watcher = AnyEvent->timer (
433 after => 1,
434 cb => \&once_per_second,
435 );
436 }
437
438 # now start the timer
439 once_per_second;
440
441 Having to recreate your timer is a restriction put on AnyEvent that is
442 present in most event libraries it uses. It is so annoying that some
443 future version might work around this limitation, but right now, it's the
444 only way to do repeating timers.
445
446 Fortunately most timers aren't really repeating but specify timeouts of
447 some sort.
448
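Putting the pieces together, the following sketch re-creates a timer three times and then stops, using a condition variable as the "main loop". The tick count and the 0.1 second delay are arbitrary choices for this example:

```perl
use strict;
use warnings;
use AnyEvent;

my $finished = AnyEvent->condvar;
my $ticks    = 0;
my $timer;   # holds the currently active timer watcher

sub tick {
   $ticks++;

   if ($ticks >= 3) {
      undef $timer;              # lose interest: no further timer is armed
      $finished->send ($ticks);  # tell the consumer we are done
      return;
   }

   # timers are one-shot, so re-create the watcher for the next tick
   $timer = AnyEvent->timer (after => 0.1, cb => \&tick);
}

# arm the first timer, then let the event loop run until we are done
$timer = AnyEvent->timer (after => 0.1, cb => \&tick);

my $count = $finished->recv;
print "stopped after $count ticks\n";
```

Stopping is nothing more than not re-creating the watcher and forgetting the old one.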
449 =head3 More esoteric sources
450
451 AnyEvent also has some other, more esoteric event sources you can tap
452 into: signal and child watchers.
453
454 Signal watchers can be used to wait for "signal events", which simply
455 means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
456
457 Child watchers wait for a child process to exit. They are useful when
458 you fork a separate process and need to know when it exits, but you do not
459 want to wait for that by blocking.
460
461 Both watcher types are described in detail in the main L<AnyEvent> manual
462 page.
463
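As a rough sketch of how both watcher types are used, assuming a Unix-like system where C<fork> and C<SIGUSR1> are available: the process sends a signal to itself, then forks a child that exits immediately. The exit code 5 is an arbitrary value picked for this example.

```perl
use strict;
use warnings;
use POSIX ();
use AnyEvent;

# --- signal watcher: be told when this process receives SIGUSR1 ---
my $got_signal = AnyEvent->condvar;

my $signal_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

kill USR1 => $$;                 # send the signal to ourselves
my $signal = $got_signal->recv;  # wait for the watcher callback

# --- child watcher: be told when a forked child exits ---
my $child_done = AnyEvent->condvar;

my $pid = fork;
defined $pid or die "fork failed: $!";
POSIX::_exit (5) unless $pid;    # child: exit immediately with code 5

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      $child_done->send ($status >> 8);   # wait status -> exit code
   },
);

my $exit_code = $child_done->recv;
print "got SIG$signal, child exited with code $exit_code\n";
```

Neither wait blocks the event loop: while C<recv> is waiting, other watchers would continue to be served.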
464
465 =head1 Network programming and AnyEvent
466
467 So far you have seen how to register event watchers and handle events.
468
469 This is a great foundation to write network clients and servers, and might be
470 all that your module (or program) ever requires, but writing your own I/O
471 buffering again and again becomes tedious, not to mention that it attracts
472 errors.
473
474 While the core L<AnyEvent> module is still small and self-contained,
475 the distribution comes with some very useful utility modules such as
476 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
477 make your life as non-blocking network programmer a lot easier.
478
479 Here is a quick overview of these three modules:
480
481 =head2 L<AnyEvent::DNS>
482
483 This module allows fully asynchronous DNS resolution. It is used mainly by
484 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
485 a great way to do other DNS resolution tasks, such as reverse lookups of
486 IP addresses for log files.
487
488 =head2 L<AnyEvent::Handle>
489
490 This module handles non-blocking IO on file handles in an event based
491 manner. It provides a wrapper object around your file handle that provides
492 queueing and buffering of incoming and outgoing data for you.
493
494 It also implements the most common data formats, such as text lines, or
495 fixed and variable-width data blocks.
496
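To give a first impression of the line-based interface, here is a minimal sketch that sends two lines over a local socket pair and reads them back one by one. The socket pair (created with C<AnyEvent::Util::portable_socketpair>) merely stands in for a real network connection, and the variable names are made up:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util ();

# a connected pair of sockets stands in for a real network connection
my ($fh_a, $fh_b) = AnyEvent::Util::portable_socketpair ()
   or die "socketpair failed: $!";

my $done = AnyEvent->condvar;
my @lines;

my $writer = AnyEvent::Handle->new (
   fh       => $fh_a,
   on_error => sub { $done->send },
);
my $reader = AnyEvent::Handle->new (
   fh       => $fh_b,
   on_error => sub { $done->send },
);

# queue outgoing data; the handle writes it when the socket is ready
$writer->push_write ("first line\015\012second line\015\012");

# queue two "read one line" requests; each callback gets the line
# without its line ending
for (1 .. 2) {
   $reader->push_read (line => sub {
      my ($handle, $line) = @_;
      push @lines, $line;
      $done->send if @lines == 2;
   });
}

$done->recv;
print "read: $_\n" for @lines;
```

No explicit read or write buffer appears anywhere in this code, the handle objects keep them internally.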
497 =head2 L<AnyEvent::Socket>
498
499 This module provides you with functions that handle socket creation
500 and IP address magic. The two main functions are C<tcp_connect> and
501 C<tcp_server>. The former will connect a (streaming) socket to an internet
502 host for you and the latter will make a server socket for you, to accept
503 connections.
504
505 This module also comes with transparent IPv6 support, this means: If you
506 write your programs with this module, you will be IPv6 ready without doing
507 anything special.
508
509 It also works around a lot of portability quirks (especially on the
510 Windows platform), which makes it even easier to write your programs in a
511 portable way (did you know that Windows uses different error codes for all
512 socket functions and that Perl does not know about these? That "Unknown
513 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
514 successful? That unsuccessful TCP connects might never be reported back
515 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
516 ignored instead of being in progress? AnyEvent::Socket works around all of
517 these Windows/Perl bugs for you).
518
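A minimal sketch of the two main functions talking to each other inside one process. It assumes the loopback address 127.0.0.1 is usable, lets the operating system pick a free port (by passing C<undef> as service) and learns that port in the optional prepare callback of C<tcp_server>. The greeting text is made up:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;   # filled in by the prepare callback below

# server: listen on the loopback interface, let the OS pick a free port
my $server_guard = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;

   syswrite $fh, "hello from the server\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   my ($listen_fh, $host, $bound_port) = @_;
   $port = $bound_port;
   0   # 0 selects the default listen queue length
};

# client: connect, then collect everything until end-of-file
tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $done->send;   # connect failed: send nothing

   my $data = "";
   my $watcher; $watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         my $len = sysread $fh, $data, 1024, length $data;

         if (!$len) {             # end-of-file or error
            undef $watcher;
            $done->send ($data);
         }
      },
   );
};

my $greeting = $done->recv;
print "client received: $greeting";
```

Server and client share one event loop here, so the single C<recv> call drives both sides.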
519 =head2 Implementing a parallel finger client with non-blocking connects
520 and AnyEvent::Socket
521
522 The finger protocol is one of the simplest protocols in use on the
523 internet. Or in use in the past, as almost nobody uses it anymore.
524
525 It works by connecting to the finger port on another host, writing a
526 single line with a user name and then reading the finger response, as
527 specified by that user. OK, RFC 1288 specifies a vastly more complex
528 protocol, but it basically boils down to this:
529
530 # telnet idsoftware.com finger
531 Trying 192.246.40.37...
532 Connected to idsoftware.com (192.246.40.37).
533 Escape character is '^]'.
534 johnc
535 Welcome to id Software's Finger Service V1.5!
536
537 [...]
538 Now on the web:
539 [...]
540
541 Connection closed by foreign host.
542
543 "Now on the web..." yeah, I<was> used indeed, but at least the finger
544 daemon still works, so let's write a little AnyEvent function that makes a
545 finger request:
546
547 use AnyEvent;
548 use AnyEvent::Socket;
549
550 sub finger($$) {
551 my ($user, $host) = @_;
552
553 # use a condvar to return results
554 my $cv = AnyEvent->condvar;
555
556 # first, connect to the host
557 tcp_connect $host, "finger", sub {
558 # the callback receives the socket handle - or nothing
559 my ($fh) = @_
560 or return $cv->send;
561
562 # now write the username
563 syswrite $fh, "$user\015\012";
564
565 my $response;
566
567 # register a read watcher
568 my $read_watcher; $read_watcher = AnyEvent->io (
569 fh => $fh,
570 poll => "r",
571 cb => sub {
572 my $len = sysread $fh, $response, 1024, length $response;
573
574 if ($len <= 0) {
575 # we are done, or an error occurred, let's ignore the latter
576 undef $read_watcher; # no longer interested
577 $cv->send ($response); # send results
578 }
579 },
580 );
581 };
582
583 # pass $cv to the caller
584 $cv
585 }
586
587 That's a mouthful! Let's dissect this function a bit, first the overall
588 function and execution flow:
589
590 sub finger($$) {
591 my ($user, $host) = @_;
592
593 # use a condvar to return results
594 my $cv = AnyEvent->condvar;
595
596 # first, connect to the host
597 tcp_connect $host, "finger", sub {
598 ...
599 };
600
601 $cv
602 }
603
604 This isn't too complicated, just a function with two parameters, that
605 creates a condition variable, returns it, and while it does that,
606 initiates a TCP connect to C<$host>. The condition variable will be used
607 by the caller to receive the finger response, but one could equally well
608 pass a third argument, a callback, to the function.
609
610 Since we are programming event'ish, we do not wait for the connect to
611 finish - it could block the program for a minute or longer!
612
613 Instead, we pass the callback it should invoke when the connect is done to
614 C<tcp_connect>. If it is successful, that callback gets called with the
615 socket handle as first argument, otherwise, nothing will be passed to our
616 callback. The important point is that it will always be called as soon as
617 the outcome of the TCP connect is known.
618
619 This style of programming is also called "continuation style": the
620 "continuation" is simply the way the program continues - normally, a
621 program continues at the next line after some statement (the exception
622 is loops or things like C<return>). When we are interested in events,
623 however, we instead specify the "continuation" of our program by passing a
624 closure, which makes that closure the "continuation" of the program. The
625 C<tcp_connect> call is like saying "return now, and when the connection is
626 established or it failed, continue there".
627
628 Now let's look at the callback/closure in more detail:
629
630 # the callback receives the socket handle - or nothing
631 my ($fh) = @_
632 or return $cv->send;
633
634 The first thing the callback does is indeed save the socket handle in
635 C<$fh>. When there was an error (no arguments), then our instinct as
636 expert Perl programmers would tell us to C<die>:
637
638 my ($fh) = @_
639 or die "$host: $!";
640
641 While this would give good feedback to the user (if he happens to watch
642 standard error), our program would probably stop working here, as we never
643 report the results to anybody, certainly not the caller of our C<finger>
644 function, and most event loops continue even after a C<die>!
645
646 This is why we instead C<return>, but also call C<< $cv->send >> without
647 any arguments to signal to the condvar consumer that something bad has
648 happened. The return value of C<< $cv->send >> is irrelevant, as is the
649 return value of our callback. The return statement is simply used for the
650 side effect of, well, returning immediately from the callback. Checking
651 for errors and handling them this way is very common, which is why this
652 compact idiom is so handy.
653
654 As the next step in the finger protocol, we send the username to the
655 finger daemon on the other side of our connection:
656
657 syswrite $fh, "$user\015\012";
658
659 Note that this isn't 100% clean socket programming - the socket could,
660 for whatever reasons, not accept our data. When writing a small amount
661 of data like in this example it doesn't matter, as a socket buffer is
662 almost always big enough for a mere "username", but for real-world
663 cases you might need to implement some kind of write buffering - or use
664 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
665 next section.
666
667 What we I<do> have to do is to implement our own read buffer - the response
668 data could arrive late or in multiple chunks, and we cannot just wait for
669 it (event-based programming, you know?).
670
671 To do that, we register a read watcher on the socket which waits for data:
672
673 my $read_watcher; $read_watcher = AnyEvent->io (
674 fh => $fh,
675 poll => "r",
676
677 There is a trick here, however: the read watcher isn't stored in a global
678 variable, but in a local one - if the callback returns, it would normally
679 destroy the variable and its contents, which would in turn unregister our
680 watcher.
681
682 To avoid that, we refer to the variable inside the watcher callback (where
683 we eventually C<undef>ine it). This means that, when the C<tcp_connect>
684 callback returns, perl thinks (quite correctly) that the read watcher is
685 still in use - namely in the callback.
686
687 The trick, however, is that instead of:
688
689 my $read_watcher = AnyEvent->io (...
690
691 The program does:
692
693 my $read_watcher; $read_watcher = AnyEvent->io (...
694
695 The reason for this is a quirk in the way Perl works: variable names
696 declared with C<my> are only visible in the I<next> statement. If the
697 whole C<< AnyEvent->io >> call, including the callback, would be done in
698 a single statement, the callback could not refer to the C<$read_watcher>
699 variable to undefine it, so it is done in two statements.
700
701 Whether you'd want to format it like this is of course a matter of style,
702 this way emphasizes that the declaration and assignment really are one
703 logical statement.
704
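The visibility rule can be checked in plain Perl, without AnyEvent. This sketch compiles the one-statement form at run time (with a string C<eval>, so the failure can be caught) and compares it with the two-statement form. The variable names are made up:

```perl
use strict;
use warnings;

# one statement: while the sub is compiled, $cb is not declared yet,
# so "use strict" rejects the code
my $compiled = eval q{
   use strict;
   my $cb = sub { return $cb };
   1
};
my $error = $@;

# two statements: the name exists before the sub is compiled
my $callback; $callback = sub { return $callback };
my $sees_itself = $callback->() == $callback;

undef $callback;   # break the self-reference, as the watcher callback does

print "one statement:  ", ($compiled ? "compiled" : "failed to compile"), "\n";
print "two statements: ", ($sees_itself ? "closure sees itself" : "broken"), "\n";
```

Note that a closure referring to the variable that holds it forms a reference cycle, which is why the final C<undef> matters: it plays the same role as C<undef $read_watcher> in the finger client.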
705 The callback itself calls C<sysread> as many times as necessary, until
706 C<sysread> returns either an error or end-of-file:
707
708 cb => sub {
709 my $len = sysread $fh, $response, 1024, length $response;
710
711 if ($len <= 0) {
712
713 Note that C<sysread> has the ability to append data it reads to a scalar,
714 by specifying an offset, which is what we make good use of in this
715 example.
716
717 When C<sysread> indicates we are done, the callback C<undef>ines
718 the watcher and then C<send>'s the response data to the condition
719 variable. All this has the following effects:
720
721 Undefining the watcher destroys it, as our callback was the only one still
722 having a reference to it. When the watcher gets destroyed, it destroys the
723 callback, which in turn means the C<$fh> handle is no longer used, so that
724 gets destroyed as well. The result is that all resources will be nicely
725 cleaned up by perl for us.
726
727 =head3 Using the finger client
728
729 Now, we could probably write the same finger client in a simpler way if
730 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
731 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
732
733 But the main advantage is that we can not only run this finger function in
734 the background, we even can run multiple sessions in parallel, like this:
735
736 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
737 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
738 my $f3 = finger "johnc", "idsoftware.com"; # finger john
739
740 print "trouble tickets:\n", $f1->recv, "\n";
741 print "trouble ticket #1736:\n", $f2->recv, "\n";
742 print "john carmack's finger file: ", $f3->recv, "\n";
743
744 It doesn't look like it, but in fact all three requests run in
745 parallel. The code waits for the first finger request to finish first, but
746 that doesn't keep it from executing them in parallel: when the first C<recv>
747 call sees that the data isn't ready yet, it serves events for all three
748 requests automatically, until the first request has finished.
749
750 The second C<recv> call might either find the data is already there, or it
751 will continue handling events until that is the case, and so on.
752
753 By taking advantage of network latencies, which allows us to serve other
754 requests and events while we wait for an event on one socket, the overall
755 time to do these three requests will be greatly reduced, typically all
756 three are done in the same time as the slowest of them would need to finish.
757
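The effect can be tried without any network. In the following sketch, C<slow_request> is a made-up stand-in for C<finger> that "answers" after a fixed delay using a timer. Three requests of 0.3 seconds each should finish in roughly 0.3 seconds overall, not 0.9:

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes ();

# stand-in for finger(): "answers" after $delay seconds via a condvar
sub slow_request {
   my ($name, $delay) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => $delay,
      cb    => sub {
         undef $timer;             # one-shot, no longer needed
         $cv->send ("$name done");
      },
   );

   $cv
}

my $start = Time::HiRes::time ();

# start all three requests first ...
my @pending = map { slow_request ("request $_", 0.3) } 1 .. 3;

# ... then collect the results; the first recv serves events for all of them
my @results = map { $_->recv } @pending;

my $elapsed = Time::HiRes::time () - $start;
printf "%d results in %.1f seconds\n", scalar @results, $elapsed;
```

The exact elapsed time depends on the machine and the event loop in use, so treat the numbers as an illustration only.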
758 By the way, you do not actually have to wait in the C<recv> method on an
759 AnyEvent condition variable - after all, waiting is evil - you can also
760 register a callback:
761
762 $cv->cb (sub {
763 my $response = shift->recv;
764 # ...
765 });
766
767 The callback will only be invoked once C<send> has been called. In fact,
768 instead of returning a condition variable you could also pass a third
769 parameter to your finger function, the callback to invoke with the
770 response:
771
772 sub finger($$$) {
773 my ($user, $host, $cb) = @_;
774
775 How you implement it is a matter of taste - if you expect your function to
776 be used mainly in an event-based program you would normally prefer to pass
777 a callback directly. If you write a module and expect your users to use
778 it "synchronously" often (for example, a simple http-get script would not
779 really care much for events), then you would use a condition variable and
780 tell them "simply ->recv the data".
781
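Both ways of consuming a condition variable can be tried with one and the same function. In this sketch, C<delayed_answer> is a made-up stand-in for a request function such as C<finger>. It is first used with a callback and then "synchronously":

```perl
use strict;
use warnings;
use AnyEvent;

# stand-in for a request function that returns a condvar
sub delayed_answer {
   my ($answer) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (after => 0.1, cb => sub {
      undef $timer;
      $cv->send ($answer);
   });

   $cv
}

my $quit = AnyEvent->condvar;
my $seen;

# event-based use: register a callback instead of blocking in recv
delayed_answer ("callback style")->cb (sub {
   my ($cv) = @_;
   $seen = $cv->recv;   # does not block, send was already called
   $quit->send;
});

$quit->recv;   # the "main loop"

# synchronous use of the very same function
my $direct = delayed_answer ("blocking style")->recv;

print "$seen / $direct\n";
```

Returning a condition variable therefore leaves the choice to the caller, at the cost of one extra object per request.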
782 =head3 Problems with the implementation and how to fix them
783
784 To make this example more real-world-ready, we would not only implement
785 some write buffering (for the paranoid), but we would also have to handle
786 timeouts and maybe protocol errors.
787
788 Doing this quickly gets unwieldy, which is why we introduce
789 L<AnyEvent::Handle> in the next section, which takes care of all these
790 details for you and lets you concentrate on the actual protocol.
791
792
793 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
794
795 The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
796 see what it really offers.
797
798 As finger is such a simple protocol, let's try something slightly more
799 complicated: HTTP/1.0.
800
801 An HTTP GET request works by sending a single request line that indicates
802 what you want the server to do and the URI you want to act on, followed
803 by as many "header" lines (C<Header: data>, same as e-mail headers) as
804 required for the request, ended by an empty line.
805
806 The response is formatted very similarly, first a line with the response
807 status, then again as many header lines as required, then an empty line,
808 followed by any data that the server might send.
809
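To make the format concrete, here is a small sketch in plain Perl that
builds such a request and splits a complete response into its three
parts. The helper names C<build_request> and C<split_response> are made
up for this illustration and are not part of AnyEvent:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# Build a request: request line, optional header lines, empty line.
sub build_request {
   my ($method, $uri, %header) = @_;

   join "",
      "$method $uri HTTP/1.0$CRLF",
      (map "$_: $header{$_}$CRLF", sort keys %header),
      $CRLF;
}

# Split a complete response into status line, header block and body.
sub split_response {
   my ($data) = @_;

   my ($head, $body)   = split /\015\012\015\012/, $data, 2;
   my ($status, $rest) = split /\015\012/, $head, 2;

   ($status, $rest // "", $body // "")
}

my $request = build_request GET => "/test", Host => "www.example.com";

my ($status, $header, $body) = split_response
   "HTTP/1.0 404 Not Found${CRLF}Content-Type: text/html$CRLF$CRLF<html>";
```

This only works on a response that has already arrived completely. The
rest of this section is about doing the same thing incrementally, as the
data trickles in.
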
810 Again, let's try it out with C<telnet> (I condensed the output a bit - if
811 you want to see the full response, do it yourself).
812
813 # telnet www.google.com 80
814 Trying 209.85.135.99...
815 Connected to www.google.com (209.85.135.99).
816 Escape character is '^]'.
817 GET /test HTTP/1.0
818
819 HTTP/1.0 404 Not Found
820 Date: Mon, 02 Jun 2008 07:05:54 GMT
821 Content-Type: text/html; charset=UTF-8
822
823 <html><head>
824 [...]
825 Connection closed by foreign host.
826
827 The C<GET ...> and the empty line were entered manually, the rest of the
828 telnet output is Google's response, in this case a C<404 Not Found> one.
829
830 So, here is how you would do it with C<AnyEvent::Handle>:
831
832    sub http_get {
833       my ($host, $uri, $cb) = @_;
834
835       tcp_connect $host, "http", sub {
836          my ($fh) = @_
837             or return $cb->("HTTP/1.0 500 $!");
838
839          # store results here
840          my ($response, $header, $body);
841
842          my $handle; $handle = new AnyEvent::Handle
843             fh       => $fh,
844             on_error => sub {
845                undef $handle;
846                $cb->("HTTP/1.0 500 $!");
847             },
848             on_eof   => sub {
849                undef $handle; # keep it alive till eof
850                $cb->($response, $header, $body);
851             };
852
853          $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
854
855          # now fetch response status line
856          $handle->push_read (line => sub {
857             my ($handle, $line) = @_;
858             $response = $line;
859          });
860
861          # then the headers
862          $handle->push_read (line => "\015\012\015\012", sub {
863             my ($handle, $line) = @_;
864             $header = $line;
865          });
866
867          # and finally handle any remaining data as body
868          $handle->on_read (sub {
869             $body .= $_[0]->rbuf;
870             $_[0]->rbuf = "";
871          });
872       };
873    }
874
875 And now let's go through it step by step. First, as usual, the overall
876 C<http_get> function structure:
877
878    sub http_get {
879       my ($host, $uri, $cb) = @_;
880
881       tcp_connect $host, "http", sub {
882          ...
883       };
884    }
885
886 Unlike in the finger example, this time the caller has to pass a callback
887 to C<http_get>. Also, instead of passing a URL as one would expect, the
888 caller has to provide the hostname and URI - normally you would use the
889 C<URI> module to parse a URL and separate it into those parts, but that is
890 left to the inspired reader :)
891
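For readers who want a starting point, the following is a deliberately
simplified sketch of such a URL splitter. The function name C<split_url>
is made up, and the regex is an assumption that ignores user info, IPv6
literals and other corner cases that the C<URI> module handles properly:

```perl
use strict;
use warnings;

# Simplified URL splitter for illustration only.
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $port, $uri) =
      $url =~ m{^ (https?) :// ([^/:?\#]+) (?: : (\d+) )? ( [/?] [^\#]* )? }xi
         or return;

   $uri = "/" unless defined $uri && length $uri;
   $uri = "/$uri" if $uri =~ /^\?/;   # "http://host?q=1" means "/?q=1"

   (lc $scheme, lc $host, $port, $uri)
}

my ($scheme, $host, $port, $uri) = split_url "http://www.google.com/test?q=1";
```

The resulting C<$host> and C<$uri> are what C<http_get> expects as its
first two arguments.
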
892 Since everything else is left to the caller, all C<http_get> does is
893 initiate the connection with C<tcp_connect> and leave everything else to
894 its callback.
895
896 The first thing the callback does is check for connection errors and
897 declare some variables:
898
899    my ($fh) = @_
900       or return $cb->("HTTP/1.0 500 $!");
901
902    my ($response, $header, $body);
903
904 Instead of having an extra mechanism to signal errors, connection errors
905 are signalled by crafting a special "response status line", like this:
906
907 HTTP/1.0 500 Connection refused
908
909 This means the caller cannot distinguish (easily) between
910 locally-generated errors and server errors, but it simplifies error
911 handling for the caller a lot.
912
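On the caller's side, the status line can be taken apart with a single
regex, whether it came from the server or was crafted locally. A minimal
sketch, with the helper name C<parse_status> being an assumption of this
example:

```perl
use strict;
use warnings;

# Split a status line such as "HTTP/1.0 404 Not Found" into its parts.
sub parse_status {
   my ($line) = @_;

   my ($version, $code, $message) =
      $line =~ m!^HTTP/(\d+\.\d+) \s+ (\d{3}) (?: \s+ (.*) )? \z!x
         or return;

   ($version, $code, defined $message ? $message : "")
}

my ($version, $code, $message) = parse_status "HTTP/1.0 500 Connection refused";

print "request failed: $message\n" if $code >= 400;
```
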
913 The next step finally involves L<AnyEvent::Handle>, namely it creates the
914 handle object:
915
916    my $handle; $handle = new AnyEvent::Handle
917       fh       => $fh,
918       on_error => sub {
919          undef $handle;
920          $cb->("HTTP/1.0 500 $!");
921       },
922       on_eof   => sub {
923          undef $handle; # keep it alive till eof
924          $cb->($response, $header, $body);
925       };
926
927 The constructor expects a file handle, which gets passed via the C<fh>
928 argument.
929
930 The remaining two argument pairs specify two callbacks to be called on
931 any errors (C<on_error>) and in the case of a normal connection close
932 (C<on_eof>).
933
934 In the first case, we C<undef>ine the handle object and pass the error to
935 the callback provided by the caller - done.
936
937 In the second case we assume everything went fine and pass the results
938 gobbled up so far to the caller-provided callback. This is not quite
939 perfect, as when the server "cleanly" closes the connection in the middle
940 of sending headers we might wrongly report this as an "OK" to the caller,
941 but then, HTTP doesn't support a perfect mechanism that would detect such
942 problems in all cases, so we don't bother either.
943
944 =head3 The write queue
945
946 The next line sends the actual request:
947
948 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
949
950 No headers will be sent (this is fine for simple requests), so the whole
951 request is just a single line followed by an empty line to signal the end
952 of the headers to the server.
953
954 The more interesting question is why the method is called C<push_write>
955 and not just C<write>. The reason is that you can I<always> add some write
956 data without blocking, and to do this, AnyEvent::Handle needs some write
957 queue internally - and C<push_write> simply pushes some data at the end of
958 that queue, just like Perl's C<push> pushes data at the end of an array.
959
960 The deeper reason is that at some point in the future, there might
961 be C<unshift_write> as well, and in any case, we will shortly meet
962 C<push_read> and C<unshift_read>, and it's usually easiest if all those
963 functions have some symmetry in their name.
964
965 If C<push_write> is called with more than one argument, then you can even
966 do I<formatted> I/O, which simply means your data will be transformed in
967 some ways. For example, this would JSON-encode your data before pushing it
968 to the write queue:
969
970 $handle->push_write (json => [1, 2, 3]);
971
972 Apart from that, this pretty much summarises the write queue; there is
973 little else to it.
974
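To see why queueing never blocks, consider this toy model of a write
queue. It is only a conceptual sketch: the C<ToyWriteQueue> package and
its C<drain> method are invented for this example and say nothing about
how AnyEvent::Handle is implemented internally.

```perl
use strict;
use warnings;

package ToyWriteQueue;

sub new { bless { wbuf => "" }, shift }

# Queueing never blocks: the data is merely appended to the buffer.
sub push_write {
   my ($self, $data) = @_;
   $self->{wbuf} .= $data;
}

# Called when the socket is writable; $max stands in for however many
# bytes the kernel is willing to accept right now.
sub drain {
   my ($self, $max) = @_;
   substr $self->{wbuf}, 0, $max, ""   # removes and returns the bytes "sent"
}

package main;

my $queue = ToyWriteQueue->new;
$queue->push_write ("GET / HTTP/1.0\015\012");
$queue->push_write ("\015\012");

my $first  = $queue->drain (5);     # only part of the data fits
my $second = $queue->drain (100);   # the rest goes out later
```
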
975 Reading the response is far more interesting:
976
977 =head3 The read queue
978
979 The response consists of three parts: a single line of response status, a
980 single paragraph of headers ended by an empty line, and the response body,
981 which is simply the remaining data on that connection.
982
983 For the first two, we push two read requests onto the read queue:
984
985    # now fetch response status line
986    $handle->push_read (line => sub {
987       my ($handle, $line) = @_;
988       $response = $line;
989    });
990
991    # then the headers
992    $handle->push_read (line => "\015\012\015\012", sub {
993       my ($handle, $line) = @_;
994       $header = $line;
995    });
996
997 While one can simply push a single callback to the queue, I<formatted> I/O
998 really comes to our advantage here, as there is a ready-made "read line"
999 read type. The first read expects a single line, ended by C<\015\012> (the
1000 standard end-of-line marker in internet protocols).
1001
1002 The second "line" is actually a single paragraph - instead of reading it
1003 line by line we tell C<push_read> that the end-of-line marker is really
1004 C<\015\012\015\012>, which is an empty line. The result is that the whole
1005 header paragraph will be treated as a single line and read. The word
1006 "line" is interpreted very freely, much like Perl itself does it.
1007
1008 Note that the read requests are pushed immediately after creating the
1009 handle object - since AnyEvent::Handle provides a queue we can push as
1010 many requests as we want, and AnyEvent::Handle will handle them in order.
1011
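Since the header paragraph arrives as one string, the caller will usually
want to split it into individual fields afterwards. A minimal sketch in
plain Perl, where C<parse_header> is a made-up helper that ignores folded
continuation lines and keeps only the last value of a repeated header:

```perl
use strict;
use warnings;

# Turn the header paragraph into a hash reference of name => value.
sub parse_header {
   my ($header) = @_;

   my %hdr;

   for my $line (split /\015\012/, $header) {
      my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/
         or next;   # skip anything that does not look like "Name: value"

      # header names are case-insensitive, so normalise them
      $hdr{lc $name} = $value;
   }

   \%hdr
}

my $hdr = parse_header
   "Date: Mon, 02 Jun 2008 07:05:54 GMT\015\012Content-Type: text/html; charset=UTF-8";

print "type: $hdr->{'content-type'}\n";
```
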
1012 There is, however, no read type for "the remaining data". For that, we
1013 install our own C<on_read> callback:
1014
1015    # and finally handle any remaining data as body
1016    $handle->on_read (sub {
1017       $body .= $_[0]->rbuf;
1018       $_[0]->rbuf = "";
1019    });
1020
1021 This callback is invoked every time data arrives and the read queue is
1022 empty - which in this example will only be the case when both response and
1023 header have been read. The C<on_read> callback could actually have been
1024 specified when constructing the object, but doing it this way preserves
1025 logical ordering.
1026
1027 The read callback simply appends the current read buffer to the C<$body>
1028 variable and, most importantly, I<empties> the buffer by assigning the
1029 empty string to it.
1030
1031 After AnyEvent::Handle has been so instructed, it will now handle incoming
1032 data according to these instructions - if all goes well, the callback will
1033 be invoked with the response data, if not, it will get an error.
1034
1035 In general, you get pipelining very easily with AnyEvent::Handle: If
1036 you have a protocol with a request/response structure, your request
1037 methods/functions will all look like this (simplified):
1038
1039    sub request {
1040
1041       # send the request to the server
1042       $handle->push_write (...);
1043
1044       # push some response handlers
1045       $handle->push_read (...);
1046    }
1047
1048 =head3 Using it
1049
1050 And here is how you would use it:
1051
1052    http_get "www.google.com", "/", sub {
1053       my ($response, $header, $body) = @_;
1054
1055       print
1056          $response, "\n",
1057          $body;
1058    };
1059
1060 And of course, you can run as many of these requests in parallel as you
1061 want (and your memory supports).
1062
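One way to wait for several parallel requests is the C<begin>/C<end>
pair of AnyEvent condition variables. The sketch below replaces
C<http_get> with a stand-in that answers from a timer, so that it runs
without network access. The host list and the fake responses are
assumptions of this example:

```perl
use strict;
use warnings;
use AnyEvent;

# Stand-in for the http_get from this section: it "responds" after a
# short, host-dependent delay instead of talking to a server.
sub http_get {
   my ($host, $uri, $cb) = @_;

   my $timer; $timer = AnyEvent->timer (after => 0.01 * length $host, cb => sub {
      undef $timer;
      $cb->("HTTP/1.0 200 OK", "", "body from $host");
   });
}

my %status;
my $done = AnyEvent->condvar;

for my $host ("www.google.com", "www.example.com", "www.perl.org") {
   $done->begin;     # one more outstanding request

   http_get $host, "/", sub {
      my ($response, $header, $body) = @_;

      $status{$host} = $response;
      $done->end;    # this request has finished
   };
}

$done->recv;         # returns once every begin has been matched by an end

print "$_: $status{$_}\n" for sort keys %status;
```
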
1063 =head3 HTTPS
1064
1065 Now, as promised, let's implement the same thing for HTTPS, or more
1066 correctly, let's change our C<http_get> function into a function that
1067 speaks HTTPS instead.
1068
1069 HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1070 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1071 that contains standard HTTP protocol exchanges. The other difference to
1072 HTTP is that it uses port C<443> instead of port C<80>.
1073
1074 To implement these two differences we need two tiny changes. First, in the C<tcp_connect> call
1075 we replace C<http> by C<https>:
1076
1077 tcp_connect $host, "https", sub { ...
1078
1079 The other change deals with TLS, which is something L<AnyEvent::Handle>
1080 does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
1081 around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
1082 C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1083
1084 tls => "connect",
1085
1086 Specifying C<tls> enables TLS, and the argument specifies whether
1087 AnyEvent::Handle is the server side ("accept") or the client side
1088 ("connect") for the TLS connection, as unlike TCP, there is a clear
1089 server/client relationship in TLS.
1090
1091 That's all.
1092
1093 Of course, all this should be handled transparently by C<http_get> after
1094 parsing the URL. See the part about exercising your inspiration earlier in
1095 this document.
1096
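As a hint for that exercise, the two differences can be captured in one
small lookup. This is only a sketch: C<connect_params> is a hypothetical
helper, and how the URL scheme is obtained is left open.

```perl
use strict;
use warnings;

# Map a URL scheme to the tcp_connect service name and to any extra
# AnyEvent::Handle constructor arguments (hypothetical helper).
sub connect_params {
   my ($scheme) = @_;

   return ("http")                    if $scheme eq "http";
   return ("https", tls => "connect") if $scheme eq "https";
   return;   # unsupported scheme
}

my ($service, %handle_args) = connect_params "https";
```

C<$service> would then be passed to C<tcp_connect>, and C<%handle_args>
added to the argument list of the handle constructor.
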
1097 =head3 The read queue - revisited
1098
1099 HTTP always uses the same structure in its responses, but many protocols
1100 require parsing responses differently depending on the response itself.
1101
1102 For example, in SMTP, you normally get a single response line:
1103
1104 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1105
1106 But SMTP also supports multi-line responses:
1107
1108 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1109 220-hey guys
1110 220 my response is longer than yours
1111
1112 To handle this, we need C<unshift_read>. As the name (hopefully) implies,
1113 C<unshift_read> will not append your read request to the end of the read
1114 queue, but instead it will prepend it to the queue.
1115
1116 This is useful in this situation: You push your response-line read
1117 request when sending the SMTP command, and when handling it, you look at
1118 the line to see if more is to come, and C<unshift_read> another reader,
1119 like this:
1120
1121    my $response; # response lines end up in here
1122
1123    my $read_response; $read_response = sub {
1124       my ($handle, $line) = @_;
1125
1126       $response .= "$line\n";
1127
1128       # check for continuation lines ("-" as 4th character)
1129       if ($line =~ /^...-/) {
1130          # if yes, then unshift another line read
1131          $handle->unshift_read (line => $read_response);
1132
1133       } else {
1134          # otherwise we are done
1135
1136          # free callback
1137          undef $read_response;
1138
1139          print "we are done reading: $response\n";
1140       }
1141    };
1142
1143    $handle->push_read (line => $read_response);
1144
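The continuation rule itself is independent of the event loop. As a
sketch in plain Perl, the made-up helper C<parse_smtp_reply> applies the
same check for a C<-> in the fourth column to a reply that has already
been collected, and returns the code together with the text lines:

```perl
use strict;
use warnings;

# Parse a complete (possibly multi-line) SMTP reply.
sub parse_smtp_reply {
   my ($reply) = @_;

   my ($code, @text);

   for my $line (split /\015?\012/, $reply) {
      my ($c, $sep, $t) = $line =~ /^(\d{3})([- ]?)(.*)$/
         or return;

      $code = $c;
      push @text, $t;

      last if $sep ne "-";   # a space (or nothing) marks the final line
   }

   ($code, @text)
}

my ($code, @text) = parse_smtp_reply
   "220-mail.example.net Neverusesendmail 8.8.8\n"
   . "220-hey guys\n"
   . "220 my response is longer than yours\n";
```
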
1145 This recipe can be used for all similar parsing problems, for example in
1146 NNTP, the response code to some commands indicates that more data will be
1147 sent:
1148
1149    $handle->push_write ("article 42\015\012");
1150
1151    # read response line
1152    $handle->push_read (line => sub {
1153       my ($handle, $status) = @_;
1154
1155       # article data following?
1156       if ($status =~ /^2/) {
1157          # yes, read article body
1158
1159          $handle->unshift_read (line => "\012.\015\012", sub {
1160             my ($handle, $body) = @_;
1161
1162             $finish->($status, $body);
1163          });
1164
1165       } else {
1166          # some error occurred, no article data
1167
1168          $finish->($status);
1169       }
1170    });
1171
1172 =head3 Your own read queue handler
1173
1174 Sometimes, your protocol doesn't play nice and uses lines or chunks of data
1175 that no ready-made read type can handle, so you have to implement your own parser.
1176
1177 To make up a contorted example, imagine you are looking for an even
1178 number of characters followed by a colon (":"). Also imagine that
1179 AnyEvent::Handle had no C<regex> read type which could be used, so you'd
1180 have to do it manually.
1181
1182 To implement this, you would C<push_read> (or C<unshift_read>) just a
1183 single code reference.
1184
1185 This code reference will then be called each time there is (new) data
1186 available in the read buffer, and is expected to either eat/consume some
1187 of that data (and return true) or to return false to indicate that it
1188 wants to be called again.
1189
1190 If the code reference returns true, then it will be removed from the read
1191 queue, otherwise it stays in front of it.
1192
1193 The example above could be coded like this:
1194
1195    $handle->push_read (sub {
1196       my ($handle) = @_;
1197
1198       # check for even number of characters + ":"
1199       # and remove the data if a match is found.
1200       # if not, return false (actually nothing)
1201
1202       $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1203          or return;
1204
1205       # we got some data in $1, pass it to whoever wants it
1206       $finish->($1);
1207
1208       # and return true to indicate we are done
1209       1
1210    });
1211
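The true/false protocol can be tried out without any sockets. The
following toy model uses a plain array as the read queue and a made-up
C<feed> function in place of incoming network data. It only illustrates
the rules described above and is not how AnyEvent::Handle is implemented:

```perl
use strict;
use warnings;

my $rbuf = "";   # toy read buffer
my @queue;       # toy read queue of code references

# Pretend that $data just arrived from the network.
sub feed {
   my ($data) = @_;

   $rbuf .= $data;

   # call the first reader; remove it only when it returns true
   while (@queue) {
      $queue[0]->(\$rbuf)
         or last;

      shift @queue;
   }
}

my @found;

push @queue, sub {
   my ($buf) = @_;

   # even number of characters + ":", same regex as above
   $$buf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
};

feed "ab";     # no colon yet, so the reader stays in the queue
feed "cd:x";   # "abcd:" is consumed, "x" remains in the buffer
```
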
1212
1213 =head1 Authors
1214
1215 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1216