/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.18
Committed: Fri Jun 6 07:29:45 2008 UTC (16 years ago) by root
Branch: MAIN
CVS Tags: rel-4_151, rel-4_152, rel-4_15, rel-4_161, rel-4_160
Changes since 1.17: +54 -8 lines

1 =head1 Introduction to AnyEvent
2
3 This is a tutorial that will introduce you to the features of AnyEvent.
4
5 The first part introduces the core AnyEvent module (after swamping you a
6 bit in evangelism), which might already provide all you ever need. If you
7 are only interested in AnyEvent's event handling capabilities, read no
8 further.
9
10 The second part focuses on network programming using sockets, for which
11 AnyEvent offers a lot of support you can use, and a lot of workarounds
12 for portability quirks.
13
14
15 =head1 What is AnyEvent?
16
17 If you don't care for the whys and want to see code, skip this section!
18
19 AnyEvent is first of all just a framework to do event-based
20 programming. Typically such frameworks are an all-or-nothing thing: If you
21 use one such framework, you can't (easily, or even at all) use another in
22 the same program.
23
24 AnyEvent is different - it is a thin abstraction layer above all kinds
25 of event loops. Its main purpose is to move the choice of the underlying
26 framework (the event loop) from the module author to the program author
27 using the module.
28
29 That means you can write code that uses events to control what it
30 does, without forcing other code in the same program to use the same
31 underlying framework as you do - i.e. you can create a Perl module
32 that is event-based using AnyEvent, and users of that module can still
33 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
34 all: AnyEvent comes with its own event loop implementation, so your
35 code works regardless of other modules that might or might not be
36 installed. The latter is important, as AnyEvent does not have any
37 dependencies on other modules, which makes it easy to install, for
38 example, when you lack a C compiler.
39
40 A typical problem with Perl modules such as L<Net::IRC> is that they
41 come with their own event loop: In L<Net::IRC>, the program that uses it
42 needs to start the event loop of L<Net::IRC>. That means that one cannot
43 integrate this module into a L<Gtk2> GUI for instance, as that module,
44 too, enforces the use of its own event loop (namely L<Glib>).
45
46 Another example is L<LWP>: it provides no event interface at all. It's a
47 pure blocking HTTP (and FTP etc.) client library, which usually means that
48 you either have to start a thread or have to fork for an HTTP request, or
49 use L<Coro::LWP>, if you want to do something else while waiting for the
50 request to finish.
51
52 The motivation behind these designs is often that a module doesn't want to
53 depend on some complicated XS-module (Net::IRC), or that it doesn't want
54 to force the user to use some specific event loop at all (LWP).
55
56 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
57
58 =over 4
59
60 =item - write their own event loop (because AnyEvent guarantees to offer
61 one everywhere - even on Windows).
62
63 =item - choose one fixed event loop (because AnyEvent works with all
64 important event loops available for Perl, and adding others is trivial).
65
66 =back
67
68 If the module author uses L<AnyEvent> for all his event needs (IO events,
69 timers, signals, ...) then all other modules can just use his module and
70 don't have to choose an event loop or adapt to his event loop. The choice
71 of the event loop is ultimately made by the program author who uses all
72 the modules and writes the main program. And even there he doesn't have to
73 choose, he can just let L<AnyEvent> choose the best available event loop
74 for him.
75
76 Read more about this in the main documentation of the L<AnyEvent> module.
77
78
79 =head1 Introduction to Event-Based Programming
80
81 So what exactly is programming using events? It quite simply means that
82 instead of your code actively waiting for something, such as the user
83 entering something on STDIN:
84
85 $| = 1; print "enter your name> ";
86
87 my $name = <STDIN>;
88
89 You instead tell your event framework to notify you in the event of some
90 data being available on STDIN, by using a callback mechanism:
91
92 use AnyEvent;
93
94 $| = 1; print "enter your name> ";
95
96 my $name;
97
98 my $wait_for_input = AnyEvent->io (
99 fh => \*STDIN, # which file handle to check
100 poll => "r", # which event to wait for ("r"ead data)
101 cb => sub { # what callback to execute
102 $name = <STDIN>; # read it
103 }
104 );
105
106 # do something else here
107
108 Looks more complicated, and surely is, but the advantage of using events
109 is that your program can do something else instead of waiting for
110 input. Waiting as in the first example is also called "blocking" because
111 you "block" your process from executing anything else while you do so.
112
113 The second example avoids blocking, by only registering interest in a read
114 event, which is fast and doesn't block your process. Only when read data
115 is available will the callback be called, which can then proceed to read
116 the data.
117
118 The "interest" is represented by an object returned by C<< AnyEvent->io
119 >> called a "watcher" object - called like that because it "watches" your
120 file handle (or other event sources) for the event you are interested in.
121
122 In the example above, we create an I/O watcher by calling the C<<
123 AnyEvent->io >> method. Disinterest in some event is simply expressed by
124 forgetting about the watcher, for example, by C<undef>'ing the variable it
125 is stored in. AnyEvent will automatically clean up the watcher if it is no
126 longer used, much like Perl closes your file handles if you no longer use
127 them anywhere.
128
129 =head3 A short note on callbacks
130
131 A common issue that hits people is the problem of passing parameters
132 to callbacks. Programmers used to languages such as C or C++ are often
133 used to a style where one passes the address of a function (a function
134 reference) and some data value, e.g.:
135
136 sub callback {
137 my ($arg) = @_;
138
139 $arg->method;
140 }
141
142 my $arg = ...;
143
144 call_me_back_later \&callback, $arg;
145
146 This is clumsy, as the place where behaviour is specified (when the
147 callback is registered) is often far away from the place where behaviour
148 is implemented. It also doesn't use Perl syntax to invoke the code. There
149 is also an abstraction penalty to pay as one has to I<name> the callback,
150 which often is unnecessary and leads to nonsensical or duplicated names.
151
152 In Perl, one can specify behaviour much more directly by using
153 I<closures>. Closures are code blocks that take a reference to the
154 enclosing scope(s) when they are created. This means lexical variables in scope at the time
155 of creating the closure can simply be used inside the closure:
156
157 my $arg = ...;
158
159 call_me_back_later sub { $arg->method };
160
161 Under most circumstances, closures are faster, use less resources and
162 result in much clearer code than the traditional approach. Faster,
163 because parameter passing and storing them in local variables in Perl
164 is relatively slow. Less resources, because closures take references to
165 existing variables without having to create new ones, and clearer code
166 because it is immediately obvious that the second example calls the
167 C<method> method when the callback is invoked.
168
169 Apart from these, the strongest argument for using closures with AnyEvent
170 is that AnyEvent does not allow passing parameters to the callback, so
171 closures are the only way to achieve that in most cases :->
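
To make the closure style concrete, here is a minimal sketch in plain
Perl. C<call_me_back_later> is only a hypothetical stand-in (the text above
uses the name as a placeholder, it is not an AnyEvent function): here it
merely queues the callbacks so they can be invoked later. Each closure
remembers its own C<$name>, so nothing has to be passed to the callback:

```perl
use strict;
use warnings;

# hypothetical stand-in for an event framework: it only queues callbacks
my @pending;
sub call_me_back_later { push @pending, shift }

for my $name (qw(alice bob)) {
   # each closure captures the lexical $name of its own loop iteration
   call_me_back_later sub { "hello, $name" };
}

# "later": invoke the callbacks, without passing any arguments
my @greetings = map { $_->() } @pending;

print "$_\n" for @greetings;
```

The callbacks are invoked without arguments, yet each one still knows
which name it was created for.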
172
173
174 =head2 Condition Variables
175
176 Back to the I/O watcher example: The code is not yet a fully working program,
177 and will not work as-is. The reason is that your callback will not be
178 invoked out of the blue, you have to run the event loop. Also, event-based
179 programs sometimes have to block, too, as when there simply is nothing
180 else to do and everything waits for some events, it needs to block the
181 process as well.
182
183 In AnyEvent, this is done using condition variables. Condition variables
184 are named "condition variables" because they represent a condition that is
185 initially false and needs to be fulfilled.
186
187 You can also call them "merge points", "sync points", "rendezvous ports"
188 or even callbacks and many other things (and they are often called like
189 this in other frameworks). The important point is that you can create them
190 freely and later wait for them to become true.
191
192 Condition variables have two sides - one side is the "producer" of the
193 condition (whatever code detects and flags the condition), the other side
194 is the "consumer" (the code that waits for that condition).
195
196 In our example in the previous section, the producer is the event callback
197 and there is no consumer yet - let's change that right now:
198
199 use AnyEvent;
200
201 $| = 1; print "enter your name> ";
202
203 my $name;
204
205 my $name_ready = AnyEvent->condvar;
206
207 my $wait_for_input = AnyEvent->io (
208 fh => \*STDIN,
209 poll => "r",
210 cb => sub {
211 $name = <STDIN>;
212 $name_ready->send;
213 }
214 );
215
216 # do something else here
217
218 # now wait until the name is available:
219 $name_ready->recv;
220
221 undef $wait_for_input; # watcher no longer needed
222
223 print "your name is $name\n";
224
225 This program creates an AnyEvent condvar by calling the C<<
226 AnyEvent->condvar >> method. It then creates a watcher as usual, but
227 inside the callback it C<send>'s the C<$name_ready> condition variable,
228 which causes anybody waiting on it to continue.
229
230 The "anybody" in this case is the code that follows, which calls C<<
231 $name_ready->recv >>: The producer calls C<send>, the consumer calls
232 C<recv>.
233
234 If there is no C<$name> available yet, then the call to C<<
235 $name_ready->recv >> will halt your program until the condition becomes
236 true.
237
238 As the names C<send> and C<recv> imply, you can actually send and receive
239 data using this, for example, the above code could also be written like
240 this, without an extra variable to store the name in:
241
242 use AnyEvent;
243
244 $| = 1; print "enter your name> ";
245
246 my $name_ready = AnyEvent->condvar;
247
248 my $wait_for_input = AnyEvent->io (
249 fh => \*STDIN, poll => "r",
250 cb => sub { $name_ready->send (scalar <STDIN>) }
251 );
252
253 # do something else here
254
255 # now wait and fetch the name
256 my $name = $name_ready->recv;
257
258 undef $wait_for_input; # watcher no longer needed
259
260 print "your name is $name\n";
261
262 You can pass any number of arguments to C<send>, and every call to
263 C<recv> will return them.
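
As a small sketch of this, the following program sends two values
through a condition variable. The producer is a timer watcher (timers are
covered later in this tutorial); the variable names and the values are made
up for illustration:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# producer: a timer callback that sends two values after a short delay
my $timer = AnyEvent->timer (
   after => 0.1,
   cb    => sub { $cv->send ("root", 42) },
);

# consumer: in list context, recv returns everything passed to send
my ($user, $answer) = $cv->recv;

# the condition stays true, so a second recv returns immediately;
# in scalar context only the first value is returned
my $first = $cv->recv;

print "$user $answer $first\n";
```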
264
265 =head2 The "main loop"
266
267 Most event-based frameworks have something called a "main loop" or "event
268 loop run function" or something similar.
269
270 Just like C<recv> in AnyEvent, these functions need to be called
271 eventually so that your event loop has a chance of actually looking for
272 those events you are interested in.
273
274 For example, in a L<Gtk2> program, the above example could also be written
275 like this:
276
277 use Gtk2 -init;
278 use AnyEvent;
279
280 ############################################
281 # create a window and some label
282
283 my $window = new Gtk2::Window "toplevel";
284 $window->add (my $label = new Gtk2::Label "soon replaced by name");
285
286 $window->show_all;
287
288 ############################################
289 # do our AnyEvent stuff
290
291 $| = 1; print "enter your name> ";
292
295 my $wait_for_input = AnyEvent->io (
296 fh => \*STDIN, poll => "r",
297 cb => sub {
298 # set the label
299 $label->set_text (scalar <STDIN>);
300 print "enter another name> ";
301 }
302 );
303
304 ############################################
305 # Now enter Gtk2's event loop
306
307 main Gtk2;
308
309 No condition variable anywhere in sight - instead, we just read a line
310 from STDIN and replace the text in the label. In fact, since nobody
311 C<undef>'s C<$wait_for_input> you can enter multiple lines.
312
313 Instead of waiting for a condition variable, the program enters the Gtk2
314 main loop by calling C<< Gtk2->main >>, which will block the program and
315 wait for events to arrive.
316
317 This also shows that AnyEvent is quite flexible - you didn't have anything
318 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
319 worked.
320
321 Admittedly, the example is a bit silly - who would want to read names
322 from standard input in a Gtk+ application. But imagine that instead of
323 doing that, you would make an HTTP request in the background and display
324 its results. In fact, with event-based programming you can make many
325 HTTP requests in parallel in your program and still provide feedback to
326 the user and stay interactive.
327
328 In the next part you will see how to do just that - by implementing an
329 HTTP request, on our own, with the utility modules AnyEvent comes with.
330
331 Before that, however, let's briefly look at how you would write your
332 program using only AnyEvent, without ever calling some other event
333 loop's run function.
334
335 In the example using condition variables, we waited on a condition
336 variable with C<recv> for that, and in fact, this is the solution:
337
338 my $quit_program = AnyEvent->condvar;
339
340 # create AnyEvent watchers (or not) here
341
342 $quit_program->recv;
343
344 If any of your watcher callbacks decide to quit, they can simply call
345 C<< $quit_program->send >>. Of course, they could also decide not to and
346 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
347 in a long-running daemon program).
348
349 In that case, you can simply use:
350
351 AnyEvent->condvar->recv;
352
353 And this is, in fact, closest to the idea of a main loop run function that
354 AnyEvent offers.
355
356 =head2 Timers and other event sources
357
358 So far, we have only used I/O watchers. These are useful mainly to find
359 out whether a socket has data to read, or space to write more data. On sane
360 operating systems this also works for console windows/terminals (typically
361 on standard input), serial lines, all sorts of other devices, basically
362 almost everything that has a file descriptor but isn't a file itself. (As
363 usual, "sane" excludes Windows - on that platform you would need different
364 functions for all of these, complicating code immensely - think "socket
365 only" on Windows).
366
367 However, I/O is not everything - the second most important event source is
368 the clock. For example when doing an HTTP request you might want to time
369 out when the server doesn't answer within some predefined amount of time.
370
371 In AnyEvent, timer event watchers are created by calling the C<<
372 AnyEvent->timer >> method:
373
374 use AnyEvent;
375
376 my $cv = AnyEvent->condvar;
377
378 my $wait_one_and_a_half_seconds = AnyEvent->timer (
379 after => 1.5, # after how many seconds to invoke the cb?
380 cb => sub { # the callback to invoke
381 $cv->send;
382 },
383 );
384
385 # can do something else here
386
387 # now wait till our time has come
388 $cv->recv;
389
390 Unlike I/O watchers, timers are only interested in the number of seconds
391 they have to wait. When that amount of time has passed, AnyEvent will
392 invoke your callback.
393
394 Unlike I/O watchers, which will call your callback as many times as there
395 is data available, timers are one-shot: after they have "fired" once and
396 invoked your callback, they are dead and no longer do anything.
397
398 To get a repeating timer, such as a timer firing roughly once per second,
399 you have to recreate it:
400
401 use AnyEvent;
402
403 my $time_watcher;
404
405 sub once_per_second {
406 print "tick\n";
407
408 # (re-)create the watcher
409 $time_watcher = AnyEvent->timer (
410 after => 1,
411 cb => \&once_per_second,
412 );
413 }
414
415 # now start the timer
416 once_per_second;
417
418 Having to recreate your timer is a restriction put on AnyEvent that is
419 present in most event libraries it uses. It is so annoying that some
420 future version might work around this limitation, but right now, it's the
421 only way to do repeating timers.
422
423 Fortunately most timers aren't really repeating but specify timeouts of
424 some sort.
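
A timeout can be sketched by letting an I/O watcher and a timer watcher
race for the same condition variable. In this illustration, a pipe that
nobody ever writes to stands in for a server that never answers, and the
half-second limit is an arbitrary choice:

```perl
use strict;
use warnings;
use AnyEvent;

# a pipe nobody ever writes to stands in for a server that never answers
pipe my $reader, my $writer or die "pipe: $!";

my $cv = AnyEvent->condvar;

my $read_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      sysread $reader, my $data, 1024;
      $cv->send ($data);
   },
);

my $timeout_watcher = AnyEvent->timer (
   after => 0.5,               # give up after half a second
   cb    => sub { $cv->send }, # no arguments: signals "no answer"
);

my $answer = $cv->recv;

# whichever watcher fired first decided the outcome; drop both
undef $read_watcher;
undef $timeout_watcher;

print defined $answer ? "got: $answer" : "timed out", "\n";
```

Since no data ever arrives, the timer wins and C<recv> returns nothing.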
425
426 =head3 More esoteric sources
427
428 AnyEvent also has some other, more esoteric event sources you can tap
429 into: signal and child watchers.
430
431 Signal watchers can be used to wait for "signal events", which simply
432 means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
433
434 Child watchers wait for a child process to exit. They are useful when
435 you fork a separate process and need to know when it exits, but you do not
436 wait for that by blocking.
437
438 Both watcher types are described in detail in the main L<AnyEvent> manual
439 page.
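
For orientation, here is a short sketch of both watcher types, assuming a
POSIX system. Sending the signal to ourselves and the exit code C<3> are
only for demonstration:

```perl
use strict;
use warnings;
use AnyEvent;
use POSIX ();

# signal watcher: wait until this process receives SIGUSR1
my $got_signal = AnyEvent->condvar;

my $signal_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

kill USR1 => $$;   # send the signal to ourselves, for demonstration only
my $signal_name = $got_signal->recv;

# child watcher: learn when a forked child exits, and with which status
my $child_done = AnyEvent->condvar;

my $pid = fork;
defined $pid or die "fork: $!";
POSIX::_exit (3) unless $pid;   # the child does nothing but exit with code 3

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      $child_done->send ($status >> 8);   # the exit code is in the high byte
   },
);

my $exit_code = $child_done->recv;

print "signal: $signal_name, child exit code: $exit_code\n";
```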
440
441
442 =head1 Network programming and AnyEvent
443
444 So far you have seen how to register event watchers and handle events.
445
446 This is a great foundation to write network clients and servers, and might be
447 all that your module (or program) ever requires, but writing your own I/O
448 buffering again and again becomes tedious, not to mention that it attracts
449 errors.
450
451 While the core L<AnyEvent> module is still small and self-contained,
452 the distribution comes with some very useful utility modules such as
453 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
454 make your life as non-blocking network programmer a lot easier.
455
456 Here is a quick overview of these three modules:
457
458 =head2 L<AnyEvent::DNS>
459
460 This module allows fully asynchronous DNS resolution. It is used mainly by
461 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
462 a great way to do other DNS resolution tasks, such as reverse lookups of
463 IP addresses for log files.
464
465 =head2 L<AnyEvent::Handle>
466
467 This module handles non-blocking IO on file handles in an event based
468 manner. It provides a wrapper object around your file handle that provides
469 queueing and buffering of incoming and outgoing data for you.
470
471 It also implements the most common data formats, such as text lines, or
472 fixed and variable-width data blocks.
473
474 =head2 L<AnyEvent::Socket>
475
476 This module provides you with functions that handle socket creation
477 and IP address magic. The two main functions are C<tcp_connect> and
478 C<tcp_server>. The former will connect a (streaming) socket to an internet
479 host for you and the latter will make a server socket for you, to accept
480 connections.
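
The following sketch shows both functions talking to each other over the
loopback interface. It assumes an AnyEvent::Socket version in which a
service of C<0> lets the operating system pick a free port and in which
C<tcp_server> accepts an optional "prepare" callback that is told the port;
the greeting text is made up:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;

# server side: listen on the loopback address, let the OS pick a port
my $server_guard = tcp_server "127.0.0.1", 0, sub {
   my ($fh, $peer_host, $peer_port) = @_;

   syswrite $fh, "hello from the server\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   my ($listen_fh, $host, $bound_port) = @_;

   $port = $bound_port;   # remember which port we got
   0                      # use the default listen queue length
};

# client side: connect, then read the greeting
tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $done->send;   # connect failed

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         sysread $fh, my $greeting, 1024;
         undef $read_watcher;
         $done->send ($greeting);
      },
   );
};

my $greeting = $done->recv;
print defined $greeting ? $greeting : "connect failed\n";
```

The server keeps running for as long as C<$server_guard> is alive.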
481
482 This module also comes with transparent IPv6 support, this means: If you
483 write your programs with this module, you will be IPv6 ready without doing
484 anything special.
485
486 It also works around a lot of portability quirks (especially on the
487 Windows platform), which makes it even easier to write your programs in a
488 portable way (did you know that Windows uses different error codes for all
489 socket functions and that Perl does not know about these? That "Unknown
490 error 10022" (which is C<WSAEINVAL>) can mean that your C<connect> call was
491 successful? That unsuccessful TCP connects might never be reported back
492 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
493 ignored instead of being in progress? AnyEvent::Socket works around all of
494 these Windows/Perl bugs for you).
495
496 =head2 Implementing a parallel finger client with non-blocking connects
497 and AnyEvent::Socket
498
499 The finger protocol is one of the simplest protocols in use on the
500 internet. Or in use in the past, as almost nobody uses it anymore.
501
502 It works by connecting to the finger port on another host, writing a
503 single line with a user name and then reading the finger response, as
504 specified by that user. OK, RFC 1288 specifies a vastly more complex
505 protocol, but it basically boils down to this:
506
507 # telnet idsoftware.com finger
508 Trying 192.246.40.37...
509 Connected to idsoftware.com (192.246.40.37).
510 Escape character is '^]'.
511 johnc
512 Welcome to id Software's Finger Service V1.5!
513
514 [...]
515 Now on the web:
516 [...]
517
518 Connection closed by foreign host.
519
520 "Now on the web..." yeah, I<was> used indeed, but at least the finger
521 daemon still works, so let's write a little AnyEvent function that makes a
522 finger request:
523
524 use AnyEvent;
525 use AnyEvent::Socket;
526
527 sub finger($$) {
528 my ($user, $host) = @_;
529
530 # use a condvar to return results
531 my $cv = AnyEvent->condvar;
532
533 # first, connect to the host
534 tcp_connect $host, "finger", sub {
535 # the callback receives the socket handle - or nothing
536 my ($fh) = @_
537 or return $cv->send;
538
539 # now write the username
540 syswrite $fh, "$user\015\012";
541
542 my $response;
543
544 # register a read watcher
545 my $read_watcher; $read_watcher = AnyEvent->io (
546 fh => $fh,
547 poll => "r",
548 cb => sub {
549 my $len = sysread $fh, $response, 1024, length $response;
550
551 if ($len <= 0) {
552 # we are done, or an error occurred, let's ignore the latter
553 undef $read_watcher; # no longer interested
554 $cv->send ($response); # send results
555 }
556 },
557 );
558 };
559
560 # pass $cv to the caller
561 $cv
562 }
563
564 That's a mouthful! Let's dissect this function a bit, first the overall
565 function and execution flow:
566
567 sub finger($$) {
568 my ($user, $host) = @_;
569
570 # use a condvar to return results
571 my $cv = AnyEvent->condvar;
572
573 # first, connect to the host
574 tcp_connect $host, "finger", sub {
575 ...
576 };
577
578 $cv
579 }
580
581 This isn't too complicated, just a function with two parameters, that
582 creates a condition variable, returns it, and while it does that,
583 initiates a TCP connect to C<$host>. The condition variable will be used
584 by the caller to receive the finger response, but one could equally well
585 pass a third argument, a callback, to the function.
586
587 Since we are programming event'ish, we do not wait for the connect to
588 finish - it could block the program for a minute or longer!
589
590 Instead, we pass the callback it should invoke when the connect is done to
591 C<tcp_connect>. If it is successful, that callback gets called with the
592 socket handle as first argument, otherwise, nothing will be passed to our
593 callback. The important point is that it will always be called as soon as
594 the outcome of the TCP connect is known.
595
596 This style of programming is also called "continuation style": the
597 "continuation" is simply the way the program continues - normally, a
598 program continues at the next line after some statement (the exception
599 is loops or things like C<return>). When we are interested in events,
600 however, we instead specify the "continuation" of our program by passing a
601 closure, which makes that closure the "continuation" of the program. The
602 C<tcp_connect> call is like saying "return now, and when the connection is
603 established or it failed, continue there".
604
605 Now let's look at the callback/closure in more detail:
606
607 # the callback receives the socket handle - or nothing
608 my ($fh) = @_
609 or return $cv->send;
610
611 The first thing the callback does is indeed save the socket handle in
612 C<$fh>. When there was an error (no arguments), then our instinct as
613 expert Perl programmers would tell us to C<die>:
614
615 my ($fh) = @_
616 or die "$host: $!";
617
618 While this would give good feedback to the user (if he happens to watch
619 standard error), our program would probably stop working here, as we never
620 report the results to anybody, certainly not the caller of our C<finger>
621 function, and most event loops continue even after a C<die>!
622
623 This is why we instead C<return>, but also call C<< $cv->send >> without
624 any arguments to signal to the condvar consumer that something bad has
625 happened. The return value of C<< $cv->send >> is irrelevant, as is the
626 return value of our callback. The return statement is simply used for the
627 side effect of, well, returning immediately from the callback. Checking
628 for errors and handling them this way is very common, which is why this
629 compact idiom is so handy.
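
The idiom works because a list assignment, when used as a condition,
yields the number of elements on its right-hand side. A plain-Perl sketch
(C<on_connect> and its return strings are hypothetical, chosen only to make
the three cases visible):

```perl
use strict;
use warnings;

sub on_connect {
   # the assignment is false only if @_ is empty
   my ($fh) = @_
      or return "connect failed";

   return defined $fh ? "got handle $fh" : "got an undefined handle";
}

my @outcomes = (
   on_connect (),         # no arguments: the error case
   on_connect ("SOCK"),   # one argument: the normal case
   on_connect (undef),    # one (undefined) argument: NOT the error case
);

print "$_\n" for @outcomes;
```

Note the third case: the test counts arguments, it does not look at their
values.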
630
631 As the next step in the finger protocol, we send the username to the
632 finger daemon on the other side of our connection:
633
634 syswrite $fh, "$user\015\012";
635
636 Note that this isn't 100% clean socket programming - the socket could,
637 for whatever reasons, not accept our data. When writing a small amount
638 of data like in this example it doesn't matter, as a socket buffer is
639 almost always big enough for a mere "username", but for real-world
640 cases you might need to implement some kind of write buffering - or use
641 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
642 next section.
643
644 What we I<do> have to do is to implement our own read buffer - the response
645 data could arrive late or in multiple chunks, and we cannot just wait for
646 it (event-based programming, you know?).
647
648 To do that, we register a read watcher on the socket which waits for data:
649
650 my $read_watcher; $read_watcher = AnyEvent->io (
651 fh => $fh,
652 poll => "r",
653
654 There is a trick here, however: the read watcher isn't stored in a global
655 variable, but in a local one - if the callback returns, it would normally
656 destroy the variable and its contents, which would in turn unregister our
657 watcher.
658
659 To avoid that, we C<undef>ine the variable in the watcher callback. This
660 means that, when the C<tcp_connect> callback returns, perl thinks
661 (quite correctly) that the read watcher is still in use - namely in the
662 callback.
663
664 The trick, however, is that instead of:
665
666 my $read_watcher = AnyEvent->io (...
667
668 The program does:
669
670 my $read_watcher; $read_watcher = AnyEvent->io (...
671
672 The reason for this is a quirk in the way Perl works: variable names
673 declared with C<my> are only visible in the I<next> statement. If the
674 whole C<< AnyEvent->io >> call, including the callback, would be done in
675 a single statement, the callback could not refer to the C<$read_watcher>
676 variable to undefine it, so it is done in two statements.
677
678 Whether you'd want to format it like this is of course a matter of style;
679 this way emphasizes that the declaration and assignment really are one
680 logical statement.
681
682 The callback itself calls C<sysread> for as many times as necessary, until
683 C<sysread> returns either an error or end-of-file:
684
685 cb => sub {
686 my $len = sysread $fh, $response, 1024, length $response;
687
688 if ($len <= 0) {
689
690 Note that C<sysread> has the ability to append data it reads to a scalar,
691 by specifying an offset, which is what we make good use of in this
692 example.
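
The appending behaviour can be tried in isolation. In this sketch a pipe
with a few bytes in it stands in for the socket, the chunk size of 5 is
deliberately tiny, and the reads are done in a plain blocking loop rather
than in a watcher callback, for brevity:

```perl
use strict;
use warnings;

# a pipe with a few bytes in it stands in for the socket
pipe my $reader, my $writer or die "pipe: $!";
syswrite $writer, "hello, world";
close $writer;   # so the reader eventually sees end-of-file

my $response = "";   # start with an empty (but defined) buffer
my $chunks   = 0;

while (1) {
   # read at most 5 bytes and store them at the end of $response
   my $len = sysread $reader, $response, 5, length $response;

   defined $len or die "read error: $!";
   last unless $len;   # 0 means end-of-file

   $chunks++;
}

print "$response ($chunks chunks)\n";
```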
693
694 When C<sysread> indicates we are done, the callback C<undef>ines
695 the watcher and then C<send>'s the response data to the condition
696 variable. All this has the following effects:
697
698 Undefining the watcher destroys it, as our callback was the only one still
699 having a reference to it. When the watcher gets destroyed, it destroys the
700 callback, which in turn means the C<$fh> handle is no longer used, so that
701 gets destroyed as well. The result is that all resources will be nicely
702 cleaned up by perl for us.
703
704 =head3 Using the finger client
705
706 Now, we could probably write the same finger client in a simpler way if
707 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
708 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
709
710 But the main advantage is that we can not only run this finger function in
711 the background, we even can run multiple sessions in parallel, like this:
712
713 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
714 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
715 my $f3 = finger "johnc", "idsoftware.com"; # finger john
716
717 print "trouble tickets:\n", $f1->recv, "\n";
718 print "trouble ticket #1736:\n", $f2->recv, "\n";
719 print "john carmack's finger file: ", $f3->recv, "\n";
720
721 It doesn't look like it, but in fact all three requests run in
722 parallel. The code waits for the first finger request to finish first, but
723 that doesn't keep it from executing them in parallel: when the first C<recv>
724 call sees that the data isn't ready yet, it serves events for all three
725 requests automatically, until the first request has finished.
726
727 The second C<recv> call might either find the data is already there, or it
728 will continue handling events until that is the case, and so on.
729
730 By taking advantage of network latencies, which allows us to serve other
731 requests and events while we wait for an event on one socket, the overall
732 time to do these three requests will be greatly reduced, typically all
733 three are done in the same time as the slowest of them would need to finish.
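
The effect can be observed without any network. In this sketch,
C<slow_request> is a hypothetical stand-in for C<finger> that simply
"answers" after a given delay; the delays are arbitrary:

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes ();

# hypothetical stand-in for finger: "answers" with $value after $delay seconds
sub slow_request {
   my ($delay, $value) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => $delay,
      cb    => sub {
         undef $timer;
         $cv->send ($value);
      },
   );

   $cv
}

my $start = Time::HiRes::time ();

# all three "requests" are started before we wait for any of them
my $r1 = slow_request 0.3, "first";
my $r2 = slow_request 0.2, "second";
my $r3 = slow_request 0.1, "third";

my @results = ($r1->recv, $r2->recv, $r3->recv);
my $elapsed = Time::HiRes::time () - $start;

printf "%s, took %.1fs\n", (join " ", @results), $elapsed;
```

The elapsed time should be close to the longest delay, not to the sum of
all three.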
734
735 By the way, you do not actually have to wait in the C<recv> method on an
736 AnyEvent condition variable - after all, waiting is evil - you can also
737 register a callback:
738
739 $cv->cb (sub {
740 my $response = shift->recv;
741 # ...
742 });
743
744 The callback will only be invoked when C<send> was called. In fact,
745 instead of returning a condition variable you could also pass a third
746 parameter to your finger function, the callback to invoke with the
747 response:
748
749 sub finger($$$) {
750 my ($user, $host, $cb) = @_;
751
752 How you implement it is a matter of taste - if you expect your function to
753 be used mainly in an event-based program you would normally prefer to pass
754 a callback directly. If you write a module and expect your users to use
755 it "synchronously" often (for example, a simple http-get script would not
756 really care much for events), then you would use a condition variable and
757 tell them "simply ->recv the data".
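
For comparison, here is one way the callback variant might look in full.
It is a sketch only: the name C<finger_cb> is made up to keep it apart from
the condvar version, and calling the callback without arguments on failure
is one possible convention:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

sub finger_cb($$$) {
   my ($user, $host, $cb) = @_;

   tcp_connect $host, "finger", sub {
      my ($fh) = @_
         or return $cb->();   # connect failed: call back without arguments

      syswrite $fh, "$user\015\012";

      my $response = "";

      my $read_watcher; $read_watcher = AnyEvent->io (
         fh   => $fh,
         poll => "r",
         cb   => sub {
            my $len = sysread $fh, $response, 1024, length $response;

            unless ($len) {   # end-of-file (0) or error (undef)
               undef $read_watcher;
               $cb->($response);
            }
         },
      );
   };
}

# usage: the callback receives the response, or nothing on connect failure
# finger_cb "johnc", "idsoftware.com", sub { print @_ ? $_[0] : "failed\n" };
```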
758
759 =head3 Problems with the implementation and how to fix them
760
761 To make this example more real-world-ready, we would not only implement
762 some write buffering (for the paranoid), but we would also have to handle
763 timeouts and maybe protocol errors.
764
765 Doing this quickly gets unwieldy, which is why we introduce
766 L<AnyEvent::Handle> in the next section, which takes care of all these
767 details for you and lets you concentrate on the actual protocol.
768
769
770 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
771
772 The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
773 see what it really offers.
774
775 As finger is such a simple protocol, let's try something slightly more
776 complicated: HTTP/1.0.
777
778 An HTTP GET request works by sending a single request line that indicates
779 what you want the server to do and the URI you want to act on, followed
780 by as many "header" lines (C<Header: data>, same as e-mail headers) as
781 required for the request, ended by an empty line.
782
783 The response is formatted very similarly, first a line with the response
784 status, then again as many header lines as required, then an empty line,
785 followed by any data that the server might send.
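
The format is simple enough to build and take apart with a few lines of
plain Perl. This sketch uses a canned response instead of a network
connection, and the C<User-Agent> header is a made-up example:

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# request: request line, header lines, then an empty line
my $request = join $CRLF,
   "GET /test HTTP/1.0",
   "User-Agent: example/0.1",   # hypothetical header, for illustration only
   "", "";

# a canned response, shaped like the one a server would send
my $raw = join $CRLF,
   "HTTP/1.0 404 Not Found",
   "Content-Type: text/html; charset=UTF-8",
   "",
   "<html><head>";

# the first empty line separates status line and headers from the body
my ($head, $body) = split /\015\012\015\012/, $raw, 2;
my ($status_line, @header_lines) = split /\015\012/, $head;

my ($protocol, $code, $reason) = split / /, $status_line, 3;
my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @header_lines;

print "$code $reason, type $header{'content-type'}\n";
```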
786
787 Again, let's try it out with C<telnet> (I condensed the output a bit - if
788 you want to see the full response, do it yourself).
789
   # telnet www.google.com 80
   Trying 209.85.135.99...
   Connected to www.google.com (209.85.135.99).
   Escape character is '^]'.
   GET /test HTTP/1.0

   HTTP/1.0 404 Not Found
   Date: Mon, 02 Jun 2008 07:05:54 GMT
   Content-Type: text/html; charset=UTF-8

   <html><head>
   [...]
   Connection closed by foreign host.
803
The C<GET ...> and the empty line were entered manually, the rest of the
telnet output is google's response, in this case a C<404 Not Found> one.
806
807 So, here is how you would do it with C<AnyEvent::Handle>:
808
   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         # on connection failure, report the error and stop right here
         my ($fh) = @_
            or return $cb->("HTTP/1.0 500 $!");

         # store results here
         my ($response, $header, $body);

         my $handle; $handle = new AnyEvent::Handle
            fh       => $fh,
            on_error => sub {
               undef $handle;
               $cb->("HTTP/1.0 500 $!");
            },
            on_eof   => sub {
               undef $handle; # keep it alive till eof
               $cb->($response, $header, $body);
            };

         $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

         # now fetch response status line
         $handle->push_read (line => sub {
            my ($handle, $line) = @_;
            $response = $line;
         });

         # then the headers
         $handle->push_read (line => "\015\012\015\012", sub {
            my ($handle, $line) = @_;
            $header = $line;
         });

         # and finally handle any remaining data as body
         $handle->on_read (sub {
            $body .= $_[0]->rbuf;
            $_[0]->rbuf = "";
         });
      };
   }
851
852 And now let's go through it step by step. First, as usual, the overall
853 C<http_get> function structure:
854
   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         ...
      };
   }
862
863 Unlike in the finger example, this time the caller has to pass a callback
864 to C<http_get>. Also, instead of passing a URL as one would expect, the
865 caller has to provide the hostname and URI - normally you would use the
866 C<URI> module to parse a URL and separate it into those parts, but that is
867 left to the inspired reader :)
868
Since everything else is left to the caller, all C<http_get> does is to
initiate the connection with C<tcp_connect> and leave everything else to
its callback.
872
873 The first thing the callback does is check for connection errors and
874 declare some variables:
875
   # on connection failure, report the error and stop right here
   my ($fh) = @_
      or return $cb->("HTTP/1.0 500 $!");

   my ($response, $header, $body);
880
881 Instead of having an extra mechanism to signal errors, connection errors
882 are signalled by crafting a special "response status line", like this:
883
   HTTP/1.0 500 Connection refused
885
886 This means the caller cannot distinguish (easily) between
887 locally-generated errors and server errors, but it simplifies error
888 handling for the caller a lot.
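
On the caller's side, this means a single check of the status line covers
both kinds of failure. A minimal sketch, assuming a hypothetical helper
called C<parse_status> (not part of AnyEvent or of the code above):

```perl
use strict;
use warnings;

# Split a status line into protocol, numeric code and message.
# Returns an empty list if the line does not look like a status line.
sub parse_status {
   my ($line) = @_;

   $line =~ m{^(HTTP/\d+\.\d+) \s+ (\d{3}) \s* (.*)$}x
      or return;

   ($1, $2, $3)
}

for my $line ("HTTP/1.0 200 OK", "HTTP/1.0 500 Connection refused") {
   my ($proto, $code, $message) = parse_status ($line)
      or next;

   # 2xx means success, everything else is treated as a failure here
   printf "%s: %s\n", ($code =~ /^2/ ? "success" : "failure"), $message;
}
```

The second line in the loop is the locally-generated error from above - to
the caller it looks exactly like a C<500> sent by a real server.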
889
890 The next step finally involves L<AnyEvent::Handle>, namely it creates the
891 handle object:
892
   my $handle; $handle = new AnyEvent::Handle
      fh       => $fh,
      on_error => sub {
         undef $handle;
         $cb->("HTTP/1.0 500 $!");
      },
      on_eof   => sub {
         undef $handle; # keep it alive till eof
         $cb->($response, $header, $body);
      };
903
904 The constructor expects a file handle, which gets passed via the C<fh>
905 argument.
906
907 The remaining two argument pairs specify two callbacks to be called on
908 any errors (C<on_error>) and in the case of a normal connection close
909 (C<on_eof>).
910
In the first case, we C<undef>ine the handle object and pass the error to
the callback provided by the caller - done.
913
914 In the second case we assume everything went fine and pass the results
915 gobbled up so far to the caller-provided callback. This is not quite
916 perfect, as when the server "cleanly" closes the connection in the middle
917 of sending headers we might wrongly report this as an "OK" to the caller,
918 but then, HTTP doesn't support a perfect mechanism that would detect such
919 problems in all cases, so we don't bother either.
920
921 =head3 The write queue
922
923 The next line sends the actual request:
924
   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
926
927 No headers will be sent (this is fine for simple requests), so the whole
928 request is just a single line followed by an empty line to signal the end
929 of the headers to the server.
930
931 The more interesting question is why the method is called C<push_write>
932 and not just write. The reason is that you can I<always> add some write
933 data without blocking, and to do this, AnyEvent::Handle needs some write
934 queue internally - and C<push_write> simply pushes some data at the end of
935 that queue, just like Perl's C<push> pushes data at the end of an array.
936
937 The deeper reason is that at some point in the future, there might
938 be C<unshift_write> as well, and in any case, we will shortly meet
939 C<push_read> and C<unshift_read>, and it's usually easiest if all those
940 functions have some symmetry in their name.
941
942 If C<push_write> is called with more than one argument, then you can even
943 do I<formatted> I/O, which simply means your data will be transformed in
944 some ways. For example, this would JSON-encode your data before pushing it
945 to the write queue:
946
   $handle->push_write (json => [1, 2, 3]);
948
949 Apart from that, this pretty much summarises the write queue, there is
950 little else to it.
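
To illustrate why a queue makes writes non-blocking, here is a toy model in
plain Perl. It is I<not> how AnyEvent::Handle is implemented, and the names
C<toy_push_write> and C<toy_drain> are invented for this sketch: pushing
only appends to an array, and data leaves the queue whenever the
(simulated) socket has room for some of it.

```perl
use strict;
use warnings;

my @write_queue;   # pending chunks, oldest first

# Queue data; this never blocks, no matter how much is already pending.
sub toy_push_write {
   push @write_queue, @_;
}

# Called whenever the "socket" can take up to $room more bytes;
# returns the bytes that would actually be written.
sub toy_drain {
   my ($room) = @_;
   my $out = "";

   while (@write_queue && $room > 0) {
      # take as much as fits off the front of the oldest chunk
      my $chunk = substr ($write_queue[0], 0, $room, "");
      $room -= length $chunk;
      $out  .= $chunk;

      # drop the chunk once it has been written completely
      shift @write_queue unless length $write_queue[0];
   }

   $out
}

toy_push_write ("GET / HTTP/1.0\015\012");
toy_push_write ("\015\012");

my $first  = toy_drain (10);    # a slow socket only takes 10 bytes now
my $second = toy_drain (100);   # and the rest later
```

The caller of C<toy_push_write> never has to care how fast the other side
reads - that is the job of whoever calls C<toy_drain>, which in the real
module is the event loop.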
951
Reading the response is far more interesting:
953
954 =head3 The read queue
955
The response consists of three parts: a single line of response status, a
single paragraph of headers ended by an empty line, and the response body,
which is simply the remaining data on that connection.
959
960 For the first two, we push two read requests onto the read queue:
961
   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });
973
While one can simply push a single callback to the queue, I<formatted> I/O
really comes to our advantage here, as there is a ready-made "read line"
read type. The first read expects a single line, ended by C<\015\012> (the
standard end-of-line marker in internet protocols).
978
979 The second "line" is actually a single paragraph - instead of reading it
980 line by line we tell C<push_read> that the end-of-line marker is really
981 C<\015\012\015\012>, which is an empty line. The result is that the whole
982 header paragraph will be treated as a single line and read. The word
983 "line" is interpreted very freely, much like Perl itself does it.
984
Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.
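
The following toy model in plain Perl shows the idea of such a queue of
"line" reads with different end-of-line markers. It is only a sketch to
illustrate the ordering, not AnyEvent::Handle's implementation, and
C<toy_push_read_line> and C<toy_feed> are names invented for it:

```perl
use strict;
use warnings;

my $rbuf = "";     # data received but not yet consumed
my @read_queue;    # pending [end-of-line marker, callback] pairs
my @seen;          # what the callbacks received, in order

sub toy_push_read_line {
   my ($eol, $cb) = @_;
   push @read_queue, [$eol, $cb];
}

# Append newly arrived data, then satisfy as many queued reads as possible.
sub toy_feed {
   $rbuf .= $_[0];

   while (@read_queue) {
      my ($eol, $cb) = @{ $read_queue[0] };

      my $pos = index $rbuf, $eol;
      last if $pos < 0;   # "line" not complete yet, wait for more data

      my $line = substr ($rbuf, 0, $pos);          # the "line" without marker
      substr ($rbuf, 0, $pos + length ($eol), ""); # remove line and marker
      shift @read_queue;
      $cb->($line);
   }
}

toy_push_read_line ("\015\012",         sub { push @seen, "status: $_[0]" });
toy_push_read_line ("\015\012\015\012", sub { push @seen, "header: $_[0]" });

# data arrives in arbitrary pieces
toy_feed ("HTTP/1.0 200 O");
toy_feed ("K\015\012Content-Type: text/plain\015\012");
toy_feed ("\015\012hello");

print "$_\n" for @seen, "left over: $rbuf";
```

Both reads were queued before any data arrived, and each one fires as soon
as its own marker shows up in the buffer, regardless of how the data was
split up on the way.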
988
989 There is, however, no read type for "the remaining data". For that, we
990 install our own C<on_read> callback:
991
   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });
997
998 This callback is invoked every time data arrives and the read queue is
999 empty - which in this example will only be the case when both response and
1000 header have been read. The C<on_read> callback could actually have been
1001 specified when constructing the object, but doing it this way preserves
1002 logical ordering.
1003
The read callback simply adds the current read buffer to the C<$body>
variable and, most importantly, I<empties> the buffer by assigning the
empty string to it.
1007
1008 After AnyEvent::Handle has been so instructed, it will now handle incoming
1009 data according to these instructions - if all goes well, the callback will
1010 be invoked with the response data, if not, it will get an error.
1011
In general, you get pipelining very easily with AnyEvent::Handle: If
you have a protocol with a request/response structure, your request
methods/functions will all look like this (simplified):
1015
   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }
1024
1025 =head3 Using it
1026
1027 And here is how you would use it:
1028
   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };
1036
1037 And of course, you can run as many of these requests in parallel as you
1038 want (and your memory supports).
1039
1040 =head3 HTTPS
1041
1042 Now, as promised, let's implement the same thing for HTTPS, or more
1043 correctly, let's change our C<http_get> function into a function that
1044 speaks HTTPS instead.
1045
1046 HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1047 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1048 that contains standard HTTP protocol exchanges. The other difference to
1049 HTTP is that it uses port C<443> instead of port C<80>.
1050
To implement these two differences we need two tiny changes. First, in
the C<tcp_connect> call we replace C<http> by C<https>:
1053
   tcp_connect $host, "https", sub { ...
1055
The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1060
   tls => "connect",
1062
1063 Specifying C<tls> enables TLS, and the argument specifies whether
1064 AnyEvent::Handle is the server side ("accept") or the client side
1065 ("connect") for the TLS connection, as unlike TCP, there is a clear
1066 server/client relationship in TLS.
1067
1068 That's all.
1069
1070 Of course, all this should be handled transparently by C<http_get> after
1071 parsing the URL. See the part about exercising your inspiration earlier in
1072 this document.
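
As a starting point for that exercise, here is a minimal sketch that splits
a URL with a regular expression instead of the C<URI> module. The function
name C<split_url> and the keys it returns are assumptions made for this
example, and the regex deliberately handles only simple C<http> and
C<https> URLs:

```perl
use strict;
use warnings;

# Split an http/https URL into the pieces http_get would need.
# Returns an empty list for anything it does not understand.
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $port, $path) =
      $url =~ m{^ (https?) :// ([^/:?\#]+) (?: : (\d+) )? ( / [^\#]* )? (?: \# .* )? $}xi
      or return;

   $scheme = lc $scheme;

   (
      host    => $host,
      service => $port || $scheme,   # explicit port, else the service name
      uri     => defined $path ? $path : "/",
      tls     => $scheme eq "https" ? "connect" : undef,
   )
}

my %plain  = split_url ("http://www.google.com/");
my %secure = split_url ("https://www.example.com:8443/login?x=1");

print "$secure{host} $secure{service} $secure{uri}\n";
```

The C<service> value could then be passed to C<tcp_connect> in place of
the hardcoded C<"http">, and the C<tls> value, when defined, to the handle
constructor.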
1073
1074 =head3 The read queue - revisited
1075
HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.
1078
1079 For example, in SMTP, you normally get a single response line:
1080
   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1082
1083 But SMTP also supports multi-line responses:
1084
   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours
1088
To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.
1092
This is useful in this situation: You push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader,
like this:
1097
   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);
1121
1122 This recipe can be used for all similar parsing problems, for example in
1123 NNTP, the response code to some commands indicates that more data will be
1124 sent:
1125
   # $finish is the caller's completion callback (not shown here)
   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body);
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });
1148
1149 =head3 Your own read queue handler
1150
Sometimes your protocol doesn't play nice and uses neither lines nor
simple chunks of data, in which case you have to implement your own read
parser.
1153
To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.
1158
1159 To implement this, you would C<push_read> (or C<unshift_read>) just a
1160 single code reference.
1161
1162 This code reference will then be called each time there is (new) data
1163 available in the read buffer, and is expected to either eat/consume some
1164 of that data (and return true) or to return false to indicate that it
1165 wants to be called again.
1166
1167 If the code reference returns true, then it will be removed from the read
1168 queue, otherwise it stays in front of it.
1169
1170 The example above could be coded like this:
1171
   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });
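
To see the true/false protocol at work without a network connection, the
same parser can be run against a plain string that grows piece by piece.
This is only a simulation: C<$rbuf> stands in for C<< $handle->{rbuf} >>,
and the loop plays the role that AnyEvent::Handle plays when new data
arrives.

```perl
use strict;
use warnings;

my $rbuf = "";   # stands in for $handle->{rbuf}
my @found;       # what the parser extracted

# The parser from above, working on a plain string instead of a handle.
my $parser = sub {
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;      # no match yet: stay in the queue

   push @found, $1;

   1                  # done: remove from the queue
};

# Feed data in pieces and call the parser after each piece until it is done.
for my $piece ("ab", "cd", "ef:rest") {
   $rbuf .= $piece;
   last if $parser->();
}

print "found: @found, left in buffer: $rbuf\n";
```

The first two calls return false because no colon has arrived yet, the
third one consumes everything up to and including the colon and leaves the
remaining data in the buffer for the next reader in the queue.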
1188
1189
1190 =head1 Authors
1191
1192 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1193