/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.19
Committed: Wed Jul 9 11:53:40 2008 UTC (15 years, 11 months ago) by root
Branch: MAIN
CVS Tags: rel-4_23, rel-4_21, rel-4_231, rel-4_233, rel-4_232, rel-4_234, rel-4_22, rel-4_2, rel-4_3, rel-4_31
Changes since 1.18: +19 -0 lines
Log Message:
document AnyEvent::Strict

1 =head1 Introduction to AnyEvent
2
3 This is a tutorial that will introduce you to the features of AnyEvent.
4
5 The first part introduces the core AnyEvent module (after swamping you a
6 bit in evangelism), which might already provide all you ever need. If you
7 are only interested in AnyEvent's event handling capabilities, read no
8 further.
9
10 The second part focuses on network programming using sockets, for which
11 AnyEvent offers a lot of support you can use, and a lot of workarounds
12 for portability quirks.
13
14
15 =head1 What is AnyEvent?
16
17 If you don't care for the whys and want to see code, skip this section!
18
19 AnyEvent is first of all just a framework to do event-based
20 programming. Typically such frameworks are an all-or-nothing thing: If you
21 use one such framework, you can't (easily, or even at all) use another in
22 the same program.
23
24 AnyEvent is different - it is a thin abstraction layer above all kinds
25 of event loops. Its main purpose is to move the choice of the underlying
26 framework (the event loop) from the module author to the program author
27 using the module.
28
29 That means you can write code that uses events to control what it
30 does, without forcing other code in the same program to use the same
31 underlying framework as you do - i.e. you can create a Perl module
32 that is event-based using AnyEvent, and users of that module can still
33 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
34 all: AnyEvent comes with its own event loop implementation, so your
35 code works regardless of other modules that might or might not be
36 installed. The latter is important, as AnyEvent does not have any
37 dependencies on other modules, which makes it easy to install, for
38 example, when you lack a C compiler.
39
40 A typical problem with Perl modules such as L<Net::IRC> is that they
41 come with their own event loop: In L<Net::IRC>, the program that uses it
42 needs to start the event loop of L<Net::IRC>. That means that one cannot
43 integrate this module into a L<Gtk2> GUI for instance, as that module,
44 too, enforces the use of its own event loop (namely L<Glib>).
45
46 Another example is L<LWP>: it provides no event interface at all. It's a
47 pure blocking HTTP (and FTP etc.) client library, which usually means that
48 you either have to start a thread or have to fork for an HTTP request, or
49 use L<Coro::LWP>, if you want to do something else while waiting for the
50 request to finish.
51
52 The motivation behind these designs is often that a module doesn't want to
53 depend on some complicated XS-module (Net::IRC), or that it doesn't want
54 to force the user to use some specific event loop at all (LWP).
55
56 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
57
58 =over 4
59
60 =item - write their own event loop (because AnyEvent guarantees to offer one
61 everywhere - even on windows).
62
63 =item - choose one fixed event loop (because AnyEvent works with all
64 important event loops available for Perl, and adding others is trivial).
65
66 =back
67
68 If the module author uses L<AnyEvent> for all his event needs (IO events,
69 timers, signals, ...) then all other modules can just use his module and
70 don't have to choose an event loop or adapt to his event loop. The choice
71 of the event loop is ultimately made by the program author who uses all
72 the modules and writes the main program. And even there he doesn't have to
73 choose, he can just let L<AnyEvent> choose the best available event loop
74 for him.
75
76 Read more about this in the main documentation of the L<AnyEvent> module.
77
78
79 =head1 Introduction to Event-Based Programming
80
81 So what exactly is programming using events? It quite simply means that
82 instead of your code actively waiting for something, such as the user
83 entering something on STDIN:
84
85 $| = 1; print "enter your name> ";
86
87 my $name = <STDIN>;
88
89 You instead tell your event framework to notify you in the event of some
90 data being available on STDIN, by using a callback mechanism:
91
92 use AnyEvent;
93
94 $| = 1; print "enter your name> ";
95
96 my $name;
97
98 my $wait_for_input = AnyEvent->io (
99 fh => \*STDIN, # which file handle to check
100 poll => "r", # which event to wait for ("r"ead data)
101 cb => sub { # what callback to execute
102 $name = <STDIN>; # read it
103 }
104 );
105
106 # do something else here
107
108 Looks more complicated, and surely is, but the advantage of using events
109 is that your program can do something else instead of waiting for
110 input. Waiting as in the first example is also called "blocking" because
111 you "block" your process from executing anything else while you do so.
112
113 The second example avoids blocking, by only registering interest in a read
114 event, which is fast and doesn't block your process. Only when read data
115 is available will the callback be called, which can then proceed to read
116 the data.
117
118 The "interest" is represented by an object returned by C<< AnyEvent->io
119 >> called a "watcher" object - called like that because it "watches" your
120 file handle (or other event sources) for the event you are interested in.
121
122 In the example above, we create an I/O watcher by calling the C<<
123 AnyEvent->io >> method. Disinterest in some event is simply expressed by
124 forgetting about the watcher, for example, by C<undef>'ing the variable it
125 is stored in. AnyEvent will automatically clean up the watcher if it is no
126 longer used, much like Perl closes your file handles if you no longer use
127 them anywhere.
128
129 =head3 A short note on callbacks
130
131 A common issue that hits people is the problem of passing parameters
132 to callbacks. Programmers used to languages such as C or C++ are often
133 used to a style where one passes the address of a function (a function
134 reference) and some data value, e.g.:
135
136 sub callback {
137 my ($arg) = @_;
138
139 $arg->method;
140 }
141
142 my $arg = ...;
143
144 call_me_back_later \&callback, $arg;
145
146 This is clumsy, as the place where behaviour is specified (when the
147 callback is registered) is often far away from the place where behaviour
148 is implemented. It also doesn't use Perl syntax to invoke the code. There
149 is also an abstraction penalty to pay as one has to I<name> the callback,
150 which often is unnecessary and leads to nonsensical or duplicated names.
151
152 In Perl, one can specify behaviour much more directly by using
153 I<closures>. Closures are code blocks that take a reference to the
154 enclosing scope(s) when they are created. This means lexical variables in scope at the time
155 of creating the closure can simply be used inside the closure:
156
157 my $arg = ...;
158
159 call_me_back_later sub { $arg->method };
160
161 Under most circumstances, closures are faster, use less resources and
162 result in much clearer code than the traditional approach. Faster,
163 because parameter passing and storing them in local variables in Perl
164 is relatively slow. Less resources, because closures take references to
165 existing variables without having to create new ones, and clearer code
166 because it is immediately obvious that the second example calls the
167 C<method> method when the callback is invoked.
168
169 Apart from these, the strongest argument for using closures with AnyEvent
170 is that AnyEvent does not allow passing parameters to the callback, so
171 closures are the only way to achieve that in most cases :->
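
To make the closure approach concrete, here is a minimal sketch in plain Perl. The C<call_me_back_later> function is a hypothetical stand-in for an event framework (it merely stores the callbacks), as in the snippets above; it is not part of AnyEvent.

```perl
use strict;
use warnings;

# hypothetical stand-in for an event framework: it only stores callbacks
my @pending;
sub call_me_back_later { push @pending, shift }

for my $name (qw(alice bob)) {
   # each closure captures its own $name, no parameter passing needed
   call_me_back_later (sub { "hello, $name" });
}

# later, the "framework" invokes the callbacks without any arguments
my @greetings = map { $_->() } @pending;

print "$_\n" for @greetings;
```

Each callback still knows "its" C<$name>, although nothing was passed to it when it was invoked.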
172
173
174 =head3 A hint on debugging
175
176 AnyEvent does, by default, not do any argument checking. This can lead to
177 strange and unexpected results especially if you are trying to learn your
178 ways with AnyEvent.
179
180 AnyEvent supports a special "strict" mode, off by default, which does very
181 strict argument checking, at the expense of being somewhat slower. When
182 developing, however, this mode is very useful.
183
184 You can enable this strict mode either by having an environment variable
185 C<PERL_ANYEVENT_STRICT> with a true value in your environment:
186
187 PERL_ANYEVENT_STRICT=1 perl test.pl
188
189 Or you can write C<use AnyEvent::Strict> in your program, which has the
190 same effect (do not do this in production, however).
191
192
193 =head2 Condition Variables
194
195 Back to the I/O watcher example: The code is not yet a fully working program,
196 and will not work as-is. The reason is that your callback will not be
197 invoked out of the blue, you have to run the event loop. Also, event-based
198 programs sometimes have to block, too, as when there simply is nothing
199 else to do and everything waits for some events, it needs to block the
200 process as well.
201
202 In AnyEvent, this is done using condition variables. Condition variables
203 are named "condition variables" because they represent a condition that is
204 initially false and needs to be fulfilled.
205
206 You can also call them "merge points", "sync points", "rendezvous ports"
207 or even callbacks and many other things (and they are often called like
208 this in other frameworks). The important point is that you can create them
209 freely and later wait for them to become true.
210
211 Condition variables have two sides - one side is the "producer" of the
212 condition (whatever code detects and flags the condition), the other side
213 is the "consumer" (the code that waits for that condition).
214
215 In our example in the previous section, the producer is the event callback
216 and there is no consumer yet - let's change that right now:
217
218 use AnyEvent;
219
220 $| = 1; print "enter your name> ";
221
222 my $name;
223
224 my $name_ready = AnyEvent->condvar;
225
226 my $wait_for_input = AnyEvent->io (
227 fh => \*STDIN,
228 poll => "r",
229 cb => sub {
230 $name = <STDIN>;
231 $name_ready->send;
232 }
233 );
234
235 # do something else here
236
237 # now wait until the name is available:
238 $name_ready->recv;
239
240 undef $wait_for_input; # watcher no longer needed
241
242 print "your name is $name\n";
243
244 This program creates an AnyEvent condvar by calling the C<<
245 AnyEvent->condvar >> method. It then creates a watcher as usual, but
246 inside the callback it C<send>'s the C<$name_ready> condition variable,
247 which causes anybody waiting on it to continue.
248
249 The "anybody" in this case is the code that follows, which calls C<<
250 $name_ready->recv >>: The producer calls C<send>, the consumer calls
251 C<recv>.
252
253 If there is no C<$name> available yet, then the call to C<<
254 $name_ready->recv >> will halt your program until the condition becomes
255 true.
256
257 As the names C<send> and C<recv> imply, you can actually send and receive
258 data using this, for example, the above code could also be written like
259 this, without an extra variable to store the name in:
260
261 use AnyEvent;
262
263 $| = 1; print "enter your name> ";
264
265 my $name_ready = AnyEvent->condvar;
266
267 my $wait_for_input = AnyEvent->io (
268 fh => \*STDIN, poll => "r",
269 cb => sub { $name_ready->send (scalar <STDIN>) }
270 );
271
272 # do something else here
273
274 # now wait and fetch the name
275 my $name = $name_ready->recv;
276
277 undef $wait_for_input; # watcher no longer needed
278
279 print "your name is $name\n";
280
281 You can pass any number of arguments to C<send>, and every call to
282 C<recv> will return them.
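
As a small illustration of passing several values, the following sketch sends a list before anybody waits for it. The values themselves are made up for the example.

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# the producer may send a whole list of values ...
$cv->send ("Larry", 42);

# ... and the consumer receives them; recv returns at once here,
# because the condition is already true
my ($name, $answer) = $cv->recv;

# another recv on the same condvar returns the same values again
my @again = $cv->recv;

print "$name, $answer (again: @again)\n";
```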
283
284 =head2 The "main loop"
285
286 Most event-based frameworks have something called a "main loop" or "event
287 loop run function" or something similar.
288
289 Just like C<recv> in AnyEvent, these functions need to be called
290 eventually so that your event loop has a chance of actually looking for
291 those events you are interested in.
292
293 For example, in a L<Gtk2> program, the above example could also be written
294 like this:
295
296 use Gtk2 -init;
297 use AnyEvent;
298
299 ############################################
300 # create a window and some label
301
302 my $window = new Gtk2::Window "toplevel";
303 $window->add (my $label = new Gtk2::Label "soon replaced by name");
304
305 $window->show_all;
306
307 ############################################
308 # do our AnyEvent stuff
309
310 $| = 1; print "enter your name> ";
311
312 my $name_ready = AnyEvent->condvar;
313
314 my $wait_for_input = AnyEvent->io (
315 fh => \*STDIN, poll => "r",
316 cb => sub {
317 # set the label
318 $label->set_text (scalar <STDIN>);
319 print "enter another name> ";
320 }
321 );
322
323 ############################################
324 # Now enter Gtk2's event loop
325
326 main Gtk2;
327
328 No condition variable anywhere in sight - instead, we just read a line
329 from STDIN and replace the text in the label. In fact, since nobody
330 C<undef>'s C<$wait_for_input> you can enter multiple lines.
331
332 Instead of waiting for a condition variable, the program enters the Gtk2
333 main loop by calling C<< Gtk2->main >>, which will block the program and
334 wait for events to arrive.
335
336 This also shows that AnyEvent is quite flexible - you didn't have anything
337 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
338 worked.
339
340 Admittedly, the example is a bit silly - who would want to read names
341 from standard input in a Gtk+ application. But imagine that instead of
342 doing that, you would make an HTTP request in the background and display
343 its results. In fact, with event-based programming you can make many
344 http-requests in parallel in your program and still provide feedback to
345 the user and stay interactive.
346
347 In the next part you will see how to do just that - by implementing an
348 HTTP request, on our own, with the utility modules AnyEvent comes with.
349
350 Before that, however, let's briefly look at how you would write your
351 program using only AnyEvent, without ever calling some other event
352 loop's run function.
353
354 The example using condition variables did exactly that, and in fact, this
355 is the solution:
356
357 my $quit_program = AnyEvent->condvar;
358
359 # create AnyEvent watchers (or not) here
360
361 $quit_program->recv;
362
363 If any of your watcher callbacks decide to quit, they can simply call
364 C<< $quit_program->send >>. Of course, they could also decide not to and
365 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
366 in a long-running daemon program).
367
368 In that case, you can simply use:
369
370 AnyEvent->condvar->recv;
371
372 And this is, in fact, closest to the idea of a main loop run function that
373 AnyEvent offers.
374
375 =head2 Timers and other event sources
376
377 So far, we have only used I/O watchers. These are useful mainly to find
378 out whether a socket has data to read, or space to write more data. On sane
379 operating systems this also works for console windows/terminals (typically
380 on standard input), serial lines, all sorts of other devices, basically
381 almost everything that has a file descriptor but isn't a file itself. (As
382 usual, "sane" excludes windows - on that platform you would need different
383 functions for all of these, complicating code immensely - think "socket
384 only" on windows).
385
386 However, I/O is not everything - the second most important event source is
387 the clock. For example when doing an HTTP request you might want to time
388 out when the server doesn't answer within some predefined amount of time.
389
390 In AnyEvent, timer event watchers are created by calling the C<<
391 AnyEvent->timer >> method:
392
393 use AnyEvent;
394
395 my $cv = AnyEvent->condvar;
396
397 my $wait_one_and_a_half_seconds = AnyEvent->timer (
398 after => 1.5, # after how many seconds to invoke the cb?
399 cb => sub { # the callback to invoke
400 $cv->send;
401 },
402 );
403
404 # can do something else here
405
406 # now wait till our time has come
407 $cv->recv;
408
409 Unlike I/O watchers, timers are only interested in the number of seconds
410 they have to wait. When that amount of time has passed, AnyEvent will
411 invoke your callback.
412
413 Unlike I/O watchers, which will call your callback as many times as there
414 is data available, timers are one-shot: after they have "fired" once and
415 invoked your callback, they are dead and no longer do anything.
416
417 To get a repeating timer, such as a timer firing roughly once per second,
418 you have to recreate it:
419
420 use AnyEvent;
421
422 my $time_watcher;
423
424 sub once_per_second {
425 print "tick\n";
426
427 # (re-)create the watcher
428 $time_watcher = AnyEvent->timer (
429 after => 1,
430 cb => \&once_per_second,
431 );
432 }
433
434 # now start the timer
435 once_per_second;
436
437 Having to recreate your timer is a restriction put on AnyEvent that is
438 present in most event libraries it uses. It is so annoying that some
439 future version might work around this limitation, but right now, it's the
440 only way to do repeating timers.
441
442 Fortunately most timers aren't really repeating but specify timeouts of
443 some sort.
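
A timeout of this kind can be sketched as a race between two watchers that feed the same condition variable. In this sketch the "real work" is only simulated by a second, faster timer, and the delays are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent;

my $done = AnyEvent->condvar;

# stand-in for the real work: a "response" that arrives after 0.2 seconds
my $work = AnyEvent->timer (
   after => 0.2,
   cb    => sub { $done->send ("response") },
);

# the timeout: give up if nothing has arrived after 2 seconds
my $timeout = AnyEvent->timer (
   after => 2,
   cb    => sub { $done->send ("timeout") },
);

my $result = $done->recv;

# forget both watchers, so the loser of the race can never fire
undef $work;
undef $timeout;

print "result: $result\n";
```

Whichever callback runs first decides the outcome; forgetting the watchers afterwards is what cancels the other one.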
444
445 =head3 More esoteric sources
446
447 AnyEvent also has some other, more esoteric event sources you can tap
448 into: signal and child watchers.
449
450 Signal watchers can be used to wait for "signal events", which simply
451 means your process got sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
452
453 Child watchers wait for a child process to exit. They are useful when
454 you fork a separate process and need to know when it exits, but you do not
455 wait for that by blocking.
456
457 Both watcher types are described in detail in the main L<AnyEvent> manual
458 page.
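
As a rough sketch of a child watcher, the following program forks a child and learns its exit status through a callback instead of blocking in C<waitpid>. It assumes a platform with a working C<fork>; the exit status 3 is arbitrary. The condition variable is created before forking so that the event loop is already set up when the child exits.

```perl
use strict;
use warnings;
use AnyEvent;

my $done = AnyEvent->condvar;

my $pid = fork;
defined $pid or die "fork failed: $!";

if ($pid == 0) {
   # child process: do nothing and exit with status 3
   exit 3;
}

# parent: watch for the child to exit instead of blocking in waitpid
my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      $done->send ($status >> 8); # the exit code is in the high byte
   },
);

my $exit_code = $done->recv;

print "child exited with code $exit_code\n";
```

A signal watcher is created in much the same way with C<< AnyEvent->signal >>, which takes a C<signal> name and a C<cb>.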
459
460
461 =head1 Network programming and AnyEvent
462
463 So far you have seen how to register event watchers and handle events.
464
465 This is a great foundation to write network clients and servers, and might be
466 all that your module (or program) ever requires, but writing your own I/O
467 buffering again and again becomes tedious, not to mention that it attracts
468 errors.
469
470 While the core L<AnyEvent> module is still small and self-contained,
471 the distribution comes with some very useful utility modules such as
472 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
473 make your life as non-blocking network programmer a lot easier.
474
475 Here is a quick overview of these three modules:
476
477 =head2 L<AnyEvent::DNS>
478
479 This module allows fully asynchronous DNS resolution. It is used mainly by
480 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
481 a great way to do other DNS resolution tasks, such as reverse lookups of
482 IP addresses for log files.
483
484 =head2 L<AnyEvent::Handle>
485
486 This module handles non-blocking IO on file handles in an event based
487 manner. It provides a wrapper object around your file handle that provides
488 queueing and buffering of incoming and outgoing data for you.
489
490 It also implements the most common data formats, such as text lines, or
491 fixed and variable-width data blocks.
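
To give a first impression, here is a small sketch that reads two text lines through an L<AnyEvent::Handle> object. To stay self-contained it uses a local socket pair from C<portable_socketpair> (exported by L<AnyEvent::Util>) instead of a real network connection; the line contents are made up.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util; # exports portable_socketpair

# a connected pair of sockets stands in for a real network connection
my ($fh1, $fh2) = portable_socketpair()
   or die "socketpair failed: $!";

my $done = AnyEvent->condvar;
my @lines;

my $writer = AnyEvent::Handle->new (fh => $fh1);
my $reader = AnyEvent::Handle->new (
   fh       => $fh2,
   on_error => sub { $done->send },
);

# queue two read requests, each for a single line of text
for (1 .. 2) {
   $reader->push_read (line => sub {
      my ($handle, $line) = @_;
      push @lines, $line;
      $done->send if @lines == 2;
   });
}

# write both lines in one go, the reading handle splits them up for us
$writer->push_write ("first line\012second line\012");

$done->recv;

print "got: $_\n" for @lines;
```

Note that no buffer handling appears anywhere: each read callback receives exactly one line, without its line ending.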
492
493 =head2 L<AnyEvent::Socket>
494
495 This module provides you with functions that handle socket creation
496 and IP address magic. The two main functions are C<tcp_connect> and
497 C<tcp_server>. The former will connect a (streaming) socket to an internet
498 host for you and the latter will make a server socket for you, to accept
499 connections.
500
501 This module also comes with transparent IPv6 support, this means: If you
502 write your programs with this module, you will be IPv6 ready without doing
503 anything special.
504
505 It also works around a lot of portability quirks (especially on the
506 windows platform), which makes it even easier to write your programs in a
507 portable way (did you know that windows uses different error codes for all
508 socket functions and that Perl does not know about these? That "Unknown
509 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
510 successful? That unsuccessful TCP connects might never be reported back
511 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
512 ignored instead of being in progress? AnyEvent::Socket works around all of
513 these Windows/Perl bugs for you).
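
The following sketch shows both functions working together on the local machine only. The server greets every client and hangs up, and the client collects the greeting. The loopback address, the greeting text and the use of the optional "prepare" callback of C<tcp_server> to learn the chosen port are choices made for this example.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;

# listen on the loopback address; an undef service asks for any free port
my $server = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;

   syswrite $fh, "hello from the server\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   # the prepare callback learns which address and port were bound
   my ($fh, $this_host, $this_port) = @_;
   $port = $this_port;

   0 # use the default listen queue length
};

# connect to our own server and collect everything it sends until EOF
tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $done->send;

   my $greeting = "";
   my $watcher; $watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         my $len = sysread $fh, $greeting, 1024, length $greeting;

         unless ($len) {
            undef $watcher; # EOF or error: stop watching
            $done->send ($greeting);
         }
      },
   );
};

my $greeting = $done->recv;

print "server said: $greeting";
```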
514
515 =head2 Implementing a parallel finger client with non-blocking connects
516 and AnyEvent::Socket
517
518 The finger protocol is one of the simplest protocols in use on the
519 internet. Or in use in the past, as almost nobody uses it anymore.
520
521 It works by connecting to the finger port on another host, writing a
522 single line with a user name and then reading the finger response, as
523 specified by that user. OK, RFC 1288 specifies a vastly more complex
524 protocol, but it basically boils down to this:
525
526 # telnet idsoftware.com finger
527 Trying 192.246.40.37...
528 Connected to idsoftware.com (192.246.40.37).
529 Escape character is '^]'.
530 johnc
531 Welcome to id Software's Finger Service V1.5!
532
533 [...]
534 Now on the web:
535 [...]
536
537 Connection closed by foreign host.
538
539 "Now on the web..." yeah, I<was> used indeed, but at least the finger
540 daemon still works, so let's write a little AnyEvent function that makes a
541 finger request:
542
543 use AnyEvent;
544 use AnyEvent::Socket;
545
546 sub finger($$) {
547 my ($user, $host) = @_;
548
549 # use a condvar to return results
550 my $cv = AnyEvent->condvar;
551
552 # first, connect to the host
553 tcp_connect $host, "finger", sub {
554 # the callback receives the socket handle - or nothing
555 my ($fh) = @_
556 or return $cv->send;
557
558 # now write the username
559 syswrite $fh, "$user\015\012";
560
561 my $response;
562
563 # register a read watcher
564 my $read_watcher; $read_watcher = AnyEvent->io (
565 fh => $fh,
566 poll => "r",
567 cb => sub {
568 my $len = sysread $fh, $response, 1024, length $response;
569
570 if ($len <= 0) {
571 # we are done, or an error occurred, let's ignore the latter
572 undef $read_watcher; # no longer interested
573 $cv->send ($response); # send results
574 }
575 },
576 );
577 };
578
579 # pass $cv to the caller
580 $cv
581 }
582
583 That's a mouthful! Let's dissect this function a bit, first the overall
584 function and execution flow:
585
586 sub finger($$) {
587 my ($user, $host) = @_;
588
589 # use a condvar to return results
590 my $cv = AnyEvent->condvar;
591
592 # first, connect to the host
593 tcp_connect $host, "finger", sub {
594 ...
595 };
596
597 $cv
598 }
599
600 This isn't too complicated, just a function with two parameters, that
601 creates a condition variable, returns it, and while it does that,
602 initiates a TCP connect to C<$host>. The condition variable will be used
603 by the caller to receive the finger response, but one could equally well
604 pass a third argument, a callback, to the function.
605
606 Since we are programming event'ish, we do not wait for the connect to
607 finish - it could block the program for a minute or longer!
608
609 Instead, we pass the callback it should invoke when the connect is done to
610 C<tcp_connect>. If it is successful, that callback gets called with the
611 socket handle as first argument, otherwise, nothing will be passed to our
612 callback. The important point is that it will always be called as soon as
613 the outcome of the TCP connect is known.
614
615 This style of programming is also called "continuation style": the
616 "continuation" is simply the way the program continues - normally, a
617 program continues at the next line after some statement (the exception
618 is loops or things like C<return>). When we are interested in events,
619 however, we instead specify the "continuation" of our program by passing a
620 closure, which makes that closure the "continuation" of the program. The
621 C<tcp_connect> call is like saying "return now, and when the connection is
622 established or it failed, continue there".
623
624 Now let's look at the callback/closure in more detail:
625
626 # the callback receives the socket handle - or nothing
627 my ($fh) = @_
628 or return $cv->send;
629
630 The first thing the callback does is indeed save the socket handle in
631 C<$fh>. When there was an error (no arguments), then our instinct as
632 expert Perl programmers would tell us to C<die>:
633
634 my ($fh) = @_
635 or die "$host: $!";
636
637 While this would give good feedback to the user (if he happens to watch
638 standard error), our program would probably stop working here, as we never
639 report the results to anybody, certainly not the caller of our C<finger>
640 function, and most event loops continue even after a C<die>!
641
642 This is why we instead C<return>, but also call C<< $cv->send >> without
643 any arguments to signal to the condvar consumer that something bad has
644 happened. The return value of C<< $cv->send >> is irrelevant, as is the
645 return value of our callback. The return statement is simply used for the
646 side effect of, well, returning immediately from the callback. Checking
647 for errors and handling them this way is very common, which is why this
648 compact idiom is so handy.
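
Why the idiom works can be tried out in plain Perl: a list assignment in scalar context yields the number of elements on its right-hand side, so the C<or> branch is taken only for an empty argument list. The function name below is invented for this sketch.

```perl
use strict;
use warnings;

sub first_arg_or_fallback {
   # the assignment counts the elements of @_, so "or" triggers
   # only when no arguments were passed at all
   my ($value) = @_
      or return "no arguments";

   defined $value ? "got $value" : "got undef";
}

my @results = (
   first_arg_or_fallback (),      # empty argument list
   first_arg_or_fallback ("fh"),  # one argument
   first_arg_or_fallback (undef), # one (undefined) argument still counts
);

print "$_\n" for @results;
```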
649
650 As the next step in the finger protocol, we send the username to the
651 finger daemon on the other side of our connection:
652
653 syswrite $fh, "$user\015\012";
654
655 Note that this isn't 100% clean socket programming - the socket could,
656 for whatever reasons, not accept our data. When writing a small amount
657 of data like in this example it doesn't matter, as a socket buffer is
658 almost always big enough for a mere "username", but for real-world
659 cases you might need to implement some kind of write buffering - or use
660 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
661 next section.
662
663 What we I<do> have to do is to implement our own read buffer - the response
664 data could arrive late or in multiple chunks, and we cannot just wait for
665 it (event-based programming, you know?).
666
667 To do that, we register a read watcher on the socket which waits for data:
668
669 my $read_watcher; $read_watcher = AnyEvent->io (
670 fh => $fh,
671 poll => "r",
672
673 There is a trick here, however: the read watcher isn't stored in a global
674 variable, but in a local one - if the callback returns, it would normally
675 destroy the variable and its contents, which would in turn unregister our
676 watcher.
677
678 To avoid that, we refer to the variable inside the watcher callback (which
679 later C<undef>ines it). This means that, when the C<tcp_connect> callback returns, perl thinks
680 (quite correctly) that the read watcher is still in use - namely in the
681 callback.
682
683 The trick, however, is that instead of:
684
685 my $read_watcher = AnyEvent->io (...
686
687 The program does:
688
689 my $read_watcher; $read_watcher = AnyEvent->io (...
690
691 The reason for this is a quirk in the way Perl works: variable names
692 declared with C<my> are only visible in the I<next> statement. If the
693 whole C<< AnyEvent->io >> call, including the callback, were done in
694 a single statement, the callback could not refer to the C<$read_watcher>
695 variable to undefine it, so it is done in two statements.
696
697 Whether you'd want to format it like this is of course a matter of style,
698 this way emphasizes that the declaration and assignment really are one
699 logical statement.
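
The quirk can be observed without any event loop. This sketch compares both forms, using a string C<eval> so that the failing one-statement form does not abort the program; the variable names are made up.

```perl
use strict;
use warnings;

# two statements: $counter is declared before the closure is compiled,
# so the closure can refer to (and later undefine) it
my $counter; $counter = sub { defined $counter ? "visible" : "gone" };
my $two_statements = $counter->();

undef $counter; # break the reference cycle, just like the watcher callback does

# one statement: inside the sub, $broken is not declared yet, so under
# "use strict" this fails to compile; the string eval catches the error
my $compiled = eval q{
   use strict;
   my $broken = sub { $broken };
   1
};
my $error = $@;

print "two statements: $two_statements\n";
print "one statement failed to compile\n" unless $compiled;
```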
700
701 The callback itself calls C<sysread> as many times as necessary, until
702 C<sysread> returns either an error or end-of-file:
703
704 cb => sub {
705 my $len = sysread $fh, $response, 1024, length $response;
706
707 if ($len <= 0) {
708
709 Note that C<sysread> has the ability to append data it reads to a scalar,
710 by specifying an offset, which is what we make good use of in this
711 example.
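
The appending behaviour of C<sysread> can be tried in isolation. The sketch below uses a pipe instead of a socket and a plain loop instead of a watcher callback, and it reads in deliberately tiny chunks; the text is invented.

```perl
use strict;
use warnings;

pipe my $reader, my $writer
   or die "pipe failed: $!";

# write the data, then close the write end so the reader sees EOF
syswrite $writer, "Welcome to the finger service\012";
close $writer;

my $response = "";

# the fourth argument is the offset in $response at which sysread
# stores the new data; using the current length means "append"
while (1) {
   my $len = sysread $reader, $response, 8, length $response;
   defined $len or die "read error: $!";
   last if $len == 0; # end of file
}

print $response;
```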
712
713 When C<sysread> indicates we are done, the callback C<undef>ines
714 the watcher and then C<send>'s the response data to the condition
715 variable. All this has the following effects:
716
717 Undefining the watcher destroys it, as our callback was the only one still
718 having a reference to it. When the watcher gets destroyed, it destroys the
719 callback, which in turn means the C<$fh> handle is no longer used, so that
720 gets destroyed as well. The result is that all resources will be nicely
721 cleaned up by perl for us.
722
723 =head3 Using the finger client
724
725 Now, we could probably write the same finger client in a simpler way if
726 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
727 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
728
729 But the main advantage is that we can not only run this finger function in
730 the background, we even can run multiple sessions in parallel, like this:
731
732 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
733 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
734 my $f3 = finger "johnc", "idsoftware.com"; # finger john
735
736 print "trouble tickets:\n", $f1->recv, "\n";
737 print "trouble ticket #1736:\n", $f2->recv, "\n";
738 print "john carmacks finger file: ", $f3->recv, "\n";
739
740 It doesn't look like it, but in fact all three requests run in
741 parallel. The code waits for the first finger request to finish first, but
742 that doesn't keep it from executing them in parallel: when the first C<recv>
743 call sees that the data isn't ready yet, it serves events for all three
744 requests automatically, until the first request has finished.
745
746 The second C<recv> call might either find the data is already there, or it
747 will continue handling events until that is the case, and so on.
748
749 By taking advantage of network latencies, which allows us to serve other
750 requests and events while we wait for an event on one socket, the overall
751 time to do these three requests will be greatly reduced, typically all
752 three are done in the same time as the slowest of them would need to finish.
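
The effect can be measured without any finger server. In this sketch a hypothetical C<slow_request> function replaces C<finger> and simply "answers" after a given delay, using a timer; the delays are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes qw(time);

# hypothetical stand-in for finger(): "answers" after $delay seconds
sub slow_request {
   my ($name, $delay) = @_;

   my $cv = AnyEvent->condvar;
   my $timer; $timer = AnyEvent->timer (
      after => $delay,
      cb    => sub {
         undef $timer;
         $cv->send ("$name done");
      },
   );

   $cv
}

my $start = time;

# start all three "requests", then collect the results in order
my $r1 = slow_request "first",  0.5;
my $r2 = slow_request "second", 0.4;
my $r3 = slow_request "third",  0.3;

my @results = map { $_->recv } $r1, $r2, $r3;
my $elapsed = time - $start;

printf "%s (%.1f seconds)\n", join (", ", @results), $elapsed;
```

The elapsed time should be close to the slowest delay (0.5 seconds), not to the sum of all three (1.2 seconds).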
753
754 By the way, you do not actually have to wait in the C<recv> method on an
755 AnyEvent condition variable - after all, waiting is evil - you can also
756 register a callback:
757
758 $cv->cb (sub {
759 my $response = shift->recv;
760 # ...
761 });
762
763 The callback will only be invoked when C<send> was called. In fact,
764 instead of returning a condition variable you could also pass a third
765 parameter to your finger function, the callback to invoke with the
766 response:
767
768 sub finger($$$) {
769 my ($user, $host, $cb) = @_;
770
771 How you implement it is a matter of taste - if you expect your function to
772 be used mainly in an event-based program you would normally prefer to pass
773 a callback directly. If you write a module and expect your users to use
774 it "synchronously" often (for example, a simple http-get script would not
775 really care much for events), then you would use a condition variable and
776 tell them "simply ->recv the data".
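
The two styles also convert into each other easily. In this sketch, C<answer_later> is a made-up callback-style function (simulated with a timer); an event-based caller passes a callback directly, while a "synchronous" caller wraps a condition variable around it.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical event-based function that reports its result to a callback
sub answer_later {
   my ($cb) = @_;

   my $timer; $timer = AnyEvent->timer (
      after => 0.1,
      cb    => sub {
         undef $timer;
         $cb->(42);
      },
   );
}

# an event-based caller passes a callback directly ...
my $seen;
answer_later (sub { $seen = shift });

# ... while a "synchronous" caller uses a condvar's send as the callback
my $cv = AnyEvent->condvar;
answer_later (sub { $cv->send (@_) });

my $answer = $cv->recv;

print "the answer is $answer\n";
```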
777
778 =head3 Problems with the implementation and how to fix them
779
780 To make this example more real-world-ready, we would not only implement
781 some write buffering (for the paranoid), but we would also have to handle
782 timeouts and maybe protocol errors.
783
784 Doing this quickly gets unwieldy, which is why we introduce
785 L<AnyEvent::Handle> in the next section, which takes care of all these
786 details for you and lets you concentrate on the actual protocol.
787
788
789 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
790
791 The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
792 see what it really offers.
793
794 As finger is such a simple protocol, let's try something slightly more
795 complicated: HTTP/1.0.
796
797 An HTTP GET request works by sending a single request line that indicates
798 what you want the server to do and the URI you want it to act on, followed
799 by as many "header" lines (C<Header: data>, same as e-mail headers) as
800 required for the request, ended by an empty line.
801
802 The response is formatted very similarly: first a line with the response
803 status, then again as many header lines as required, then an empty line,
804 followed by any data that the server might send.
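As a rough illustration of the message layout just described, here is a minimal sketch in plain Perl (no AnyEvent involved). The helper names C<build_request> and C<split_response> are made up for this example and are not part of any module, and the sketch assumes the complete response is already available as one string:

```perl
use strict;
use warnings;

# hypothetical helper: request line, optional header lines, empty line
sub build_request {
   my ($uri, %header) = @_;

   join "",
      "GET $uri HTTP/1.0\015\012",
      (map "${_}: $header{$_}\015\012", sort keys %header),
      "\015\012";
}

# hypothetical helper: split a complete raw response into its three parts
sub split_response {
   my ($raw) = @_;

   # the first empty line separates status line + headers from the body
   my ($head, $body) = split /\015\012\015\012/, $raw, 2;
   my ($status, $header) = split /\015\012/, $head, 2;

   ($status, $header, $body)
}

my $request = build_request ("/test", Host => "www.example.com");
print $request;
```

The event-based version shown later has to do the same splitting, only incrementally, because the data arrives in pieces.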
805
806 Again, let's try it out with C<telnet> (I condensed the output a bit - if
807 you want to see the full response, do it yourself).
808
809 # telnet www.google.com 80
810 Trying 209.85.135.99...
811 Connected to www.google.com (209.85.135.99).
812 Escape character is '^]'.
813 GET /test HTTP/1.0
814
815 HTTP/1.0 404 Not Found
816 Date: Mon, 02 Jun 2008 07:05:54 GMT
817 Content-Type: text/html; charset=UTF-8
818
819 <html><head>
820 [...]
821 Connection closed by foreign host.
822
823 The C<GET ...> line and the empty line were entered manually; the rest of the
824 telnet output is Google's response, in this case a C<404 Not Found> one.
825
826 So, here is how you would do it with C<AnyEvent::Handle>:
827
828    sub http_get {
829       my ($host, $uri, $cb) = @_;
830
831       tcp_connect $host, "http", sub {
832          # on connection failure, report the error and stop right here
833          my ($fh) = @_
834             or return $cb->("HTTP/1.0 500 $!");
835          # store results here
836          my ($response, $header, $body);
837
838          my $handle; $handle = new AnyEvent::Handle
839             fh       => $fh,
840             on_error => sub {
841                undef $handle;
842                $cb->("HTTP/1.0 500 $!");
843             },
844             on_eof   => sub {
845                undef $handle; # keep it alive till eof
846                $cb->($response, $header, $body);
847             };
848
849          $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
850
851          # now fetch response status line
852          $handle->push_read (line => sub {
853             my ($handle, $line) = @_;
854             $response = $line;
855          });
856
857          # then the headers
858          $handle->push_read (line => "\015\012\015\012", sub {
859             my ($handle, $line) = @_;
860             $header = $line;
861          });
862
863          # and finally handle any remaining data as body
864          $handle->on_read (sub {
865             $body .= $_[0]->rbuf;
866             $_[0]->rbuf = "";
867          });
868       };
869    }
870
871 And now let's go through it step by step. First, as usual, the overall
872 C<http_get> function structure:
873
874    sub http_get {
875       my ($host, $uri, $cb) = @_;
876
877       tcp_connect $host, "http", sub {
878          ...
879       };
880    }
881
882 Unlike in the finger example, this time the caller has to pass a callback
883 to C<http_get>. Also, instead of passing a URL as one would expect, the
884 caller has to provide the hostname and URI - normally you would use the
885 C<URI> module to parse a URL and separate it into those parts, but that is
886 left to the inspired reader :)
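For readers who want a head start, here is one possible sketch of that splitting step, using a regular expression instead of the C<URI> module. The function name C<split_http_url> is invented for this example, and the pattern is deliberately simplistic: it assumes plain C<http> URLs without port numbers or user information.

```perl
use strict;
use warnings;

# hypothetical helper: split "http://host/path?query" into host and URI.
# returns nothing for anything it does not understand (other schemes, ports).
sub split_http_url {
   my ($url) = @_;

   my ($host, $uri) = $url =~ m{^http://([^/?#:]+)(?=[/?#]|$)([^#]*)}i
      or return;

   # an empty path means the root document; a bare query needs a leading "/"
   $uri = "/$uri" unless $uri =~ m{^/};

   ($host, $uri)
}

my ($host, $uri) = split_http_url "http://www.google.com/test";
# $host and $uri could now be passed as the first two arguments of http_get
```

A real program would be better served by the C<URI> module, which also handles escaping and the many corner cases ignored here.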
887
888 Since everything else is left to the caller, all C<http_get> does is
889 initiate the connection with C<tcp_connect> and leave everything else to
890 its callback.
891
892 The first thing the callback does is check for connection errors and
893 declare some variables:
894
895    my ($fh) = @_
896       or return $cb->("HTTP/1.0 500 $!");
897
898    my ($response, $header, $body);
899
900 Instead of having an extra mechanism to signal errors, connection errors
901 are signalled by crafting a special "response status line", like this:
902
903 HTTP/1.0 500 Connection refused
904
905 This means the caller cannot distinguish (easily) between
906 locally-generated errors and server errors, but it simplifies error
907 handling for the caller a lot.
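To show what this means on the caller's side, here is a small sketch of how such a status line could be interpreted. The helper C<parse_status_line> is an assumption made for this example, not something AnyEvent provides:

```perl
use strict;
use warnings;

# hypothetical helper: extract numeric status code and reason phrase
sub parse_status_line {
   my ($line) = @_;

   my ($code, $reason) = $line =~ m{^HTTP/\d+\.\d+ (\d{3})(?: (.*))?$}
      or return;

   ($code, defined $reason ? $reason : "")
}

# a server-generated and a locally generated error look alike to the caller
for my $line ("HTTP/1.0 404 Not Found", "HTTP/1.0 500 Connection refused") {
   my ($code, $reason) = parse_status_line $line;
   printf "%s: %d %s\n", ($code >= 400 ? "failed" : "ok"), $code, $reason;
}
```

A single check on the status code covers both kinds of failure, which is exactly the simplification described above.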
908
909 The next step finally involves L<AnyEvent::Handle>, namely it creates the
910 handle object:
911
912    my $handle; $handle = new AnyEvent::Handle
913       fh       => $fh,
914       on_error => sub {
915          undef $handle;
916          $cb->("HTTP/1.0 500 $!");
917       },
918       on_eof   => sub {
919          undef $handle; # keep it alive till eof
920          $cb->($response, $header, $body);
921       };
922
923 The constructor expects a file handle, which gets passed via the C<fh>
924 argument.
925
926 The remaining two argument pairs specify two callbacks to be called on
927 any errors (C<on_error>) and in the case of a normal connection close
928 (C<on_eof>).
929
930 In the first case, we C<undef>ine the handle object and pass the error to
931 the callback provided by the caller - done.
932
933 In the second case we assume everything went fine and pass the results
934 gobbled up so far to the caller-provided callback. This is not quite
935 perfect, as when the server "cleanly" closes the connection in the middle
936 of sending headers we might wrongly report this as an "OK" to the caller,
937 but then, HTTP doesn't support a perfect mechanism that would detect such
938 problems in all cases, so we don't bother either.
939
940 =head3 The write queue
941
942 The next line sends the actual request:
943
944 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
945
946 No headers will be sent (this is fine for simple requests), so the whole
947 request is just a single line followed by an empty line to signal the end
948 of the headers to the server.
949
950 The more interesting question is why the method is called C<push_write>
951 and not just C<write>. The reason is that you can I<always> add some write
952 data without blocking, and to do this, AnyEvent::Handle needs some write
953 queue internally - and C<push_write> simply pushes some data at the end of
954 that queue, just like Perl's C<push> pushes data at the end of an array.
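The idea can be pictured with a toy model. The following sketch is only an illustration of the principle and makes no claim about how AnyEvent::Handle is implemented internally; the class C<ToyWriteQueue> and its C<drain> method are invented for this example:

```perl
use strict;
use warnings;

# toy model only - NOT how AnyEvent::Handle works internally
package ToyWriteQueue;

sub new { bless { wbuf => "" }, shift }

# never blocks: the data is merely appended to the buffer
sub push_write {
   my ($self, $data) = @_;
   $self->{wbuf} .= $data;
}

# would be called whenever the socket is writable; $max simulates how
# many bytes the kernel is willing to accept right now
sub drain {
   my ($self, $max) = @_;
   substr $self->{wbuf}, 0, $max, "";
}

package main;

my $q = ToyWriteQueue->new;
$q->push_write ("GET / HTTP/1.0\015\012");
$q->push_write ("\015\012");

my $sent = $q->drain (10);
print "sent ", length ($sent), " bytes, ", length ($q->{wbuf}), " still queued\n";
```

The caller can keep pushing data regardless of how fast the other side reads it; the actual writing happens later, piece by piece.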
955
956 The deeper reason is that at some point in the future, there might
957 be C<unshift_write> as well, and in any case, we will shortly meet
958 C<push_read> and C<unshift_read>, and it's usually easiest if all those
959 functions have some symmetry in their name.
960
961 If C<push_write> is called with more than one argument, then you can even
962 do I<formatted> I/O, which simply means your data will be transformed in
963 some ways. For example, this would JSON-encode your data before pushing it
964 to the write queue:
965
966 $handle->push_write (json => [1, 2, 3]);
967
968 Apart from that, this pretty much summarises the write queue; there is
969 little else to it.
970
971 Reading the response is far more interesting:
972
973 =head3 The read queue
974
975 The response consists of three parts: a single line of response status, a
976 single paragraph of headers ended by an empty line, and the response body,
977 which is simply the remaining data on that connection.
978
979 For the first two, we push two read requests onto the read queue:
980
981 # now fetch response status line
982    $handle->push_read (line => sub {
983       my ($handle, $line) = @_;
984       $response = $line;
985    });
986
987    # then the headers
988    $handle->push_read (line => "\015\012\015\012", sub {
989       my ($handle, $line) = @_;
990       $header = $line;
991    });
992
993 While one can simply push a single callback to the queue, I<formatted> I/O
994 really comes to our advantage here, as there is a ready-made "read line"
995 read type. The first read expects a single line, ended by C<\015\012> (the
996 standard end-of-line marker in internet protocols).
997
998 The second "line" is actually a single paragraph - instead of reading it
999 line by line we tell C<push_read> that the end-of-line marker is really
1000 C<\015\012\015\012>, which is an empty line. The result is that the whole
1001 header paragraph will be treated as a single line and read. The word
1002 "line" is interpreted very freely, much like Perl itself does it.
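The effect of a custom end-of-line marker can be tried out on a plain string. The following is a toy model of such a "line" read, written for this tutorial only; C<read_line> is a made-up function, not the code AnyEvent::Handle uses:

```perl
use strict;
use warnings;

# toy model of a "line" read: remove everything up to and including the
# first $eol from the buffer and return it without the marker, or return
# nothing if no complete "line" has arrived yet.
sub read_line {
   my ($rbuf, $eol) = @_;   # $rbuf is a reference to the buffer string

   $$rbuf =~ s/^(.*?)\Q$eol\E//s
      or return;

   $1
}

my $rbuf = "HTTP/1.0 200 OK\015\012Date: today\015\012Server: toy\015\012\015\012body";

my $response = read_line \$rbuf, "\015\012";          # status line
my $header   = read_line \$rbuf, "\015\012\015\012";  # whole header paragraph
# whatever is left in $rbuf now is the start of the body
```

Only the marker differs between the two calls, which is why the same read type can serve for both the status line and the header paragraph.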
1003
1004 Note that the read requests are pushed immediately after creating the
1005 handle object - since AnyEvent::Handle provides a queue we can push as
1006 many requests as we want, and AnyEvent::Handle will handle them in order.
1007
1008 There is, however, no read type for "the remaining data". For that, we
1009 install our own C<on_read> callback:
1010
1011    # and finally handle any remaining data as body
1012    $handle->on_read (sub {
1013       $body .= $_[0]->rbuf;
1014       $_[0]->rbuf = "";
1015    });
1016
1017 This callback is invoked every time data arrives and the read queue is
1018 empty - which in this example will only be the case when both response and
1019 header have been read. The C<on_read> callback could actually have been
1020 specified when constructing the object, but doing it this way preserves
1021 logical ordering.
1022
1023 The read callback simply appends the current read buffer to the C<$body>
1024 variable and, most importantly, I<empties> the buffer by assigning the empty
1025 string to it.
1026
1027 After AnyEvent::Handle has been so instructed, it will now handle incoming
1028 data according to these instructions - if all goes well, the callback will
1029 be invoked with the response data, if not, it will get an error.
1030
1031 In general, you get pipelining very easily with AnyEvent::Handle: If
1032 you have a protocol with a request/response structure, your request
1033 methods/functions will all look like this (simplified):
1034
1035    sub request {
1036
1037       # send the request to the server
1038       $handle->push_write (...);
1039
1040       # push some response handlers
1041       $handle->push_read (...);
1042    }
1043
1044 =head3 Using it
1045
1046 And here is how you would use it:
1047
1048    http_get "www.google.com", "/", sub {
1049       my ($response, $header, $body) = @_;
1050
1051       print
1052          $response, "\n",
1053          $body;
1054    };
1055
1056 And of course, you can run as many of these requests in parallel as you
1057 want (and your memory supports).
1058
1059 =head3 HTTPS
1060
1061 Now, as promised, let's implement the same thing for HTTPS, or more
1062 correctly, let's change our C<http_get> function into a function that
1063 speaks HTTPS instead.
1064
1065 HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
1066 B<S>ecurity is the official name for what most people refer to as C<SSL>)
1067 that contains standard HTTP protocol exchanges. The other difference to
1068 HTTP is that it uses port C<443> instead of port C<80>.
1069
1070 To implement these two differences we need two tiny changes. First, in the
1071 C<tcp_connect> call we replace C<http> with C<https>:
1072
1073 tcp_connect $host, "https", sub { ...
1074
1075 The other change deals with TLS, which is something L<AnyEvent::Handle>
1076 does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
1077 around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
1078 C<tls> parameter to the call to C<AnyEvent::Handle::new>:
1079
1080 tls => "connect",
1081
1082 Specifying C<tls> enables TLS, and the argument specifies whether
1083 AnyEvent::Handle is the server side ("accept") or the client side
1084 ("connect") for the TLS connection, as unlike TCP, there is a clear
1085 server/client relationship in TLS.
1086
1087 That's all.
1088
1089 Of course, all this should be handled transparently by C<http_get> after
1090 parsing the URL. See the part about exercising your inspiration earlier in
1091 this document.
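As a starting point for that exercise, the two differences could be collected in a small lookup table keyed by URL scheme. This is only a sketch; C<%SCHEME> and C<connect_params> are names invented here, and how the result is wired into C<http_get> is left open:

```perl
use strict;
use warnings;

# hypothetical table: service name for tcp_connect plus any extra
# constructor arguments for AnyEvent::Handle, per URL scheme
my %SCHEME = (
   http  => { service => "http",  handle_args => [] },
   https => { service => "https", handle_args => [tls => "connect"] },
);

sub connect_params {
   my ($scheme) = @_;

   my $params = $SCHEME{lc $scheme}
      or die "unsupported scheme: $scheme\n";

   ($params->{service}, @{ $params->{handle_args} })
}

my ($service, @handle_args) = connect_params "https";
# $service would go to tcp_connect, @handle_args into the constructor call
```

With such a table, supporting both schemes in one function needs no further branching in the connection code.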
1092
1093 =head3 The read queue - revisited
1094
1095 HTTP always uses the same structure in its responses, but many protocols
1096 require parsing responses differently depending on the response itself.
1097
1098 For example, in SMTP, you normally get a single response line:
1099
1100 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1101
1102 But SMTP also supports multi-line responses:
1103
1104 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
1105 220-hey guys
1106 220 my response is longer than yours
1107
1108 To handle this, we need C<unshift_read>. As the name (hopefully) implies,
1109 C<unshift_read> will not append your read request to the end of the read
1110 queue, but instead it will prepend it to the queue.
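Why the position in the queue matters can be seen in a toy model, in which the read queue is just a Perl array of code references and every incoming line is handed to the reader at the front. Everything in this sketch (C<@queue>, C<feed_line>, the sample lines) is made up for illustration and is not AnyEvent::Handle code:

```perl
use strict;
use warnings;

# toy model only: a queue of code references, one invoked per incoming line
my @queue;
my @log;

sub feed_line {
   my ($line) = @_;
   my $cb = shift @queue or return;
   $cb->($line);
}

my $multi; $multi = sub {
   my ($line) = @_;
   push @log, "first: $line";
   # continuation line? then the NEXT line still belongs to this response,
   # so the reader must go to the front of the queue, not to the end
   unshift @queue, $multi if $line =~ /^...-/;
};

push @queue, $multi;                               # response to 1st command
push @queue, sub { push @log, "second: $_[0]" };   # response to 2nd command

feed_line $_ for "250-hello", "250 done", "221 bye";

print "$_\n" for @log;
```

Had the continuation reader been pushed to the end instead, the second line would have been handed to the reader waiting for the second command's response.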
1111
1112 This is useful for this situation: You push your response-line read
1113 request when sending the SMTP command, and when handling it, you look at
1114 the line to see if more is to come, and C<unshift_read> another reader,
1115 like this:
1116
1117    my $response; # response lines end up in here
1118
1119    my $read_response; $read_response = sub {
1120       my ($handle, $line) = @_;
1121
1122       $response .= "$line\n";
1123
1124       # check for continuation lines ("-" as 4th character)
1125       if ($line =~ /^...-/) {
1126          # if yes, then unshift another line read
1127          $handle->unshift_read (line => $read_response);
1128
1129       } else {
1130          # otherwise we are done
1131
1132          # free callback
1133          undef $read_response;
1134
1135          print "we are done reading: $response\n";
1136       }
1137    };
1138
1139    $handle->push_read (line => $read_response);
1140
1141 This recipe can be used for all similar parsing problems, for example in
1142 NNTP, the response code to some commands indicates that more data will be
1143 sent:
1144
1145    $handle->push_write ("article 42\015\012");
1146
1147    # read response line
1148    $handle->push_read (line => sub {
1149       my ($handle, $status) = @_;
1150
1151       # article data following?
1152       if ($status =~ /^2/) {
1153          # yes, read article body
1154
1155          $handle->unshift_read (line => "\012.\015\012", sub {
1156             my ($handle, $body) = @_;
1157
1158             $finish->($status, $body);
1159          });
1160
1161       } else {
1162          # some error occurred, no article data
1163
1164          $finish->($status);
1165       }
1166    });
1167
1168 =head3 Your own read queue handler
1169
1170 Sometimes, your protocol doesn't play nice and formats its data in a way
1171 the ready-made read types can't parse, so you have to implement your own read parser.
1172
1173 To make up a contorted example, imagine you are looking for an even
1174 number of characters followed by a colon (":"). Also imagine that
1175 AnyEvent::Handle had no C<regex> read type which could be used, so you'd
1176 have to do it manually.
1177
1178 To implement this, you would C<push_read> (or C<unshift_read>) just a
1179 single code reference.
1180
1181 This code reference will then be called each time there is (new) data
1182 available in the read buffer, and is expected to either eat/consume some
1183 of that data (and return true) or to return false to indicate that it
1184 wants to be called again.
1185
1186 If the code reference returns true, then it will be removed from the read
1187 queue, otherwise it stays in front of it.
1188
1189 The example above could be coded like this:
1190
1191    $handle->push_read (sub {
1192       my ($handle) = @_;
1193
1194       # check for even number of characters + ":"
1195       # and remove the data if a match is found.
1196       # if not, return false (actually nothing)
1197
1198       $handle->{rbuf} =~ s/^( (?:..)* ) ://x
1199          or return;
1200
1201       # we got some data in $1, pass it to whoever wants it
1202       $finish->($1);
1203
1204       # and return true to indicate we are done
1205       1
1206    });
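To get a feeling for how such a parser behaves when data trickles in, the same substitution can be exercised on a plain string. The following sketch runs without AnyEvent; C<try_parse> and the sample chunks are invented for this example:

```perl
use strict;
use warnings;

# same substitution as above, applied to a plain string instead of a handle
sub try_parse {
   my ($rbuf) = @_;   # reference to the buffer

   $$rbuf =~ s/^( (?:..)* ) ://x
      or return;

   $1
}

my $rbuf = "";
my @got;

# simulate data arriving in pieces; the parser is retried after every chunk
for my $chunk ("ab", "c", "d:re", "st") {
   $rbuf .= $chunk;
   my $data = try_parse \$rbuf;
   push @got, $data if defined $data;
}

print "parsed: @got, left in buffer: $rbuf\n";
```

The first two attempts fail because no colon has arrived yet, which corresponds to returning false and being called again in the real read queue.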
1207
1208
1209 =head1 Authors
1210
1211 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
1212