/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.16
Committed: Mon Jun 2 11:03:40 2008 UTC (16 years ago) by root
Branch: MAIN
Changes since 1.15: +1 -0 lines

File Contents

# Content
1 =head1 Introduction to AnyEvent
2
3 This is a tutorial that will introduce you to the features of AnyEvent.
4
5 The first part introduces the core AnyEvent module (after swamping you a
6 bit in evangelism), which might already provide all you ever need.
7
8 The second part focuses on network programming using sockets, for which
9 AnyEvent offers a lot of support you can use.
10
11
12 =head1 What is AnyEvent?
13
14 If you don't care for the whys and want to see code, skip this section!
15
16 AnyEvent is first of all just a framework to do event-based
17 programming. Typically such frameworks are an all-or-nothing thing: If you
18 use one such framework, you can't (easily, or even at all) use another in
19 the same program.
20
21 AnyEvent is different - it is a thin abstraction layer above all kinds
22 of event loops. Its main purpose is to move the choice of the underlying
23 framework (the event loop) from the module author to the program author
24 using the module.
25
26 That means you can write code that uses events to control what it
27 does, without forcing other code in the same program to use the same
28 underlying framework as you do - i.e. you can create a Perl module
29 that is event-based using AnyEvent, and users of that module can still
30 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
31 all: AnyEvent comes with its own event loop implementation, so your
32 code works regardless of other modules that might or might not be
33 installed. The latter is important, as AnyEvent does not have any
34 dependencies on other modules, which makes it easy to install, for
35 example, when you lack a C compiler.
36
37 A typical problem with Perl modules such as L<Net::IRC> is that they
38 come with their own event loop: In L<Net::IRC>, the program that uses it
39 needs to start the event loop of L<Net::IRC>. That means that one cannot
40 integrate this module into a L<Gtk2> GUI for instance, as that module,
41 too, enforces the use of its own event loop (namely L<Glib>).
42
43 Another example is L<LWP>: it provides no event interface at all. It's a
44 pure blocking HTTP (and FTP etc.) client library, which usually means that
45 you either have to start a thread or have to fork for an HTTP request, or
46 use L<Coro::LWP>, if you want to do something else while waiting for the
47 request to finish.
48
49 The motivation behind these designs is often that a module doesn't want to
50 depend on some complicated XS-module (Net::IRC), or that it doesn't want
51 to force the user to use some specific event loop at all (LWP).
52
53 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
54
55 =over 4
56
57 =item - write their own event loop (because AnyEvent guarantees to offer one
58 everywhere - even on Windows).
59
60 =item - choose one fixed event loop (because AnyEvent works with all
61 important event loops available for Perl, and adding others is trivial).
62
63 =back
64
65 If the module author uses L<AnyEvent> for all his event needs (IO events,
66 timers, signals, ...) then all other modules can just use his module and
67 don't have to choose an event loop or adapt to his event loop. The choice
68 of the event loop is ultimately made by the program author who uses all
69 the modules and writes the main program. And even there he doesn't have to
70 choose, he can just let L<AnyEvent> choose the best available event loop
71 for him.
72
73 Read more about this in the main documentation of the L<AnyEvent> module.
74
75
76 =head1 Introduction to Event-Based Programming
77
78 So what exactly is programming using events? It quite simply means that
79 instead of your code actively waiting for something, such as the user
80 entering something on STDIN:
81
82 $| = 1; print "enter your name> ";
83
84 my $name = <STDIN>;
85
86 You instead tell your event framework to notify you in the event of some
87 data being available on STDIN, by using a callback mechanism:
88
89 use AnyEvent;
90
91 $| = 1; print "enter your name> ";
92
93 my $name;
94
95 my $wait_for_input = AnyEvent->io (
96    fh   => \*STDIN, # which file handle to check
97    poll => "r",     # which event to wait for ("r"ead data)
98    cb   => sub {    # what callback to execute
99       $name = <STDIN>; # read it
100    }
101 );
102
103 # do something else here
104
105 Looks more complicated, and surely is, but the advantage of using events
106 is that your program can do something else instead of waiting for
107 input. Waiting as in the first example is also called "blocking" because
108 you "block" your process from executing anything else while you do so.
109
110 The second example avoids blocking, by only registering interest in a read
111 event, which is fast and doesn't block your process. Only when read data
112 is available will the callback be called, which can then proceed to read
113 the data.
114
115 The "interest" is represented by an object returned by C<< AnyEvent->io
116 >> called a "watcher" object - called like that because it "watches" your
117 file handle (or other event sources) for the event you are interested in.
118
119 In the example above, we create an I/O watcher by calling the C<<
120 AnyEvent->io >> method. Disinterest in some event is simply expressed by
121 forgetting about the watcher, for example, by C<undef>'ing the variable it
122 is stored in. AnyEvent will automatically clean up the watcher if it is no
123 longer used, much like Perl closes your file handles if you no longer use
124 them anywhere.
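
To make the "forgetting about the watcher" part concrete, here is a minimal sketch. It uses a pipe as a stand-in for STDIN so that no user input is needed, and it borrows a condition variable and a timer (both introduced later in this tutorial) only to give the event loop a moment to run. The names C<$calls> and C<$grace> and the 0.1 second grace period are made up for this illustration.

```perl
use strict;
use warnings;
use AnyEvent;

# a pipe stands in for STDIN so the example needs no user input
pipe (my $reader, my $writer) or die "pipe: $!";

my $calls = 0;
my $done  = AnyEvent->condvar;

my $watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub { $calls++ },
);

# express disinterest: forget the watcher before any data arrives
undef $watcher;

# now make the pipe readable - nobody is watching it anymore
syswrite $writer, "ignored\n";

# let the event loop run briefly (hypothetical 0.1s grace period)
my $grace = AnyEvent->timer (after => 0.1, cb => sub { $done->send });
$done->recv;

print "callback invoked $calls times\n";
```

Because the watcher was destroyed before the data arrived, the callback is never invoked, even though the pipe became readable.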
125
126 =head2 Condition Variables
127
128 However, the above is not a fully working program, and will not work
129 as-is. The reason is that your callback will not be invoked out of the
130 blue; you have to run the event loop. Also, event-based programs sometimes
131 have to block, too: when there is simply nothing else to do and everything
132 is waiting for some event, the process needs to block as well.
133
134 In AnyEvent, this is done using condition variables. Condition variables
135 are named "condition variables" because they represent a condition that is
136 initially false and needs to be fulfilled.
137
138 You can also call them "merge points", "sync points", "rendezvous ports"
139 or even callbacks and many other things (and they are often called like
140 this in other frameworks). The important point is that you can create them
141 freely and later wait for them to become true.
142
143 Condition variables have two sides - one side is the "producer" of the
144 condition (whatever code detects the condition), the other side is the
145 "consumer" (the code that waits for that condition).
146
147 In our example in the previous section, the producer is the event callback
148 and there is no consumer yet - let's change that now:
149
150 use AnyEvent;
151
152 $| = 1; print "enter your name> ";
153
154 my $name;
155
156 my $name_ready = AnyEvent->condvar;
157
158 my $wait_for_input = AnyEvent->io (
159    fh   => \*STDIN,
160    poll => "r",
161    cb   => sub {
162       $name = <STDIN>;
163       $name_ready->send;
164    }
165 );
166
167 # do something else here
168
169 # now wait until the name is available:
170 $name_ready->recv;
171
172 undef $wait_for_input; # watcher no longer needed
173
174 print "your name is $name\n";
175
176 This program creates an AnyEvent condvar by calling the C<<
177 AnyEvent->condvar >> method. It then creates a watcher as usual, but
178 inside the callback it C<send>'s the C<$name_ready> condition variable,
179 which causes anybody waiting on it to continue.
180
181 The "anybody" in this case is the code that follows, which calls C<<
182 $name_ready->recv >>: The producer calls C<send>, the consumer calls
183 C<recv>.
184
185 If there is no C<$name> available yet, then the call to C<<
186 $name_ready->recv >> will halt your program until the condition becomes
187 true.
188
189 As the names C<send> and C<recv> imply, you can actually send and receive
190 data using this, for example, the above code could also be written like
191 this, without an extra variable to store the name in:
192
193 use AnyEvent;
194
195 $| = 1; print "enter your name> ";
196
197 my $name_ready = AnyEvent->condvar;
198
199 my $wait_for_input = AnyEvent->io (
200    fh => \*STDIN, poll => "r",
201    cb => sub { $name_ready->send (scalar <STDIN>) }
202 );
203
204 # do something else here
205
206 # now wait and fetch the name
207 my $name = $name_ready->recv;
208
209 undef $wait_for_input; # watcher no longer needed
210
211 print "your name is $name\n";
212
213 You can pass any number of arguments to C<send>, and every call to
214 C<recv> will return them.
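
A minimal sketch of passing several values through a condition variable; the values themselves are made up for illustration. Since C<send> is called before C<recv> here, C<recv> does not have to wait at all.

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# the producer sends several values at once
$cv->send ("alice", 42, "extra");

# the consumer receives them as a list; send was already called,
# so recv returns immediately
my ($name, $age, @rest) = $cv->recv;

# a second call to recv returns the same values again
my @again = $cv->recv;

print "name=$name age=$age rest=@rest again=@again\n";
```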
215
216 =head2 The "main loop"
217
218 Most event-based frameworks have something called a "main loop" or "event
219 loop run function" or something similar.
220
221 Just like C<recv> in AnyEvent, these functions need to be called
222 eventually so that your event loop has a chance of actually looking for
223 those events you are interested in.
224
225 For example, in a L<Gtk2> program, the above example could also be written
226 like this:
227
228 use Gtk2 -init;
229 use AnyEvent;
230
231 ############################################
232 # create a window and some label
233
234 my $window = new Gtk2::Window "toplevel";
235 $window->add (my $label = new Gtk2::Label "soon replaced by name");
236
237 $window->show_all;
238
239 ############################################
240 # do our AnyEvent stuff
241
242 $| = 1; print "enter your name> ";
243
244 my $name_ready = AnyEvent->condvar;
245
246 my $wait_for_input = AnyEvent->io (
247    fh => \*STDIN, poll => "r",
248    cb => sub {
249       # set the label
250       $label->set_text (scalar <STDIN>);
251       print "enter another name> ";
252    }
253 );
254
255 ############################################
256 # Now enter Gtk2's event loop
257
258 main Gtk2;
259
260 No condition variable anywhere in sight - instead, we just read a line
261 from STDIN and replace the text in the label. In fact, since nobody
262 C<undef>'s C<$wait_for_input> you can enter multiple lines.
263
264 Instead of waiting for a condition variable, the program enters the Gtk2
265 main loop by calling C<< Gtk2->main >>, which will block the program and
266 wait for events to arrive.
267
268 This also shows that AnyEvent is quite flexible - you didn't have anything
269 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
270 worked.
271
272 Admittedly, the example is a bit silly - who would want to read names
273 from standard input in a Gtk+ application? But imagine that instead of
274 doing that, you would make an HTTP request in the background and display
275 its results. In fact, with event-based programming you can make many
276 HTTP requests in parallel in your program and still provide feedback to
277 the user and stay interactive.
278
279 In the next part you will see how to do just that - by implementing an
280 HTTP request, on our own, with the utility modules AnyEvent comes with.
281
282 Before that, however, let's briefly look at how you would write your
283 program using only AnyEvent, without ever calling some other event
284 loop's run function.
285
286 The example using condition variables already did exactly that, and in
287 fact, waiting on a condition variable is the solution:
288
289 my $quit_program = AnyEvent->condvar;
290
291 # create AnyEvent watchers (or not) here
292
293 $quit_program->recv;
294
295 If any of your watcher callbacks decide to quit, they can simply call
296 C<< $quit_program->send >>. Of course, they could also decide not to and
297 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
298 in a long-running daemon program).
299
300 In that case, you can simply use:
301
302 AnyEvent->condvar->recv;
303
304 And this is, in fact, closest to the idea of a main loop run function that
305 AnyEvent offers.
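
As a sketch of this pattern, the following program uses a pipe with two pre-written "commands" as its only event source and quits when it sees the command C<quit>. The pipe, the command names and the C<@seen> array are assumptions made for this illustration; the callback uses C<sysread> rather than C<< <$reader> >> so that no data stays hidden in Perl's own read buffer.

```perl
use strict;
use warnings;
use AnyEvent;

# a pipe with two pre-written commands stands in for real event sources
pipe (my $reader, my $writer) or die "pipe: $!";
syswrite $writer, "hello\nquit\n";

my $quit_program = AnyEvent->condvar;
my @seen;

my $command_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      my $len = sysread $reader, my $chunk, 1024;
      return $quit_program->send unless $len; # EOF or error: quit as well

      for my $command (split /\n/, $chunk) {
         push @seen, $command;
         $quit_program->send if $command eq "quit";
      }
   },
);

# this is the "main loop": serve events until somebody calls send
$quit_program->recv;

print "commands seen: @seen\n";
```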
306
307 =head2 Timers and other event sources
308
309 So far, we have only used I/O watchers. These are useful mainly to find
310 out whether a socket has data to read, or space to write more data. On sane
311 operating systems this also works for console windows/terminals (typically
312 on standard input), serial lines, all sorts of other devices, basically
313 almost everything that has a file descriptor but isn't a file itself. (As
314 usual, "sane" excludes windows - on that platform you would need different
315 functions for all of these, complicating code immensely - think "socket
316 only" on windows).
317
318 However, I/O is not everything - the second most important event source is
319 the clock. For example when doing an HTTP request you might want to time
320 out when the server doesn't answer within some predefined amount of time.
321
322 In AnyEvent, timer event watchers are created by calling the C<<
323 AnyEvent->timer >> method:
324
325 use AnyEvent;
326
327 my $cv = AnyEvent->condvar;
328
329 my $wait_one_and_a_half_seconds = AnyEvent->timer (
330    after => 1.5, # after how many seconds to invoke the cb?
331    cb    => sub { # the callback to invoke
332       $cv->send;
333    },
334 );
335
336 # can do something else here
337
338 # now wait till our time has come
339 $cv->recv;
340
341 Unlike I/O watchers, timers are only interested in the number of seconds
342 they have to wait. When that amount of time has passed, AnyEvent will
343 invoke your callback.
344
345 Unlike I/O watchers, which will call your callback as many times as there
346 is data available, timers are one-shot: after they have "fired" once and
347 invoked your callback, they are dead and no longer do anything.
348
349 To get a repeating timer, such as a timer firing roughly once per second,
350 you have to recreate it:
351
352 use AnyEvent;
353
354 my $time_watcher;
355
356 sub once_per_second {
357    print "tick\n";
358
359    # (re-)create the watcher
360    $time_watcher = AnyEvent->timer (
361       after => 1,
362       cb    => \&once_per_second,
363    );
364 }
365
366 # now start the timer
367 once_per_second;
368
369 Having to recreate your timer is a restriction put on AnyEvent that is
370 present in most event libraries it uses. It is so annoying that some
371 future version might work around this limitation, but right now, it's the
372 only way to do repeating timers.
373
374 Fortunately most timers aren't really repeating but specify timeouts of
375 some sort.
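
A timeout of this kind can be sketched by letting an I/O watcher and a timer race for the same condition variable. In the sketch below, a pipe that never receives any data plays the role of a server that does not answer; the 0.2 second timeout and the C<"data">/C<"timeout"> tags are made up for this illustration.

```perl
use strict;
use warnings;
use AnyEvent;

# a pipe that never receives data stands in for a silent server
pipe (my $reader, my $writer) or die "pipe: $!";

my $result = AnyEvent->condvar;

my $read_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      sysread $reader, my $data, 1024;
      $result->send ("data", $data);
   },
);

my $timeout_watcher = AnyEvent->timer (
   after => 0.2, # hypothetical timeout in seconds
   cb    => sub { $result->send ("timeout") },
);

# wait for whichever event comes first
my ($outcome, $data) = $result->recv;

# neither watcher is needed anymore
undef $read_watcher;
undef $timeout_watcher;

print "outcome: $outcome\n";
```

Since nothing is ever written to the pipe, the timer wins the race here. Destroying both watchers afterwards makes sure the loser cannot fire later.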
376
377 =head3 More esoteric sources
378
379 AnyEvent also has some other, more esoteric event sources you can tap
380 into: signal and child watchers.
381
382 Signal watchers can be used to wait for "signal events", which simply
383 means your process got sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
384
385 Child watchers wait for a child process to exit. They are useful when
386 you fork a separate process and need to know when it exits, but do not
387 want to wait for that by blocking.
388
389 Both watcher types are described in detail in the main L<AnyEvent> manual
390 page.
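
As a rough sketch of both watcher types (assuming a Unix-like system with C<fork> and C<SIGUSR1>), the following program first sends a signal to itself and then waits for a child process that exits immediately. The exit status 3 and all variable names are arbitrary choices for this illustration.

```perl
use strict;
use warnings;
use AnyEvent;

# --- signal watcher: wait for SIGUSR1 ---
my $got_signal = AnyEvent->condvar;

my $signal_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

kill USR1 => $$; # send the signal to ourselves

my $signal_name = $got_signal->recv;

# --- child watcher: wait for a forked process to exit ---
my $child_done = AnyEvent->condvar;

my $pid = fork;
defined $pid or die "fork: $!";
exit 3 unless $pid; # the child exits immediately with status 3

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      $child_done->send ($status >> 8); # extract the exit code
   },
);

my $exit_code = $child_done->recv;

print "got signal $signal_name, child exited with code $exit_code\n";
```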
391
392
393 =head1 Network programming and AnyEvent
394
395 So far you have seen how to register event watchers and handle events.
396
397 This is a great foundation to write network clients and servers, and might be
398 all that your module (or program) ever requires, but writing your own I/O
399 buffering again and again becomes tedious, not to mention that it attracts
400 errors.
401
402 While the core L<AnyEvent> module is still small and self-contained,
403 the distribution comes with some very useful utility modules such as
404 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
405 make your life as a non-blocking network programmer a lot easier.
406
407 Here is a quick overview of these three modules:
408
409 =head2 L<AnyEvent::DNS>
410
411 This module allows fully asynchronous DNS resolution. It is used mainly by
412 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
413 a great way to do other DNS resolution tasks, such as reverse lookups of
414 IP addresses for log files.
415
416 =head2 L<AnyEvent::Handle>
417
418 This module handles non-blocking IO on file handles in an event based
419 manner. It provides a wrapper object around your file handle that provides
420 queueing and buffering of incoming and outgoing data for you.
421
422 It also implements the most common data formats, such as text lines, or
423 fixed and variable-width data blocks.
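
To give a first impression of the buffering, here is a small sketch that reads two text lines through an L<AnyEvent::Handle> object. A pipe stands in for a network connection, and the second line is deliberately written in two pieces (the second piece after a made-up delay of 0.1 seconds) to show that the read callback only sees complete lines.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;

# a pipe stands in for a network connection
pipe (my $reader, my $writer) or die "pipe: $!";

my @lines;
my $done = AnyEvent->condvar;

my $handle; $handle = AnyEvent::Handle->new (
   fh       => $reader,
   on_error => sub {
      undef $handle;
      $done->send;
   },
);

# queue two line reads; each callback gets the line without its line ending
$handle->push_read (line => sub {
   my ($hdl, $line) = @_;
   push @lines, $line;
});
$handle->push_read (line => sub {
   my ($hdl, $line) = @_;
   push @lines, $line;
   $done->send;
});

# the second line arrives in two pieces
syswrite $writer, "first line\nsecond ";
my $later = AnyEvent->timer (
   after => 0.1,
   cb    => sub { syswrite $writer, "line\n" },
);

$done->recv;
undef $handle;

print join ("|", @lines), "\n";
```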
424
425 =head2 L<AnyEvent::Socket>
426
427 This module provides you with functions that handle socket creation
428 and IP address magic. The two main functions are C<tcp_connect> and
429 C<tcp_server>. The former will connect a (streaming) socket to an internet
430 host for you and the latter will make a server socket for you, to accept
431 connections.
432
433 This module also comes with transparent IPv6 support, which means: if you
434 write your programs with this module, you will be IPv6 ready without doing
435 anything special.
436
437 It also works around a lot of portability quirks (especially on the
438 windows platform), which makes it even easier to write your programs in a
439 portable way (did you know that windows uses different error codes for all
440 socket functions and that Perl does not know about these? That "Unknown
441 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
442 successful? That unsuccessful TCP connects might never be reported back
443 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
444 ignored instead of being in progress? AnyEvent::Socket works around all of
445 these Windows/Perl bugs for you).
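
The two main functions can be tried out together in one process. The following sketch starts a server on the loopback address, lets the operating system choose a free port, and then connects to it. The greeting text and all variable names are made up, and the example assumes that local TCP connections to 127.0.0.1 are possible.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $port;

# the returned guard keeps the server alive for as long as it exists
my $server = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;
   syswrite $fh, "hello from the server\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   my ($listen_fh, $this_host, $this_port) = @_;
   $port = $this_port; # remember the port the OS has chosen
   0                   # 0 selects the default listen queue length
};

my $greeting = AnyEvent->condvar;

tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $greeting->send;

   my $buffer = "";
   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         my $len = sysread $fh, $buffer, 1024, length $buffer;
         return if $len; # got data, more might follow

         # end of file (or an error): report what we have
         undef $read_watcher;
         $greeting->send ($buffer);
      },
   );
};

my $text = $greeting->recv;

print "received: $text";
```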
446
447 =head2 Implementing a parallel finger client with non-blocking connects
448 and AnyEvent::Socket
449
450 The finger protocol is one of the simplest protocols in use on the
451 internet. Or in use in the past, as almost nobody uses it anymore.
452
453 It works by connecting to the finger port on another host, writing a
454 single line with a user name and then reading the finger response, as
455 specified by that user. OK, RFC 1288 specifies a vastly more complex
456 protocol, but it basically boils down to this:
457
458 # telnet idsoftware.com finger
459 Trying 192.246.40.37...
460 Connected to idsoftware.com (192.246.40.37).
461 Escape character is '^]'.
462 johnc
463 Welcome to id Software's Finger Service V1.5!
464
465 [...]
466 Now on the web:
467 [...]
468
469 Connection closed by foreign host.
470
471 "Now on the web..." yeah, I<was> used indeed, but at least the finger
472 daemon still works, so let's write a little AnyEvent function that makes a
473 finger request:
474
475 use AnyEvent;
476 use AnyEvent::Socket;
477
478 sub finger($$) {
479    my ($user, $host) = @_;
480
481    # use a condvar to return results
482    my $cv = AnyEvent->condvar;
483
484    # first, connect to the host
485    tcp_connect $host, "finger", sub {
486       # the callback receives the socket handle - or nothing
487       my ($fh) = @_
488          or return $cv->send;
489
490       # now write the username
491       syswrite $fh, "$user\015\012";
492
493       my $response;
494
495       # register a read watcher
496       my $read_watcher; $read_watcher = AnyEvent->io (
497          fh   => $fh,
498          poll => "r",
499          cb   => sub {
500             my $len = sysread $fh, $response, 1024, length $response;
501
502             if ($len <= 0) {
503                # we are done, or an error occurred, let's ignore the latter
504                undef $read_watcher; # no longer interested
505                $cv->send ($response); # send results
506             }
507          },
508       );
509    };
510
511    # pass $cv to the caller
512    $cv
513 }
514
515 That's a mouthful! Let's dissect this function a bit, first the overall
516 function and execution flow:
517
518 sub finger($$) {
519    my ($user, $host) = @_;
520
521    # use a condvar to return results
522    my $cv = AnyEvent->condvar;
523
524    # first, connect to the host
525    tcp_connect $host, "finger", sub {
526       ...
527    };
528
529    $cv
530 }
531
532 This isn't too complicated, just a function with two parameters, that
533 creates a condition variable, returns it, and while it does that,
534 initiates a TCP connect to C<$host>. The condition variable will be used
535 by the caller to receive the finger response, but one could equally well
536 pass a third argument, a callback, to the function.
537
538 Since we are programming event'ish, we do not wait for the connect to
539 finish - it could block the program for a minute or longer!
540
541 Instead, we pass the callback it should invoke when the connect is done to
542 C<tcp_connect>. If it is successful, that callback gets called with the
543 socket handle as first argument, otherwise, nothing will be passed to our
544 callback. The important point is that it will always be called as soon as
545 the outcome of the TCP connect is known.
546
547 This style of programming is also called "continuation style": the
548 "continuation" is simply the way the program continues - normally, a
549 program continues at the next line after some statement (the exception
550 is loops or things like C<return>). When we are interested in events,
551 however, we instead specify the "continuation" of our program by passing a
552 closure, which makes that closure the "continuation" of the program. The
553 C<tcp_connect> call is like saying "return now, and when the connection is
554 established or it failed, continue there".
555
556 Now let's look at the callback/closure in more detail:
557
558 # the callback receives the socket handle - or nothing
559 my ($fh) = @_
560    or return $cv->send;
561
562 The first thing the callback does is indeed save the socket handle in
563 C<$fh>. When there was an error (no arguments), then our instinct as
564 expert Perl programmers would tell us to C<die>:
565
566 my ($fh) = @_
567    or die "$host: $!";
568
569 While this would give good feedback to the user (if he happens to watch
570 standard error), our program would probably stop working here, as we never
571 report the results to anybody, certainly not the caller of our C<finger>
572 function, and most event loops continue even after a C<die>!
573
574 This is why we instead C<return>, but also call C<< $cv->send >> without
575 any arguments to signal to the condvar consumer that something bad has
576 happened. The return value of C<< $cv->send >> is irrelevant, as is the
577 return value of our callback. The return statement is simply used for the
578 side effect of, well, returning immediately from the callback. Checking
579 for errors and handling them this way is very common, which is why this
580 compact idiom is so handy.
581
582 As the next step in the finger protocol, we send the username to the
583 finger daemon on the other side of our connection:
584
585 syswrite $fh, "$user\015\012";
586
587 Note that this isn't 100% clean socket programming - the socket could,
588 for whatever reasons, not accept our data. When writing a small amount
589 of data like in this example it doesn't matter, as a socket buffer is
590 almost always big enough for a mere "username", but for real-world
591 cases you might need to implement some kind of write buffering - or use
592 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
593 next section.
594
595 What we I<do> have to do is to implement our own read buffer - the response
596 data could arrive late or in multiple chunks, and we cannot just wait for
597 it (event-based programming, you know?).
598
599 To do that, we register a read watcher on the socket which waits for data:
600
601 my $read_watcher; $read_watcher = AnyEvent->io (
602    fh   => $fh,
603    poll => "r",
604
605 There is a trick here, however: the read watcher isn't stored in a global
606 variable, but in a local one - if the callback returns, it would normally
607 destroy the variable and its contents, which would in turn unregister our
608 watcher.
609
610 To avoid that, we refer to the variable inside the watcher callback (which
611 C<undef>ines it once it is done). This means that, when the C<tcp_connect>
612 callback returns, perl thinks (quite correctly) that the read watcher is
613 still in use - namely in the callback.
614
615 The trick, however, is that instead of:
616
617 my $read_watcher = AnyEvent->io (...
618
619 The program does:
620
621 my $read_watcher; $read_watcher = AnyEvent->io (...
622
623 The reason for this is a quirk in the way Perl works: variable names
624 declared with C<my> are only visible in the I<next> statement. If the
625 whole C<< AnyEvent->io >> call, including the callback, were done in
626 a single statement, the callback could not refer to the C<$read_watcher>
627 variable to undefine it, so it is done in two statements.
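
This quirk can be demonstrated without any event loop at all. The sketch below compiles both variants with a string C<eval> under C<use strict>: in the single-statement form, the closure body is compiled before C<$watcher> is in scope, so Perl treats it as an undeclared global and refuses to compile it. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# one statement: inside the closure, $watcher is not yet declared
my $single = eval q{
   my $watcher = sub { undef $watcher };
   1
};
my $single_error = $@;

# two statements: $watcher is already declared when the closure is compiled
my $double = eval q{
   my $watcher; $watcher = sub { undef $watcher };
   undef $watcher; # break the reference cycle again
   1
};

print "single statement: ", ($single ? "compiled" : "failed: $single_error"), "\n";
print "two statements: ",   ($double ? "compiled" : "failed: $@"), "\n";
```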
628
629 Whether you'd want to format it like this is of course a matter of style;
630 this way emphasizes that the declaration and assignment really are one
631 logical statement.
632
633 The callback itself calls C<sysread> as many times as necessary, until
634 C<sysread> returns either an error or end-of-file:
635
636 cb => sub {
637    my $len = sysread $fh, $response, 1024, length $response;
638
639    if ($len <= 0) {
640
641 Note that C<sysread> has the ability to append data it reads to a scalar,
642 by specifying an offset, which is what we make good use of in this
643 example.
644
645 When C<sysread> indicates we are done, the callback C<undef>ines
646 the watcher and then C<send>'s the response data to the condition
647 variable. All this has the following effects:
648
649 Undefining the watcher destroys it, as our callback was the only one still
650 having a reference to it. When the watcher gets destroyed, it destroys the
651 callback, which in turn means the C<$fh> handle is no longer used, so that
652 gets destroyed as well. The result is that all resources will be nicely
653 cleaned up by perl for us.
654
655 =head3 Using the finger client
656
657 Now, we could probably write the same finger client in a simpler way if
658 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
659 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
660
661 But the main advantage is that we can not only run this finger function in
662 the background, we even can run multiple sessions in parallel, like this:
663
664 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
665 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
666 my $f3 = finger "johnc", "idsoftware.com"; # finger john
667
668 print "trouble tickets:\n", $f1->recv, "\n";
669 print "trouble ticket #1736:\n", $f2->recv, "\n";
670 print "john carmack's finger file: ", $f3->recv, "\n";
671
672 It doesn't look like it, but in fact all three requests run in
673 parallel. The code waits for the first finger request to finish first, but
674 that doesn't keep it from executing them in parallel: when the first C<recv>
675 call sees that the data isn't ready yet, it serves events for all three
676 requests automatically, until the first request has finished.
677
678 The second C<recv> call might either find the data is already there, or it
679 will continue handling events until that is the case, and so on.
680
681 By taking advantage of network latencies, which allow us to serve other
682 requests and events while we wait for an event on one socket, the overall
683 time to do these three requests will be greatly reduced; typically all
684 three are done in the same time as the slowest of them would need to finish.
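
The effect can be measured without any network at all. In the following sketch, a hypothetical C<slow_request> function stands in for C<finger>: it returns a condition variable that becomes ready after a given delay. The delays of 0.3, 0.1 and 0.2 seconds are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes qw(time);

# hypothetical stand-in for finger: "responds" after $delay seconds
sub slow_request {
   my ($delay) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => $delay,
      cb    => sub {
         undef $timer; # no longer needed
         $cv->send ("done after ${delay}s");
      },
   );

   $cv
}

my $start = time;

# start all three "requests" - none of these calls blocks
my $r1 = slow_request (0.3);
my $r2 = slow_request (0.1);
my $r3 = slow_request (0.2);

# collect the results in order; the first recv serves events for all three
my @results = ($r1->recv, $r2->recv, $r3->recv);
my $elapsed = time - $start;

printf "%s\nelapsed: %.2fs\n", join (", ", @results), $elapsed;
```

The elapsed time should be close to the longest single delay (0.3 seconds) rather than the sum of all three (0.6 seconds).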
685
686 By the way, you do not actually have to wait in the C<recv> method on an
687 AnyEvent condition variable - after all, waiting is evil - you can also
688 register a callback:
689
690 $cv->cb (sub {
691    my $response = shift->recv;
692    # ...
693 });
694
695 The callback will only be invoked when C<send> was called. In fact,
696 instead of returning a condition variable you could also pass a third
697 parameter to your finger function, the callback to invoke with the
698 response:
699
700 sub finger($$$) {
701    my ($user, $host, $cb) = @_;
702
703 How you implement it is a matter of taste - if you expect your function to
704 be used mainly in an event-based program you would normally prefer to pass
705 a callback directly. If you write a module and expect your users to use
706 it "synchronously" often (for example, a simple http-get script would not
707 really care much for events), then you would use a condition variable and
708 tell them "simply ->recv the data".
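
The two styles are easy to convert into each other. As a sketch, here is a hypothetical condvar-returning function C<delayed_answer> (a timer stands in for real network activity) together with a thin wrapper, C<delayed_answer_cb>, that offers the callback style on top of it by using C<< $cv->cb >>. Both function names and the answer 42 are made up.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical condvar-style function: the result arrives a bit later
sub delayed_answer {
   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => 0.05,
      cb    => sub {
         undef $timer;
         $cv->send (42);
      },
   );

   $cv
}

# callback-style wrapper around the condvar-style function
sub delayed_answer_cb {
   my ($cb) = @_;

   my $cv = delayed_answer ();
   $cv->cb (sub { $cb->(shift->recv) });
}

# synchronous use: simply ->recv the data
my $sync_answer = delayed_answer ()->recv;

# event-based use: pass a callback
my $finished = AnyEvent->condvar;
my $async_answer;

delayed_answer_cb (sub {
   $async_answer = shift;
   $finished->send;
});

$finished->recv;

print "sync: $sync_answer, async: $async_answer\n";
```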
709
710 =head3 Problems with the implementation and how to fix them
711
712 To make this example more real-world-ready, we would not only implement
713 some write buffering (for the paranoid), but we would also have to handle
714 timeouts and maybe protocol errors.
715
716 Doing this quickly gets unwieldy, which is why we introduce
717 L<AnyEvent::Handle> in the next section, which takes care of all these
718 details for you and lets you concentrate on the actual protocol.
719
720
721 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
722
723 The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
724 see what it really offers.
725
726 As finger is such a simple protocol, let's try something slightly more
727 complicated: HTTP/1.0.
728
729 An HTTP GET request works by sending a single request line that indicates
730 what you want the server to do and the URI you want to act on, followed
731 by as many "header" lines (C<Header: data>, same as e-mail headers) as
732 required for the request, ended by an empty line.
733
734 The response is formatted very similarly, first a line with the response
735 status, then again as many header lines as required, then an empty line,
736 followed by any data that the server might send.
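
The following sketch shows this format in plain Perl, without any networking: it builds a request and takes a canned response apart again. The C<Host> header and the response text are made-up examples, and real-world HTTP parsing has to cope with many more cases than this.

```perl
use strict;
use warnings;

# build a request: request line, header lines, then an empty line
my $request = join "\015\012",
   "GET /test HTTP/1.0",
   "Host: www.example.com", # hypothetical header, for illustration only
   "", "";

# a canned response standing in for what a server might send
my $raw = "HTTP/1.0 404 Not Found\015\012"
        . "Content-Type: text/html; charset=UTF-8\015\012"
        . "\015\012"
        . "<html><head>";

# the first empty line separates the status line and headers from the body
my ($head, $body) = split /\015\012\015\012/, $raw, 2;
my ($status_line, @header_lines) = split /\015\012/, $head;

# the status line consists of protocol version, status code and reason
my ($version, $code, $reason) = split / /, $status_line, 3;

# header lines look like "Name: value"
my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @header_lines;

print "status $code ($reason), type $header{'content-type'}\n";
print "body starts with: $body\n";
```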
737
738 Again, let's try it out with C<telnet> (I condensed the output a bit - if
739 you want to see the full response, do it yourself).
740
741 # telnet www.google.com 80
742 Trying 209.85.135.99...
743 Connected to www.google.com (209.85.135.99).
744 Escape character is '^]'.
745 GET /test HTTP/1.0
746
747 HTTP/1.0 404 Not Found
748 Date: Mon, 02 Jun 2008 07:05:54 GMT
749 Content-Type: text/html; charset=UTF-8
750
751 <html><head>
752 [...]
753 Connection closed by foreign host.
754
755 The C<GET ...> and the empty line were entered manually, the rest of the
756 telnet output is Google's response, in this case a C<404 Not Found> one.
757
758 So, here is how you would do it with C<AnyEvent::Handle>:
759
760 sub http_get {
761    my ($host, $uri, $cb) = @_;
762
763    tcp_connect $host, "http", sub {
764       my ($fh) = @_
765          or return $cb->("HTTP/1.0 500 $!");
766
767       # store results here
768       my ($response, $header, $body);
769
770       my $handle; $handle = new AnyEvent::Handle
771          fh       => $fh,
772          on_error => sub {
773             undef $handle;
774             $cb->("HTTP/1.0 500 $!");
775          },
776          on_eof   => sub {
777             undef $handle; # keep it alive till eof
778             $cb->($response, $header, $body);
779          };
780
781       $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
782
783       # now fetch response status line
784       $handle->push_read (line => sub {
785          my ($handle, $line) = @_;
786          $response = $line;
787       });
788
789       # then the headers
790       $handle->push_read (line => "\015\012\015\012", sub {
791          my ($handle, $line) = @_;
792          $header = $line;
793       });
794
795       # and finally handle any remaining data as body
796       $handle->on_read (sub {
797          $body .= $_[0]->rbuf;
798          $_[0]->rbuf = "";
799       });
800    };
801 }
802
803 And now let's go through it step by step. First, as usual, the overall
804 C<http_get> function structure:
805
806 sub http_get {
807    my ($host, $uri, $cb) = @_;
808
809    tcp_connect $host, "http", sub {
810       ...
811    };
812 }
813
814 Unlike in the finger example, this time the caller has to pass a callback
815 to C<http_get>. Also, instead of passing a URL as one would expect, the
816 caller has to provide the hostname and URI - normally you would use the
817 C<URI> module to parse a URL and separate it into those parts, but that is
818 left to the inspired reader :)
819
Since everything else is left to the caller, all C<http_get> does is
initiate the connection with C<tcp_connect> and leave everything else to
its callback.

The first thing the callback does is check for connection errors
(returning early if there was one) and declare some variables:

   my ($fh) = @_
      or return $cb->("HTTP/1.0 500 $!");

   my ($response, $header, $body);

Instead of having an extra mechanism to signal errors, connection errors
are signalled by crafting a special "response status line", like this:

   HTTP/1.0 500 Connection refused

This means the caller cannot distinguish (easily) between
locally-generated errors and server errors, but it simplifies error
handling for the caller a lot.

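On the caller's side, the status code can then be picked out of the
response line with a small regex, no matter where the line came from. This
is only a minimal sketch: C<parse_status_line> is a hypothetical helper
made up for this example, not something AnyEvent provides.

```perl
use strict;
use warnings;

# Hypothetical helper: split "HTTP/1.0 500 Connection refused" into
# protocol version, numeric status code and reason text.
sub parse_status_line {
   my ($line) = @_;

   my ($version, $code, $reason) = $line =~ m{^HTTP/(\d+\.\d+) (\d{3}) ?(.*)$}
      or return;

   return ($version, $code, $reason);
}

my ($version, $code, $reason)
   = parse_status_line ("HTTP/1.0 500 Connection refused");

# locally-generated and server-generated errors look the same here
print $code >= 400 ? "request failed: $reason\n" : "request succeeded\n";
```

A caller of C<http_get> could apply the same check to the C<$response> it
receives in its callback.
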
The next step finally involves L<AnyEvent::Handle>, namely creating the
handle object:

   my $handle; $handle = new AnyEvent::Handle
      fh       => $fh,
      on_error => sub {
         undef $handle;
         $cb->("HTTP/1.0 500 $!");
      },
      on_eof   => sub {
         undef $handle; # keep it alive till eof
         $cb->($response, $header, $body);
      };

The constructor expects a file handle, which gets passed via the C<fh>
argument.

The remaining two argument pairs specify two callbacks to be called on
any errors (C<on_error>) and in the case of a normal connection close
(C<on_eof>).

In the first case, we C<undef>ine the handle object and pass the error to
the callback provided by the caller - done.

In the second case we assume everything went fine and pass the results
gobbled up so far to the caller-provided callback. This is not quite
perfect, as when the server "cleanly" closes the connection in the middle
of sending headers we might wrongly report this as an "OK" to the caller,
but then, HTTP doesn't support a perfect mechanism that would detect such
problems in all cases, so we don't bother either.

=head3 The write queue

The next line sends the actual request:

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

No headers will be sent (this is fine for simple requests), so the whole
request is just a single line followed by an empty line to signal the end
of the headers to the server.

The more interesting question is why the method is called C<push_write>
and not just C<write>. The reason is that you can I<always> add some write
data without blocking, and to do this, AnyEvent::Handle needs some write
queue internally - and C<push_write> simply pushes some data at the end of
that queue, just like Perl's C<push> pushes data at the end of an array.

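The idea of such a queue can be modelled in a few lines of plain Perl. The
following is a toy illustration only: C<toy_push_write> and C<toy_drain>
are names invented here, and the "socket" that accepts a limited number of
bytes is simulated. It says nothing about how AnyEvent::Handle implements
its queue internally.

```perl
use strict;
use warnings;

my @wqueue;   # pending chunks, oldest first

# never blocks: just remember the data for later
sub toy_push_write { push @wqueue, @_ }

# pretend the socket accepted at most $max bytes, as a non-blocking
# write might, and return what was "sent"
sub toy_drain {
   my ($max) = @_;
   my $sent = "";

   while (@wqueue && length ($sent) < $max) {
      my $room = $max - length ($sent);

      # cut up to $room bytes off the front of the oldest chunk
      $sent .= substr ($wqueue[0], 0, $room, "");

      # drop the chunk once it has been sent completely
      shift @wqueue unless length ($wqueue[0]);
   }

   $sent
}

toy_push_write ("GET / HTTP/1.0\015\012");
toy_push_write ("\015\012");

my $first  = toy_drain (5);     # only part of the first chunk fits
my $second = toy_drain (100);   # the remainder of both chunks
```

Pushing always succeeds immediately; how fast the data actually leaves is
a separate matter that the queue hides from the caller.
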
The deeper reason is that at some point in the future, there might
be C<unshift_write> as well, and in any case, we will shortly meet
C<push_read> and C<unshift_read>, and it's usually easiest if all those
functions have some symmetry in their name.

If C<push_write> is called with more than one argument, then you can even
do I<formatted> I/O, which simply means your data will be transformed in
some ways. For example, this would JSON-encode your data before pushing it
to the write queue:

   $handle->push_write (json => [1, 2, 3]);

Apart from that, this pretty much summarises the write queue; there is
little else to it.

Reading the response is far more interesting:

=head3 The read queue

The response consists of three parts: a single line of response status, a
single paragraph of headers ended by an empty line, and the response body,
which is simply the remaining data on that connection.

For the first two, we push two read requests onto the read queue:

   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });

While one can simply push a single callback to the queue, I<formatted> I/O
really comes to our advantage here, as there is a ready-made "read line"
read type. The first read expects a single line, ended by C<\015\012> (the
standard end-of-line marker in internet protocols).

The second "line" is actually a single paragraph - instead of reading it
line by line we tell C<push_read> that the end-of-line marker is really
C<\015\012\015\012>, which is an empty line. The result is that the whole
header paragraph will be treated as a single line and read. The word
"line" is interpreted very freely, much like Perl itself does it.

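Perl's own notion of a line is just as flexible: the input record
separator C<$/> decides where a line ends. The following stand-alone
sketch does not use AnyEvent at all. It reads a canned response (made up
for this example) from an in-memory file handle, using the same two
end-of-line markers as the two C<push_read> calls:

```perl
use strict;
use warnings;

# A canned HTTP response, standing in for data arriving from a socket.
my $raw = join "\015\012",
   "HTTP/1.0 404 Not Found",
   "Date: Mon, 02 Jun 2008 07:05:54 GMT",
   "Content-Type: text/html; charset=UTF-8",
   "",
   "<html><head>";

open my $fh, "<", \$raw
   or die "cannot open in-memory file: $!";

my ($response, $header, $body);

{
   # a "line" ends at the first CRLF
   local $/ = "\015\012";
   chomp ($response = <$fh>);
}

{
   # now a "line" ends at an empty line, so it spans the header paragraph
   local $/ = "\015\012\015\012";
   chomp ($header = <$fh>);
}

{
   # no separator at all: everything that is left is the body
   local $/;
   $body = <$fh>;
}

print "status: $response\n";
```

In both cases, changing what counts as the end of a line is all it takes
to read a whole paragraph in one go.
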
Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.

There is, however, no read type for "the remaining data". For that, we
install our own C<on_read> callback:

   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });

This callback is invoked every time data arrives and the read queue is
empty - which in this example will only be the case when both response and
header have been read. The C<on_read> callback could actually have been
specified when constructing the object, but doing it this way preserves
logical ordering.

The read callback simply adds the current read buffer to the C<$body>
variable and, most importantly, I<empties> it by assigning the empty string
to it.

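Why emptying the buffer matters can be shown with a small simulation. It
uses plain scalars instead of a real handle and assumes, as a
simplification, that newly arrived data is appended to whatever is still
in the read buffer; the chunk contents are made up.

```perl
use strict;
use warnings;

my @chunks = ("<html>", "<head>", "</head>");

# variant 1: consume the buffer, as the on_read callback in the text does
my ($rbuf, $body) = ("", "");
for my $chunk (@chunks) {
   $rbuf .= $chunk;      # new data is appended to the read buffer
   $body .= $rbuf;       # the callback copies the buffer ...
   $rbuf  = "";          # ... and empties it
}

# variant 2: forget to empty the buffer
my ($rbuf2, $body2) = ("", "");
for my $chunk (@chunks) {
   $rbuf2 .= $chunk;
   $body2 .= $rbuf2;     # old data is still there and gets copied again
}

printf "consumed: %d bytes, not consumed: %d bytes\n",
   length ($body), length ($body2);
```

Only the first variant ends up with exactly the data that was received.
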
After AnyEvent::Handle has been so instructed, it will now handle incoming
data according to these instructions - if all goes well, the callback will
be invoked with the response data, if not, it will get an error.

In general, you get pipelining very easily with AnyEvent::Handle: If
you have a protocol with a request/response structure, your request
methods/functions will all look like this (simplified):

   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }

=head3 Using it

And here is how you would use it:

   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };

And of course, you can run as many of these requests in parallel as you
want (and your memory supports).

=head3 HTTPS

Now, as promised, let's implement the same thing for HTTPS, or more
correctly, let's change our C<http_get> function into a function that
speaks HTTPS instead.

HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
B<S>ecurity is the official name for what most people refer to as C<SSL>)
that contains standard HTTP protocol exchanges. The other difference to
HTTP is that it uses port C<443> instead of port C<80>.

To implement these two differences we need two tiny changes. First, in
the C<tcp_connect> call we replace C<http> by C<https>:

   tcp_connect $host, "https", sub { ...

The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
C<tls> parameter to the call to C<AnyEvent::Handle::new>:

   tls => "connect",

Specifying C<tls> enables TLS, and the argument specifies whether
AnyEvent::Handle is the server side ("accept") or the client side
("connect") for the TLS connection, as unlike TCP, there is a clear
server/client relationship in TLS.

That's all.

Of course, all this should be handled transparently by C<http_get> after
parsing the URL. See the part about exercising your inspiration earlier in
this document.

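As a possible starting point for that exercise, the following sketch
derives the two things that differ between HTTP and HTTPS from a URL.
C<parse_http_url> is a hypothetical helper invented here. To stay
self-contained it uses a simple regex instead of the C<URI> module, and it
deliberately does not handle ports, user names or other URL features.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent): split a URL into the pieces
# that tcp_connect and the AnyEvent::Handle constructor need.
sub parse_http_url {
   my ($url) = @_;

   # URLs with an explicit port or user info are rejected by this regex
   my ($scheme, $host, $uri) = $url =~ m{^(https?)://([^/:@]+)(/.*)?$}i
      or return;

   $scheme = lc $scheme;

   return (
      host    => $host,
      service => $scheme,                     # "http" or "https"
      uri     => defined $uri ? $uri : "/",   # default to the root
      handle  => $scheme eq "https" ? [tls => "connect"] : [],
   );
}

my %secure = parse_http_url ("https://www.google.com/test");
my %plain  = parse_http_url ("http://www.google.com");

print "$secure{service} $secure{host} $secure{uri} @{ $secure{handle} }\n";
```

The C<service> value would be passed to C<tcp_connect>, and the C<handle>
list would be added to the arguments of the C<AnyEvent::Handle>
constructor.
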
=head3 The read queue - revisited

HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.

For example, in SMTP, you normally get a single response line:

   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>

But SMTP also supports multi-line responses:

   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours

To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.

This is useful for this situation: You push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader,
like this:

   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);

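To see why the continuation reader has to go to the I<front> of the queue,
here is a toy model of a read queue. C<ToyHandle> is a hypothetical
stand-in written for this illustration: it only understands
CRLF-terminated line reads, data is fed to it by hand, and it is not how
AnyEvent::Handle is implemented.

```perl
use strict;
use warnings;

package ToyHandle;   # hypothetical stand-in, NOT AnyEvent::Handle

sub new { bless { rbuf => "", queue => [] }, shift }

# both methods are called as (line => $cb), mimicking the calls in the text
sub push_read    { my ($self, undef, $cb) = @_; push    @{ $self->{queue} }, $cb }
sub unshift_read { my ($self, undef, $cb) = @_; unshift @{ $self->{queue} }, $cb }

# simulate data arriving from the network
sub feed {
   my ($self, $data) = @_;
   $self->{rbuf} .= $data;

   # hand each complete line to the reader at the front of the queue
   while (@{ $self->{queue} } && $self->{rbuf} =~ s/^([^\015\012]*)\015\012//) {
      my $line = $1;
      my $cb   = shift @{ $self->{queue} };
      $cb->($self, $line);
   }
}

package main;

my $handle = ToyHandle->new;
my ($greeting, $next) = ("", "");

my $read_response; $read_response = sub {
   my ($handle, $line) = @_;

   $greeting .= "$line\n";

   if ($line =~ /^...-/) {
      # more lines belong to this response: go to the front again
      $handle->unshift_read (line => $read_response);
   } else {
      undef $read_response;
   }
};

# two pipelined requests, so two readers are queued up front
$handle->push_read (line => $read_response);
$handle->push_read (line => sub { $next = $_[1] });

$handle->feed ("220-mail.example.net\015\012220-hey guys\015\012");
$handle->feed ("220 my response is longer than yours\015\012250 ok\015\012");

print $greeting, "next response: $next\n";
```

With C<push_read> in place of C<unshift_read>, the second C<220-> line
would have been handed to the reader that is waiting for the next response
instead.
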
This recipe can be used for all similar parsing problems, for example in
NNTP, the response code to some commands indicates that more data will be
sent:

   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body);
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });

=head3 Your own read queue handler

Sometimes, your protocol doesn't play nice and uses lines or chunks of
data that none of the ready-made read types can handle, in which case you
have to implement your own read parser.

To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.

To implement this, you would C<push_read> (or C<unshift_read>) just a
single code reference.

This code reference will then be called each time there is (new) data
available in the read buffer, and is expected to either eat/consume some
of that data (and return true) or to return false to indicate that it
wants to be called again.

If the code reference returns true, then it will be removed from the read
queue, otherwise it stays in front of it.

The example above could be coded like this:

   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });

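The return-value protocol can be tried out without a network connection.
In this sketch a plain scalar stands in for the read buffer and
C<try_parse> (a name made up here) plays the role of the code reference.
Calling it by hand after each chunk is a simplified assumption about when
AnyEvent::Handle would call it.

```perl
use strict;
use warnings;

my $rbuf = "";   # stands in for $handle->{rbuf}
my @found;       # stands in for whatever $finish would do

sub try_parse {
   # same regex as above: even number of characters, then ":"
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   # true: this reader is done
   1
}

$rbuf .= "abc";             # first chunk arrives, no colon yet
my $first = try_parse ();   # false: the reader stays in the queue

$rbuf .= "d:rest";          # second chunk completes the match
my $second = try_parse ();  # true: the reader would be removed

print "found '@found', left in buffer: '$rbuf'\n";
```

Whatever follows the colon stays in the buffer for the next reader in the
queue.
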
=head1 Authors

Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann C<< <schmorp@schmorp.de> >>.
