/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.17
Committed: Tue Jun 3 09:02:13 2008 UTC (16 years ago) by root
Branch: MAIN
CVS Tags: rel-4_14, rel-4_13, rel-4_12
Changes since 1.16: +5 -2 lines
Log Message:
*** empty log message ***

=head1 Introduction to AnyEvent

This is a tutorial that will introduce you to the features of AnyEvent.

The first part introduces the core AnyEvent module (after swamping you a
bit in evangelism), which might already provide all you ever need. If you
are only interested in AnyEvent's event handling capabilities, read no
further.

The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.


=head1 What is AnyEvent?

If you don't care for the whys and want to see code, skip this section!

AnyEvent is first of all just a framework to do event-based
programming. Typically such frameworks are an all-or-nothing thing: If you
use one such framework, you can't (easily, or even at all) use another in
the same program.

AnyEvent is different - it is a thin abstraction layer above all kinds
of event loops. Its main purpose is to move the choice of the underlying
framework (the event loop) from the module author to the program author
using the module.

That means you can write code that uses events to control what it
does, without forcing other code in the same program to use the same
underlying framework as you do - i.e. you can create a Perl module
that is event-based using AnyEvent, and users of that module can still
choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
all: AnyEvent comes with its own event loop implementation, so your
code works regardless of other modules that might or might not be
installed. The latter is important, as AnyEvent does not have any
dependencies on other modules, which makes it easy to install, for
example, when you lack a C compiler.

A typical problem with Perl modules such as L<Net::IRC> is that they
come with their own event loop: In L<Net::IRC>, the program that uses it
needs to start the event loop of L<Net::IRC>. That means that one cannot
integrate this module into a L<Gtk2> GUI for instance, as that module,
too, enforces the use of its own event loop (namely L<Glib>).

Another example is L<LWP>: it provides no event interface at all. It's a
pure blocking HTTP (and FTP etc.) client library, which usually means that
you either have to start a thread or have to fork for an HTTP request, or
use L<Coro::LWP>, if you want to do something else while waiting for the
request to finish.

The motivation behind these designs is often that a module doesn't want to
depend on some complicated XS-module (Net::IRC), or that it doesn't want
to force the user to use some specific event loop at all (LWP).

L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either

=over 4

=item - write their own event loop (because AnyEvent guarantees to offer
one everywhere - even on Windows).

=item - choose one fixed event loop (because AnyEvent works with all
important event loops available for Perl, and adding others is trivial).

=back

If the module author uses L<AnyEvent> for all his event needs (IO events,
timers, signals, ...) then all other modules can just use his module and
don't have to choose an event loop or adapt to his event loop. The choice
of the event loop is ultimately made by the program author who uses all
the modules and writes the main program. And even there he doesn't have to
choose, he can just let L<AnyEvent> choose the best available event loop
for him.

Read more about this in the main documentation of the L<AnyEvent> module.


=head1 Introduction to Event-Based Programming

So what exactly is programming using events? It quite simply means that
instead of your code actively waiting for something, such as the user
entering something on STDIN:

   $| = 1; print "enter your name> ";

   my $name = <STDIN>;

You instead tell your event framework to notify you in the event of some
data being available on STDIN, by using a callback mechanism:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN, # which file handle to check
      poll => "r",     # which event to wait for ("r"ead data)
      cb   => sub {    # what callback to execute
         $name = <STDIN>; # read it
      }
   );

   # do something else here

Looks more complicated, and surely is, but the advantage of using events
is that your program can do something else instead of waiting for
input. Waiting as in the first example is also called "blocking" because
you "block" your process from executing anything else while you do so.

The second example avoids blocking, by only registering interest in a read
event, which is fast and doesn't block your process. Only when read data
is available will the callback be called, which can then proceed to read
the data.

The "interest" is represented by an object returned by C<< AnyEvent->io
>> called a "watcher" object - so called because it "watches" your
file handle (or other event sources) for the event you are interested in.

In the example above, we create an I/O watcher by calling the C<<
AnyEvent->io >> method. Disinterest in some event is simply expressed by
forgetting about the watcher, for example, by C<undef>'ing the variable it
is stored in. AnyEvent will automatically clean up the watcher if it is no
longer used, much like Perl closes your file handles if you no longer use
them anywhere.

=head2 Condition Variables

However, the above is not a fully working program, and will not work
as-is. The reason is that your callback will not be invoked out of the
blue, you have to run the event loop. Also, event-based programs sometimes
have to block, too: when there simply is nothing else to do and
everything is waiting for some event, the process needs to block as well.

In AnyEvent, this is done using condition variables. Condition variables
are named "condition variables" because they represent a condition that is
initially false and needs to be fulfilled.

You can also call them "merge points", "sync points", "rendezvous ports"
or even callbacks and many other things (and they are often called by
these names in other frameworks). The important point is that you can
create them freely and later wait for them to become true.

Condition variables have two sides - one side is the "producer" of the
condition (whatever code detects the condition), the other side is the
"consumer" (the code that waits for that condition).

In our example in the previous section, the producer is the event callback
and there is no consumer yet - let's change that now:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN,
      poll => "r",
      cb   => sub {
         $name = <STDIN>;
         $name_ready->send;
      }
   );

   # do something else here

   # now wait until the name is available:
   $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

This program creates an AnyEvent condvar by calling the C<<
AnyEvent->condvar >> method. It then creates a watcher as usual, but
inside the callback it C<send>'s the C<$name_ready> condition variable,
which causes anybody waiting on it to continue.

The "anybody" in this case is the code that follows, which calls C<<
$name_ready->recv >>: The producer calls C<send>, the consumer calls
C<recv>.

If there is no C<$name> available yet, then the call to C<<
$name_ready->recv >> will halt your program until the condition becomes
true.

As the names C<send> and C<recv> imply, you can actually send and receive
data using this, for example, the above code could also be written like
this, without an extra variable to store the name in:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
   );

   # do something else here

   # now wait and fetch the name
   my $name = $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

You can pass any number of arguments to C<send>, and every call to
C<recv> will return them.
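
As a minimal sketch of that last point (the values C<"Alice"> and C<42>
are made up for illustration), the following sends two values before
anybody waits for them. Since the condition is already true by then,
C<recv> has nothing to wait for and simply returns the values:

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# producer side: fulfil the condition and attach two values to it
$cv->send ("Alice", 42);

# consumer side: the condition is already true, so recv returns at once
my ($name, $age) = $cv->recv;

# a later recv call sees the same values again
my @again = $cv->recv;

print "$name is $age\n";
```

In scalar context, C<recv> returns only the first value, which is what
the C<< my $name = $name_ready->recv >> line in the example above relies on.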

=head2 The "main loop"

Most event-based frameworks have something called a "main loop" or "event
loop run function" or something similar.

Just like C<recv> in AnyEvent, these functions need to be called
eventually so that your event loop has a chance of actually looking for
those events you are interested in.

For example, in a L<Gtk2> program, the above example could also be written
like this:

   use Gtk2 -init;
   use AnyEvent;

   ############################################
   # create a window and some label

   my $window = new Gtk2::Window "toplevel";
   $window->add (my $label = new Gtk2::Label "soon replaced by name");

   $window->show_all;

   ############################################
   # do our AnyEvent stuff

   $| = 1; print "enter your name> ";

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub {
         # set the label
         $label->set_text (scalar <STDIN>);
         print "enter another name> ";
      }
   );

   ############################################
   # Now enter Gtk2's event loop

   main Gtk2;

No condition variable anywhere in sight - instead, we just read a line
from STDIN and replace the text in the label. In fact, since nobody
C<undef>'s C<$wait_for_input> you can enter multiple lines.

Instead of waiting for a condition variable, the program enters the Gtk2
main loop by calling C<< Gtk2->main >>, which will block the program and
wait for events to arrive.

This also shows that AnyEvent is quite flexible - you didn't have to do
anything to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
worked.

Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application. But imagine that instead of
doing that, you would make an HTTP request in the background and display
its results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.

In the next part you will see how to do just that - by implementing an
HTTP request, on our own, with the utility modules AnyEvent comes with.

Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

The example using condition variables did exactly that, and in fact, this
is the solution:

   my $quit_program = AnyEvent->condvar;

   # create AnyEvent watchers (or not) here

   $quit_program->recv;

If any of your watcher callbacks decide to quit, they can simply call
C<< $quit_program->send >>. Of course, they could also decide not to and
simply call C<exit> instead, or they could decide not to quit, ever (e.g.
in a long-running daemon program).

In that case, you can simply use:

   AnyEvent->condvar->recv;

And this is, in fact, closest to the idea of a main loop run function that
AnyEvent offers.

=head2 Timers and other event sources

So far, we have only used I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
operating systems this also works for console windows/terminals (typically
on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes Windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on Windows).

However, I/O is not everything - the second most important event source is
the clock. For example when doing an HTTP request you might want to time
out when the server doesn't answer within some predefined amount of time.

In AnyEvent, timer event watchers are created by calling the C<<
AnyEvent->timer >> method:

   use AnyEvent;

   my $cv = AnyEvent->condvar;

   my $wait_one_and_a_half_seconds = AnyEvent->timer (
      after => 1.5,  # after how many seconds to invoke the cb?
      cb    => sub { # the callback to invoke
         $cv->send;
      },
   );

   # can do something else here

   # now wait till our time has come
   $cv->recv;

Unlike I/O watchers, timers are only interested in the number of seconds
they have to wait. When that amount of time has passed, AnyEvent will
invoke your callback.

Unlike I/O watchers, which will call your callback as many times as there
is data available, timers are one-shot: after they have "fired" once and
invoked your callback, they are dead and no longer do anything.

To get a repeating timer, such as a timer firing roughly once per second,
you have to recreate it:

   use AnyEvent;

   my $time_watcher;

   sub once_per_second {
      print "tick\n";

      # (re-)create the watcher
      $time_watcher = AnyEvent->timer (
         after => 1,
         cb    => \&once_per_second,
      );
   }

   # now start the timer
   once_per_second;

Having to recreate your timer is a restriction put on AnyEvent that is
present in most event libraries it uses. It is so annoying that some
future version might work around this limitation, but right now, it's the
only way to do repeating timers.
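
The snippet above only arms the timer; nothing in it runs the event loop
or ever stops. Here is a minimal, self-contained sketch that combines the
re-created timer with a condition variable acting as the "main loop". The
names C<tick> and C<$ticks>, the 0.1 second interval and the limit of
three ticks are all assumptions made for this illustration:

```perl
use strict;
use warnings;
use AnyEvent;

my $quit_program = AnyEvent->condvar;
my $ticks        = 0;
my $time_watcher;

sub tick {
   ++$ticks;

   # after the third tick, wake up the consumer and stop re-arming
   return $quit_program->send ($ticks) if $ticks >= 3;

   # (re-)create the one-shot watcher for the next tick
   $time_watcher = AnyEvent->timer (after => 0.1, cb => \&tick);
}

# arm the first timer, then serve events until somebody calls send
$time_watcher = AnyEvent->timer (after => 0.1, cb => \&tick);

my $seen = $quit_program->recv;

undef $time_watcher; # timer no longer needed

print "saw $seen ticks\n";
```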

Fortunately most timers aren't really repeating but specify timeouts of
some sort.
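
A timeout is simply a timer watcher racing against another watcher, with
both reporting to the same condition variable. The following is only a
sketch under stated assumptions: a pipe that nobody writes to stands in
for a server that never answers, and the 0.2 second limit and the strings
C<"data"> and C<"timeout"> are made up for the example:

```perl
use strict;
use warnings;
use AnyEvent;

# a pipe nobody ever writes to stands in for a silent server
pipe my ($reader, $writer) or die "pipe: $!";

my $done = AnyEvent->condvar;

my $io_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub { $done->send ("data") },
);

my $timeout_watcher = AnyEvent->timer (
   after => 0.2,
   cb    => sub { $done->send ("timeout") },
);

my $outcome = $done->recv;

# forget both watchers, whichever one fired
undef $io_watcher;
undef $timeout_watcher;

print "outcome: $outcome\n";
```

Forgetting both watchers at the end makes sure that the loser of the race
cannot fire later.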

=head3 More esoteric sources

AnyEvent also has some other, more esoteric event sources you can tap
into: signal and child watchers.

Signal watchers can be used to wait for "signal events", which simply
means your process got sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child watchers wait for a child process to exit. They are useful when
you fork a separate process and need to know when it exits, but do not
want to wait for that by blocking.

Both watcher types are described in detail in the main L<AnyEvent> manual
page.
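
As a rough sketch of a signal watcher (assuming a Unix-like system where
C<SIGUSR1> exists), the program below sends the signal to itself and waits
for the watcher to report it. The five second safety timer and the strings
passed to C<send> are additions made for this example only:

```perl
use strict;
use warnings;
use AnyEvent;

my $got_signal = AnyEvent->condvar;

# watch for SIGUSR1 (the name is given without the "SIG" prefix)
my $signal_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

# safety net so the sketch cannot hang if the signal never arrives
my $timeout_watcher = AnyEvent->timer (
   after => 5,
   cb    => sub { $got_signal->send ("timeout") },
);

# send the signal to ourselves; a real program would get it from outside
kill USR1 => $$;

my $what = $got_signal->recv;

print "received: $what\n";
```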


=head1 Network programming and AnyEvent

So far you have seen how to register event watchers and handle events.

This is a great foundation to write network clients and servers, and might be
all that your module (or program) ever requires, but writing your own I/O
buffering again and again becomes tedious, not to mention that it attracts
errors.

While the core L<AnyEvent> module is still small and self-contained,
the distribution comes with some very useful utility modules such as
L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.

Here is a quick overview of these three modules:

=head2 L<AnyEvent::DNS>

This module allows fully asynchronous DNS resolution. It is used mainly by
L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
a great way to do other DNS resolution tasks, such as reverse lookups of
IP addresses for log files.

=head2 L<AnyEvent::Handle>

This module handles non-blocking IO on file handles in an event based
manner. It provides a wrapper object around your file handle that provides
queueing and buffering of incoming and outgoing data for you.

It also implements the most common data formats, such as text lines, or
fixed and variable-width data blocks.

=head2 L<AnyEvent::Socket>

This module provides you with functions that handle socket creation
and IP address magic. The two main functions are C<tcp_connect> and
C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
connections.

This module also comes with transparent IPv6 support, which means that if
you write your programs with this module, you will be IPv6 ready without
doing anything special.

It also works around a lot of portability quirks (especially on the
Windows platform), which makes it even easier to write your programs in a
portable way (did you know that Windows uses different error codes for all
socket functions and that Perl does not know about these? That "Unknown
error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
successful? That unsuccessful TCP connects might never be reported back
to your program? That C<WSAEINPROGRESS> means your C<connect> call was
ignored instead of being in progress? AnyEvent::Socket works around all of
these Windows/Perl bugs for you).

=head2 Implementing a parallel finger client with non-blocking connects
and AnyEvent::Socket

The finger protocol is one of the simplest protocols in use on the
internet. Or in use in the past, as almost nobody uses it anymore.

It works by connecting to the finger port on another host, writing a
single line with a user name and then reading the finger response, as
specified by that user. OK, RFC 1288 specifies a vastly more complex
protocol, but it basically boils down to this:

   # telnet idsoftware.com finger
   Trying 192.246.40.37...
   Connected to idsoftware.com (192.246.40.37).
   Escape character is '^]'.
   johnc
   Welcome to id Software's Finger Service V1.5!

   [...]
   Now on the web:
   [...]

   Connection closed by foreign host.

"Now on the web..." yeah, I<was> used indeed, but at least the finger
daemon still works, so let's write a little AnyEvent function that makes a
finger request:

   use AnyEvent;
   use AnyEvent::Socket;

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         # the callback receives the socket handle - or nothing
         my ($fh) = @_
            or return $cv->send;

         # now write the username
         syswrite $fh, "$user\015\012";

         my $response;

         # register a read watcher
         my $read_watcher; $read_watcher = AnyEvent->io (
            fh   => $fh,
            poll => "r",
            cb   => sub {
               my $len = sysread $fh, $response, 1024, length $response;

               if ($len <= 0) {
                  # we are done, or an error occurred, let's ignore the latter
                  undef $read_watcher; # no longer interested
                  $cv->send ($response); # send results
               }
            },
         );
      };

      # pass $cv to the caller
      $cv
   }

That's a mouthful! Let's dissect this function a bit, first the overall
function and execution flow:

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         ...
      };

      $cv
   }

This isn't too complicated, just a function with two parameters, that
creates a condition variable, returns it, and while it does that,
initiates a TCP connect to C<$host>. The condition variable will be used
by the caller to receive the finger response, but one could equally well
pass a third argument, a callback, to the function.

Since we are programming event'ish, we do not wait for the connect to
finish - it could block the program for a minute or longer!

Instead, we pass the callback it should invoke when the connect is done to
C<tcp_connect>. If it is successful, that callback gets called with the
socket handle as first argument, otherwise, nothing will be passed to our
callback. The important point is that it will always be called as soon as
the outcome of the TCP connect is known.

This style of programming is also called "continuation style": the
"continuation" is simply the way the program continues - normally, a
program continues at the next line after some statement (the exception
is loops or things like C<return>). When we are interested in events,
however, we instead specify the "continuation" of our program by passing a
closure, which makes that closure the "continuation" of the program. The
C<tcp_connect> call is like saying "return now, and when the connection is
established or it failed, continue there".

Now let's look at the callback/closure in more detail:

   # the callback receives the socket handle - or nothing
   my ($fh) = @_
      or return $cv->send;

The first thing the callback does is indeed save the socket handle in
C<$fh>. When there was an error (no arguments), then our instinct as
expert Perl programmers would tell us to C<die>:

   my ($fh) = @_
      or die "$host: $!";

While this would give good feedback to the user (if he happens to watch
standard error), our program would probably stop working here, as we never
report the results to anybody, certainly not the caller of our C<finger>
function, and most event loops continue even after a C<die>!

This is why we instead C<return>, but also call C<< $cv->send >> without
any arguments to signal to the condvar consumer that something bad has
happened. The return value of C<< $cv->send >> is irrelevant, as is the
return value of our callback. The return statement is simply used for the
side effect of, well, returning immediately from the callback. Checking
for errors and handling them this way is very common, which is why this
compact idiom is so handy.
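
The idiom works because a list assignment, when evaluated in boolean
context, yields the number of elements on its right-hand side. The small
pure-Perl sketch below illustrates this; C<describe_args> is a hypothetical
helper invented for the demonstration and not part of AnyEvent:

```perl
use strict;
use warnings;

# hypothetical helper that mimics the callback's argument check
sub describe_args {
   # list assignment in boolean context yields the number of
   # elements on the right-hand side, so "or" fires only for an
   # empty argument list
   my ($fh) = @_
      or return "no arguments";

   defined $fh ? "got $fh" : "got undef";
}

my $none  = describe_args ();
my $undef = describe_args (undef);
my $zero  = describe_args (0);

print "$none / $undef / $zero\n";
```

Note that a single false argument, such as C<undef> or C<0>, does not
trigger the C<or> branch - only an empty argument list does, which is
exactly how C<tcp_connect> reports a failed connect.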

As the next step in the finger protocol, we send the username to the
finger daemon on the other side of our connection:

   syswrite $fh, "$user\015\012";

Note that this isn't 100% clean socket programming - the socket could,
for whatever reasons, not accept our data. When writing a small amount
of data like in this example it doesn't matter, as a socket buffer is
almost always big enough for a mere "username", but for real-world
cases you might need to implement some kind of write buffering - or use
L<AnyEvent::Handle>, which handles these matters for you, as shown in the
next section.

What we I<do> have to do is to implement our own read buffer - the response
data could arrive late or in multiple chunks, and we cannot just wait for
it (event-based programming, you know?).

To do that, we register a read watcher on the socket which waits for data:

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",

There is a trick here, however: the read watcher isn't stored in a global
variable, but in a local one - if the callback returns, it would normally
destroy the variable and its contents, which would in turn unregister our
watcher.

To avoid that, we C<undef>ine the variable in the watcher callback, which
makes that callback a closure referring to the variable. This means that,
when the C<tcp_connect> callback returns, perl thinks (quite correctly)
that the read watcher is still in use - namely in the callback.

The trick, however, is that instead of:

   my $read_watcher = AnyEvent->io (...

The program does:

   my $read_watcher; $read_watcher = AnyEvent->io (...

The reason for this is a quirk in the way Perl works: variable names
declared with C<my> are only visible in the I<next> statement. If the
whole C<< AnyEvent->io >> call, including the callback, would be done in
a single statement, the callback could not refer to the C<$read_watcher>
variable to undefine it, so it is done in two statements.
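
This quirk can be observed without any event loop. In the following
pure-Perl sketch, a plain hash stands in for the watcher object (an
assumption made to keep the example self-contained), and the
single-statement form is compiled inside a string C<eval> so that the
compile-time error can be inspected instead of aborting the program:

```perl
use strict;
use warnings;

# one statement: inside the closure, $w is not declared yet, so
# "use strict" rejects the code at compile time
my $compiled = eval q{
   use strict;
   my $w = { cb => sub { undef $w } };
   1
};
my $error = $@;

# two statements: the closure captures the lexical declared before it
my $watcher; $watcher = { cb => sub { undef $watcher } };

$watcher->{cb}->(); # the "callback" drops the last reference to itself

print "one statement: ", ($compiled ? "compiled" : "rejected"), "\n";
print "two statements: ", (defined $watcher ? "alive" : "destroyed"), "\n";
```

Without C<use strict>, the single-statement form would compile, but the
closure would then silently refer to an unrelated package variable.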

Whether you'd want to format it like this is of course a matter of style;
this way emphasizes that the declaration and assignment really are one
logical statement.

The callback itself calls C<sysread> as many times as necessary, until
C<sysread> returns either an error or end-of-file:

   cb => sub {
      my $len = sysread $fh, $response, 1024, length $response;

      if ($len <= 0) {

Note that C<sysread> has the ability to append data it reads to a scalar,
by specifying an offset, which is what we make good use of in this
example.
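
The appending behaviour can be tried in isolation. In this sketch a pipe
stands in for the socket and the chunk size is shrunk to four bytes, so
that several reads are needed; both are assumptions made purely for the
demonstration:

```perl
use strict;
use warnings;

# a pipe stands in for the socket; close the write end to produce EOF
pipe my ($reader, $writer) or die "pipe: $!";
syswrite $writer, "hello, ";
syswrite $writer, "world";
close $writer;

my $response = "";

while (1) {
   # read at most 4 bytes and store them at the end of $response
   my $len = sysread $reader, $response, 4, length $response;

   die "read error: $!" unless defined $len;
   last if $len == 0; # end of file
}

print "$response\n";
```

Unlike the finger client, this sketch tells errors (C<undef>) and
end-of-file (C<0>) apart and blocks in a loop, which is fine for a
demonstration but not something an event-based program would do.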

When C<sysread> indicates we are done, the callback C<undef>ines
the watcher and then C<send>'s the response data to the condition
variable. All this has the following effects:

Undefining the watcher destroys it, as our callback was the only one still
having a reference to it. When the watcher gets destroyed, it destroys the
callback, which in turn means the C<$fh> handle is no longer used, so that
gets destroyed as well. The result is that all resources will be nicely
cleaned up by perl for us.

=head3 Using the finger client

Now, we could probably write the same finger client in a simpler way if
we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
ignored IPv6 and a few other things that C<tcp_connect> handles for us.

But the main advantage is that we can not only run this finger function in
the background, we even can run multiple sessions in parallel, like this:

   my $f1 = finger "trouble", "noc.dfn.de";     # check for trouble tickets
   my $f2 = finger "1736"   , "noc.dfn.de";     # fetch ticket 1736
   my $f3 = finger "johnc"  , "idsoftware.com"; # finger john

   print "trouble tickets:\n", $f1->recv, "\n";
   print "trouble ticket #1736:\n", $f2->recv, "\n";
   print "john carmack's finger file: ", $f3->recv, "\n";

It doesn't look like it, but in fact all three requests run in
parallel. The code waits for the first finger request to finish first, but
that doesn't keep it from executing them in parallel: when the first
C<recv> call sees that the data isn't ready yet, it serves events for all
three requests automatically, until the first request has finished.

The second C<recv> call might either find the data is already there, or it
will continue handling events until that is the case, and so on.

By taking advantage of network latencies, which allows us to serve other
requests and events while we wait for an event on one socket, the overall
time to do these three requests will be greatly reduced: typically, all
three are done in the time the slowest of them needs to finish.

By the way, you do not actually have to wait in the C<recv> method on an
AnyEvent condition variable - after all, waiting is evil - you can also
register a callback:

   $cv->cb (sub {
      my $response = shift->recv;
      # ...
   });

The callback will only be invoked when C<send> was called. In fact,
instead of returning a condition variable you could also pass a third
parameter to your finger function, the callback to invoke with the
response:

   sub finger($$$) {
      my ($user, $host, $cb) = @_;

How you implement it is a matter of taste - if you expect your function to
be used mainly in an event-based program you would normally prefer to pass
a callback directly. If you write a module and expect your users to use
it "synchronously" often (for example, a simple http-get script would not
really care much for events), then you would use a condition variable and
tell them "simply ->recv the data".
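
Nothing keeps a function from offering both styles at once. The sketch
below is one possible way to do it, not something the finger client
actually does: C<delayed_answer> is a hypothetical function that
"computes" the made-up result C<42> with a timer instead of a network
request, returns a condition variable, and additionally hooks up an
optional callback via C<< $cv->cb >>:

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical function: "computes" 42 after $delay seconds
sub delayed_answer {
   my ($delay, $cb) = @_;

   my $cv = AnyEvent->condvar;

   # callback style: forward the result as soon as send is called
   $cv->cb (sub { $cb->(shift->recv) }) if $cb;

   my $timer; $timer = AnyEvent->timer (
      after => $delay,
      cb    => sub {
         undef $timer; # one-shot, no longer needed
         $cv->send (42);
      },
   );

   # condvar style: hand the condvar to the caller
   $cv
}

# "synchronous" use: simply ->recv the data
my $sync = delayed_answer (0.1)->recv;

# event-based use: pass a callback and wait on something else
my $async;
my $done = AnyEvent->condvar;
delayed_answer (0.1, sub { $async = shift; $done->send });
$done->recv;

print "$sync $async\n";
```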

=head3 Problems with the implementation and how to fix them

To make this example more real-world-ready, we would not only implement
some write buffering (for the paranoid), but we would also have to handle
timeouts and maybe protocol errors.

Doing this quickly gets unwieldy, which is why we introduce
L<AnyEvent::Handle> in the next section, which takes care of all these
details for you and lets you concentrate on the actual protocol.


=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle

The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
see what it really offers.

As finger is such a simple protocol, let's try something slightly more
complicated: HTTP/1.0.

An HTTP GET request works by sending a single request line that indicates
what you want the server to do and the URI you want to act on, followed
by as many "header" lines (C<Header: data>, same as e-mail headers) as
required for the request, ended by an empty line.

The response is formatted very similarly, first a line with the response
status, then again as many header lines as required, then an empty line,
followed by any data that the server might send.
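
To make the format concrete, here is a small pure-Perl sketch that takes
a complete response apart again. The canned response text and all
variable names are assumptions made for this illustration, and the
parsing is deliberately naive (for instance, it ignores header lines that
are continued on the next line):

```perl
use strict;
use warnings;

# a canned response, made up to match the format described above
my $raw = join "\015\012",
   "HTTP/1.0 404 Not Found",
   "Date: Mon, 02 Jun 2008 07:05:54 GMT",
   "Content-Type: text/html; charset=UTF-8",
   "",
   "<html><head>";

# the first empty line separates the head from the body
my ($head, $body) = split /\015\012\015\012/, $raw, 2;

# the head is a status line followed by the header lines
my ($status_line, @header_lines) = split /\015\012/, $head;
my ($protocol, $code, $reason)   = split / /, $status_line, 3;

# header names are case-insensitive, so store them in lowercase
my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @header_lines;

print "$code $reason ($header{'content-type'})\n";
```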

Again, let's try it out with C<telnet> (I condensed the output a bit - if
you want to see the full response, do it yourself).

   # telnet www.google.com 80
   Trying 209.85.135.99...
   Connected to www.google.com (209.85.135.99).
   Escape character is '^]'.
   GET /test HTTP/1.0

   HTTP/1.0 404 Not Found
   Date: Mon, 02 Jun 2008 07:05:54 GMT
   Content-Type: text/html; charset=UTF-8

   <html><head>
   [...]
   Connection closed by foreign host.

The C<GET ...> and the empty line were entered manually, the rest of the
telnet output is Google's response, in this case a C<404 Not Found> one.

So, here is how you would do it with C<AnyEvent::Handle>:

   use AnyEvent::Socket;
   use AnyEvent::Handle;

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         my ($fh) = @_
            or return $cb->("HTTP/1.0 500 $!");

         # store results here
         my ($response, $header, $body);

         my $handle; $handle = new AnyEvent::Handle
            fh       => $fh,
            on_error => sub {
               undef $handle;
               $cb->("HTTP/1.0 500 $!");
            },
            on_eof   => sub {
               undef $handle; # keep it alive till eof
               $cb->($response, $header, $body);
            };

         $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

         # now fetch response status line
         $handle->push_read (line => sub {
            my ($handle, $line) = @_;
            $response = $line;
         });

         # then the headers
         $handle->push_read (line => "\015\012\015\012", sub {
            my ($handle, $line) = @_;
            $header = $line;
         });

         # and finally handle any remaining data as body
         $handle->on_read (sub {
            $body .= $_[0]->rbuf;
            $_[0]->rbuf = "";
         });
      };
   }

And now let's go through it step by step. First, as usual, the overall
C<http_get> function structure:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         ...
      };
   }

Unlike in the finger example, this time the caller has to pass a callback
to C<http_get>. Also, instead of passing a URL as one would expect, the
caller has to provide the hostname and URI - normally you would use the
C<URI> module to parse a URL and separate it into those parts, but that is
left to the inspired reader :)

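As a rough sketch of that exercise, the following splits a simple URL into
the pieces C<http_get> expects, using only a regular expression. The
function name C<split_url> is made up for this illustration, and the pattern
deliberately rejects anything with a port, user info or fragment; the
C<URI> module handles those cases properly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent: split "scheme://host/path"
# into scheme, host and request URI. Returns an empty list for anything
# this simple pattern does not understand.
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $uri) =
      $url =~ m{^(\w+)://([^/:?#]+)((?:[/?][^#]*)?)$}
         or return;

   # an empty path means the root document
   $uri = "/" unless length $uri;
   # a query string directly after the host still needs a leading slash
   $uri = "/$uri" unless $uri =~ m{^/};

   (lc $scheme, lc $host, $uri)
}

my ($scheme, $host, $uri) = split_url "http://www.google.com/test?q=1";
print "$scheme $host $uri\n"; # prints "http www.google.com /test?q=1"
```

The host and URI can then be handed to C<http_get> unchanged.
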
Since everything else is left to the caller, all C<http_get> does is
initiate the connection with C<tcp_connect> and leave everything else to
its callback.

The first thing the callback does is check for connection errors and
declare some variables:

   my ($fh) = @_
      or return $cb->("HTTP/1.0 500 $!");

   my ($response, $header, $body);

Instead of having an extra mechanism to signal errors, connection errors
are signalled by crafting a special "response status line", like this:

   HTTP/1.0 500 Connection refused

This means the caller cannot distinguish (easily) between
locally-generated errors and server errors, but it simplifies error
handling for the caller a lot.

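On the receiving end, the caller can pick the status line apart with a
regular expression. The sketch below assumes a helper named
C<parse_status> (hypothetical, not provided by AnyEvent) and shows that a
locally generated C<500> line parses exactly like one sent by a server.

```perl
use strict;
use warnings;

# Hypothetical helper: split an HTTP status line into version, code and
# reason phrase. Returns an empty list if the line is malformed.
sub parse_status {
   my ($line) = @_;

   $line =~ m{^HTTP/(\d+\.\d+) (\d{3})(?: (.*))?$}
      or return;

   ($1, $2, defined $3 ? $3 : "")
}

for my $line ("HTTP/1.0 404 Not Found", "HTTP/1.0 500 Connection refused") {
   my ($version, $code, $reason) = parse_status $line;
   print "$code: $reason\n";
}
```
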
The next step finally involves L<AnyEvent::Handle>, namely the creation of
the handle object:

   my $handle; $handle = new AnyEvent::Handle
      fh       => $fh,
      on_error => sub {
         undef $handle;
         $cb->("HTTP/1.0 500 $!");
      },
      on_eof   => sub {
         undef $handle; # keep it alive till eof
         $cb->($response, $header, $body);
      };

The constructor expects a file handle, which gets passed via the C<fh>
argument.

The remaining two argument pairs specify two callbacks to be called on
any errors (C<on_error>) and in the case of a normal connection close
(C<on_eof>).

In the first case, we C<undef>ine the handle object and pass the error to
the callback provided by the caller - done.

In the second case we assume everything went fine and pass the results
gobbled up so far to the caller-provided callback. This is not quite
perfect, as when the server "cleanly" closes the connection in the middle
of sending headers we might wrongly report this as an "OK" to the caller,
but then, HTTP doesn't support a perfect mechanism that would detect such
problems in all cases, so we don't bother either.

=head3 The write queue

The next line sends the actual request:

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

No headers will be sent (this is fine for simple requests), so the whole
request is just a single line followed by an empty line to signal the end
of the headers to the server.

The more interesting question is why the method is called C<push_write>
and not just C<write>. The reason is that you can I<always> add some write
data without blocking, and to do this, AnyEvent::Handle needs some write
queue internally - and C<push_write> simply pushes some data at the end of
that queue, just like Perl's C<push> pushes data at the end of an array.

The deeper reason is that at some point in the future, there might
be C<unshift_write> as well, and in any case, we will shortly meet
C<push_read> and C<unshift_read>, and it's usually easiest if all those
functions have some symmetry in their name.

If C<push_write> is called with more than one argument, then you can even
do I<formatted> I/O, which simply means your data will be transformed in
some ways. For example, this would JSON-encode your data before pushing it
to the write queue:

   $handle->push_write (json => [1, 2, 3]);

Apart from that, this pretty much summarises the write queue; there is
little else to it.

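To make the analogy with C<push> concrete, here is a toy model of a write
queue in plain Perl. It is not how AnyEvent::Handle is implemented; the
names C<queue_write> and C<drain> are invented for this sketch, and a byte
limit stands in for however much the kernel would accept at a time.

```perl
use strict;
use warnings;

my @wqueue;        # pending chunks, oldest first
my $sent = "";     # stands in for the socket

# never blocks: just append the data to the end of the queue
sub queue_write {
   push @wqueue, @_;
}

# called whenever the "socket" is writable: send at most $max bytes
sub drain {
   my ($max) = @_;

   while ($max > 0 && @wqueue) {
      my $chunk = substr $wqueue[0], 0, $max, "";  # take from the front chunk
      $sent .= $chunk;
      $max  -= length $chunk;
      shift @wqueue unless length $wqueue[0];      # chunk fully written
   }
}

queue_write "GET / HTTP/1.0\015\012", "\015\012";
drain 5;    # only "GET /" fits this time
drain 100;  # the rest goes out later, in order
```

The caller never waits: queueing always succeeds, and the data leaves in
the order in which it was pushed.
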
Reading the response is far more interesting:

=head3 The read queue

The response consists of three parts: a single line of response status, a
single paragraph of headers ended by an empty line, and the response body,
which is simply the remaining data on that connection.

For the first two, we push two read requests onto the read queue:

   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });

While one can simply push a single callback to the queue, I<formatted> I/O
really comes to our advantage here, as there is a ready-made "read line"
read type. The first read expects a single line, ended by C<\015\012> (the
standard end-of-line marker in internet protocols).

The second "line" is actually a single paragraph - instead of reading it
line by line we tell C<push_read> that the end-of-line marker is really
C<\015\012\015\012>, which is an empty line. The result is that the whole
header paragraph will be treated as a single line and read. The word
"line" is interpreted very freely, much like Perl itself does it.

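The comparison with Perl can be tried out directly: Perl's own C<readline>
ends a "line" at whatever C<$/> contains. This stand-alone sketch reads a
canned response from an in-memory file handle (the response text is made
up) and mirrors the two line reads described above.

```perl
use strict;
use warnings;

# a made-up response, as it would arrive on the wire
my $data = "HTTP/1.0 200 OK\015\012"
         . "Content-Type: text/plain\015\012"
         . "Content-Length: 5\015\012"
         . "\015\012"
         . "hello";

open my $fh, "<", \$data
   or die "cannot open in-memory file: $!";
binmode $fh;

my ($response, $header, $body);

{
   # first "line": ends at a single CRLF
   local $/ = "\015\012";
   chomp ($response = <$fh>);
}

{
   # second "line": the whole header paragraph, ends at an empty line
   local $/ = "\015\012\015\012";
   chomp ($header = <$fh>);
}

{
   # everything that is left is the body
   local $/;
   $body = <$fh>;
}

print "status: $response\n";
print "body:   $body\n";
```
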
Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.

There is, however, no read type for "the remaining data". For that, we
install our own C<on_read> callback:

   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });

This callback is invoked every time data arrives and the read queue is
empty - which in this example will only be the case when both response and
header have been read. The C<on_read> callback could actually have been
specified when constructing the object, but doing it this way preserves
logical ordering.

The read callback simply appends the current read buffer to the C<$body>
variable and, most importantly, I<empties> the buffer by assigning the
empty string to it.

After AnyEvent::Handle has been so instructed, it will now handle incoming
data according to these instructions - if all goes well, the callback will
be invoked with the response data, if not, it will get an error.

In general, you get pipelining very easily with AnyEvent::Handle: If
you have a protocol with a request/response structure, your request
methods/functions will all look like this (simplified):

   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }

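Why this gives pipelining can be seen in a small model written in plain
Perl: as long as the server answers in order, a first-in first-out queue of
handlers pairs each response with the request that asked for it. Everything
here (C<request>, C<deliver>, the C<@sent> log) is invented for the sketch
and merely imitates the queue discipline, not AnyEvent::Handle itself.

```perl
use strict;
use warnings;

my @sent;      # what would have been written to the server
my @rqueue;    # response handlers, oldest first
my @log;

sub request {
   my ($command, $cb) = @_;

   push @sent, $command;   # "send" the request right away
   push @rqueue, $cb;      # and queue the matching response handler
}

# simulate one response line arriving from the server
sub deliver {
   my ($line) = @_;

   my $cb = shift @rqueue
      or die "unexpected response: $line";
   $cb->($line);
}

# three requests go out before any response has been seen
request "first",  sub { push @log, "first => $_[0]" };
request "second", sub { push @log, "second => $_[0]" };
request "third",  sub { push @log, "third => $_[0]" };

deliver $_ for "one", "two", "three";

print "$_\n" for @log;
```
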
=head3 Using it

And here is how you would use it:

   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };

And of course, you can run as many of these requests in parallel as you
want (and your memory supports).

=head3 HTTPS

Now, as promised, let's implement the same thing for HTTPS, or more
correctly, let's change our C<http_get> function into a function that
speaks HTTPS instead.

HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
B<S>ecurity is the official name for what most people refer to as C<SSL>)
that contains standard HTTP protocol exchanges. The other difference to
HTTP is that it uses port C<443> instead of port C<80>.

To implement these two differences we need two tiny changes. First, in the
C<tcp_connect> call we replace C<http> by C<https>:

   tcp_connect $host, "https", sub { ...

The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
around. To enable TLS with L<AnyEvent::Handle>, we simply pass an
additional C<tls> parameter to the call to C<AnyEvent::Handle::new>:

   tls => "connect",

Specifying C<tls> enables TLS, and the argument specifies whether
AnyEvent::Handle is the server side ("accept") or the client side
("connect") for the TLS connection, as unlike TCP, there is a clear
server/client relationship in TLS.

That's all.

Of course, all this should be handled transparently by C<http_get> after
parsing the URL. See the part about exercising your inspiration earlier in
this document.

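One way to approach that exercise is to derive both differences from the
URL scheme. In this sketch the helper name C<connect_args> is hypothetical;
it only computes the service name for C<tcp_connect> and the extra
constructor arguments for the handle, using the C<tls> pair shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: map a URL scheme to the service name to connect to
# and the extra arguments to pass to the handle constructor.
sub connect_args {
   my ($scheme) = @_;

   return ("http",  [])                 if $scheme eq "http";
   return ("https", [tls => "connect"]) if $scheme eq "https";

   die "unsupported scheme: $scheme\n";
}

for my $scheme (qw(http https)) {
   my ($service, $extra) = connect_args $scheme;
   print "$scheme: service=$service extra=(@$extra)\n";
}
```

Inside C<http_get>, the service name would replace the hard-coded C<"http">
and C<@$extra> would be added to the constructor's argument list.
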
=head3 The read queue - revisited

HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.

For example, in SMTP, you normally get a single response line:

   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>

But SMTP also supports multi-line responses:

   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours

To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.

This is useful for this situation: You push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader,
like this:

   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);

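The continuation test itself can be exercised without a network
connection. The sketch below feeds the sample lines from above through a
similar regular expression; C<collect_response> is an invented name, and a
plain array stands in for the lines AnyEvent::Handle would deliver one at a
time.

```perl
use strict;
use warnings;

# Hypothetical helper: take lines off the front of @$lines until the
# response is complete, and return the code and the joined text.
sub collect_response {
   my ($lines) = @_;

   my @text;

   while (defined (my $line = shift @$lines)) {
      my ($code, $sep, $rest) = $line =~ /^(\d{3})([ -]?)(.*)$/
         or die "malformed response line: $line\n";

      push @text, $rest;

      # "-" as 4th character announces more lines, anything else ends it
      return ($code, join "\n", @text) unless $sep eq "-";
   }

   die "response ended in the middle of a continuation\n";
}

my @lines = (
   "220-mail.example.net Neverusesendmail 8.8.8 <mailme\@example.net>",
   "220-hey guys",
   "220 my response is longer than yours",
   "250 next response",
);

my ($code, $text) = collect_response \@lines;
print "$code\n$text\n";
```

Only the three C<220> lines are consumed; the following response stays in
the array, just as a later read request would stay in the read queue.
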
This recipe can be used for all similar parsing problems, for example in
NNTP, the response code to some commands indicates that more data will be
sent:

   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body);
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });

=head3 Your own read queue handler

Sometimes your protocol doesn't play nice and doesn't use lines or simple
chunks of data, in which case you have to implement your own read parser.

To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.

To implement this, you would C<push_read> (or C<unshift_read>) just a
single code reference.

This code reference will then be called each time there is (new) data
available in the read buffer, and is expected to either eat/consume some
of that data (and return true) or to return false to indicate that it
wants to be called again.

If the code reference returns true, then it will be removed from the read
queue, otherwise it stays at the front of the queue.

The example above could be coded like this:

   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });

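To see the return-value protocol in action without a socket, the following
sketch drives the same regular expression with a stand-in for the handle.
The hash with its C<rbuf> key and the C<feed> function are assumptions made
for this demonstration; they only imitate the behaviour described above,
where the reader at the front of the queue is called on new data and
removed once it returns true.

```perl
use strict;
use warnings;

my $handle = { rbuf => "" };   # stand-in for an AnyEvent::Handle object
my @rqueue;                    # queued read callbacks, oldest first
my @found;

# same parser as above, storing its result in @found
push @rqueue, sub {
   my ($handle) = @_;

   $handle->{rbuf} =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
};

# simulate data arriving: append it, then let the front reader try
sub feed {
   my ($data) = @_;

   $handle->{rbuf} .= $data;

   while (@rqueue && $rqueue[0]->($handle)) {
      shift @rqueue;   # reader returned true, so it is done
   }
}

feed "ab";      # no colon yet: the reader stays queued
feed "cd:rest"; # "abcd" is an even-length prefix followed by ":"

print "found: @found, left in buffer: $handle->{rbuf}\n";
```
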
=head1 Authors

Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
