/cvs/AnyEvent/lib/AnyEvent/Intro.pod
Revision: 1.12
Committed: Mon Jun 2 09:16:00 2008 UTC (16 years ago) by root
Branch: MAIN
Changes since 1.11: +64 -143 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 Introduction to AnyEvent
2
3 This is a tutorial that will introduce you to the features of AnyEvent.
4
5 The first part introduces the core AnyEvent module (after swamping you a
6 bit in evangelism), which might already provide all you ever need.
7
8 The second part focuses on network programming using sockets, for which
9 AnyEvent offers a lot of support you can use.
10
11
12 =head1 What is AnyEvent?
13
14 If you don't care for the whys and want to see code, skip this section!
15
16 AnyEvent is first of all just a framework to do event-based
17 programming. Typically such frameworks are an all-or-nothing thing: If you
18 use one such framework, you can't (easily, or even at all) use another in
19 the same program.
20
21 AnyEvent is different - it is a thin abstraction layer above all kinds
22 of event loops. Its main purpose is to move the choice of the underlying
23 framework (the event loop) from the module author to the program author
24 using the module.
25
26 That means you can write code that uses events to control what it
27 does, without forcing other code in the same program to use the same
28 underlying framework as you do - i.e. you can create a Perl module
29 that is event-based using AnyEvent, and users of that module can still
30 choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
31 all: AnyEvent comes with its own event loop implementation, so your
32 code works regardless of other modules that might or might not be
33 installed. The latter is important, as AnyEvent does not have any
34 dependencies on other modules, which makes it easy to install, for
35 example, when you lack a C compiler.
36
37 A typical problem with Perl modules such as L<Net::IRC> is that they
38 come with their own event loop: In L<Net::IRC>, the program that uses it
39 needs to start the event loop of L<Net::IRC>. That means that one cannot
40 integrate this module into a L<Gtk2> GUI for instance, as that module,
41 too, enforces the use of its own event loop (namely L<Glib>).
42
43 Another example is L<LWP>: it provides no event interface at all. It's a
44 pure blocking HTTP (and FTP etc.) client library, which usually means that
45 you either have to start a thread or have to fork for an HTTP request, or
46 use L<Coro::LWP>, if you want to do something else while waiting for the
47 request to finish.
48
49 The motivation behind these designs is often that a module doesn't want to
50 depend on some complicated XS-module (Net::IRC), or that it doesn't want
51 to force the user to use some specific event loop at all (LWP).
52
53 L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either
54
55 =over 4
56
57 =item - write their own event loop (because AnyEvent guarantees to offer one
58 everywhere - even on Windows).
59
60 =item - choose one fixed event loop (because AnyEvent works with all
61 important event loops available for Perl, and adding others is trivial).
62
63 =back
64
65 If the module author uses L<AnyEvent> for all his event needs (IO events,
66 timers, signals, ...) then all other modules can just use his module and
67 don't have to choose an event loop or adapt to his event loop. The choice
68 of the event loop is ultimately made by the program author who uses all
69 the modules and writes the main program. And even there he doesn't have to
70 choose, he can just let L<AnyEvent> choose the best available event loop
71 for him.
72
73 Read more about this in the main documentation of the L<AnyEvent> module.
74
75
76 =head1 Introduction to Event-Based Programming
77
78 So what exactly is programming using events? It quite simply means that
79 instead of your code actively waiting for something, such as the user
80 entering something on STDIN:
81
82 $| = 1; print "enter your name> ";
83
84 my $name = <STDIN>;
85
86 You instead tell your event framework to notify you in the event of some
87 data being available on STDIN, by using a callback mechanism:
88
89 use AnyEvent;
90
91 $| = 1; print "enter your name> ";
92
93 my $name;
94
95 my $wait_for_input = AnyEvent->io (
96 fh => \*STDIN, # which file handle to check
97 poll => "r", # which event to wait for ("r"ead data)
98 cb => sub { # what callback to execute
99 $name = <STDIN>; # read it
100 }
101 );
102
103 # do something else here
104
105 Looks more complicated, and surely is, but the advantage of using events
106 is that your program can do something else instead of waiting for
107 input. Waiting as in the first example is also called "blocking" because
108 you "block" your process from executing anything else while you do so.
109
110 The second example avoids blocking, by only registering interest in a read
111 event, which is fast and doesn't block your process. Only when read data
112 is available will the callback be called, which can then proceed to read
113 the data.
114
115 The "interest" is represented by an object returned by C<< AnyEvent->io
116 >> called a "watcher" object - called like that because it "watches" your
117 file handle (or other event sources) for the event you are interested in.
118
119 In the example above, we create an I/O watcher by calling the C<<
120 AnyEvent->io >> method. Disinterest in some event is simply expressed by
121 forgetting about the watcher, for example, by C<undef>'ing the variable it
122 is stored in. AnyEvent will automatically clean up the watcher if it is no
123 longer used, much like Perl closes your file handles if you no longer use
124 them anywhere.
125
126 =head2 Condition Variables
127
128 However, the above is not a fully working program, and will not work
129 as-is. The reason is that your callback will not be invoked out of the
130 blue, you have to run the event loop. Also, event-based programs sometimes
131 have to block, too: when there simply is nothing else to do and everything
132 waits for some events, the program needs to block the process as well.
133
134 In AnyEvent, this is done using condition variables. Condition variables
135 are named "condition variables" because they represent a condition that is
136 initially false and needs to be fulfilled.
137
138 You can also call them "merge points", "sync points", "rendezvous ports"
139 or even callbacks and many other things (and they are often called like
140 this in other frameworks). The important point is that you can create them
141 freely and later wait for them to become true.
142
143 Condition variables have two sides - one side is the "producer" of the
144 condition (whatever code detects the condition), the other side is the
145 "consumer" (the code that waits for that condition).
146
147 In our example in the previous section, the producer is the event callback
148 and there is no consumer yet - let's change that now:
149
150 use AnyEvent;
151
152 $| = 1; print "enter your name> ";
153
154 my $name;
155
156 my $name_ready = AnyEvent->condvar;
157
158 my $wait_for_input = AnyEvent->io (
159 fh => \*STDIN,
160 poll => "r",
161 cb => sub {
162 $name = <STDIN>;
163 $name_ready->send;
164 }
165 );
166
167 # do something else here
168
169 # now wait until the name is available:
170 $name_ready->recv;
171
172 undef $wait_for_input; # watcher no longer needed
173
174 print "your name is $name\n";
175
176 This program creates an AnyEvent condvar by calling the C<<
177 AnyEvent->condvar >> method. It then creates a watcher as usual, but
178 inside the callback it C<send>'s the C<$name_ready> condition variable,
179 which causes anybody waiting on it to continue.
180
181 The "anybody" in this case is the code that follows, which calls C<<
182 $name_ready->recv >>: The producer calls C<send>, the consumer calls
183 C<recv>.
184
185 If there is no C<$name> available yet, then the call to C<<
186 $name_ready->recv >> will halt your program until the condition becomes
187 true.
188
189 As the names C<send> and C<recv> imply, you can actually send and receive
190 data using this, for example, the above code could also be written like
191 this, without an extra variable to store the name in:
192
193 use AnyEvent;
194
195 $| = 1; print "enter your name> ";
196
197 my $name_ready = AnyEvent->condvar;
198
199 my $wait_for_input = AnyEvent->io (
200 fh => \*STDIN, poll => "r",
201 cb => sub { $name_ready->send (scalar <STDIN>) }
202 );
203
204 # do something else here
205
206 # now wait and fetch the name
207 my $name = $name_ready->recv;
208
209 undef $wait_for_input; # watcher no longer needed
210
211 print "your name is $name\n";
212
213 You can pass any number of arguments to C<send>, and every call to
214 C<recv> will return them.
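
As a minimal sketch of passing several values through a condition variable:
to stay runnable without a terminal, it assumes a pipe in place of STDIN
(the pipe and the name written into it are made up for illustration), while
the AnyEvent calls are the ones used above.

```perl
use strict;
use warnings;
use AnyEvent;

# stand-in for STDIN: a pipe that already holds one line (illustrative only)
pipe (my $reader, my $writer) or die "pipe: $!";
syswrite $writer, "Alice\n";

my $name_ready = AnyEvent->condvar;

my $wait_for_input = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      chomp (my $line = <$reader>);
      # producer side: pass two values to whoever calls recv
      $name_ready->send ($line, length $line);
   },
);

# consumer side: in list context, recv returns everything passed to send
my ($name, $length) = $name_ready->recv;

undef $wait_for_input; # watcher no longer needed

print "your name is $name ($length characters)\n";
```

In scalar context, C<recv> returns only the first of the values.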
215
216 =head2 The "main loop"
217
218 Most event-based frameworks have something called a "main loop" or "event
219 loop run function" or something similar.
220
221 Just like C<recv> in AnyEvent, these functions need to be called
222 eventually so that your event loop has a chance of actually looking for
223 those events you are interested in.
224
225 For example, in a L<Gtk2> program, the above example could also be written
226 like this:
227
228 use Gtk2 -init;
229 use AnyEvent;
230
231 ############################################
232 # create a window and some label
233
234 my $window = new Gtk2::Window "toplevel";
235 $window->add (my $label = new Gtk2::Label "soon replaced by name");
236
237 $window->show_all;
238
239 ############################################
240 # do our AnyEvent stuff
241
242 $| = 1; print "enter your name> ";
243
244 my $name_ready = AnyEvent->condvar;
245
246 my $wait_for_input = AnyEvent->io (
247 fh => \*STDIN, poll => "r",
248 cb => sub {
249 # set the label
250 $label->set_text (scalar <STDIN>);
251 print "enter another name> ";
252 }
253 );
254
255 ############################################
256 # Now enter Gtk2's event loop
257
258 main Gtk2;
259
260 No condition variable anywhere in sight - instead, we just read a line
261 from STDIN and replace the text in the label. In fact, since nobody
262 C<undef>'s C<$wait_for_input> you can enter multiple lines.
263
264 Instead of waiting for a condition variable, the program enters the Gtk2
265 main loop by calling C<< Gtk2->main >>, which will block the program and
266 wait for events to arrive.
267
268 This also shows that AnyEvent is quite flexible - you didn't have anything
269 to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
270 worked.
271
272 Admittedly, the example is a bit silly - who would want to read names
273 from standard input in a Gtk+ application? But imagine that instead of
274 doing that, you would make an HTTP request in the background and display
275 its results. In fact, with event-based programming you can make many
276 HTTP requests in parallel in your program and still provide feedback to
277 the user and stay interactive.
278
279 In the next part you will see how to do just that - by implementing an
280 HTTP request, on our own, with the utility modules AnyEvent comes with.
281
282 Before that, however, let's briefly look at how you would write your
283 program using only AnyEvent, without ever calling some other event
284 loop's run function.
285
286 In the example using condition variables, we used C<recv> to wait for
287 events, and in fact, this is the solution:
288
289 my $quit_program = AnyEvent->condvar;
290
291 # create AnyEvent watchers (or not) here
292
293 $quit_program->recv;
294
295 If any of your watcher callbacks decide to quit, they can simply call
296 C<< $quit_program->send >>. Of course, they could also decide not to and
297 simply call C<exit> instead, or they could decide not to quit, ever (e.g.
298 in a long-running daemon program).
299
300 In that case, you can simply use:
301
302 AnyEvent->condvar->recv;
303
304 And this is, in fact, closest to the idea of a main loop run function that
305 AnyEvent offers.
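
To make the pattern concrete, here is a small sketch in which a watcher
callback decides when the program ends. It assumes a pipe preloaded with
two made-up "commands" instead of real user input, and it uses C<sysread>
with its own line splitting so that no data is left hidden in a perl-level
read buffer.

```perl
use strict;
use warnings;
use AnyEvent;

# illustrative input source: a pipe holding two command lines
pipe (my $reader, my $writer) or die "pipe: $!";
syswrite $writer, "hello\nquit\n";

my $quit_program = AnyEvent->condvar;

my @seen;
my $buffer = "";

my $command_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub {
      # append whatever is available; stop on end-of-file or error
      sysread ($reader, $buffer, 1024, length $buffer)
         or return $quit_program->send;

      # handle every complete line currently in the buffer
      while ($buffer =~ s/^(.*)\n//) {
         my $command = $1;
         push @seen, $command;
         $quit_program->send if $command eq "quit";
      }
   },
);

# this is the "main loop": serve events until somebody calls send
$quit_program->recv;

print "commands seen: @seen\n";
```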
306
307 =head2 Timers and other event sources
308
309 So far, we have only used I/O watchers. These are useful mainly to find
310 out whether a socket has data to read, or space to write more data. On sane
311 operating systems this also works for console windows/terminals (typically
312 on standard input), serial lines, all sorts of other devices, basically
313 almost everything that has a file descriptor but isn't a file itself. (As
314 usual, "sane" excludes Windows - on that platform you would need different
315 functions for all of these, complicating code immensely - think "socket
316 only" on Windows).
317
318 However, I/O is not everything - the second most important event source is
319 the clock. For example when doing an HTTP request you might want to time
320 out when the server doesn't answer within some predefined amount of time.
321
322 In AnyEvent, timer event watchers are created by calling the C<<
323 AnyEvent->timer >> method:
324
325 use AnyEvent;
326
327 my $cv = AnyEvent->condvar;
328
329 my $wait_one_and_a_half_seconds = AnyEvent->timer (
330 after => 1.5, # after how many seconds to invoke the cb?
331 cb => sub { # the callback to invoke
332 $cv->send;
333 },
334 );
335
336 # can do something else here
337
338 # now wait till our time has come
339 $cv->recv;
340
341 Unlike I/O watchers, timers are only interested in the number of seconds
342 they have to wait. When that amount of time has passed, AnyEvent will
343 invoke your callback.
344
345 Unlike I/O watchers, which will call your callback as many times as there
346 is data available, timers are one-shot: after they have "fired" once and
347 invoked your callback, they are dead and no longer do anything.
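
The timeout use case mentioned above could be sketched like this: an I/O
watcher and a one-shot timer compete for the same condition variable. The
pipe that never receives data merely simulates a server that does not
answer, and the 0.2 second limit is an arbitrary value chosen for the
example.

```perl
use strict;
use warnings;
use AnyEvent;

# simulate a silent peer: nothing is ever written to $writer
pipe (my $reader, my $writer) or die "pipe: $!";

my $cv = AnyEvent->condvar;

# fires if data arrives
my $io_watcher = AnyEvent->io (
   fh   => $reader,
   poll => "r",
   cb   => sub { $cv->send ("data") },
);

# fires once if nothing happened in time
my $timeout_watcher = AnyEvent->timer (
   after => 0.2,
   cb    => sub { $cv->send ("timeout") },
);

my $outcome = $cv->recv;

# whichever watcher lost the race is no longer needed
undef $io_watcher;
undef $timeout_watcher;

print "outcome: $outcome\n";
```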
348
349 To get a repeating timer, such as a timer firing roughly once per second,
350 you have to recreate it:
351
352 use AnyEvent;
353
354 my $time_watcher;
355
356 sub once_per_second {
357 print "tick\n";
358
359 # (re-)create the watcher
360 $time_watcher = AnyEvent->timer (
361 after => 1,
362 cb => \&once_per_second,
363 );
364 }
365
366 # now start the timer
367 once_per_second;
368
369 Having to recreate your timer is a restriction put on AnyEvent that is
370 present in most event libraries it uses. It is so annoying that some
371 future version might work around this limitation, but right now, it's the
372 only way to do repeating timers.
373
374 Fortunately most timers aren't really repeating but specify timeouts of
375 some sort.
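
A recreated timer stops as soon as you no longer recreate it. The
following sketch, with an arbitrary tick count and interval, runs three
ticks and then ends the program through a condition variable:

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $ticks = 0;
my $time_watcher;

sub tick {
   $ticks++;

   if ($ticks >= 3) {
      undef $time_watcher; # do not recreate: the timer chain ends here
      $done->send;
      return;
   }

   # (re-)create the one-shot watcher for the next tick
   $time_watcher = AnyEvent->timer (
      after => 0.05,
      cb    => \&tick,
   );
}

# start the timer chain, then serve events until it has finished
tick ();
$done->recv;

print "ticks: $ticks\n";
```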
376
377 =head3 More esoteric sources
378
379 AnyEvent also has some other, more esoteric event sources you can tap
380 into: signal and child watchers.
381
382 Signal watchers can be used to wait for "signal events", which simply
383 means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).
384
385 Child watchers wait for a child process to exit. They are useful when
386 you fork a separate process and need to know when it exits, but you do not
387 wait for that by blocking.
388
389 Both watcher types are described in detail in the main L<AnyEvent> manual
390 page.
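
As a rough sketch of both watcher types on a Unix-like system (it relies
on C<fork> and C<SIGUSR1>, so it is not meant for Windows): the process
sends itself a signal, then forks a child that exits with the arbitrary
status 3.

```perl
use strict;
use warnings;
use AnyEvent;
use POSIX ();

# signal watcher: wait until this process receives SIGUSR1
my $got_signal = AnyEvent->condvar;

my $signal_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

kill USR1 => $$; # send the signal to ourselves
my $signal_name = $got_signal->recv;

# child watcher: wait for a forked process without blocking in waitpid
my $child_done = AnyEvent->condvar;

my $pid = fork;
die "fork: $!" unless defined $pid;
POSIX::_exit (3) if $pid == 0; # child: exit at once, skipping cleanup

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($child_pid, $status) = @_;
      $child_done->send ($status >> 8); # exit code is in the high byte
   },
);

my $exit_code = $child_done->recv;

print "got signal $signal_name, child exited with $exit_code\n";
```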
391
392
393 =head1 Network programming and AnyEvent
394
395 So far you have seen how to register event watchers and handle events.
396
397 This is a great foundation to write network clients and servers, and might be
398 all that your module (or program) ever requires, but writing your own I/O
399 buffering again and again becomes tedious, not to mention that it attracts
400 errors.
401
402 While the core L<AnyEvent> module is still small and self-contained,
403 the distribution comes with some very useful utility modules such as
404 L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
405 make your life as a non-blocking network programmer a lot easier.
406
407 Here is a quick overview of these three modules:
408
409 =head2 L<AnyEvent::DNS>
410
411 This module allows fully asynchronous DNS resolution. It is used mainly by
412 L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
413 a great way to do other DNS resolution tasks, such as reverse lookups of
414 IP addresses for log files.
415
416 =head2 L<AnyEvent::Handle>
417
418 This module handles non-blocking IO on file handles in an event based
419 manner. It provides a wrapper object around your file handle that provides
420 queueing and buffering of incoming and outgoing data for you.
421
422 It also implements the most common data formats, such as text lines, or
423 fixed and variable-width data blocks.
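
To get a feeling for the line-based read queue, here is a sketch that
connects two handle objects through a local socket pair, so no network is
needed. The socket pair and the canned response text are assumptions made
for the example; C<push_write> and C<push_read> are the methods discussed
in detail later in this tutorial.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use Socket;

# two connected sockets standing in for a network connection
socketpair (my $fh1, my $fh2, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
   or die "socketpair: $!";

my $cv = AnyEvent->condvar;

my $writer = AnyEvent::Handle->new (fh => $fh1, on_error => sub { $cv->send });
my $reader = AnyEvent::Handle->new (fh => $fh2, on_error => sub { $cv->send });

# queue a canned status line and header block for writing
$writer->push_write (
   "HTTP/1.0 200 OK\015\012Content-Type: text/plain\015\012\015\012"
);

my ($status, $header);

# first queued read: a single line
$reader->push_read (line => sub {
   my ($handle, $line) = @_;
   $status = $line;
});

# second queued read: everything up to the empty line
$reader->push_read (line => "\015\012\015\012", sub {
   my ($handle, $line) = @_;
   $header = $line;
   $cv->send;
});

$cv->recv;

print "status: $status\nheader: $header\n";
```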
424
425 =head2 L<AnyEvent::Socket>
426
427 This module provides you with functions that handle socket creation
428 and IP address magic. The two main functions are C<tcp_connect> and
429 C<tcp_server>. The former will connect a (streaming) socket to an internet
430 host for you and the latter will make a server socket for you, to accept
431 connections.
432
433 This module also comes with transparent IPv6 support, which means that if
434 you write your programs with this module, you will be IPv6 ready without
435 doing anything special.
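
Both functions can be tried together on the loopback interface. The sketch
below assumes a reasonably recent AnyEvent in which C<tcp_server> accepts
C<undef> as service (let the operating system choose a port) and an
optional "prepare" callback that is told the chosen port; the greeting text
is made up.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;

# server: greet every client, then close the connection
my $server = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;
   syswrite $fh, "hello from server\015\012";
   # $fh goes out of scope here, which closes the connection
}, sub {
   my ($listen_fh, $host, $bound_port) = @_;
   $port = $bound_port; # remember which port we got
   0                    # use the default listen queue length
};

my $greeting = "";

# client: connect and collect data until the server closes
tcp_connect "127.0.0.1", $port, sub {
   my ($fh) = @_
      or return $done->send;

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         my $len = sysread $fh, $greeting, 1024, length $greeting;
         return if defined $len && $len > 0; # got data, wait for more

         # end-of-file (errors are treated the same way here)
         undef $read_watcher;
         $done->send ($greeting);
      },
   );
};

my $result = $done->recv;

print "received: $result";
```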
436
437 It also works around a lot of portability quirks (especially on the
438 Windows platform), which makes it even easier to write your programs in a
439 portable way (did you know that Windows uses different error codes for all
440 socket functions and that Perl does not know about these? That "Unknown
441 error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
442 successful? That unsuccessful TCP connects might never be reported back
443 to your program? That C<WSAEINPROGRESS> means your C<connect> call was
444 ignored instead of being in progress? AnyEvent::Socket works around all of
445 these Windows/Perl bugs for you).
446
447 =head2 Implementing a parallel finger client with non-blocking connects
448
449 The finger protocol is one of the simplest protocols in use on the
450 internet. Or in use in the past, as almost nobody uses it anymore.
451
452 It works by connecting to the finger port on another host, writing a
453 single line with a user name and then reading the finger response, as
454 specified by that user. OK, RFC 1288 specifies a vastly more complex
455 protocol, but it basically boils down to this:
456
457 # telnet idsoftware.com finger
458 Trying 192.246.40.37...
459 Connected to idsoftware.com (192.246.40.37).
460 Escape character is '^]'.
461 johnc
462 Welcome to id Software's Finger Service V1.5!
463
464 [...]
465 Now on the web:
466 [...]
467
468 Connection closed by foreign host.
469
470 "Now on the web..." yeah, I<was> used indeed, but at least the finger
471 daemon still works, so let's write a little AnyEvent function that makes a
472 finger request:
473
474 use AnyEvent;
475 use AnyEvent::Socket;
476
477 sub finger($$) {
478 my ($user, $host) = @_;
479
480 # use a condvar to return results
481 my $cv = AnyEvent->condvar;
482
483 # first, connect to the host
484 tcp_connect $host, "finger", sub {
485 # the callback receives the socket handle - or nothing
486 my ($fh) = @_
487 or return $cv->send;
488
489 # now write the username
490 syswrite $fh, "$user\015\012";
491
492 my $response;
493
494 # register a read watcher
495 my $read_watcher; $read_watcher = AnyEvent->io (
496 fh => $fh,
497 poll => "r",
498 cb => sub {
499 my $len = sysread $fh, $response, 1024, length $response;
500
501 if ($len <= 0) {
502 # we are done, or an error occurred, let's ignore the latter
503 undef $read_watcher; # no longer interested
504 $cv->send ($response); # send results
505 }
506 },
507 );
508 };
509
510 # pass $cv to the caller
511 $cv
512 }
513
514 That's a mouthful! Let's dissect this function a bit, first the overall
515 function and execution flow:
516
517 sub finger($$) {
518 my ($user, $host) = @_;
519
520 # use a condvar to return results
521 my $cv = AnyEvent->condvar;
522
523 # first, connect to the host
524 tcp_connect $host, "finger", sub {
525 ...
526 };
527
528 $cv
529 }
530
531 This isn't too complicated, just a function with two parameters, that
532 creates a condition variable, returns it, and while it does that,
533 initiates a TCP connect to C<$host>. The condition variable will be used
534 by the caller to receive the finger response, but one could equally well
535 pass a third argument, a callback, to the function.
536
537 Since we are programming event'ish, we do not wait for the connect to
538 finish - it could block the program for a minute or longer!
539
540 Instead, we pass the callback it should invoke when the connect is done to
541 C<tcp_connect>. If it is successful, that callback gets called with the
542 socket handle as first argument, otherwise, nothing will be passed to our
543 callback. The important point is that it will always be called as soon as
544 the outcome of the TCP connect is known.
545
546 This style of programming is also called "continuation style": the
547 "continuation" is simply the way the program continues - normally, a
548 program continues at the next line after some statement (the exception
549 is loops or things like C<return>). When we are interested in events,
550 however, we instead specify the "continuation" of our program by passing a
551 closure, which makes that closure the "continuation" of the program. The
552 C<tcp_connect> call is like saying "return now, and when the connection is
553 established or it failed, continue there".
554
555 Now let's look at the callback/closure in more detail:
556
557 # the callback receives the socket handle - or nothing
558 my ($fh) = @_
559 or return $cv->send;
560
561 The first thing the callback does is indeed save the socket handle in
562 C<$fh>. When there was an error (no arguments), then our instinct as
563 expert Perl programmers would tell us to C<die>:
564
565 my ($fh) = @_
566 or die "$host: $!";
567
568 While this would give good feedback to the user (if he happens to watch
569 standard error), our program would probably stop working here, as we never
570 report the results to anybody, certainly not the caller of our C<finger>
571 function, and most event loops continue even after a C<die>!
572
573 This is why we instead C<return>, but also call C<< $cv->send >> without
574 any arguments to signal to the condvar consumer that something bad has
575 happened. The return value of C<< $cv->send >> is irrelevant, as is the
576 return value of our callback. The return statement is simply used for the
577 side effect of, well, returning immediately from the callback. Checking
578 for errors and handling them this way is very common, which is why this
579 compact idiom is so handy.
580
581 As the next step in the finger protocol, we send the username to the
582 finger daemon on the other side of our connection:
583
584 syswrite $fh, "$user\015\012";
585
586 Note that this isn't 100% clean socket programming - the socket could,
587 for whatever reasons, not accept our data. When writing a small amount
588 of data like in this example it doesn't matter, as a socket buffer is
589 almost always big enough for a mere "username", but for real-world
590 cases you might need to implement some kind of write buffering - or use
591 L<AnyEvent::Handle>, which handles these matters for you, as shown in the
592 next section.
593
594 What we I<do> have to do is to implement our own read buffer - the response
595 data could arrive late or in multiple chunks, and we cannot just wait for
596 it (event-based programming, you know?).
597
598 To do that, we register a read watcher on the socket which waits for data:
599
600 my $read_watcher; $read_watcher = AnyEvent->io (
601 fh => $fh,
602 poll => "r",
603
604 There is a trick here, however: the read watcher isn't stored in a global
605 variable, but in a local one - if the callback returns, it would normally
606 destroy the variable and its contents, which would in turn unregister our
607 watcher.
608
609 To avoid that, we refer to the variable in the watcher callback (where we
610 C<undef>ine it). This means that, when the C<tcp_connect> callback returns,
611 perl thinks (quite correctly) that the read watcher is still in use - namely
612 in the callback.
613
614 The trick, however, is that instead of:
615
616 my $read_watcher = AnyEvent->io (...
617
618 The program does:
619
620 my $read_watcher; $read_watcher = AnyEvent->io (...
621
622 The reason for this is a quirk in the way Perl works: variable names
623 declared with C<my> are only visible in the I<next> statement. If the
624 whole C<< AnyEvent->io >> call, including the callback, would be done in
625 a single statement, the callback could not refer to the C<$read_watcher>
626 variable to undefine it, so it is done in two statements.
627
628 Whether you'd want to format it like this is of course a matter of style,
629 this way emphasizes that the declaration and assignment really are one
630 logical statement.
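
The effect of this trick can be observed with timers instead of sockets.
In the sketch below, C<delay_kept> and C<delay_lost> are hypothetical
helpers written only for this illustration: the first uses the
two-statement form so that its callback keeps the watcher alive, the second
lets the only reference go out of scope.

```perl
use strict;
use warnings;
use AnyEvent;

my %fired;

# the closure refers to $timer, so the watcher survives the return
sub delay_kept {
   my ($seconds, $cb) = @_;

   my $timer; $timer = AnyEvent->timer (
      after => $seconds,
      cb    => sub {
         undef $timer; # drop the last reference, the watcher is cleaned up
         $cb->();
      },
   );

   return;
}

# nothing refers to $timer after the return, so it is destroyed unfired
sub delay_lost {
   my ($seconds, $cb) = @_;

   my $timer = AnyEvent->timer (after => $seconds, cb => $cb);

   return;
}

my $done = AnyEvent->condvar;

delay_lost 0.05, sub { $fired{lost}++ };
delay_kept 0.10, sub { $fired{kept}++ };
delay_kept 0.30, sub { $done->send };

$done->recv;

print "callbacks that ran: ", join (", ", sort keys %fired), "\n";
```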
631
632 The callback itself calls C<sysread> as many times as necessary, until
633 C<sysread> returns either an error or end-of-file:
634
635 cb => sub {
636 my $len = sysread $fh, $response, 1024, length $response;
637
638 if ($len <= 0) {
639
640 Note that C<sysread> has the ability to append data it reads to a scalar,
641 by specifying an offset, which is what we make good use of in this
642 example.
643
644 When C<sysread> indicates we are done, the callback C<undef>ines
645 the watcher and then C<send>'s the response data to the condition
646 variable. All this has the following effects:
647
648 Undefining the watcher destroys it, as our callback was the only one still
649 having a reference to it. When the watcher gets destroyed, it destroys the
650 callback, which in turn means the C<$fh> handle is no longer used, so that
651 gets destroyed as well. The result is that all resources will be nicely
652 cleaned up by perl for us.
653
654 =head3 Using the finger client
655
656 Now, we could probably write the same finger client in a simpler way if
657 we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
658 ignored IPv6 and a few other things that C<tcp_connect> handles for us.
659
660 But the main advantage is that we can not only run this finger function in
661 the background, we even can run multiple sessions in parallel, like this:
662
663 my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
664 my $f2 = finger "1736" , "noc.dfn.de"; # fetch ticket 1736
665 my $f3 = finger "johnc", "idsoftware.com"; # finger john
666
667 print "trouble tickets:\n", $f1->recv, "\n";
668 print "trouble ticket #1736:\n", $f2->recv, "\n";
669 print "john carmacks finger file: ", $f3->recv, "\n";
670
671 It doesn't look like it, but in fact all three requests run in
672 parallel. The code waits for the first finger request to finish first, but
673 that doesn't keep it from executing them in parallel: when the first C<recv>
674 call sees that the data isn't ready yet, it serves events for all three
675 requests automatically, until the first request has finished.
676
677 The second C<recv> call might either find the data is already there, or it
678 will continue handling events until that is the case, and so on.
679
680 By taking advantage of network latencies, which allows us to serve other
681 requests and events while we wait for an event on one socket, the overall
682 time to do these three requests will be greatly reduced, typically all
683 three are done in the same time as the slowest of them would need to finish.
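
The timing claim can be checked without any network. In this sketch,
C<slow_request> is a hypothetical stand-in for C<finger> that simply
"answers" after a given delay; the delays are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent;
use Time::HiRes qw(time);

# pretend request: returns a condvar that is sent to after $seconds
sub slow_request {
   my ($name, $seconds) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => $seconds,
      cb    => sub {
         undef $timer;
         $cv->send ("$name done");
      },
   );

   $cv
}

my $start = time;

# all three "requests" are started before we wait for any of them
my $r1 = slow_request "first",  0.30;
my $r2 = slow_request "second", 0.20;
my $r3 = slow_request "third",  0.25;

# waiting for the first one also serves events for the other two
my @results = ($r1->recv, $r2->recv, $r3->recv);

my $elapsed = time - $start;

printf "%d results after %.2f seconds\n", scalar @results, $elapsed;
```

Run one after the other, the three delays would add up to 0.75 seconds;
run this way, the total should stay close to the longest single delay.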
684
685 By the way, you do not actually have to wait in the C<recv> method on an
686 AnyEvent condition variable - after all, waiting is evil - you can also
687 register a callback:
688
689 $cv->cb (sub {
690 my $response = shift->recv;
691 # ...
692 });
693
694 The callback will only be invoked when C<send> was called. In fact,
695 instead of returning a condition variable you could also pass a third
696 parameter to your finger function, the callback to invoke with the
697 response:
698
699 sub finger($$$) {
700 my ($user, $host, $cb) = @_;
701
702 How you implement it is a matter of taste - if you expect your function to
703 be used mainly in an event-based program you would normally prefer to pass
704 a callback directly. If you write a module and expect your users to use
705 it "synchronously" often (for example, a simple http-get script would not
706 really care much for events), then you would use a condition variable and
707 tell them "simply ->recv the data".
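
Returning a condition variable leaves both styles open to the caller, as
this sketch tries to show. C<answer_later> is a hypothetical function that
delivers a fixed value after a short delay; it stands in for something like
C<finger>.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical producer: returns a condvar that receives 42 a bit later
sub answer_later {
   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (
      after => 0.1,
      cb    => sub {
         undef $timer;
         $cv->send (42);
      },
   );

   $cv
}

# event-based use: register a callback, nothing blocks here
my $from_callback;
my $cv1 = answer_later;
$cv1->cb (sub {
   my $response = shift->recv; # does not block, send was already called
   $from_callback = $response;
});

# "synchronous" use: simply ->recv the data
my $from_recv = answer_later->recv;

# make sure the first request has finished as well before we look
$cv1->recv;

print "callback got $from_callback, recv got $from_recv\n";
```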
708
709 =head3 Problems with the implementation and how to fix them
710
711 To make this example more real-world-ready, we would not only implement
712 some write buffering (for the paranoid), but we would also have to handle
713 timeouts and maybe protocol errors.
714
715 Doing this quickly gets unwieldy, which is why we introduce
716 L<AnyEvent::Handle> in the next section, which takes care of all these
717 details for you and lets you concentrate on the actual protocol.
718
719
720 =head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle
721
722 The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
723 see what it really offers.
724
725 As finger is such a simple protocol, let's try something slightly more
726 complicated: HTTP/1.0.
727
728 An HTTP GET request works by sending a single request line that indicates
729 what you want the server to do and the URI you want to act on, followed
730 by as many "header" lines (C<Header: data>, same as e-mail headers) as
731 required for the request, ended by an empty line.
732
733 The response is formatted very similarly, first a line with the response
734 status, then again as many header lines as required, then an empty line,
735 followed by any data that the server might send.
736
737 Again, let's try it out with C<telnet> (I condensed the output a bit - if
738 you want to see the full response, do it yourself).
739
740 # telnet www.google.com 80
741 Trying 209.85.135.99...
742 Connected to www.google.com (209.85.135.99).
743 Escape character is '^]'.
744 GET /test HTTP/1.0
745
746 HTTP/1.0 404 Not Found
747 Date: Mon, 02 Jun 2008 07:05:54 GMT
748 Content-Type: text/html; charset=UTF-8
749
750 <html><head>
751 [...]
752 Connection closed by foreign host.
753
754 The C<GET ...> and the empty line were entered manually, the rest of the
755 telnet output is Google's response, in this case a C<404 Not Found> one.
756
757 So, here is how you would do it with C<AnyEvent::Handle>:
758
759 sub http_get {
760 my ($host, $uri, $cb) = @_;
761
762 AnyEvent::Socket::tcp_connect $host, "http", sub {
763 my ($fh) = @_
764 or return $cb->("HTTP/1.0 500 $!");
765
766 # store results here
767 my ($response, $header, $body);
768
769 my $handle; $handle = new AnyEvent::Handle
770 fh => $fh,
771 on_error => sub {
772 undef $handle;
773 $cb->("HTTP/1.0 500 $!");
774 },
775 on_eof => sub {
776 undef $handle; # keep it alive till eof
777 $cb->($response, $header, $body);
778 };
779
780 $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
781
782 # now fetch response status line
783 $handle->push_read (line => sub {
784 my ($handle, $line) = @_;
785 $response = $line;
786 });
787
788 # then the headers
789 $handle->push_read (line => "\015\012\015\012", sub {
790 my ($handle, $line) = @_;
791 $header = $line;
792 });
793
794 # and finally handle any remaining data as body
795 $handle->on_read (sub {
796 $body .= $_[0]->rbuf;
797 $_[0]->rbuf = "";
798 });
799 };
800 }
801
802 And now let's go through it step by step. First, as usual, the overall
803 C<http_get> function structure:
804
805 sub http_get {
806 my ($host, $uri, $cb) = @_;
807
808 tcp_connect $host, "http", sub {
809 ...
810 };
811 }
812
813 Unlike in the finger example, this time the caller has to pass a callback
814 to C<http_get>. Also, instead of passing a URL as one would expect, the
815 caller has to provide the hostname and URI - normally you would use the
816 C<URI> module to parse a URL and separate it into those parts, but that is
817 left to the inspired reader :)
818
819 Since everything else is left to the caller, all C<http_get> does is
820 initiate the connection with C<tcp_connect> and leave everything else to
821 its callback.
822
823 The first thing the callback does is check for connection errors and
824 declare some variables:
825
826    my ($fh) = @_
827       or return $cb->("HTTP/1.0 500 $!");
828
829    my ($response, $header, $body);
830
831 Instead of having an extra mechanism to signal errors, connection errors
832 are signalled by crafting a special "response status line", like this:
833
834    HTTP/1.0 500 Connection refused
835
836 This means the caller cannot distinguish (easily) between
837 locally-generated errors and server errors, but it simplifies error
838 handling for the caller a lot.
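
On the caller's side, this means one check on the status line covers both
kinds of failure. The following is only a sketch: C<parse_status> is a
hypothetical helper, and it assumes the status line arrives without its
trailing end-of-line marker, as it does when read as a line:

```perl
use strict;
use warnings;

# parse_status is a hypothetical helper, not part of AnyEvent
sub parse_status {
   my ($response) = @_;

   # status line layout: HTTP-version, 3-digit code, reason phrase
   $response =~ m{^HTTP/(\d+\.\d+) \s+ (\d{3}) \s* (.*)$}x
      or return;

   ($2, $3, $1)   # code, reason, version
}

my ($code, $reason) = parse_status "HTTP/1.0 500 Connection refused";

# local connection errors and real server errors both end up here
print "failed: $reason\n" if $code >= 500;
```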
839
840 The next step finally involves L<AnyEvent::Handle>, namely it creates the
841 handle object:
842
843    my $handle; $handle = new AnyEvent::Handle
844       fh       => $fh,
845       on_error => sub {
846          undef $handle;
847          $cb->("HTTP/1.0 500 $!");
848       },
849       on_eof   => sub {
850          undef $handle; # keep it alive till eof
851          $cb->($response, $header, $body);
852       };
853
854 The constructor expects a file handle, which gets passed via the C<fh>
855 argument.
856
857 The remaining two argument pairs specify two callbacks to be called on
858 any errors (C<on_error>) and in the case of a normal connection close
859 (C<on_eof>).
860
861 In the first case, we C<undef>ine the handle object and pass the error to
862 the callback provided by the caller - done.
863
864 In the second case we assume everything went fine and pass the results
865 gobbled up so far to the caller-provided callback. This is not quite
866 perfect, as when the server "cleanly" closes the connection in the middle
867 of sending headers we might wrongly report this as an "OK" to the caller,
868 but then, HTTP doesn't support a perfect mechanism that would detect such
869 problems in all cases, so we don't bother either.
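
When the server happens to send a C<Content-Length> header, the caller can
still catch some truncated responses itself. This is a sketch of such a
check, not something C<http_get> does: C<body_complete> is a made-up
helper name, and it assumes C<$header> and C<$body> are exactly the values
C<http_get> passes to its callback:

```perl
use strict;
use warnings;

# body_complete is a hypothetical helper: returns 1 if the body is at
# least as long as announced, 0 if it is shorter, and undef if the
# server did not announce a length at all
sub body_complete {
   my ($header, $body) = @_;

   return unless defined $header;   # eof before the headers ended

   my ($length) = $header =~ /^Content-Length:\s*(\d+)\s*$/mi
      or return;                    # no such header: cannot tell

   length ($body // "") >= $length ? 1 : 0
}

my $header = "Content-Type: text/html\015\012Content-Length: 5";

print "complete\n"  if body_complete $header, "hello";
print "truncated\n" unless body_complete $header, "hel";
```

A missing header leaves the question open, which is exactly the
imperfection mentioned above.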
870
871 =head3 The write queue
872
873 The next line sends the actual request:
874
875    $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");
876
877 No headers will be sent (this is fine for simple requests), so the whole
878 request is just a single line followed by an empty line to signal the end
879 of the headers to the server.
880
881 The more interesting question is why the method is called C<push_write>
882 and not just C<write>. The reason is that you can I<always> add some write
883 data without blocking, and to do this, AnyEvent::Handle needs some write
884 queue internally - and C<push_write> simply pushes some data at the end of
885 that queue, just like Perl's C<push> pushes data at the end of an array.
886
887 The deeper reason is that at some point in the future, there might
888 be C<unshift_write> as well, and in any case, we will shortly meet
889 C<push_read> and C<unshift_read>, and it's usually easiest if all those
890 functions have some symmetry in their name.
891
892 If C<push_write> is called with more than one argument, then you can even
893 do I<formatted> I/O, which simply means your data will be transformed in
894 some ways. For example, this would JSON-encode your data before pushing it
895 to the write queue:
896
897    $handle->push_write (json => [1, 2, 3]);
898
899 Apart from that, this pretty much summarises the write queue; there is
900 little else to it.
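
To see why pushing can never block, it may help to look at a toy model of
such a queue in plain Perl. This is purely illustrative and is not how
AnyEvent::Handle is implemented: C<toy_push_write>, C<toy_drain>,
C<@wqueue> and C<$sent> are all invented for this sketch, and C<$sent>
merely stands in for a socket that accepts a limited number of bytes at a
time:

```perl
use strict;
use warnings;

my @wqueue;      # pending chunks, oldest first
my $sent = "";   # stands in for the socket

# always returns immediately: nothing is written here, only queued
sub toy_push_write {
   push @wqueue, $_[0];
}

# called whenever the "socket" can take up to $room more bytes
sub toy_drain {
   my ($room) = @_;

   while ($room > 0 && @wqueue) {
      # cut at most $room bytes off the front of the oldest chunk
      my $chunk = substr $wqueue[0], 0, $room, "";
      $sent .= $chunk;
      $room -= length $chunk;

      shift @wqueue unless length $wqueue[0];
   }
}

toy_push_write "GET / HTTP/1.0\015\012";
toy_push_write "\015\012";

toy_drain 5;     # partial write: only "GET /" goes out
toy_drain 100;   # the rest follows, in the order it was pushed
```

The real module does the draining from an I/O watcher instead of explicit
calls, but the caller-visible behaviour is the same: data goes out in the
order in which it was pushed.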
901
902 Reading the response is far more interesting:
903
904 =head3 The read queue
905
906 The response consists of three parts: a single line of response status, a
907 single paragraph of headers ended by an empty line, and the response body,
908 which is simply the remaining data on that connection.
909
910 For the first two, we push two read requests onto the read queue:
911
912    # now fetch response status line
913    $handle->push_read (line => sub {
914       my ($handle, $line) = @_;
915       $response = $line;
916    });
917
918    # then the headers
919    $handle->push_read (line => "\015\012\015\012", sub {
920       my ($handle, $line) = @_;
921       $header = $line;
922    });
923
924 While one can simply push a single callback to the queue, I<formatted> I/O
925 really comes to our advantage here, as there is a ready-made "read line"
926 read type. The first read expects a single line, ended by C<\015\012> (the
927 standard end-of-line marker in internet protocols).
928
929 The second "line" is actually a single paragraph - instead of reading it
930 line by line we tell C<push_read> that the end-of-line marker is really
931 C<\015\012\015\012>, which is an empty line. The result is that the whole
932 header paragraph will be treated as a single line and read. The word
933 "line" is interpreted very freely, much like Perl itself does it.
934
935 Note that the read requests are pushed immediately after creating the
936 handle object - since AnyEvent::Handle provides a queue we can push as
937 many requests as we want, and AnyEvent::Handle will handle them in order.
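
The effect of the two line reads on the read buffer can be imitated in
plain Perl. The sketch below is an illustration only, not
AnyEvent::Handle code: C<toy_read_line> and the canned response in
C<$rbuf> are made up, and the assumption is that a line read consumes
everything up to the first end-of-line marker and drops the marker itself:

```perl
use strict;
use warnings;

# a canned response, as it might sit in the read buffer
my $rbuf = "HTTP/1.0 200 OK\015\012"
         . "Content-Type: text/html\015\012"
         . "Content-Length: 5\015\012"
         . "\015\012"
         . "hello";

# toy_read_line is a hypothetical stand-in for a "line" read request
sub toy_read_line {
   my ($eol) = @_;

   my $pos = index $rbuf, $eol;
   return if $pos < 0;                    # marker not received yet

   my $line = substr $rbuf, 0, $pos, "";  # cut the line off the buffer
   substr $rbuf, 0, length ($eol), "";    # and drop the marker itself

   $line
}

my $response = toy_read_line "\015\012";          # status line
my $header   = toy_read_line "\015\012\015\012";  # whole header paragraph

print "$response\n";
```

After both calls, C<$header> holds the two header lines, still separated
by C<\015\012>, and only the body is left in C<$rbuf>.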
938
939 There is, however, no read type for "the remaining data". For that, we
940 install our own C<on_read> callback:
941
942    # and finally handle any remaining data as body
943    $handle->on_read (sub {
944       $body .= $_[0]->rbuf;
945       $_[0]->rbuf = "";
946    });
947
948 This callback is invoked every time data arrives and the read queue is
949 empty - which in this example will only be the case when both response and
950 header have been read. The C<on_read> callback could actually have been
951 specified when constructing the object, but doing it this way preserves
952 logical ordering.
953
954 The read callback simply appends the current read buffer to its C<$body>
955 variable and, most importantly, I<empties> the buffer by assigning the empty
956 string to it.
957
958 After AnyEvent::Handle has been so instructed, it will now handle incoming
959 data according to these instructions - if all goes well, the callback will
960 be invoked with the response data; if not, it will get an error.
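
The same read instructions can be tried out without any network access.
The sketch below is an assumption-laden test setup rather than part of
C<http_get>: it uses C<AnyEvent::Util::portable_socketpair> to create a
connected pair of sockets, writes a canned response into one end and lets
an AnyEvent::Handle read it from the other. The names C<$client>,
C<$server> and C<$done> exist only in this example:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util ();

# a connected socket pair: $server plays the remote host
my ($client, $server) = AnyEvent::Util::portable_socketpair
   or die "socketpair: $!";

my ($response, $header, $body);
my $done = AnyEvent->condvar;

my $handle; $handle = AnyEvent::Handle->new (
   fh       => $client,
   on_error => sub { undef $handle; $done->send },
   on_eof   => sub { undef $handle; $done->send },
);

# same three read instructions as in http_get
$handle->push_read (line => sub { $response = $_[1] });
$handle->push_read (line => "\015\012\015\012", sub { $header = $_[1] });
$handle->on_read (sub {
   $body .= $_[0]->rbuf;
   $_[0]->rbuf = "";
});

# send a canned response, then close, which the handle sees as eof
syswrite $server, "HTTP/1.0 200 OK\015\012X-Test: 1\015\012\015\012hello";
close $server;

$done->recv;   # run the event loop until on_eof or on_error fires

print "$response\n$body\n";
```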
961
962 =head3 Using it
963
964 And here is how you would use it:
965
966    http_get "www.google.com", "/", sub {
967       my ($response, $header, $body) = @_;
968
969       print
970          $response, "\n",
971          $body;
972    };
973
974 And of course, you can run as many of these requests in parallel as you
975 want (and your memory supports).
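
One way to wait for a whole batch of such requests is a condition variable
with its C<begin> and C<end> methods. In the following sketch, C<http_get>
is replaced by a stand-in that answers from a timer, so that the example
runs without network access; the host names, the 0.01 second delay and
the C<%status> hash are arbitrary choices for illustration:

```perl
use strict;
use warnings;
use AnyEvent;

# stand-in with the same calling convention as the real http_get,
# it "answers" after a short delay instead of talking to a server
sub http_get {
   my ($host, $uri, $cb) = @_;

   my $w; $w = AnyEvent->timer (after => 0.01, cb => sub {
      undef $w;
      $cb->("HTTP/1.0 200 OK", "", "body from $host");
   });
}

my $all_done = AnyEvent->condvar;
my %status;

for my $host (qw(www.google.com www.perl.org www.cpan.org)) {
   $all_done->begin;              # one more request outstanding

   http_get $host, "/", sub {
      my ($response, $header, $body) = @_;

      $status{$host} = $response;
      $all_done->end;             # this request is finished
   };
}

$all_done->recv;                  # returns after the last end call

print "$_: $status{$_}\n" for sort keys %status;
```

All three requests are started before any of them completes; C<recv>
simply runs the event loop until the last callback has called C<end>.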
976
977 =head3 The read queue - revisited
978
979 ###TODO
980
981
982 =head1 AUTHORS
983
984 Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann C<< <schmorp@schmorp.de> >>.
985