=head1 Introduction to AnyEvent

This is a tutorial that will introduce you to the features of AnyEvent.

The first part introduces the core AnyEvent module (after swamping you a
bit in evangelism), which might already provide all you ever need. If you
are only interested in AnyEvent's event handling capabilities, read no
further.

The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.

=head1 What is AnyEvent?

If you don't care for the whys and want to see code, skip this section!

AnyEvent is first of all just a framework to do event-based
programming. Typically such frameworks are an all-or-nothing thing: If you
use one such framework, you can't (easily, or even at all) use another in
the same program.

AnyEvent is different - it is a thin abstraction layer above all kinds
of event loops. Its main purpose is to move the choice of the underlying
framework (the event loop) from the module author to the program author
using the module.

That means you can write code that uses events to control what it
does, without forcing other code in the same program to use the same
underlying framework as you do - i.e. you can create a Perl module
that is event-based using AnyEvent, and users of that module can still
choose between using L<Gtk2>, L<Tk>, L<Event> or no event loop at
all: AnyEvent comes with its own event loop implementation, so your
code works regardless of other modules that might or might not be
installed. The latter is important, as AnyEvent does not have any
dependencies on other modules, which makes it easy to install, for
example, when you lack a C compiler.

A typical problem with Perl modules such as L<Net::IRC> is that they
come with their own event loop: In L<Net::IRC>, the program that uses it
needs to start the event loop of L<Net::IRC>. That means that one cannot
integrate this module into a L<Gtk2> GUI for instance, as that module,
too, enforces the use of its own event loop (namely L<Glib>).

Another example is L<LWP>: it provides no event interface at all. It's a
pure blocking HTTP (and FTP etc.) client library, which usually means that
you either have to start a thread or have to fork for an HTTP request, or
use L<Coro::LWP>, if you want to do something else while waiting for the
request to finish.

The motivation behind these designs is often that a module doesn't want to
depend on some complicated XS-module (Net::IRC), or that it doesn't want
to force the user to use some specific event loop at all (LWP).

L<AnyEvent> solves this dilemma, by B<not> forcing module authors to either

=over 4

=item - write their own event loop (because AnyEvent guarantees to offer one
everywhere - even on Windows).

=item - choose one fixed event loop (because AnyEvent works with all
important event loops available for Perl, and adding others is trivial).

=back

If the module author uses L<AnyEvent> for all his event needs (IO events,
timers, signals, ...) then all other modules can just use his module and
don't have to choose an event loop or adapt to his event loop. The choice
of the event loop is ultimately made by the program author who uses all
the modules and writes the main program. And even there he doesn't have to
choose, he can just let L<AnyEvent> choose the best available event loop
for him.

Read more about this in the main documentation of the L<AnyEvent> module.

=head1 Introduction to Event-Based Programming

So what exactly is programming using events? It quite simply means that
instead of your code actively waiting for something, such as the user
entering something on STDIN:

   $| = 1; print "enter your name> ";

   my $name = <STDIN>;

You instead tell your event framework to notify you in the event of some
data being available on STDIN, by using a callback mechanism:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN, # which file handle to check
      poll => "r",     # which event to wait for ("r"ead data)
      cb   => sub {    # what callback to execute
         $name = <STDIN>; # read it
      }
   );

   # do something else here

Looks more complicated, and surely is, but the advantage of using events
is that your program can do something else instead of waiting for
input. Waiting as in the first example is also called "blocking" because
you "block" your process from executing anything else while you do so.

The second example avoids blocking by only registering interest in a read
event, which is fast and doesn't block your process. Only when read data
is available will the callback be called, which can then proceed to read
the data.

The "interest" is represented by an object returned by C<< AnyEvent->io >>
called a "watcher" object - called like that because it "watches" your
file handle (or other event sources) for the event you are interested in.

In the example above, we create an I/O watcher by calling the
C<< AnyEvent->io >> method. Disinterest in some event is simply expressed by
forgetting about the watcher, for example, by C<undef>'ing the variable it
is stored in. AnyEvent will automatically clean up the watcher if it is no
longer used, much like Perl closes your file handles if you no longer use
them anywhere.

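To see this in action, here is a small sketch that assumes only the core AnyEvent module. It borrows C<< AnyEvent->condvar >> and C<< AnyEvent->timer >>, both explained further below: one timer is created and then forgotten via C<undef>, and a second timer ends the wait. If the first watcher really is unregistered, its callback never runs:

```perl
   use strict;
   use warnings;
   use AnyEvent;

   my $fired = 0;
   my $done  = AnyEvent->condvar;

   # a watcher we will lose interest in before it can fire
   my $unwanted = AnyEvent->timer (after => 0.1, cb => sub { $fired++ });

   # a second watcher that fires later and ends the wait
   my $finish = AnyEvent->timer (after => 0.3, cb => sub { $done->send });

   undef $unwanted;   # forget the watcher - AnyEvent unregisters it
   $done->recv;       # run the event loop until $finish fires

   print "unwanted timer fired $fired time(s)\n";
```

The variable names are made up for the example. The point is that dropping the last reference to a watcher is all it takes to cancel it.
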
=head2 Condition Variables

However, the above is not a fully working program, and will not work
as-is. The reason is that your callback will not be invoked out of the
blue: you have to run the event loop. Also, event-based programs sometimes
have to block, too: when there simply is nothing else to do and
everything waits for some events, the process needs to block as well.

In AnyEvent, this is done using condition variables. Condition variables
are named "condition variables" because they represent a condition that is
initially false and needs to be fulfilled.

You can also call them "merge points", "sync points", "rendezvous ports"
or even callbacks and many other things (and they are often called like
this in other frameworks). The important point is that you can create them
freely and later wait for them to become true.

Condition variables have two sides - one side is the "producer" of the
condition (whatever code detects the condition), the other side is the
"consumer" (the code that waits for that condition).

In our example in the previous section, the producer is the event callback
and there is no consumer yet - let's change that now:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN,
      poll => "r",
      cb   => sub {
         $name = <STDIN>;
         $name_ready->send;
      }
   );

   # do something else here

   # now wait until the name is available:
   $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

This program creates an AnyEvent condvar by calling the
C<< AnyEvent->condvar >> method. It then creates a watcher as usual, but
inside the callback it C<send>'s the C<$name_ready> condition variable,
which causes anybody waiting on it to continue.

The "anybody" in this case is the code that follows, which calls
C<< $name_ready->recv >>: The producer calls C<send>, the consumer calls
C<recv>.

If there is no C<$name> available yet, then the call to
C<< $name_ready->recv >> will halt your program until the condition becomes
true.

As the names C<send> and C<recv> imply, you can actually send and receive
data using this. For example, the above code could also be written like
this, without an extra variable to store the name in:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
   );

   # do something else here

   # now wait and fetch the name
   my $name = $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

You can pass any number of arguments to C<send>, and every call to
C<recv> will return them.

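A minimal sketch of sending several values at once (the values are made up). According to the L<AnyEvent> documentation, C<recv> returns all values in list context and only the first one in scalar context. Once a condvar has been sent, further C<recv> calls return immediately:

```perl
   use strict;
   use warnings;
   use AnyEvent;

   my $cv = AnyEvent->condvar;

   # the producer sends three values at once
   my $t = AnyEvent->timer (after => 0.1, cb => sub {
      $cv->send ("alice", 42, "admin");
   });

   # list context: recv returns everything that was passed to send
   my ($name, $age, $role) = $cv->recv;

   # scalar context: only the first value; the condvar is already
   # "true", so this second recv returns immediately
   my $first = $cv->recv;

   print "$name/$age/$role, first: $first\n";
```
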
=head2 The "main loop"

Most event-based frameworks have something called a "main loop" or "event
loop run function" or something similar.

Just like C<recv> in AnyEvent, these functions need to be called
eventually so that your event loop has a chance of actually looking for
those events you are interested in.

For example, in a L<Gtk2> program, the above example could also be written
like this:

   use Gtk2 -init;
   use AnyEvent;

   ############################################
   # create a window and some label

   my $window = new Gtk2::Window "toplevel";
   $window->add (my $label = new Gtk2::Label "soon replaced by name");

   $window->show_all;

   ############################################
   # do our AnyEvent stuff

   $| = 1; print "enter your name> ";

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub {
         # set the label
         $label->set_text (scalar <STDIN>);
         print "enter another name> ";
      }
   );

   ############################################
   # Now enter Gtk2's event loop

   main Gtk2;

No condition variable anywhere in sight - instead, we just read a line
from STDIN and replace the text in the label. In fact, since nobody
C<undef>'s C<$wait_for_input> you can enter multiple lines.

Instead of waiting for a condition variable, the program enters the Gtk2
main loop by calling C<< Gtk2->main >>, which will block the program and
wait for events to arrive.

This also shows that AnyEvent is quite flexible - you didn't have to do
anything to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
worked.

Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application? But imagine that instead of
doing that, you would make an HTTP request in the background and display
its results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.

In the next part you will see how to do just that - by implementing an
HTTP request on our own, with the utility modules AnyEvent comes with.

Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

The example using condition variables already used the solution, and in
fact, it is all you need:

   my $quit_program = AnyEvent->condvar;

   # create AnyEvent watchers (or not) here

   $quit_program->recv;

If any of your watcher callbacks decide to quit, they can simply call
C<< $quit_program->send >>. Of course, they could also decide not to and
simply call C<exit> instead, or they could decide not to quit, ever (e.g.
in a long-running daemon program).

In that case, you can simply use:

   AnyEvent->condvar->recv;

And this is, in fact, the closest thing to a main loop run function that
AnyEvent offers.

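Putting the pieces together, here is a sketch of a program whose "main loop" is a single C<recv>, and in which a watcher callback decides when to leave it. The three-tick limit and the short interval are arbitrary choices for the example. The repeating timer is built by recreating a one-shot timer, a technique explained in the next section:

```perl
   use strict;
   use warnings;
   use AnyEvent;

   my $quit_program = AnyEvent->condvar;
   my $ticks = 0;
   my $tick_watcher;

   # a recreated one-shot timer that decides to quit after three ticks
   sub tick {
      $ticks++;
      return $quit_program->send if $ticks >= 3;

      $tick_watcher = AnyEvent->timer (after => 0.05, cb => \&tick);
   }

   tick;

   $quit_program->recv;   # our "main loop"

   print "main loop left after $ticks ticks\n";
```
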
=head2 Timers and other event sources

So far, we have only used I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
operating systems this also works for console windows/terminals (typically
on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes Windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on Windows).

However, I/O is not everything - the second most important event source is
the clock. For example when doing an HTTP request you might want to time
out when the server doesn't answer within some predefined amount of time.

In AnyEvent, timer event watchers are created by calling the
C<< AnyEvent->timer >> method:

   use AnyEvent;

   my $cv = AnyEvent->condvar;

   my $wait_one_and_a_half_seconds = AnyEvent->timer (
      after => 1.5,  # after how many seconds to invoke the cb?
      cb    => sub { # the callback to invoke
         $cv->send;
      },
   );

   # can do something else here

   # now wait till our time has come
   $cv->recv;

Unlike I/O watchers, timers are only interested in the number of seconds
they have to wait. When that time has passed, AnyEvent will invoke your
callback.

Also, while I/O watchers will call your callback every time data is
available, timers are one-shot: after they have "fired" once and
invoked your callback, they are dead and no longer do anything.

To get a repeating timer, such as a timer firing roughly once per second,
you have to recreate it:

   use AnyEvent;

   my $time_watcher;

   sub once_per_second {
      print "tick\n";

      # (re-)create the watcher
      $time_watcher = AnyEvent->timer (
         after => 1,
         cb    => \&once_per_second,
      );
   }

   # now start the timer
   once_per_second;

Having to recreate your timer is a restriction imposed on AnyEvent by
most of the event libraries it uses. It is so annoying that some
future version might work around this limitation, but right now, it's the
only way to do repeating timers.

Fortunately most timers aren't really repeating but specify timeouts of
some sort.

=head3 More esoteric sources

AnyEvent also has some other, more esoteric event sources you can tap
into: signal and child watchers.

Signal watchers can be used to wait for "signal events", which simply
means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child watchers wait for a child process to exit. They are useful when
you fork a separate process and need to know when it exits, but do not
want to block while waiting for that.

Both watcher types are described in detail in the main L<AnyEvent> manual
page.

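A sketch of both watcher types, assuming a POSIX system (the C<fork> and C<SIGUSR1> used here do not behave this way on Windows). The signal watcher waits for a C<SIGUSR1> the program sends to itself. The child watcher reports the exit status of a forked child, which AnyEvent passes as returned by C<waitpid>, i.e. in the same format as C<$?>. The timeout timer is only a safety net for the example:

```perl
   use strict;
   use warnings;
   use AnyEvent;
   use POSIX ();

   # --- signal watcher: wait for a SIGUSR1 we send to ourselves ---
   my $got_signal  = AnyEvent->condvar;
   my $sig_watcher = AnyEvent->signal (
      signal => "USR1",   # signal name without the "SIG" prefix
      cb     => sub { $got_signal->send ("USR1") },
   );
   # safety net so the example cannot hang forever
   my $timeout = AnyEvent->timer (after => 5, cb => sub { $got_signal->send ("timeout") });

   kill USR1 => $$;
   my $signal = $got_signal->recv;
   undef $timeout;

   # --- child watcher: learn the exit status of a forked process ---
   my $child_done = AnyEvent->condvar;

   my $pid = fork;
   die "fork: $!" unless defined $pid;
   POSIX::_exit (3) if $pid == 0;   # the child exits at once with status 3

   my $child_watcher = AnyEvent->child (
      pid => $pid,
      cb  => sub {
         my ($child_pid, $status) = @_;   # $status is in the format of $?
         $child_done->send ($status >> 8);
      },
   );
   my $exit_code = $child_done->recv;

   print "got SIG$signal, child exited with status $exit_code\n";
```

The child uses C<POSIX::_exit> so that it does not run any cleanup code (such as watcher destructors) inherited from the parent.
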
=head1 Network programming and AnyEvent

So far you have seen how to register event watchers and handle events.

This is a great foundation to write network clients and servers, and might be
all that your module (or program) ever requires, but writing your own I/O
buffering again and again becomes tedious, not to mention that it is
error-prone.

While the core L<AnyEvent> module is still small and self-contained,
the distribution comes with some very useful utility modules such as
L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.

Here is a quick overview of these three modules:

=head2 L<AnyEvent::DNS>

This module allows fully asynchronous DNS resolution. It is used mainly by
L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
a great way to do other DNS resolution tasks, such as reverse lookups of
IP addresses for log files.

=head2 L<AnyEvent::Handle>

This module handles non-blocking IO on file handles in an event-based
manner. It provides a wrapper object around your file handle that provides
queueing and buffering of incoming and outgoing data for you.

It also implements the most common data formats, such as text lines, or
fixed and variable-width data blocks.

=head2 L<AnyEvent::Socket>

This module provides you with functions that handle socket creation
and IP address magic. The two main functions are C<tcp_connect> and
C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
connections.

This module also comes with transparent IPv6 support, which means: If you
write your programs with this module, you will be IPv6 ready without doing
anything special.

It also works around a lot of portability quirks (especially on the
Windows platform), which makes it even easier to write your programs in a
portable way (did you know that Windows uses different error codes for all
socket functions and that Perl does not know about these? That "Unknown
error 10022" (which is C<WSAEINVAL>) can mean that your C<connect> call was
successful? That unsuccessful TCP connects might never be reported back
to your program? That C<WSAEINPROGRESS> means your C<connect> call was
ignored instead of being in progress? AnyEvent::Socket works around all of
these Windows/Perl bugs for you).

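To give a first impression of how L<AnyEvent::Socket> and L<AnyEvent::Handle> work together, here is a sketch of a tiny line-based server and client talking to each other over the loopback interface inside one process. It assumes that, as documented, C<tcp_server> picks an ephemeral port when given port C<0> and reports the bound port to its optional "prepare" callback. The upper-casing "protocol" is made up for the example:

```perl
   use strict;
   use warnings;
   use AnyEvent;
   use AnyEvent::Socket;
   use AnyEvent::Handle;

   my $done = AnyEvent->condvar;
   my ($port, $reply, $client);

   # Server: listen on an ephemeral port on the loopback interface.
   my $server = tcp_server "127.0.0.1", 0, sub {
      my ($fh) = @_;
      my $hdl; $hdl = AnyEvent::Handle->new (
         fh       => $fh,
         on_error => sub { undef $hdl },
         on_eof   => sub { undef $hdl },
      );
      # read one line (the line ending is stripped) and answer it upper-cased
      $hdl->push_read (line => sub {
         my ($h, $line) = @_;
         $h->push_write (uc ($line) . "\015\012");
      });
   }, sub {
      (undef, undef, $port) = @_;   # prepare callback: learn the bound port
      0                             # 0 = default listen queue length
   };

   # Client: connect, send one line, read the answer line.
   tcp_connect "127.0.0.1", $port, sub {
      my ($fh) = @_
         or return $done->send;
      $client = AnyEvent::Handle->new (
         fh       => $fh,
         on_error => sub { $done->send },
      );
      $client->push_write ("hello\015\012");
      $client->push_read (line => sub {
         (undef, $reply) = @_;
         $done->send;
      });
   };

   # safety net so the example cannot hang forever
   my $timeout = AnyEvent->timer (after => 5, cb => sub { $done->send });

   $done->recv;
   print defined $reply ? "server answered: $reply\n" : "no answer\n";
```

Note how neither side ever calls C<sysread> or C<syswrite> itself: buffering and line splitting are left entirely to L<AnyEvent::Handle>.
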
=head2 Implementing a parallel finger client with non-blocking connects and AnyEvent::Socket

The finger protocol is one of the simplest protocols in use on the
internet. Or in use in the past, as almost nobody uses it anymore.

It works by connecting to the finger port on another host, writing a
single line with a user name and then reading the finger response, as
specified by that user. OK, RFC 1288 specifies a vastly more complex
protocol, but it basically boils down to this:

   # telnet idsoftware.com finger
   Trying 192.246.40.37...
   Connected to idsoftware.com (192.246.40.37).
   Escape character is '^]'.
   johnc
   Welcome to id Software's Finger Service V1.5!

   [...]
   Now on the web:
   [...]

   Connection closed by foreign host.

"Now on the web..." yeah, it I<was> used indeed, but at least the finger
daemon still works, so let's write a little AnyEvent function that makes a
finger request:

   use AnyEvent;
   use AnyEvent::Socket;

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         # the callback receives the socket handle - or nothing
         my ($fh) = @_
            or return $cv->send;

         # now write the username
         syswrite $fh, "$user\015\012";

         my $response;

         # register a read watcher
         my $read_watcher; $read_watcher = AnyEvent->io (
            fh   => $fh,
            poll => "r",
            cb   => sub {
               my $len = sysread $fh, $response, 1024, length $response;

               if ($len <= 0) {
                  # we are done, or an error occurred, let's ignore the latter
                  undef $read_watcher; # no longer interested
                  $cv->send ($response); # send results
               }
            },
         );
      };

      # pass $cv to the caller
      $cv
   }

That's a mouthful! Let's dissect this function a bit, first the overall
function and execution flow:

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         ...
      };

      $cv
   }

This isn't too complicated, just a function with two parameters that
creates a condition variable, returns it, and while it does that,
initiates a TCP connect to C<$host>. The condition variable will be used
by the caller to receive the finger response, but one could equally well
pass a third argument, a callback, to the function.

Since we are programming event'ish, we do not wait for the connect to
finish - it could block the program for a minute or longer!

Instead, we pass the callback it should invoke when the connect is done to
C<tcp_connect>. If it is successful, that callback gets called with the
socket handle as first argument, otherwise, nothing will be passed to our
callback. The important point is that it will always be called as soon as
the outcome of the TCP connect is known.

This style of programming is also called "continuation style": the
"continuation" is simply the way the program continues - normally, a
program continues at the next line after some statement (the exceptions
being loops or things like C<return>). When we are interested in events,
however, we instead specify the "continuation" of our program by passing a
closure, which makes that closure the "continuation" of the program. The
C<tcp_connect> call is like saying "return now, and when the connection is
established or it failed, continue there".

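A sketch of the same idea without any networking: the hypothetical C<add_later> function returns immediately and only later, from a timer callback, passes its result on to the continuation it was given. The log shows that the statement after the call runs I<before> the continuation:

```perl
   use strict;
   use warnings;
   use AnyEvent;

   my @order;

   # Hypothetical helper: computes $x + $y "later" and hands the result to
   # the continuation $k instead of returning it.
   sub add_later {
      my ($x, $y, $k) = @_;

      my $t; $t = AnyEvent->timer (after => 0.05, cb => sub {
         undef $t;           # one-shot: release the watcher
         $k->($x + $y);      # continue the program here
      });

      return;                # returns at once, before the result exists
   }

   my $done = AnyEvent->condvar;

   add_later (1, 2, sub {
      push @order, "continuation got $_[0]";
      $done->send;
   });
   push @order, "next statement after add_later";

   $done->recv;
   print "$_\n" for @order;
```
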
Now let's look at the callback/closure in more detail:

   # the callback receives the socket handle - or nothing
   my ($fh) = @_
      or return $cv->send;

The first thing the callback does is to save the socket handle in
C<$fh>. When there was an error (no arguments), then our instinct as
expert Perl programmers would tell us to C<die>:

   my ($fh) = @_
      or die "$host: $!";

While this would give good feedback to the user (if he happens to watch
standard error), our program would probably stop working here, as we never
report the results to anybody, certainly not the caller of our C<finger>
function, and most event loops continue even after a C<die>!

This is why we instead C<return>, but also call C<< $cv->send >> without
any arguments to signal to the condvar consumer that something bad has
happened. The return value of C<< $cv->send >> is irrelevant, as is the
return value of our callback. The C<return> statement is simply used for the
side effect of, well, returning immediately from the callback. Checking
for errors and handling them this way is very common, which is why this
compact idiom is so handy.

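The idiom works because a list assignment in scalar (here: boolean) context evaluates to the number of elements on its right-hand side. It therefore tests whether any argument was passed at all, not whether that argument is true. A plain-Perl sketch, with the hypothetical C<on_connect> standing in for a C<tcp_connect> callback:

```perl
   use strict;
   use warnings;

   # Hypothetical stand-in for a tcp_connect callback: a handle, or nothing.
   sub on_connect {
      my ($fh) = @_
         or return "error: no handle";   # taken only if @_ is empty

      return defined $fh ? "connected: $fh" : "connected, but handle is undef";
   }

   my @results = (
      on_connect ("HANDLE"),   # one argument
      on_connect (),           # no arguments: the "or return" branch
      on_connect (undef),      # one (undefined) argument still counts!
   );

   print "$_\n" for @results;
```

So a callback called with an explicit C<undef> would slip through this check. C<tcp_connect> signals failure by passing no arguments at all, which is exactly what the idiom detects.
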
As the next step in the finger protocol, we send the username to the
finger daemon on the other side of our connection:

   syswrite $fh, "$user\015\012";

Note that this isn't 100% clean socket programming - the socket could,
for whatever reason, not accept our data. When writing a small amount
of data like in this example it doesn't matter, as a socket buffer is
almost always big enough for a mere "username", but for real-world
cases you might need to implement some kind of write buffering - or use
L<AnyEvent::Handle>, which handles these matters for you, as shown in the
next section.

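For the curious, here is a rough sketch of what such write buffering can look like when done by hand. The data is appended to a buffer, and a write watcher (C<< poll => "w" >>) pushes as much of it as the kernel accepts whenever the handle is writable. A pipe with a non-blocking writing end stands in for the socket, and the names C<buffered_write>, C<$wbuf> and C<$ww> are made up for the example. It is a sketch, not a replacement for L<AnyEvent::Handle>:

```perl
   use strict;
   use warnings;
   use AnyEvent;
   use IO::Handle;   # for ->blocking

   # A pipe stands in for the socket; its writing end is non-blocking,
   # so syswrite writes only what fits instead of stalling the program.
   pipe my $r, my $w or die "pipe: $!";
   $w->blocking (0);

   my $wbuf = "";   # data not yet accepted by the kernel
   my $ww;          # write watcher, only exists while $wbuf is non-empty

   sub buffered_write {
      my ($fh, $data) = @_;
      $wbuf .= $data;
      $ww ||= AnyEvent->io (fh => $fh, poll => "w", cb => sub {
         my $len = syswrite $fh, $wbuf;
         if (defined $len) {
            substr $wbuf, 0, $len, "";     # drop what was written
         } elsif (!$!{EAGAIN} && !$!{EWOULDBLOCK}) {
            $wbuf = "";                    # real error: give up (should be reported)
         }
         undef $ww unless length $wbuf;    # nothing left: stop watching
      });
   }

   # The reading side drains the pipe so the writer can make progress.
   my $total = 200_000;                    # more than a typical pipe buffer holds
   my $rbuf  = "";
   my $done  = AnyEvent->condvar;
   my $rw = AnyEvent->io (fh => $r, poll => "r", cb => sub {
      my $len = sysread $r, $rbuf, 65536, length $rbuf;
      $done->send if !$len || length $rbuf >= $total;
   });

   buffered_write ($w, "x" x $total);
   $done->recv;

   print "received ", length $rbuf, " bytes\n";
```
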
What we I<do> have to do is to implement our own read buffer - the response
data could arrive late or in multiple chunks, and we cannot just wait for
it (event-based programming, you know?).

To do that, we register a read watcher on the socket which waits for data:

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",

There is a trick here, however: the read watcher isn't stored in a global
variable, but in a local one - if the C<tcp_connect> callback returns, it
would normally destroy the variable and its contents, which would in turn
unregister our watcher.

To avoid that, we refer to the variable from within the watcher callback,
and only C<undef>ine it there. This means that, when the C<tcp_connect>
callback returns, perl thinks (quite correctly) that the read watcher is
still in use - namely in the callback.

The trick, however, is that instead of:

   my $read_watcher = AnyEvent->io (...

The program does:

   my $read_watcher; $read_watcher = AnyEvent->io (...

The reason for this is a quirk in the way Perl works: variable names
declared with C<my> are only visible in the I<next> statement. If the
whole C<< AnyEvent->io >> call, including the callback, would be done in
a single statement, the callback could not refer to the C<$read_watcher>
variable to undefine it, so it is done in two statements.

Whether you'd want to format it like this is of course a matter of style;
this way emphasizes that the declaration and assignment really are one
logical statement.

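A plain-Perl sketch of both halves of this (the names are made up). Under C<use strict>, the single-statement version does not even compile, because the C<$watcher> inside the closure would refer to an undeclared global variable. The two-statement version works and lets the closure release itself, just like the read watcher above:

```perl
   use strict;
   use warnings;

   # 1. In a single statement, the new "my" variable is not visible yet,
   #    so under "use strict" this does not compile.
   my $compiled = eval q{
      my $watcher = sub { undef $watcher };
      1
   };
   print "single statement: ", ($compiled ? "compiled\n" : "error: $@");

   # 2. Declaring first and assigning in the next statement works, and lets
   #    the closure drop the last reference to itself.
   my @seen;
   my $callback; $callback = sub {
      push @seen, shift;
      undef $callback if @seen == 3;   # breaks the closure <-> variable cycle
   };
   $callback->($_) for 1 .. 3;

   print "two statements: saw @seen, callback ",
         (defined $callback ? "still alive" : "released"), "\n";
```
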
The callback itself calls C<sysread> as many times as necessary, until
C<sysread> returns either an error or end-of-file:

   cb => sub {
      my $len = sysread $fh, $response, 1024, length $response;

      if ($len <= 0) {

Note that C<sysread> has the ability to append data it reads to a scalar,
by specifying an offset, which is what we make good use of in this
example.

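A small plain-Perl sketch of the offset trick, using a pipe as a stand-in for the socket so it runs without a network. Each C<sysread> appends to C<$response> because the offset is the current buffer length, and C<sysread> returns C<0> at end-of-file:

```perl
   use strict;
   use warnings;

   # A pipe stands in for the socket; the data arrives in two chunks.
   pipe my $r, my $w or die "pipe: $!";

   my $response = "";
   for my $chunk ("fin", "ger\n") {
      syswrite $w, $chunk;
      # OFFSET = length $response: append instead of overwriting
      my $len = sysread $r, $response, 1024, length $response;
      print "read $len bytes, buffer is now ", length $response, " bytes\n";
   }

   close $w;                                  # no more data: end-of-file
   my $eof = sysread $r, $response, 1024, length $response;
   print "sysread returned $eof at end-of-file, response: $response";
```

Without the fourth argument, each C<sysread> would replace the buffer contents, and only C<"ger\n"> would be left at the end.
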
When C<sysread> indicates we are done, the callback C<undef>ines
the watcher and then C<send>'s the response data to the condition
variable. All this has the following effects:

Undefining the watcher destroys it, as our callback was the only one still
having a reference to it. When the watcher gets destroyed, it destroys the
callback, which in turn means the C<$fh> handle is no longer used, so that
gets destroyed as well. The result is that all resources will be nicely
cleaned up by perl for us.

=head3 Using the finger client

Now, we could probably write the same finger client in a simpler way if
we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
ignored IPv6 and a few other things that C<tcp_connect> handles for us.

But the main advantage is that we can not only run this finger function in
the background, we can even run multiple sessions in parallel, like this:

   my $f1 = finger "trouble", "noc.dfn.de";     # check for trouble tickets
   my $f2 = finger "1736",    "noc.dfn.de";     # fetch ticket 1736
   my $f3 = finger "johnc",   "idsoftware.com"; # finger john

   print "trouble tickets:\n", $f1->recv, "\n";
   print "trouble ticket #1736:\n", $f2->recv, "\n";
   print "John Carmack's finger file: ", $f3->recv, "\n";

It doesn't look like it, but in fact all three requests run in
parallel. The code waits for the first finger request to finish first, but
that doesn't keep it from executing them in parallel: when the first C<recv>
call sees that the data isn't ready yet, it serves events for all three
requests automatically, until the first request has finished.

The second C<recv> call might either find the data is already there, or it
will continue handling events until that is the case, and so on.

By taking advantage of network latencies, which allow us to serve other
requests and events while we wait for an event on one socket, the overall
time to do these three requests will be greatly reduced: typically all
three finish in about the time the slowest of them would need on its own.

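If you want to convince yourself of this without a network, here is a sketch that replaces the finger requests with a hypothetical C<fake_request> function answering from a timer after a fixed delay. Even though the answers are collected one after another, the total time should be close to the longest delay (0.3 seconds), not the sum (0.6 seconds):

```perl
   use strict;
   use warnings;
   use AnyEvent;

   # Hypothetical stand-in for finger(): "answers" after $delay seconds.
   sub fake_request {
      my ($answer, $delay) = @_;

      my $cv = AnyEvent->condvar;
      my $t; $t = AnyEvent->timer (after => $delay, cb => sub {
         undef $t;
         $cv->send ($answer);
      });

      $cv
   }

   my $start = AnyEvent->time;

   # all three "requests" are started before we wait for any of them
   my $f1 = fake_request ("one",   0.3);
   my $f2 = fake_request ("two",   0.2);
   my $f3 = fake_request ("three", 0.1);

   my @answers = ($f1->recv, $f2->recv, $f3->recv);
   my $elapsed = AnyEvent->time - $start;

   printf "%s after %.2f seconds\n", "@answers", $elapsed;
```
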
By the way, you do not actually have to wait in the C<recv> method on an
AnyEvent condition variable - after all, waiting is evil - you can also
register a callback:

   $cv->cb (sub {
      my $response = shift->recv;
      # ...
   });

The callback will only be invoked once C<send> has been called. In fact,
instead of returning a condition variable you could also pass a third
parameter to your finger function, the callback to invoke with the
response:

   sub finger($$$) {
      my ($user, $host, $cb) = @_;

How you implement it is a matter of taste - if you expect your function to
be used mainly in an event-based program you would normally prefer to pass
a callback directly. If you write a module and expect your users to use
it "synchronously" often (for example, a simple http-get script would not
really care much for events), then you would use a condition variable and
tell them "simply C<< ->recv >> the data".

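You do not necessarily have to choose, though. A sketch of one way to offer both styles: the function always returns a condvar, and if a callback is passed, it is attached with C<< $cv->cb >>. The hypothetical C<slow_upper> function stands in for something like C<finger> and simply answers from a timer:

```perl
   use strict;
   use warnings;
   use AnyEvent;

   # Hypothetical stand-in for finger(): answers after a short delay.
   # It always returns a condvar; an optional callback is attached to it.
   sub slow_upper {
      my ($text, $cb) = @_;

      my $cv = AnyEvent->condvar;
      $cv->cb (sub { $cb->($_[0]->recv) }) if $cb;

      my $t; $t = AnyEvent->timer (after => 0.1, cb => sub {
         undef $t;
         $cv->send (uc $text);
      });

      $cv
   }

   # "synchronous" caller: just ->recv the result
   my $answer = slow_upper ("finger")->recv;

   # event-based caller: continue in a callback (the extra condvar only
   # lets this example wait for it before ending)
   my $done = AnyEvent->condvar;
   my $cb_answer;
   slow_upper ("johnc", sub { $cb_answer = shift; $done->send });
   $done->recv;

   print "$answer $cb_answer\n";
```
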
713 |
=head3 Problems with the implementation and how to fix them |
714 |
|
715 |
To make this example more real-world-ready, we would not only implement |
716 |
some write buffering (for the paranoid), but we would also have to handle |
717 |
timeouts and maybe protocol errors. |
718 |
|
719 |
Doing this quickly gets unwieldy, which is why we introduce |
720 |
L<AnyEvent::Handle> in the next section, which takes care of all these |
721 |
details for you and let's you concentrate on the actual protocol. |


=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle

The L<AnyEvent::Handle> module has been hyped quite a bit so far, so let's
see what it really offers.

As finger is such a simple protocol, let's try something slightly more
complicated: HTTP/1.0.

An HTTP GET request works by sending a single request line that indicates
what you want the server to do and the URI you want it to act on, followed
by as many "header" lines (C<Header: data>, same as e-mail headers) as
required for the request, ended by an empty line.
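
To make this format concrete, here is a small sketch in plain Perl (no
AnyEvent needed) that assembles such a request as a string. The helper
C<build_get_request> is made up for this illustration and is not part of
AnyEvent; headers are simply emitted sorted by name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent): assemble the raw bytes of an
# HTTP/1.0 GET request. Lines end in CR LF, written here as \015\012.
sub build_get_request {
   my ($uri, %headers) = @_;

   my $request = "GET $uri HTTP/1.0\015\012";        # the request line
   for my $name (sort keys %headers) {                # one "Name: value" line each
      $request .= "$name: $headers{$name}\015\012";
   }
   $request .= "\015\012";                            # empty line ends the headers

   return $request;
}

my $request = build_get_request ("/test", Host => "www.google.com");
```

Called without any headers, it returns just the request line followed by
the empty line, the same bytes the C<http_get> code in this section sends.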

The response is formatted very similarly: first a line with the response
status, then again as many header lines as required, then an empty line,
followed by any data that the server might send.
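
The three-part structure of a response can be illustrated the same way.
This sketch (plain Perl again, with a made-up helper name,
C<split_response>) assumes the whole response is already available as one
string, which is exactly what an event-based client can I<not> assume; the
rest of this section shows how AnyEvent::Handle copes with data that
trickles in.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent): split a complete HTTP/1.0
# response into status line, header block and body. Returns an empty
# list if the empty line ending the headers has not arrived yet.
sub split_response {
   my ($raw) = @_;

   $raw =~ /\A
      (.*?)  \015\012          # status line
      (?: (.*?) \015\012 )?    # header lines, if any
      \015\012                 # the empty line
      (.*)                     # body: everything that follows
   \z/sx
      or return;

   return ($1, defined $2 ? $2 : "", $3);
}

my ($status, $header, $body) = split_response (
   "HTTP/1.0 404 Not Found\015\012"
   . "Content-Type: text/html\015\012"
   . "\015\012"
   . "<html>..."
);

print "$status\n";   # HTTP/1.0 404 Not Found
```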

Again, let's try it out with C<telnet> (I condensed the output a bit - if
you want to see the full response, try it yourself).

   # telnet www.google.com 80
   Trying 209.85.135.99...
   Connected to www.google.com (209.85.135.99).
   Escape character is '^]'.
   GET /test HTTP/1.0

   HTTP/1.0 404 Not Found
   Date: Mon, 02 Jun 2008 07:05:54 GMT
   Content-Type: text/html; charset=UTF-8

   <html><head>
   [...]
   Connection closed by foreign host.

The C<GET ...> and the empty line were entered manually; the rest of the
telnet output is Google's response, in this case a C<404 Not Found> one.

So, here is how you would do it with C<AnyEvent::Handle>:

   use AnyEvent::Handle;
   use AnyEvent::Socket; # exports tcp_connect

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         my ($fh) = @_
            or return $cb->("HTTP/1.0 500 $!");

         # store results here
         my ($response, $header, $body);

         my $handle; $handle = new AnyEvent::Handle
            fh       => $fh,
            on_error => sub {
               undef $handle;
               $cb->("HTTP/1.0 500 $!");
            },
            on_eof   => sub {
               undef $handle; # referencing $handle here keeps it alive till eof
               $cb->($response, $header, $body);
            };

         $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

         # now fetch response status line
         $handle->push_read (line => sub {
            my ($handle, $line) = @_;
            $response = $line;
         });

         # then the headers
         $handle->push_read (line => "\015\012\015\012", sub {
            my ($handle, $line) = @_;
            $header = $line;
         });

         # and finally handle any remaining data as body
         $handle->on_read (sub {
            $body .= $_[0]->rbuf;
            $_[0]->rbuf = "";
         });
      };
   }

And now let's go through it step by step. First, as usual, the overall
C<http_get> function structure:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      tcp_connect $host, "http", sub {
         ...
      };
   }

Unlike in the finger example, this time the caller has to pass a callback
to C<http_get>. Also, instead of passing a URL as one would expect, the
caller has to provide the hostname and URI - normally you would use the
C<URI> module to parse a URL and separate it into those parts, but that is
left to the inspired reader :)

Since the caller deals with the results, all C<http_get> itself does is
initiate the connection with C<tcp_connect> and leave everything else to
its callback.

The first thing the callback does is check for connection errors (if the
connection fails, C<tcp_connect> invokes the callback without a file
handle and sets C<$!>, so we report the error and return right away) and
declare some variables:

   my ($fh) = @_
      or return $cb->("HTTP/1.0 500 $!");

   my ($response, $header, $body);

Instead of having an extra mechanism to signal errors, connection errors
are signalled by crafting a special "response status line", like this:

   HTTP/1.0 500 Connection refused

This means the caller cannot (easily) distinguish between
locally-generated errors and server errors, but it simplifies error
handling for the caller a lot.
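
On the caller's side, this convention means a single code path handles
both kinds of failure. A minimal sketch, assuming a made-up helper
C<parse_status> (not part of AnyEvent) that splits a status line into
numeric code and reason phrase:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent): split a status line into
# numeric code and reason phrase. Server replies and the lines http_get
# crafts itself (e.g. "HTTP/1.0 500 Connection refused") look alike.
sub parse_status {
   my ($line) = @_;

   return (500, "no status line received")
      unless defined $line;

   my ($code, $reason) = $line =~ m{^HTTP/\d+\.\d+ +(\d\d\d) *(.*)}
      or return (500, "malformed status line: $line");

   return ($code, $reason);
}

for my $line ("HTTP/1.0 404 Not Found", "HTTP/1.0 500 Connection refused") {
   my ($code, $reason) = parse_status ($line);
   my $verdict = $code >= 400 ? "failed" : "ok";
   print "$verdict: $code $reason\n";
}

# prints:
#   failed: 404 Not Found
#   failed: 500 Connection refused
```

Both lines end up in the same branch, which is precisely the
simplification (and the loss of information) described above.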

The next step finally involves L<AnyEvent::Handle>, namely it creates the
handle object:

   my $handle; $handle = new AnyEvent::Handle
      fh       => $fh,
      on_error => sub {
         undef $handle;
         $cb->("HTTP/1.0 500 $!");
      },
      on_eof   => sub {
         undef $handle; # referencing $handle here keeps it alive till eof
         $cb->($response, $header, $body);
      };

The constructor expects a file handle, which gets passed via the C<fh>
argument.

The remaining two argument pairs specify two callbacks to be called on
any errors (C<on_error>) and in the case of a normal connection close
(C<on_eof>).

In the first case, we C<undef>ine the handle object and pass the error to
the callback provided by the caller - done. Because both callbacks refer
to C<$handle>, the handle object stays alive until one of them C<undef>s
it; after that, nothing references it any more and it is freed.

In the second case we assume everything went fine and pass the results
gobbled up so far to the caller-provided callback. This is not quite
perfect, as when the server "cleanly" closes the connection in the middle
of sending headers we might wrongly report this as an "OK" to the caller,
but then, HTTP doesn't support a perfect mechanism that would detect such
problems in all cases, so we don't bother either.

=head3 The write queue

The next line sends the actual request:

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

No headers will be sent (this is fine for simple requests), so the whole
request is just a single line followed by an empty line to signal the end
of the headers to the server.

The more interesting question is why the method is called C<push_write>
and not just C<write>. The reason is that you can I<always> add some write
data without blocking, and to do this, AnyEvent::Handle needs some write
queue internally - and C<push_write> simply pushes some data at the end of
that queue, just like Perl's C<push> pushes data at the end of an array.
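
To illustrate the idea (and only the idea), here is a toy write queue in
plain Perl. The C<ToyWriter> class and its C<on_writable> method are
invented for this sketch; the real AnyEvent::Handle watches the socket
with an I/O watcher and is considerably more involved.

```perl
use strict;
use warnings;

# Toy model of a write queue (an illustration, not AnyEvent::Handle's
# actual implementation). push_write never blocks: it only appends to an
# in-memory buffer. Whenever the socket is writable, as much of the buffer
# as the kernel accepts is sent and removed from the front.
package ToyWriter;

sub new { return bless { wbuf => "", sent => "" }, shift }

sub push_write {
   my ($self, $data) = @_;
   $self->{wbuf} .= $data;        # append at the end, like Perl's push
}

# Stand-in for "the event loop says the socket is writable, and the
# kernel takes at most $max bytes right now".
sub on_writable {
   my ($self, $max) = @_;
   my $chunk = substr $self->{wbuf}, 0, $max, "";   # cut from the front
   $self->{sent} .= $chunk;                         # "sent" to the peer
   return length $chunk;
}

package main;

my $w = ToyWriter->new;
$w->push_write ("GET / HTTP/1.0\015\012");
$w->push_write ("\015\012");        # queued behind the first write
$w->on_writable (5) while length $w->{wbuf};

print length ($w->{sent}), " bytes sent\n";   # 18 bytes sent
```

However the data gets split on the way out, the order of the
C<push_write> calls is preserved.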

The deeper reason is that at some point in the future, there might
be C<unshift_write> as well, and in any case, we will shortly meet
C<push_read> and C<unshift_read>, and it's usually easiest if all those
functions have some symmetry in their names.

If C<push_write> is called with more than one argument, then you can even
do I<formatted> I/O, which simply means your data will be transformed in
some ways. For example, this would JSON-encode your data before pushing it
to the write queue:

   $handle->push_write (json => [1, 2, 3]);
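
Here is a sketch of just the transformation step for the C<json> write
type, outside of AnyEvent. It uses the core L<JSON::PP> module as a
stand-in; AnyEvent::Handle may pick a different JSON module, so treat the
exact output as an assumption rather than a guarantee.

```perl
use strict;
use warnings;
use JSON::PP;   # ships with Perl since 5.14

# Approximation of the "json" write type: serialise the reference to JSON
# text, then queue the resulting bytes like any other data. (Assumption:
# JSON::PP stands in for whatever JSON module AnyEvent::Handle uses.)
my $json  = JSON::PP->new->utf8;
my $bytes = $json->encode ([1, 2, 3]);

print "$bytes\n";   # [1,2,3]
```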

Apart from that, this pretty much summarises the write queue; there is
little else to it.

Reading the response is far more interesting:

=head3 The read queue

The response consists of three parts: a single line of response status, a
single paragraph of headers ended by an empty line, and the response body,
which is simply the remaining data on that connection.

For the first two, we push two read requests onto the read queue:

   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });

While one can simply push a single callback to the queue, I<formatted> I/O
really comes to our advantage here, as there is a ready-made "read line"
read type. The first read expects a single line, ended by C<\015\012> (the
standard end-of-line marker in internet protocols).

The second "line" is actually a single paragraph - instead of reading it
line by line we tell C<push_read> that the end-of-line marker is really
C<\015\012\015\012>, which marks the empty line that ends the headers. The
result is that the whole header paragraph will be treated as a single line
and read. The word "line" is interpreted very freely, much like Perl
itself does it.

Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.
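
The effect of queued "line" reads with different end-of-line markers can
be imitated in a few lines of plain Perl. In this sketch, the helper
C<take_line> is hypothetical (it is not AnyEvent code), and the data
deliberately arrives in chunks that split lines in awkward places, much
like it can from a real socket:

```perl
use strict;
use warnings;

# Hypothetical helper (not AnyEvent code) showing the idea behind the
# "line" read type with a custom end-of-line marker: remove and return
# everything before the first $eol, or return nothing (leaving the
# buffer untouched) if no complete "line" has arrived yet.
sub take_line {
   my ($rbuf_ref, $eol) = @_;

   my $pos = index $$rbuf_ref, $eol;
   return if $pos < 0;                                # need more data

   my $line = substr $$rbuf_ref, 0, $pos, "";         # cut out the line...
   substr $$rbuf_ref, 0, length $eol, "";             # ...and its marker
   return $line;
}

my $rbuf = "";
my @eols = ("\015\012", "\015\012\015\012");   # status line, then headers
my @got;

# data arrives in arbitrary pieces, as it would from a socket
for my $chunk ("HTTP/1.0 200 OK\015\012Content-Ty",
               "pe: text/plain\015\012Da",
               "te: today\015\012\015\012hello") {
   $rbuf .= $chunk;

   # serve the queued reads strictly in order, as far as the data allows
   while (@eols and defined (my $line = take_line (\$rbuf, $eols[0]))) {
      push @got, $line;
      shift @eols;
   }
}

# $got[1] holds the whole header paragraph, read as one "line"
printf "%d \"lines\" read, %d bytes left over\n", scalar @got, length $rbuf;
# 2 "lines" read, 5 bytes left over
```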

There is, however, no read type for "the remaining data". For that, we
install our own C<on_read> callback:

   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });

This callback is invoked every time data arrives and the read queue is
empty - which in this example will only be the case when both response and
header have been read. The C<on_read> callback could actually have been
specified when constructing the object, but doing it this way preserves
logical ordering.

The read callback simply appends the current read buffer to the C<$body>
variable and, most importantly, I<empties> the buffer by assigning the
empty string to it. Otherwise the same data would still be in the buffer
the next time the callback is invoked, and would be appended again.
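
The interplay between the read queue and C<on_read> can be modelled with a
toy class. C<ToyReader> and its C<feed> method are invented for this
sketch and only mimic the behaviour described above; they are not how
AnyEvent::Handle is implemented:

```perl
use strict;
use warnings;

# Toy model of a read queue with an on_read fallback (an illustration,
# not AnyEvent::Handle's actual code). Queued readers are tried in order;
# each returns true once it has consumed what it needs. Only when the
# queue is empty does on_read get to see the buffer.
package ToyReader;

sub new       { return bless { rbuf => "", queue => [] }, shift }
sub push_read { my ($self, $cb) = @_; push @{ $self->{queue} }, $cb }
sub on_read   { my ($self, $cb) = @_; $self->{on_read} = $cb }

# called with each chunk of data "received" from the peer
sub feed {
   my ($self, $data) = @_;
   $self->{rbuf} .= $data;

   while (length $self->{rbuf}) {
      if (my $reader = $self->{queue}[0]) {
         $reader->($self) or last;            # incomplete: wait for more
         shift @{ $self->{queue} };           # satisfied: drop it
      } elsif ($self->{on_read}) {
         my $before = length $self->{rbuf};
         $self->{on_read}->($self);
         last if length $self->{rbuf} == $before;   # nothing consumed
      } else {
         last;
      }
   }
}

package main;

my ($status, $header, $body) = ("", "", "");
my $h = ToyReader->new;

$h->push_read (sub {                          # status line
   $_[0]{rbuf} =~ s/^(.*?)\015\012//s or return 0;
   $status = $1;
   1
});

$h->push_read (sub {                          # header paragraph
   $_[0]{rbuf} =~ s/^(.*?)\015\012\015\012//s or return 0;
   $header = $1;
   1
});

$h->on_read (sub {                            # everything else
   $body .= $_[0]{rbuf};
   $_[0]{rbuf} = "";                          # empty the buffer!
});

$h->feed ("HTTP/1.0 200 OK\015\012Server: toy\015\012\015\012hel");
$h->feed ("lo");

print "$status | $header | $body\n";   # HTTP/1.0 200 OK | Server: toy | hello
```

If the C<on_read> callback did not empty the buffer, this toy would stop
and, on the next C<feed>, hand the same bytes to it again, so they would
end up in C<$body> twice.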

After AnyEvent::Handle has been so instructed, it will now handle incoming
data according to these instructions - if all goes well, the callback will
be invoked with the response data; if not, it will get an error.

In general, you get pipelining very easily with AnyEvent::Handle: the
response handlers are queued in the same order as the requests are sent,
so you can send a new request before earlier responses have arrived. If
you have a protocol with a request/response structure, your request
methods/functions will all look like this (simplified):

   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }
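
Why this gives you pipelining can be shown without any sockets. In this
sketch, the line protocol (C<ECHO one> answered by C<one>) and the
C<request> function are made up; the point is only that a FIFO queue of
response handlers pairs each response with the request that caused it:

```perl
use strict;
use warnings;

# Toy illustration of pipelining (no sockets, no AnyEvent): every request
# appends its bytes to the write buffer and its response handler to a
# FIFO queue at once, without waiting for earlier responses. The made-up
# protocol answers each "ECHO x" line with a line "x".
my $wbuf = "";     # what would be written to the socket
my @readers;       # pending response handlers, oldest first
my @results;

sub request {
   my ($word, $cb) = @_;
   $wbuf .= "ECHO $word\015\012";     # like push_write
   push @readers, $cb;                # like push_read
}

request ("one", sub { push @results, "first got $_[0]"  });
request ("two", sub { push @results, "second got $_[0]" });

# both responses arrive in a single chunk; handing out complete lines in
# queue order pairs each response with the request that caused it
my $rbuf = "one\015\012two\015\012";
while (@readers and $rbuf =~ s/^(.*?)\015\012//) {
   my $line = $1;
   (shift @readers)->($line);
}

print "$_\n" for @results;
# first got one
# second got two
```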

=head3 Using it

And here is how you would use it:

   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };

In a standalone script, remember that something has to keep the event
loop running until the callback has been invoked, for example by waiting
on a condition variable that the callback signals with C<send>.

And of course, you can run as many of these requests in parallel as you
want (and your memory supports).

=head3 HTTPS

Now, as promised, let's implement the same thing for HTTPS, or more
correctly, let's change our C<http_get> function into a function that
speaks HTTPS instead.

HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
B<S>ecurity is the official name for what most people refer to as C<SSL>)
that contains standard HTTP protocol exchanges. The other difference from
HTTP is that it uses port C<443> instead of port C<80>.

To implement these two differences we need two tiny changes. First, in the
C<tcp_connect> call, we replace C<http> by C<https>:

   tcp_connect $host, "https", sub { ...

The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module is
around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional
C<tls> parameter to the call to C<AnyEvent::Handle::new>:

   tls => "connect",

Specifying C<tls> enables TLS, and the argument specifies whether
AnyEvent::Handle is the server side ("accept") or the client side
("connect") for the TLS connection, as unlike TCP, there is a clear
server/client relationship in TLS.

That's all.

Of course, all this should be handled transparently by C<http_get> after
parsing the URL. See the part about exercising your inspiration earlier in
this document.
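
As a starting point for that exercise, here is a sketch of splitting a
URL into the pieces C<http_get> needs. The helper C<split_url> is
hypothetical, its regex deliberately ignores ports and user info, and for
anything serious the L<URI> module is the better choice:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AnyEvent): split an http:// or https://
# URL into scheme, host and URI. The regex only covers simple URLs (it does
# not handle ports or user info); a real program would use the URI module.
sub split_url {
   my ($url) = @_;

   my ($scheme, $host, $uri) = $url =~ m{^(https?)://([^/?#]+)([^#]*)}i
      or return;

   $scheme = lc $scheme;
   $uri    = "/$uri" unless $uri =~ m{^/};   # "" becomes "/"

   return ($scheme, $host, $uri);
}

for my $url ("https://www.google.com/test", "http://www.google.com") {
   my ($scheme, $host, $uri) = split_url ($url);

   # the scheme doubles as the service name for tcp_connect, and
   # decides whether to pass tls => "connect" to AnyEvent::Handle
   my $tls = $scheme eq "https" ? "connect" : "off";
   print "$host ($scheme): GET $uri, tls: $tls\n";
}

# www.google.com (https): GET /test, tls: connect
# www.google.com (http): GET /, tls: off
```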

=head3 The read queue - revisited

HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.

For example, in SMTP, you normally get a single response line:

   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>

But SMTP also supports multi-line responses:

   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours

To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.

This is useful for this situation: You push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader,
so that it runs before any read requests that might already be queued
(for example, for the response to a command you have pipelined), like
this:

   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);
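
The reason for C<unshift_read> instead of C<push_read> becomes clear once
another read request is already waiting, for example because you
pipelined a second command. The following toy (plain Perl, no AnyEvent;
lines are handed in directly and C<reply_reader> is a made-up name) models
a queue of one-line readers:

```perl
use strict;
use warnings;

# Toy read queue of one-line readers (no AnyEvent, lines are handed in
# directly). It shows why the continuation reader is *unshifted*: the
# reader for the next pipelined command's reply is already queued.
my @queue;      # pending readers, front first
my @replies;    # complete (possibly multi-line) replies

sub reply_reader {
   my ($tag) = @_;
   my $text = "";

   my $reader; $reader = sub {
      my ($line) = @_;
      $text .= "$line\n";

      if ($line =~ /^...-/) {        # "-" as 4th character: more to come
         unshift @queue, $reader;    # read the next line before anything else
      } else {
         push @replies, "$tag: $text";
         undef $reader;              # break the self-reference
      }
   };

   return $reader;
}

# greeting reader plus the reader for an already-sent second command
push @queue, reply_reader ("greeting"), reply_reader ("second");

for my $line ("220-mail.example.net", "220 welcome", "250 ok") {
   (shift @queue)->($line);
}

print @replies;
# greeting: 220-mail.example.net
# 220 welcome
# second: 250 ok
```

Had the continuation reader been pushed instead, C<220 welcome> would have
been handed to the second command's reader.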

This recipe can be used for all similar parsing problems, for example in
NNTP, the response code to some commands indicates that more data will be
sent:

   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body); # $finish: your own completion callback
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });

=head3 Your own read queue handler

Sometimes, your protocol doesn't play nice and uses neither lines nor
fixed-size chunks of data, in which case you have to implement your own
read parser.

To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.

To implement this, you would C<push_read> (or C<unshift_read>) just a
single code reference.

This code reference will then be called each time there is (new) data
available in the read buffer, and is expected to either eat/consume some
of that data (and return true) or to return false to indicate that it
wants to be called again.

If the code reference returns true, then it will be removed from the read
queue, otherwise it stays at the front of the queue.

The example above could be coded like this:

   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });
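
To see the true/false contract in action, here is a simulation in plain
Perl. It assumes nothing from AnyEvent: a scalar stands in for
C<< $handle->{rbuf} >>, the loop plays the role of the event loop
delivering chunks, and C<@found> replaces the C<$finish> callback:

```perl
use strict;
use warnings;

# Simulation of the push_read code-reference contract without AnyEvent:
# a plain scalar stands in for $handle->{rbuf}, and the parser is called
# after every new chunk. It returns false while the data is incomplete
# and true once it has removed a match from the buffer.
my $rbuf = "";
my @found;

my $parser = sub {
   # same regex as above: an even number of characters, then ":"
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;           # false: leave the buffer alone, call again later

   push @found, $1;        # stands in for $finish->($1)
   return 1;               # true: this reader would now leave the queue
};

my $calls = 0;
for my $chunk ("ab", "cd", ":rest") {
   $rbuf .= $chunk;
   $calls++;
   last if $parser->();
}

print "matched '$found[0]' after $calls calls, left over: '$rbuf'\n";
# matched 'abcd' after 3 calls, left over: 'rest'
```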


=head1 Authors

Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>.
|