=head1 NAME

AnyEvent::Intro - an introductory tutorial to AnyEvent

=head1 Introduction to AnyEvent

This is a tutorial that will introduce you to the features of AnyEvent.

The first part introduces the core AnyEvent module (after swamping you a
bit in evangelism), which might already provide all you ever need: If you
are only interested in AnyEvent's event handling capabilities, you need
not read beyond it.

The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.

=head1 What is AnyEvent?

If you don't care for the whys and want to see code, skip this section!

AnyEvent is first of all just a framework to do event-based
programming. Typically such frameworks are an all-or-nothing thing: If you
use one such framework, you can't (easily, or even at all) use another in
the same program.

AnyEvent is different - it is a thin abstraction layer on top of other
event loops, just like DBI is an abstraction of many different database
APIs. Its main purpose is to move the choice of the underlying framework
(the event loop) from the module author to the program author using the
module.

That means you can write code that uses events to control what it
does, without forcing other code in the same program to use the same
underlying framework as you do - i.e. you can create a Perl module
that is event-based using AnyEvent, and users of that module can still
choose between using L<Gtk2>, L<Tk>, L<Event> (or run inside Irssi or
rxvt-unicode) or any other supported event loop. AnyEvent even comes with
its own pure-perl event loop implementation, so your code works regardless
of other modules that might or might not be installed. The latter is
important, as AnyEvent does not have any hard dependencies on other
modules, which makes it easy to install, for example, when you lack a C
compiler. No matter what environment, AnyEvent will just cope with it.

A typical limitation of existing Perl modules such as L<Net::IRC> is that
they come with their own event loop: In L<Net::IRC>, the program that uses
it needs to start the event loop of L<Net::IRC>. That means that one
cannot integrate this module into a L<Gtk2> GUI for instance, as that
module, too, enforces the use of its own event loop (namely L<Glib>).

Another example is L<LWP>: it provides no event interface at all. It's
a pure blocking HTTP (and FTP etc.) client library, which usually means
that you either have to start another process or have to fork for an HTTP
request, or use threads (e.g. L<Coro::LWP>), if you want to do something
else while waiting for the request to finish.

The motivation behind these designs is often that a module doesn't want
to depend on some complicated XS-module (Net::IRC), or that it doesn't
want to force the user to use some specific event loop at all (LWP), out
of fear of severely limiting the usefulness of the module: If your module
requires Glib, it will not run in a Tk program.

L<AnyEvent> solves this dilemma, by B<not> forcing module authors to
either:

=over 4

=item - write their own event loop (because it guarantees the availability
of an event loop everywhere - even on Windows with no extra modules
installed).

=item - choose one specific event loop (because AnyEvent works with most
event loops available for Perl).

=back

If the module author uses L<AnyEvent> for all his (or her) event needs
(IO events, timers, signals, ...) then all other modules can just use
his module and don't have to choose an event loop or adapt to his event
loop. The choice of the event loop is ultimately made by the program
author who uses all the modules and writes the main program. And even
there he doesn't have to choose, he can just let L<AnyEvent> choose the
most efficient event loop available on the system.

Read more about this in the main documentation of the L<AnyEvent> module.

=head1 Introduction to Event-Based Programming

So what exactly is programming using events? It quite simply means that
instead of your code actively waiting for something, such as the user
entering something on STDIN:

   $| = 1; print "enter your name> ";

   my $name = <STDIN>;

You instead tell your event framework to notify you in the event of some
data being available on STDIN, by using a callback mechanism:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN, # which file handle to check
      poll => "r",     # which event to wait for ("r"ead data)
      cb   => sub {    # what callback to execute
         $name = <STDIN>; # read it
      }
   );

   # do something else here

Looks more complicated, and surely is, but the advantage of using events
is that your program can do something else instead of waiting for input
(side note: combining AnyEvent with a thread package such as Coro can
recoup much of the simplicity, effectively getting the best of two
worlds).

Waiting as done in the first example is also called "blocking" the process
because you "block"/keep your process from executing anything else while
you do so.

The second example avoids blocking by only registering interest in a read
event, which is fast and doesn't block your process. Only when read data
is available will the callback be called, which can then proceed to read
the data.

The "interest" is represented by an object returned by C<< AnyEvent->io
>> called a "watcher" object - called like that because it "watches" your
file handle (or other event sources) for the event you are interested in.

In the example above, we create an I/O watcher by calling the C<<
AnyEvent->io >> method. Disinterest in some event is simply expressed
by forgetting about the watcher, for example, by C<undef>'ing the only
variable it is stored in. AnyEvent will automatically clean up the watcher
if it is no longer used, much like Perl closes your file handles if you no
longer use them anywhere.
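
The effect of forgetting a watcher can be observed in a small sketch. It
uses timer watchers (introduced later in this tutorial) because they need
no input; the variable names and the delays of 0.1 and 0.3 seconds are
arbitrary choices made for this illustration only.

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $fired = 0;

# this timer would invoke its callback after 0.1 seconds ...
my $unwanted = AnyEvent->timer (
   after => 0.1,
   cb    => sub { $fired++ },
);

# ... but we lose interest before that happens: the watcher is
# destroyed and its callback will never be invoked
undef $unwanted;

# a second timer ends the program a bit later
my $finish = AnyEvent->timer (
   after => 0.3,
   cb    => sub { $done->send },
);

$done->recv;

print "the cancelled timer fired $fired time(s)\n";
```

The same holds for I/O watchers: once the last reference to the watcher
is gone, its callback is no longer called.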

=head3 A short note on callbacks

A common issue that hits people is the problem of passing parameters
to callbacks. Programmers used to languages such as C or C++ are often
used to a style where one passes the address of a function (a function
reference) and some data value, e.g.:

   sub callback {
      my ($arg) = @_;

      $arg->method;
   }

   my $arg = ...;

   call_me_back_later \&callback, $arg;

This is clumsy, as the place where behaviour is specified (when the
callback is registered) is often far away from the place where behaviour
is implemented. It also doesn't use Perl syntax to invoke the code. There
is also an abstraction penalty to pay as one has to I<name> the callback,
which often is unnecessary and leads to nonsensical or duplicated names.

In Perl, one can specify behaviour much more directly by using
I<closures>. Closures are code blocks that take a reference to the
enclosing scope(s) when they are created. This means lexical variables in
scope at the time of creating the closure can simply be used inside the
closure:

   my $arg = ...;

   call_me_back_later sub { $arg->method };

Under most circumstances, closures are faster, use fewer resources and
result in much clearer code than the traditional approach. Faster,
because parameter passing and storing them in local variables in Perl
is relatively slow. Fewer resources, because closures take references
to existing variables without having to create new ones, and clearer
code because it is immediately obvious that the second example calls the
C<method> method when the callback is invoked.

Apart from these, the strongest argument for using closures with AnyEvent
is that AnyEvent does not allow passing parameters to the callback, so
closures are the only way to achieve that in most cases :->
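
To make the closure idea concrete, here is a minimal pure-Perl sketch. The
C<call_me_back_later> function below is only a stand-in written for this
illustration (it merely stores the callbacks in an array) and is not part
of AnyEvent.

```perl
use strict;
use warnings;

# a stand-in for an event framework: it just remembers the callbacks
my @pending;
sub call_me_back_later { push @pending, shift }

for my $name (qw(alice bob)) {
   # each closure captures its own copy of the lexical $name
   call_me_back_later sub { "hello, $name" };
}

# later, "the framework" invokes the callbacks without any arguments
my @greetings = map { $_->() } @pending;

print "$_\n" for @greetings;
```

No arguments are passed when the callbacks run, yet each one still knows
which name it was created for.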

=head3 A hint on debugging

AnyEvent does, by default, not do any argument checking. This can lead to
strange and unexpected results especially if you are still trying to find
your way around AnyEvent.

AnyEvent supports a special "strict" mode - off by default - which does very
strict argument checking, at the expense of being somewhat slower. During
development, however, this mode is very useful.

You can enable this strict mode either by having an environment variable
C<PERL_ANYEVENT_STRICT> with a true value in your environment:

   PERL_ANYEVENT_STRICT=1 perl test.pl

Or you can write C<use AnyEvent::Strict> in your program, which has the
same effect (do not do this in production, however).

=head2 Condition Variables

Back to the I/O watcher example: The code is not yet a fully working
program, and will not work as-is. The reason is that your callback will
not be invoked out of the blue, you have to run the event loop. Also,
event-based programs sometimes have to block, too: when there is simply
nothing else to do and everything is waiting for events, the process has
to block until new events arrive.

In AnyEvent, this is done using condition variables. Condition variables
are named "condition variables" because they represent a condition that is
initially false and needs to be fulfilled.

You can also call them "merge points", "sync points", "rendezvous ports"
or even callbacks and many other things (and they are often called like
this in other frameworks). The important point is that you can create them
freely and later wait for them to become true.

Condition variables have two sides - one side is the "producer" of the
condition (whatever code detects and flags the condition), the other side
is the "consumer" (the code that waits for that condition).

In our example in the previous section, the producer is the event callback
and there is no consumer yet - let's change that right now:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN,
      poll => "r",
      cb   => sub {
         $name = <STDIN>;
         $name_ready->send;
      }
   );

   # do something else here

   # now wait until the name is available:
   $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

This program creates an AnyEvent condvar by calling the C<<
AnyEvent->condvar >> method. It then creates a watcher as usual, but
inside the callback it C<send>'s the C<$name_ready> condition variable,
which causes whoever is waiting on it to continue.

The "whoever" in this case is the code that follows, which calls C<<
$name_ready->recv >>: The producer calls C<send>, the consumer calls
C<recv>.

If there is no C<$name> available yet, then the call to C<<
$name_ready->recv >> will halt your program until the condition becomes
true.

As the names C<send> and C<recv> imply, you can actually send and receive
data using this, for example, the above code could also be written like
this, without an extra variable to store the name in:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
   );

   # do something else here

   # now wait and fetch the name
   my $name = $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

You can pass any number of arguments to C<send>, and every call to
C<recv> will return them.
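
A short sketch of sending several values at once follows. A timer stands
in for a real event source here, and the values C<"example.org"> and C<79>
are made up for the illustration.

```perl
use strict;
use warnings;
use AnyEvent;

my $result_ready = AnyEvent->condvar;

# the producer: sends two values once the timer fires
my $producer = AnyEvent->timer (
   after => 0.1,
   cb    => sub { $result_ready->send ("example.org", 79) },
);

# the consumer: recv in list context returns everything that was sent
my ($host, $port) = $result_ready->recv;

# a second recv returns immediately, with the same values
my ($host_again, $port_again) = $result_ready->recv;

print "host $host, port $port\n";
```

In scalar context, C<recv> returns only the first value that was sent.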

=head2 The "main loop"

Most event-based frameworks have something called a "main loop" or "event
loop run function" or something similar.

Just like C<recv> in AnyEvent, these functions need to be called
eventually so that your event loop has a chance of actually looking for
those events you are interested in.

For example, in a L<Gtk2> program, the above example could also be written
like this:

   use Gtk2 -init;
   use AnyEvent;

   ############################################
   # create a window and some label

   my $window = new Gtk2::Window "toplevel";
   $window->add (my $label = new Gtk2::Label "soon replaced by name");

   $window->show_all;

   ############################################
   # do our AnyEvent stuff

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub {
         # set the label
         $label->set_text (scalar <STDIN>);
         print "enter another name> ";
      }
   );

   ############################################
   # Now enter Gtk2's event loop

   main Gtk2;

No condition variable anywhere in sight - instead, we just read a line
from STDIN and replace the text in the label. In fact, since nobody
C<undef>'s C<$wait_for_input> you can enter multiple lines.

Instead of waiting for a condition variable, the program enters the Gtk2
main loop by calling C<< Gtk2->main >>, which will block the program and
wait for events to arrive.

This also shows that AnyEvent is quite flexible - you didn't have anything
to do to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
worked.

Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application? But imagine that instead of
doing that, you would make an HTTP request in the background and display
its results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.

And in the next part you will see how to do just that - by implementing an
HTTP request, on our own, with the utility modules AnyEvent comes with.

Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

In the example using condition variables, we used those to start waiting
for events, and in fact, condition variables are the solution:

   my $quit_program = AnyEvent->condvar;

   # create AnyEvent watchers (or not) here

   $quit_program->recv;

If any of your watcher callbacks decide to quit (this is often
called an "unloop" in other frameworks), they can simply call C<<
$quit_program->send >>. Of course, they could also decide not to and
simply call C<exit> instead, or they could decide not to quit, ever (e.g.
in a long-running daemon program).

If you don't need some clean quit functionality and just want to run the
event loop, you can simply do this:

   AnyEvent->condvar->recv;

And this is, in fact, closest to the idea of a main loop run function that
AnyEvent offers.

=head2 Timers and other event sources

So far, we have only used I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
operating systems this also works for console windows/terminals (typically
on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes Windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on Windows).

However, I/O is not everything - the second most important event source is
the clock. For example when doing an HTTP request you might want to time
out when the server doesn't answer within some predefined amount of time.

In AnyEvent, timer event watchers are created by calling the C<<
AnyEvent->timer >> method:

   use AnyEvent;

   my $cv = AnyEvent->condvar;

   my $wait_one_and_a_half_seconds = AnyEvent->timer (
      after => 1.5,  # after how many seconds to invoke the cb?
      cb    => sub { # the callback to invoke
         $cv->send;
      },
   );

   # can do something else here

   # now wait till our time has come
   $cv->recv;

Unlike I/O watchers, timers are only interested in the number of seconds
they have to wait. When (at least) that amount of time has passed,
AnyEvent will invoke your callback.

Unlike I/O watchers, which will call your callback as many times as there
is data available, timers are normally one-shot: after they have "fired"
once and invoked your callback, they are dead and no longer do anything.

To get a repeating timer, such as a timer firing roughly once per second,
you can specify an C<interval> parameter:

   my $once_per_second = AnyEvent->timer (
      after    => 0,    # first invoke ASAP
      interval => 1,    # then invoke every second
      cb       => sub { # the callback to invoke
         $cv->send;
      },
   );
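
A repeating timer keeps firing until its watcher is destroyed. The
following sketch counts three ticks and then stops itself; the interval of
0.1 seconds, the limit of three and the variable names are arbitrary
choices for this illustration.

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $ticks = 0;

# declare $ticker first, so the callback can refer to it
my $ticker; $ticker = AnyEvent->timer (
   after    => 0,   # first tick as soon as possible
   interval => 0.1, # then one tick every 0.1 seconds
   cb       => sub {
      $ticks++;

      if ($ticks >= 3) {
         undef $ticker;        # stop the repeating timer
         $done->send ($ticks); # and report how often it ticked
      }
   },
);

my $total = $done->recv;

print "the timer ticked $total times\n";
```

Declaring the variable before assigning the watcher to it is what allows
the callback to cancel its own watcher.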

=head3 More esoteric sources

AnyEvent also has some other, more esoteric event sources you can tap
into: signal, child and idle watchers.

Signal watchers can be used to wait for "signal events", which simply
means your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child-process watchers wait for a child process to exit. They are useful
when you fork a separate process and need to know when it exits, but you
do not wait for that by blocking.

Idle watchers invoke their callback when the event loop has handled all
outstanding events, polled for new events and didn't find any, i.e., when
your process is otherwise idle. They are useful if you want to do some
non-trivial data processing that can be done when your program doesn't
have anything better to do.

All these watcher types are described in detail in the main L<AnyEvent>
manual page.
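
As a rough sketch of a signal watcher, the following program sends
C<SIGUSR1> to itself and waits for the watcher to notice it. It assumes a
POSIX-like system where that signal exists; the timer that delivers the
signal only makes the example self-contained and is not needed in real
programs.

```perl
use strict;
use warnings;
use AnyEvent;

my $got_signal = AnyEvent->condvar;

# watch for SIGUSR1 and report the signal name to the consumer
my $usr1_watcher = AnyEvent->signal (
   signal => "USR1",
   cb     => sub { $got_signal->send ("USR1") },
);

# for demonstration only: signal ourselves after a short delay
my $kick = AnyEvent->timer (
   after => 0.1,
   cb    => sub { kill "USR1", $$ },
);

my $signal_name = $got_signal->recv;

print "received SIG$signal_name\n";
```

Child and idle watchers are created in the same way, using the
C<< AnyEvent->child >> and C<< AnyEvent->idle >> methods.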

Sometimes you also need to know what the current time is: C<<
AnyEvent->now >> returns the time the event toolkit uses to schedule
relative timers, and is usually what you want. It is often cached (which
means it can be a bit outdated). If that is a problem, you can use the
C<< AnyEvent->time >> method, which will ask your operating system for the
current time, which is slower, but also more up to date.
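
The two methods can be compared directly. This sketch just prints both
values; how far they differ depends on the event loop in use, so no
particular difference should be expected.

```perl
use strict;
use warnings;
use AnyEvent;

# the (possibly cached) time the event loop uses for relative timers
my $cached = AnyEvent->now;

# the current time, freshly queried from the operating system
my $exact = AnyEvent->time;

printf "now: %.3f, time: %.3f, difference: %.3f seconds\n",
       $cached, $exact, $exact - $cached;
```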

=head1 Network programming and AnyEvent

So far you have seen how to register event watchers and handle events.

This is a great foundation to write network clients and servers, and might
be all that your module (or program) ever requires, but writing your own
I/O buffering again and again becomes tedious, not to mention that it
attracts errors.

While the core L<AnyEvent> module is still small and self-contained,
the distribution comes with some very useful utility modules such as
L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.

Here is a quick overview of these three modules:

=head2 L<AnyEvent::DNS>

This module allows fully asynchronous DNS resolution. It is used mainly by
L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
a great way to do other DNS resolution tasks, such as reverse lookups of
IP addresses for log files.

=head2 L<AnyEvent::Handle>

This module handles non-blocking IO on (socket-, pipe- etc.) file handles
in an event based manner. It provides a wrapper object around your file
handle that provides queueing and buffering of incoming and outgoing data
for you.

It also implements the most common data formats, such as text lines, or
fixed and variable-width data blocks.
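
As a small taste of the line format, the following sketch reads two text
lines through an L<AnyEvent::Handle> object. A local pipe stands in for a
network socket so that the example needs no server; the variable names and
the two lines of text are made up for the illustration.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;

# a pipe gives us a readable handle without needing a network
pipe (my $reader, my $writer) or die "pipe: $!";

my $done = AnyEvent->condvar;
my @lines;

my $handle = AnyEvent::Handle->new (
   fh       => $reader,
   on_error => sub {
      my ($hdl, $fatal, $message) = @_;
      $hdl->destroy;
      $done->send; # give up on any error
   },
);

# queue two read requests; each callback receives one complete line,
# with the line ending already removed
$handle->push_read (line => sub {
   my ($hdl, $line) = @_;
   push @lines, $line;
});
$handle->push_read (line => sub {
   my ($hdl, $line) = @_;
   push @lines, $line;
   $done->send;
});

# the "remote side" writes both lines in one go
syswrite $writer, "first line\nsecond line\n";

$done->recv;
$handle->destroy;

print "got: $_\n" for @lines;
```

The handle object takes care of reading, buffering and splitting the data
into lines, so the callbacks only ever see complete lines.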

=head2 L<AnyEvent::Socket>

This module provides you with functions that handle socket creation
and IP address magic. The two main functions are C<tcp_connect> and
C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
connections.

This module also comes with transparent IPv6 support, this means: If you
write your programs with this module, you will be IPv6 ready without doing
anything special.

It also works around a lot of portability quirks (especially on the
Windows platform), which makes it even easier to write your programs in a
portable way (did you know that Windows uses different error codes for all
socket functions and that Perl does not know about these? That "Unknown
error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
successful? That unsuccessful TCP connects might never be reported back
to your program? That C<WSAEINPROGRESS> means your C<connect> call was
ignored instead of being in progress? AnyEvent::Socket works around all of
these Windows/Perl bugs for you).
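
The two main functions can be tried out together on the loopback
interface. In this sketch a tiny server greets every client and closes
the connection, and a client connects to it and collects the greeting.
The greeting text and the variable names are invented for the example,
and the port is left for the operating system to choose.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Socket;

my $done = AnyEvent->condvar;
my $port;

# listen on the loopback interface; undef lets the OS pick a free port
my $server = tcp_server "127.0.0.1", undef, sub {
   my ($fh, $peer_host, $peer_port) = @_;

   # greet the client; the connection is closed when $fh goes out of scope
   syswrite $fh, "hello from the server\015\012";
}, sub {
   my ($fh, $this_host, $this_port) = @_;

   $port = $this_port; # remember which port was chosen
   0                   # 0 requests the default listen queue length
};

my $greeting = "";

tcp_connect "127.0.0.1", $port, sub {
   # the callback receives the socket handle - or nothing
   my ($fh) = @_
      or return $done->send;

   my $reader; $reader = AnyEvent->io (
      fh   => $fh,
      poll => "r",
      cb   => sub {
         my $len = sysread $fh, $greeting, 1024, length $greeting;

         # nothing to read after all: wait for the next event
         return if !defined $len && ($!{EAGAIN} || $!{EINTR});

         unless ($len) { # end of file (0) or a real error (undef)
            undef $reader;
            $done->send;
         }
      },
   );
};

$done->recv;
undef $server; # stop listening

print "server said: $greeting";
```

Both callbacks receive handles that are already in non-blocking mode,
which is why the client reads inside an I/O watcher.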

=head2 Implementing a parallel finger client with non-blocking connects
and AnyEvent::Socket

The finger protocol is one of the simplest protocols in use on the
internet. Or in use in the past, as almost nobody uses it anymore.

It works by connecting to the finger port on another host, writing a
single line with a user name and then reading the finger response, as
specified by that user. OK, RFC 1288 specifies a vastly more complex
protocol, but it basically boils down to this:

   # telnet kernel.org finger
   Trying 204.152.191.37...
   Connected to kernel.org (204.152.191.37).
   Escape character is '^]'.

   The latest stable version of the Linux kernel is: [...]
   Connection closed by foreign host.

So let's write a little AnyEvent function that makes a finger request:

   use AnyEvent;
   use AnyEvent::Socket;

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         # the callback receives the socket handle - or nothing
         my ($fh) = @_
            or return $cv->send;

         # now write the username
         syswrite $fh, "$user\015\012";

         my $response;

         # register a read watcher
         my $read_watcher; $read_watcher = AnyEvent->io (
            fh   => $fh,
            poll => "r",
            cb   => sub {
               my $len = sysread $fh, $response, 1024, length $response;

               if ($len <= 0) {
                  # we are done, or an error occurred, let's ignore the latter
                  undef $read_watcher; # no longer interested
                  $cv->send ($response); # send results
               }
            },
         );
      };

      # pass $cv to the caller
      $cv
   }

That's a mouthful! Let's dissect this function a bit, first the overall
function and execution flow:

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         ...
      };

      $cv
   }

This isn't too complicated, just a function with two parameters, that
creates a condition variable, returns it, and while it does that,
initiates a TCP connect to C<$host>. The condition variable will be used
by the caller to receive the finger response, but one could equally well
pass a third argument, a callback, to the function.
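
The callback variant mentioned above might look like the following
sketch. To keep it runnable without a finger server, the hypothetical
C<delayed_answer> function merely uses a timer to pretend that a response
arrives after a short while; it is not part of AnyEvent.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical helper: hands $answer to the callback $cb after a
# short delay, the way a slow server would
sub delayed_answer($$) {
   my ($answer, $cb) = @_;

   # the closure refers to $timer, which keeps the watcher alive
   # until the callback has run
   my $timer; $timer = AnyEvent->timer (
      after => 0.1,
      cb    => sub {
         undef $timer;   # no longer needed
         $cb->($answer); # report the result to the caller's callback
      },
   );

   () # nothing useful to return
}

my $done = AnyEvent->condvar;
my $received;

delayed_answer "no such user", sub {
   $received = shift;
   $done->send;
};

$done->recv;

print "the answer was: $received\n";
```

Returning a condition variable, as C<finger> does, leaves the choice to
the caller: the caller can block on it with C<recv>, or pass it around
and wait for it later.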
612 |
root |
1.4 |
|
613 |
root |
1.11 |
Since we are programming event'ish, we do not wait for the connect to |
614 |
|
|
finish - it could block the program for a minute or longer! |
615 |
|
|
|
616 |
|
|
Instead, we pass the callback it should invoke when the connect is done to |
617 |
|
|
C<tcp_connect>. If it is successful, that callback gets called with the |
618 |
root |
1.5 |
socket handle as first argument, otherwise, nothing will be passed to our |
619 |
root |
1.11 |
callback. The important point is that it will always be called as soon as |
620 |
|
|
the outcome of the TCP connect is known. |
621 |
|
|
|
622 |
|
|
This style of programming is also called "continuation style": the |
623 |
root |
1.22 |
"continuation" is simply the way the program continues - normally at the |
624 |
|
|
next line after some statement (the exception is loops or things like |
625 |
|
|
C<return>). When we are interested in events, however, we instead specify |
626 |
|
|
the "continuation" of our program by passing a closure, which makes that |
627 |
|
|
closure the "continuation" of the program. |
628 |
|
|
|
629 |
|
|
The C<tcp_connect> call is like saying "return now, and when the |
630 |
|
|
connection is established or it failed, continue there". |
631 |
root |
1.4 |
|
Now let's look at the callback/closure in more detail:

   # the callback receives the socket handle - or nothing
   my ($fh) = @_
      or return $cv->send;

The first thing the callback does is indeed save the socket handle in
C<$fh>. When there was an error (no arguments), then our instinct as
expert Perl programmers would tell us to C<die>:

   my ($fh) = @_
      or die "$host: $!";

While this would give good feedback to the user (if he happens to watch
standard error), our program would probably stop working here, as we never
report the results to anybody, certainly not the caller of our C<finger>
function, and most event loops continue even after a C<die>!

This is why we instead C<return>, but also call C<< $cv->send >> without
any arguments to signal to the condvar consumer that something bad has
happened. The return value of C<< $cv->send >> is irrelevant, as is
the return value of our callback. The C<return> statement is simply
used for the side effect of, well, returning immediately from the
callback. Checking for errors and handling them this way is very common,
which is why this compact idiom is so handy.
|
As the next step in the finger protocol, we send the username to the
finger daemon on the other side of our connection (the kernel.org finger
service doesn't actually wait for a username, but the net is running out
of finger servers fast):

   syswrite $fh, "$user\015\012";

Note that this isn't 100% clean socket programming - the socket could,
for whatever reason, not accept our data. When writing a small amount
of data like in this example it doesn't matter, as a socket buffer is
almost always big enough for a mere "username", but for real-world
cases you might need to implement some kind of write buffering - or use
L<AnyEvent::Handle>, which handles these matters for you, as shown in the
next section.

What we I<do> have to do is to implement our own read buffer - the response
data could arrive late or in multiple chunks, and we cannot just wait for
it (event-based programming, you know?).

To do that, we register a read watcher on the socket which waits for data:

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",

There is a trick here, however: the read watcher isn't stored in a global
variable, but in a local one - if the callback returns, it would normally
destroy the variable and its contents, which would in turn unregister our
watcher.

To avoid that, we C<undef>ine the variable in the watcher callback. This
means that, when the C<tcp_connect> callback returns, perl thinks (quite
correctly) that the read watcher is still in use - namely in the callback -
and thus keeps it alive even if nothing else in the program refers to it
anymore (it is much like Baron Münchhausen keeping himself from dying by
pulling himself out of a swamp).
|
The trick, however, is that instead of:

   my $read_watcher = AnyEvent->io (...

The program does:

   my $read_watcher; $read_watcher = AnyEvent->io (...

The reason for this is a quirk in the way Perl works: variable names
declared with C<my> are only visible in the I<next> statement. If the
whole C<< AnyEvent->io >> call, including the callback, were done in
a single statement, the callback could not refer to the C<$read_watcher>
variable to undefine it, so it is done in two statements.

Whether you'd want to format it like this is of course a matter of style;
this way emphasizes that the declaration and assignment really are one
logical statement.
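The scoping rule can be checked without any event loop. The following
sketch uses only core Perl; the names C<$countdown> and C<$broken> are
invented for the demonstration:

```perl
use strict;
use warnings;

# two statements: $countdown is already declared when the closure is
# compiled, so the closure can refer to it
my $countdown; $countdown = sub {
   my ($n) = @_;
   $n <= 0 ? "done" : $countdown->($n - 1)
};

my $result = $countdown->(3);
print "two statements: $result\n";

# one statement: inside the closure, $broken does not exist yet, so
# under "use strict" this does not even compile
my $compiled = eval q{
   my $broken = sub { $broken->() };
   1
};

print "single statement compiles: ", ($compiled ? "yes" : "no"), "\n";

# the closure refers to itself, so free it explicitly when done
undef $countdown;
```

The string C<eval> is only used here so that the compile error can be
observed without aborting the whole script.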
|
The callback itself calls C<sysread> as many times as necessary, until
C<sysread> returns either an error or end-of-file:

   cb   => sub {
      my $len = sysread $fh, $response, 1024, length $response;

      if ($len <= 0) {

Note that C<sysread> has the ability to append data it reads to a scalar,
by specifying an offset, a feature we make good use of in this
example.
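The append behaviour is easy to try out with a pipe instead of a socket.
This is only a sketch: the pipe, the two chunks and the artificially small
read size of 8 bytes are assumptions chosen to force several reads:

```perl
use strict;
use warnings;

pipe (my $reader, my $writer)
   or die "pipe: $!";

# write the "response" in two separate chunks, then signal end-of-file
syswrite $writer, "first chunk, ";
syswrite $writer, "second chunk";
close $writer;

my $response = "";

while (1) {
   # read at most 8 bytes and store them at the current end of $response
   my $len = sysread $reader, $response, 8, length $response;

   defined $len
      or die "sysread: $!";

   last if $len == 0;    # end-of-file
}

print "$response\n";
```

Because the offset is always the current length of C<$response>, no
separate temporary buffer or concatenation step is needed.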
|
When C<sysread> indicates we are done, the callback C<undef>ines
the watcher and then C<send>s the response data to the condition
variable. All this has the following effects:

Undefining the watcher destroys it, as our callback was the only one still
having a reference to it. When the watcher gets destroyed, it destroys the
callback, which in turn means the C<$fh> handle is no longer used, so that
gets destroyed as well. The result is that all resources will be nicely
cleaned up by perl for us.

=head3 Using the finger client

Now, we could probably write the same finger client in a simpler way if
we used C<IO::Socket::INET>, ignored the problem of multiple hosts and
ignored IPv6 and a few other things that C<tcp_connect> handles for us.

But the main advantage is that we can not only run this finger function in
the background, we can even run multiple sessions in parallel, like this:

   my $f1 = finger "trouble", "noc.dfn.de"; # check for trouble tickets
   my $f2 = finger "1736"   , "noc.dfn.de"; # fetch ticket 1736
   my $f3 = finger "hpa"    , "kernel.org"; # finger hpa

   print "trouble tickets:\n"     , $f1->recv, "\n";
   print "trouble ticket #1736:\n", $f2->recv, "\n";
   print "kernel release info: "  , $f3->recv, "\n";

It doesn't look like it, but in fact all three requests run in
parallel. The code waits for the first finger request to finish first, but
that doesn't keep it from executing them in parallel: when the first
C<recv> call sees that the data isn't ready yet, it serves events for all
three requests automatically, until the first request has finished.

The second C<recv> call might either find the data is already there, or it
will continue handling events until that is the case, and so on.

By taking advantage of network latencies, which allows us to serve other
requests and events while we wait for an event on one socket, the overall
time to do these three requests will be greatly reduced; typically all
three are done in the same time as the slowest of them would need to finish.
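The effect can be reproduced without any finger servers. In this sketch,
the hypothetical C<slow_request> function stands in for C<finger> and
simply answers after a delay using a timer; the names and the delays are
arbitrary choices for the illustration:

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical stand-in for finger: "responds" after $delay seconds
sub slow_request {
   my ($name, $delay) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (after => $delay, cb => sub {
      undef $timer;
      $cv->send ("$name finished");
   });

   $cv
}

my $start = AnyEvent->time;

# start all three requests, nothing blocks here
my $r1 = slow_request "first" , 0.3;
my $r2 = slow_request "second", 0.1;
my $r3 = slow_request "third" , 0.2;

# each recv serves events for all pending requests while it waits
my @answers = map $_->recv, $r1, $r2, $r3;

my $elapsed = AnyEvent->time - $start;

print "$_\n" for @answers;
printf "elapsed: %.2fs (the delays add up to 0.60s)\n", $elapsed;
```

The elapsed time should be close to the longest single delay rather than
to the sum of all three.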
|
By the way, you do not actually have to wait in the C<recv> method on an
AnyEvent condition variable - after all, waiting is evil - you can also
register a callback:

   $cv->cb (sub {
      my $response = shift->recv;
      # ...
   });

The callback will only be invoked when C<send> was called. In fact,
instead of returning a condition variable you could also pass a third
parameter to your finger function, the callback to invoke with the
response:

   sub finger($$$) {
      my ($user, $host, $cb) = @_;

How you implement it is a matter of taste - if you expect your function to
be used mainly in an event-based program you would normally prefer to pass
a callback directly. If you write a module and expect your users to use
it "synchronously" often (for example, a simple http-get script would not
really care much for events), then you would use a condition variable and
tell them "simply C<< ->recv >> the data".

=head3 Problems with the implementation and how to fix them

To make this example more real-world-ready, we would not only implement
some write buffering (for the paranoid, or maybe denial-of-service aware
security expert), but we would also have to handle timeouts and maybe
protocol errors.

Doing this quickly gets unwieldy, which is why we introduce
L<AnyEvent::Handle> in the next section, which takes care of all these
details for you and lets you concentrate on the actual protocol.
|
=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle

The L<AnyEvent::Handle> module has been hyped quite a bit in this document
so far, so let's see what it really offers.

As finger is such a simple protocol, let's try something slightly more
complicated: HTTP/1.0.

An HTTP GET request works by sending a single request line that indicates
what you want the server to do and the URI you want to act on, followed
by as many "header" lines (C<Header: data>, same as e-mail headers) as
required for the request, ended by an empty line.

The response is formatted very similarly: first a line with the response
status, then again as many header lines as required, then an empty line,
followed by any data that the server might send.
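As a rough illustration of this layout, the following plain-Perl sketch
takes a response apart by hand. The sample text is made up for the
example, and real-world header parsing has more corner cases than this:

```perl
use strict;
use warnings;

# a made-up response: status line, header lines, empty line, body
my $raw = join "\015\012",
   "HTTP/1.0 404 Not Found",
   "Date: Mon, 02 Jun 2008 07:05:54 GMT",
   "Content-Type: text/html; charset=UTF-8",
   "",
   "<html><head>[...]";

# the first empty line separates status line and headers from the body
my ($head, $body) = split /\015\012\015\012/, $raw, 2;

# the first line is the status, the remaining lines are "Header: data"
my ($status, @lines) = split /\015\012/, $head;

my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @lines;

print "status:       $status\n";
print "content-type: $header{'content-type'}\n";
print "body:         $body\n";
```

Header names are lower-cased here only to make lookups in the hash
predictable.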
|
Again, let's try it out with C<telnet> (I condensed the output a bit - if
you want to see the full response, do it yourself).

   # telnet www.google.com 80
   Trying 209.85.135.99...
   Connected to www.google.com (209.85.135.99).
   Escape character is '^]'.
   GET /test HTTP/1.0

   HTTP/1.0 404 Not Found
   Date: Mon, 02 Jun 2008 07:05:54 GMT
   Content-Type: text/html; charset=UTF-8

   <html><head>
   [...]
   Connection closed by foreign host.

The C<GET ...> and the empty line were entered manually; the rest of the
telnet output is google's response, in this case a C<404 Not Found> one.
|
So, here is how you would do it with C<AnyEvent::Handle>:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      # store results here
      my ($response, $header, $body);

      my $handle; $handle = new AnyEvent::Handle
         connect  => [$host => 'http'],
         on_error => sub {
            $cb->("HTTP/1.0 500 $!");
            $handle->destroy; # explicitly destroy handle
         },
         on_eof   => sub {
            $cb->($response, $header, $body);
            $handle->destroy; # explicitly destroy handle
         };

      $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

      # now fetch response status line
      $handle->push_read (line => sub {
         my ($handle, $line) = @_;
         $response = $line;
      });

      # then the headers
      $handle->push_read (line => "\015\012\015\012", sub {
         my ($handle, $line) = @_;
         $header = $line;
      });

      # and finally handle any remaining data as body
      $handle->on_read (sub {
         $body .= $_[0]->rbuf;
         $_[0]->rbuf = "";
      });
   }
|
And now let's go through it step by step. First, as usual, the overall
C<http_get> function structure:

   sub http_get {
      my ($host, $uri, $cb) = @_;

      # store results here
      my ($response, $header, $body);

      my $handle; $handle = new AnyEvent::Handle
         ... create handle object

      ... push data to write

      ... push what to expect to read queue
   }

Unlike in the finger example, this time the caller has to pass a callback
to C<http_get>. Also, instead of passing a URL as one would expect, the
caller has to provide the hostname and URI - normally you would use the
C<URI> module to parse a URL and separate it into those parts, but that is
left to the inspired reader :)

Since everything else is left to the caller, all C<http_get> does is to
initiate the connection by creating the AnyEvent::Handle object (which
calls C<tcp_connect> for us) and leave everything else to its callbacks.
|
The handle object is created, unsurprisingly, by calling the C<new>
method of L<AnyEvent::Handle>:

   my $handle; $handle = new AnyEvent::Handle
      connect  => [$host => 'http'],
      on_error => sub {
         $cb->("HTTP/1.0 500 $!");
         $handle->destroy; # explicitly destroy handle
      },
      on_eof   => sub {
         $cb->($response, $header, $body);
         $handle->destroy; # explicitly destroy handle
      };

The C<connect> argument tells AnyEvent::Handle to call C<tcp_connect> for
the specified host and service/port.

The C<on_error> callback will be called on any unexpected error, such as a
refused connection, or an unexpected end of the connection while reading
the header.

Instead of having an extra mechanism to signal errors, connection errors
are signalled by crafting a special "response status line", like this:

   HTTP/1.0 500 Connection refused

This means the caller cannot distinguish (easily) between
locally-generated errors and server errors, but it simplifies error
handling for the caller a lot.

The error callback also destroys the handle explicitly, because we are not
interested in continuing after any errors. In AnyEvent::Handle callbacks
you have to call C<destroy> explicitly to destroy a handle. Outside of
those callbacks you can just forget the object reference and it will be
automatically cleaned up.

Last but not least, we set an C<on_eof> callback that is called when the
other side indicates it has stopped writing data, which we will use to
gracefully shut down the handle and report the results. This callback is
only called when the read queue is empty - if the read queue expects some
data and the handle gets an EOF from the other side this will be an error
- after all, you did expect more to come.

If you wanted to write a server using AnyEvent::Handle, you would use
C<tcp_accept> and then create the AnyEvent::Handle with the C<fh>
argument.
|
=head3 The write queue

The next line sends the actual request:

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

No headers will be sent (this is fine for simple requests), so the whole
request is just a single line followed by an empty line to signal the end
of the headers to the server.

The more interesting question is why the method is called C<push_write>
and not just C<write>. The reason is that you can I<always> add some write
data without blocking, and to do this, AnyEvent::Handle needs some write
queue internally - and C<push_write> simply pushes some data onto the end
of that queue, just like Perl's C<push> pushes data onto the end of an
array.

The deeper reason is that at some point in the future, there might
be C<unshift_write> as well, and in any case, we will shortly meet
C<push_read> and C<unshift_read>, and it's usually easiest to remember if
all those functions have some symmetry in their name. So C<push> is used
as the opposite of C<unshift> in AnyEvent::Handle, not as the opposite of
C<pull> - just like in Perl.

Note that we call C<push_write> right after creating the AnyEvent::Handle
object, before it has had time to actually connect to the server. This is
fine: pushing the read and write requests will simply queue them in the
handle object until the connection has been established. Alternatively, we
could do this "on demand" in the C<on_connect> callback.

If C<push_write> is called with more than one argument, then you can even
do I<formatted> I/O, which simply means your data will be transformed in
some ways. For example, this would JSON-encode your data before pushing it
to the write queue:

   $handle->push_write (json => [1, 2, 3]);

Apart from that, this pretty much summarises the write queue; there is
little else to it.

Reading the response is far more interesting, because it involves the more
powerful and complex I<read queue>:
|
=head3 The read queue

The response consists of three parts: a single line with the response
status, a single paragraph of headers ended by an empty line, and the
response body, which is simply the remaining data on that connection.

For the first two, we push two read requests onto the read queue:

   # now fetch response status line
   $handle->push_read (line => sub {
      my ($handle, $line) = @_;
      $response = $line;
   });

   # then the headers
   $handle->push_read (line => "\015\012\015\012", sub {
      my ($handle, $line) = @_;
      $header = $line;
   });

While one can simply push a single callback to parse the data onto the
queue, I<formatted> I/O really comes to our advantage here, as there
is a ready-made "read line" read type. The first read expects a single
line, ended by C<\015\012> (the standard end-of-line marker in internet
protocols).

The second "line" is actually a single paragraph - instead of reading it
line by line we tell C<push_read> that the end-of-line marker is really
C<\015\012\015\012>, which is an empty line. The result is that the whole
header paragraph will be treated as a single line and read. The word
"line" is interpreted very freely, much like Perl itself does it.
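The comparison with Perl presumably refers to the input record separator
C<$/>, which defines what C<readline> considers a line. A small sketch
with an in-memory filehandle and a made-up response shows the same idea in
core Perl, without AnyEvent::Handle (the C<read_until> helper is invented
for this example):

```perl
use strict;
use warnings;

my $data = "HTTP/1.0 200 OK\015\012"
         . "Content-Type: text/plain\015\012"
         . "Content-Length: 4\015\012"
         . "\015\012"
         . "body";

open my $fh, "<", \$data
   or die "open: $!";

# read one "line", where a line ends at the given end-of-line marker
sub read_until {
   my ($eol) = @_;

   local $/ = $eol;
   my $line = <$fh>;
   chomp $line;    # removes the current $/ from the end

   $line
}

my $status  = read_until "\015\012";
my $headers = read_until "\015\012\015\012";
my $body    = do { local $/; <$fh> };    # undefined $/: read the rest

print "status:  $status\n";
print "headers: $headers\n";
print "body:    $body\n";
```

The second call returns both header lines as one "line", because the
record separator is the empty line rather than a single end-of-line
marker.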
|
Note that the read requests are pushed immediately after creating the
handle object - since AnyEvent::Handle provides a queue we can push as
many requests as we want, and AnyEvent::Handle will handle them in order.

There is, however, no read type for "the remaining data". For that, we
install our own C<on_read> callback:

   # and finally handle any remaining data as body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });

This callback is invoked every time data arrives and the read queue is
empty - which in this example will only be the case when both response and
header have been read. The C<on_read> callback could actually have been
specified when constructing the object, but doing it this way preserves
logical ordering.

The read callback simply adds the current read buffer to the C<$body>
variable and, most importantly, I<empties> the buffer by assigning the
empty string to it.

After AnyEvent::Handle has been so instructed, it will handle incoming
data according to these instructions - if all goes well, the callback will
be invoked with the response data, if not, it will get an error.

In general, you can implement pipelining (a semi-advanced feature of many
protocols) very easily with AnyEvent::Handle: If you have a protocol with a
request/response structure, your request methods/functions will all look
like this (simplified):

   sub request {

      # send the request to the server
      $handle->push_write (...);

      # push some response handlers
      $handle->push_read (...);
   }

This means you can queue as many requests as you want, and while
AnyEvent::Handle goes through its read queue to handle the response data,
the other side can work on the next request - queueing the request just
appends some data to the write queue and installs a handler to be called
later.

You might ask yourself how to handle decisions you can only make I<after>
you have received some data (such as handling a short error response or a
long and differently-formatted response). The answer to this problem is
C<unshift_read>, which we will introduce together with an example in the
coming sections.
|
=head3 Using C<http_get>

Finally, here is how you would use C<http_get>:

   http_get "www.google.com", "/", sub {
      my ($response, $header, $body) = @_;

      print
         $response, "\n",
         $body;
   };

And of course, you can run as many of these requests in parallel as you
want (and your memory supports).

=head3 HTTPS

Now, as promised, let's implement the same thing for HTTPS, or more
correctly, let's change our C<http_get> function into a function that
speaks HTTPS instead.

HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer
B<S>ecurity is the official name for what most people refer to as C<SSL>)
that contains standard HTTP protocol exchanges. The only other difference
to HTTP is that by default it uses port C<443> instead of port C<80>.

To implement these two differences we need two tiny changes. First, in the
C<connect> parameter, we replace C<http> by C<https> to connect to the
https port:

   connect => [$host => 'https'],

The other change deals with TLS, which is something L<AnyEvent::Handle>
does for us, as long as I<you> made sure that the L<Net::SSLeay> module
is around. To enable TLS with L<AnyEvent::Handle>, we simply pass an
additional C<tls> parameter to the call to C<AnyEvent::Handle::new>:

   tls => "connect",

Specifying C<tls> enables TLS, and the argument specifies whether
AnyEvent::Handle is the server side ("accept") or the client side
("connect") for the TLS connection, as unlike TCP, there is a clear
server/client relationship in TLS.
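Putting both changes together gives something like the following sketch.
The name C<https_get> is made up here; the body is otherwise the
C<http_get> function from this section with shortened read callbacks. It
assumes that L<Net::SSLeay> is installed, and many present-day servers
will probably also expect a C<Host> header, which this minimal request
does not send:

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;

# hypothetical HTTPS variant of http_get, same calling convention
sub https_get {
   my ($host, $uri, $cb) = @_;

   # store results here
   my ($response, $header, $body);

   my $handle; $handle = new AnyEvent::Handle
      connect  => [$host => 'https'],   # change 1: the https port
      tls      => "connect",            # change 2: we are the TLS client
      on_error => sub {
         $cb->("HTTP/1.0 500 $!");
         $handle->destroy;
      },
      on_eof   => sub {
         $cb->($response, $header, $body);
         $handle->destroy;
      };

   $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012");

   # status line, then the header paragraph
   $handle->push_read (line => sub { $response = $_[1] });
   $handle->push_read (line => "\015\012\015\012", sub { $header = $_[1] });

   # everything after that is the body
   $handle->on_read (sub {
      $body .= $_[0]->rbuf;
      $_[0]->rbuf = "";
   });
}
```

It would be called exactly like C<http_get>, with a host name, a URI and
a callback, while some event loop or condition variable keeps the program
running.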
|
That's all.

Of course, all this should be handled transparently by C<http_get>
after parsing the URL. If you need this, see the part about exercising
your inspiration earlier in this document. You could also use the
L<AnyEvent::HTTP> module from CPAN, which implements all this and works
around a lot of quirks for you, too.
|
=head3 The read queue - revisited

HTTP always uses the same structure in its responses, but many protocols
require parsing responses differently depending on the response itself.

For example, in SMTP, you normally get a single response line:

   220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>

But SMTP also supports multi-line responses:

   220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>
   220-hey guys
   220 my response is longer than yours

To handle this, we need C<unshift_read>. As the name (hopefully) implies,
C<unshift_read> will not append your read request to the end of the read
queue, but instead it will prepend it to the queue.

This is useful in the situation above: Just push your response-line read
request when sending the SMTP command, and when handling it, you look at
the line to see if more is to come, and C<unshift_read> another reader
callback if required, like this:

   my $response; # response lines end up in here

   my $read_response; $read_response = sub {
      my ($handle, $line) = @_;

      $response .= "$line\n";

      # check for continuation lines ("-" as 4th character)
      if ($line =~ /^...-/) {
         # if yes, then unshift another line read
         $handle->unshift_read (line => $read_response);

      } else {
         # otherwise we are done

         # free callback
         undef $read_response;

         print "we are done reading: $response\n";
      }
   };

   $handle->push_read (line => $read_response);

This recipe can be used for all similar parsing problems, for example in
NNTP, the response code to some commands indicates that more data will be
sent:

   $handle->push_write ("article 42\015\012");

   # read response line
   $handle->push_read (line => sub {
      my ($handle, $status) = @_;

      # article data following?
      if ($status =~ /^2/) {
         # yes, read article body

         $handle->unshift_read (line => "\012.\015\012", sub {
            my ($handle, $body) = @_;

            $finish->($status, $body);
         });

      } else {
         # some error occurred, no article data

         $finish->($status);
      }
   });
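Why the continuation reader has to be I<unshifted> rather than pushed is
easiest to see in a toy model. The sketch below uses a plain array as a
stand-in for the read queue and feeds it made-up lines; it is not how
AnyEvent::Handle is implemented, only an illustration of the ordering:

```perl
use strict;
use warnings;

# lines "arriving" from a made-up server
my @incoming = (
   '220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net>',
   '220-hey guys',
   '220 my response is longer than yours',
   '250 ok',
);

my @read_queue;    # toy stand-in for the read queue
my @log;
my $response = "";

my $read_response; $read_response = sub {
   my ($line) = @_;

   $response .= "$line\n";

   if ($line =~ /^...-/) {
      # more to come: this reader must see the very next line
      unshift @read_queue, $read_response;
   } else {
      push @log, "greeting complete";
   }
};

# two readers are queued up front, as with pipelining
push @read_queue, $read_response;
push @read_queue, sub { push @log, "reply to next command: $_[0]" };

# each incoming line goes to the reader at the front of the queue
for my $line (@incoming) {
   my $reader = shift @read_queue
      or last;

   $reader->($line);
}

undef $read_response;    # free the self-referencing callback

print "$_\n" for @log;
```

With C<push> instead of C<unshift>, the second greeting line would have
been handed to the reader that was queued for the next command.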
|
=head3 Your own read queue handler

Sometimes, your protocol doesn't play nice and uses lines or chunks of
data not formatted in a way handled by AnyEvent::Handle out of the box. In
this case you have to implement your own read parser.

To make up a contorted example, imagine you are looking for an even
number of characters followed by a colon (":"). Also imagine that
AnyEvent::Handle had no C<regex> read type which could be used, so you'd
have to do it manually.

To implement a read handler for this, you would C<push_read> (or
C<unshift_read>) just a single code reference.

This code reference will then be called each time there is (new) data
available in the read buffer, and is expected to either successfully
eat/consume some of that data (and return true) or to return false to
indicate that it wants to be called again.

If the code reference returns true, then it will be removed from the
read queue (because it has parsed/consumed whatever it was supposed to
consume), otherwise it stays in the front of it.

The example above could be coded like this:

   $handle->push_read (sub {
      my ($handle) = @_;

      # check for even number of characters + ":"
      # and remove the data if a match is found.
      # if not, return false (actually nothing)

      $handle->{rbuf} =~ s/^( (?:..)* ) ://x
         or return;

      # we got some data in $1, pass it to whoever wants it
      $finish->($1);

      # and return true to indicate we are done
      1
   });
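How such a parser behaves when the data trickles in can be tried without a
handle object. In this sketch, the C<$rbuf> variable plays the role of the
read buffer and C<try_parse> the role of the code reference; both names
and the sample chunks are made up:

```perl
use strict;
use warnings;

my $rbuf = "";    # stands in for the handle's read buffer
my @found;

# same logic as the read callback: consume a match or ask for more data
sub try_parse {
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
}

# the data arrives in three pieces
for my $chunk ("ab", "c", "d:rest") {
   $rbuf .= $chunk;

   my $done = try_parse;

   print "after '$chunk': ",
      ($done ? "got '$found[-1]'" : "need more data"), "\n";
}

print "left in buffer: '$rbuf'\n";
```

The parser returns false for the first two chunks and only consumes data
once the colon has arrived, leaving everything after the colon in the
buffer for the next reader.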
|
This concludes our little tutorial.

=head1 Where to go from here?

This introduction should have explained the key concepts of L<AnyEvent>
- event watchers and condition variables, L<AnyEvent::Socket> - basic
networking utilities, and L<AnyEvent::Handle> - a nice wrapper around
handles.

You could either start coding stuff right away, look at those manual
pages for the gory details, or roam CPAN for other AnyEvent modules (such
as L<AnyEvent::IRC> or L<AnyEvent::HTTP>) to see more code examples (or
simply to use them).

If you need a protocol that doesn't have an implementation using AnyEvent,
remember that you can mix AnyEvent with one other event framework, such as
L<POE>, so you can always use AnyEvent for your own tasks plus modules of
one other event framework to fill any gaps.

Last but not least, you could also look at L<Coro>, especially
L<Coro::AnyEvent>, to see how you can turn event-based programming from
callback style back to the usual imperative style (also called "inversion
of control" - AnyEvent calls I<you>, but Coro lets I<you> call AnyEvent).

=head1 Authors

Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann C<< <schmorp@schmorp.de> >>.
|