=encoding utf-8

=head1 NAME

AnyEvent::Intro - an introductory tutorial to AnyEvent

=head1 Introduction to AnyEvent

This is a tutorial that will introduce you to the features of AnyEvent.

The first part introduces the core AnyEvent module (after swamping you a
bit in evangelism), which might already provide all you ever need: If you
are only interested in AnyEvent's event handling capabilities, read no
further.

The second part focuses on network programming using sockets, for which
AnyEvent offers a lot of support you can use, and a lot of workarounds
for portability quirks.


=head1 What is AnyEvent?

If you don't care for the whys and want to see code, skip this section!

AnyEvent is first of all just a framework to do event-based
programming. Typically such frameworks are an all-or-nothing thing: If you
use one such framework, you can't (easily, or even at all) use another in
the same program.

AnyEvent is different - it is a thin abstraction layer on top of other
event loops, just like DBI is an abstraction of many different database
APIs. Its main purpose is to move the choice of the underlying framework
(the event loop) from the module author to the program author using the
module.

That means you can write code that uses events to control what it
does, without forcing other code in the same program to use the same
underlying framework as you do - i.e. you can create a Perl module
that is event-based using AnyEvent, and users of that module can still
choose between using L<Gtk2>, L<Tk>, L<Event> (or run inside Irssi or
rxvt-unicode) or any other supported event loop. AnyEvent even comes with
its own pure-Perl event loop implementation, so your code works regardless
of other modules that might or might not be installed. The latter is
important, as AnyEvent does not have any hard dependencies on other
modules, which makes it easy to install, for example, when you lack a C
compiler. No matter what environment, AnyEvent will just cope with it.

A typical limitation of existing Perl modules such as L<Net::IRC> is that
they come with their own event loop: In L<Net::IRC>, a program which uses
it needs to start the event loop of L<Net::IRC>. That means that one
cannot integrate this module into a L<Gtk2> GUI for instance, as that
module, too, enforces the use of its own event loop (namely L<Glib>).

Another example is L<LWP>: it provides no event interface at all. It's
a pure blocking HTTP (and FTP etc.) client library, which usually means
that you either have to start another process or have to fork for an HTTP
request, or use threads (e.g. L<Coro::LWP>), if you want to do something
else while waiting for the request to finish.

The motivation behind these designs is often that a module doesn't want
to depend on some complicated XS-module (Net::IRC), or that it doesn't
want to force the user to use some specific event loop at all (LWP), out
of fear of severely limiting the usefulness of the module: If your module
requires Glib, it will not run in a Tk program.

L<AnyEvent> solves this dilemma, by B<not> forcing module authors to
either:

=over 4

=item - write their own event loop (because it guarantees the availability
of an event loop everywhere - even on Windows with no extra modules
installed).

=item - choose one specific event loop (because AnyEvent works with most
event loops available for Perl).

=back

If the module author uses L<AnyEvent> for all his (or her) event needs
(IO events, timers, signals, ...) then all other modules can just use
his module and don't have to choose an event loop or adapt to his event
loop. The choice of the event loop is ultimately made by the program
author who uses all the modules and writes the main program. And even
there he doesn't have to choose, he can just let L<AnyEvent> choose the
most efficient event loop available on the system.

Read more about this in the main documentation of the L<AnyEvent> module.


=head1 Introduction to Event-Based Programming

So what exactly is programming using events? It quite simply means that
instead of your code actively waiting for something, such as the user
entering something on STDIN:

   $| = 1; print "enter your name> ";

   my $name = <STDIN>;

You instead tell your event framework to notify you in the event of some
data being available on STDIN, by using a callback mechanism:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN, # which file handle to check
      poll => "r",     # which event to wait for ("r"ead data)
      cb   => sub {    # what callback to execute
         $name = <STDIN>; # read it
      }
   );

   # do something else here

Looks more complicated, and surely is, but the advantage of using events
is that your program can do something else instead of waiting for input
(side note: combining AnyEvent with a thread package such as Coro can
recoup much of the simplicity, effectively getting the best of both
worlds).

Waiting as done in the first example is also called "blocking" the process
because you "block"/keep your process from executing anything else while
you do so.

The second example avoids blocking by only registering interest in a read
event, which is fast and doesn't block your process. The callback will
be called only when data is available and can be read without blocking.

The "interest" is represented by an object returned by C<< AnyEvent->io
>> called a "watcher" object - thus named because it "watches" your
file handle (or other event sources) for the event you are interested in.

In the example above, we create an I/O watcher by calling the C<<
AnyEvent->io >> method. A lack of further interest in some event is
expressed by simply forgetting about its watcher, for example by
C<undef>-ing the only variable it is stored in. AnyEvent will
automatically clean up the watcher if it is no longer used, much like
Perl closes your file handles if you no longer use them anywhere.

=head3 A short note on callbacks

A common issue that hits people is the problem of passing parameters
to callbacks. Programmers used to languages such as C or C++ are often
used to a style where one passes the address of a function (a function
reference) and some data value, e.g.:

   sub callback {
      my ($arg) = @_;

      $arg->method;
   }

   my $arg = ...;

   call_me_back_later \&callback, $arg;

This is clumsy, as the place where behaviour is specified (when the
callback is registered) is often far away from the place where behaviour
is implemented. It also doesn't use Perl syntax to invoke the code. There
is also an abstraction penalty to pay as one has to I<name> the callback,
which often is unnecessary and leads to nonsensical or duplicated names.

In Perl, one can specify behaviour much more directly by using
I<closures>. Closures are code blocks that take a reference to the
enclosing scope(s) when they are created. This means lexical variables
in scope when a closure is created can be used inside the closure:

   my $arg = ...;

   call_me_back_later sub { $arg->method };

Under most circumstances, closures are faster, use fewer resources and
result in much clearer code than the traditional approach. Faster,
because parameter passing and storing them in local variables in Perl
is relatively slow. Fewer resources, because closures take references
to existing variables without having to create new ones, and clearer
code because it is immediately obvious that the second example calls the
C<method> method when the callback is invoked.

Apart from these, the strongest argument for using closures with AnyEvent
is that AnyEvent does not allow passing parameters to the callback, so
closures are the only way to achieve that in most cases :->


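As a minimal sketch of the closure style described above, the following
plain-Perl snippet needs no event loop at all. The C<call_me_back_later>
function and the C<@pending> queue are hypothetical stand-ins for an event
framework, not part of AnyEvent; they only store callbacks and run them
later without passing any arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an event framework: it merely stores callbacks
# so they can be run later. It is not part of AnyEvent.
my @pending;
sub call_me_back_later { push @pending, $_[0] }

for my $name (qw(alice bob)) {
   # each closure captures the lexical $name of its own loop iteration
   call_me_back_later (sub { "hello, $name" });
}

# "later": invoke the stored callbacks without passing any arguments
my @greetings = map { $_->() } @pending;
print "$_\n" for @greetings;
```

Each callback still knows "its" C<$name> when it is finally invoked, even
though the caller passes nothing to it.
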
=head3 A little hint to catch mistakes

AnyEvent does not check the parameters you pass in, at least not by
default. To enable checking, simply start your program with C<AE_STRICT=1>
in the environment, or put C<use AnyEvent::Strict> near the top of your
program:

   AE_STRICT=1 perl myprogram

You can find more info on this and additional debugging aids later in this
introduction.


=head2 Condition Variables

Back to the I/O watcher example: The code is not yet a fully working
program, and will not work as-is. The reason is that your callback will
not be invoked out of the blue; you have to run the event loop first.
Also, event-based programs need to block sometimes too, such as when
there is nothing to do, and everything is waiting for new events to
arrive.

In AnyEvent, this is done using condition variables. Condition variables
are named "condition variables" because they represent a condition that is
initially false and needs to be fulfilled.

You can also call them "merge points", "sync points", "rendezvous ports"
or even callbacks and many other things (and they are often called these
names in other frameworks). The important point is that you can create them
freely and later wait for them to become true.

Condition variables have two sides - one side is the "producer" of the
condition (whatever code detects and flags the condition), the other side
is the "consumer" (the code that waits for that condition).

In our example in the previous section, the producer is the event callback
and there is no consumer yet - let's change that right now:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name;

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh   => \*STDIN,
      poll => "r",
      cb   => sub {
         $name = <STDIN>;
         $name_ready->send;
      }
   );

   # do something else here

   # now wait until the name is available:
   $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

This program creates an AnyEvent condvar by calling the C<<
AnyEvent->condvar >> method. It then creates a watcher as usual, but
inside the callback it C<send>s the C<$name_ready> condition variable,
which causes whoever is waiting on it to continue.

The "whoever" in this case is the code that follows, which calls C<<
$name_ready->recv >>: The producer calls C<send>, the consumer calls
C<recv>.

If there is no C<$name> available yet, then the call to C<<
$name_ready->recv >> will halt your program until the condition becomes
true.

As the names C<send> and C<recv> imply, you can actually send and receive
data using this. For example, the above code could also be written like
this, without an extra variable to store the name in:

   use AnyEvent;

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub { $name_ready->send (scalar <STDIN>) }
   );

   # do something else here

   # now wait and fetch the name
   my $name = $name_ready->recv;

   undef $wait_for_input; # watcher no longer needed

   print "your name is $name\n";

You can pass any number of arguments to C<send>, and every subsequent
call to C<recv> will return them.

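The following small sketch illustrates passing several values through a
condition variable. It assumes AnyEvent is installed; the values and
variable names are made up for illustration. Because C<send> is called
before C<recv> here, C<recv> should return immediately instead of waiting.

```perl
use strict;
use warnings;
use AnyEvent;

my $cv = AnyEvent->condvar;

# producer side: flag the condition and attach two values to it
$cv->send ("larry", 42);

# consumer side: in list context, recv returns all values passed to send
my ($user, $answer) = $cv->recv;

# in scalar context only the first value is returned, and recv can be
# called again on a condvar that has already been sent
my $first = $cv->recv;

print "$user $answer $first\n";
```

In a real program the C<send> call would normally sit inside a watcher
callback, as in the examples above.
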
=head2 The "main loop"

Most event-based frameworks have something called a "main loop" or "event
loop run function" or something similar.

Just like C<recv> in AnyEvent, these functions need to be called
eventually so that your event loop has a chance of actually looking for
the events you are interested in.

For example, in a L<Gtk2> program, the above example could also be written
like this:

   use Gtk2 -init;
   use AnyEvent;

   ############################################
   # create a window and some label

   my $window = new Gtk2::Window "toplevel";
   $window->add (my $label = new Gtk2::Label "soon replaced by name");

   $window->show_all;

   ############################################
   # do our AnyEvent stuff

   $| = 1; print "enter your name> ";

   my $name_ready = AnyEvent->condvar;

   my $wait_for_input = AnyEvent->io (
      fh => \*STDIN, poll => "r",
      cb => sub {
         # set the label
         $label->set_text (scalar <STDIN>);
         print "enter another name> ";
      }
   );

   ############################################
   # Now enter Gtk2's event loop

   main Gtk2;

No condition variable anywhere in sight - instead, we just read a line
from STDIN and replace the text in the label. In fact, since nobody
C<undef>s C<$wait_for_input> you can enter multiple lines.

Instead of waiting for a condition variable, the program enters the Gtk2
main loop by calling C<< Gtk2->main >>, which will block the program and
wait for events to arrive.

This also shows that AnyEvent is quite flexible - you didn't have to do
anything to make the AnyEvent watcher use Gtk2 (actually Glib) - it just
worked.

Admittedly, the example is a bit silly - who would want to read names
from standard input in a Gtk+ application? But imagine that instead of
doing that, you make an HTTP request in the background and display its
results. In fact, with event-based programming you can make many
HTTP requests in parallel in your program and still provide feedback to
the user and stay interactive.

And in the next part you will see how to do just that - by implementing an
HTTP request, on our own, with the utility modules AnyEvent comes with.

Before that, however, let's briefly look at how you would write your
program using only AnyEvent, without ever calling some other event
loop's run function.

In the example using condition variables, we used those to start waiting
for events, and in fact, condition variables are the solution:

   my $quit_program = AnyEvent->condvar;

   # create AnyEvent watchers (or not) here

   $quit_program->recv;

If any of your watcher callbacks decide to quit (this is often
called an "unloop" in other frameworks), they can just call C<<
$quit_program->send >>. Of course, they could also decide not to and
call C<exit> instead, or they could decide never to quit (e.g. in a
long-running daemon program).

If you don't need some clean quit functionality and just want to run the
event loop, you can do this:

   AnyEvent->condvar->recv;

And this is, in fact, the closest to the idea of a main loop run
function that AnyEvent offers.

=head2 Timers and other event sources

So far, we have used only I/O watchers. These are useful mainly to find
out whether a socket has data to read, or space to write more data. On sane
operating systems this also works for console windows/terminals (typically
on standard input), serial lines, all sorts of other devices, basically
almost everything that has a file descriptor but isn't a file itself. (As
usual, "sane" excludes Windows - on that platform you would need different
functions for all of these, complicating code immensely - think "socket
only" on Windows).

However, I/O is not everything - the second most important event source is
the clock. For example, when doing an HTTP request you might want to time
out when the server doesn't answer within some predefined amount of time.

In AnyEvent, timer event watchers are created by calling the C<<
AnyEvent->timer >> method:

   use AnyEvent;

   my $cv = AnyEvent->condvar;

   my $wait_one_and_a_half_seconds = AnyEvent->timer (
      after => 1.5,  # after how many seconds to invoke the cb?
      cb    => sub { # the callback to invoke
         $cv->send;
      },
   );

   # can do something else here

   # now wait till our time has come
   $cv->recv;

Unlike I/O watchers, timers are only interested in the number of seconds
they have to wait. When (at least) that amount of time has passed,
AnyEvent will invoke your callback.

Unlike I/O watchers, which will call your callback as many times as there
is data available, timers are normally one-shot: after they have "fired"
once and invoked your callback, they are dead and no longer do anything.

To get a repeating timer, such as a timer firing roughly once per second,
you can specify an C<interval> parameter:

   my $once_per_second = AnyEvent->timer (
      after    => 0,    # first invoke ASAP
      interval => 1,    # then invoke every second
      cb       => sub { # the callback to invoke
         $cv->send;
      },
   );

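As a rough sketch of how a repeating timer can be stopped again, the
snippet below counts three invocations and then destroys its own watcher.
It assumes AnyEvent is installed; the variable names and the 0.1 second
interval are chosen only to keep the example short. It declares the
watcher variable in a separate statement so that the callback can refer
to it.

```perl
use strict;
use warnings;
use AnyEvent;

my $done  = AnyEvent->condvar;
my $ticks = 0;

my $ticker; $ticker = AnyEvent->timer (
   after    => 0,    # first invocation as soon as possible
   interval => 0.1,  # then roughly every 100ms
   cb       => sub {
      if (++$ticks == 3) {
         undef $ticker;         # destroy the watcher, which stops the timer
         $done->send ($ticks);  # wake up the consumer below
      }
   },
);

# run the event loop until the callback calls send
my $count = $done->recv;
print "timer fired $count times\n";
```

Forgetting the watcher is all that is needed to cancel a timer; there is
no separate "stop" call involved in this sketch.
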
=head3 More esoteric sources

AnyEvent also has some other, more esoteric event sources you can tap
into: signal, child and idle watchers.

Signal watchers can be used to wait for "signal events", which means
your process was sent a signal (such as C<SIGTERM> or C<SIGUSR1>).

Child-process watchers wait for a child process to exit. They are useful
when you fork a separate process and need to know when it exits, but you
do not want to wait for that by blocking.

Idle watchers invoke their callback when the event loop has handled all
outstanding events, polled for new events and didn't find any, i.e., when
your process is otherwise idle. They are useful if you want to do some
non-trivial data processing that can be done when your program doesn't
have anything better to do.

All these watcher types are described in detail in the main L<AnyEvent>
manual page.

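To make the child-process watcher a bit more concrete, here is a minimal
sketch that forks a child and waits for it without blocking in
C<waitpid>. It assumes AnyEvent is installed and a platform with a working
C<fork>; the exit code 3 is arbitrary. The condvar is created before
forking, following the pattern used in the main L<AnyEvent> manual page.

```perl
use strict;
use warnings;
use AnyEvent;

my $done = AnyEvent->condvar;   # created before forking

my $pid = fork;
defined $pid or die "fork failed: $!";
exit 3 unless $pid;             # the child exits immediately with code 3

my $child_watcher = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;     # $status is a wait status, like $?
      $done->send ($status >> 8);  # extract the exit code
   },
);

my $exit_code = $done->recv;
print "child $pid exited with code $exit_code\n";
```

Signal and idle watchers are created in the same style, via the
C<< AnyEvent->signal >> and C<< AnyEvent->idle >> methods.
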
Sometimes you also need to know what the current time is: C<<
AnyEvent->now >> returns the time the event toolkit uses to schedule
relative timers, and is usually what you want. It is often cached (which
means it can be a bit outdated). In that case, you can use the more costly
C<< AnyEvent->time >> method which will ask your operating system for the
current time, which is slower, but also more up to date.


=head1 Network programming and AnyEvent

So far you have seen how to register event watchers and handle events.

This is a great foundation to write network clients and servers, and might
be all that your module (or program) ever requires, but writing your own
I/O buffering again and again becomes tedious, not to mention that it
attracts errors.

While the core L<AnyEvent> module is still small and self-contained,
the distribution comes with some very useful utility modules such as
L<AnyEvent::Handle>, L<AnyEvent::DNS> and L<AnyEvent::Socket>. These can
make your life as a non-blocking network programmer a lot easier.

Here is a quick overview of these three modules:

=head2 L<AnyEvent::DNS>

This module allows fully asynchronous DNS resolution. It is used mainly by
L<AnyEvent::Socket> to resolve hostnames and service ports for you, but is
a great way to do other DNS resolution tasks, such as reverse lookups of
IP addresses for log files.

=head2 L<AnyEvent::Handle>

This module handles non-blocking IO on (socket-, pipe- etc.) file handles
in an event-based manner. It provides a wrapper object around your file
handle that provides queueing and buffering of incoming and outgoing data
for you.

It also implements the most common data formats, such as text lines, or
fixed and variable-width data blocks.

=head2 L<AnyEvent::Socket>

This module provides you with functions that handle socket creation
and IP address magic. The two main functions are C<tcp_connect> and
C<tcp_server>. The former will connect a (streaming) socket to an internet
host for you and the latter will make a server socket for you, to accept
connections.

This module also comes with transparent IPv6 support, which means that if
you write your programs with this module, you will be IPv6 ready without
doing anything special.

It also works around a lot of portability quirks (especially on the
Windows platform), which makes it even easier to write your programs in a
portable way (did you know that Windows uses different error codes for all
socket functions and that Perl does not know about these? That "Unknown
error 10022" (which is C<WSAEINVAL>) can mean that our C<connect> call was
successful? That unsuccessful TCP connects might never be reported back
to your program? That C<WSAEINPROGRESS> means your C<connect> call was
ignored instead of being in progress? AnyEvent::Socket works around all of
these Windows/Perl bugs for you).

=head2 Implementing a parallel finger client with non-blocking connects
and AnyEvent::Socket

The finger protocol is one of the simplest protocols in use on the
internet. Or in use in the past, as almost nobody uses it anymore.

It works by connecting to the finger port on another host, writing a
single line with a user name and then reading the finger response, as
specified by that user. OK, RFC 1288 specifies a vastly more complex
protocol, but it basically boils down to this:

   # telnet freebsd.org finger
   Trying 8.8.178.135...
   Connected to freebsd.org (8.8.178.135).
   Escape character is '^]'.
   larry
   Login: lile                       Name: Larry Lile
   Directory: /home/lile             Shell: /usr/local/bin/bash
   No Mail.
   Mail forwarded to: lile@stdio.com
   No Plan.

So let's write a little AnyEvent function that makes a finger request:

   use AnyEvent;
   use AnyEvent::Socket;

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         # the callback receives the socket handle - or nothing
         my ($fh) = @_
            or return $cv->send;

         # now write the username
         syswrite $fh, "$user\015\012";

         my $response;

         # register a read watcher
         my $read_watcher; $read_watcher = AnyEvent->io (
            fh   => $fh,
            poll => "r",
            cb   => sub {
               my $len = sysread $fh, $response, 1024, length $response;

               if ($len <= 0) {
                  # we are done, or an error occurred, let's ignore the latter
                  undef $read_watcher; # no longer interested
                  $cv->send ($response); # send results
               }
            },
         );
      };

      # pass $cv to the caller
      $cv
   }

That's a mouthful! Let's dissect this function a bit, first the overall
function and execution flow:

   sub finger($$) {
      my ($user, $host) = @_;

      # use a condvar to return results
      my $cv = AnyEvent->condvar;

      # first, connect to the host
      tcp_connect $host, "finger", sub {
         ...
      };

      $cv
   }

This isn't too complicated, just a function with two parameters that
creates a condition variable C<$cv>, initiates a TCP connect to
C<$host>, and returns C<$cv>. The caller can use the returned C<$cv> to
receive the finger response, but one could equally well pass a third
argument, a callback, to the function.

Since we are programming event'ish, we do not wait for the connect to
finish - it could block the program for a minute or longer!

Instead, we pass C<tcp_connect> a callback to invoke when the connect is
done. The callback is called with the socket handle as its first
argument if the connect succeeds, and no arguments otherwise. The
important point is that it will always be called as soon as the outcome
of the TCP connect is known.

This style of programming is also called "continuation style": the
"continuation" is simply the way the program continues - normally at the
next line after some statement (the exception is loops or things like
C<return>). When we are interested in events, however, we instead specify
the "continuation" of our program by passing a closure, which makes that
closure the "continuation" of the program.

The C<tcp_connect> call is like saying "return now, and when the
connection is established or the attempt failed, continue there".

Now let's look at the callback/closure in more detail:

   # the callback receives the socket handle - or nothing
   my ($fh) = @_
      or return $cv->send;

The first thing the callback does is to save the socket handle in
C<$fh>. When there was an error (no arguments), then our instinct as
expert Perl programmers would tell us to C<die>:

   my ($fh) = @_
      or die "$host: $!";

While this would give good feedback to the user (if he happens to watch
standard error), our program would probably stop working here, as we never
report the results to anybody, certainly not the caller of our C<finger>
function, and most event loops continue even after a C<die>!

This is why we instead C<return>, but also call C<< $cv->send >> without
any arguments to signal to the condvar consumer that something bad has
happened. The return value of C<< $cv->send >> is irrelevant, as is
the return value of our callback. The C<return> statement is used for
the side effect of, well, returning immediately from the callback.
Checking for errors and handling them this way is very common, which is
why this compact idiom is so handy.

As the next step in the finger protocol, we send the username to the
finger daemon on the other side of our connection (the kernel.org finger
service doesn't actually wait for a username, but the net is running out
of finger servers fast):

   syswrite $fh, "$user\015\012";

Note that this isn't 100% clean socket programming - the socket could,
for whatever reason, not accept our data. When writing a small amount
of data like in this example it doesn't matter, as a socket buffer is
almost always big enough for a mere "username", but for real-world
cases you might need to implement some kind of write buffering - or use
L<AnyEvent::Handle>, which handles these matters for you, as shown in the
next section.

What we I<do> have to do is implement our own read buffer - the response
data could arrive late or in multiple chunks, and we cannot just wait for
it (event-based programming, you know?).

To do that, we register a read watcher on the socket which waits for data:

   my $read_watcher; $read_watcher = AnyEvent->io (
      fh   => $fh,
      poll => "r",

There is a trick here, however: the read watcher isn't stored in a global
variable, but in a local one - if the callback returns, it would normally
destroy the variable and its contents, which would in turn unregister our
watcher.

To avoid that, we refer to the watcher variable in the watcher callback.
This means that, when the C<tcp_connect> callback returns, perl thinks
(quite correctly) that the read watcher is still in use - namely inside
the inner callback - and thus keeps it alive even if nothing else in the
program refers to it anymore (it is much like Baron Münchhausen keeping
himself from dying by pulling himself out of a swamp).

| 693 |
The trick, however, is that instead of: |
| 694 |
|
| 695 |
my $read_watcher = AnyEvent->io (... |
| 696 |
|
| 697 |
The program does: |
| 698 |
|
| 699 |
my $read_watcher; $read_watcher = AnyEvent->io (... |
| 700 |
|
| 701 |
The reason for this is a quirk in the way Perl works: variable names |
| 702 |
declared with C<my> are only visible in the I<next> statement. If the |
| 703 |
whole C<< AnyEvent->io >> call, including the callback, were done in |
| 704 |
a single statement, the callback could not refer to the C<$read_watcher> |
| 705 |
variable to C<undef>ine it, so it is done in two statements. |
| 706 |
|
| 707 |
Whether you'd want to format it like this is of course a matter of style. |
| 708 |
This way emphasizes that the declaration and assignment really are one |
| 709 |
logical statement. |
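
The scoping rule can be checked without any event loop. The following is a minimal sketch in plain Perl and not part of the finger client: the C<Guard> class and all variable names are made up for illustration, and C<Guard> merely stands in for a watcher object.

```perl
use strict;
use warnings;

# One statement: inside the sub, $watcher is not declared yet, so under
# "use strict" this code does not even compile.
my $ok = eval q{
   my $watcher = sub { undef $watcher };
   1
};
print "one statement: ", ($ok ? "compiled" : "failed to compile"), "\n";

my $destroyed = 0;
{
   package Guard; # hypothetical stand-in for a watcher object
   sub new     { bless {}, shift }
   sub DESTROY { $destroyed++ }
}

# Two statements: the declaration is complete before the sub is compiled,
# so the closure captures the very variable that holds it.
my $cb;
{
   my $watcher; $watcher = [Guard->new, sub { undef $watcher }];
   $cb = $watcher->[1];
}

# the inner scope is gone, but the closure still keeps $watcher alive
my $before = $destroyed;

# undefining the variable from inside the callback releases the object
$cb->();
my $after = $destroyed;

print "destroyed before: $before, after: $after\n";
```

The first half shows why the single-statement form is not an option, the second half shows the self-reference keeping the object alive until the callback itself lets go of it.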
| 710 |
|
| 711 |
The callback itself calls C<sysread> as many times as necessary, until |
| 712 |
C<sysread> returns either an error or end-of-file: |
| 713 |
|
| 714 |
cb => sub { |
| 715 |
my $len = sysread $fh, $response, 1024, length $response; |
| 716 |
|
| 717 |
if ($len <= 0) { |
| 718 |
|
| 719 |
Note that C<sysread> has the ability to append data it reads to a scalar |
| 720 |
if we specify an offset, a feature which we make use of in this example. |
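
To see the offset argument at work in isolation, here is a small sketch in plain Perl. It assumes a pipe as a stand-in for the socket and a deliberately tiny read size of 8 octets, neither of which comes from the finger client itself.

```perl
use strict;
use warnings;

# a pipe stands in for the socket; the data arrives in two writes
pipe (my $rfh, my $wfh) or die "pipe: $!";
syswrite $wfh, "first chunk, ";
syswrite $wfh, "second chunk";
close $wfh;

my $response = "";

while (1) {
   # read at most 8 octets and append them at the current end of $response
   my $len = sysread $rfh, $response, 8, length $response;

   die "read error: $!" unless defined $len;
   last unless $len; # 0 means end-of-file
}

print "$response\n";
```

Because every call writes at offset C<length $response>, the chunks accumulate in one scalar and no separate concatenation step is needed.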
| 721 |
|
| 722 |
When C<sysread> indicates we are done, the callback C<undef>ines |
| 723 |
the watcher and then C<send>s the response data to the condition |
| 724 |
variable. All this has the following effects: |
| 725 |
|
| 726 |
Undefining the watcher destroys it, as our callback was the only one still |
| 727 |
having a reference to it. When the watcher gets destroyed, it destroys the |
| 728 |
callback, which in turn means the C<$fh> handle is no longer used, so that |
| 729 |
gets destroyed as well. The result is that all resources will be nicely |
| 730 |
cleaned up by perl for us. |
| 731 |
|
| 732 |
=head3 Using the finger client |
| 733 |
|
| 734 |
Now, we could probably write the same finger client in a simpler way if |
| 735 |
we used C<IO::Socket::INET>, ignored the problem of multiple hosts and |
| 736 |
ignored IPv6 and a few other things that C<tcp_connect> handles for us. |
| 737 |
|
| 738 |
But the main advantage is that we can not only run this finger function in |
| 739 |
the background, we can even run multiple sessions in parallel, like this: |
| 740 |
|
| 741 |
my $f1 = finger "kuriyama", "freebsd.org"; |
| 742 |
my $f2 = finger "icculus?listarchives=1", "icculus.org"; |
| 743 |
my $f3 = finger "mikachu", "icculus.org"; |
| 744 |
|
| 745 |
print "kuriyama's gpg key\n" , $f1->recv, "\n"; |
| 746 |
print "icculus' plan archive\n" , $f2->recv, "\n"; |
| 747 |
print "mikachu's plan zomgn\n" , $f3->recv, "\n"; |
| 748 |
|
| 749 |
It doesn't look like it, but in fact all three requests run in |
| 750 |
parallel. The code waits for the first finger request to finish first, but |
| 751 |
that doesn't keep it from executing them in parallel: when the first C<recv> |
| 752 |
call sees that the data isn't ready yet, it serves events for all three |
| 753 |
requests automatically, until the first request has finished. |
| 754 |
|
| 755 |
The second C<recv> call might either find the data is already there, or it |
| 756 |
will continue handling events until that is the case, and so on. |
| 757 |
|
| 758 |
By taking advantage of network latencies, which allows us to serve other |
| 759 |
requests and events while we wait for an event on one socket, the overall |
| 760 |
time to do these three requests will be greatly reduced - typically all |
| 761 |
three are done in the same time as the slowest of the three requests. |
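
Public finger servers are hard to come by, so the effect is easier to try with timers. The sketch below assumes a made-up C<slow_request> function that uses an AnyEvent timer in place of the network; the delays are arbitrary.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical stand-in for finger: "answers" after $delay seconds
sub slow_request {
   my ($name, $delay) = @_;

   my $cv = AnyEvent->condvar;

   my $timer; $timer = AnyEvent->timer (after => $delay, cb => sub {
      undef $timer;
      $cv->send ("$name done");
   });

   $cv
}

my $start = AnyEvent->time;

# all three "requests" are started here ...
my $r1 = slow_request ("first",  0.3);
my $r2 = slow_request ("second", 0.2);
my $r3 = slow_request ("third",  0.1);

# ... and only collected here
my @results = ($r1->recv, $r2->recv, $r3->recv);

my $elapsed = AnyEvent->time - $start;

print "$_\n" for @results;
printf "elapsed: %.2fs\n", $elapsed;
```

The elapsed time should be close to the longest delay rather than to the sum of all three, since the first C<recv> already serves the timers of the other two requests.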
| 762 |
|
| 763 |
By the way, you do not actually have to wait in the C<recv> method on an |
| 764 |
AnyEvent condition variable - after all, waiting is evil - you can also |
| 765 |
register a callback: |
| 766 |
|
| 767 |
$f1->cb (sub { |
| 768 |
my $response = shift->recv; |
| 769 |
# ... |
| 770 |
}); |
| 771 |
|
| 772 |
The callback will be invoked only when C<send> is called. In fact, |
| 773 |
instead of returning a condition variable you could also pass a third |
| 774 |
parameter to your finger function, the callback to invoke with the |
| 775 |
response: |
| 776 |
|
| 777 |
sub finger($$$) { |
| 778 |
my ($user, $host, $cb) = @_; |
| 779 |
|
| 780 |
How you implement it is a matter of taste - if you expect your function to |
| 781 |
be used mainly in an event-based program you would normally prefer to pass |
| 782 |
a callback directly. If you write a module and expect your users to use |
| 783 |
it "synchronously" often (for example, a simple http-get script would not |
| 784 |
really care much for events), then you would use a condition variable and |
| 785 |
tell them "simply C<< ->recv >> the data". |
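
The two styles do not exclude each other. As a rough sketch, a function can return a condition variable and also accept an optional callback; C<delayed_answer> below is a made-up example function that uses a timer instead of real I/O.

```perl
use strict;
use warnings;
use AnyEvent;

# hypothetical helper: delivers $answer after a short delay, either via
# the returned condition variable or via the optional callback
sub delayed_answer {
   my ($answer, $cb) = @_;

   my $cv = AnyEvent->condvar;
   $cv->cb (sub { $cb->(shift->recv) }) if $cb;

   my $timer; $timer = AnyEvent->timer (after => 0.1, cb => sub {
      undef $timer;
      $cv->send ($answer);
   });

   $cv
}

# "synchronous" use: just wait for the result
my $sync = delayed_answer ("waited for")->recv;

# event-based use: pass a callback, then let the event loop run
my $done = AnyEvent->condvar;
my $async;
delayed_answer ("called back", sub { $async = shift; $done->send });
$done->recv;

print "$sync / $async\n";
```

Whether the extra flexibility is worth the slightly larger interface is, again, a matter of taste.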
| 786 |
|
| 787 |
=head3 Problems with the implementation and how to fix them |
| 788 |
|
| 789 |
To make this example more real-world-ready, we would not only implement |
| 790 |
some write buffering (for the paranoid, or maybe denial-of-service aware |
| 791 |
security expert), but we would also have to handle timeouts and maybe |
| 792 |
protocol errors. |
| 793 |
|
| 794 |
Doing this quickly gets unwieldy, which is why we introduce |
| 795 |
L<AnyEvent::Handle> in the next section, which takes care of all these |
| 796 |
details for you and lets you concentrate on the actual protocol. |
| 797 |
|
| 798 |
|
| 799 |
=head2 Implementing simple HTTP and HTTPS GET requests with AnyEvent::Handle |
| 800 |
|
| 801 |
The L<AnyEvent::Handle> module has been hyped quite a bit in this document |
| 802 |
so far, so let's see what it really offers. |
| 803 |
|
| 804 |
As finger is such a simple protocol, let's try something slightly more |
| 805 |
complicated: HTTP/1.0. |
| 806 |
|
| 807 |
An HTTP GET request works by sending a single request line that indicates |
| 808 |
what you want the server to do and the URI you want to act on, followed |
| 809 |
by as many "header" lines (C<Header: data>, same as e-mail headers) as |
| 810 |
required for the request, followed by an empty line. |
| 811 |
|
| 812 |
The response is formatted very similarly, first a line with the response |
| 813 |
status, then again as many header lines as required, then an empty line, |
| 814 |
followed by any data that the server might send. |
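
The message layout can be illustrated in plain Perl, without any networking. In this sketch the C<Host> header, the host name and the canned reply are made-up values chosen only to show the structure.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";

# request: request line, header lines, empty line
# (the Host header and its value are made up for this example)
my $request = join $CRLF,
   "GET /test HTTP/1.0",
   "Host: www.example.com",
   "", "";

# a canned response, as a server might send it
my $reply = join $CRLF,
   "HTTP/1.0 404 Not Found",
   "Content-Type: text/html; charset=UTF-8",
   "Content-Length: 13",
   "",
   "<html></html>";

# split at the first empty line, then peel off the status line
my ($head, $body)    = split /\015\012\015\012/, $reply, 2;
my ($status, @lines) = split /\015\012/, $head;
my %header = map { /^([^:]+):\s*(.*)$/ ? (lc $1, $2) : () } @lines;

print "status: $status\n";
print "$_ = $header{$_}\n" for sort keys %header;
print "body: $body\n";
```

Splitting with a limit of 2 matters here: the body may itself contain empty lines, and only the first one separates it from the headers.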
| 815 |
|
| 816 |
Again, let's try it out with C<telnet> (I condensed the output a bit - if |
| 817 |
you want to see the full response, do it yourself). |
| 818 |
|
| 819 |
# telnet www.google.com 80 |
| 820 |
Trying 209.85.135.99... |
| 821 |
Connected to www.google.com (209.85.135.99). |
| 822 |
Escape character is '^]'. |
| 823 |
GET /test HTTP/1.0 |
| 824 |
|
| 825 |
HTTP/1.0 404 Not Found |
| 826 |
Date: Mon, 02 Jun 2008 07:05:54 GMT |
| 827 |
Content-Type: text/html; charset=UTF-8 |
| 828 |
|
| 829 |
<html><head> |
| 830 |
[...] |
| 831 |
Connection closed by foreign host. |
| 832 |
|
| 833 |
The C<GET ...> and the empty line were entered manually, the rest of the |
| 834 |
telnet output is Google's response, in this case a C<404 not found> one. |
| 835 |
|
| 836 |
So, here is how you would do it with C<AnyEvent::Handle>: |
| 837 |
|
| 838 |
sub http_get { |
| 839 |
my ($host, $uri, $cb) = @_; |
| 840 |
|
| 841 |
# store results here |
| 842 |
my ($response, $header, $body); |
| 843 |
|
| 844 |
my $handle; $handle = new AnyEvent::Handle |
| 845 |
connect => [$host => 'http'], |
| 846 |
on_error => sub { |
| 847 |
$cb->("HTTP/1.0 500 $!"); |
| 848 |
$handle->destroy; # explicitly destroy handle |
| 849 |
}, |
| 850 |
on_eof => sub { |
| 851 |
$cb->($response, $header, $body); |
| 852 |
$handle->destroy; # explicitly destroy handle |
| 853 |
}; |
| 854 |
|
| 855 |
$handle->push_write ("GET $uri HTTP/1.0\015\012\015\012"); |
| 856 |
|
| 857 |
# now fetch response status line |
| 858 |
$handle->push_read (line => sub { |
| 859 |
my ($handle, $line) = @_; |
| 860 |
$response = $line; |
| 861 |
}); |
| 862 |
|
| 863 |
# then the headers |
| 864 |
$handle->push_read (line => "\015\012\015\012", sub { |
| 865 |
my ($handle, $line) = @_; |
| 866 |
$header = $line; |
| 867 |
}); |
| 868 |
|
| 869 |
# and finally handle any remaining data as body |
| 870 |
$handle->on_read (sub { |
| 871 |
$body .= $_[0]->rbuf; |
| 872 |
$_[0]->rbuf = ""; |
| 873 |
}); |
| 874 |
} |
| 875 |
|
| 876 |
And now let's go through it step by step. First, as usual, the overall |
| 877 |
C<http_get> function structure: |
| 878 |
|
| 879 |
sub http_get { |
| 880 |
my ($host, $uri, $cb) = @_; |
| 881 |
|
| 882 |
# store results here |
| 883 |
my ($response, $header, $body); |
| 884 |
|
| 885 |
my $handle; $handle = new AnyEvent::Handle |
| 886 |
... create handle object |
| 887 |
|
| 888 |
... push data to write |
| 889 |
|
| 890 |
... push what to expect to read queue |
| 891 |
} |
| 892 |
|
| 893 |
Unlike in the finger example, this time the caller has to pass a callback |
| 894 |
to C<http_get>. Also, instead of passing a URL as one would expect, the |
| 895 |
caller has to provide the hostname and URI - normally you would use the |
| 896 |
C<URI> module to parse a URL and separate it into those parts, but that is |
| 897 |
left to the inspired reader :) |
| 898 |
|
| 899 |
Since everything else is left to the caller, all C<http_get> does is |
| 900 |
initiate the connection by creating the AnyEvent::Handle object (which |
| 901 |
calls C<tcp_connect> for us) and leave everything else to its callback. |
| 902 |
|
| 903 |
The handle object is created, unsurprisingly, by calling the C<new> |
| 904 |
method of L<AnyEvent::Handle>: |
| 905 |
|
| 906 |
my $handle; $handle = new AnyEvent::Handle |
| 907 |
connect => [$host => 'http'], |
| 908 |
on_error => sub { |
| 909 |
$cb->("HTTP/1.0 500 $!"); |
| 910 |
$handle->destroy; # explicitly destroy handle |
| 911 |
}, |
| 912 |
on_eof => sub { |
| 913 |
$cb->($response, $header, $body); |
| 914 |
$handle->destroy; # explicitly destroy handle |
| 915 |
}; |
| 916 |
|
| 917 |
The C<connect> argument tells AnyEvent::Handle to call C<tcp_connect> for |
| 918 |
the specified host and service/port. |
| 919 |
|
| 920 |
The C<on_error> callback will be called on any unexpected error, such as a |
| 921 |
refused connection, or unexpected end-of-file while reading headers. |
| 922 |
|
| 923 |
Instead of having an extra mechanism to signal errors, connection errors |
| 924 |
are signalled by crafting a special "response status line", like this: |
| 925 |
|
| 926 |
HTTP/1.0 500 Connection refused |
| 927 |
|
| 928 |
This means the caller cannot distinguish (easily) between |
| 929 |
locally-generated errors and server errors, but it simplifies error |
| 930 |
handling for the caller a lot. |
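
From the caller's side this might look as follows. The C<describe> function is a hypothetical callback body, written here as an ordinary function so it can be tried without a connection.

```perl
use strict;
use warnings;

# hypothetical callback of the kind a caller might pass to http_get
sub describe {
   my ($response, $header, $body) = @_;

   my ($code, $reason) = $response =~ m!^HTTP/\d+\.\d+ (\d{3}) ?(.*)!
      or return "malformed status line";

   return $code >= 400
      ? "failed ($code): $reason"
      : "ok ($code), " . length ($body // "") . " octets of body";
}

# a locally generated error and a real response take the same code path
my $error   = describe ("HTTP/1.0 500 Connection refused");
my $success = describe ("HTTP/1.0 200 OK", "Content-Type: text/plain", "hello");

print "$error\n$success\n";
```

A single check of the status code covers both kinds of failure, which is the simplification the crafted status line buys.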
| 931 |
|
| 932 |
The error callback also destroys the handle explicitly, because we are not |
| 933 |
interested in continuing after any errors. In AnyEvent::Handle callbacks |
| 934 |
you have to call C<destroy> explicitly to destroy a handle. Outside of |
| 935 |
those callbacks you can just forget the object reference and it will be |
| 936 |
automatically cleaned up. |
| 937 |
|
| 938 |
Last but not least, we set an C<on_eof> callback that is called when the |
| 939 |
other side indicates it has stopped writing data, which we will use to |
| 940 |
gracefully shut down the handle and report the results. This callback is |
| 941 |
only called when the read queue is empty - if the read queue expects |
| 942 |
some data and the handle gets an EOF from the other side this will be an |
| 943 |
error - after all, you did expect more to come. |
| 944 |
|
| 945 |
If you wanted to write a server using AnyEvent::Handle, you would use |
| 946 |
C<tcp_server> and then create the AnyEvent::Handle with the C<fh> |
| 947 |
argument. |
| 948 |
|
| 949 |
=head3 The write queue |
| 950 |
|
| 951 |
The next line sends the actual request: |
| 952 |
|
| 953 |
$handle->push_write ("GET $uri HTTP/1.0\015\012\015\012"); |
| 954 |
|
| 955 |
No headers will be sent (this is fine for simple requests), so the whole |
| 956 |
request is just a single line followed by an empty line to signal the end |
| 957 |
of the headers to the server. |
| 958 |
|
| 959 |
The more interesting question is why the method is called C<push_write> |
| 960 |
and not just C<write>. The reason is that you can I<always> add some write |
| 961 |
data without blocking, and to do this, AnyEvent::Handle needs some write |
| 962 |
queue internally - and C<push_write> pushes some data onto the end of |
| 963 |
that queue, just like Perl's C<push> pushes data onto the end of an |
| 964 |
array. |
| 965 |
|
| 966 |
The deeper reason is that at some point in the future, there might |
| 967 |
be C<unshift_write> as well, and in any case, we will shortly meet |
| 968 |
C<push_read> and C<unshift_read>, and it's usually easiest to remember if |
| 969 |
all those functions have some symmetry in their name. So C<push> is used |
| 970 |
as the opposite of C<unshift> in AnyEvent::Handle, not as the opposite of |
| 971 |
C<pull> - just like in Perl. |
| 972 |
|
| 973 |
Note that we call C<push_write> right after creating the AnyEvent::Handle |
| 974 |
object, before it has had time to actually connect to the server. This is |
| 975 |
fine: pushing the read and write requests will queue them in the handle |
| 976 |
object until the connection has been established. Alternatively, we |
| 977 |
could do this "on demand" in the C<on_connect> callback. |
| 978 |
|
| 979 |
If C<push_write> is called with more than one argument, then you can do |
| 980 |
I<formatted> I/O. For example, this would JSON-encode your data before |
| 981 |
pushing it to the write queue: |
| 982 |
|
| 983 |
$handle->push_write (json => [1, 2, 3]); |
| 984 |
|
| 985 |
This pretty much summarises the write queue - there is little else to it. |
| 986 |
|
| 987 |
Reading the response is far more interesting, because it involves the more |
| 988 |
powerful and complex I<read queue>: |
| 989 |
|
| 990 |
=head3 The read queue |
| 991 |
|
| 992 |
The response consists of three parts: a single line with the response |
| 993 |
status, a single paragraph of headers ended by an empty line, and the |
| 994 |
response body, which is the remaining data on the connection. |
| 995 |
|
| 996 |
For the first two, we push two read requests onto the read queue: |
| 997 |
|
| 998 |
# now fetch response status line |
| 999 |
$handle->push_read (line => sub { |
| 1000 |
my ($handle, $line) = @_; |
| 1001 |
$response = $line; |
| 1002 |
}); |
| 1003 |
|
| 1004 |
# then the headers |
| 1005 |
$handle->push_read (line => "\015\012\015\012", sub { |
| 1006 |
my ($handle, $line) = @_; |
| 1007 |
$header = $line; |
| 1008 |
}); |
| 1009 |
|
| 1010 |
While one can just push a single callback to parse all the data on the |
| 1011 |
queue, formatted I/O really comes to our aid here, since there is a |
| 1012 |
ready-made "read line" read type. The first read expects a single line, |
| 1013 |
ended by C<\015\012> (the standard end-of-line marker in internet |
| 1014 |
protocols). |
| 1015 |
|
| 1016 |
The second "line" is actually a single paragraph - instead of reading it |
| 1017 |
line by line we tell C<push_read> that the end-of-line marker is really |
| 1018 |
C<\015\012\015\012>, which is an empty line. The result is that the whole |
| 1019 |
header paragraph will be treated as a single line and read. The word |
| 1020 |
"line" is interpreted very freely, much like Perl itself does it. |
| 1021 |
|
| 1022 |
Note that the read requests are pushed immediately after creating the |
| 1023 |
handle object - since AnyEvent::Handle provides a queue we can push as |
| 1024 |
many requests as we want, and AnyEvent::Handle will handle them in order. |
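
The freely chosen end-of-line marker used for the header paragraph has a close analogue in Perl's input record separator C<$/>. The sketch below is plain Perl, reads from an in-memory file instead of a socket, and uses a made-up response; it is only meant to show the analogy, not how AnyEvent::Handle works internally.

```perl
use strict;
use warnings;

my $data = "HTTP/1.0 200 OK\015\012"
         . "Content-Type: text/plain\015\012"
         . "Content-Length: 5\015\012"
         . "\015\012"
         . "hello";

open my $fh, "<", \$data or die "open: $!";

my ($status, $header);
{
   local $/ = "\015\012";         # a "line" ends with CRLF
   chomp ($status = <$fh>);
}
{
   local $/ = "\015\012\015\012"; # a "line" ends with an empty line
   chomp ($header = <$fh>);
}
my $body = do { local $/; <$fh> }; # slurp whatever remains

print "$status\n---\n$header\n---\n$body\n";
```

The three reads correspond to the status line read, the header "line" read and the C<on_read> callback for the remaining data.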
| 1025 |
|
| 1026 |
There is, however, no read type for "the remaining data". For that, we |
| 1027 |
install our own C<on_read> callback: |
| 1028 |
|
| 1029 |
# and finally handle any remaining data as body |
| 1030 |
$handle->on_read (sub { |
| 1031 |
$body .= $_[0]->rbuf; |
| 1032 |
$_[0]->rbuf = ""; |
| 1033 |
}); |
| 1034 |
|
| 1035 |
This callback is invoked every time data arrives and the read queue is |
| 1036 |
empty - which in this example will only be the case when both response and |
| 1037 |
header have been read. The C<on_read> callback could actually have been |
| 1038 |
specified when constructing the object, but doing it this way preserves |
| 1039 |
logical ordering. |
| 1040 |
|
| 1041 |
The read callback adds the current read buffer to its C<$body> |
| 1042 |
variable and, most importantly, I<empties> the buffer by assigning the |
| 1043 |
empty string to it. |
| 1044 |
|
| 1045 |
Given these instructions, AnyEvent::Handle will handle incoming data - |
| 1046 |
if all goes well, the callback will be invoked with the response data; |
| 1047 |
if not, it will get an error. |
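
The read queue can be tried without a web server. The following sketch assumes a local socket pair from C<AnyEvent::Util::portable_socketpair> in place of a TCP connection, with one end playing a server that sends a canned, made-up response; the handle is therefore created with C<fh> instead of C<connect>.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util ();

# a connected socket pair replaces the network; one end plays the server
my ($client_fh, $server_fh) = AnyEvent::Util::portable_socketpair ()
   or die "socketpair: $!";

# the "server" sends a canned response and closes its end
syswrite $server_fh,
   "HTTP/1.0 200 OK\015\012"
   . "Content-Type: text/plain\015\012"
   . "\015\012"
   . "hello, world\n";
close $server_fh;

my $done = AnyEvent->condvar;
my ($response, $header, $body);

my $handle; $handle = AnyEvent::Handle->new (
   fh       => $client_fh,
   on_error => sub {
      my ($hdl, $fatal, $message) = @_;
      $done->send ("HTTP/1.0 500 $message");
      $hdl->destroy;
   },
   on_eof   => sub {
      $done->send ($response, $header, $body);
      $handle->destroy;
   },
);

# status line, header paragraph, then everything else as body
$handle->push_read (line => sub { $response = $_[1] });
$handle->push_read (line => "\015\012\015\012", sub { $header = $_[1] });
$handle->on_read (sub {
   $body .= $_[0]->rbuf;
   $_[0]->rbuf = "";
});

my @result = $done->recv;
print join ("|", @result), "\n";
```

The condition variable is only used here to keep the script running until the response is complete; in a larger program the C<on_eof> callback would call whatever continuation the caller supplied.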
| 1048 |
|
| 1049 |
In general, you can implement pipelining (a semi-advanced feature of many |
| 1050 |
protocols) very easily with AnyEvent::Handle: If you have a protocol |
| 1051 |
with a request/response structure, your request methods/functions will |
| 1052 |
all look like this (simplified): |
| 1053 |
|
| 1054 |
sub request { |
| 1055 |
|
| 1056 |
# send the request to the server |
| 1057 |
$handle->push_write (...); |
| 1058 |
|
| 1059 |
# push some response handlers |
| 1060 |
$handle->push_read (...); |
| 1061 |
} |
| 1062 |
|
| 1063 |
This means you can queue as many requests as you want, and while |
| 1064 |
AnyEvent::Handle goes through its read queue to handle the response data, |
| 1065 |
the other side can work on the next request - queueing the request just |
| 1066 |
appends some data to the write queue and installs a handler to be called |
| 1067 |
later. |
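
A complete, if artificial, pipelining example could look like the sketch below. It assumes a toy line-based protocol invented for this purpose (every request line is answered with C<re:> plus the same text) and runs both ends in one process over a socket pair.

```perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Util ();

my ($client_fh, $server_fh) = AnyEvent::Util::portable_socketpair ()
   or die "socketpair: $!";

# toy server: answers every request line with "re: <line>"
my $server = AnyEvent::Handle->new (
   fh       => $server_fh,
   on_error => sub { $_[0]->destroy },
   on_read  => sub {
      $_[0]->push_read (line => sub {
         my ($handle, $line) = @_;
         $handle->push_write ("re: $line\015\012");
      });
   },
);

my $client = AnyEvent::Handle->new (
   fh       => $client_fh,
   on_error => sub { $_[0]->destroy },
);

my $done = AnyEvent->condvar;
my @answers;

sub request {
   my ($text) = @_;

   $done->begin;

   # send the request to the server
   $client->push_write ("$text\015\012");

   # push a handler for the matching response
   $client->push_read (line => sub {
      my ($handle, $line) = @_;
      push @answers, $line;
      $done->end;
   });
}

# queue three requests without waiting for any response
request ($_) for "one", "two", "three";

$done->recv;
print "$_\n" for @answers;
```

All three requests are queued before the event loop runs at all, and the responses are matched to their handlers purely by queue order.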
| 1068 |
|
| 1069 |
You might ask yourself how to handle decisions you can only make I<after> |
| 1070 |
you have received some data (such as handling a short error response or a |
| 1071 |
long and differently-formatted response). The answer to this problem is |
| 1072 |
C<unshift_read>, which we will introduce together with an example in the |
| 1073 |
coming sections. |
| 1074 |
|
| 1075 |
=head3 Using C<http_get> |
| 1076 |
|
| 1077 |
Finally, here is how you would use C<http_get>: |
| 1078 |
|
| 1079 |
http_get "www.google.com", "/", sub { |
| 1080 |
my ($response, $header, $body) = @_; |
| 1081 |
|
| 1082 |
print |
| 1083 |
$response, "\n", |
| 1084 |
$body; |
| 1085 |
}; |
| 1086 |
|
| 1087 |
And of course, you can run as many of these requests in parallel as you |
| 1088 |
want (and your memory supports). |
| 1089 |
|
| 1090 |
=head3 HTTPS |
| 1091 |
|
| 1092 |
Now, as promised, let's implement the same thing for HTTPS, or more |
| 1093 |
correctly, let's change our C<http_get> function into a function that |
| 1094 |
speaks HTTPS instead. |
| 1095 |
|
| 1096 |
HTTPS is a standard TLS connection (B<T>ransport B<L>ayer |
| 1097 |
B<S>ecurity is the official name for what most people refer to as C<SSL>) |
| 1098 |
that contains standard HTTP protocol exchanges. The only other difference |
| 1099 |
to HTTP is that by default it uses port C<443> instead of port C<80>. |
| 1100 |
|
| 1101 |
To implement these two differences we need two tiny changes. First, in the |
| 1102 |
C<connect> parameter, we replace C<http> by C<https> to connect to the |
| 1103 |
https port: |
| 1104 |
|
| 1105 |
connect => [$host => 'https'], |
| 1106 |
|
| 1107 |
The other change deals with TLS, which is something L<AnyEvent::Handle> |
| 1108 |
does for us if the L<Net::SSLeay> module is available. To enable TLS |
| 1109 |
with L<AnyEvent::Handle>, we pass an additional C<tls> parameter |
| 1110 |
to the call to C<AnyEvent::Handle::new>: |
| 1111 |
|
| 1112 |
tls => "connect", |
| 1113 |
|
| 1114 |
Specifying C<tls> enables TLS, and the argument specifies whether |
| 1115 |
AnyEvent::Handle is the server side ("accept") or the client side |
| 1116 |
("connect") for the TLS connection, as unlike TCP, there is a clear |
| 1117 |
server/client relationship in TLS. |
| 1118 |
|
| 1119 |
That's all. |
| 1120 |
|
| 1121 |
Of course, all this should be handled transparently by C<http_get> |
| 1122 |
after parsing the URL. If you need this, see the part about exercising |
| 1123 |
your inspiration earlier in this document. You could also use the |
| 1124 |
L<AnyEvent::HTTP> module from CPAN, which implements all this and works |
| 1125 |
around a lot of quirks for you too. |
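
As a starting point for that exercise, the choice between the two variants can be derived from the URL. The sketch below uses a deliberately simplistic regex instead of the C<URI> module, and C<handle_args_for> is a made-up helper; ports, user names and other URL features are rejected rather than handled.

```perl
use strict;
use warnings;

# hypothetical helper: turn a URL into AnyEvent::Handle constructor
# arguments plus the request path; only plain http:// and https:// URLs
# without port or userinfo are accepted
sub handle_args_for {
   my ($url) = @_;

   my ($scheme, $host, $path) =
      $url =~ m!^(https?)://([^/:?#]+)(/[^#]*)?(?:#.*)?\z!i
         or die "unsupported URL: $url";

   $scheme = lc $scheme;

   # the service name doubles as the scheme name
   my @args = (connect => [$host => $scheme]);

   # https additionally needs TLS, with us as the client side
   push @args, tls => "connect" if $scheme eq "https";

   (\@args, defined $path ? $path : "/")
}

my ($args, $path) = handle_args_for ("https://www.example.com/index.html");

print "connect to $args->[1][0], service $args->[1][1], path $path\n";
print "extra arguments: @$args[2 .. $#$args]\n";
```

The returned list could then be passed to the C<AnyEvent::Handle> constructor together with the C<on_error> and C<on_eof> callbacks shown earlier.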
| 1126 |
|
| 1127 |
=head3 The read queue - revisited |
| 1128 |
|
| 1129 |
HTTP always uses the same structure in its responses, but many protocols |
| 1130 |
require parsing responses differently depending on the response itself. |
| 1131 |
|
| 1132 |
For example, in SMTP, you normally get a single response line: |
| 1133 |
|
| 1134 |
220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net> |
| 1135 |
|
| 1136 |
But SMTP also supports multi-line responses: |
| 1137 |
|
| 1138 |
220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net> |
| 1139 |
220-hey guys |
| 1140 |
220 my response is longer than yours |
| 1141 |
|
| 1142 |
To handle this, we need C<unshift_read>. As the name (we hope) implies, |
| 1143 |
C<unshift_read> will not append your read request to the end of the read |
| 1144 |
queue, but will prepend it to the queue instead. |
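
Why prepending matters becomes visible as soon as a second reader is already waiting in the queue. The following toy model in plain Perl is not how AnyEvent::Handle is implemented; C<push_reader> and C<unshift_reader> are made-up stand-ins that each handle exactly one line.

```perl
use strict;
use warnings;

# a toy model of a read queue: a list of callbacks, one per expected line
my @queue;
my @log;

sub push_reader    { push    @queue, $_[0] }
sub unshift_reader { unshift @queue, $_[0] }

# reader for a (possibly multi-line) SMTP-style response
my $read_response; $read_response = sub {
   my ($line) = @_;

   push @log, "response: $line";

   # "-" as 4th character: more lines follow, so read them *next*
   unshift_reader ($read_response) if $line =~ /^...-/;
};

push_reader ($read_response);
push_reader (sub { push @log, "next reply: $_[0]" });

# feed lines to whatever reader is at the front of the queue
for my $line ("220-mail.example.net", "220-hey guys", "220 done", "250 ok") {
   my $reader = shift @queue or last;
   $reader->($line);
}

print "$_\n" for @log;
```

Had the continuation reader been appended instead, the second queued reader would have received the line C<220-hey guys>, which belongs to the first response.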
| 1145 |
|
| 1146 |
This is useful in the situation above: Just push your response-line read |
| 1147 |
request when sending the SMTP command, and when handling it, you look at |
| 1148 |
the line to see if more is to come, and C<unshift_read> another reader |
| 1149 |
callback if required, like this: |
| 1150 |
|
| 1151 |
my $response; # response lines end up in here |
| 1152 |
|
| 1153 |
my $read_response; $read_response = sub { |
| 1154 |
my ($handle, $line) = @_; |
| 1155 |
|
| 1156 |
$response .= "$line\n"; |
| 1157 |
|
| 1158 |
# check for continuation lines ("-" as 4th character) |
| 1159 |
if ($line =~ /^...-/) { |
| 1160 |
# if yes, then unshift another line read |
| 1161 |
$handle->unshift_read (line => $read_response); |
| 1162 |
|
| 1163 |
} else { |
| 1164 |
# otherwise we are done |
| 1165 |
|
| 1166 |
# free callback |
| 1167 |
undef $read_response; |
| 1168 |
|
| 1169 |
print "we are done reading: $response\n"; |
| 1170 |
} |
| 1171 |
}; |
| 1172 |
|
| 1173 |
$handle->push_read (line => $read_response); |
| 1174 |
|
| 1175 |
This recipe can be used for all similar parsing problems, for example in |
| 1176 |
NNTP, the response code to some commands indicates that more data will be |
| 1177 |
sent: |
| 1178 |
|
| 1179 |
$handle->push_write ("article 42\015\012"); |
| 1180 |
|
| 1181 |
# read response line |
| 1182 |
$handle->push_read (line => sub { |
| 1183 |
my ($handle, $status) = @_; |
| 1184 |
|
| 1185 |
# article data following? |
| 1186 |
if ($status =~ /^2/) { |
| 1187 |
# yes, read article body |
| 1188 |
|
| 1189 |
$handle->unshift_read (line => "\012.\015\012", sub { |
| 1190 |
my ($handle, $body) = @_; |
| 1191 |
|
| 1192 |
$finish->($status, $body); |
| 1193 |
}); |
| 1194 |
|
| 1195 |
} else { |
| 1196 |
# some error occurred, no article data |
| 1197 |
|
| 1198 |
$finish->($status); |
| 1199 |
} |
| 1200 |
}); |
| 1201 |
|
| 1202 |
=head3 Your own read queue handler |
| 1203 |
|
| 1204 |
Sometimes your protocol doesn't play nice, and uses lines or chunks of |
| 1205 |
data not formatted in a way handled out of the box by AnyEvent::Handle. |
| 1206 |
In this case you have to implement your own read parser. |
| 1207 |
|
| 1208 |
To make up a contorted example, imagine you are looking for an even |
| 1209 |
number of characters followed by a colon (":"). Also imagine that |
| 1210 |
AnyEvent::Handle has no C<regex> read type which could be used, so you'd |
| 1211 |
have to do it manually. |
| 1212 |
|
| 1213 |
To implement a read handler for this, you would C<push_read> (or |
| 1214 |
C<unshift_read>) a single code reference. |
| 1215 |
|
| 1216 |
This code reference will then be called each time there is (new) data |
| 1217 |
available in the read buffer, and is expected to either successfully |
| 1218 |
eat/consume some of that data (and return true) or to return false to |
| 1219 |
indicate that it wants to be called again. |
| 1220 |
|
| 1221 |
If the code reference returns true, then it will be removed from the |
| 1222 |
read queue (because it has parsed/consumed whatever it was supposed to |
| 1223 |
consume), otherwise it stays in the front of it. |
| 1224 |
|
| 1225 |
The example above could be coded like this: |
| 1226 |
|
| 1227 |
$handle->push_read (sub { |
| 1228 |
my ($handle) = @_; |
| 1229 |
|
| 1230 |
# check for even number of characters + ":" |
| 1231 |
# and remove the data if a match is found. |
| 1232 |
# if not, return false (actually nothing) |
| 1233 |
|
| 1234 |
$handle->{rbuf} =~ s/^( (?:..)* ) ://x |
| 1235 |
or return; |
| 1236 |
|
| 1237 |
# we got some data in $1, pass it to whoever wants it |
| 1238 |
$finish->($1); |
| 1239 |
|
| 1240 |
# and return true to indicate we are done |
| 1241 |
1 |
| 1242 |
}); |
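
The return-value protocol can be exercised without a handle object. In this sketch a plain scalar C<$rbuf> stands in for C<< $handle->{rbuf} >>, the array C<@queue> for the read queue, and the chunks fed into the buffer are made up.

```perl
use strict;
use warnings;

my $rbuf = ""; # stands in for $handle->{rbuf}
my @found;

# same logic as the read handler above, working on a plain scalar
my $parser = sub {
   $rbuf =~ s/^( (?:..)* ) ://x
      or return;

   push @found, $1;

   1
};

my @queue = ($parser);

for my $chunk ("ab", "c", "d:re", "st") {
   $rbuf .= $chunk;

   # call the front handler; drop it from the queue once it returns true
   shift @queue if @queue && $queue[0]->();
}

print "found: @found\n";
print "left in buffer: $rbuf\n";
```

The handler returns false for the first two chunks because no colon has arrived yet, consumes C<abcd:> on the third, and leaves the rest of the buffer for whoever comes next in the queue.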
| 1243 |
|
| 1244 |
|
| 1245 |
=head1 Debugging aids |
| 1246 |
|
| 1247 |
Now that you have seen how to use AnyEvent, here's what to use when you |
| 1248 |
don't use it correctly, or simply hit a bug somewhere and want to debug |
| 1249 |
it: |
| 1250 |
|
| 1251 |
=over 4 |
| 1252 |
|
| 1253 |
=item Enable strict argument checking during development |
| 1254 |
|
| 1255 |
AnyEvent does not, by default, do any argument checking. This can lead to |
| 1256 |
strange and unexpected results especially if you are just trying to find |
| 1257 |
your way with AnyEvent. |
| 1258 |
|
| 1259 |
AnyEvent supports a special "strict" mode - off by default - which does |
| 1260 |
very strict argument checking, at the expense of slowing down your |
| 1261 |
program. During development, however, this mode is very useful because it |
| 1262 |
quickly catches the most common errors. |
| 1263 |
|
| 1264 |
You can enable this strict mode either by having an environment variable |
| 1265 |
C<AE_STRICT> with a true value in your environment: |
| 1266 |
|
| 1267 |
AE_STRICT=1 perl myprog |
| 1268 |
|
| 1269 |
Or you can write C<use AnyEvent::Strict> in your program, which has the |
| 1270 |
same effect (do not do this in production, however). |
| 1271 |
|
| 1272 |
=item Increase verbosity, configure logging |
| 1273 |
|
| 1274 |
AnyEvent, by default, only logs critical messages. If something doesn't |
| 1275 |
work, maybe there was a warning about it that you didn't see because it |
| 1276 |
was suppressed. |
| 1277 |
|
| 1278 |
So during development it is recommended to push up the logging level to at |
| 1279 |
least warn level (C<5>): |
| 1280 |
|
| 1281 |
AE_VERBOSE=5 perl myprog |
| 1282 |
|
| 1283 |
Other levels that might be helpful are debug (C<8>) or even trace (C<9>). |
| 1284 |
|
| 1285 |
AnyEvent's logging is quite versatile - the L<AnyEvent::Log> manpage has |
| 1286 |
all the details. |
| 1287 |
|
| 1288 |
=item Watcher wrapping, tracing, the shell |
| 1289 |
|
| 1290 |
For even more debugging, you can enable watcher wrapping: |
| 1291 |
|
| 1292 |
AE_DEBUG_WRAP=2 perl myprog |
| 1293 |
|
| 1294 |
This will have the effect of wrapping every watcher into a special object |
| 1295 |
that stores a backtrace of when it was created, stores a backtrace |
| 1296 |
when an exception occurs during watcher execution, and stores a lot |
| 1297 |
of other information. If that slows down your program too much, then |
| 1298 |
C<AE_DEBUG_WRAP=1> avoids the costly backtraces. |
| 1299 |
|
| 1300 |
Here is an example of what kind of information is stored: |
| 1301 |
|
| 1302 |
59148536 DC::DB:472(Server::run)>io>DC::DB::Server::fh_read |
| 1303 |
type: io watcher |
| 1304 |
args: poll r fh GLOB(0x35283f0) |
| 1305 |
created: 2011-09-01 23:13:46.597336 +0200 (1314911626.59734) |
| 1306 |
file: ./blib/lib/Deliantra/Client/private/DC/DB.pm |
| 1307 |
line: 472 |
| 1308 |
subname: DC::DB::Server::run |
| 1309 |
context: |
| 1310 |
tracing: enabled |
| 1311 |
cb: CODE(0x2d1fb98) (DC::DB::Server::fh_read) |
| 1312 |
invoked: 0 times |
| 1313 |
created |
| 1314 |
(eval 25) line 6 AnyEvent::Debug::Wrap::__ANON__('AnyEvent','fh',GLOB(0x35283f0),'poll','r','cb',CODE(0x2d1fb98)=DC::DB::Server::fh_read) |
| 1315 |
DC::DB line 472 AE::io(GLOB(0x35283f0),'0',CODE(0x2d1fb98)=DC::DB::Server::fh_read) |
| 1316 |
bin/deliantra line 2776 DC::DB::Server::run() |
| 1317 |
bin/deliantra line 2941 main::main() |
| 1318 |
|
| 1319 |
There are many ways to get at this data - see the L<AnyEvent::Debug> and |
| 1320 |
L<AnyEvent::Log> manpages for more details. |
| 1321 |
|
| 1322 |
The most interesting and interactive way is to create a debug shell, for |
| 1323 |
example by setting C<AE_DEBUG_SHELL>: |
| 1324 |
|
| 1325 |
AE_DEBUG_WRAP=2 AE_DEBUG_SHELL=$HOME/myshell ./myprog |
| 1326 |
|
| 1327 |
# while myprog is running: |
| 1328 |
socat readline $HOME/myshell |
| 1329 |
|
| 1330 |
Note that anybody who can access F<$HOME/myshell> can make your program |
| 1331 |
do anything he or she wants, so if you are not the only user on your |
| 1332 |
machine, better put it into a secure location (F<$HOME> might not be |
| 1333 |
secure enough). |
| 1334 |
|
| 1335 |
If you don't have C<socat> (a shame!) and care even less about security, |
| 1336 |
you can also use TCP and C<telnet>: |
| 1337 |
|
| 1338 |
AE_DEBUG_WRAP=2 AE_DEBUG_SHELL=127.0.0.1:1234 ./myprog |
| 1339 |
|
| 1340 |
telnet 127.0.0.1 1234 |
| 1341 |
|
| 1342 |
The debug shell can enable and disable tracing of watcher invocations, |
| 1343 |
can display the trace output, give you a list of watchers and lets you |
| 1344 |
investigate watchers in detail. |
| 1345 |
|
| 1346 |
=back |
| 1347 |
|
| 1348 |
This concludes our little tutorial. |
| 1349 |
|
| 1350 |
|
| 1351 |
=head1 Where to go from here? |
| 1352 |
|
| 1353 |
This introduction should have explained the key concepts of L<AnyEvent> |
| 1354 |
- event watchers and condition variables, L<AnyEvent::Socket> - basic |
| 1355 |
networking utilities, and L<AnyEvent::Handle> - a nice wrapper around |
| 1356 |
sockets. |
| 1357 |
|
| 1358 |
You could either start coding stuff right away, look at those manual |
| 1359 |
pages for the gory details, or roam CPAN for other AnyEvent modules (such |
| 1360 |
as L<AnyEvent::IRC> or L<AnyEvent::HTTP>) to see more code examples (or |
| 1361 |
simply to use them). |
| 1362 |
|
| 1363 |
If you need a protocol that doesn't have an implementation using AnyEvent, |
| 1364 |
remember that you can mix AnyEvent with one other event framework, such as |
| 1365 |
L<POE>, so you can always use AnyEvent for your own tasks plus modules of |
| 1366 |
one other event framework to fill any gaps. |
| 1367 |
|
| 1368 |
And last but not least, you could also look at L<Coro>, especially |
| 1369 |
L<Coro::AnyEvent>, to see how you can turn event-based programming from |
| 1370 |
callback style back to the usual imperative style (also called "inversion |
| 1371 |
of control" - AnyEvent calls I<you>, but Coro lets I<you> call AnyEvent). |
| 1372 |
|
| 1373 |
=head1 Authors |
| 1374 |
|
| 1375 |
Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>. |
| 1376 |
|