--- AnyEvent/lib/AnyEvent.pm	2008/04/25 06:54:08	1.64
+++ AnyEvent/lib/AnyEvent.pm	2008/04/25 13:05:17	1.81
@@ -82,9 +82,9 @@
 During the first call of any watcher-creation method, the module tries
 to detect the currently loaded event loop by probing whether one of the
 following modules is already loaded: L<Coro::EV>, L<Coro::Event>, L<EV>,
-L<Event>, L<Glib>, L<Tk>, L<AnyEvent::Impl::Perl>, L<Event::Lib>, L<Qt>,
+L<Event>, L<Glib>, L<AnyEvent::Impl::Perl>, L<Tk>, L<Event::Lib>, L<Qt>,
 L<POE>. The first one found is used. If none are found, the module tries
-to load these modules (excluding Event::Lib, Qt and POE as the pure perl
+to load these modules (excluding Tk, Event::Lib, Qt and POE as the pure perl
 adaptor should always succeed) in the order given. The first one that can
 be successfully loaded will be used. If, after this, still none could be
 found, AnyEvent will fall back to a pure-perl event loop, which is not
@@ -138,7 +138,7 @@
 my variables are only visible after the statement in which they are
 declared.
 
-=head2 IO WATCHERS
+=head2 I/O WATCHERS
 
 You can create an I/O watcher by calling the C<< AnyEvent->io >> method
 with the following mandatory key-value pairs as arguments:
@@ -361,8 +361,8 @@
    AnyEvent::Impl::EV        based on EV (an interface to libev, best choice).
    AnyEvent::Impl::Event     based on Event, second best choice.
    AnyEvent::Impl::Glib      based on Glib, third-best choice.
-   AnyEvent::Impl::Tk        based on Tk, very bad choice.
    AnyEvent::Impl::Perl      pure-perl implementation, inefficient but portable.
+   AnyEvent::Impl::Tk        based on Tk, very bad choice.
    AnyEvent::Impl::Qt        based on Qt, cannot be autoprobed (see its docs).
    AnyEvent::Impl::EventLib  based on Event::Lib, leaks memory and worse.
    AnyEvent::Impl::POE       based on POE, not generic enough for full support.
@@ -708,7 +708,7 @@
 
 =head1 EXAMPLE PROGRAM
 
-The following program uses an IO watcher to read data from STDIN, a timer
+The following program uses an I/O watcher to read data from STDIN, a timer
 to display a message once per second, and a condition variable to quit the
 program when the user enters quit:
 
@@ -865,74 +865,122 @@
 
 =head1 BENCHMARK
 
-To give you an idea of the performance an doverheads that AnyEvent adds
-over the backends, here is a benchmark of various supported backends. The
-benchmark creates a lot of timers (with zero timeout) and io events
-(watching STDOUT, a pty, to become writable).
-
-Explanation of the fields:
-
-I<watcher> is the number of event watchers created/destroyed. Sicne
-different event models have vastly different performance each backend was
-handed a number of watchers so that overall runtime is acceptable and
-similar to all backends (and keep them from crashing).
-
-I<bytes> is the number of bytes (as measured by resident set size) used by
-each watcher.
-
-I<create> is the time, in microseconds, to create a single watcher.
-
-I<invoke> is the time, in microseconds, used to invoke a simple callback
-that simply counts down.
-
-I<destroy> is the time, in microseconds, to destroy a single watcher.
-
-          name watcher bytes create invoke destroy comment
-         EV/EV  400000   244   0.56   0.46    0.31 EV native interface
-        EV/Any  100000   610   3.52   0.91    0.75 
-    CoroEV/Any  100000   610   3.49   0.92    0.75 coroutines + Coro::Signal
-      Perl/Any   10000   654   4.64   1.22    0.77 pure perl implementation
-   Event/Event   10000   523  28.05  21.38    5.22 Event native interface
-     Event/Any   10000   943  34.43  20.48    1.39
-      Glib/Any   16000  1357  96.99  12.55   55.51 quadratic behaviour
-        Tk/Any    2000  1855  27.01  66.61   14.03 SEGV with >> 2000 watchers
-    POE/Select    2000  6343  94.69 807.65  562.69 POE::Loop::Select
-     POE/Event    2000  6644 108.15 768.19   14.33 POE::Loop::Event
-
-Discussion: The benchmark does I<not> bench scalability of the
-backend. For example a select-based backend (such as the pureperl one) can
-never compete with a backend using epoll. In this benchmark, only a single
-filehandle is used.
-
-EV is the sole leader regarding speed and memory use, which are both
-maximal/minimal. Even when going through AnyEvent, there is only one event
-loop that uses less memory (the Event module natively), and no faster
-event model.
+To give you an idea of the performance and overheads that AnyEvent adds
+over the event loops themselves (and to give you an impression of the
+speed of various event loops), here is a benchmark of various supported
+event models natively and with AnyEvent. The benchmark creates a lot of
+timers (with a zero timeout) and I/O watchers (watching STDOUT, a pty, to
+become writable, which it is), lets them fire exactly once and destroys
+them again.
+
+Rewriting the benchmark to use many different sockets instead of using
+the same filehandle for all I/O watchers results in a much longer runtime
+(socket creation is expensive), but qualitatively the same figures, so it
+was not used.
+
+=head2 Explanation of the columns
+
+I<watchers> is the number of event watchers created/destroyed. Since
+different event models differ vastly in performance, each event loop was
+given a number of watchers chosen so that the overall runtime is acceptable
+and similar across the tested event loops (and so that none of them
+crashes): Glib would probably take thousands of years if asked to process
+the same number of watchers as EV in this benchmark.
+
+I<bytes> is the number of bytes (as measured by the resident set size,
+RSS) consumed by each watcher. This method of measuring captures both C
+and Perl-based overheads.
+
+I<create> is the time, in microseconds (millionths of seconds), that it
+takes to create a single watcher. The callback is a closure shared between
+all watchers, to avoid adding memory overhead. That means closure creation
+and memory usage are not included in the figures.
+
+I<invoke> is the time, in microseconds, used to invoke a simple
+callback. The callback simply counts down a Perl variable and, after it
+has been invoked I<watchers> times, calls C<< ->broadcast >> on a condvar
+once to signal the end of this phase.
+
+I<destroy> is the time, in microseconds, that it takes to destroy a single
+watcher.
+
+=head2 Results
+
+          name watchers bytes create invoke destroy comment
+         EV/EV   400000   244   0.56   0.46    0.31 EV native interface
+        EV/Any   100000   610   3.52   0.91    0.75 EV + AnyEvent watchers
+    CoroEV/Any   100000   610   3.49   0.92    0.75 coroutines + Coro::Signal
+      Perl/Any   100000   513   4.91   0.92    1.15 pure perl implementation
+   Event/Event    16000   523  28.05  21.38    0.86 Event native interface
+     Event/Any    16000   943  34.43  20.48    1.39 Event + AnyEvent watchers
+      Glib/Any    16000  1357  96.99  12.55   55.51 quadratic behaviour
+        Tk/Any     2000  1855  27.01  66.61   14.03 SEGV with >> 2000 watchers
+     POE/Event     2000  6644 108.15 768.19   14.33 via POE::Loop::Event
+    POE/Select     2000  6343  94.69 807.65  562.69 via POE::Loop::Select
+
+=head2 Discussion
+
+The benchmark does I<not> measure scalability of the event loop very
+well. For example, a select-based event loop (such as the pure perl one)
+can never compete with an event loop that uses epoll when the number of
+file descriptors grows high. In this benchmark, all events become ready at
+the same time, so select/poll-based implementations get an unnatural speed
+boost.
+
+C<EV> is the sole leader regarding speed and memory use, which are the
+highest and lowest, respectively. Even when going through AnyEvent, there
+are only two event loops that use slightly less memory (the C<Event> module
+natively and the pure perl backend), and no faster event models, not even
+C<Event> natively.
 
 The pure perl implementation is hit in a few sweet spots (both the
 zero timeout and the use of a single fd hit optimisations in the perl
-interpreter and the backend itself), but it shows that it adds very little
-overhead in itself. Like any select-based backend it's performance becomes
-really bad with lots of file descriptors.
+interpreter and the backend itself, and all watchers become ready at the
+same time). Nevertheless, this shows that it adds very little overhead in
+itself. Like any select-based backend, its performance becomes really bad
+with lots of file descriptors (and few of them active), of course, but
+this was not the subject of this benchmark.
 
-The Event module has a relatively high setup and callback invocation cost,
+The C<Event> module has a relatively high setup and callback invocation cost,
 but overall scores on the third place.
 
-Glib has a little higher memory cost, a bit fster callback invocation and
-has a similar speed as Event.
+C<Glib>'s memory usage is quite a bit higher, but it features a
+faster callback invocation and overall ends up in the same class as
+C<Event>. However, Glib scales extremely badly: doubling the number of
+watchers increases the processing time by more than a factor of four,
+making it completely unusable when using larger numbers of watchers
+(note that only a single file descriptor was used in the benchmark, so
+inefficiencies of C<poll> do not account for this).
 
-The Tk backend works relatively well, the fact that it crashes with
+The C<Tk> adaptor works relatively well. The fact that it crashes with
 more than 2000 watchers is a big setback, however, as correctness takes
-precedence over speed.
+precedence over speed. Nevertheless, its performance is surprising, as the
+file descriptor is dup()ed for each watcher. This shows that the dup()
+employed by some adaptors is not a big performance issue (it does incur a
+hidden memory cost inside the kernel, though, that is not reflected in the
+figures above).
+
+C<POE>, regardless of the underlying event loop (whether using its pure perl
+select-based backend or the Event module), shows abysmal performance and
+memory usage: watchers use almost 30 times as much memory as EV watchers,
+and 10 times as much memory as either Event or EV via AnyEvent. Watcher
+invocation is almost 900 times slower than with AnyEvent's pure perl
+implementation. The design of the POE adaptor class in AnyEvent cannot
+really account for this, as session creation overhead is small compared
+to execution of the state machine, which is coded pretty optimally within
+L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.
+
+=head2 Summary
+
+Using EV through AnyEvent is faster than any other event loop, but most
+event loops have acceptable performance with or without AnyEvent.
+
+The overhead AnyEvent adds is usually much smaller than the overhead of
+the actual event loop; only with extremely fast event loops such as EV
+does AnyEvent add significant overhead.
 
-POE, regardless of backend (wether it's pure perl select backend or the
-Event backend) shows abysmal performance and memory usage: Watchers use
-almost 30 times as much memory as EV watchers, and 10 times as much memory
-as both Event or EV via AnyEvent.
-
-Summary: using EV through AnyEvent is faster than any other event
-loop. The overhead AnyEvent adds can be very small, and you should avoid
-POE like the plague if you want performance or reasonable memory usage.
+And you should simply avoid POE like the plague if you want performance or
+reasonable memory usage.
 
 
 =head1 FORK