--- AnyEvent/lib/AnyEvent.pm	2008/04/25 06:54:08	1.64
+++ AnyEvent/lib/AnyEvent.pm	2008/04/25 09:08:16	1.79
@@ -138,7 +138,7 @@
 my variables are only visible after the statement in which they are
 declared.
 
-=head2 IO WATCHERS
+=head2 I/O WATCHERS
 
 You can create an I/O watcher by calling the C<< AnyEvent->io >> method
 with the following mandatory key-value pairs as arguments:
@@ -708,7 +708,7 @@
 
 =head1 EXAMPLE PROGRAM
 
-The following program uses an IO watcher to read data from STDIN, a timer
+The following program uses an I/O watcher to read data from STDIN, a timer
 to display a message once per second, and a condition variable to quit
 the program when the user enters quit:
 
@@ -865,74 +865,122 @@
 =head1 BENCHMARK
 
-To give you an idea of the performance an doverheads that AnyEvent adds
-over the backends, here is a benchmark of various supported backends. The
-benchmark creates a lot of timers (with zero timeout) and io events
-(watching STDOUT, a pty, to become writable).
-
-Explanation of the fields:
-
-I<watcher> is the number of event watchers created/destroyed. Sicne
-different event models have vastly different performance each backend was
-handed a number of watchers so that overall runtime is acceptable and
-similar to all backends (and keep them from crashing).
-
-I<bytes> is the number of bytes (as measured by resident set size) used by
-each watcher.
-
-I<create> is the time, in microseconds, to create a single watcher.
-
-I<invoke> is the time, in microseconds, used to invoke a simple callback
-that simply counts down.
-
-I<destroy> is the time, in microseconds, to destroy a single watcher.
-
-   name         watcher bytes create invoke destroy comment
-   EV/EV         400000   244   0.56   0.46    0.31 EV native interface
-   EV/Any        100000   610   3.52   0.91    0.75
-   CoroEV/Any    100000   610   3.49   0.92    0.75 coroutines + Coro::Signal
-   Perl/Any       10000   654   4.64   1.22    0.77 pure perl implementation
-   Event/Event    10000   523  28.05  21.38    5.22 Event native interface
-   Event/Any      10000   943  34.43  20.48    1.39
-   Glib/Any       16000  1357  96.99  12.55   55.51 quadratic behaviour
-   Tk/Any          2000  1855  27.01  66.61   14.03 SEGV with >> 2000 watchers
-   POE/Select      2000  6343  94.69 807.65  562.69 POE::Loop::Select
-   POE/Event       2000  6644 108.15 768.19   14.33 POE::Loop::Event
-
-Discussion: The benchmark does I<not> bench scalability of the
-backend. For example a select-based backend (such as the pureperl one) can
-never compete with a backend using epoll. In this benchmark, only a single
-filehandle is used.
-
-EV is the sole leader regarding speed and memory use, which are both
-maximal/minimal. Even when going through AnyEvent, there is only one event
-loop that uses less memory (the Event module natively), and no faster
-event model.
+To give you an idea of the performance and overheads that AnyEvent adds
+over the event loops themselves (and to give you an impression of the
+speed of various event loops), here is a benchmark of various supported
+event models natively and with AnyEvent. The benchmark creates a lot of
+timers (with a zero timeout) and I/O watchers (watching STDOUT, a pty, to
+become writable, which it is), lets them fire exactly once and destroys
+them again.
+
+Rewriting the benchmark to use many different sockets instead of using
+the same filehandle for all I/O watchers results in a much longer runtime
+(socket creation is expensive), but yields qualitatively the same figures,
+so it was not used.
+
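+The following sketch is I<not> the actual benchmark program (variable
+names are made up for this illustration, and the real benchmark also
+creates I/O watchers on STDOUT and times each phase separately), but it
+shows roughly how each run is structured:
+
+   use AnyEvent;
+
+   my $count   = 10_000;          # scaled per event loop in the real runs
+   my $pending = $count;
+   my $done    = AnyEvent->condvar;
+
+   # one closure shared by all watchers, so closure creation and its
+   # memory use do not show up in the per-watcher figures
+   my $cb = sub {
+      $done->broadcast unless --$pending;
+   };
+
+   # create phase: a zero-timeout timer per watcher
+   my @watchers = map { AnyEvent->timer (after => 0, cb => $cb) } 1 .. $count;
+
+   # invoke phase: run the event loop until every watcher has fired once
+   $done->wait;
+
+   # destroy phase: drop all watchers again
+   @watchers = ();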
+
+=head2 Explanation of the columns
+
+I<watcher> is the number of event watchers created/destroyed. Since
+different event models feature vastly different performances, each event
+loop was given a number of watchers so that overall runtime is acceptable
+and similar between the tested event loops (and to keep them from
+crashing): Glib would probably take thousands of years if asked to
+process the same number of watchers as EV in this benchmark.
+
+I<bytes> is the number of bytes (as measured by the resident set size,
+RSS) consumed by each watcher. This method of measuring captures both C
+and Perl-based overheads.
+
+I<create> is the time, in microseconds (millionths of seconds), that it
+takes to create a single watcher. The callback is a closure shared between
+all watchers, to avoid adding memory overhead. That means closure creation
+and memory usage are not included in the figures.
+
+I<invoke> is the time, in microseconds, used to invoke a simple
+callback. The callback simply counts down a Perl variable and, after it
+has been invoked "watcher" times, it will C<< ->broadcast >> a condvar
+once to signal the end of this phase.
+
+I<destroy> is the time, in microseconds, that it takes to destroy a single
+watcher.
+
+=head2 Results
+
+   name        watchers bytes create invoke destroy comment
+   EV/EV         400000   244   0.56   0.46    0.31 EV native interface
+   EV/Any        100000   610   3.52   0.91    0.75 EV + AnyEvent watchers
+   CoroEV/Any    100000   610   3.49   0.92    0.75 coroutines + Coro::Signal
+   Perl/Any      100000   513   4.91   0.92    1.15 pure perl implementation
+   Event/Event    16000   523  28.05  21.38    0.86 Event native interface
+   Event/Any      16000   943  34.43  20.48    1.39 Event + AnyEvent watchers
+   Glib/Any       16000  1357  96.99  12.55   55.51 quadratic behaviour
+   Tk/Any          2000  1855  27.01  66.61   14.03 SEGV with >> 2000 watchers
+   POE/Event       2000  6644 108.15 768.19   14.33 via POE::Loop::Event
+   POE/Select      2000  6343  94.69 807.65  562.69 via POE::Loop::Select
+
+=head2 Discussion
+
+The benchmark does I<not> measure scalability of the event loop very
+well. For example, a select-based event loop (such as the pure perl one)
+can never compete with an event loop that uses epoll when the number of
+file descriptors grows high. In this benchmark, only a single filehandle
+is used (although some of the AnyEvent adaptors dup() its file descriptor
+to work around bugs).
+
+C<EV> is the sole leader regarding speed and memory use, which are both
+maximal/minimal, respectively. Even when going through AnyEvent, there are
+only two event loops that use slightly less memory (the C<Event> module
+natively and the pure perl backend), and no faster event models, not even
+C<Event> natively.
 
 The pure perl implementation is hit in a few sweet spots (both the zero
 timeout and the use of a single fd hit optimisations in the perl
-interpreter and the backend itself), but it shows that it adds very little
-overhead in itself. Like any select-based backend it's performance becomes
-really bad with lots of file descriptors.
+interpreter and the backend itself, and all watchers become ready at the
+same time). Nevertheless, this shows that it adds very little overhead in
+itself. Like any select-based backend its performance becomes really bad
+with lots of file descriptors (and few of them active), of course, but
+this was not the subject of this benchmark.
 
-The Event module has a relatively high setup and callback invocation cost,
+The C<Event> module has a relatively high setup and callback invocation cost,
 but overall comes in third place.
 
-Glib has a little higher memory cost, a bit fster callback invocation and
-has a similar speed as Event.
+C<Glib>'s memory usage is quite a bit higher, but it features a
+faster callback invocation and overall ends up in the same class as
+C<Event>. However, Glib scales extremely badly: doubling the number of
+watchers increases the processing time by more than a factor of four,
+making it completely unusable when using larger numbers of watchers
+(note that only a single file descriptor was used in the benchmark, so
+inefficiencies of C<poll> do not account for this).
 
-The Tk backend works relatively well, the fact that it crashes with
+The C<Tk> adaptor works relatively well. The fact that it crashes with
 more than 2000 watchers is a big setback, however, as correctness takes
-precedence over speed.
+precedence over speed. Nevertheless, its performance is surprising, as the
+file descriptor is dup()ed for each watcher. This shows that the dup()
+employed by some adaptors is not a big performance issue (it does incur a
+hidden memory cost inside the kernel, though, that is not reflected in the
+figures above).
+
+C<POE>, regardless of underlying event loop (whether using its pure perl
+select-based backend or the Event module) shows abysmal performance and
+memory usage: Watchers use almost 30 times as much memory as EV watchers,
+and 10 times as much memory as either Event or EV via AnyEvent. Watcher
+invocation is almost 900 times slower than with AnyEvent's pure perl
+implementation. The design of the POE adaptor class in AnyEvent cannot
+really account for this, as session creation overhead is small compared
+to execution of the state machine, which is coded pretty optimally within
+L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.
+
+=head2 Summary
+
+Using EV through AnyEvent is faster than any other event loop, but most
+event loops have acceptable performance with or without AnyEvent.
+
+The overhead AnyEvent adds is usually much smaller than the overhead of
+the actual event loop; only with extremely fast event loops such as EV
+does AnyEvent add significant overhead.
-POE, regardless of backend (wether it's pure perl select backend or the
-Event backend) shows abysmal performance and memory usage: Watchers use
-almost 30 times as much memory as EV watchers, and 10 times as much memory
-as both Event or EV via AnyEvent.
-
-Summary: using EV through AnyEvent is faster than any other event
-loop. The overhead AnyEvent adds can be very small, and you should avoid
-POE like the plague if you want performance or reasonable memory usage.
+And you should simply avoid POE like the plague if you want performance or
+reasonable memory usage.
 
 =head1 FORK