--- AnyEvent/lib/AnyEvent.pm	2008/04/25 02:03:18	1.62
+++ AnyEvent/lib/AnyEvent.pm	2008/04/25 07:43:25	1.72
@@ -436,7 +436,7 @@
 
 use Carp;
 
-our $VERSION = '3.2';
+our $VERSION = '3.3';
 
 our $MODEL;
 our $AUTOLOAD;
@@ -862,6 +862,114 @@
 
    $quit->wait;
 
+
+=head1 BENCHMARK
+
+To give you an idea of the performance and overheads that AnyEvent adds
+over the event loops themselves (and to give you an impression of the
+speed of various event loops), here is a benchmark of various supported
+event models, natively and with AnyEvent. The benchmark creates a lot of
+timers (with a zero timeout) and I/O watchers (watching STDOUT, a pty, to
+become writable, which it is), lets them fire exactly once and destroys
+them again.
+
+=head2 Explanation of the columns
+
+I<watcher> is the number of event watchers created/destroyed. Since
+different event models feature vastly different performance, each event
+loop was given a number of watchers so that overall runtime is acceptable
+and similar between the tested event loops (and to keep them from
+crashing): Glib would probably take thousands of years if asked to
+process the same number of watchers as EV in this benchmark.
+
+I<bytes> is the number of bytes (as measured by the resident set size,
+RSS) consumed by each watcher. This method of measuring captures both C-
+and Perl-based overheads.
+
+I<create> is the time, in microseconds (millionths of seconds), that it
+takes to create a single watcher. The callback is a closure shared between
+all watchers, to avoid adding memory overhead. That means closure creation
+and memory usage are not included in the figures.
+
+I<invoke> is the time, in microseconds, used to invoke a simple
+callback. The callback simply counts down a Perl variable and, after it
+has been invoked "watcher" times, C<< ->broadcast >>s a condvar once to
+signal the end of this phase.
+
+I<destroy> is the time, in microseconds, that it takes to destroy a single
+watcher.
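+
+To give a rough idea of what is being measured, here is a minimal sketch
+of such a benchmark loop using the portable AnyEvent API. This is not the
+actual benchmark program; the watcher count and the even split between
+timers and I/O watchers are illustrative assumptions only.
+
+   use AnyEvent;
+
+   my $count = 10000;                # watchers to create (illustrative)
+   my $left  = $count;               # invocations still expected
+   my $done  = AnyEvent->condvar;
+
+   # a single callback closure is shared between all watchers
+   my $cb = sub {
+      $done->broadcast unless --$left;
+   };
+
+   # create phase: zero-timeout timers and write watchers on STDOUT
+   my @watchers;
+   push @watchers, AnyEvent->timer (after => 0, cb => $cb)
+      for 1 .. $count / 2;
+   push @watchers, AnyEvent->io (fh => \*STDOUT, poll => "w", cb => $cb)
+      for 1 .. $count / 2;
+
+   # invoke phase: run the event loop until the countdown reaches zero
+   $done->wait;
+
+   # destroy phase: dropping the references destroys the watchers
+   @watchers = ();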
+
+=head2 Results
+
+          name watcher  bytes create invoke destroy comment
+         EV/EV  400000    244   0.56   0.46    0.31 EV native interface
+        EV/Any  100000    610   3.52   0.91    0.75 EV + AnyEvent watchers
+    CoroEV/Any  100000    610   3.49   0.92    0.75 coroutines + Coro::Signal
+      Perl/Any   16000    654   4.64   1.22    0.77 pure perl implementation
+   Event/Event   16000    523  28.05  21.38    0.86 Event native interface
+     Event/Any   16000    943  34.43  20.48    1.39 Event + AnyEvent watchers
+      Glib/Any   16000   1357  96.99  12.55   55.51 quadratic behaviour
+        Tk/Any    2000   1855  27.01  66.61   14.03 SEGV with >> 2000 watchers
+     POE/Event    2000   6644 108.15 768.19   14.33 via POE::Loop::Event
+    POE/Select    2000   6343  94.69 807.65  562.69 via POE::Loop::Select
+
+=head2 Discussion
+
+The benchmark does I<not> measure scalability of the event loop very
+well. For example, a select-based event loop (such as the pure perl one)
+can never compete with an event loop that uses epoll when the number of
+file descriptors grows high. In this benchmark, only a single filehandle
+is used (although some of the AnyEvent adaptors dup() its file descriptor
+to work around bugs).
+
+C<EV> is the sole leader regarding speed and memory use, which are both
+maximal/minimal, respectively. Even when going through AnyEvent, there is
+only one event loop that uses less memory (the C<Event> module natively),
+and no faster event model, not even C<Event> natively.
+
+The pure perl implementation is hit in a few sweet spots (both the
+zero timeout and the use of a single fd hit optimisations in the perl
+interpreter and the backend itself). Nevertheless this shows that it
+adds very little overhead in itself. Like any select-based backend its
+performance becomes really bad with lots of file descriptors, of course,
+but this was not the subject of this benchmark.
+
+The C<Event> module has a relatively high setup and callback invocation
+cost, but overall comes in third place.
+
+C<Glib>'s memory usage is quite a bit higher, but it features a faster
+callback invocation and overall lands in the same class as C<Event>.
+
+The C<Tk> adaptor works relatively well. The fact that it crashes with
+more than 2000 watchers is a big setback, however, as correctness takes
+precedence over speed. Nevertheless, its performance is surprisingly good,
+given that the file descriptor is dup()ed for each watcher. This shows
+that the dup() employed by some adaptors is not a big performance issue
+(it does incur a hidden memory cost inside the kernel, though).
+
+C<POE>, regardless of the underlying event loop (whether using its pure
+perl select-based backend or the Event module), shows abysmal performance
+and memory usage: watchers use almost 30 times as much memory as EV
+watchers, and roughly 10 times as much memory as Event or EV via AnyEvent.
+Watcher invocation is almost 700 times slower than with AnyEvent's pure
+perl implementation. The design of the POE adaptor class in AnyEvent
+cannot really account for this, as session creation overhead is small
+compared to execution of the state machine, which is coded pretty
+optimally within L<AnyEvent::Impl::POE>. POE simply seems to be abysmally
+slow.
+
+=head2 Summary
+
+Using EV through AnyEvent is faster than any other event loop, but most
+event loops have acceptable performance with or without AnyEvent.
+
+The overhead AnyEvent adds is usually much smaller than the overhead of
+the actual event loop; only with extremely fast event loops such as EV
+does AnyEvent add significant overhead.
+
+And you should simply avoid POE like the plague if you want performance or
+reasonable memory usage.
+
+
 =head1 FORK
 
 Most event libraries are not fork-safe. The ones who are usually are
@@ -870,6 +978,7 @@
 If you have to fork, you must either do so I<before> creating your first
 watcher OR you must not use AnyEvent at all in the child.
 
+
 =head1 SECURITY CONSIDERATIONS
 
 AnyEvent can be forced to load any event model via
@@ -886,6 +995,7 @@
 
    use AnyEvent;
 
+
 =head1 SEE ALSO
 
 Event modules: L, L, L, L,
@@ -899,6 +1009,7 @@
 
 Nontrivial usage examples: L, L.
 
+
 =head1 AUTHOR
 
    Marc Lehmann