=head1 BENCHMARK

To give you an idea of the performance and overheads that AnyEvent adds
over the event loops directly, here is a benchmark of various supported
event models natively and with AnyEvent. The benchmark creates a lot of
timers (with a zero timeout) and I/O watchers (watching STDOUT, a pty, to
become writable, which it is), lets them fire exactly once and destroys
them again.

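Schematically, one benchmark round could look like the following sketch (a
simplified illustration of the method, not the actual benchmark code; the
watcher count C<$n> and the setup details are made up for this example):

```perl
use AnyEvent;

my $n    = 1000;                 # watchers per type; the real benchmark varies this
my $left = 2 * $n;               # expect one invocation per watcher
my $cv   = AnyEvent->condvar;

# a single closure shared between all watchers (see I<create> below)
my $cb = sub { $cv->broadcast unless --$left };

# create phase: timers with a zero timeout and write watchers on STDOUT
my @w = (
   (map { AnyEvent->timer (after => 0, cb => $cb) } 1 .. $n),
   (map { AnyEvent->io (fh => \*STDOUT, poll => "w", cb => $cb) } 1 .. $n),
);

$cv->wait;                       # invoke phase: wait until every watcher has fired
@w = ();                         # destroy phase: drop all watchers again
```
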
=head2 Explanation of the columns

I<watcher> is the number of event watchers created/destroyed. Since
different event models feature vastly different performance, each event
loop was given a number of watchers so that overall runtime is acceptable
and similar between the tested event loops (and to keep them from
crashing): Glib would probably take thousands of years if asked to process
the same number of watchers as EV in this benchmark.

I<bytes> is the number of bytes (as measured by the resident set size,
RSS) consumed by each watcher. This method of measuring captures both C-
and Perl-based overheads.

I<create> is the time, in microseconds (millionths of seconds), that it
takes to create a single watcher. The callback is a closure shared between
all watchers, to avoid adding memory overhead. That means closure creation
and memory usage are not included in the figures.

I<invoke> is the time, in microseconds, used to invoke a simple
callback. The callback simply counts down a Perl variable and, after it
has been invoked "watcher" times, C<< ->broadcast >>s a condvar once to
signal the end of this phase.

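The countdown used in the invoke phase can be sketched as follows (a
simplified stand-in: here the callback is called directly rather than by
an event loop, and the watcher count is made up):

```perl
use AnyEvent;

my $watcher = 5;                 # e.g. 400000 for EV in the table below
my $left    = $watcher;
my $cv      = AnyEvent->condvar;

# counts down and broadcasts the condvar exactly once, on the last call
my $cb = sub { $cv->broadcast unless --$left };

$cb->() for 1 .. $watcher;       # stand-in for the event loop invoking it
$cv->wait;                       # returns immediately, the condvar was broadcast
```
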
I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.

=head2 Results

   name        watcher  bytes  create  invoke  destroy  comment
   EV/EV        400000    244    0.56    0.46     0.31  EV native interface
   EV/Any       100000    610    3.52    0.91     0.75
   CoroEV/Any   100000    610    3.49    0.92     0.75  coroutines + Coro::Signal
   ...
   Glib/Any      16000   1357   96.99   12.55    55.51  quadratic behaviour
   Tk/Any         2000   1855   27.01   66.61    14.03  SEGV with >> 2000 watchers
   POE/Select     2000   6343   94.69  807.65   562.69  POE::Loop::Select
   POE/Event      2000   6644  108.15  768.19    14.33  POE::Loop::Event

=head2 Discussion

The benchmark does I<not> measure scalability of the event loops very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, only a single filehandle
is used (although some of the AnyEvent adaptors dup() its file descriptor
to work around bugs).

C<EV> is the sole leader regarding speed and memory use, which are
maximal and minimal, respectively. Even when going through AnyEvent, there
is only one event loop that uses less memory (the C<Event> module
natively), and no faster event model, not even C<Event> natively.

The pure perl implementation is hit in a few sweet spots (both the
zero timeout and the use of a single fd hit optimisations in the perl
interpreter and the backend itself). Nevertheless, this shows that it
adds very little overhead in itself. Like any select-based backend, its
performance becomes really bad with lots of file descriptors, of course,
but that was not the subject of this benchmark.

The C<Event> module has a relatively high setup and callback invocation
cost, but overall scores in third place.

C<Glib>'s memory usage is quite a bit higher, but it features faster
callback invocation and overall lands in the same class as C<Event>.

The C<Tk> adaptor works relatively well; the fact that it crashes with
more than 2000 watchers is a big setback, however, as correctness takes
precedence over speed. Nevertheless, its performance is surprising, as the
file descriptor is dup()ed for each watcher. This shows that the dup()
employed by some adaptors is not a big performance issue (it does incur a
hidden memory cost inside the kernel, though).

C<POE>, regardless of backend (whether using its pure perl select-based
backend or the Event backend), shows abysmal performance and memory
usage: watchers use almost 30 times as much memory as EV watchers, and 10
times as much memory as either Event or EV via AnyEvent. Watcher
invocation is almost 700 times slower than with AnyEvent's pure perl
implementation.

Summary: using EV through AnyEvent is faster than any other event
loop. The overhead AnyEvent adds can be very small, and you should avoid
POE like the plague if you want performance or reasonable memory usage.

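To benefit from this in practice, it is enough to load C<EV> before
AnyEvent, as AnyEvent prefers an event module that is already loaded (a
sketch assuming both C<EV> and C<AnyEvent> are installed):

```perl
use EV;         # load EV first, so AnyEvent detects and uses it
use AnyEvent;   # will now transparently use the EV backend

# all AnyEvent watchers are now backed by EV
my $cv = AnyEvent->condvar;
my $t  = AnyEvent->timer (after => 0, cb => sub { $cv->broadcast });
$cv->wait;
```
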