
Comparing AnyEvent/lib/AnyEvent.pm (file contents):
Revision 1.90 by root, Fri Apr 25 14:24:29 2008 UTC vs.
Revision 1.91 by root, Sat Apr 26 02:27:30 2008 UTC

   });

   $quit->wait;


=head1 BENCHMARKS

To give you an idea of the performance and overheads that AnyEvent adds
over the event loops themselves, and to give you an impression of the
speed of various event loops, I prepared some benchmarks.

=head2 BENCHMARKING ANYEVENT OVERHEAD

Here is a benchmark of various supported event models used natively and
through AnyEvent. The benchmark creates a lot of timers (with a zero
timeout) and I/O watchers (watching STDOUT, a pty, to become writable,
which it is), lets them fire exactly once and destroys them again.
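
For illustration, here is a minimal sketch of the kind of
create/fire/destroy cycle being timed. This is I<not> the actual
F<eg/bench> code; the watcher count and the bookkeeping are made up for
the example:

   use AnyEvent;

   my $cv    = AnyEvent->condvar;
   my $count = 1000;                # made-up watcher count for the example
   my $left  = 2 * $count;          # events still expected to fire
   my @w;

   for my $i (0 .. $count - 1) {
      my ($ti, $ii) = (2 * $i, 2 * $i + 1);

      # zero-timeout timer: becomes ready on the next loop iteration
      $w[$ti] = AnyEvent->timer (after => 0, cb => sub {
         undef $w[$ti];             # destroy the watcher after its one event
         $cv->broadcast unless --$left;
      });

      # STDOUT is always writable here, so this watcher also fires once
      $w[$ii] = AnyEvent->io (fh => \*STDOUT, poll => "w", cb => sub {
         undef $w[$ii];
         $cv->broadcast unless --$left;
      });
   }

   $cv->wait;                       # run the event loop until all have fired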

Source code for this benchmark is found as F<eg/bench> in the AnyEvent
distribution.

=head3 Explanation of the columns

I<watcher> is the number of event watchers created/destroyed. Since
different event models feature vastly different performances, each event
loop was given a number of watchers so that overall runtime is acceptable
and similar between the tested event loops (and to keep them from
crashing): Glib

signal the end of this phase.

I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.

=head3 Results

   name       watchers bytes create invoke destroy comment
   EV/EV        400000   244   0.56   0.46    0.31 EV native interface
   EV/Any       100000   244   2.50   0.46    0.29 EV + AnyEvent watchers
   CoroEV/Any   100000   244   2.49   0.44    0.29 coroutines + Coro::Signal
   Glib/Any      16000  1357  98.22  12.41   54.00 quadratic behaviour
   Tk/Any         2000  1860  26.97  67.98   14.00 SEGV with >> 2000 watchers
   POE/Event      2000  6644 108.64 736.02   14.73 via POE::Loop::Event
   POE/Select     2000  6343  94.13 809.12  565.96 via POE::Loop::Select

=head3 Discussion

The benchmark does I<not> measure scalability of the event loop very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, all events become ready at

implementation. The design of the POE adaptor class in AnyEvent can not
really account for this, as session creation overhead is small compared
to execution of the state machine, which is coded pretty optimally within
L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.

=head3 Summary

=over 4

=item * Using EV through AnyEvent is faster than any other event loop
(even when used without AnyEvent), but most event loops have acceptable

the actual event loop; only with extremely fast event loops such as EV
does AnyEvent add significant overhead.

=item * You should avoid POE like the plague if you want performance or
reasonable memory usage.

=back

=head2 BENCHMARKING THE LARGE SERVER CASE

This benchmark actually benchmarks the event loop itself. It works by
creating a number of "servers": each server consists of a socketpair, a
timeout watcher that gets reset on activity (but never fires), and an I/O
watcher waiting for input on one side of the socket. Each time the socket
watcher reads a byte it will write that byte to a random other "server".

The effect is that there will be a lot of I/O watchers, only part of which
are active at any one point (so there is a constant number of active
fds for each loop iteration, but which fds these are is random). The
timeout is reset each time something is read because that reflects how
most timeouts work (and puts extra pressure on the event loops).

In this benchmark, we use 10000 socketpairs (20000 sockets), of which 100
(1%) are active. This mirrors the activity of large servers with many
connections, most of which are idle at any one point in time.
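
As a rough sketch (this is I<not> the actual F<eg/bench2> code, and the
300 second timeout is a value made up for the example), each "server"
could be set up like this:

   use AnyEvent;
   use Socket;

   my @servers;

   for (1 .. 10000) {
      socketpair my $rfh, my $wfh, AF_UNIX, SOCK_STREAM, PF_UNSPEC
         or die "socketpair: $!";

      my $server = { rfh => $rfh, wfh => $wfh };

      # timeout watcher: reset on every read, so it never actually fires
      $server->{timer} = AnyEvent->timer (after => 300, cb => sub { });

      # I/O watcher: wait for input on the read side of the socketpair
      $server->{io} = AnyEvent->io (fh => $rfh, poll => "r", cb => sub {
         handle_request ($server);  # see the request sketch below
      });

      push @servers, $server;
   }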

Source code for this benchmark is found as F<eg/bench2> in the AnyEvent
distribution.

=head3 Explanation of the columns

I<sockets> is the number of sockets, and twice the number of "servers" (as
each server has a read and write socket end).

I<create> is the time it takes to create a socketpair (which is
nontrivial) and two watchers: an I/O watcher and a timeout watcher.

I<request>, the most important value, is the time it takes to handle a
single "request", that is, reading the token from the pipe and forwarding
it to another server. This includes deleting the old timeout watcher and
creating a new one with a later timeout.
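
A hypothetical C<handle_request> matching the sketch above shows what a
single "request" involves; assigning a new timer to the hash slot
implicitly destroys the old watcher:

   sub handle_request {
      my ($server) = @_;

      # read the token (a single byte) from this server's pipe
      sysread $server->{rfh}, my $buf, 1
         or die "unexpected end of file";

      # forward it to a random "server" (possibly itself; good
      # enough for a sketch)
      my $other = $servers[int rand @servers];
      syswrite $other->{wfh}, $buf;

      # delete the old timeout watcher and create a new, later one
      $server->{timer} = AnyEvent->timer (after => 300, cb => sub { });
   }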

=head3 Results

   name    sockets create  request
   EV        20000   69.01    11.16
   Perl      20000   75.28   112.76
   Event     20000  212.62   257.32
   Glib      20000  651.16  1896.30
   POE       20000  349.67 12317.24 uses POE::Loop::Event

=head3 Discussion

This benchmark I<does> measure scalability and overall performance of the
particular event loop.

EV is again fastest. Since it is using epoll on my system, the setup time
is relatively high, though.

Perl surprisingly comes second. It is much faster than the C-based event
loops Event and Glib.

Event suffers from high setup time as well (look at its code and you will
understand why). Callback invocation also has a high overhead compared to
the C<< $_->() for .. >>-style loop that the Perl event loop uses. Event
uses select or poll in basically all documented configurations.
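
The dispatch style referred to here can be sketched as follows, where
C<@pending> stands for a hypothetical list of callbacks whose watchers
became ready in the current loop iteration:

   # invoke every ready callback with a single tight statement
   $_->() for @pending;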

Glib is hit hard by its quadratic behaviour w.r.t. many watchers. It
clearly fails to perform with many filehandles or in busy servers.

POE is still completely out of the picture, taking over 1000 times as long
as EV, and over 100 times as long as the Perl implementation, even though
it uses a C-based event loop in this case.

=head3 Summary

=over 4

=item * The pure perl implementation performs extremely well, considering
that it uses select.

=item * Avoid Glib or POE in large projects where performance matters.

=back

=head2 BENCHMARKING SMALL SERVERS

While event loops should scale (and select-based ones do not...) even to
large servers, most programs we (or I :) actually write have only a few
I/O watchers.

In this benchmark, I use the same benchmark program as in the large server
case, but it uses only eight "servers", of which three are active at any
one time. This should reflect performance for a small server relatively
well.

The columns are identical to the previous table.

=head3 Results

   name    sockets create request
   EV           16   20.00    6.54
   Event        16   81.27   35.86
   Glib         16   32.63   15.48
   Perl         16   24.62  162.37
   POE          16  261.87  276.28 uses POE::Loop::Event

=head3 Discussion

The benchmark tries to test the performance of a typical small
server. While knowing how various event loops perform is interesting, keep
in mind that their overhead in this case is usually not as important, due
to the small absolute number of watchers.

EV is again fastest.

The C-based event loops Event and Glib come in second this time, as the
overhead of running an iteration is much smaller in C than in Perl (there
is little code to execute in the inner loop, while perl's function calling
overhead is high and updating all the data structures is costly).

The pure perl event loop is much slower, but still competitive.

POE also performs much better in this case, but it is still far behind the
others.

=head3 Summary

=over 4

=item * C-based event loops perform very well with a small number of
watchers, as the management overhead dominates.

=back


=head1 FORK
