   …
   });

   $quit->wait;


=head1 BENCHMARKS
To give you an idea of the performance and overheads that AnyEvent adds
over the event loops themselves and to give you an impression of the speed
of various event loops, I prepared some benchmarks.

=head2 BENCHMARKING ANYEVENT OVERHEAD

Here is a benchmark of various supported event models used natively and
through AnyEvent. The benchmark creates a lot of timers (with a zero
timeout) and I/O watchers (watching STDOUT, a pty, to become writable,
which it is), lets them fire exactly once and destroys them again.

Source code for this benchmark is found as F<eg/bench> in the AnyEvent
distribution.

Rewriting the benchmark to use many different sockets instead of using
the same filehandle for all I/O watchers results in a much longer runtime
(socket creation is expensive), but qualitatively the same figures, so it
was not used.

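The following is I<not> the actual F<eg/bench> code, only a minimal sketch
of what that benchmark measures, with the watcher count scaled down, no
per-phase timing, and using the current condvar API; the counts and
variable names are made up for illustration:

```perl
use AnyEvent;

# Create $n zero-timeout timers and $n I/O watchers on STDOUT (which is
# always writable), let each fire exactly once, then destroy them.
my $n    = 1000;                      # eg/bench uses far higher counts
my $left = 2 * $n;                    # pending watcher invocations
my $done = AnyEvent->condvar;

my %w;
for my $i (1 .. $n) {
   $w{"t$i"} = AnyEvent->timer (after => 0, cb => sub {
      delete $w{"t$i"};               # fire exactly once, then self-destroy
      $done->send unless --$left;
   });

   $w{"i$i"} = AnyEvent->io (fh => \*STDOUT, poll => "w", cb => sub {
      delete $w{"i$i"};
      $done->send unless --$left;
   });
}

$done->wait;                          # run the loop until all have fired
%w = ();                              # destroy phase: drop the references
```

The real benchmark additionally measures memory use per watcher and times
the create, invoke and destroy phases separately.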
=head3 Explanation of the columns

I<watcher> is the number of event watchers created/destroyed. Since
different event models feature vastly different performances, each event
loop was given a number of watchers so that overall runtime is acceptable
and similar between tested event loops (and to keep them from crashing): Glib
…
signal the end of this phase.

I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.

=head3 Results

   name       watchers bytes create invoke destroy comment
   EV/EV        400000   244   0.56   0.46    0.31 EV native interface
   EV/Any       100000   244   2.50   0.46    0.29 EV + AnyEvent watchers
   CoroEV/Any   100000   244   2.49   0.44    0.29 coroutines + Coro::Signal
   …
   Glib/Any      16000  1357  98.22  12.41   54.00 quadratic behaviour
   Tk/Any         2000  1860  26.97  67.98   14.00 SEGV with >> 2000 watchers
   POE/Event      2000  6644 108.64 736.02   14.73 via POE::Loop::Event
   POE/Select     2000  6343  94.13 809.12  565.96 via POE::Loop::Select

=head3 Discussion

The benchmark does I<not> measure scalability of the event loop very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, all events become ready at
the same time, so select/poll-based implementations get an unnatural speed
boost.

Also, note that the number of watchers usually has a nonlinear effect on
overall speed, that is, creating twice as many watchers doesn't take twice
the time - usually it takes longer. This puts event loops tested with a
higher number of watchers at a disadvantage.

C<EV> is the sole leader regarding both speed and memory use. Even when
going through AnyEvent, it uses far less memory than any other event loop
and is still faster than Event natively.
…
interpreter and the backend itself). Nevertheless this shows that it
adds very little overhead in itself. Like any select-based backend its
performance becomes really bad with lots of file descriptors (and few of
them active), of course, but this was not the subject of this benchmark.

The C<Event> module has a relatively high setup and callback invocation
cost, but overall comes in third place.

C<Glib>'s memory usage is quite a bit higher, but it features a
faster callback invocation and overall ends up in the same class as
C<Event>. However, Glib scales extremely badly: doubling the number of
watchers increases the processing time by more than a factor of four,
making it completely unusable when using larger numbers of watchers
(note that only a single file descriptor was used in the benchmark, so
…
implementation. The design of the POE adaptor class in AnyEvent cannot
really account for this, as session creation overhead is small compared
to execution of the state machine, which is coded pretty optimally within
L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.

=head3 Summary

=over 4

=item * Using EV through AnyEvent is faster than any other event loop
(even when used without AnyEvent), but most event loops have acceptable
…

=item * The overhead AnyEvent adds is usually much smaller than the
overhead of the actual event loop; only with extremely fast event loops
such as EV does AnyEvent add significant overhead.

=item * You should avoid POE like the plague if you want performance or
reasonable memory usage.
|
|
=back

=head2 BENCHMARKING THE LARGE SERVER CASE

This benchmark actually benchmarks the event loop itself. It works by
creating a number of "servers": each server consists of a socketpair, a
timeout watcher that gets reset on activity (but never fires), and an I/O
watcher waiting for input on one side of the socket. Each time the socket
watcher reads a byte it will write that byte to a random other "server".

The effect is that there will be a lot of I/O watchers, only part of which
are active at any one point (so there is a constant number of active
fds for each loop iteration, but which fds these are is random). The
timeout is reset each time something is read because that reflects how
most timeouts work (and puts extra pressure on the event loops).

In this benchmark, we use 10000 socketpairs (20000 sockets), of which 100
(1%) are active. This mirrors the activity of large servers with many
connections, most of which are idle at any one point in time.

Source code for this benchmark is found as F<eg/bench2> in the AnyEvent
distribution.

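The per-server setup described above might be sketched as follows. This is
not the actual F<eg/bench2> code: the 60-second timeout, the helper name
and the server count are made-up illustration values.

```perl
use AnyEvent;
use Socket;

# One "server": a socketpair, a timeout watcher that is replaced on every
# read (and never fires), and an I/O watcher that forwards each byte it
# reads to a randomly chosen other server.
my @servers;

sub make_server {
   socketpair my $read, my $write, AF_UNIX, SOCK_STREAM, PF_UNSPEC
      or die "socketpair: $!";

   my %server = (write => $write);

   $server{timeout} = AnyEvent->timer (after => 60, cb => sub { });

   $server{io} = AnyEvent->io (fh => $read, poll => "r", cb => sub {
      sysread $read, my $byte, 1;

      # "resetting" the timeout means creating a replacement watcher
      $server{timeout} = AnyEvent->timer (after => 60, cb => sub { });

      # forward the token to a random other "server"
      syswrite $servers[rand @servers]{write}, $byte;
   });

   push @servers, \%server;
}

make_server () for 1 .. 8;                    # the benchmark uses 10000
syswrite $servers[$_]{write}, "x" for 0 .. 2; # inject the active tokens
# AnyEvent->condvar->wait;                    # then enter the event loop
```

Because a watcher cannot be rescheduled in place through the AnyEvent API,
resetting the timeout costs one watcher destruction plus one creation per
request, which is exactly the pressure this benchmark wants to apply.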
|
|
=head3 Explanation of the columns

I<sockets> is the number of sockets, and twice the number of "servers" (as
each server has a read and write socket end).

I<create> is the time it takes to create a socketpair (which is
nontrivial) and two watchers: an I/O watcher and a timeout watcher.

I<request>, the most important value, is the time it takes to handle a
single "request", that is, reading the token from the pipe and forwarding
it to another server. This includes deleting the old timeout and creating
a new one that moves the timeout into the future.

=head3 Results

   name    sockets  create   request
   EV        20000   69.01     11.16
   Perl      20000   75.28    112.76
   Event     20000  212.62    257.32
   Glib      20000  651.16   1896.30
   POE       20000  349.67  12317.24 uses POE::Loop::Event

=head3 Discussion

This benchmark I<does> measure scalability and overall performance of the
particular event loop.

EV is again fastest. Since it is using epoll on my system, the setup time
is relatively high, though.

Perl surprisingly comes second. It is much faster than the C-based event
loops Event and Glib.

Event suffers from high setup time as well (look at its code and you will
understand why). Callback invocation also has a high overhead compared to
the C<< $_->() for .. >>-style loop that the Perl event loop uses. Event
uses select or poll in basically all documented configurations.

Glib is hit hard by its quadratic behaviour w.r.t. many watchers. It
clearly fails to perform with many filehandles or in busy servers.

POE is still completely out of the picture, taking over 1000 times as long
as EV, and over 100 times as long as the Perl implementation, even though
it uses a C-based event loop in this case.

=head3 Summary

=over 4

=item * The pure perl implementation performs extremely well, considering
that it uses select.

=item * Avoid Glib or POE in large projects where performance matters.

=back

=head2 BENCHMARKING SMALL SERVERS

While event loops should scale (and select-based ones do not...) even to
large servers, most programs we (or I :) actually write have only a few
I/O watchers.

In this benchmark, I use the same benchmark program as in the large server
case, but it uses only eight "servers", of which three are active at any
one time. This should reflect performance for a small server relatively
well.

The columns are identical to the previous table.

=head3 Results

   name    sockets  create  request
   EV           16   20.00     6.54
   Event        16   81.27    35.86
   Glib         16   32.63    15.48
   Perl         16   24.62   162.37
   POE          16  261.87   276.28 uses POE::Loop::Event

=head3 Discussion

The benchmark tries to test the performance of a typical small
server. While knowing how various event loops perform is interesting, keep
in mind that their overhead in this case is usually not as important, due
to the small absolute number of watchers.

EV is again fastest.

The C-based event loops Event and Glib come in second this time, as the
overhead of running an iteration is much smaller in C than in Perl (there
is little code to execute in the inner loop, perl's function calling
overhead is high, and updating all the data structures is costly).

The pure perl event loop is much slower, but still competitive.

POE also performs much better in this case, but is still far behind the
others.

=head3 Summary

=over 4

=item * C-based event loops perform very well with a small number of
watchers, as the management overhead dominates.

=back


=head1 FORK