…

=head2 I/O WATCHERS

You can create an I/O watcher by calling the C<< AnyEvent->io >> method
with the following mandatory key-value pairs as arguments:

C<fh> is the Perl I<file handle> (I<not> file descriptor) to watch
for events. C<poll> must be a string that is either C<r> or C<w>,
which creates a watcher waiting for "r"eadable or "w"ritable events,
respectively. C<cb> is the callback to invoke each time the file handle
becomes ready.

Although the callback might get passed parameters, their value and
presence are undefined and you cannot rely on them. Portable AnyEvent
callbacks cannot use arguments passed to I/O watcher callbacks.

The I/O watcher might use the underlying file descriptor or a copy of it.
You must not close a file handle as long as any watcher is active on the
underlying file descriptor.

Some event loops issue spurious readiness notifications, so you should
always use non-blocking calls when reading/writing from/to your file
handles.

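As a concrete illustration (a minimal sketch only), a watcher that logs
each line arriving on STDIN might look like this:

   # wait for readability on STDIN, then fetch and report a line
   my $w = AnyEvent->io (fh => \*STDIN, poll => 'r', cb => sub {
      chomp (my $input = <STDIN>);
      warn "read: $input\n";
   });
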
…

You can create a time watcher by calling the C<< AnyEvent->timer >>
method with the following mandatory arguments:

C<after> specifies after how many seconds (fractional values are
supported) the callback should be invoked. C<cb> is the callback to invoke
in that case.

Although the callback might get passed parameters, their value and
presence are undefined and you cannot rely on them. Portable AnyEvent
callbacks cannot use arguments passed to time watcher callbacks.

The timer callback will be invoked at most once: if you want a repeating
timer you have to create a new watcher (this is a limitation of both Tk
and Glib).

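A repeating timer therefore has to re-arm itself from within its own
callback. A minimal sketch (the one-second interval is an arbitrary
choice for illustration):

   my $w; my $cb;
   $cb = sub {
      warn "tick\n";
      # the watcher has fired and is now spent, so create a
      # new one to keep the timer going
      $w = AnyEvent->timer (after => 1, cb => $cb);
   };
   $w = AnyEvent->timer (after => 1, cb => $cb);
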
…

You can watch for signals using a signal watcher. C<signal> is the
signal I<name> without any C<SIG> prefix, and C<cb> is the Perl callback
to be invoked whenever a signal occurs.

Although the callback might get passed parameters, their value and
presence are undefined and you cannot rely on them. Portable AnyEvent
callbacks cannot use arguments passed to signal watcher callbacks.

Multiple signal occurrences can be clumped together into one callback
invocation, and callback invocation will be synchronous. Synchronous means
that it might take a while until the signal gets handled by the process,
but it is guaranteed not to interrupt any other callbacks.

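For example (a minimal sketch), to print a message whenever the process
receives C<SIGINT>:

   # the watcher stays active for as long as $w is alive
   my $w = AnyEvent->signal (signal => "INT", cb => sub {
      warn "got SIGINT\n";
   });
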
…

The child process is specified by the C<pid> argument (if set to C<0>, it
watches for any child process exit). The watcher will trigger as often
as status changes for the child are received. This works by installing a
signal handler for C<SIGCHLD>. The callback will be called with the pid
and exit status (as returned by waitpid), so unlike other watcher types,
you I<can> rely on child watcher callback arguments.

There is a slight catch to child watchers, however: you usually start them
I<after> the child process was created, and this means the process could
have exited already (and no SIGCHLD will be sent anymore).

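Putting it together, a sketch that forks a child and waits for it to
exit, using a condition variable to block until the callback has run (the
exit code C<5> is arbitrary):

   my $pid = fork or exit 5;

   my $done = AnyEvent->condvar;

   my $w = AnyEvent->child (pid => $pid, cb => sub {
      my ($pid, $status) = @_;
      warn "pid $pid exited with status $status\n";
      $done->broadcast;
   });

   # enter the event loop until the child watcher has fired
   $done->wait;
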
…

   });

   $quit->wait;


=head1 BENCHMARKS

To give you an idea of the performance and overheads that AnyEvent adds
over the event loops themselves and to give you an impression of the
speed of various event loops, I prepared some benchmarks.

=head2 BENCHMARKING ANYEVENT OVERHEAD

Here is a benchmark of various supported event models used natively and
through anyevent. The benchmark creates a lot of timers (with a zero
timeout) and I/O watchers (watching STDOUT, a pty, to become writable,
which it is), lets them fire exactly once and destroys them again.

Source code for this benchmark is found as F<eg/bench> in the AnyEvent
distribution.

=head3 Explanation of the columns

I<watcher> is the number of event watchers created/destroyed. Since
different event models feature vastly different performance, each event
loop was given a number of watchers so that overall runtime is acceptable
and similar between the tested event loops (and to keep them from
crashing): Glib
…

signal the end of this phase.

I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.

=head3 Results

  name        watchers bytes create invoke destroy comment
  EV/EV         400000   244   0.56   0.46    0.31 EV native interface
  EV/Any        100000   244   2.50   0.46    0.29 EV + AnyEvent watchers
  CoroEV/Any    100000   244   2.49   0.44    0.29 coroutines + Coro::Signal
  Perl/Any      100000   513   4.92   0.87    1.12 pure perl implementation
  Event/Event    16000   516  31.88  31.30    0.85 Event native interface
  Event/Any      16000   936  39.17  33.63    1.43 Event + AnyEvent watchers
  Glib/Any       16000  1357  98.22  12.41   54.00 quadratic behaviour
  Tk/Any          2000  1860  26.97  67.98   14.00 SEGV with >> 2000 watchers
  POE/Event       2000  6644 108.64 736.02   14.73 via POE::Loop::Event
  POE/Select      2000  6343  94.13 809.12  565.96 via POE::Loop::Select

=head3 Discussion

The benchmark does I<not> measure scalability of the event loop very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, all events become ready at
the same time, so select/poll-based implementations get an unnatural speed
boost.

C<EV> is the sole leader regarding speed and memory use, which are both
maximal/minimal, respectively. Even when going through AnyEvent, it uses
far less memory than any other event loop and is still faster than Event
natively.

The pure perl implementation is hit in a few sweet spots (both the
constant timeout and the use of a single fd hit optimisations in the perl
interpreter and the backend itself). Nevertheless this shows that it
adds very little overhead in itself. Like any select-based backend its
performance becomes really bad with lots of file descriptors (and few of
them active), of course, but this was not the subject of this benchmark.

The C<Event> module has a relatively high setup and callback invocation
cost, but overall comes in third place.

C<Glib>'s memory usage is quite a bit higher, but it features a
faster callback invocation and overall ends up in the same class as
C<Event>. However, Glib scales extremely badly: doubling the number of
watchers increases the processing time by more than a factor of four,
making it completely unusable when using larger numbers of watchers
(note that only a single file descriptor was used in the benchmark, so
…

The C<Tk> adaptor works relatively well. The fact that it crashes with
more than 2000 watchers is a big setback, however, as correctness takes
precedence over speed. Nevertheless, its performance is surprising, as the
file descriptor is dup()ed for each watcher. This shows that the dup()
employed by some adaptors is not a big performance issue (it does incur a
hidden memory cost inside the kernel which is not reflected in the figures
above).

C<POE>, regardless of underlying event loop (whether using its pure
perl select-based backend or the Event module; the POE-EV backend
couldn't be tested because it wasn't working) shows abysmal performance
and memory usage: Watchers use almost 30 times as much memory as
EV watchers, and 10 times as much memory as Event (the high memory
requirements are caused by requiring a session for each watcher). Watcher
invocation speed is almost 900 times slower than with AnyEvent's pure perl
implementation. The design of the POE adaptor class in AnyEvent cannot
really account for this, as session creation overhead is small compared
to execution of the state machine, which is coded pretty optimally within
L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.

=head3 Summary

=over 4

=item * Using EV through AnyEvent is faster than any other event loop
(even when used without AnyEvent), but most event loops have acceptable
performance with or without AnyEvent.

=item * The overhead AnyEvent adds is usually much smaller than the
overhead of the actual event loop; only with extremely fast event loops
such as EV does AnyEvent add significant overhead.

=item * You should avoid POE like the plague if you want performance or
reasonable memory usage.

=back

=head2 BENCHMARKING THE LARGE SERVER CASE

This benchmark actually benchmarks the event loop itself. It works by
creating a number of "servers": each server consists of a socketpair, a
timeout watcher that gets reset on activity (but never fires), and an I/O
watcher waiting for input on one side of the socket. Each time the socket
watcher reads a byte it will write that byte to a random other "server".

The effect is that there will be a lot of I/O watchers, only part of which
are active at any one point (so there is a constant number of active
fds for each loop iteration, but which fds these are is random). The
timeout is reset each time something is read because that reflects how
most timeouts work (and puts extra pressure on the event loops).

In this benchmark, we use 10000 socketpairs (20000 sockets), of which 100
(1%) are active. This mirrors the activity of large servers with many
connections, most of which are idle at any one point in time.

Source code for this benchmark is found as F<eg/bench2> in the AnyEvent
distribution.

=head3 Explanation of the columns

I<sockets> is the number of sockets, and twice the number of "servers" (as
each server has a read and write socket end).

I<create> is the time it takes to create a socketpair (which is
nontrivial) and two watchers: an I/O watcher and a timeout watcher.

I<request>, the most important value, is the time it takes to handle a
single "request", that is, reading the token from the pipe and forwarding
it to another server. This includes deleting the old timeout and creating
a new one that moves the timeout into the future.

=head3 Results

  name   sockets  create   request
  EV       20000   69.01     11.16
  Perl     20000   75.28    112.76
  Event    20000  212.62    257.32
  Glib     20000  651.16   1896.30
  POE      20000  349.67  12317.24 uses POE::Loop::Event

=head3 Discussion

This benchmark I<does> measure scalability and overall performance of the
particular event loop.

EV is again fastest. Since it is using epoll on my system, the setup time
is relatively high, though.

Perl surprisingly comes second. It is much faster than the C-based event
loops Event and Glib.

Event suffers from high setup time as well (look at its code and you will
understand why). Callback invocation also has a high overhead compared to
the C<< $_->() for .. >>-style loop that the Perl event loop uses. Event
uses select or poll in basically all documented configurations.

Glib is hit hard by its quadratic behaviour w.r.t. many watchers. It
clearly fails to perform with many filehandles or in busy servers.

POE is still completely out of the picture, taking over 1000 times as long
as EV, and over 100 times as long as the Perl implementation, even though
it uses a C-based event loop in this case.

=head3 Summary

=over 4

=item * The pure perl implementation performs extremely well, considering
that it uses select.

=item * Avoid Glib or POE in large projects where performance matters.

=back

=head2 BENCHMARKING SMALL SERVERS

While event loops should scale (and select-based ones do not...) even to
large servers, most programs we (or I :) actually write have only a few
I/O watchers.

In this benchmark, I use the same benchmark program as in the large server
case, but it uses only eight "servers", of which three are active at any
one time. This should reflect performance for a small server relatively
well.

The columns are identical to the previous table.

=head3 Results

  name   sockets  create   request
  EV          16   20.00      6.54
  Event       16   81.27     35.86
  Glib        16   32.63     15.48
  Perl        16   24.62    162.37
  POE         16  261.87    276.28 uses POE::Loop::Event

=head3 Discussion

The benchmark tries to test the performance of a typical small
server. While knowing how various event loops perform is interesting, keep
in mind that their overhead in this case is usually not as important, due
to the small absolute number of watchers.

EV is again fastest.

The C-based event loops Event and Glib come in second this time, as the
overhead of running an iteration is much smaller in C than in Perl (little
code to execute in the inner loop, and perl's function calling overhead is
high, and updating all the data structures is costly).

The pure perl event loop is much slower, but still competitive.

POE also performs much better in this case, but it is still far behind the
others.

=head3 Summary

=over 4

=item * C-based event loops perform very well with small numbers of
watchers, as the management overhead dominates.

=back


=head1 FORK

Most event libraries are not fork-safe. The ones that are usually are