
Comparing AnyEvent/lib/AnyEvent.pm (file contents):
Revision 1.90 by root, Fri Apr 25 14:24:29 2008 UTC vs.
Revision 1.96 by root, Sat Apr 26 11:16:16 2008 UTC

   });

   $quit->wait;

=head1 BENCHMARKS

To give you an idea of the performance and overheads that AnyEvent adds
over the event loops themselves, and to give you an impression of the
speed of various event loops, I prepared some benchmarks.

=head2 BENCHMARKING ANYEVENT OVERHEAD

Here is a benchmark of various supported event models used natively and
through AnyEvent. The benchmark creates a lot of timers (with a zero
timeout) and I/O watchers (watching STDOUT, a pty, to become writable,
which it is), lets them fire exactly once and destroys them again.

Source code for this benchmark is found as F<eg/bench> in the AnyEvent
distribution.

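As a rough illustration of what the benchmark body does, here is a
minimal sketch using only documented AnyEvent calls; the watcher count
and all variable names are invented for this example and are not taken
from F<eg/bench>:

   use AnyEvent;

   my $done = AnyEvent->condvar;
   my $left = 2_000;               # total callbacks expected (made up)
   my (@timers, @ios);

   for my $i (0 .. 999) {
      # zero-timeout timer: fires once, then is dropped, destroying it
      $timers[$i] = AnyEvent->timer (after => 0, cb => sub {
         undef $timers[$i];
         $done->broadcast unless --$left;
      });

      # I/O watcher on STDOUT, which is always writable, so it fires
      # immediately; also dropped after its single invocation
      $ios[$i] = AnyEvent->io (fh => \*STDOUT, poll => "w", cb => sub {
         undef $ios[$i];
         $done->broadcast unless --$left;
      });
   }

   $done->wait;   # run the event loop until every watcher has fired
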
=head3 Explanation of the columns

I<watcher> is the number of event watchers created/destroyed. Since
different event models feature vastly different performances, each event
loop was given a number of watchers so that overall runtime is acceptable
and similar between the tested event loops (and to keep them from
crashing): Glib
signal the end of this phase.

I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.

=head3 Results

         name watchers bytes create invoke destroy comment
        EV/EV  400000    244   0.56   0.46    0.31 EV native interface
       EV/Any  100000    244   2.50   0.46    0.29 EV + AnyEvent watchers
   CoroEV/Any  100000    244   2.49   0.44    0.29 coroutines + Coro::Signal
     Glib/Any   16000   1357  98.22  12.41   54.00 quadratic behaviour
       Tk/Any    2000   1860  26.97  67.98   14.00 SEGV with >> 2000 watchers
    POE/Event    2000   6644 108.64 736.02   14.73 via POE::Loop::Event
   POE/Select    2000   6343  94.13 809.12  565.96 via POE::Loop::Select

=head3 Discussion

The benchmark does I<not> measure scalability of the event loop very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, all events become ready at
the same time, so select/poll-based implementations get an unnatural speed
boost.

Also, note that the number of watchers usually has a nonlinear effect on
overall speed, that is, creating twice as many watchers doesn't take twice
the time - usually it takes longer. This puts event loops tested with a
higher number of watchers at a disadvantage.

To put the range of results into perspective, consider that on the
benchmark machine, handling an event takes roughly 1600 CPU cycles with
EV, 3100 CPU cycles with AnyEvent's pure perl loop and almost 3000000 CPU
cycles with POE.

C<EV> is the sole leader regarding speed and memory use: it is both the
fastest and the most memory-efficient of the tested loops. Even when
going through AnyEvent, it uses far less memory than any other event loop
and is still faster than Event natively.
implementation. The design of the POE adaptor class in AnyEvent cannot
really account for this, as session creation overhead is small compared
to execution of the state machine, which is coded pretty optimally within
L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.

=head3 Summary

=over 4

=item * Using EV through AnyEvent is faster than any other event loop
(even when used without AnyEvent), but most event loops have acceptable
the actual event loop; only with extremely fast event loops such as EV
does AnyEvent add significant overhead.

=item * You should avoid POE like the plague if you want performance or
reasonable memory usage.

=back

=head2 BENCHMARKING THE LARGE SERVER CASE

This benchmark actually benchmarks the event loop itself. It works by
creating a number of "servers": each server consists of a socketpair, a
timeout watcher that gets reset on activity (but never fires), and an I/O
watcher waiting for input on one side of the socket. Each time the socket
watcher reads a byte it will write that byte to a random other "server".

The effect is that there will be a lot of I/O watchers, only part of which
are active at any one point (so there is a constant number of active
fds for each loop iteration, but which fds these are is random). The
timeout is reset each time something is read, because that reflects how
most timeouts work (and puts extra pressure on the event loops).

In this benchmark, we use 10000 socketpairs (20000 sockets), of which 100
(1%) are active. This mirrors the activity of large servers with many
connections, most of which are idle at any one point in time.

Source code for this benchmark is found as F<eg/bench2> in the AnyEvent
distribution.

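To make the setup concrete, here is a small sketch of such a "server",
using only documented AnyEvent and core Perl calls; the handle names,
the 300 second timeout, the request count and the tiny server count are
invented for this illustration and are not taken from F<eg/bench2>:

   use AnyEvent;
   use Socket;

   my $done     = AnyEvent->condvar;
   my $requests = 10_000;        # tokens to pass around before quitting
   my @servers;                  # kept global so the watchers stay alive

   sub make_server {
      socketpair my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC
         or die "socketpair: $!";

      my %server = (wr => $wr);

      # timeout watcher; replaced on every read, never meant to fire
      $server{to} = AnyEvent->timer (after => 300, cb => sub { });

      # I/O watcher: read one byte, then forward it to a random server
      $server{io} = AnyEvent->io (fh => $rd, poll => "r", cb => sub {
         sysread $rd, my $buf, 1 or return;

         # reset the timeout, as most real servers do on activity
         $server{to} = AnyEvent->timer (after => 300, cb => sub { });

         return $done->broadcast unless --$requests;

         # forward the token (this sketch may pick the sender itself)
         syswrite $servers[rand @servers]{wr}, $buf;
      });

      \%server
   }

   @servers = map { make_server () } 1 .. 8;  # tiny count, for brevity

   syswrite $servers[0]{wr}, "x";             # inject the first token
   $done->wait;                               # process events until done
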
=head3 Explanation of the columns

I<sockets> is the number of sockets, and twice the number of "servers" (as
each server has a read and write socket end).

I<create> is the time it takes to create a socketpair (which is
nontrivial) and two watchers: an I/O watcher and a timeout watcher.

I<request>, the most important value, is the time it takes to handle a
single "request", that is, reading the token from the pipe and forwarding
it to another server. This includes deleting the old timeout and creating
a new one that moves the timeout into the future.

=head3 Results

    name sockets  create  request
      EV   20000   69.01    11.16
    Perl   20000   75.28   112.76
   Event   20000  212.62   257.32
    Glib   20000  651.16  1896.30
     POE   20000  349.67 12317.24 uses POE::Loop::Event

=head3 Discussion

This benchmark I<does> measure scalability and overall performance of the
particular event loop.

EV is again fastest. Since it is using epoll on my system, the setup time
is relatively high, though.

Perl surprisingly comes second. It is much faster than the C-based event
loops Event and Glib.

Event suffers from high setup time as well (look at its code and you will
understand why). Callback invocation also has a high overhead compared to
the C<< $_->() for .. >>-style loop that the Perl event loop uses. Event
uses select or poll in basically all documented configurations.
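
For illustration, the dispatch style referred to looks like this
(C<@ready> is a hypothetical array of pending callback code references,
not an actual variable of the pure perl loop):

   # invoke every pending callback in one tight Perl loop - there is
   # no per-callback transition between C and Perl to pay for
   $_->() for @ready;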

Glib is hit hard by its quadratic behaviour w.r.t. many watchers. It
clearly fails to perform with many filehandles or in busy servers.

POE is still completely out of the picture, taking over 1000 times as long
as EV, and over 100 times as long as the Perl implementation, even though
it uses a C-based event loop in this case.

=head3 Summary

=over 4

=item * The pure perl implementation performs extremely well, considering
that it uses select.

=item * Avoid Glib or POE in large projects where performance matters.

=back

=head2 BENCHMARKING SMALL SERVERS

While event loops should scale (and select-based ones do not...) even to
large servers, most programs we (or I :) actually write have only a few
I/O watchers.

In this benchmark, I use the same benchmark program as in the large server
case, but it uses only eight "servers", of which three are active at any
one time. This should reflect performance for a small server relatively
well.

The columns are identical to the previous table.

=head3 Results

    name sockets create request
      EV      16  20.00   6.54
   Event      16  81.27  35.86
    Glib      16  32.63  15.48
    Perl      16  24.62 162.37
     POE      16 261.87 276.28 uses POE::Loop::Event

=head3 Discussion

The benchmark tries to test the performance of a typical small
server. While knowing how various event loops perform is interesting, keep
in mind that their overhead in this case is usually not as important, due
to the small absolute number of watchers.

EV is again fastest.

The C-based event loops Event and Glib come in second this time, as the
overhead of running an iteration is much smaller in C than in Perl: there
is little code to execute in the inner loop, while perl's function calling
overhead is high and updating all the data structures is costly.

The pure perl event loop is much slower, but still competitive.

POE also performs much better in this case, but it is still far behind the
others.

=head3 Summary

=over 4

=item * C-based event loops perform very well with a small number of
watchers, as the management overhead dominates.

=back


=head1 FORK
