[ViewVC] Diff of: cvs/AnyEvent/lib/AnyEvent.pm

Comparing AnyEvent/lib/AnyEvent.pm (file contents):
Revision 1.85 by root, Fri Apr 25 13:51:32 2008 UTC vs.
Revision 1.94 by root, Sat Apr 26 04:33:51 2008 UTC

…		…
894	});	894	});
895		895
896	$quit->wait;	896	$quit->wait;
897		897
898		898
899	=head1 BENCHMARK	899	=head1 BENCHMARKS
900		900
901	To give you an idea of the performance and overheads that AnyEvent adds	901	To give you an idea of the performance and overheads that AnyEvent adds
902	over the event loops themselves (and to give you an impression of the	902	over the event loops themselves and to give you an impression of the speed
903	speed of various event loops), here is a benchmark of various supported	903	of various event loops, I prepared some benchmarks.
904	event models natively and with anyevent. The benchmark creates a lot of	904
905	timers (with a zero timeout) and I/O watchers (watching STDOUT, a pty, to	905	=head2 BENCHMARKING ANYEVENT OVERHEAD
		906
		907	Here is a benchmark of various supported event models used natively and
		908	through anyevent. The benchmark creates a lot of timers (with a zero
		909	timeout) and I/O watchers (watching STDOUT, a pty, to become writable,
906	become writable, which it is), lets them fire exactly once and destroys	910	which it is), lets them fire exactly once and destroys them again.
907	them again.
908		911
909	Rewriting the benchmark to use many different sockets instead of using	912	Source code for this benchmark is found as F<eg/bench> in the AnyEvent
910	the same filehandle for all I/O watchers results in a much longer runtime	913	distribution.
911	(socket creation is expensive), but qualitatively the same figures, so it
912	was not used.
913		914
914	=head2 Explanation of the columns	915	=head3 Explanation of the columns
915		916
916	I<watcher> is the number of event watchers created/destroyed. Since	917	I<watcher> is the number of event watchers created/destroyed. Since
917	different event models feature vastly different performances, each event	918	different event models feature vastly different performances, each event
918	loop was given a number of watchers so that overall runtime is acceptable	919	loop was given a number of watchers so that overall runtime is acceptable
919	and similar between tested event loop (and keep them from crashing): Glib	920	and similar between tested event loop (and keep them from crashing): Glib
…		…
935	signal the end of this phase.	936	signal the end of this phase.
936		937
937	I<destroy> is the time, in microseconds, that it takes to destroy a single	938	I<destroy> is the time, in microseconds, that it takes to destroy a single
938	watcher.	939	watcher.
939		940
940	=head2 Results	941	=head3 Results
941		942
942	name       watchers bytes create invoke destroy  comment	943	name       watchers bytes create invoke destroy  comment
943	EV/EV        400000   244   0.56   0.46    0.31  EV native interface	944	EV/EV        400000   244   0.56   0.46    0.31  EV native interface
944	EV/Any       100000   244   2.50   0.46    0.29  EV + AnyEvent watchers	945	EV/Any       100000   244   2.50   0.46    0.29  EV + AnyEvent watchers
945	CoroEV/Any   100000   244   2.49   0.44    0.29  coroutines + Coro::Signal	946	CoroEV/Any   100000   244   2.49   0.44    0.29  coroutines + Coro::Signal
…		…
949	Glib/Any      16000  1357  98.22  12.41   54.00  quadratic behaviour	950	Glib/Any      16000  1357  98.22  12.41   54.00  quadratic behaviour
950	Tk/Any         2000  1860  26.97  67.98   14.00  SEGV with >> 2000 watchers	951	Tk/Any         2000  1860  26.97  67.98   14.00  SEGV with >> 2000 watchers
951	POE/Event      2000  6644 108.64 736.02   14.73  via POE::Loop::Event	952	POE/Event      2000  6644 108.64 736.02   14.73  via POE::Loop::Event
952	POE/Select     2000  6343  94.13 809.12  565.96  via POE::Loop::Select	953	POE/Select     2000  6343  94.13 809.12  565.96  via POE::Loop::Select
953		954
954	=head2 Discussion	955	=head3 Discussion
955		956
956	The benchmark does I<not> measure scalability of the event loop very	957	The benchmark does I<not> measure scalability of the event loop very
957	well. For example, a select-based event loop (such as the pure perl one)	958	well. For example, a select-based event loop (such as the pure perl one)
958	can never compete with an event loop that uses epoll when the number of	959	can never compete with an event loop that uses epoll when the number of
959	file descriptors grows high. In this benchmark, all events become ready at	960	file descriptors grows high. In this benchmark, all events become ready at
…		…
964	maximal/minimal, respectively. Even when going through AnyEvent, it uses	965	maximal/minimal, respectively. Even when going through AnyEvent, it uses
965	far less memory than any other event loop and is still faster than Event	966	far less memory than any other event loop and is still faster than Event
966	natively.	967	natively.
967		968
968	The pure perl implementation is hit in a few sweet spots (both the	969	The pure perl implementation is hit in a few sweet spots (both the
969	zero timeout and the use of a single fd hit optimisations in the perl	970	constant timeout and the use of a single fd hit optimisations in the perl
970	interpreter and the backend itself, and all watchers become ready at the	971	interpreter and the backend itself). Nevertheless this shows that it
971	same time). Nevertheless this shows that it adds very little overhead in	972	adds very little overhead in itself. Like any select-based backend its
972	itself. Like any select-based backend its performance becomes really bad	973	performance becomes really bad with lots of file descriptors (and few of
973	with lots of file descriptors (and few of them active), of course, but	974	them active), of course, but this was not subject of this benchmark.
974	this was not subject of this benchmark.
975		975
976	The C<Event> module has a relatively high setup and callback invocation cost,	976	The C<Event> module has a relatively high setup and callback invocation
977	but overall scores on the third place.	977	cost, but overall scores in third place.
978		978
979	C<Glib>'s memory usage is quite a bit bit higher, but it features a	979	C<Glib>'s memory usage is quite a bit higher, but it features a
980	faster callback invocation and overall ends up in the same class as	980	faster callback invocation and overall ends up in the same class as
981	C<Event>. However, Glib scales extremely badly, doubling the number of	981	C<Event>. However, Glib scales extremely badly, doubling the number of
982	watchers increases the processing time by more than a factor of four,	982	watchers increases the processing time by more than a factor of four,
983	making it completely unusable when using larger numbers of watchers	983	making it completely unusable when using larger numbers of watchers
984	(note that only a single file descriptor was used in the benchmark, so	984	(note that only a single file descriptor was used in the benchmark, so
…		…
987	The C<Tk> adaptor works relatively well. The fact that it crashes with	987	The C<Tk> adaptor works relatively well. The fact that it crashes with
988	more than 2000 watchers is a big setback, however, as correctness takes	988	more than 2000 watchers is a big setback, however, as correctness takes
989	precedence over speed. Nevertheless, its performance is surprising, as the	989	precedence over speed. Nevertheless, its performance is surprising, as the
990	file descriptor is dup()ed for each watcher. This shows that the dup()	990	file descriptor is dup()ed for each watcher. This shows that the dup()
991	employed by some adaptors is not a big performance issue (it does incur a	991	employed by some adaptors is not a big performance issue (it does incur a
992	hidden memory cost inside the kernel, though, that is not reflected in the	992	hidden memory cost inside the kernel which is not reflected in the figures
993	figures above).	993	above).
994		994
995	C<POE>, regardless of underlying event loop (wether using its pure perl	995	C<POE>, regardless of underlying event loop (whether using its pure
996	select-based backend or the Event module) shows abysmal performance and	996	perl select-based backend or the Event module, the POE-EV backend
		997	couldn't be tested because it wasn't working) shows abysmal performance
997	memory usage: Watchers use almost 30 times as much memory as EV watchers,	998	and memory usage: Watchers use almost 30 times as much memory as
998	and 10 times as much memory as both Event or EV via AnyEvent. Watcher	999	EV watchers, and 10 times as much memory as Event (the high memory
		1000	requirements are caused by requiring a session for each watcher). Watcher
999	invocation is almost 900 times slower than with AnyEvent's pure perl	1001	invocation speed is almost 900 times slower than with AnyEvent's pure perl
1000	implementation. The design of the POE adaptor class in AnyEvent can not	1002	implementation. The design of the POE adaptor class in AnyEvent can not
1001	really account for this, as session creation overhead is small compared	1003	really account for this, as session creation overhead is small compared
1002	to execution of the state machine, which is coded pretty optimally within	1004	to execution of the state machine, which is coded pretty optimally within
1003	L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.	1005	L<AnyEvent::Impl::POE>. POE simply seems to be abysmally slow.
1004		1006
1005	=head2 Summary	1007	=head3 Summary
1006		1008
		1009	=over 4
		1010
1007	Using EV through AnyEvent is faster than any other event loop, but most	1011	=item * Using EV through AnyEvent is faster than any other event loop
1008	event loops have acceptable performance with or without AnyEvent.	1012	(even when used without AnyEvent), but most event loops have acceptable
		1013	performance with or without AnyEvent.
1009		1014
1010	The overhead AnyEvent adds is usually much smaller than the overhead of	1015	=item * The overhead AnyEvent adds is usually much smaller than the overhead of
1011	the actual event loop, only with extremely fast event loops such as the EV	1016	the actual event loop, only with extremely fast event loops such as EV
1012	adds AnyEvent significant overhead.	1017	adds AnyEvent significant overhead.
1013		1018
1014	And you should simply avoid POE like the plague if you want performance or	1019	=item * You should avoid POE like the plague if you want performance or
1015	reasonable memory usage.	1020	reasonable memory usage.
		1021
		1022	=back
		1023
		1024	=head2 BENCHMARKING THE LARGE SERVER CASE
		1025
		1026	This benchmark actually benchmarks the event loop itself. It works by
		1027	creating a number of "servers": each server consists of a socketpair, a
		1028	timeout watcher that gets reset on activity (but never fires), and an I/O
		1029	watcher waiting for input on one side of the socket. Each time the socket
		1030	watcher reads a byte it will write that byte to a random other "server".
		1031
		1032	The effect is that there will be a lot of I/O watchers, only part of which
		1033	are active at any one point (so there is a constant number of active
		1034	fds for each loop iteration, but which fds these are is random). The
		1035	timeout is reset each time something is read because that reflects how
		1036	most timeouts work (and puts extra pressure on the event loops).
		1037
		1038	In this benchmark, we use 10000 socketpairs (20000 sockets), of which 100
		1039	(1%) are active. This mirrors the activity of large servers with many
		1040	connections, most of which are idle at any one point in time.
		1041
		1042	Source code for this benchmark is found as F<eg/bench2> in the AnyEvent
		1043	distribution.
		1044
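The token-passing scheme described above can be sketched with core Perl modules only. This is a minimal illustration, not the F<eg/bench2> program: it assumes a plain C<IO::Select> loop instead of an event library, stands in for the timeout watcher with a per-server deadline that is merely reset, and uses a hypothetical C<run_bench> helper with far smaller sizes than the 10000 socketpairs used in the benchmark.

```perl
use strict;
use warnings;
use Socket;
use IO::Select;
use Time::HiRes ();

# Hypothetical helper: pass tokens between "servers" until at least
# $requests reads have been handled.
# Returns (requests handled, seconds elapsed).
sub run_bench {
    my ($servers, $active, $requests) = @_;

    # One socketpair per server: we read from one end, write to the other.
    my (@reader, @writer, %server_of);
    for my $i (0 .. $servers - 1) {
        socketpair(my $r, my $w, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
            or die "socketpair: $!";
        ($reader[$i], $writer[$i]) = ($r, $w);
        $server_of{ fileno($r) } = $i;
    }

    my $select = IO::Select->new(@reader);

    # Stand-in for the timeout watcher: a deadline that is reset on
    # activity and never checked, because it is never meant to fire.
    my @deadline = (Time::HiRes::time() + 60) x $servers;

    # Inject one token per initially active server.
    syswrite($writer[$_], "x") for 0 .. $active - 1;

    my $handled = 0;
    my $start   = Time::HiRes::time();
    while ($handled < $requests) {
        for my $fh ($select->can_read) {
            sysread($fh, my $token, 1) or die "sysread: $!";
            # Reset the timeout of the server that just saw activity.
            $deadline[ $server_of{ fileno($fh) } ] = Time::HiRes::time() + 60;
            # Forward the token to a random other (or the same) server.
            syswrite($writer[ int rand $servers ], $token);
            $handled++;
        }
    }
    return ($handled, Time::HiRes::time() - $start);
}

my ($handled, $elapsed) = run_bench(100, 10, 10_000);
printf "%d requests, %.2f us per request\n",
    $handled, 1e6 * $elapsed / $handled;
```

The number of tokens in flight stays constant, so only a small, random subset of the sockets is readable in each iteration. Because the sketch uses select directly, it is limited by the per-process descriptor limit and says nothing about the overhead of any particular event loop.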
		1045	=head3 Explanation of the columns
		1046
		1047	I<sockets> is the number of sockets, and twice the number of "servers" (as
		1048	each server has a read and write socket end).
		1049
		1050	I<create> is the time it takes to create a socketpair (which is
		1051	nontrivial) and two watchers: an I/O watcher and a timeout watcher.
		1052
		1053	I<request>, the most important value, is the time it takes to handle a
		1054	single "request", that is, reading the token from the pipe and forwarding
		1055	it to another server. This includes deleting the old timeout and creating
		1056	a new one that moves the timeout into the future.
		1057
		1058	=head3 Results
		1059
		1060	name  sockets create  request
		1061	EV      20000  69.01    11.16
		1062	Perl    20000  75.28   112.76
		1063	Event   20000 212.62   257.32
		1064	Glib    20000 651.16  1896.30
		1065	POE     20000 349.67 12317.24  uses POE::Loop::Event
		1066
		1067	=head3 Discussion
		1068
		1069	This benchmark I<does> measure scalability and overall performance of the
		1070	particular event loop.
		1071
		1072	EV is again fastest. Since it is using epoll on my system, the setup time
		1073	is relatively high, though.
		1074
		1075	Perl surprisingly comes second. It is much faster than the C-based event
		1076	loops Event and Glib.
		1077
		1078	Event suffers from high setup time as well (look at its code and you will
		1079	understand why). Callback invocation also has a high overhead compared to
		1080	the C<< $_->() for .. >>-style loop that the Perl event loop uses. Event
		1081	uses select or poll in basically all documented configurations.
		1082
		1083	Glib is hit hard by its quadratic behaviour w.r.t. many watchers. It
		1084	clearly fails to perform with many filehandles or in busy servers.
		1085
		1086	POE is still completely out of the picture, taking over 1000 times as long
		1087	as EV, and over 100 times as long as the Perl implementation, even though
		1088	it uses a C-based event loop in this case.
		1089
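The ratios quoted in this discussion can be checked directly against the request column of the results table. The following is a small sketch; the C<slowdown> helper is a hypothetical name introduced only for this illustration, and the figures are copied from the table as-is.

```perl
use strict;
use warnings;

# Request times copied from the large-server results table.
my %request = (
    EV    => 11.16,
    Perl  => 112.76,
    Event => 257.32,
    Glib  => 1896.30,
    POE   => 12317.24,
);

# How many times longer does $name take per request than $base?
sub slowdown {
    my ($name, $base) = @_;
    return $request{$name} / $request{$base};
}

# Print each loop's slowdown relative to EV and to the pure perl loop,
# fastest loop first.
printf "%-6s %8.1fx EV %7.1fx Perl\n",
    $_, slowdown($_, "EV"), slowdown($_, "Perl")
    for sort { $request{$a} <=> $request{$b} } keys %request;
```

For POE this gives a factor of roughly 1100 relative to EV and roughly 109 relative to the pure perl loop, consistent with the "over 1000 times" and "over 100 times" statements.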
		1090	=head3 Summary
		1091
		1092	=over 4
		1093
		1094	=item * The pure perl implementation performs extremely well, considering
		1095	that it uses select.
		1096
		1097	=item * Avoid Glib or POE in large projects where performance matters.
		1098
		1099	=back
		1100
		1101	=head2 BENCHMARKING SMALL SERVERS
		1102
		1103	While event loops should scale (and select-based ones do not...) even to
		1104	large servers, most programs we (or I :) actually write have only a few
		1105	I/O watchers.
		1106
		1107	In this benchmark, I use the same benchmark program as in the large server
		1108	case, but it uses only eight "servers", of which three are active at any
		1109	one time. This should reflect performance for a small server relatively
		1110	well.
		1111
		1112	The columns are identical to the previous table.
		1113
		1114	=head3 Results
		1115
		1116	name  sockets create request
		1117	EV         16  20.00    6.54
		1118	Event      16  81.27   35.86
		1119	Glib       16  32.63   15.48
		1120	Perl       16  24.62  162.37
		1121	POE        16 261.87  276.28  uses POE::Loop::Event
		1122
		1123	=head3 Discussion
		1124
		1125	The benchmark tries to test the performance of a typical small
		1126	server. While knowing how various event loops perform is interesting, keep
		1127	in mind that their overhead in this case is usually not as important, due
		1128	to the small absolute number of watchers.
		1129
		1130	EV is again fastest.
		1131
		1132	The C-based event loops Event and Glib come in second this time, as the
		1133	overhead of running an iteration is much smaller in C than in Perl (little
		1134	code to execute in the inner loop, and perl's function calling overhead is
		1135	high, and updating all the data structures is costly).
		1136
		1137	The pure perl event loop is much slower, but still competitive.
		1138
		1139	POE also performs much better in this case, but it is still far behind the
		1140	others.
		1141
		1142	=head3 Summary
		1143
		1144	=over 4
		1145
		1146	=item * C-based event loops perform very well with a small number of
		1147	watchers, as the management overhead dominates.
		1148
		1149	=back
1016		1150
1017		1151
1018	=head1 FORK	1152	=head1 FORK
1019		1153
1020	Most event libraries are not fork-safe. The ones who are usually are	1154	Most event libraries are not fork-safe. The ones who are usually are
