[ViewVC] Diff of: cvs/AnyEvent-Fork-Pool/Pool.pm

Comparing AnyEvent-Fork-Pool/Pool.pm (file contents):
Revision 1.5 by root, Sat Apr 20 19:33:23 2013 UTC vs.
Revision 1.6 by root, Sun Apr 21 11:17:02 2013 UTC

…		…
6		6
7	use AnyEvent;	7	use AnyEvent;
8	use AnyEvent::Fork::Pool;	8	use AnyEvent::Fork::Pool;
9	# use AnyEvent::Fork is not needed	9	# use AnyEvent::Fork is not needed
10		10
11	# all parameters with default values	11	# all possible parameters shown, with default values
12	my $pool = AnyEvent::Fork	12	my $pool = AnyEvent::Fork
13	->new	13	->new
14	->require ("MyWorker")	14	->require ("MyWorker")
15	->AnyEvent::Fork::Pool::run (	15	->AnyEvent::Fork::Pool::run (
16	"MyWorker::run", # the worker function	16	"MyWorker::run", # the worker function
17		17
18	# pool management	18	# pool management
19	max => 4, # absolute maximum # of processes	19	max => 4, # absolute maximum # of processes
20	idle => 2, # minimum # of idle processes	20	idle => 0, # minimum # of idle processes
21	load => 2, # queue at most this number of jobs per process	21	load => 2, # queue at most this number of jobs per process
22	start => 0.1, # wait this many seconds before starting a new process	22	start => 0.1, # wait this many seconds before starting a new process
23	stop => 1, # wait this many seconds before stopping an idle process	23	stop => 10, # wait this many seconds before stopping an idle process
24	on_destroy => (my $finish = AE::cv), # called when object is destroyed	24	on_destroy => (my $finish = AE::cv), # called when object is destroyed
25		25
26	# parameters passed to AnyEvent::Fork::RPC	26	# parameters passed to AnyEvent::Fork::RPC
27	async => 0,	27	async => 0,
28	on_error => sub { die "FATAL: $_[0]\n" },	28	on_error => sub { die "FATAL: $_[0]\n" },
…		…
48	pool of processes that handles jobs.	48	pool of processes that handles jobs.
49		49
50	Understanding of L<AnyEvent::Fork> is helpful but not critical to be able	50	Understanding of L<AnyEvent::Fork> is helpful but not critical to be able
51	to use this module, but a thorough understanding of L<AnyEvent::Fork::RPC>	51	to use this module, but a thorough understanding of L<AnyEvent::Fork::RPC>
52	is, as it defines the actual API that needs to be implemented in the	52	is, as it defines the actual API that needs to be implemented in the
53	children.	53	worker processes.
54		54
55	=head1 EXAMPLES	55	=head1 EXAMPLES
56		56
57	=head1 PARENT USAGE	57	=head1 PARENT USAGE
		58
		59	To create a pool, you first have to create a L<AnyEvent::Fork> object -
		60	this object becomes your template process. Whenever a new worker process
		61	is needed, it is forked from this template process. Then you need to
		62	"hand off" this template process to the C<AnyEvent::Fork::Pool> module by
		63	calling its run method on it:
		64
		65	my $template = AnyEvent::Fork
		66	->new
		67	->require ("SomeModule", "MyWorkerModule");
		68
		69	my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");
		70
		71	The pool "object" is not a regular Perl object, but a code reference that
		72	you can call and that works roughly like calling the worker function
		73	directly, except that it returns nothing but instead you need to specify a
		74	callback to be invoked once results are in:
		75
		76	$pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
58		77
59	=over 4	78	=over 4
60		79
61	=cut	80	=cut
62		81
…		…
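Note on the call syntax in the PARENT USAGE hunk above: `$template->AnyEvent::Fork::Pool::run (...)` is Perl's fully qualified method call. It invokes the named function with the invocant as its first argument, so it is equivalent to the "traditional" `AnyEvent::Fork::Pool::run $template, ...` form. The sketch below shows this with stand-in packages. `Demo::Fork`, `Demo::Pool` and the returned closure are illustrative assumptions, not the module's code, and the job runs in-process rather than in a forked worker.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the template object; not AnyEvent::Fork.
package Demo::Fork;
sub new { bless {}, shift }

# Hypothetical stand-in for the pool module; it only mimics the calling
# convention (template first, then function name, then key => value pairs).
package Demo::Pool;
sub run {
    my ($template, $function, %arg) = @_;

    # Return a code reference, like the pool "object" described above:
    # job arguments first, result callback last. Nothing is forked here.
    return sub {
        my $cb = pop;
        no strict 'refs';
        $cb->(&$function(@_));
    };
}

package main;

sub add { my $sum = 0; $sum += $_ for @_; $sum }

my $template = Demo::Fork->new;

# Both spellings call Demo::Pool::run with $template as the first argument.
my $pool_a = Demo::Pool::run ($template, "main::add", max => 4);
my $pool_b = $template->Demo::Pool::run ("main::add", max => 4);

our @results;
$pool_a->(1, 2, 3, sub { push @results, $_[0] });
$pool_b->(1, 2, 3, sub { push @results, $_[0] });

print "results: @results\n";
```

With the real module the callback would fire later from the event loop, so the caller would typically wait on a condition variable instead of reading the results immediately.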
81		100
82	our $VERSION = 0.1;	101	our $VERSION = 0.1;
83		102
84	=item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]	103	=item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
85		104
86	The traditional way to call it. But it is way cooler to call it in the	105	The traditional way to call the pool creation function. But it is way
87	following way:	106	cooler to call it in the following way:
88		107
89	=item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])	108	=item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
90		109
91	Creates a new pool object with the specified C<$function> as function	110	Creates a new pool object with the specified C<$function> as function
92	(name) to call for each request. The pool uses the C<$fork> object as the	111	(name) to call for each request. The pool uses the C<$fork> object as the
…		…
105		124
106	The pool consists of a certain number of worker processes. These options	125	The pool consists of a certain number of worker processes. These options
107	decide how many of these processes exist and when they are started and	126	decide how many of these processes exist and when they are started and
108	stopped.	127	stopped.
109		128
		129	The worker pool is dynamically resized, according to (perceived :)
		130	load. The minimum size is given by the C<idle> parameter and the maximum
		131	size is given by the C<max> parameter. A new worker is started every
		132	C<start> seconds at most, and an idle worker is stopped at most every
		133	C<stop> second.
		134
		135	You can specify the amount of jobs sent to a worker concurrently using the
		136	C<load> parameter.
		137
110	=over 4	138	=over 4
111		139
112	=item idle => $count (default: 0)	140	=item idle => $count (default: 0)
113		141
114	The minimum amount of idle processes in the pool - when there are fewer	142	The minimum amount of idle processes in the pool - when there are fewer
115	than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new	143	than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new
116	ones, subject to C<max> and C<start>.	144	ones, subject to the limits set by C<max> and C<start>.
117		145
118	This is also the initial/minimum amount of workers in the pool. The	146	This is also the initial amount of workers in the pool. The default of
119	default of zero means that the pool starts empty and can shrink back to	147	zero means that the pool starts empty and can shrink back to zero workers
120	zero workers over time.	148	over time.
121		149
122	=item max => $count (default: 4)	150	=item max => $count (default: 4)
123		151
124	The maximum number of processes in the pool, in addition to the template	152	The maximum number of processes in the pool, in addition to the template
125	process. C<AnyEvent::Fork::Pool> will never create more than this number	153	process. C<AnyEvent::Fork::Pool> will never have more than this number of
126	of worker processes, although there can be more temporarily when a worker	154	worker processes, although there can be more temporarily when a worker is
127	is shut down and hasn't exited yet.	155	shut down and hasn't exited yet.
128		156
129	=item load => $count (default: 2)	157	=item load => $count (default: 2)
130		158
131	The maximum number of concurrent jobs sent to a single worker	159	The maximum number of concurrent jobs sent to a single worker process.
132	process. Worker processes that handle this number of jobs already are
133	called "busy".
134		160
135	Jobs that cannot be sent to a worker immediately (because all workers are	161	Jobs that cannot be sent to a worker immediately (because all workers are
136	busy) will be queued until a worker is available.	162	busy) will be queued until a worker is available.
137		163
		164	Setting this low improves latency. For example, at C<1>, every job that
		165	is sent to a worker is sent to a completely idle worker that doesn't run
		166	any other jobs. The downside is that throughput is reduced - a worker that
		167	finishes a job needs to wait for a new job from the parent.
		168
		169	The default of C<2> is usually a good compromise.
		170
138	=item start => $seconds (default: 0.1)	171	=item start => $seconds (default: 0.1)
139		172
140	When a job is queued and all workers are busy, a timer is started. If the	173	When there are fewer than C<idle> workers (or all workers are completely
141	timer elapses and there are still jobs that cannot be queued to a worker,	174	busy), then a timer is started. If the timer elapses and there are still
142	a new worker is started.	175	jobs that cannot be queued to a worker, a new worker is started.
143		176
144	This configurs the time that all workers must be busy before a new worker	177	This sets the minimum time that all workers must be busy before a new
145	is started. Or, put differently, the minimum delay betwene starting new	178	worker is started. Or, put differently, the minimum delay between starting
146	workers.	179	new workers.
147		180
148	The delay is zero by default, which means new workers will be started	181	The delay is small by default, which means new workers will be started
149	without delay.	182	relatively quickly. A delay of C<0> is possible, and ensures that the pool
		183	will grow as quickly as possible under load.
150		184
		185	Non-zero values are useful to avoid "exploding" a pool because a lot of
		186	jobs are queued in an instant.
		187
		188	Higher values are often useful to improve efficiency at the cost of
		189	latency - when fewer processes can do the job over time, starting more and
		190	more is not necessarily going to help.
		191
151	=item stop => $seconds (default: 1)	192	=item stop => $seconds (default: 10)
152		193
153	When a worker has no jobs to execute it becomes idle. An idle worker that	194	When a worker has no jobs to execute it becomes idle. An idle worker that
154	hasn't executed a job within this amount of time will be stopped, unless	195	hasn't executed a job within this amount of time will be stopped, unless
155	the other parameters say otherwise.	196	the other parameters say otherwise.
156		197
		198	Setting this to a very high value means that workers stay around longer,
		199	even when they have nothing to do, which can be good as they don't have to
		200	be started on the netx load spike again.
		201
		202	Setting this to a lower value can be useful to avoid memory or simply
		203	process table wastage.
		204
		205	Usually, setting this to a time longer than the time between load spikes
		206	is best - if you expect a lot of requests every minute and little work
		207	in between, setting this to longer than a minute avoids having to stop
		208	and start workers. On the other hand, you have to ask yourself if letting
		209	workers run idle is a good use of your resources. Try to find a good
		210	balance between resource usage of your workers and the time to start new
		211	workers - the processes created by L<AnyEvent::Fork> itself is fats at
		212	creating workers while not using much memory for them, so most of the
		213	overhead is likely from your own code.
		214
157	=item on_destroy => $callback->() (default: none)	215	=item on_destroy => $callback->() (default: none)
158		216
159	When a pool object goes out of scope, it will still handle all outstanding	217	When a pool object goes out of scope, the outstanding requests are still
160	jobs. After that, it will destroy all workers (and also the template	218	handled till completion. Only after handling all jobs will the workers
161	process if it isn't referenced otherwise).	219	be destroyed (and also the template process if it isn't referenced
		220	otherwise).
		221
		222	To find out when a pool I<really> has finished its work, you can set this
		223	callback, which will be called when the pool has been destroyed.
162		224
163	=back	225	=back
164		226
165	=item Template Process	227	=item AnyEvent::Fork::RPC Parameters
166		228
167	The worker processes are all forked from a single template	229	These parameters are all passed more or less directly to
168	process. Ideally, all modules and all cdoe used by the worker, as well as	230	L<AnyEvent::Fork::RPC>. They are only briefly mentioned here, for
169	any shared data structures should be loaded into the template process, to	231	their full documentation please refer to the L<AnyEvent::Fork::RPC>
170	take advantage of data sharing via fork.	232	documentation. Also, the default values mentioned here are only documented
171		233	as a best effort - the L<AnyEvent::Fork::RPC> documentation is binding.
172	You can create your own template process by creating a L<AnyEvent::Fork>
173	object yourself and passing it as the C<template> parameter, but
174	C<AnyEvent::Fork::Pool> can create one for you, including some standard
175	options.
176		234
177	=over 4	235	=over 4
178		236
179	=item template => $fork (default: C<< AnyEvent::Fork->new >>)
180
181	The template process to use, if you want to create your own.
182
183	=item require => \@modules (default: C<[]>)
184
185	The modules in this list will be laoded into the template process.
186
187	=item eval => "# perl code to execute in template" (default: none)
188
189	This is a perl string that is evaluated after creating the template
190	process and after requiring the modules. It can do whatever it wants to
191	configure the process, but it must not do anything that would keep a later
192	fork from working (so must not create event handlers or (real) threads for
193	example).
194
195	=back
196
197	=item AnyEvent::Fork::RPC Parameters
198
199	These parameters are all passed directly to L<AnyEvent::Fork::RPC>. They
200	are only briefly mentioned here, for their full documentation
201	please refer to the L<AnyEvent::Fork::RPC> documentation. Also, the
202	default values mentioned here are only documented as a best effort -
203	L<AnyEvent::Fork::RPC> documentation is binding.
204
205	=over 4
206
207	=item async => $boolean (default: 0)	237	=item async => $boolean (default: 0)
208		238
209	Whether to sue the synchronous or asynchronous RPC backend.	239	Whether to use the synchronous or asynchronous RPC backend.
210		240
211	=item on_error => $callback->($message) (default: die with message)	241	=item on_error => $callback->($message) (default: die with message)
212		242
213	The callback to call on any (fatal) errors.	243	The callback to call on any (fatal) errors.
214		244
…		…
235		265
236	my $max = $arg{max} || 4;	266	my $max = $arg{max} || 4;
237	my $idle = $arg{idle} || 0,	267	my $idle = $arg{idle} || 0,
238	my $load = $arg{load} || 2,	268	my $load = $arg{load} || 2,
239	my $start = $arg{start} || 0.1,	269	my $start = $arg{start} || 0.1,
240	my $stop = $arg{stop} || 1,	270	my $stop = $arg{stop} || 10,
241	my $on_event = $arg{on_event} || sub { },	271	my $on_event = $arg{on_event} || sub { },
242	my $on_destroy = $arg{on_destroy};	272	my $on_destroy = $arg{on_destroy};
243		273
244	my @rpc = (	274	my @rpc = (
245	async => $arg{async},	275	async => $arg{async},
…		…
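Review note on the defaults hunk above: as far as this hunk shows, the defaults are applied with the logical-or operator, so any false value supplied by the caller is replaced by the default. In particular, `start => 0` would end up as `0.1`, although the revised documentation for `start` says a delay of `0` is possible. The sketch below contrasts this with the defined-or operator `//` (Perl 5.10 and later) as one possible alternative. The helper names `default_or` and `default_dor` are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helpers that mirror the two ways of applying a default.
sub default_or  { my (%arg) = @_; $arg{start} || 0.1 }   # logical-or, as in the hunk above
sub default_dor { my (%arg) = @_; $arg{start} // 0.1 }   # defined-or alternative

# Compare both helpers for a missing, a zero and a non-zero value.
for my $case ([], [start => 0], [start => 0.5]) {
    my $label = @$case ? "start => $case->[1]" : "(no start)";
    my $or    = default_or  (@$case);
    my $dor   = default_dor (@$case);
    print "$label: logical-or gives $or, defined-or gives $dor\n";
}
```

The two helpers only differ for defined but false values such as `0`. For a missing key or a true value they return the same result.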
258		288
259	$template	289	$template
260	->require ("AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync"))	290	->require ("AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync"))
261	->eval ('	291	->eval ('
262	my ($magic0, $magic1) = @_;	292	my ($magic0, $magic1) = @_;
263	sub AnyEvent::Fork::Pool::quit() {	293	sub AnyEvent::Fork::Pool::retire() {
264	AnyEvent::Fork::RPC::on_event $magic0, "quit", $magic1;	294	AnyEvent::Fork::RPC::event $magic0, "quit", $magic1;
265	}	295	}
266	', $magic0, $magic1)	296	', $magic0, $magic1)
267	->eval ($arg{eval});	297	;
268		298
269	$start_worker = sub {	299	$start_worker = sub {
270	my $proc = [0, 0, undef]; # load, index, rpc	300	my $proc = [0, 0, undef]; # load, index, rpc
271		301
272	$proc->[2] = $template	302	$proc->[2] = $template
…		…
397		427
398	=item $pool->(..., $cb->(...))	428	=item $pool->(..., $cb->(...))
399		429
400	Call the RPC function of a worker with the given arguments, and when the	430	Call the RPC function of a worker with the given arguments, and when the
401	worker is done, call the C<$cb> with the results, just like calling the	431	worker is done, call the C<$cb> with the results, just like calling the
402	L<AnyEvent::Fork::RPC> object directly.	432	RPC object durectly - see the L<AnyEvent::Fork::RPC> documentation for
		433	details on the RPC API.
403		434
404	If there is no free worker, the call will be queued.	435	If there is no free worker, the call will be queued until a worker becomes
		436	available.
405		437
406	Note that there can be considerable time between calling this method and	438	Note that there can be considerable time between calling this method and
407	the call actually being executed. During this time, the parameters passed	439	the call actually being executed. During this time, the parameters passed
408	to this function are effectively read-only - modifying them after the call	440	to this function are effectively read-only - modifying them after the call
409	and before the callback is invoked causes undefined behaviour.	441	and before the callback is invoked causes undefined behaviour.
410		442
411	=cut	443	=cut
412		444
413	=back	445	=back
414		446
		447	=head1 CHILD USAGE
		448
		449	In addition to the L<AnyEvent::Fork::RPC> API, this module implements one
		450	more child-side function:
		451
		452	=over 4
		453
		454	=item AnyEvent::Fork::Pool::retire ()
		455
		456	This function sends an event to the parent process to request retirement:
		457	the worker is removed from the pool and no new jobs will be sent to it,
		458	but it has to handle the jobs that are already queued.
		459
		460	The parentheses are part of the syntax: the function usually isn't defined
		461	when you compile your code (because that happens I<before> handing the
		462	template process over to C<AnyEvent::Fork::Pool::run>, so you need the
		463	empty parentheses to tell Perl that the function is indeed a function.
		464
		465	Retiring a worker can be useful to gracefully shut it down when the worker
		466	deems this useful. For example, after executing a job, one could check
		467	the process size or the number of jobs handled so far, and if either is
		468	too high, the worker could ask to get retired, to avoid memory leaks to
		469	accumulate.
		470
		471	=back
		472
		473	=head1 POOL PARAMETERS RECIPES
		474
		475	This section describes some recipes for pool paramaters. These are mostly
		476	meant for the synchronous RPC backend, as the asynchronous RPC backend
		477	changes the rules considerably, making workers themselves responsible for
		478	their scheduling.
		479
		480	=over 4
		481
		482	=item low latency - set load = 1
		483
		484	If you need a deterministic low latency, you should set the C<load>
		485	parameter to C<1>. This ensures that never more than one job is sent to
		486	each worker. This avoids having to wait for a previous job to finish.
		487
		488	This makes most sense with the synchronous (default) backend, as the
		489	asynchronous backend can handle multiple requests concurrently.
		490
		491	=item lowest latency - set load = 1 and idle = max
		492
		493	To achieve the lowest latency, you additionally should disable any dynamic
		494	resizing of the pool by setting C<idle> to the same value as C<max>.
		495
		496	=item high throughput, cpu bound jobs - set load >= 2, max = #cpus
		497
		498	To get high throughput with cpu-bound jobs, you should set the maximum
		499	pool size to the number of cpus in your system, and C<load> to at least
		500	C<2>, to make sure there can be another job waiting for the worker when it
		501	has finished one.
		502
		503	The value of C<2> for C<load> is the minimum value that I<can> achieve
		504	100% throughput, but if your parent process itself is sometimes busy, you
		505	might need higher values. Also there is a limit on the amount of data that
		506	can be "in flight" to the worker, so if you send big blobs of data to your
		507	worker, C<load> might have much less of an effect.
		508
		509	=item high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
		510
		511	When your jobs are I/O bound, using more workers usually boils down to
		512	higher throughput, depending very much on your actual workload - sometimes
		513	having only one worker is best, for example, when you read or write big
		514	files at maixmum speed, as a second worker will increase seek times.
		515
		516	=back
		517
415	=head1 SEE ALSO	518	=head1 SEE ALSO
416		519
417	L<AnyEvent::Fork>, to create the processes in the first place.	520	L<AnyEvent::Fork>, to create the processes in the first place.
418		521
419	L<AnyEvent::Fork::RPC>, which implements the RPC protocol and API.	522	L<AnyEvent::Fork::RPC>, which implements the RPC protocol and API.
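
As an illustration of the CHILD USAGE text added in this revision, a worker might request retirement after a fixed number of jobs. This is a sketch only: `MyWorker::run`, `$MAX_JOBS` and the job body are assumptions. The `BEGIN` block installs a stub `retire` so that the snippet compiles and runs outside a pool; real worker code would drop that block, since the diff shows `AnyEvent::Fork::Pool::run` installing the function in the template process.

```perl
use strict;
use warnings;

our @retire_requests;

BEGIN {
    # Stand-in so this sketch runs outside a pool. Drop this block in real
    # worker code, where AnyEvent::Fork::Pool::run installs the function.
    unless (defined &AnyEvent::Fork::Pool::retire) {
        no strict 'refs';
        *{"AnyEvent::Fork::Pool::retire"} = sub () { push @main::retire_requests, $$ };
    }
}

package MyWorker;

my $MAX_JOBS = 3;   # hypothetical threshold
my $jobs     = 0;   # jobs handled by this process so far

sub run {
    my @args = @_;

    my $result = 0;
    $result += $_ for @args;   # placeholder for the real job

    # Ask the parent once to stop sending new jobs to this worker.
    # The empty parentheses are written out, as the documentation requires.
    AnyEvent::Fork::Pool::retire () if ++$jobs == $MAX_JOBS;

    return $result;
}

package main;

for my $n (1 .. 4) {
    my $sum = MyWorker::run (1, 2, 3);
    print "job $n returned $sum\n";
}

print "retire requested ", scalar @retire_requests, " time(s)\n";
```

The check uses `==` rather than `>=` so that the request is sent only once, because a retired worker still has to finish the jobs that are already queued to it.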
