NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE

SYNOPSIS
    use AnyEvent;
    use AnyEvent::Fork::Pool;
    # use AnyEvent::Fork is not needed

    # all possible parameters shown, with default values
    my $pool = AnyEvent::Fork
       ->new
       ->require ("MyWorker")
       ->AnyEvent::Fork::Pool::run (
            "MyWorker::run", # the worker function

            # pool management
            max        => 4,   # absolute maximum # of processes
            idle       => 0,   # minimum # of idle processes
            load       => 2,   # queue at most this number of jobs per process
            start      => 0.1, # wait this many seconds before starting a new process
            stop       => 10,  # wait this many seconds before stopping an idle process
            on_destroy => (my $finish = AE::cv), # called when object is destroyed

            # parameters passed to AnyEvent::Fork::RPC
            async      => 0,
            on_error   => sub { die "FATAL: $_[0]\n" },
            on_event   => sub { my @ev = @_ },
            init       => "MyWorker::init",
            serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
       );

    for (1..10) {
       $pool->(doit => $_, sub {
          print "MyWorker::run returned @_\n";
       });
    }

    undef $pool;

    $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork and the RPC
    protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
    pool of processes that handles jobs.

    An understanding of AnyEvent::Fork is helpful but not critical to use
    this module; a thorough understanding of AnyEvent::Fork::RPC is,
    however, as it defines the actual API that needs to be implemented in
    the worker processes.

EXAMPLES

PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing - instead, you need
    to specify a callback to be invoked once the results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
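
    For completeness, here is a minimal sketch of what the worker side
    could look like (assuming the default synchronous backend, where - as
    described in AnyEvent::Fork::RPC - the arguments of each call are
    passed to the worker function and its return values are passed to the
    callback):

       package MyWorkerModule;

       sub myfunction {
          my (@args) = @_;

          # do the actual work here - whatever is returned
          # ends up as @_ in the result callback in the parent
          return "got (@args)";
       }

       1;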

    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories "pool
        management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and when
            they are started and stopped.

            The worker pool is dynamically resized, according to (perceived
            :) load. The minimum size is given by the "idle" parameter and
            the maximum size is given by the "max" parameter. A new worker
            is started every "start" seconds at most, and an idle worker is
            stopped at most every "stop" seconds.

            You can specify the number of jobs sent to a worker
            concurrently using the "load" parameter. A configuration sketch
            follows the parameter descriptions below.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool. The
                default of zero means that the pool starts empty and can
                shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition to
                the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1, every
                job that is sent to a worker is sent to a completely idle
                worker that doesn't run any other jobs. The downside is
                that throughput is reduced - a worker that finishes a job
                needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the timer
                elapses and there are still jobs that cannot be queued to a
                worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers will
                be started relatively quickly. A delay of 0 is possible,
                and ensures that the pool will grow as quickly as possible
                under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at the
                cost of latency - when fewer processes can do the job over
                time, starting more and more is not necessarily going to
                help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which can
                be good as they don't have to be started on the next load
                spike again.

                Setting this to a lower value can be useful to avoid memory
                or simply process table wastage.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                whether letting workers run idle is a good use of your
                resources. Try to find a good balance between the resource
                usage of your workers and the time to start new workers -
                AnyEvent::Fork itself is fast at creating workers and
                doesn't use much memory for them, so most of the overhead
                is likely from your own code.

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work, you
                can set this callback, which will be called when the pool
                has been destroyed.
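
            As a rough sketch (the values here are purely illustrative, not
            recommendations), a pool that keeps two warm workers, grows to
            at most eight under load and retires idle workers after a
            minute could be configured like this:

               my $pool = $template->AnyEvent::Fork::Pool::run (
                  "MyWorkerModule::myfunction",
                  idle  => 2,   # keep two workers ready at all times
                  max   => 8,   # never use more than eight workers
                  load  => 2,   # queue at most two jobs per worker
                  start => 0.5, # wait half a second before forking another worker
                  stop  => 60,  # stop workers that have been idle for a minute
               );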

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.
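
            For illustration, a sketch of how "on_event" and a worker that
            reports progress might fit together (this assumes the
            "AnyEvent::Fork::RPC::event" function from AnyEvent::Fork::RPC;
            its documentation is binding):

               # parent side
               my $pool = $template->AnyEvent::Fork::Pool::run (
                  "MyWorkerModule::myfunction",
                  on_event => sub {
                     my ($type, @args) = @_;
                     warn "progress: @args\n" if $type eq "progress";
                  },
               );

               # worker side, somewhere inside MyWorkerModule::myfunction
               # ($percent_done is a hypothetical value computed by the job)
               AnyEvent::Fork::RPC::event (progress => $percent_done);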

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.
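
        For example, to submit a batch of jobs and find out when all of
        their callbacks have fired, one possible pattern (plain AnyEvent,
        nothing this module mandates) is a condition variable with
        "begin"/"end":

           my $done = AE::cv;

           for my $job (1..100) {
              $done->begin;
              $pool->(doit => $job, sub {
                 print "job $job returned @_\n";
                 $done->end;
              });
           }

           $done->recv; # returns once all 100 callbacks have run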

    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called CPU cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU-bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance,
        but in those rare cases where that really helps it might be
        beneficial to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is
        used for $cpus.

        Example: create a worker pool with as many workers as CPU cores, or
        2, if the actual number could not be determined.

           $fork->AnyEvent::Fork::Pool::run ("myworker::function",
              max => (scalar AnyEvent::Fork::Pool::ncpu 2),
           );
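
        In list context, the second return value could be used when a
        workload is known to benefit from hyperthreading - a sketch only:

           my ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu 2;

           # one worker per execution unit instead of one per core
           my $pool = $fork->AnyEvent::Fork::Pool::run (
              "myworker::function",
              max => $eus,
           );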

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it has to handle the jobs that are already
        queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to "AnyEvent::Fork::Pool::run"),
        so you need the empty parentheses to tell Perl that the function is
        indeed a function.

        Retiring a worker can be useful to gracefully shut it down when the
        worker deems this useful. For example, after executing a job, one
        could check the process size or the number of jobs handled so far,
        and if either is too high, the worker could ask to get retired, to
        keep memory leaks from accumulating.

        Example: retire a worker after it has handled roughly 100 requests.

           my $count = 0;

           sub my::worker {

              ++$count == 100
                 and AnyEvent::Fork::Pool::retire ();

              ... normal code goes here
           }

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need a deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker. This avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value as
        "max" (see the sketch at the end of this section).

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually translates
        into higher throughput, depending very much on your actual workload
        - sometimes having only one worker is best, for example, when you
        read or write big files at maximum speed, as a second worker will
        increase seek times.
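
    As a sketch of the "lowest latency" recipe (the worker function name is
    only an example), the following creates a fixed-size pool with one job
    per worker and one worker per detected CPU:

       my $ncpu = AnyEvent::Fork::Pool::ncpu 2;

       my $pool = $fork->AnyEvent::Fork::Pool::run (
          "myworker::function",
          load => 1,      # never send more than one job to a worker
          idle => $ncpu,  # keep all workers running...
          max  => $ncpu,  # ...and never start more than that
       );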

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions will
    not be caught, and exceptions in both the worker and in callbacks cause
    undesirable or undefined behaviour.
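
    If your worker function can die, one way to stay within this policy (a
    sketch, not something this module requires) is to catch exceptions in
    the worker yourself and turn them into ordinary return values:

       sub MyWorker::run {
          my @result = eval {
             ...; # the actual work goes here
          };

          # report errors as data instead of letting the exception
          # escape into AnyEvent::Fork::RPC
          return $@ ? (error => "$@") : (ok => @result);
       }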

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool