Revision: 1.5
Committed: Thu Oct 27 07:27:57 2022 UTC (18 months, 1 week ago) by root
Branch: MAIN
CVS Tags: rel-1_3, HEAD
Changes since 1.4: +4 -4 lines
Log Message:
1.3

NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

SYNOPSIS
       use AnyEvent;
       use AnyEvent::Fork;
       use AnyEvent::Fork::Pool;

       # all possible parameters shown, with default values
       my $pool = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::Pool::run (
               "MyWorker::run", # the worker function

               # pool management
               max        => 4,   # absolute maximum # of processes
               idle       => 0,   # minimum # of idle processes
               load       => 2,   # queue at most this number of jobs per process
               start      => 0.1, # wait this many seconds before starting a new process
               stop       => 10,  # wait this many seconds before stopping an idle process
               on_destroy => (my $finish = AE::cv), # called when object is destroyed

               # parameters passed to AnyEvent::Fork::RPC
               async      => 0,
               on_error   => sub { die "FATAL: $_[0]\n" },
               on_event   => sub { my @ev = @_ },
               init       => "MyWorker::init",
               serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
          );

       for (1..10) {
          $pool->(doit => $_, sub {
             print "MyWorker::run returned @_\n";
          });
       }

       undef $pool;

       $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork (or
    AnyEvent::Fork::Remote) and the RPC protocol implemented in
    AnyEvent::Fork::RPC to create a load-balanced pool of processes that
    handles jobs.

    Understanding AnyEvent::Fork is helpful but not required to use this
    module. A thorough understanding of AnyEvent::Fork::RPC is required,
    however, as it defines the actual API that needs to be implemented in
    the worker processes.

PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing; instead, you need
    to specify a callback to be invoked once results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

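    For illustration, a minimal worker module for the example above might
    look like this (the module and function names are hypothetical; with
    the default synchronous backend, whatever the worker function returns
    is what gets passed to the job callback in the parent - see the
    AnyEvent::Fork::RPC documentation for the exact worker-side API):

       # MyWorkerModule.pm - a hypothetical worker module
       package MyWorkerModule;

       use strict;
       use warnings;

       # called in a worker process for every job; it may block
       # freely, as each worker is a separate process
       sub myfunction {
          my (@args) = @_;

          # ... do the actual (possibly blocking) work here ...

          # return values are delivered to the job callback in the parent
          "result for @args"
       }

       1;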
    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key =>
    value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories
        "pool management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and
            when they are started and stopped.

            The worker pool is dynamically resized, according to
            (perceived :) load. The minimum size is given by the "idle"
            parameter and the maximum size is given by the "max"
            parameter. A new worker is started every "start" seconds at
            most, and an idle worker is stopped every "stop" seconds at
            most.

            You can specify the number of jobs sent to a worker
            concurrently using the "load" parameter.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool.
                The default of zero means that the pool starts empty and
                can shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition
                to the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1,
                every job that is sent to a worker is sent to a completely
                idle worker that doesn't run any other jobs. The downside
                is that throughput is reduced - a worker that finishes a
                job needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the
                timer elapses and there are still jobs that cannot be
                queued to a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers
                will be started relatively quickly. A delay of 0 is
                possible, and ensures that the pool will grow as quickly
                as possible under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at
                the cost of latency - when fewer processes can do the job
                over time, starting more and more is not necessarily going
                to help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which
                can be good as they don't have to be started on the next
                load spike again.

                Setting this to a lower value can be useful to avoid
                memory or simply process table wastage.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                if letting workers run idle is a good use of your
                resources. Try to find a good balance between resource
                usage of your workers and the time to start new workers -
                AnyEvent::Fork itself is fast at creating workers while
                not using much memory for them, so most of the overhead is
                likely from your own code.

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work,
                you can set this callback, which will be called when the
                pool has been destroyed.

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC
                backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.

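                As a sketch, assuming the JSON serialiser shipped with
                AnyEvent::Fork::RPC (and a JSON module installed in both
                parent and worker), you could select a different
                serialiser like this (worker name hypothetical):

                   # pass structured data (arrays, hashes) instead of
                   # the plain octet strings the default serialiser allows
                   my $pool = $fork->AnyEvent::Fork::Pool::run (
                      "MyWorker::run",
                      serialiser => $AnyEvent::Fork::RPC::JSON_SERIALISER,
                   );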
    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.

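        For example (using the hypothetical pool and worker function from
        the SYNOPSIS), you can queue many jobs at once - jobs in excess of
        what the workers can accept are queued inside the pool - and wait
        for all of them with a condition variable:

           my $done = AE::cv;

           for my $id (1 .. 100) {
              $done->begin;
              $pool->(doit => $id, sub {
                 print "job $id returned @_\n";
                 $done->end;
              });
           }

           $done->recv; # returns after all 100 callbacks have run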
    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called CPU cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU-bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance,
        but in those rare cases where that really helps it might be
        beneficial to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is
        used for $cpus.

        Example: create a worker pool with as many workers as CPU cores,
        or 2, if the actual number could not be determined.

           $fork->AnyEvent::Fork::Pool::run ("myworker::function",
              max => (scalar AnyEvent::Fork::Pool::ncpu 2),
           );

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it still has to handle the jobs that are
        already queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to "AnyEvent::Fork::Pool::run"),
        so you need the empty parentheses to tell Perl that the function
        is indeed a function.

        Retiring a worker can be useful to gracefully shut it down when
        the worker deems this useful. For example, after executing a job,
        it could check the process size or the number of jobs handled so
        far, and if either is too high, the worker could request to be
        retired, to avoid accumulating memory leaks.

        Example: retire a worker after it has handled roughly 100
        requests. It doesn't matter whether you retire at the beginning or
        end of your request, as the worker will continue to handle some
        outstanding requests. Likewise, it's ok to call retire multiple
        times.

           my $count = 0;

           sub my::worker {

              ++$count == 100
                 and AnyEvent::Fork::Pool::retire ();

              ... # normal code goes here
           }

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need a deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker. This avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests
        concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value
        as "max".

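        A sketch of such a fixed-size, low-latency pool (worker name
        hypothetical):

           # four workers, all started up front ("idle" = "max"
           # disables dynamic resizing), one job per worker at a time
           my $pool = $fork->AnyEvent::Fork::Pool::run (
              "MyWorker::run",
              idle => 4,
              max  => 4,
              load => 1,
           );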
    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.

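        Combining this recipe with "AnyEvent::Fork::Pool::ncpu" might look
        like this (worker name hypothetical):

           # one worker per detected CPU (2 if detection fails),
           # keeping one extra job queued per worker
           my $pool = $fork->AnyEvent::Fork::Pool::run (
              "MyWorker::run",
              max  => (scalar AnyEvent::Fork::Pool::ncpu 2),
              load => 2,
           );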
    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually results
        in higher throughput, depending very much on your actual workload
        - sometimes having only one worker is best, for example, when you
        read or write big files at maximum speed, as a second worker will
        increase seek times.

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
    will not be caught, and exceptions both in the worker and in callbacks
    cause undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Remote, likewise, but helpful for remote processes.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool