NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE

SYNOPSIS
       use AnyEvent;
       use AnyEvent::Fork::Pool;
       # use AnyEvent::Fork is not needed

       # all possible parameters shown, with default values
       my $pool = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::Pool::run (
               "MyWorker::run", # the worker function

               # pool management
               max        => 4,   # absolute maximum # of processes
               idle       => 0,   # minimum # of idle processes
               load       => 2,   # queue at most this number of jobs per process
               start      => 0.1, # wait this many seconds before starting a new process
               stop       => 10,  # wait this many seconds before stopping an idle process
               on_destroy => (my $finish = AE::cv), # called when object is destroyed

               # parameters passed to AnyEvent::Fork::RPC
               async      => 0,
               on_error   => sub { die "FATAL: $_[0]\n" },
               on_event   => sub { my @ev = @_ },
               init       => "MyWorker::init",
               serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
          );

       for (1..10) {
          $pool->(doit => $_, sub {
             print "MyWorker::run returned @_\n";
          });
       }

       undef $pool;

       $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork and the RPC
    protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
    pool of processes that handles jobs.

    An understanding of AnyEvent::Fork is helpful but not critical to use
    this module; a thorough understanding of AnyEvent::Fork::RPC is,
    however, as it defines the actual API that needs to be implemented in
    the worker processes.

EXAMPLES
PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing; instead, you need
    to specify a callback to be invoked once results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

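For illustration, the worker side referenced above could look like the following minimal sketch. The module and function names ("MyWorkerModule::myfunction") are taken from the example; the body is hypothetical:

```perl
# MyWorkerModule.pm - a minimal, hypothetical worker module.
# With the default synchronous RPC backend, the worker function
# simply receives the request arguments and returns the result list.
package MyWorkerModule;

use strict;
use warnings;

sub myfunction {
   my (@args) = @_;

   # do some (cpu-bound) work and return the result list
   my $sum = 0;
   $sum += $_ for @args;

   ($sum)
}

1;
```

See AnyEvent::Fork::RPC for the exact calling conventions, especially for the asynchronous backend, where the worker function receives an additional result callback.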
    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories "pool
        management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and
            when they are started and stopped.

            The worker pool is dynamically resized, according to
            (perceived :) load. The minimum size is given by the "idle"
            parameter and the maximum size is given by the "max"
            parameter. A new worker is started every "start" seconds at
            most, and an idle worker is stopped at most every "stop"
            seconds.

            You can specify the number of jobs sent to a worker
            concurrently using the "load" parameter.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool.
                The default of zero means that the pool starts empty and
                can shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition
                to the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1,
                every job that is sent to a worker is sent to a completely
                idle worker that doesn't run any other jobs. The downside
                is that throughput is reduced - a worker that finishes a
                job needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the
                timer elapses and there are still jobs that cannot be
                queued to a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers
                will be started relatively quickly. A delay of 0 is
                possible, and ensures that the pool will grow as quickly
                as possible under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at
                the cost of latency - when fewer processes can do the job
                over time, starting more and more is not necessarily going
                to help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which
                can be good as they don't have to be started on the next
                load spike again.

                Setting this to a lower value can be useful to avoid
                memory or simply process table wastage.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                if letting workers run idle is a good use of your
                resources. Try to find a good balance between resource
                usage of your workers and the time to start new workers -
                AnyEvent::Fork itself is fast at creating workers while
                not using much memory for them, so most of the overhead is
                likely from your own code.

|
189 |
on_destroy => $callback->() (default: none) |
190 |
When a pool object goes out of scope, the outstanding |
191 |
requests are still handled till completion. Only after |
192 |
handling all jobs will the workers be destroyed (and also |
193 |
the template process if it isn't referenced otherwise). |
194 |
|
195 |
To find out when a pool *really* has finished its work, you |
196 |
can set this callback, which will be called when the pool |
197 |
has been destroyed. |
198 |
|
199 |
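A common pattern (also used in the SYNOPSIS) is to pair "on_destroy" with a condition variable to wait for a clean shutdown. A minimal sketch, assuming a worker function "MyWorker::run" exists:

```perl
use AnyEvent;
use AnyEvent::Fork;
use AnyEvent::Fork::Pool;

my $finish = AE::cv;

my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")
   ->AnyEvent::Fork::Pool::run ("MyWorker::run",
        on_destroy => $finish, # condvars can be used as callbacks
   );

$pool->($_, sub { }) for 1..10; # queue some jobs

undef $pool;   # no new jobs - outstanding jobs still complete
$finish->recv; # blocks until all workers have been destroyed
```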
        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC
                backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.

    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called cpu cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU-bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance,
        but in those rare cases where that really helps it might be
        beneficial to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is
        used for $cpus.

        Example: create a worker pool with as many workers as cpu cores,
        or 2, if the actual number could not be determined.

           $fork->AnyEvent::Fork::Pool::run ("myworker::function",
              max => (scalar AnyEvent::Fork::Pool::ncpu 2),
           );

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it has to handle the jobs that are already
        queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to
        "AnyEvent::Fork::Pool::run"), so you need the empty parentheses to
        tell Perl that the function is indeed a function.

        Retiring a worker can be useful to gracefully shut it down when
        the worker deems this useful. For example, after executing a job,
        one could check the process size or the number of jobs handled so
        far, and if either is too high, the worker could ask to get
        retired, to avoid accumulating memory leaks.

        Example: retire a worker after it has handled roughly 100
        requests.

           my $count = 0;

           sub my::worker {

              ++$count == 100
                 and AnyEvent::Fork::Pool::retire ();

              ... # normal code goes here
           }

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker, and avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests
        concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value
        as "max".

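Such a fixed-size, low-latency pool could be configured as follows; the worker name "MyWorker::run" and the pool size of 4 are illustrative:

```perl
my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")
   ->AnyEvent::Fork::Pool::run ("MyWorker::run",
        load => 1, # at most one job per worker at a time
        idle => 4, # keep all workers started at all times...
        max  => 4, # ...by making the minimum equal to the maximum
   );
```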
    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.

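Combining this recipe with "AnyEvent::Fork::Pool::ncpu" from above gives a sketch like the following; the worker name "MyWorker::crunch" is hypothetical:

```perl
# size the pool to the detected number of cpu cores (2 if unknown)
my $cpus = scalar AnyEvent::Fork::Pool::ncpu 2;

my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")
   ->AnyEvent::Fork::Pool::run ("MyWorker::crunch",
        max  => $cpus, # one worker per core
        load => 2,     # keep one job queued behind the running one
   );
```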
|
331 |
high throughput, I/O bound jobs - set load >= 2, max = 1, or very high |
332 |
When your jobs are I/O bound, using more workers usually boils down |
333 |
to higher throughput, depending very much on your actual workload - |
334 |
sometimes having only one worker is best, for example, when you read |
335 |
or write big files at maixmum speed, as a second worker will |
336 |
increase seek times. |
337 |
|
338 |
EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
    will not be caught, and exceptions in both workers and in callbacks
    cause undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool