NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE

SYNOPSIS
       use AnyEvent;
       use AnyEvent::Fork::Pool;
       # use AnyEvent::Fork is not needed

       # all possible parameters shown, with default values
       my $pool = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::Pool::run (
               "MyWorker::run", # the worker function

               # pool management
               max        => 4,   # absolute maximum # of processes
               idle       => 0,   # minimum # of idle processes
               load       => 2,   # queue at most this number of jobs per process
               start      => 0.1, # wait this many seconds before starting a new process
               stop       => 10,  # wait this many seconds before stopping an idle process
               on_destroy => (my $finish = AE::cv), # called when object is destroyed

               # parameters passed to AnyEvent::Fork::RPC
               async      => 0,
               on_error   => sub { die "FATAL: $_[0]\n" },
               on_event   => sub { my @ev = @_ },
               init       => "MyWorker::init",
               serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
            );

       for (1..10) {
          $pool->(doit => $_, sub {
             print "MyWorker::run returned @_\n";
          });
       }

       undef $pool;

       $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork and the RPC
    protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
    pool of processes that handles jobs.

    Understanding of AnyEvent::Fork is helpful but not critical to be able
    to use this module, but a thorough understanding of AnyEvent::Fork::RPC
    is, as it defines the actual API that needs to be implemented in the
    worker processes.

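    As an illustration only, here is a minimal sketch of what the worker
    module referenced in the SYNOPSIS might look like, assuming the default
    synchronous backend, where the return value(s) of the worker function
    are passed to the caller's callback. The names and the trivial job
    logic are made up for this example.

       # MyWorker.pm - hypothetical worker module used in the SYNOPSIS
       package MyWorker;

       use strict;
       use warnings;

       # called once per worker process, before any requests are handled
       sub init {
          # open log files, connect to databases and so on
       }

       # called once per job; the arguments are those passed to $pool->()
       sub run {
          my ($cmd, $arg) = @_; # e.g. ("doit", 1)

          # whatever is returned here is passed to the result callback
          "handled '$cmd' with argument $arg"
       }

       1;
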
EXAMPLES
PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing - instead, you need
    to specify a callback to be invoked once results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

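    Since each call only delivers its result to a callback, one possible
    way to wait for a whole batch of jobs is an AnyEvent condition variable
    with "begin"/"end" counting - this is plain AnyEvent usage, not part of
    this module's API, and the job arguments below are made up:

       my $done = AE::cv;

       for my $n (1 .. 10) {
          $done->begin;
          $pool->($n, sub {
             print "job $n returned @_\n";
             $done->end;
          });
       }

       $done->recv; # returns after all ten callbacks have run
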
    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories "pool
        management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and when
            they are started and stopped.

            The worker pool is dynamically resized, according to (perceived
            :) load. The minimum size is given by the "idle" parameter and
            the maximum size is given by the "max" parameter. A new worker
            is started every "start" seconds at most, and an idle worker is
            stopped at most every "stop" seconds.

            You can specify the maximum number of jobs sent to a worker
            concurrently using the "load" parameter.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool. The
                default of zero means that the pool starts empty and can
                shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition to
                the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1, every
                job that is sent to a worker is sent to a completely idle
                worker that doesn't run any other jobs. The downside is
                that throughput is reduced - a worker that finishes a job
                needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the timer
                elapses and there are still jobs that cannot be queued to a
                worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers will
                be started relatively quickly. A delay of 0 is possible,
                and ensures that the pool will grow as quickly as possible
                under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at the
                cost of latency - when fewer processes can do the job over
                time, starting more and more is not necessarily going to
                help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which can
                be good as they don't have to be started on the next load
                spike again.

                Setting this to a lower value can be useful to avoid memory
                or simply process table wastage.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                whether letting workers run idle is a good use of your
                resources. Try to find a good balance between the resource
                usage of your workers and the time needed to start new
                workers - AnyEvent::Fork itself is fast at creating workers
                and doesn't use much memory for them, so most of the
                overhead is likely from your own code.

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work, you
                can set this callback, which will be called when the pool
                has been destroyed.

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.

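    For example, if you intend to keep modifying a variable while a job
    referring to it may still be queued, one way to stay on the safe side
    is to hand the pool its own copy - the variable names and job arguments
    here are made up for illustration:

       my $buffer = "some text to process";

       # pass a copy of the current value, so later changes to $buffer
       # cannot affect the queued (not yet transmitted) job
       $pool->("$buffer", sub {
          print "processed: @_\n";
       });

       $buffer .= " more text"; # safe - the job got its own copy
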
CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it has to handle the jobs that are already
        queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to "AnyEvent::Fork::Pool::run"),
        so you need the empty parentheses to tell Perl that the function is
        indeed a function.

        Retiring a worker can be useful to gracefully shut it down when the
        worker deems this useful. For example, after executing a job, one
        could check the process size or the number of jobs handled so far,
        and if either is too high, the worker could ask to get retired, to
        keep memory leaks from accumulating - a sketch of this pattern
        follows below.

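    The following is only an illustrative sketch of that retirement
    pattern, not code from this distribution - the module name, the job
    counter, the limit of 1000 jobs and the job logic are all made up:

       # MyWorker.pm - hypothetical worker that retires after 1000 jobs
       package MyWorker;

       use strict;
       use warnings;

       my $jobs_handled = 0;

       sub run {
          my (@args) = @_;

          my $result = do_something_expensive (@args); # made-up job logic

          # ask the parent to stop sending new jobs once we handled enough,
          # e.g. to keep slow memory leaks from accumulating forever
          AnyEvent::Fork::Pool::retire ()
             if ++$jobs_handled >= 1000;

          $result
       }

       1;
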
POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need a deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker. This avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value as
        "max".

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one (see the sketch at the end of
        this section).

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually means
        higher throughput, depending very much on your actual workload -
        sometimes having only one worker is best, for example, when you
        read or write big files at maximum speed, as a second worker will
        increase seek times.

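    As a concrete illustration of the cpu-bound recipe above, a pool for
    such jobs might be created roughly like this - the worker module name
    is made up, and the cpu count is simply hardcoded here (detect it in
    whatever way suits your system):

       my $ncpu = 4; # number of cpus/cores available for the workers

       my $pool = AnyEvent::Fork
          ->new
          ->require ("MyCruncher")
          ->AnyEvent::Fork::Pool::run (
               "MyCruncher::crunch",
               max  => $ncpu, # one worker per cpu
               idle => $ncpu, # optional: keep the workers around between jobs
               load => 2,     # always have one job queued per worker
            );
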
EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions will
    not be caught, and exceptions in both the worker and in callbacks cause
    undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool