
Comparing AnyEvent-Fork-Pool/README (file contents):
Revision 1.1 by root, Thu Apr 18 14:24:01 2013 UTC vs.
Revision 1.2 by root, Sun Apr 21 12:28:34 2013 UTC

1NAME
2 AnyEvent::Fork::Pool - simple process pool manager on top of
3 AnyEvent::Fork
4
5 THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE
6
7SYNOPSIS
8 use AnyEvent;
9 use AnyEvent::Fork::Pool;
10 # using AnyEvent::Fork directly is not needed
11
12 # all possible parameters shown, with default values
13 my $pool = AnyEvent::Fork
14 ->new
15 ->require ("MyWorker")
16 ->AnyEvent::Fork::Pool::run (
17 "MyWorker::run", # the worker function
18
19 # pool management
20 max => 4, # absolute maximum # of processes
21 idle => 0, # minimum # of idle processes
22 load => 2, # queue at most this number of jobs per process
23 start => 0.1, # wait this many seconds before starting a new process
24 stop => 10, # wait this many seconds before stopping an idle process
25 on_destroy => (my $finish = AE::cv), # called when object is destroyed
26
27 # parameters passed to AnyEvent::Fork::RPC
28 async => 0,
29 on_error => sub { die "FATAL: $_[0]\n" },
30 on_event => sub { my @ev = @_ },
31 init => "MyWorker::init",
32 serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
33 );
34
35 for (1..10) {
36 $pool->(doit => $_, sub {
37 print "MyWorker::run returned @_\n";
38 });
39 }
40
41 undef $pool;
42
43 $finish->recv;
44
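The synopsis assumes a worker module roughly like the following (a hypothetical sketch - only the names "MyWorker::init" and "MyWorker::run" come from the example above, the bodies are illustrative):

```perl
package MyWorker;

# called once in each new worker process, before it handles
# any requests (the "init" parameter in the synopsis)
sub init {
   # load modules, open connections, preallocate caches...
}

# called for each job - with the default synchronous backend,
# the returned values are passed to the result callback in
# the parent process
sub run {
   my ($cmd, @args) = @_;

   # ... do the actual work here ...

   "$cmd done" # return value(s) for the callback
}

1;
```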
45DESCRIPTION
46 This module uses processes created via AnyEvent::Fork and the RPC
47 protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
48 pool of processes that handles jobs.
49
50 An understanding of AnyEvent::Fork is helpful but not strictly
51 necessary to use this module; a thorough understanding of
52 AnyEvent::Fork::RPC is, however, as it defines the actual API that
53 needs to be implemented in the worker processes.
54
55EXAMPLES
56PARENT USAGE
57 To create a pool, you first have to create an AnyEvent::Fork object -
58 this object becomes your template process. Whenever a new worker process
59 is needed, it is forked from this template process. Then you need to
60 "hand off" this template process to the "AnyEvent::Fork::Pool" module by
61 calling its run method on it:
62
63 my $template = AnyEvent::Fork
64 ->new
65 ->require ("SomeModule", "MyWorkerModule");
66
67 my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");
68
69 The pool "object" is not a regular Perl object, but a code reference
70 that you can call and that works roughly like calling the worker
71 function directly, except that it returns nothing but instead you need
72 to specify a callback to be invoked once results are in:
73
74 $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
75
76 my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
77 The traditional way to call the pool creation function. But it is
78 way cooler to call it in the following way:
79
80 my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key =>
81 value...])
82 Creates a new pool object with the specified $function as function
83 (name) to call for each request. The pool uses the $fork object as
84 the template when creating worker processes.
85
86 You can supply your own template process, or tell
87 "AnyEvent::Fork::Pool" to create one.
88
89 A relatively large number of key/value pairs can be specified to
90 influence the behaviour. They are grouped into the categories "pool
91 management", "template process" and "rpc parameters".
92
93 Pool Management
94 The pool consists of a certain number of worker processes. These
95 options decide how many of these processes exist and when they
96 are started and stopped.
97
98 The worker pool is dynamically resized, according to (perceived
99 :) load. The minimum size is given by the "idle" parameter and
100 the maximum size is given by the "max" parameter. A new worker
101 is started at most every "start" seconds, and an idle worker is
102 stopped at most once every "stop" seconds.
103
104 You can specify the number of jobs sent concurrently to a
105 single worker using the "load" parameter.
106
107 idle => $count (default: 0)
108 The minimum number of idle processes in the pool - when
109 there are fewer than this many idle workers,
110 "AnyEvent::Fork::Pool" will try to start new ones, subject
111 to the limits set by "max" and "start".
112
113 This is also the initial number of workers in the pool. The
114 default of zero means that the pool starts empty and can
115 shrink back to zero workers over time.
116
117 max => $count (default: 4)
118 The maximum number of processes in the pool, in addition to
119 the template process. "AnyEvent::Fork::Pool" will never have
120 more than this number of worker processes, although there
121 can be more temporarily when a worker is shut down and
122 hasn't exited yet.
123
124 load => $count (default: 2)
125 The maximum number of concurrent jobs sent to a single
126 worker process.
127
128 Jobs that cannot be sent to a worker immediately (because
129 all workers are busy) will be queued until a worker is
130 available.
131
132 Setting this low improves latency. For example, at 1, every
133 job that is sent to a worker is sent to a completely idle
134 worker that doesn't run any other jobs. The downside is that
135 throughput is reduced - a worker that finishes a job needs
136 to wait for a new job from the parent.
137
138 The default of 2 is usually a good compromise.
139
140 start => $seconds (default: 0.1)
141 When there are fewer than "idle" workers (or all workers are
142 completely busy), then a timer is started. If the timer
143 elapses and there are still jobs that cannot be queued to a
144 worker, a new worker is started.
145
146 This sets the minimum time that all workers must be busy
147 before a new worker is started. Or, put differently, the
148 minimum delay between starting new workers.
149
150 The delay is small by default, which means new workers will
151 be started relatively quickly. A delay of 0 is possible, and
152 ensures that the pool will grow as quickly as possible under
153 load.
154
155 Non-zero values are useful to avoid "exploding" a pool
156 because a lot of jobs are queued in an instant.
157
158 Higher values are often useful to improve efficiency at the
159 cost of latency - when fewer processes can do the job over
160 time, starting more and more is not necessarily going to
161 help.
162
163 stop => $seconds (default: 10)
164 When a worker has no jobs to execute it becomes idle. An
165 idle worker that hasn't executed a job within this amount of
166 time will be stopped, unless the other parameters say
167 otherwise.
168
169 Setting this to a very high value means that workers stay
170 around longer, even when they have nothing to do, which can
171 be good as they don't have to be started again on the next
172 load spike.
173
174 Setting this to a lower value can be useful to avoid memory
175 or simply process table wastage.
176
177 Usually, setting this to a time longer than the time between
178 load spikes is best - if you expect a lot of requests every
179 minute and little work in between, setting this to longer
180 than a minute avoids having to stop and start workers. On
181 the other hand, you have to ask yourself if letting workers
182 run idle is a good use of your resources. Try to find a good
183 balance between resource usage of your workers and the time
184 to start new workers - AnyEvent::Fork itself is fast at
185 creating workers while not using much memory for them, so
186 most of the overhead is likely from your own code.
188
189 on_destroy => $callback->() (default: none)
190 When a pool object goes out of scope, the outstanding
191 requests are still handled till completion. Only after
192 handling all jobs will the workers be destroyed (and also
193 the template process if it isn't referenced otherwise).
194
195 To find out when a pool *really* has finished its work, you
196 can set this callback, which will be called when the pool
197 has been destroyed.
198
199 AnyEvent::Fork::RPC Parameters
200 These parameters are all passed more or less directly to
201 AnyEvent::Fork::RPC. They are only briefly mentioned here, for
202 their full documentation please refer to the AnyEvent::Fork::RPC
203 documentation. Also, the default values mentioned here are only
204 documented as a best effort - the AnyEvent::Fork::RPC
205 documentation is binding.
206
207 async => $boolean (default: 0)
208 Whether to use the synchronous or asynchronous RPC backend.
209
210 on_error => $callback->($message) (default: die with message)
211 The callback to call on any (fatal) errors.
212
213 on_event => $callback->(...) (default: "sub { }", unlike
214 AnyEvent::Fork::RPC)
215 The callback to invoke on events.
216
217 init => $initfunction (default: none)
218 The function to call in the child, once before handling
219 requests.
220
221 serialiser => $serialiser (default:
222 $AnyEvent::Fork::RPC::STRING_SERIALISER)
223 The serialiser to use.
224
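As a brief illustration of the event mechanism (a hedged sketch - the worker-side function AnyEvent::Fork::RPC::event is documented in AnyEvent::Fork::RPC; the "progress" event type and its payload are made up for this example):

```perl
# worker side - send an event to the parent before doing the work
sub run {
   my ($path) = @_;

   AnyEvent::Fork::RPC::event (progress => $path);

   # ... do the actual work ...
}

# parent side - parameters you would pass to
# AnyEvent::Fork::Pool::run to receive those events
my @rpc_args = (
   on_event => sub {
      my ($type, @args) = @_;
      warn "worker started work on @args\n" if $type eq "progress";
   },
);
```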
225 $pool->(..., $cb->(...))
226 Call the RPC function of a worker with the given arguments, and when
227 the worker is done, call the $cb with the results, just like calling
228 the RPC object directly - see the AnyEvent::Fork::RPC documentation
229 for details on the RPC API.
230
231 If there is no free worker, the call will be queued until a worker
232 becomes available.
233
234 Note that there can be considerable time between calling this method
235 and the call actually being executed. During this time, the
236 parameters passed to this function are effectively read-only -
237 modifying them after the call and before the callback is invoked
238 causes undefined behaviour.
239
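If you need to keep modifying a data structure after submitting a job, one option is to queue a deep copy instead (a sketch - Storable is a core module, the %job contents are made up; the $pool variable refers to a pool created as shown earlier):

```perl
use Storable ();

my %job = (path => "/tmp/input", size => 1024);

# queue a deep copy, so later modifications of %job cannot
# affect the call while it is still waiting for a worker
$pool->(Storable::dclone (\%job), sub {
   # results arrive here
});

$job{size} = 2048; # safe: the queued copy is unaffected
```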
240CHILD USAGE
241 In addition to the AnyEvent::Fork::RPC API, this module implements one
242 more child-side function:
243
244 AnyEvent::Fork::Pool::retire ()
245 This function sends an event to the parent process to request
246 retirement: the worker is removed from the pool and no new jobs will
247 be sent to it, but it has to handle the jobs that are already
248 queued.
249
250 The parentheses are part of the syntax: the function usually
251 isn't defined when you compile your code (because that happens
252 *before* handing the template process over to
253 "AnyEvent::Fork::Pool::run"), so you need the empty parentheses
254 to tell Perl that the function is indeed a function.
255
256 Retiring a worker can be useful to gracefully shut it down when the
257 worker deems this useful. For example, after executing a job, one
258 could check the process size or the number of jobs handled so far,
259 and if either is too high, the worker could ask to be retired,
260 to keep memory leaks from accumulating.
261
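For example, a worker that asks to be retired after a fixed number of jobs might look like this (a hypothetical sketch - the limit of 1000 is arbitrary, and the job-handling code is elided):

```perl
package MyWorker;

my $jobs_handled = 0;

sub run {
   my (@args) = @_;

   # ... handle the job ...

   # ask the parent to stop sending new jobs to this process,
   # so a fresh worker replaces it under continued load
   AnyEvent::Fork::Pool::retire ()
      if ++$jobs_handled >= 1000;

   "result"
}

1;
```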
262POOL PARAMETERS RECIPES
263 This section describes some recipes for pool parameters. These are
264 mostly meant for the synchronous RPC backend, as the asynchronous RPC
265 backend changes the rules considerably, making workers themselves
266 responsible for their scheduling.
267
268 low latency - set load = 1
269 If you need a deterministic low latency, you should set the "load"
270 parameter to 1. This ensures that never more than one job is sent to
271 each worker. This avoids having to wait for a previous job to
272 finish.
273
274 This makes most sense with the synchronous (default) backend, as the
275 asynchronous backend can handle multiple requests concurrently.
276
277 lowest latency - set load = 1 and idle = max
278 To achieve the lowest latency, you additionally should disable any
279 dynamic resizing of the pool by setting "idle" to the same value as
280 "max".
281
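Taken together, a latency-optimised pool might be configured like this (a sketch - the worker module name and the pool size of 8 are placeholders):

```perl
my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")
   ->AnyEvent::Fork::Pool::run (
      "MyWorker::run",
      load => 1, # at most one job per worker at a time
      idle => 8, # keep eight workers around at all times...
      max  => 8, # ...and never start more than eight
   );
```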
282 high throughput, cpu bound jobs - set load >= 2, max = #cpus
283 To get high throughput with cpu-bound jobs, you should set the
284 maximum pool size to the number of cpus in your system, and "load"
285 to at least 2, to make sure there can be another job waiting for the
286 worker when it has finished one.
287
288 The value of 2 for "load" is the minimum value that *can* achieve
289 100% throughput, but if your parent process itself is sometimes
290 busy, you might need higher values. Also there is a limit on the
291 amount of data that can be "in flight" to the worker, so if you send
292 big blobs of data to your worker, "load" might have much less of an
293 effect.
294
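A throughput-oriented configuration for cpu-bound jobs might look like this (a sketch - the cpu count of 4 is an assumption, substitute the number of cpus in your machine):

```perl
my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")
   ->AnyEvent::Fork::Pool::run (
      "MyWorker::run",
      max  => 4, # == number of cpus (assumed here)
      load => 2, # one job running, one queued behind it
   );
```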
295 high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
296 When your jobs are I/O bound, using more workers usually results
297 in higher throughput, depending very much on your actual workload -
298 sometimes having only one worker is best, for example, when you
299 read or write big files at maximum speed, as a second worker will
300 increase seek times.
301
302EXCEPTIONS
303 The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
304 will not be caught, and exceptions in both workers and callbacks
305 cause undesirable or undefined behaviour.
306
307SEE ALSO
308 AnyEvent::Fork, to create the processes in the first place.
309
310 AnyEvent::Fork::RPC, which implements the RPC protocol and API.
311
312AUTHOR AND CONTACT INFORMATION
313 Marc Lehmann <schmorp@schmorp.de>
314 http://software.schmorp.de/pkg/AnyEvent-Fork-Pool
315
