/cvs/AnyEvent-Fork-Pool/README
Revision: 1.3
Committed: Sun Apr 28 14:19:22 2013 UTC (11 years ago) by root
Branch: MAIN
CVS Tags: rel-1_1, rel-1_2
Changes since 1.2: +36 -0 lines
Log Message:
1.1

NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE

SYNOPSIS
        use AnyEvent;
        use AnyEvent::Fork::Pool;
        # use AnyEvent::Fork is not needed

        # all possible parameters shown, with default values
        my $pool = AnyEvent::Fork
           ->new
           ->require ("MyWorker")
           ->AnyEvent::Fork::Pool::run (
                "MyWorker::run", # the worker function

                # pool management
                max        => 4,   # absolute maximum # of processes
                idle       => 0,   # minimum # of idle processes
                load       => 2,   # queue at most this number of jobs per process
                start      => 0.1, # wait this many seconds before starting a new process
                stop       => 10,  # wait this many seconds before stopping an idle process
                on_destroy => (my $finish = AE::cv), # called when object is destroyed

                # parameters passed to AnyEvent::Fork::RPC
                async      => 0,
                on_error   => sub { die "FATAL: $_[0]\n" },
                on_event   => sub { my @ev = @_ },
                init       => "MyWorker::init",
                serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
           );

        for (1..10) {
           $pool->(doit => $_, sub {
              print "MyWorker::run returned @_\n";
           });
        }

        undef $pool;

        $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork and the RPC
    protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
    pool of processes that handles jobs.

    An understanding of AnyEvent::Fork is helpful but not critical to be
    able to use this module; a thorough understanding of
    AnyEvent::Fork::RPC is, however, as it defines the actual API that
    needs to be implemented in the worker processes.

EXAMPLES

PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

        my $template = AnyEvent::Fork
           ->new
           ->require ("SomeModule", "MyWorkerModule");

        my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing but instead you need
    to specify a callback to be invoked once results are in:

        $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories
        "pool management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and
            when they are started and stopped.

            The worker pool is dynamically resized, according to
            (perceived :) load. The minimum size is given by the "idle"
            parameter and the maximum size is given by the "max"
            parameter. A new worker is started every "start" seconds at
            most, and an idle worker is stopped every "stop" seconds at
            most.

            You can specify the number of jobs sent to a worker
            concurrently using the "load" parameter.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool.
                The default of zero means that the pool starts empty and
                can shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition
                to the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1,
                every job that is sent to a worker is sent to a completely
                idle worker that doesn't run any other jobs. The downside
                is that throughput is reduced - a worker that finishes a
                job needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the
                timer elapses and there are still jobs that cannot be
                queued to a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers
                will be started relatively quickly. A delay of 0 is
                possible, and ensures that the pool will grow as quickly
                as possible under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at
                the cost of latency - when fewer processes can do the job
                over time, starting more and more is not necessarily going
                to help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which
                can be good as they don't have to be started again on the
                next load spike.

                Setting this to a lower value can be useful to avoid
                memory or simply process table wastage.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                if letting workers run idle is a good use of your
                resources. Try to find a good balance between the resource
                usage of your workers and the time it takes to start new
                workers - AnyEvent::Fork itself is fast at creating
                workers while not using much memory for them, so most of
                the overhead is likely from your own code.

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work,
                you can set this callback, which will be called when the
                pool has been destroyed.
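
                For instance, an AnyEvent condition variable makes a
                convenient "on_destroy" callback, as in the SYNOPSIS. A
                minimal sketch (the "MyWorker" module and its functions
                are hypothetical):

                    use AnyEvent;
                    use AnyEvent::Fork;
                    use AnyEvent::Fork::Pool;

                    my $finish = AE::cv;

                    my $pool = AnyEvent::Fork
                       ->new
                       ->require ("MyWorker") # hypothetical worker module
                       ->AnyEvent::Fork::Pool::run (
                            "MyWorker::run",
                            on_destroy => $finish,
                       );

                    # ... queue jobs via $pool->(...) here ...

                    undef $pool;   # drop the last reference ...
                    $finish->recv; # ... then block until all jobs are done
                                   # and the pool is really gone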

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC
                backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.
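
        A sketch of this pitfall (the "sum" request name is hypothetical;
        the worker function would have to understand it):

            my @batch = (1, 2, 3);

            # queue the job - @batch must now be treated as read-only
            $pool->(sum => @batch, sub {
               my ($result) = @_;
               print "sum returned $result\n";
            });

            # WRONG: the job may not have been sent yet, so mutating
            # @batch here causes undefined behaviour:
            # push @batch, 4;

            # RIGHT: build further batches from a copy instead
            my @next_batch = (@batch, 4);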

    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called CPU cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU-bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance,
        but in those rare cases where that really helps it might be
        beneficial to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is
        used for $cpus.

        Example: create a worker pool with as many workers as cpu cores,
        or 2, if the actual number could not be determined.

            $fork->AnyEvent::Fork::Pool::run ("myworker::function",
               max => (scalar AnyEvent::Fork::Pool::ncpu 2),
            );

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it has to handle the jobs that are already
        queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to
        "AnyEvent::Fork::Pool::run"), so you need the empty parentheses to
        tell Perl that the function is indeed a function.

        Retiring a worker can be useful to gracefully shut it down when
        the worker deems this useful. For example, after executing a job,
        one could check the process size or the number of jobs handled so
        far, and if either is too high, the worker could ask to be
        retired, to keep memory leaks from accumulating.

        Example: retire a worker after it has handled roughly 100
        requests.

            my $count = 0;

            sub my::worker {

               ++$count == 100
                  and AnyEvent::Fork::Pool::retire ();

               ... # normal code goes here
            }

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker. This avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests
        concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value
        as "max".
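
        A sketch of such a fixed-size, low-latency pool (the worker module
        and function names are hypothetical, and 8 is an arbitrary size):

            my $pool = $template->AnyEvent::Fork::Pool::run (
               "MyWorker::run",
               load => 1, # at most one job per worker
               idle => 8, # keep all workers around permanently ...
               max  => 8, # ... by making the pool a fixed size
            );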

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also, there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.
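
        Combining this recipe with the "ncpu" helper described above, such
        a pool might be set up as follows (a sketch; the worker function
        name is hypothetical, and 4 is an arbitrary fallback cpu count):

            my $pool = $fork->AnyEvent::Fork::Pool::run (
               "MyWorker::run",
               max  => (scalar AnyEvent::Fork::Pool::ncpu 4), # one worker per cpu
               load => 2, # keep one job queued behind the running one
            );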

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually leads to
        higher throughput, depending very much on your actual workload -
        sometimes having only one worker is best, for example, when you
        read or write big files at maximum speed, as a second worker will
        increase seek times.

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
    will not be caught, and exceptions in both workers and in callbacks
    cause undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool