/cvs/AnyEvent-Fork-Pool/README
Revision: 1.5
Committed: Thu Oct 27 07:27:57 2022 UTC (18 months, 3 weeks ago) by root
Branch: MAIN
CVS Tags: rel-1_3, HEAD
Changes since 1.4: +4 -4 lines
Log Message:
1.3

NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

SYNOPSIS
       use AnyEvent;
       use AnyEvent::Fork;
       use AnyEvent::Fork::Pool;

       # all possible parameters shown, with default values
       my $pool = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::Pool::run (
               "MyWorker::run", # the worker function

               # pool management
               max        => 4,   # absolute maximum # of processes
               idle       => 0,   # minimum # of idle processes
               load       => 2,   # queue at most this number of jobs per process
               start      => 0.1, # wait this many seconds before starting a new process
               stop       => 10,  # wait this many seconds before stopping an idle process
               on_destroy => (my $finish = AE::cv), # called when object is destroyed

               # parameters passed to AnyEvent::Fork::RPC
               async      => 0,
               on_error   => sub { die "FATAL: $_[0]\n" },
               on_event   => sub { my @ev = @_ },
               init       => "MyWorker::init",
               serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
             );

       for (1..10) {
          $pool->(doit => $_, sub {
             print "MyWorker::run returned @_\n";
          });
       }

       undef $pool;

       $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork (or
    AnyEvent::Fork::Remote) and the RPC protocol implemented in
    AnyEvent::Fork::RPC to create a load-balanced pool of processes that
    handles jobs.

    Understanding AnyEvent::Fork is helpful but not required to use this
    module; a thorough understanding of AnyEvent::Fork::RPC is required,
    however, as it defines the actual API that needs to be implemented in
    the worker processes.

PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing; instead, you need
    to specify a callback to be invoked once results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
        Creates a new pool object with the specified $function as function
        (name) to call for each request. The pool uses the $fork object as
        the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories "pool
        management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and
            when they are started and stopped.

            The worker pool is dynamically resized, according to
            (perceived :) load. The minimum size is given by the "idle"
            parameter and the maximum size is given by the "max"
            parameter. A new worker is started every "start" seconds at
            most, and an idle worker is stopped at most every "stop"
            seconds.

            You can specify the number of jobs sent to a worker
            concurrently using the "load" parameter.
            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool.
                The default of zero means that the pool starts empty and
                can shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition
                to the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.
121    
            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1,
                every job that is sent to a worker is sent to a completely
                idle worker that doesn't run any other jobs. The downside
                is that throughput is reduced - a worker that finishes a
                job needs to wait for a new job from the parent.

                The default of 2 is usually a good compromise.
137    
            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), then a timer is started. If the
                timer elapses and there are still jobs that cannot be
                queued to a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers
                will be started relatively quickly. A delay of 0 is
                possible, and ensures that the pool will grow as quickly
                as possible under load.

                Non-zero values are useful to avoid "exploding" a pool
                because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at
                the cost of latency - when fewer processes can do the job
                over time, starting more and more is not necessarily going
                to help.
160    
            stop => $seconds (default: 10)
                When a worker has no jobs to execute it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which
                can be good as they don't have to be started on the next
                load spike again.

                Setting this to a lower value can be useful to avoid
                wasting memory or process table entries.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                if letting workers run idle is a good use of your
                resources. Try to find a good balance between resource
                usage of your workers and the time to start new workers -
                AnyEvent::Fork itself is fast at creating workers while
                not using much memory for them, so most of the overhead is
                likely from your own code.
186    
            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled till completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work,
                you can set this callback, which will be called when the
                pool has been destroyed.
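
                The drain behaviour described above can be sketched as
                follows (a minimal sketch; the "MyWorker" module and its
                "run" function are hypothetical):

```perl
use AnyEvent;
use AnyEvent::Fork;
use AnyEvent::Fork::Pool;

my $finished = AE::cv;

my $pool = AnyEvent::Fork
   ->new
   ->require ("MyWorker")        # hypothetical worker module
   ->AnyEvent::Fork::Pool::run (
        "MyWorker::run",         # hypothetical worker function
        on_destroy => $finished, # fires once all jobs are done
     );

$pool->($_, sub { }) for 1..5;   # queue a few jobs

undef $pool;      # no new jobs can be submitted, queued ones still run
$finished->recv;  # returns only after every job finished and the workers exited
```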
196    
        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC
                backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.
222    
    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time, the
        parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.
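
        One way to stay on the safe side is to hand the pool its own copy
        of any data you intend to modify later (a sketch; the job name
        "process" is hypothetical and $pool is a previously created pool):

```perl
my @batch = (1, 2, 3);

# pass a shallow copy, so the queued call cannot observe later changes
$pool->(process => [@batch], sub {
   print "worker replied: @_\n";
});

push @batch, 4;   # safe: the queued call above holds its own copy
```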
237    
    $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
    ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
        Tries to detect the number of CPUs ($cpus, often called CPU cores
        nowadays) and execution units ($eus, which include e.g. extra
        hyperthreaded units). When $cpus cannot be determined reliably,
        $default_cpus is returned for both values, or 1 if it is missing.

        For normal CPU-bound uses, it is wise to have as many worker
        processes as CPUs in the system ($cpus), if nothing else uses the
        CPU. Using hyperthreading is usually detrimental to performance,
        but in those rare cases where that really helps it might be
        beneficial to use more workers ($eus).

        Currently, /proc/cpuinfo is parsed on GNU/Linux systems for both
        $cpus and $eus, and on {Free,Net,Open}BSD, sysctl -n hw.ncpu is
        used for $cpus.

        Example: create a worker pool with as many workers as CPU cores,
        or 2, if the actual number could not be determined.

           $fork->AnyEvent::Fork::Pool::run ("myworker::function",
              max => (scalar AnyEvent::Fork::Pool::ncpu 2),
           );
261    
CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements one
    more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it still has to handle the jobs that are
        already queued.

        The parentheses are part of the syntax: the function usually isn't
        defined when you compile your code (because that happens *before*
        handing the template process over to "AnyEvent::Fork::Pool::run"),
        so you need the empty parentheses to tell Perl that the function
        is indeed a function.

        Retiring a worker can be useful to gracefully shut it down when
        the worker deems this useful. For example, after executing a job,
        it could check the process size or the number of jobs handled so
        far, and if either is too high, the worker could request to be
        retired, to keep memory leaks from accumulating.

        Example: retire a worker after it has handled roughly 100
        requests. It doesn't matter whether you retire at the beginning or
        end of your request, as the worker will continue to handle some
        outstanding requests. Likewise, it's OK to call retire multiple
        times.

           my $count = 0;

           sub my::worker {

              ++$count == 100
                 and AnyEvent::Fork::Pool::retire ();

              ... # normal code goes here
           }
298    
POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.

    low latency - set load = 1
        If you need a deterministic low latency, you should set the "load"
        parameter to 1. This ensures that never more than one job is sent
        to each worker. This avoids having to wait for a previous job to
        finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests
        concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you additionally should disable any
        dynamic resizing of the pool by setting "idle" to the same value
        as "max".
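
        A fixed-size, one-job-per-worker pool along these lines might look
        like this (a sketch; the worker function name is hypothetical and
        $fork is a previously created template process):

```perl
my $pool = $fork->AnyEvent::Fork::Pool::run (
   "MyWorker::answer",   # hypothetical worker function
   load => 1,            # never more than one job per worker
   idle => 4,
   max  => 4,            # idle == max disables dynamic resizing
);
```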
318    
    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and "load"
        to at least 2, to make sure there can be another job waiting for
        the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much less
        of an effect.
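
        Combining this recipe with the ncpu helper described above could
        look like this (a sketch; "MyWorker::crunch" is a hypothetical
        worker function and $fork a previously created template process):

```perl
my $pool = $fork->AnyEvent::Fork::Pool::run (
   "MyWorker::crunch",
   max  => (scalar AnyEvent::Fork::Pool::ncpu 2), # one worker per CPU, fall back to 2
   load => 2, # keep a job queued so a finishing worker never waits for the parent
);
```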
331    
    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually results
        in higher throughput, but this depends very much on your actual
        workload - sometimes having only one worker is best, for example,
        when you read or write big files at maximum speed, as a second
        worker will increase seek times.

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
    will not be caught, and exceptions in both workers and in callbacks
    cause undesirable or undefined behaviour.
343    
SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Remote, likewise, but helpful for remote processes.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann <schmorp@schmorp.de>
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool