NAME
    AnyEvent::Fork::Pool - simple process pool manager on top of
    AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS AN ALPHA RELEASE

SYNOPSIS
    use AnyEvent;
    use AnyEvent::Fork::Pool;
    # use AnyEvent::Fork is not needed

    # all possible parameters shown, with default values
    my $pool = AnyEvent::Fork
       ->new
       ->require ("MyWorker")
       ->AnyEvent::Fork::Pool::run (
             "MyWorker::run", # the worker function

             # pool management
             max        => 4,   # absolute maximum # of processes
             idle       => 0,   # minimum # of idle processes
             load       => 2,   # queue at most this number of jobs per process
             start      => 0.1, # wait this many seconds before starting a new process
             stop       => 10,  # wait this many seconds before stopping an idle process
             on_destroy => (my $finish = AE::cv), # called when object is destroyed

             # parameters passed to AnyEvent::Fork::RPC
             async      => 0,
             on_error   => sub { die "FATAL: $_[0]\n" },
             on_event   => sub { my @ev = @_ },
             init       => "MyWorker::init",
             serialiser => $AnyEvent::Fork::RPC::STRING_SERIALISER,
          );

    for (1..10) {
       $pool->(doit => $_, sub {
          print "MyWorker::run returned @_\n";
       });
    }

    undef $pool;

    $finish->recv;

DESCRIPTION
    This module uses processes created via AnyEvent::Fork and the RPC
    protocol implemented in AnyEvent::Fork::RPC to create a load-balanced
    pool of processes that handles jobs.

    An understanding of AnyEvent::Fork is helpful but not critical for
    using this module; a thorough understanding of AnyEvent::Fork::RPC is
    required, however, as it defines the actual API that needs to be
    implemented in the worker processes.

EXAMPLES

PARENT USAGE
    To create a pool, you first have to create an AnyEvent::Fork object -
    this object becomes your template process. Whenever a new worker
    process is needed, it is forked from this template process. Then you
    need to "hand off" this template process to the "AnyEvent::Fork::Pool"
    module by calling its run method on it:

       my $template = AnyEvent::Fork
          ->new
          ->require ("SomeModule", "MyWorkerModule");

       my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");

    The pool "object" is not a regular Perl object, but a code reference
    that you can call and that works roughly like calling the worker
    function directly, except that it returns nothing; instead, you
    specify a callback to be invoked once the results are in:

       $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });

    my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
        The traditional way to call the pool creation function. But it is
        way cooler to call it in the following way:

    my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key =>
    value...])
        Creates a new pool object with the specified $function as the
        function (name) to call for each request. The pool uses the $fork
        object as the template when creating worker processes.

        You can supply your own template process, or tell
        "AnyEvent::Fork::Pool" to create one.

        A relatively large number of key/value pairs can be specified to
        influence the behaviour. They are grouped into the categories
        "pool management", "template process" and "rpc parameters".

        Pool Management
            The pool consists of a certain number of worker processes.
            These options decide how many of these processes exist and
            when they are started and stopped.
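            As a quick illustration of how these knobs are spelled in
            practice, here is a sketch with made-up values - each
            parameter is documented in detail below, and "MyWorker" is
            the placeholder module from the SYNOPSIS:

               use AnyEvent::Fork::Pool; # also loads AnyEvent::Fork

               my $pool = AnyEvent::Fork
                  ->new
                  ->require ("MyWorker")
                  ->AnyEvent::Fork::Pool::run (
                        "MyWorker::run",
                        idle  => 1,   # try to keep one idle worker around
                        max   => 8,   # but never run more than eight workers
                        load  => 2,   # send at most two jobs to a worker at once
                        start => 0.5, # grow the pool at most twice per second
                        stop  => 60,  # stop workers after a minute of idling
                     );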
            The worker pool is dynamically resized according to (perceived
            :) load. The minimum size is given by the "idle" parameter and
            the maximum size is given by the "max" parameter. A new worker
            is started every "start" seconds at most, and an idle worker
            is stopped at most every "stop" seconds.

            You can specify the number of jobs a worker may be sent
            concurrently using the "load" parameter.

            idle => $count (default: 0)
                The minimum number of idle processes in the pool - when
                there are fewer than this many idle workers,
                "AnyEvent::Fork::Pool" will try to start new ones, subject
                to the limits set by "max" and "start".

                This is also the initial number of workers in the pool.
                The default of zero means that the pool starts empty and
                can shrink back to zero workers over time.

            max => $count (default: 4)
                The maximum number of processes in the pool, in addition
                to the template process. "AnyEvent::Fork::Pool" will never
                have more than this number of worker processes, although
                there can be more temporarily when a worker is shut down
                and hasn't exited yet.

            load => $count (default: 2)
                The maximum number of concurrent jobs sent to a single
                worker process.

                Jobs that cannot be sent to a worker immediately (because
                all workers are busy) will be queued until a worker is
                available.

                Setting this low improves latency. For example, at 1,
                every job is sent to a completely idle worker that doesn't
                run any other jobs. The downside is that throughput is
                reduced - a worker that finishes a job needs to wait for a
                new job from the parent.

                The default of 2 is usually a good compromise.

            start => $seconds (default: 0.1)
                When there are fewer than "idle" workers (or all workers
                are completely busy), a timer is started. If the timer
                elapses and there are still jobs that cannot be queued to
                a worker, a new worker is started.

                This sets the minimum time that all workers must be busy
                before a new worker is started. Or, put differently, the
                minimum delay between starting new workers.

                The delay is small by default, which means new workers
                will be started relatively quickly. A delay of 0 is
                possible, and ensures that the pool will grow as quickly
                as possible under load.

                Non-zero values are useful to avoid "exploding" the pool
                just because a lot of jobs are queued in an instant.

                Higher values are often useful to improve efficiency at
                the cost of latency - when fewer processes can do the job
                over time, starting more and more is not necessarily going
                to help.

            stop => $seconds (default: 10)
                When a worker has no jobs to execute, it becomes idle. An
                idle worker that hasn't executed a job within this amount
                of time will be stopped, unless the other parameters say
                otherwise.

                Setting this to a very high value means that workers stay
                around longer, even when they have nothing to do, which
                can be good, as they don't have to be started again on the
                next load spike.

                Setting this to a lower value can be useful to avoid
                wasting memory, or simply process table entries.

                Usually, setting this to a time longer than the time
                between load spikes is best - if you expect a lot of
                requests every minute and little work in between, setting
                this to longer than a minute avoids having to stop and
                start workers. On the other hand, you have to ask yourself
                whether letting workers run idle is a good use of your
                resources.
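                For example (made-up numbers): if you expect bursts of
                work roughly once a minute, a value somewhat above the
                gap length keeps workers alive across the quiet periods,
                while a low "idle" still lets the pool drain when the
                bursts stop - "$template" and "MyWorker::run" are the
                placeholders used earlier:

                   my $pool = $template->AnyEvent::Fork::Pool::run (
                      "MyWorker::run",
                      idle => 0,  # pool may shrink to zero when unused
                      stop => 90, # workers survive the ~60s between bursts
                   );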
                Try to find a good balance between the resource usage of
                your workers and the time it takes to start new workers -
                AnyEvent::Fork itself is fast at creating workers and
                doesn't use much memory for them, so most of the overhead
                is likely to come from your own code.

            on_destroy => $callback->() (default: none)
                When a pool object goes out of scope, the outstanding
                requests are still handled to completion. Only after
                handling all jobs will the workers be destroyed (and also
                the template process, if it isn't referenced otherwise).

                To find out when a pool *really* has finished its work,
                you can set this callback, which will be called when the
                pool has been destroyed.

        AnyEvent::Fork::RPC Parameters
            These parameters are all passed more or less directly to
            AnyEvent::Fork::RPC. They are only briefly mentioned here; for
            their full documentation, please refer to the
            AnyEvent::Fork::RPC documentation. Also, the default values
            mentioned here are only documented as a best effort - the
            AnyEvent::Fork::RPC documentation is binding.

            async => $boolean (default: 0)
                Whether to use the synchronous or asynchronous RPC
                backend.

            on_error => $callback->($message) (default: die with message)
                The callback to call on any (fatal) errors.

            on_event => $callback->(...) (default: "sub { }", unlike
            AnyEvent::Fork::RPC)
                The callback to invoke on events.

            init => $initfunction (default: none)
                The function to call in the child, once before handling
                requests.

            serialiser => $serialiser (default:
            $AnyEvent::Fork::RPC::STRING_SERIALISER)
                The serialiser to use.

    $pool->(..., $cb->(...))
        Call the RPC function of a worker with the given arguments, and
        when the worker is done, call the $cb with the results, just like
        calling the RPC object directly - see the AnyEvent::Fork::RPC
        documentation for details on the RPC API.

        If there is no free worker, the call will be queued until a worker
        becomes available.

        Note that there can be considerable time between calling this
        method and the call actually being executed. During this time,
        the parameters passed to this function are effectively read-only -
        modifying them after the call and before the callback is invoked
        causes undefined behaviour.

CHILD USAGE
    In addition to the AnyEvent::Fork::RPC API, this module implements
    one more child-side function:

    AnyEvent::Fork::Pool::retire ()
        This function sends an event to the parent process to request
        retirement: the worker is removed from the pool and no new jobs
        will be sent to it, but it still has to handle the jobs that are
        already queued.

        The parentheses are part of the syntax: the function usually
        isn't defined when you compile your code (because that happens
        *before* the template process is handed over to
        "AnyEvent::Fork::Pool::run"), so you need the empty parentheses
        to tell Perl that the name indeed refers to a function.

        Retiring a worker can be used to shut it down gracefully when the
        worker itself deems this appropriate. For example, after
        executing a job, it could check its process size or the number of
        jobs handled so far, and if either is too high, it could ask to
        be retired, to keep memory leaks from accumulating.

POOL PARAMETERS RECIPES
    This section describes some recipes for pool parameters. These are
    mostly meant for the synchronous RPC backend, as the asynchronous RPC
    backend changes the rules considerably, making workers themselves
    responsible for their scheduling.
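    As a companion to the recipes below and to the CHILD USAGE section
    above, here is a rough sketch of what a complete worker module for
    the synchronous (default) backend might look like. The module name
    matches the SYNOPSIS, but the per-process job counter and the
    1000-job limit are made up, and the return-value convention is that
    of the synchronous AnyEvent::Fork::RPC backend:

       # MyWorker.pm - a made-up worker module for the synchronous backend
       package MyWorker;

       my $jobs_handled = 0; # each worker process counts its own jobs

       sub init {
          # called once per worker process, before the first request
       }

       sub run {
          my ($cmd, @args) = @_; # e.g. ("doit", 5) for the SYNOPSIS example

          # ... do the actual work for $cmd here ...

          # ask the parent to replace this worker after 1000 jobs,
          # e.g. to keep slow memory leaks from accumulating
          AnyEvent::Fork::Pool::retire ()
             if ++$jobs_handled >= 1000;

          # with the synchronous backend, the return values become the
          # arguments of the result callback in the parent
          ("$cmd done", @args)
       }

       1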
    low latency - set load = 1
        If you need deterministic, low latency, you should set the "load"
        parameter to 1. This ensures that no more than one job is ever
        sent to each worker, so a job never has to wait for a previous
        one to finish.

        This makes most sense with the synchronous (default) backend, as
        the asynchronous backend can handle multiple requests
        concurrently.

    lowest latency - set load = 1 and idle = max
        To achieve the lowest latency, you should additionally disable
        any dynamic resizing of the pool by setting "idle" to the same
        value as "max".

    high throughput, cpu bound jobs - set load >= 2, max = #cpus
        To get high throughput with cpu-bound jobs, you should set the
        maximum pool size to the number of cpus in your system, and
        "load" to at least 2, to make sure there can be another job
        waiting for the worker when it has finished one.

        The value of 2 for "load" is the minimum value that *can* achieve
        100% throughput, but if your parent process itself is sometimes
        busy, you might need higher values. Also, there is a limit on the
        amount of data that can be "in flight" to the worker, so if you
        send big blobs of data to your worker, "load" might have much
        less of an effect.

    high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
        When your jobs are I/O bound, using more workers usually results
        in higher throughput, although this depends very much on your
        actual workload - sometimes having only one worker is best, for
        example when you read or write big files at maximum speed, as a
        second worker will only increase seek times.

EXCEPTIONS
    The same "policy" as with AnyEvent::Fork::RPC applies - exceptions
    will not be caught, and exceptions in both the worker and in
    callbacks cause undesirable or undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::RPC, which implements the RPC protocol and API.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann
    http://software.schmorp.de/pkg/AnyEvent-Fork-Pool