--- AnyEvent-Fork-Pool/Pool.pm 2013/04/20 19:33:23 1.5
+++ AnyEvent-Fork-Pool/Pool.pm 2013/04/21 11:17:02 1.6
@@ -8,7 +8,7 @@
    use AnyEvent::Fork::Pool;
    # use AnyEvent::Fork is not needed
 
-   # all parameters with default values
+   # all possible parameters shown, with default values
    my $pool = AnyEvent::Fork
       ->new
       ->require ("MyWorker")
@@ -17,10 +17,10 @@
 
         # pool management
         max        => 4,   # absolute maximum # of processes
-        idle       => 2,   # minimum # of idle processes
+        idle       => 0,   # minimum # of idle processes
         load       => 2,   # queue at most this number of jobs per process
         start      => 0.1, # wait this many seconds before starting a new process
-        stop       => 1,   # wait this many seconds before stopping an idle process
+        stop       => 10,  # wait this many seconds before stopping an idle process
         on_destroy => (my $finish = AE::cv), # called when object is destroyed
 
         # parameters passed to AnyEvent::Fork::RPC
@@ -50,12 +50,31 @@
 Understanding of L<AnyEvent::Fork> is helpful but not critical to be able to use
 this module, but a thorough understanding of L<AnyEvent::Fork::RPC> is, as
 it defines the actual API that needs to be implemented in the
-children.
+worker processes.
 
 =head1 EXAMPLES
 
 =head1 PARENT USAGE
 
+To create a pool, you first have to create a L<AnyEvent::Fork> object -
+this object becomes your template process. Whenever a new worker process
+is needed, it is forked from this template process. Then you need to
+"hand off" this template process to the C<AnyEvent::Fork::Pool> module by
+calling its run method on it:
+
+   my $template = AnyEvent::Fork
+      ->new
+      ->require ("SomeModule", "MyWorkerModule");
+
+   my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");
+
+The pool "object" is not a regular Perl object, but a code reference that
+you can call and that works roughly like calling the worker function
+directly, except that it returns nothing but instead you need to specify a
+callback to be invoked once results are in:
+
+   $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
+
 =over 4
 
 =cut
@@ -83,8 +102,8 @@
 
 =item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
 
-The traditional way to call it. But it is way cooler to call it in the
-following way:
+The traditional way to call the pool creation function. But it is way
+cooler to call it in the following way:
 
 =item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
 
@@ -107,106 +126,117 @@
 decide how many of these processes exist and when they are started and
 stopped.
 
+The worker pool is dynamically resized, according to (perceived :)
+load. The minimum size is given by the C<idle> parameter and the maximum
+size is given by the C<max> parameter. A new worker is started every
+C<start> seconds at most, and an idle worker is stopped at most every
+C<stop> second.
+
+You can specify the amount of jobs sent to a worker concurrently using the
+C<load> parameter.
+
 =over 4
 
 =item idle => $count (default: 0)
 
 The minimum amount of idle processes in the pool - when there are fewer
 than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new
-ones, subject to C<max> and C<start>.
+ones, subject to the limits set by C<max> and C<start>.
 
-This is also the initial/minimum amount of workers in the pool. The
-default of zero means that the pool starts empty and can shrink back to
-zero workers over time.
+This is also the initial amount of workers in the pool. The default of
+zero means that the pool starts empty and can shrink back to zero workers
+over time.
 
 =item max => $count (default: 4)
 
 The maximum number of processes in the pool, in addition to the template
-process. C<AnyEvent::Fork::Pool> will never create more than this number
-of worker processes, although there can be more temporarily when a worker
-is shut down and hasn't exited yet.
+process. C<AnyEvent::Fork::Pool> will never have more than this number of
+worker processes, although there can be more temporarily when a worker is
+shut down and hasn't exited yet.
 
 =item load => $count (default: 2)
 
-The maximum number of concurrent jobs sent to a single worker
-process. Worker processes that handle this number of jobs already are
-called "busy".
+The maximum number of concurrent jobs sent to a single worker process.
 
 Jobs that cannot be sent to a worker immediately (because all workers are
 busy) will be queued until a worker is available.
 
-=item start => $seconds (default: 0.1)
+Setting this low improves latency. For example, at C<1>, every job that
+is sent to a worker is sent to a completely idle worker that doesn't run
+any other jobs. The downside is that throughput is reduced - a worker that
+finishes a job needs to wait for a new job from the parent.
 
-When a job is queued and all workers are busy, a timer is started. If the
-timer elapses and there are still jobs that cannot be queued to a worker,
-a new worker is started.
-
-This configurs the time that all workers must be busy before a new worker
-is started. Or, put differently, the minimum delay betwene starting new
-workers.
+The default of C<2> is usually a good compromise.
 
-The delay is zero by default, which means new workers will be started
-without delay.
+=item start => $seconds (default: 0.1)
 
-=item stop => $seconds (default: 1)
+When there are fewer than C<idle> workers (or all workers are completely
+busy), then a timer is started. If the timer elapses and there are still
+jobs that cannot be queued to a worker, a new worker is started.
+
+This sets the minimum time that all workers must be busy before a new
+worker is started. Or, put differently, the minimum delay between starting
+new workers.
+
+The delay is small by default, which means new workers will be started
+relatively quickly. A delay of C<0> is possible, and ensures that the pool
+will grow as quickly as possible under load.
+
+Non-zero values are useful to avoid "exploding" a pool because a lot of
+jobs are queued in an instant.
+
+Higher values are often useful to improve efficiency at the cost of
+latency - when fewer processes can do the job over time, starting more and
+more is not necessarily going to help.
+
+=item stop => $seconds (default: 10)
 
 When a worker has no jobs to execute it becomes idle. An idle worker that
 hasn't executed a job within this amount of time will be stopped, unless
 the other parameters say otherwise.
 
-=item on_destroy => $callback->() (default: none)
-
-When a pool object goes out of scope, it will still handle all outstanding
-jobs. After that, it will destroy all workers (and also the template
-process if it isn't referenced otherwise).
-
-=back
-
-=item Template Process
-
-The worker processes are all forked from a single template
-process. Ideally, all modules and all cdoe used by the worker, as well as
-any shared data structures should be loaded into the template process, to
-take advantage of data sharing via fork.
-
-You can create your own template process by creating a L<AnyEvent::Fork>
-object yourself and passing it as the C