--- AnyEvent-Fork-Pool/Pool.pm	2013/04/20 19:33:23	1.5
+++ AnyEvent-Fork-Pool/Pool.pm	2013/04/21 11:17:02	1.6
@@ -8,7 +8,7 @@
    use AnyEvent::Fork::Pool;
    # use AnyEvent::Fork is not needed
 
-   # all parameters with default values
+   # all possible parameters shown, with default values
    my $pool = AnyEvent::Fork
       ->new
       ->require ("MyWorker")
@@ -17,10 +17,10 @@
 
            # pool management
            max        => 4,   # absolute maximum # of processes
-           idle       => 2,   # minimum # of idle processes
+           idle       => 0,   # minimum # of idle processes
            load       => 2,   # queue at most this number of jobs per process
            start      => 0.1, # wait this many seconds before starting a new process
-           stop       => 1,   # wait this many seconds before stopping an idle process
+           stop       => 10,  # wait this many seconds before stopping an idle process
            on_destroy => (my $finish = AE::cv), # called when object is destroyed
 
            # parameters passed to AnyEvent::Fork::RPC
@@ -50,12 +50,31 @@
 Understanding of L<AnyEvent::Fork> is helpful but not critical to be able
 to use this module, but a thorough understanding of L<AnyEvent::Fork::RPC>
 is, as it defines the actual API that needs to be implemented in the
-children.
+worker processes.
 
 =head1 EXAMPLES
 
 =head1 PARENT USAGE
 
+To create a pool, you first have to create an L<AnyEvent::Fork> object -
+this object becomes your template process. Whenever a new worker process
+is needed, it is forked from this template process. Then you need to
+"hand off" this template process to the C<AnyEvent::Fork::Pool> module by
+calling its run method on it:
+
+   my $template = AnyEvent::Fork
+                     ->new
+                     ->require ("SomeModule", "MyWorkerModule");
+
+   my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");
+
+The pool "object" is not a regular Perl object, but a code reference that
+you can call and that works roughly like calling the worker function
+directly, except that it returns nothing but instead you need to specify a
+callback to be invoked once results are in:
+
+   $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
+
 =over 4
 
 =cut
@@ -83,8 +102,8 @@
 
 =item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
 
-The traditional way to call it. But it is way cooler to call it in the
-following way:
+The traditional way to call the pool creation function. But it is way
+cooler to call it in the following way:
 
 =item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
 
@@ -107,106 +126,117 @@
 decide how many of these processes exist and when they are started and
 stopped.
 
+The worker pool is dynamically resized, according to (perceived :)
+load. The minimum size is given by the C<idle> parameter and the maximum
+size is given by the C<max> parameter. A new worker is started every
+C<start> seconds at most, and an idle worker is stopped at most every
+C<stop> seconds.
+
+You can specify the number of jobs sent to a worker concurrently using the
+C<load> parameter.
+
 =over 4
 
 =item idle => $count (default: 0)
 
 The minimum amount of idle processes in the pool - when there are fewer
 than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new
-ones, subject to C<max> and C<start>.
+ones, subject to the limits set by C<max> and C<start>.
 
-This is also the initial/minimum amount of workers in the pool. The
-default of zero means that the pool starts empty and can shrink back to
-zero workers over time.
+This is also the initial amount of workers in the pool. The default of
+zero means that the pool starts empty and can shrink back to zero workers
+over time.
 
 =item max => $count (default: 4)
 
 The maximum number of processes in the pool, in addition to the template
-process. C<AnyEvent::Fork::Pool> will never create more than this number
-of worker processes, although there can be more temporarily when a worker
-is shut down and hasn't exited yet.
+process. C<AnyEvent::Fork::Pool> will never have more than this number of
+worker processes, although there can be more temporarily when a worker is
+shut down and hasn't exited yet.
 
 =item load => $count (default: 2)
 
-The maximum number of concurrent jobs sent to a single worker
-process. Worker processes that handle this number of jobs already are
-called "busy".
+The maximum number of concurrent jobs sent to a single worker process.
 
 Jobs that cannot be sent to a worker immediately (because all workers are
 busy) will be queued until a worker is available.
 
-=item start => $seconds (default: 0.1)
+Setting this low improves latency. For example, at C<1>, every job that
+is sent to a worker is sent to a completely idle worker that doesn't run
+any other jobs. The downside is that throughput is reduced - a worker that
+finishes a job needs to wait for a new job from the parent.
 
-When a job is queued and all workers are busy, a timer is started. If the
-timer elapses and there are still jobs that cannot be queued to a worker,
-a new worker is started.
-
-This configurs the time that all workers must be busy before a new worker
-is started. Or, put differently, the minimum delay betwene starting new
-workers.
+The default of C<2> is usually a good compromise.
 
-The delay is zero by default, which means new workers will be started
-without delay.
+=item start => $seconds (default: 0.1)
 
-=item stop => $seconds (default: 1)
+When there are fewer than C<idle> workers (or all workers are completely
+busy), then a timer is started. If the timer elapses and there are still
+jobs that cannot be queued to a worker, a new worker is started.
+
+This sets the minimum time that all workers must be busy before a new
+worker is started. Or, put differently, the minimum delay between starting
+new workers.
+
+The delay is small by default, which means new workers will be started
+relatively quickly. A delay of C<0> is possible, and ensures that the pool
+will grow as quickly as possible under load.
+
+Non-zero values are useful to avoid "exploding" a pool because a lot of
+jobs are queued in an instant.
+
+Higher values are often useful to improve efficiency at the cost of
+latency - when fewer processes can do the job over time, starting more and
+more is not necessarily going to help.
+
+=item stop => $seconds (default: 10)
 
 When a worker has no jobs to execute it becomes idle. An idle worker that
 hasn't executed a job within this amount of time will be stopped, unless
 the other parameters say otherwise.
 
-=item on_destroy => $callback->() (default: none)
-
-When a pool object goes out of scope, it will still handle all outstanding
-jobs. After that, it will destroy all workers (and also the template
-process if it isn't referenced otherwise).
-
-=back
-
-=item Template Process
-
-The worker processes are all forked from a single template
-process. Ideally, all modules and all cdoe used by the worker, as well as
-any shared data structures should be loaded into the template process, to
-take advantage of data sharing via fork.
-
-You can create your own template process by creating a L<AnyEvent::Fork>
-object yourself and passing it as the C<template> parameter, but
-C<AnyEvent::Fork::Pool> can create one for you, including some standard
-options.
-
-=over 4
-
-=item template => $fork (default: C<< AnyEvent::Fork->new >>)
-
-The template process to use, if you want to create your own.
+Setting this to a very high value means that workers stay around longer,
+even when they have nothing to do, which can be good as they don't have to
+be started again on the next load spike.
+
+Setting this to a lower value can be useful to avoid wasting memory or
+simply process table entries.
+
+Usually, setting this to a time longer than the time between load spikes
+is best - if you expect a lot of requests every minute and little work
+in between, setting this to longer than a minute avoids having to stop
+and start workers. On the other hand, you have to ask yourself if letting
+workers run idle is a good use of your resources. Try to find a good
+balance between resource usage of your workers and the time to start new
+workers - L<AnyEvent::Fork> itself is fast at creating workers and does
+not use much memory for them, so most of the overhead is likely from your
+own code.
 
-=item require => \@modules (default: C<[]>)
-
-The modules in this list will be laoded into the template process.
+=item on_destroy => $callback->() (default: none)
 
-=item eval => "# perl code to execute in template" (default: none)
+When a pool object goes out of scope, the outstanding requests are still
+handled till completion. Only after handling all jobs will the workers
+be destroyed (and also the template process if it isn't referenced
+otherwise).
 
-This is a perl string that is evaluated after creating the template
-process and after requiring the modules. It can do whatever it wants to
-configure the process, but it must not do anything that would keep a later
-fork from working (so must not create event handlers or (real) threads for
-example).
+To find out when a pool I<really> has finished its work, you can set this
+callback, which will be called when the pool has been destroyed.
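+
+A minimal sketch of waiting for a pool to finish, assuming a condvar is
+used as the callback (as in the synopsis) and that C<MyWorker::run> is a
+hypothetical worker function:
+
+   my $pool = AnyEvent::Fork
+      ->new
+      ->require ("MyWorker")
+      ->AnyEvent::Fork::Pool::run (
+           "MyWorker::run",
+           on_destroy => (my $finish = AE::cv),
+        );
+
+   $pool->(1, 2, 3, sub { warn "MyWorker::run returned @_" });
+
+   undef $pool;   # drop the reference, outstanding jobs are still handled
+   $finish->recv; # returns once the pool has been destroyed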
 
 =back
 
 =item AnyEvent::Fork::RPC Parameters
 
-These parameters are all passed directly to L<AnyEvent::Fork::RPC>. They
-are only briefly mentioned here, for their full documentation
-please refer to the L<AnyEvent::Fork::RPC> documentation. Also, the
-default values mentioned here are only documented as a best effort -
-L<AnyEvent::Fork::RPC> documentation is binding.
+These parameters are all passed more or less directly to
+L<AnyEvent::Fork::RPC>. They are only briefly mentioned here - for
+their full documentation, please refer to the L<AnyEvent::Fork::RPC>
+documentation. Also, the default values mentioned here are only documented
+as a best effort - the L<AnyEvent::Fork::RPC> documentation is binding.
 
 =over 4
 
 =item async => $boolean (default: 0)
 
-Whether to sue the synchronous or asynchronous RPC backend.
+Whether to use the synchronous or asynchronous RPC backend.
 
 =item on_error => $callback->($message) (default: die with message)
 
@@ -237,7 +267,7 @@
    my $idle       = $arg{idle}       || 0,
    my $load       = $arg{load}       || 2,
    my $start      = $arg{start}      || 0.1,
-   my $stop       = $arg{stop}       || 1,
+   my $stop       = $arg{stop}       || 10,
    my $on_event   = $arg{on_event}   || sub { },
    my $on_destroy = $arg{on_destroy};
 
@@ -260,11 +290,11 @@
       ->require ("AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync"))
       ->eval ('
            my ($magic0, $magic1) = @_;
-           sub AnyEvent::Fork::Pool::quit() {
-              AnyEvent::Fork::RPC::on_event $magic0, "quit", $magic1;
+           sub AnyEvent::Fork::Pool::retire() {
+              AnyEvent::Fork::RPC::event $magic0, "quit", $magic1;
            }
         ', $magic0, $magic1)
-      ->eval ($arg{eval});
+   ;
 
    $start_worker = sub {
       my $proc = [0, 0, undef]; # load, index, rpc
@@ -399,9 +429,11 @@
 
 Call the RPC function of a worker with the given arguments, and when the
 worker is done, call the C<$cb> with the results, just like calling the
-L<AnyEvent::Fork::RPC> object directly.
+RPC object directly - see the L<AnyEvent::Fork::RPC> documentation for
+details on the RPC API.
 
-If there is no free worker, the call will be queued.
+If there is no free worker, the call will be queued until a worker becomes
+available.
 
 Note that there can be considerable time between calling this method and
 the call actually being executed. During this time, the parameters passed
@@ -412,6 +444,77 @@
 
 =back
 
+=head1 CHILD USAGE
+
+In addition to the L<AnyEvent::Fork::RPC> API, this module implements one
+more child-side function:
+
+=over 4
+
+=item AnyEvent::Fork::Pool::retire ()
+
+This function sends an event to the parent process to request retirement:
+the worker is removed from the pool and no new jobs will be sent to it,
+but it has to handle the jobs that are already queued.
+
+The parentheses are part of the syntax: the function usually isn't defined
+when you compile your code (because that happens I<before> handing the
+template process over to C<AnyEvent::Fork::Pool::run>), so you need the
+empty parentheses to tell Perl that the function is indeed a function.
+
+Retiring a worker can be useful to gracefully shut it down when the worker
+deems this useful. For example, after executing a job, one could check
+the process size or the number of jobs handled so far, and if either is
+too high, the worker could ask to get retired, to keep memory leaks from
+accumulating.
+
+=back
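+
+As a sketch of the retirement pattern described above, a worker for the
+synchronous backend might count the jobs it has handled and ask to be
+retired once a limit is reached. The module name C<MyWorkerModule>, the
+function C<myfunction> and the limit of 1000 jobs are made up for this
+illustration:
+
+   package MyWorkerModule;
+
+   my $max_jobs     = 1000; # hypothetical limit, tune for your workload
+   my $jobs_handled = 0;
+
+   sub myfunction {
+      # the actual job: here, simply sum up the arguments
+      my $sum = 0;
+      $sum += $_ for @_;
+
+      # ask the parent to stop sending jobs once the limit is reached
+      AnyEvent::Fork::Pool::retire ()
+         if ++$jobs_handled >= $max_jobs;
+
+      $sum
+   }
+
+   1;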
+
+=head1 POOL PARAMETERS RECIPES
+
+This section describes some recipes for pool parameters. These are mostly
+meant for the synchronous RPC backend, as the asynchronous RPC backend
+changes the rules considerably, making workers themselves responsible for
+their scheduling.
+
+=over 4
+
+=item low latency - set load = 1
+
+If you need a deterministic low latency, you should set the C<load>
+parameter to C<1>. This ensures that no more than one job is sent to each
+worker at a time, which avoids having to wait for a previous job to finish.
+
+This makes most sense with the synchronous (default) backend, as the
+asynchronous backend can handle multiple requests concurrently.
+
+=item lowest latency - set load = 1 and idle = max
+
+To achieve the lowest latency, you additionally should disable any dynamic
+resizing of the pool by setting C<idle> to the same value as C<max>.
+
+=item high throughput, cpu bound jobs - set load >= 2, max = #cpus
+
+To get high throughput with cpu-bound jobs, you should set the maximum
+pool size to the number of cpus in your system, and C<load> to at least
+C<2>, to make sure there can be another job waiting for the worker when it
+has finished one.
+
+The value of C<2> for C<load> is the minimum value that I<can> achieve
+100% throughput, but if your parent process itself is sometimes busy, you
+might need higher values. Also there is a limit on the amount of data that
+can be "in flight" to the worker, so if you send big blobs of data to your
+worker, C<load> might have much less of an effect.
+
+=item high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
+
+When your jobs are I/O bound, using more workers usually boils down to
+higher throughput, depending very much on your actual workload - sometimes
+having only one worker is best, for example, when you read or write big
+files at maximum speed, as a second worker will increase seek times.
+
+=back
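+
+As a rough sketch, the recipes above could translate into parameter sets
+like the following. The worker function C<MyWorker::run>, the pool size
+of 4 and the cpu count of 8 are placeholders for this illustration, not
+values recommended by this module:
+
+   # lowest latency: one job per worker, pool never resized
+   my %lowest_latency = (load => 1, idle => 4, max => 4);
+
+   # high throughput, cpu bound: one worker per cpu, one job waiting
+   my %cpu_bound      = (load => 2, max => 8); # assuming 8 cpus
+
+   my $pool = AnyEvent::Fork
+      ->new
+      ->require ("MyWorker")
+      ->AnyEvent::Fork::Pool::run ("MyWorker::run", %cpu_bound);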
+
 =head1 SEE ALSO
 
 L<AnyEvent::Fork>, to create the processes in the first place.