ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-Fork-Pool/Pool.pm
(Generate patch)

Comparing AnyEvent-Fork-Pool/Pool.pm (file contents):
Revision 1.5 by root, Sat Apr 20 19:33:23 2013 UTC vs.
Revision 1.14 by root, Sun Oct 26 16:22:38 2014 UTC

3AnyEvent::Fork::Pool - simple process pool manager on top of AnyEvent::Fork 3AnyEvent::Fork::Pool - simple process pool manager on top of AnyEvent::Fork
4 4
5=head1 SYNOPSIS 5=head1 SYNOPSIS
6 6
7 use AnyEvent; 7 use AnyEvent;
8 use AnyEvent::Fork;
8 use AnyEvent::Fork::Pool; 9 use AnyEvent::Fork::Pool;
9 # use AnyEvent::Fork is not needed
10 10
11 # all parameters with default values 11 # all possible parameters shown, with default values
12 my $pool = AnyEvent::Fork 12 my $pool = AnyEvent::Fork
13 ->new 13 ->new
14 ->require ("MyWorker") 14 ->require ("MyWorker")
15 ->AnyEvent::Fork::Pool::run ( 15 ->AnyEvent::Fork::Pool::run (
16 "MyWorker::run", # the worker function 16 "MyWorker::run", # the worker function
17 17
18 # pool management 18 # pool management
19 max => 4, # absolute maximum # of processes 19 max => 4, # absolute maximum # of processes
20 idle => 2, # minimum # of idle processes 20 idle => 0, # minimum # of idle processes
21 load => 2, # queue at most this number of jobs per process 21 load => 2, # queue at most this number of jobs per process
22 start => 0.1, # wait this many seconds before starting a new process 22 start => 0.1, # wait this many seconds before starting a new process
23 stop => 1, # wait this many seconds before stopping an idle process 23 stop => 10, # wait this many seconds before stopping an idle process
24 on_destroy => (my $finish = AE::cv), # called when object is destroyed 24 on_destroy => (my $finish = AE::cv), # called when object is destroyed
25 25
26 # parameters passed to AnyEvent::Fork::RPC 26 # parameters passed to AnyEvent::Fork::RPC
27 async => 0, 27 async => 0,
28 on_error => sub { die "FATAL: $_[0]\n" }, 28 on_error => sub { die "FATAL: $_[0]\n" },
41 41
42 $finish->recv; 42 $finish->recv;
43 43
44=head1 DESCRIPTION 44=head1 DESCRIPTION
45 45
46This module uses processes created via L<AnyEvent::Fork> and the RPC 46This module uses processes created via L<AnyEvent::Fork> (or
47protocol implement in L<AnyEvent::Fork::RPC> to create a load-balanced 47L<AnyEvent::Fork::Remote>) and the RPC protocol implement in
48pool of processes that handles jobs. 48L<AnyEvent::Fork::RPC> to create a load-balanced pool of processes that
49handles jobs.
49 50
50Understanding of L<AnyEvent::Fork> is helpful but not critical to be able 51Understanding of L<AnyEvent::Fork> is helpful but not critical to be able
51to use this module, but a thorough understanding of L<AnyEvent::Fork::RPC> 52to use this module, but a thorough understanding of L<AnyEvent::Fork::RPC>
52is, as it defines the actual API that needs to be implemented in the 53is, as it defines the actual API that needs to be implemented in the
53children. 54worker processes.
54
55=head1 EXAMPLES
56 55
57=head1 PARENT USAGE 56=head1 PARENT USAGE
57
58To create a pool, you first have to create a L<AnyEvent::Fork> object -
59this object becomes your template process. Whenever a new worker process
60is needed, it is forked from this template process. Then you need to
61"hand off" this template process to the C<AnyEvent::Fork::Pool> module by
62calling its run method on it:
63
64 my $template = AnyEvent::Fork
65 ->new
66 ->require ("SomeModule", "MyWorkerModule");
67
68 my $pool = $template->AnyEvent::Fork::Pool::run ("MyWorkerModule::myfunction");
69
70The pool "object" is not a regular Perl object, but a code reference that
71you can call and that works roughly like calling the worker function
72directly, except that it returns nothing but instead you need to specify a
73callback to be invoked once results are in:
74
75 $pool->(1, 2, 3, sub { warn "myfunction(1,2,3) returned @_" });
58 76
59=over 4 77=over 4
60 78
61=cut 79=cut
62 80
68 86
69use Guard (); 87use Guard ();
70use Array::Heap (); 88use Array::Heap ();
71 89
72use AnyEvent; 90use AnyEvent;
73use AnyEvent::Fork; # we don't actually depend on it, this is for convenience
74use AnyEvent::Fork::RPC; 91use AnyEvent::Fork::RPC;
75 92
76# these are used for the first and last argument of events 93# these are used for the first and last argument of events
77# in the hope of not colliding. yes, I don't like it either, 94# in the hope of not colliding. yes, I don't like it either,
78# but didn't come up with an obviously better alternative. 95# but didn't come up with an obviously better alternative.
79my $magic0 = ':t6Z@HK1N%Dx@_7?=~-7NQgWDdAs6a,jFN=wLO0*jD*1%P'; 96my $magic0 = ':t6Z@HK1N%Dx@_7?=~-7NQgWDdAs6a,jFN=wLO0*jD*1%P';
80my $magic1 = '<~53rexz.U`!]X[A235^"fyEoiTF\T~oH1l/N6+Djep9b~bI9`\1x%B~vWO1q*'; 97my $magic1 = '<~53rexz.U`!]X[A235^"fyEoiTF\T~oH1l/N6+Djep9b~bI9`\1x%B~vWO1q*';
81 98
82our $VERSION = 0.1; 99our $VERSION = 1.1;
83 100
84=item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...] 101=item my $pool = AnyEvent::Fork::Pool::run $fork, $function, [key => value...]
85 102
86The traditional way to call it. But it is way cooler to call it in the 103The traditional way to call the pool creation function. But it is way
87following way: 104cooler to call it in the following way:
88 105
89=item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...]) 106=item my $pool = $fork->AnyEvent::Fork::Pool::run ($function, [key => value...])
90 107
91Creates a new pool object with the specified C<$function> as function 108Creates a new pool object with the specified C<$function> as function
92(name) to call for each request. The pool uses the C<$fork> object as the 109(name) to call for each request. The pool uses the C<$fork> object as the
105 122
106The pool consists of a certain number of worker processes. These options 123The pool consists of a certain number of worker processes. These options
107decide how many of these processes exist and when they are started and 124decide how many of these processes exist and when they are started and
108stopped. 125stopped.
109 126
127The worker pool is dynamically resized, according to (perceived :)
128load. The minimum size is given by the C<idle> parameter and the maximum
129size is given by the C<max> parameter. A new worker is started every
130C<start> seconds at most, and an idle worker is stopped at most every
131C<stop> second.
132
133You can specify the amount of jobs sent to a worker concurrently using the
134C<load> parameter.
135
110=over 4 136=over 4
111 137
112=item idle => $count (default: 0) 138=item idle => $count (default: 0)
113 139
114The minimum amount of idle processes in the pool - when there are fewer 140The minimum amount of idle processes in the pool - when there are fewer
115than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new 141than this many idle workers, C<AnyEvent::Fork::Pool> will try to start new
116ones, subject to C<max> and C<start>. 142ones, subject to the limits set by C<max> and C<start>.
117 143
118This is also the initial/minimum amount of workers in the pool. The 144This is also the initial amount of workers in the pool. The default of
119default of zero means that the pool starts empty and can shrink back to 145zero means that the pool starts empty and can shrink back to zero workers
120zero workers over time. 146over time.
121 147
122=item max => $count (default: 4) 148=item max => $count (default: 4)
123 149
124The maximum number of processes in the pool, in addition to the template 150The maximum number of processes in the pool, in addition to the template
125process. C<AnyEvent::Fork::Pool> will never create more than this number 151process. C<AnyEvent::Fork::Pool> will never have more than this number of
126of worker processes, although there can be more temporarily when a worker 152worker processes, although there can be more temporarily when a worker is
127is shut down and hasn't exited yet. 153shut down and hasn't exited yet.
128 154
129=item load => $count (default: 2) 155=item load => $count (default: 2)
130 156
131The maximum number of concurrent jobs sent to a single worker 157The maximum number of concurrent jobs sent to a single worker process.
132process. Worker processes that handle this number of jobs already are
133called "busy".
134 158
135Jobs that cannot be sent to a worker immediately (because all workers are 159Jobs that cannot be sent to a worker immediately (because all workers are
136busy) will be queued until a worker is available. 160busy) will be queued until a worker is available.
137 161
162Setting this low improves latency. For example, at C<1>, every job that
163is sent to a worker is sent to a completely idle worker that doesn't run
164any other jobs. The downside is that throughput is reduced - a worker that
165finishes a job needs to wait for a new job from the parent.
166
167The default of C<2> is usually a good compromise.
168
138=item start => $seconds (default: 0.1) 169=item start => $seconds (default: 0.1)
139 170
140When a job is queued and all workers are busy, a timer is started. If the 171When there are fewer than C<idle> workers (or all workers are completely
141timer elapses and there are still jobs that cannot be queued to a worker, 172busy), then a timer is started. If the timer elapses and there are still
142a new worker is started. 173jobs that cannot be queued to a worker, a new worker is started.
143 174
144This configurs the time that all workers must be busy before a new worker 175This sets the minimum time that all workers must be busy before a new
145is started. Or, put differently, the minimum delay betwene starting new 176worker is started. Or, put differently, the minimum delay between starting
146workers. 177new workers.
147 178
148The delay is zero by default, which means new workers will be started 179The delay is small by default, which means new workers will be started
149without delay. 180relatively quickly. A delay of C<0> is possible, and ensures that the pool
181will grow as quickly as possible under load.
150 182
183Non-zero values are useful to avoid "exploding" a pool because a lot of
184jobs are queued in an instant.
185
186Higher values are often useful to improve efficiency at the cost of
187latency - when fewer processes can do the job over time, starting more and
188more is not necessarily going to help.
189
151=item stop => $seconds (default: 1) 190=item stop => $seconds (default: 10)
152 191
153When a worker has no jobs to execute it becomes idle. An idle worker that 192When a worker has no jobs to execute it becomes idle. An idle worker that
154hasn't executed a job within this amount of time will be stopped, unless 193hasn't executed a job within this amount of time will be stopped, unless
155the other parameters say otherwise. 194the other parameters say otherwise.
156 195
196Setting this to a very high value means that workers stay around longer,
197even when they have nothing to do, which can be good as they don't have to
198be started on the netx load spike again.
199
200Setting this to a lower value can be useful to avoid memory or simply
201process table wastage.
202
203Usually, setting this to a time longer than the time between load spikes
204is best - if you expect a lot of requests every minute and little work
205in between, setting this to longer than a minute avoids having to stop
206and start workers. On the other hand, you have to ask yourself if letting
207workers run idle is a good use of your resources. Try to find a good
208balance between resource usage of your workers and the time to start new
209workers - the processes created by L<AnyEvent::Fork> itself is fats at
210creating workers while not using much memory for them, so most of the
211overhead is likely from your own code.
212
157=item on_destroy => $callback->() (default: none) 213=item on_destroy => $callback->() (default: none)
158 214
159When a pool object goes out of scope, it will still handle all outstanding 215When a pool object goes out of scope, the outstanding requests are still
160jobs. After that, it will destroy all workers (and also the template 216handled till completion. Only after handling all jobs will the workers
161process if it isn't referenced otherwise). 217be destroyed (and also the template process if it isn't referenced
218otherwise).
219
220To find out when a pool I<really> has finished its work, you can set this
221callback, which will be called when the pool has been destroyed.
162 222
163=back 223=back
164 224
165=item Template Process 225=item AnyEvent::Fork::RPC Parameters
166 226
167The worker processes are all forked from a single template 227These parameters are all passed more or less directly to
168process. Ideally, all modules and all cdoe used by the worker, as well as 228L<AnyEvent::Fork::RPC>. They are only briefly mentioned here, for
169any shared data structures should be loaded into the template process, to 229their full documentation please refer to the L<AnyEvent::Fork::RPC>
170take advantage of data sharing via fork. 230documentation. Also, the default values mentioned here are only documented
171 231as a best effort - the L<AnyEvent::Fork::RPC> documentation is binding.
172You can create your own template process by creating a L<AnyEvent::Fork>
173object yourself and passing it as the C<template> parameter, but
174C<AnyEvent::Fork::Pool> can create one for you, including some standard
175options.
176 232
177=over 4 233=over 4
178 234
179=item template => $fork (default: C<< AnyEvent::Fork->new >>)
180
181The template process to use, if you want to create your own.
182
183=item require => \@modules (default: C<[]>)
184
185The modules in this list will be laoded into the template process.
186
187=item eval => "# perl code to execute in template" (default: none)
188
189This is a perl string that is evaluated after creating the template
190process and after requiring the modules. It can do whatever it wants to
191configure the process, but it must not do anything that would keep a later
192fork from working (so must not create event handlers or (real) threads for
193example).
194
195=back
196
197=item AnyEvent::Fork::RPC Parameters
198
199These parameters are all passed directly to L<AnyEvent::Fork::RPC>. They
200are only briefly mentioned here, for their full documentation
201please refer to the L<AnyEvent::Fork::RPC> documentation. Also, the
202default values mentioned here are only documented as a best effort -
203L<AnyEvent::Fork::RPC> documentation is binding.
204
205=over 4
206
207=item async => $boolean (default: 0) 235=item async => $boolean (default: 0)
208 236
209Whether to sue the synchronous or asynchronous RPC backend. 237Whether to use the synchronous or asynchronous RPC backend.
210 238
211=item on_error => $callback->($message) (default: die with message) 239=item on_error => $callback->($message) (default: die with message)
212 240
213The callback to call on any (fatal) errors. 241The callback to call on any (fatal) errors.
214 242
235 263
236 my $max = $arg{max} || 4; 264 my $max = $arg{max} || 4;
237 my $idle = $arg{idle} || 0, 265 my $idle = $arg{idle} || 0,
238 my $load = $arg{load} || 2, 266 my $load = $arg{load} || 2,
239 my $start = $arg{start} || 0.1, 267 my $start = $arg{start} || 0.1,
240 my $stop = $arg{stop} || 1, 268 my $stop = $arg{stop} || 10,
241 my $on_event = $arg{on_event} || sub { }, 269 my $on_event = $arg{on_event} || sub { },
242 my $on_destroy = $arg{on_destroy}; 270 my $on_destroy = $arg{on_destroy};
243 271
244 my @rpc = ( 272 my @rpc = (
245 async => $arg{async}, 273 async => $arg{async},
258 286
259 $template 287 $template
260 ->require ("AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync")) 288 ->require ("AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync"))
261 ->eval (' 289 ->eval ('
262 my ($magic0, $magic1) = @_; 290 my ($magic0, $magic1) = @_;
263 sub AnyEvent::Fork::Pool::quit() { 291 sub AnyEvent::Fork::Pool::retire() {
264 AnyEvent::Fork::RPC::on_event $magic0, "quit", $magic1; 292 AnyEvent::Fork::RPC::event $magic0, "quit", $magic1;
265 } 293 }
266 ', $magic0, $magic1) 294 ', $magic0, $magic1)
267 ->eval ($arg{eval}); 295 ;
268 296
269 $start_worker = sub { 297 $start_worker = sub {
270 my $proc = [0, 0, undef]; # load, index, rpc 298 my $proc = [0, 0, undef]; # load, index, rpc
271 299
272 $proc->[2] = $template 300 $proc->[2] = $template
298 $proc->[0] 326 $proc->[0]
299 or --$nidle; 327 or --$nidle;
300 328
301 Array::Heap::splice_heap_idx @pool, $proc->[1] 329 Array::Heap::splice_heap_idx @pool, $proc->[1]
302 if defined $proc->[1]; 330 if defined $proc->[1];
331
332 @$proc = 0; # tell others to leave it be
303 }; 333 };
304 334
305 $want_start = sub { 335 $want_start = sub {
306 undef $stop_w; 336 undef $stop_w;
307 337
326 }; 356 };
327 357
328 $scheduler = sub { 358 $scheduler = sub {
329 if (@queue) { 359 if (@queue) {
330 while (@queue) { 360 while (@queue) {
361 @pool or $start_worker->();
362
331 my $proc = $pool[0]; 363 my $proc = $pool[0];
332 364
333 if ($proc->[0] < $load) { 365 if ($proc->[0] < $load) {
334 # found free worker, increase load 366 # found free worker, increase load
335 unless ($proc->[0]++) { 367 unless ($proc->[0]++) {
353 or $want_stop->(); 385 or $want_stop->();
354 386
355 Array::Heap::adjust_heap_idx @pool, $proc->[1] 387 Array::Heap::adjust_heap_idx @pool, $proc->[1]
356 if defined $proc->[1]; 388 if defined $proc->[1];
357 389
390 &$ocb;
391
358 $scheduler->(); 392 $scheduler->();
359
360 &$ocb;
361 }); 393 });
362 } else { 394 } else {
363 $want_start->() 395 $want_start->()
364 unless @pool >= $max; 396 unless @pool >= $max;
365 397
397 429
398=item $pool->(..., $cb->(...)) 430=item $pool->(..., $cb->(...))
399 431
400Call the RPC function of a worker with the given arguments, and when the 432Call the RPC function of a worker with the given arguments, and when the
401worker is done, call the C<$cb> with the results, just like calling the 433worker is done, call the C<$cb> with the results, just like calling the
402L<AnyEvent::Fork::RPC> object directly. 434RPC object durectly - see the L<AnyEvent::Fork::RPC> documentation for
435details on the RPC API.
403 436
404If there is no free worker, the call will be queued. 437If there is no free worker, the call will be queued until a worker becomes
438available.
405 439
406Note that there can be considerable time between calling this method and 440Note that there can be considerable time between calling this method and
407the call actually being executed. During this time, the parameters passed 441the call actually being executed. During this time, the parameters passed
408to this function are effectively read-only - modifying them after the call 442to this function are effectively read-only - modifying them after the call
409and before the callback is invoked causes undefined behaviour. 443and before the callback is invoked causes undefined behaviour.
410 444
411=cut 445=cut
412 446
447=item $cpus = AnyEvent::Fork::Pool::ncpu [$default_cpus]
448
449=item ($cpus, $eus) = AnyEvent::Fork::Pool::ncpu [$default_cpus]
450
451Tries to detect the number of CPUs (C<$cpus> often called CPU cores
452nowadays) and execution units (C<$eus>) which include e.g. extra
453hyperthreaded units). When C<$cpus> cannot be determined reliably,
454C<$default_cpus> is returned for both values, or C<1> if it is missing.
455
456For normal CPU bound uses, it is wise to have as many worker processes
457as CPUs in the system (C<$cpus>), if nothing else uses the CPU. Using
458hyperthreading is usually detrimental to performance, but in those rare
459cases where that really helps it might be beneficial to use more workers
460(C<$eus>).
461
462Currently, F</proc/cpuinfo> is parsed on GNU/Linux systems for both
463C<$cpus> and C<$eus>, and on {Free,Net,Open}BSD, F<sysctl -n hw.ncpu> is
464used for C<$cpus>.
465
466Example: create a worker pool with as many workers as CPU cores, or C<2>,
467if the actual number could not be determined.
468
469 $fork->AnyEvent::Fork::Pool::run ("myworker::function",
470 max => (scalar AnyEvent::Fork::Pool::ncpu 2),
471 );
472
473=cut
474
475BEGIN {
476 if ($^O eq "linux") {
477 *ncpu = sub(;$) {
478 my ($cpus, $eus);
479
480 if (open my $fh, "<", "/proc/cpuinfo") {
481 my %id;
482
483 while (<$fh>) {
484 if (/^core id\s*:\s*(\d+)/) {
485 ++$eus;
486 undef $id{$1};
487 }
488 }
489
490 $cpus = scalar keys %id;
491 } else {
492 $cpus = $eus = @_ ? shift : 1;
493 }
494 wantarray ? ($cpus, $eus) : $cpus
495 };
496 } elsif ($^O eq "freebsd" || $^O eq "netbsd" || $^O eq "openbsd") {
497 *ncpu = sub(;$) {
498 my $cpus = qx<sysctl -n hw.ncpu> * 1
499 || (@_ ? shift : 1);
500 wantarray ? ($cpus, $cpus) : $cpus
501 };
502 } else {
503 *ncpu = sub(;$) {
504 my $cpus = @_ ? shift : 1;
505 wantarray ? ($cpus, $cpus) : $cpus
506 };
507 }
508}
509
413=back 510=back
414 511
512=head1 CHILD USAGE
513
514In addition to the L<AnyEvent::Fork::RPC> API, this module implements one
515more child-side function:
516
517=over 4
518
519=item AnyEvent::Fork::Pool::retire ()
520
521This function sends an event to the parent process to request retirement:
522the worker is removed from the pool and no new jobs will be sent to it,
523but it still has to handle the jobs that are already queued.
524
525The parentheses are part of the syntax: the function usually isn't defined
526when you compile your code (because that happens I<before> handing the
527template process over to C<AnyEvent::Fork::Pool::run>, so you need the
528empty parentheses to tell Perl that the function is indeed a function.
529
530Retiring a worker can be useful to gracefully shut it down when the worker
531deems this useful. For example, after executing a job, it could check the
532process size or the number of jobs handled so far, and if either is too
533high, the worker could request to be retired, to avoid memory leaks to
534accumulate.
535
536Example: retire a worker after it has handled roughly 100 requests. It
537doesn't matter whether you retire at the beginning or end of your request,
538as the worker will continue to handle some outstanding requests. Likewise,
539it's ok to call retire multiple times.
540
541 my $count = 0;
542
543 sub my::worker {
544
545 ++$count == 100
546 and AnyEvent::Fork::Pool::retire ();
547
548 ... normal code goes here
549 }
550
551=back
552
553=head1 POOL PARAMETERS RECIPES
554
555This section describes some recipes for pool parameters. These are mostly
556meant for the synchronous RPC backend, as the asynchronous RPC backend
557changes the rules considerably, making workers themselves responsible for
558their scheduling.
559
560=over 4
561
562=item low latency - set load = 1
563
564If you need a deterministic low latency, you should set the C<load>
565parameter to C<1>. This ensures that never more than one job is sent to
566each worker. This avoids having to wait for a previous job to finish.
567
568This makes most sense with the synchronous (default) backend, as the
569asynchronous backend can handle multiple requests concurrently.
570
571=item lowest latency - set load = 1 and idle = max
572
573To achieve the lowest latency, you additionally should disable any dynamic
574resizing of the pool by setting C<idle> to the same value as C<max>.
575
576=item high throughput, cpu bound jobs - set load >= 2, max = #cpus
577
578To get high throughput with cpu-bound jobs, you should set the maximum
579pool size to the number of cpus in your system, and C<load> to at least
580C<2>, to make sure there can be another job waiting for the worker when it
581has finished one.
582
583The value of C<2> for C<load> is the minimum value that I<can> achieve
584100% throughput, but if your parent process itself is sometimes busy, you
585might need higher values. Also there is a limit on the amount of data that
586can be "in flight" to the worker, so if you send big blobs of data to your
587worker, C<load> might have much less of an effect.
588
589=item high throughput, I/O bound jobs - set load >= 2, max = 1, or very high
590
591When your jobs are I/O bound, using more workers usually boils down to
592higher throughput, depending very much on your actual workload - sometimes
593having only one worker is best, for example, when you read or write big
594files at maximum speed, as a second worker will increase seek times.
595
596=back
597
598=head1 EXCEPTIONS
599
600The same "policy" as with L<AnyEvent::Fork::RPC> applies - exceptions
601will not be caught, and exceptions in both worker and in callbacks causes
602undesirable or undefined behaviour.
603
415=head1 SEE ALSO 604=head1 SEE ALSO
416 605
417L<AnyEvent::Fork>, to create the processes in the first place. 606L<AnyEvent::Fork>, to create the processes in the first place.
607
608L<AnyEvent::Fork::Remote>, likewise, but helpful for remote processes.
418 609
419L<AnyEvent::Fork::RPC>, which implements the RPC protocol and API. 610L<AnyEvent::Fork::RPC>, which implements the RPC protocol and API.
420 611
421=head1 AUTHOR AND CONTACT INFORMATION 612=head1 AUTHOR AND CONTACT INFORMATION
422 613

Diff Legend

Removed lines
+ Added lines
< Changed lines
> Changed lines