/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.5
Committed: Wed Apr 3 08:29:21 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
Changes since 1.4: +13 -1 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3 root 1.4 AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
4 root 1.1
5     =head1 SYNOPSIS
6    
7 root 1.4 use AnyEvent::Fork;
8 root 1.1
9     =head1 DESCRIPTION
10    
11 root 1.4 This module allows you to create new processes, without actually forking
12     them from your current process (avoiding the problems of forking), but
13     preserving most of the advantages of fork.
14    
15     It can be used to create new worker processes or new independent
16     subprocesses for short- and long-running jobs, process pools (e.g. for use
17     in pre-forked servers) but also to spawn new external processes (such as
18     CGI scripts from a webserver), which can be faster (and more well behaved)
19     than using fork+exec in big processes.
20 root 1.1
21 root 1.5 Special care has been taken to make this module useful from other modules,
22     while still supporting specialised environments such as L<App::Staticperl>
23     or L<PAR::Packer>.
24    
25 root 1.1 =head1 PROBLEM STATEMENT
26    
27     There are two ways to implement parallel processing on UNIX like operating
28     systems - fork and process, and fork+exec and process. They have different
29     advantages and disadvantages that I describe below, together with how this
30     module tries to mitigate the disadvantages.
31    
32     =over 4
33    
34     =item Forking from a big process can be very slow (a 5GB process needs
35     0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
36     is often shared with exec (because you have to fork first), but in some
37     circumstances (e.g. when vfork is used), fork+exec can be much faster.
38    
39     This module can help here by telling a small(er) helper process to fork,
40     or fork+exec instead.
41    
42     =item Forking usually creates a copy-on-write copy of the parent
43     process. Memory (for example, modules or data files that have been
44     loaded) will not take additional memory. When exec'ing a new process, modules
45     and data files might need to be loaded again, at extra cpu and memory
46     cost. Likewise when forking, all data structures are copied as well - if
47     the program frees them and replaces them by new data, the child processes
48     will retain the memory even if it isn't used.
49    
50     This module allows the main program to do a controlled fork, and allows
51     modules to exec processes safely at any time. When creating a custom
52     process pool you can take advantage of data sharing via fork without
53     the risk of sharing large dynamic data structures that will blow up child
54     memory usage.
55    
56     =item Exec'ing a new perl process might be difficult and slow. For
57     example, it is not easy to find the correct path to the perl interpreter,
58     and all modules have to be loaded from disk again. Long running processes
59     might run into problems when perl is upgraded for example.
60    
61     This module supports creating pre-initialised perl processes to be used
62     as template, and also tries hard to identify the correct path to the perl
63     interpreter. With a cooperative main program, exec'ing the interpreter
64     might not even be necessary.
65    
66     =item Forking might be impossible when a program is running. For example,
67     POSIX makes it almost impossible to fork from a multithreaded program and
68     do anything useful in the child - strictly speaking, if your perl program
69     uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
70     you cannot call fork on the perl level anymore, at all.
71    
72     This module can safely fork helper processes at any time, by calling
73     fork+exec in C, in a POSIX-compatible way.
74    
75     =item Parallel processing with fork might be inconvenient or difficult
76     to implement. For example, when a program uses an event loop and creates
77     watchers it becomes very hard to use the event loop from a child
78     program, as the watchers already exist but are only meaningful in the
79     parent. Worse, a module might want to use such a system, not knowing
80     whether another module or the main program also does, leading to problems.
81    
82     This module only lets the main program create pools by forking (because
83     only the main program can know when it is still safe to do so) - all other
84     pools are created by fork+exec, after which such modules can again be
85     loaded.
86    
87     =back
88    
89 root 1.3 =head1 CONCEPTS
90    
91     This module can create new processes either by executing a new perl
92     process, or by forking from an existing "template" process.
93    
94     Each such process comes with its own file handle that can be used to
95     communicate with it (it's actually a socket - one end in the new process,
96     one end in the main process), and among the things you can do in it are
97     load modules, fork new processes, send file handles to it, and execute
98     functions.
99    
100     There are multiple ways to create additional processes to execute some
101     jobs:
102    
103     =over 4
104    
105     =item fork a new process from the "default" template process, load code,
106     run it
107    
108     This module has a "default" template process which it executes when it is
109     needed the first time. Forking from this process shares the memory used
110     for the perl interpreter with the new process, but loading modules takes
111     time, and the memory is not shared with anything else.
112    
113     This is ideal for when you only need one extra process of a kind, with the
114     option of starting and stopping it on demand.
115    
116     =item fork a new template process, load code, then fork processes off of
117     it and run the code
118    
119     When you need to have a bunch of processes that all execute the same (or
120     very similar) tasks, then a good way is to create a new template process
121     for them, loading all the modules you need, and then create your worker
122     processes from this new template process.
123    
124     This way, all code (and data structures) that can be shared (e.g. the
125     modules you loaded) is shared between the processes, and each new process
126     consumes relatively little memory of its own.
127    
128     The disadvantage of this approach is that you need to create a template
129     process for the sole purpose of forking new processes from it, but if you
130     only need a fixed number of processes you can create them, and then destroy
131     the template process.
132    
133     =item execute a new perl interpreter, load some code, run it
134    
135     This is relatively slow, and doesn't allow you to share memory between
136     multiple processes.
137    
138     The only advantage is that you don't have to have a template process
139     hanging around all the time to fork off some new processes, which might be
140     an advantage when there are long time spans where no extra processes are
141     needed.
142    
143     =back
144    
145     =head1 FUNCTIONS
146    
147 root 1.1 =over 4
148    
149     =cut
150    
151 root 1.4 package AnyEvent::Fork;
152 root 1.1
153     use common::sense;
154    
155     use Socket ();
156    
157     use AnyEvent;
158 root 1.4 use AnyEvent::Fork::Util;
159 root 1.1 use AnyEvent::Util ();
160    
161 root 1.4 our $PERL; # the path to the perl interpreter, deduced with various forms of magic
162 root 1.1
163 root 1.4 =item my $pool = new AnyEvent::Fork key => value...
164 root 1.1
165     Create a new process pool. The following named parameters are supported:
166    
167     =over 4
168    
169     =back
170    
171     =cut
172    
173 root 1.5 # the early fork template process
174     our $EARLY;
175    
176 root 1.4 # the empty template process
177     our $TEMPLATE;
178    
179     sub _cmd {
180     my $self = shift;
181    
182     # ideally, we would want to use "a (w/a)*" as format string, but perl versions
183 root 1.5 # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
184 root 1.4 push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;
185    
186     $self->[3] ||= AE::io $self->[1], 1, sub {
187     if (ref $self->[2][0]) {
188     AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
189     and shift @{ $self->[2] };
190 root 1.5
191 root 1.4 } else {
192     my $len = syswrite $self->[1], $self->[2][0]
193     or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
194 root 1.5
195 root 1.4 substr $self->[2][0], 0, $len, "";
196     shift @{ $self->[2] } unless length $self->[2][0];
197     }
198    
199     unless (@{ $self->[2] }) {
200     undef $self->[3];
201     $self->[0]->($self->[1]) if $self->[0];
202     }
203     };
204     }
205 root 1.1
206 root 1.4 sub _new {
207     my ($self, $fh) = @_;
208 root 1.1
209 root 1.4 $self = bless [
210     undef, # run callback
211 root 1.1 $fh,
212 root 1.4 [], # write queue - strings or fd's
213     undef, # AE watcher
214     ], $self;
215    
216     # my ($a, $b) = AnyEvent::Util::portable_socketpair;
217    
218     # queue_cmd $template, "Iabc";
219     # push @{ $template->[2] }, \$b;
220    
221     # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
222     # undef $b;
223     # die "x" . <$a>;
224    
225     $self
226 root 1.1 }
227    
228 root 1.4 =item my $proc = new AnyEvent::Fork
229 root 1.1
230 root 1.4 Creates a new "empty" perl interpreter process and returns its process
231     object for further manipulation.
232 root 1.1
233 root 1.4 The new process is forked from a template process that is kept around
234     for this purpose. When it doesn't exist yet, it is created by a call to
235     C<new_exec> and kept around for future calls.
236    
237     =cut
238    
239     sub new {
240     my $class = shift;
241 root 1.1
242 root 1.4 $TEMPLATE ||= $class->new_exec;
243     $TEMPLATE->fork
244 root 1.1 }
245    
246 root 1.4 =item $new_proc = $proc->fork
247    
248     Forks C<$proc>, creating a new process, and returns the process object
249     of the new process.
250    
251     If any of the C<send_> functions have been called before fork, then the
252     handles and arguments sent so far are cloned in the child. For example,
253     in a pre-forked server, you might C<send_fh> the listening socket into the
254     template process, and then keep calling C<fork> and C<run>.
255    
256     =cut
257    
258     sub fork {
259     my ($self) = @_;
260 root 1.1
261     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
262 root 1.4
263     $self->send_fh ($slave);
264     $self->_cmd ("f");
265    
266 root 1.1 AnyEvent::Util::fh_nonblocking $fh, 1;
267    
268 root 1.4 AnyEvent::Fork->_new ($fh)
269     }
270    
271     =item my $proc = new_exec AnyEvent::Fork
272    
273     Creates a new "empty" perl interpreter process and returns its process
274     object for further manipulation.
275    
276     Unlike the C<new> method, this method I<always> spawns a new perl process
277     (except in some cases, see L<AnyEvent::Fork::Early> for details). This
278     reduces the amount of memory sharing that is possible, and is also slower.
279    
280     You should use C<new> whenever possible, except when having a template
281     process around is unacceptable.
282    
283     The path to the perl interpreter is divined using various methods - first
284     C<$^X> is investigated to see if the path ends with something that sounds
285     as if it were the perl interpreter. Failing this, the module falls back to
286     using C<$Config::Config{perlpath}>.
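
The lookup described above can be sketched in plain Perl. This is only an
illustration: C<looks_like_perl> and C<guess_perl> are hypothetical helper
names, and the C<$is_win32> parameter stands in for the module's own platform
check.

```perl
use strict;
use warnings;
use Config ();

# Hypothetical helper: true if $path is absolute (or we are on win32) and
# ends in something that looks like a perl binary name.
sub looks_like_perl {
    my ($path, $is_win32) = @_;

    return (
        ($is_win32 || $path =~ m%^/%)
        && $path =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
    ) ? 1 : 0;
}

# Hypothetical helper: use the candidate (normally $^X) if it looks perlish,
# otherwise fall back to the path recorded in Config.
sub guess_perl {
    my ($candidate, $is_win32) = @_;

    return $candidate if looks_like_perl ($candidate, $is_win32);

    my $perl = $Config::Config{perlpath};
    # ensure the executable suffix (e.g. ".exe") is present exactly once
    $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
    $perl
}
```

A caller would typically pass C<$^X> as the candidate, for example
C<guess_perl ($^X, $^O eq "MSWin32")>.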
287    
288     =cut
289    
290     sub new_exec {
291     my ($self) = @_;
292    
293 root 1.5 return $EARLY->fork
294     if $EARLY;
295    
296 root 1.4 # first find path of perl
297     my $perl = $^X;
298    
299     # first we try $^X, but the path must be absolute (always on win32), and end in sth.
300     # that looks like perl. this obviously only works for posix and win32
301     unless (
302     (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
303     && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
304     ) {
305     # if it doesn't look perlish enough, try Config
306     require Config;
307     $perl = $Config::Config{perlpath};
308     $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
309     }
310    
311     require Proc::FastSpawn;
312    
313     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
314     AnyEvent::Util::fh_nonblocking $fh, 1;
315     Proc::FastSpawn::fd_inherit (fileno $slave);
316    
317     # quick. also doesn't work in win32. of course. what did you expect
318     #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
319 root 1.1 my %env = %ENV;
320     $env{PERL5LIB} = join ":", grep !ref, @INC;
321    
322 root 1.4 Proc::FastSpawn::spawn (
323     $perl,
324     ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
325     [map "$_=$env{$_}", keys %env],
326     ) or die "unable to spawn AnyEvent::Fork server: $!";
327    
328     $self->_new ($fh)
329     }
330    
331     =item $proc = $proc->require ($module, ...)
332 root 1.1
333 root 1.4 Tries to load the given modules into the process.
334 root 1.1
335 root 1.4 Returns the process object for easy chaining of method calls.
336 root 1.1
337 root 1.4 =item $proc = $proc->send_fh ($handle, ...)
338 root 1.1
339 root 1.4 Send one or more file handles (I<not> file descriptors) to the process,
340     to prepare a call to C<run>.
341 root 1.1
342 root 1.4 The process object keeps a reference to the handles until this is done,
343     so you must not explicitly close the handles. This is most easily
344     accomplished by simply not storing the file handles anywhere after passing
345     them to this method.
346    
347     Returns the process object for easy chaining of method calls.
348    
349     =cut
350    
351     sub send_fh {
352     my ($self, @fh) = @_;
353    
354     for my $fh (@fh) {
355     $self->_cmd ("h");
356     push @{ $self->[2] }, \$fh;
357     }
358    
359     $self
360 root 1.1 }
361    
362 root 1.4 =item $proc = $proc->send_arg ($string, ...)
363    
364     Send one or more argument strings to the process, to prepare a call to
365     C<run>. The strings can be any octet string.
366    
367     Returns the process object for easy chaining of method calls.
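
Because every string is sent with a length prefix, arbitrary octets
(including NUL bytes and empty strings) survive the trip. The following
sketch uses the same C<pack> templates as the code in this file;
C<encode_cmd> and C<decode_cmd> are hypothetical names, and the decoder is
an assumption about what a receiving side could look like, not the actual
server code.

```perl
use strict;
use warnings;

# Each item gets a BER-compressed length prefix ("w/a"); the whole list is
# then wrapped with a 32-bit big-endian length prefix ("N/a").
sub encode_cmd {
    pack "N/a", pack "(w/a)*", @_
}

# Hypothetical decoder, for illustration only.
sub decode_cmd {
    my ($frame) = @_;

    my ($payload) = unpack "N/a", $frame;
    unpack "(w/a)*", $payload
}

# "a" is the command letter used for string arguments in this file
my @args  = ("a", "hello", "", "\x00\xff binary");
my $frame = encode_cmd (@args);
my @back  = decode_cmd ($frame);
```

After the round trip, C<@back> holds the same four strings as C<@args>.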
368    
369     =cut
370 root 1.1
371 root 1.4 sub send_arg {
372     my ($self, @arg) = @_;
373 root 1.1
374 root 1.4 $self->_cmd (a => @arg);
375 root 1.1
376     $self
377     }
378    
379 root 1.4 =item $proc->run ($func, $cb->($fh))
380    
381     Enter the function specified by the fully qualified name in C<$func> in
382     the process. The function is called with the communication socket as first
383     argument, followed by all file handles and string arguments sent earlier
384     via C<send_fh> and C<send_arg> methods, in the order they were called.
385    
386     If the called function returns, the process exits.
387    
388     Preparing the process can take time - when the process is ready, the
389     callback is invoked with the local communications socket as argument.
390    
391     The process object becomes unusable on return from this function.
392    
393     If the communication socket isn't used, it should be closed on both sides,
394     to save on kernel memory.
395    
396     The socket is non-blocking in the parent, and blocking in the newly
397     created process. The close-on-exec flag is set on both. Even if not used
398     otherwise, the socket can be a good indicator for the existence of the
399     process - if the other process exits, you get a readable event on it,
400     because exiting the process closes the socket (if it didn't create any
401     children using fork).
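
The calling convention and the end-of-file behaviour can be tried out with
core Perl alone. In this sketch a plain C<fork> and C<socketpair> stand in
for this module's process creation (an assumption made purely to keep the
example self-contained and POSIX-only), and C<worker> is a hypothetical
function of the kind one would pass to C<run>.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical worker: gets the communication socket first, then the
# string arguments. Returning from it ends the process.
sub worker {
    my ($fh, @args) = @_;

    syswrite $fh, "got: @args\n";
}

socketpair my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC
    or die "socketpair: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    close $parent_fh;
    worker ($child_fh, "hello", "world");
    exit 0;
}

# the parent must drop its copy of the child's end, or it never sees EOF
close $child_fh;

# first read returns the worker's message
sysread $parent_fh, my $reply, 1024;

# second read returns 0 (end of file) once the child has exited
my $eof = sysread $parent_fh, my $buf, 1;

waitpid $pid, 0;
```

In an event-driven program the two C<sysread> calls would live in a read
watcher on the parent's end of the socket rather than blocking.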
402    
403     =cut
404    
405     sub run {
406     my ($self, $func, $cb) = @_;
407    
408     $self->[0] = $cb;
409     $self->_cmd ("r", $func);
410     }
411    
412 root 1.1 =back
413    
414     =head1 AUTHOR
415    
416     Marc Lehmann <schmorp@schmorp.de>
417     http://home.schmorp.de/
418    
419     =cut
420    
421     1
422