ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.6
Committed: Wed Apr 3 08:47:44 2013 UTC (11 years, 2 months ago) by root
Branch: MAIN
Changes since 1.5: +19 -3 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3 root 1.4 AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
4 root 1.1
5     =head1 SYNOPSIS
6    
7 root 1.4 use AnyEvent::Fork;
8 root 1.1
9     =head1 DESCRIPTION
10    
11 root 1.4 This module allows you to create new processes, without actually forking
12     them from your current process (avoiding the problems of forking), but
13     preserving most of the advantages of fork.
14    
15     It can be used to create new worker processes or new independent
16     subprocesses for short- and long-running jobs, process pools (e.g. for use
17     in pre-forked servers) but also to spawn new external processes (such as
18     CGI scripts from a webserver), which can be faster (and more well behaved)
19     than using fork+exec in big processes.
20 root 1.1
21 root 1.5 Special care has been taken to make this module useful from other modules,
22     while still supporting specialised environments such as L<App::Staticperl>
23     or L<PAR::Packer>.
24    
25 root 1.1 =head1 PROBLEM STATEMENT
26    
27     There are two ways to implement parallel processing on UNIX like operating
28     systems - fork and process, and fork+exec and process. They have different
29     advantages and disadvantages that I describe below, together with how this
30     module tries to mitigate the disadvantages.
31    
32     =over 4
33    
34     =item Forking from a big process can be very slow (a 5GB process needs
35     0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
36     is often shared with exec (because you have to fork first), but in some
37     circumstances (e.g. when vfork is used), fork+exec can be much faster.
38    
39     This module can help here by telling a small(er) helper process to fork,
40     or fork+exec instead.
41    
42     =item Forking usually creates a copy-on-write copy of the parent
43     process. Memory (for example, modules or data files that have been
44     will not take additional memory). When exec'ing a new process, modules
45     and data files might need to be loaded again, at extra cpu and memory
46     cost. Likewise when forking, all data structures are copied as well - if
47     the program frees them and replaces them by new data, the child processes
48     will retain the memory even if it isn't used.
49    
50     This module allows the main program to do a controlled fork, and allows
51     modules to exec processes safely at any time. When creating a custom
52     process pool you can take advantage of data sharing via fork without
53     risking to share large dynamic data structures that will blow up child
54     memory usage.
55    
56     =item Exec'ing a new perl process might be difficult and slow. For
57     example, it is not easy to find the correct path to the perl interpreter,
58     and all modules have to be loaded from disk again. Long running processes
59     might run into problems when perl is upgraded for example.
60    
61     This module supports creating pre-initialised perl processes to be used
62     as template, and also tries hard to identify the correct path to the perl
63     interpreter. With a cooperative main program, exec'ing the interpreter
64     might not even be necessary.
65    
66     =item Forking might be impossible when a program is running. For example,
67     POSIX makes it almost impossible to fork from a multithreaded program and
68     do anything useful in the child - strictly speaking, if your perl program
69     uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
70     you cannot call fork on the perl level anymore, at all.
71    
72     This module can safely fork helper processes at any time, by caling
73     fork+exec in C, in a POSIX-compatible way.
74    
75     =item Parallel processing with fork might be inconvenient or difficult
76     to implement. For example, when a program uses an event loop and creates
77     watchers it becomes very hard to use the event loop from a child
78     program, as the watchers already exist but are only meaningful in the
79     parent. Worse, a module might want to use such a system, not knowing
80     whether another module or the main program also does, leading to problems.
81    
82     This module only lets the main program create pools by forking (because
83     only the main program can know when it is still safe to do so) - all other
84     pools are created by fork+exec, after which such modules can again be
85     loaded.
86    
87     =back
88    
89 root 1.3 =head1 CONCEPTS
90    
91     This module can create new processes either by executing a new perl
92     process, or by forking from an existing "template" process.
93    
94     Each such process comes with its own file handle that can be used to
95     communicate with it (it's actually a socket - one end in the new process,
96     one end in the main process), and among the things you can do in it are
97     load modules, fork new processes, send file handles to it, and execute
98     functions.
99    
100     There are multiple ways to create additional processes to execute some
101     jobs:
102    
103     =over 4
104    
105     =item fork a new process from the "default" template process, load code,
106     run it
107    
108     This module has a "default" template process which it executes when it is
109     needed the first time. Forking from this process shares the memory used
110     for the perl interpreter with the new process, but loading modules takes
111     time, and the memory is not shared with anything else.
112    
113     This is ideal for when you only need one extra process of a kind, with the
114     option of starting and stipping it on demand.
115    
116     =item fork a new template process, load code, then fork processes off of
117     it and run the code
118    
119     When you need to have a bunch of processes that all execute the same (or
120     very similar) tasks, then a good way is to create a new template process
121     for them, loading all the modules you need, and then create your worker
122     processes from this new template process.
123    
124     This way, all code (and data structures) that can be shared (e.g. the
125     modules you loaded) is shared between the processes, and each new process
126     consumes relatively little memory of its own.
127    
128     The disadvantage of this approach is that you need to create a template
129     process for the sole purpose of forking new processes from it, but if you
130     only need a fixed number of proceses you can create them, and then destroy
131     the template process.
132    
133     =item execute a new perl interpreter, load some code, run it
134    
135     This is relatively slow, and doesn't allow you to share memory between
136     multiple processes.
137    
138     The only advantage is that you don't have to have a template process
139     hanging around all the time to fork off some new processes, which might be
140     an advantage when there are long time spans where no extra processes are
141     needed.
142    
143     =back
144    
145     =head1 FUNCTIONS
146    
147 root 1.1 =over 4
148    
149     =cut
150    
151 root 1.4 package AnyEvent::Fork;
152 root 1.1
153     use common::sense;
154    
155     use Socket ();
156    
157     use AnyEvent;
158 root 1.4 use AnyEvent::Fork::Util;
159 root 1.1 use AnyEvent::Util ();
160    
161 root 1.4 our $PERL; # the path to the perl interpreter, deduces with various forms of magic
162 root 1.1
163 root 1.4 =item my $pool = new AnyEvent::Fork key => value...
164 root 1.1
165     Create a new process pool. The following named parameters are supported:
166    
167     =over 4
168    
169     =back
170    
171     =cut
172    
173 root 1.5 # the early fork template process
174     our $EARLY;
175    
176 root 1.4 # the empty template process
177     our $TEMPLATE;
178    
179     sub _cmd {
180     my $self = shift;
181    
182     # ideally, we would want to use "a (w/a)*" as format string, but perl versions
183 root 1.5 # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
184 root 1.4 push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;
185    
186     $self->[3] ||= AE::io $self->[1], 1, sub {
187     if (ref $self->[2][0]) {
188     AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
189     and shift @{ $self->[2] };
190 root 1.5
191 root 1.4 } else {
192     my $len = syswrite $self->[1], $self->[2][0]
193     or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
194 root 1.5
195 root 1.4 substr $self->[2][0], 0, $len, "";
196     shift @{ $self->[2] } unless length $self->[2][0];
197     }
198    
199     unless (@{ $self->[2] }) {
200     undef $self->[3];
201     $self->[0]->($self->[1]) if $self->[0];
202     }
203     };
204     }
205 root 1.1
206 root 1.4 sub _new {
207     my ($self, $fh) = @_;
208 root 1.1
209 root 1.6 AnyEvent::Util::fh_nonblocking $fh, 1;
210    
211 root 1.4 $self = bless [
212     undef, # run callback
213 root 1.1 $fh,
214 root 1.4 [], # write queue - strings or fd's
215     undef, # AE watcher
216     ], $self;
217    
218     # my ($a, $b) = AnyEvent::Util::portable_socketpair;
219    
220     # queue_cmd $template, "Iabc";
221     # push @{ $template->[2] }, \$b;
222    
223     # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
224     # undef $b;
225     # die "x" . <$a>;
226    
227     $self
228 root 1.1 }
229    
230 root 1.6 # fork template from current process, used by AnyEvent::Fork::Early/Template
231     sub _new_fork {
232     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
233     my $pid = fork;
234    
235     if ($pid eq 0) {
236     require AnyEvent::Fork::Serve;
237     close $fh;
238     AnyEvent::Fork::Serve::serve ($slave);
239     AnyEvent::Fork::Util::_exit 0;
240     } elsif (!$pid) {
241     die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
242     }
243    
244     AnyEvent::Fork->_new ($fh)
245     }
246    
247 root 1.4 =item my $proc = new AnyEvent::Fork
248 root 1.1
249 root 1.4 Create a new "empty" perl interpreter process and returns its process
250     object for further manipulation.
251 root 1.1
252 root 1.4 The new process is forked from a template process that is kept around
253     for this purpose. When it doesn't exist yet, it is created by a call to
254     C<new_exec> and kept around for future calls.
255    
256     =cut
257    
258     sub new {
259     my $class = shift;
260 root 1.1
261 root 1.4 $TEMPLATE ||= $class->new_exec;
262     $TEMPLATE->fork
263 root 1.1 }
264    
265 root 1.4 =item $new_proc = $proc->fork
266    
267     Forks C<$proc>, creating a new process, and returns the process object
268     of the new process.
269    
270     If any of the C<send_> functions have been called before fork, then they
271     will be cloned in the child. For example, in a pre-forked server, you
272     might C<send_fh> the listening socket into the template process, and then
273     keep calling C<fork> and C<run>.
274    
275     =cut
276    
277     sub fork {
278     my ($self) = @_;
279 root 1.1
280     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
281 root 1.4
282     $self->send_fh ($slave);
283     $self->_cmd ("f");
284    
285     AnyEvent::Fork->_new ($fh)
286     }
287    
288     =item my $proc = new_exec AnyEvent::Fork
289    
290     Create a new "empty" perl interpreter process and returns its process
291     object for further manipulation.
292    
293     Unlike the C<new> method, this method I<always> spawns a new perl process
294     (except in some cases, see L<AnyEvent::Fork::Early> for details). This
295     reduces the amount of memory sharing that is possible, and is also slower.
296    
297     You should use C<new> whenever possible, except when having a template
298     process around is unacceptable.
299    
300     The path to the perl interpreter is divined usign various methods - first
301     C<$^X> is investigated to see if the path ends with something that sounds
302     as if it were the perl interpreter. Failing this, the module falls back to
303     using C<$Config::Config{perlpath}>.
304    
305     =cut
306    
307     sub new_exec {
308     my ($self) = @_;
309    
310 root 1.5 return $EARLY->fork
311     if $EARLY;
312    
313 root 1.4 # first find path of perl
314     my $perl = $;
315    
316     # first we try $^X, but the path must be absolute (always on win32), and end in sth.
317     # that looks like perl. this obviously only works for posix and win32
318     unless (
319     (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
320     && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
321     ) {
322     # if it doesn't look perlish enough, try Config
323     require Config;
324     $perl = $Config::Config{perlpath};
325     $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
326     }
327    
328     require Proc::FastSpawn;
329    
330     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
331     Proc::FastSpawn::fd_inherit (fileno $slave);
332    
333     # quick. also doesn't work in win32. of course. what did you expect
334     #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
335 root 1.1 my %env = %ENV;
336     $env{PERL5LIB} = join ":", grep !ref, @INC;
337    
338 root 1.4 Proc::FastSpawn::spawn (
339     $perl,
340     ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
341     [map "$_=$env{$_}", keys %env],
342     ) or die "unable to spawn AnyEvent::Fork server: $!";
343    
344     $self->_new ($fh)
345     }
346    
347     =item $proc = $proc->require ($module, ...)
348 root 1.1
349 root 1.4 Tries to load the given modules into the process
350 root 1.1
351 root 1.4 Returns the process object for easy chaining of method calls.
352 root 1.1
353 root 1.4 =item $proc = $proc->send_fh ($handle, ...)
354 root 1.1
355 root 1.4 Send one or more file handles (I<not> file descriptors) to the process,
356     to prepare a call to C<run>.
357 root 1.1
358 root 1.4 The process object keeps a reference to the handles until this is done,
359     so you must not explicitly close the handles. This is most easily
360     accomplished by simply not storing the file handles anywhere after passing
361     them to this method.
362    
363     Returns the process object for easy chaining of method calls.
364    
365     =cut
366    
367     sub send_fh {
368     my ($self, @fh) = @_;
369    
370     for my $fh (@fh) {
371     $self->_cmd ("h");
372     push @{ $self->[2] }, \$fh;
373     }
374    
375     $self
376 root 1.1 }
377    
378 root 1.4 =item $proc = $proc->send_arg ($string, ...)
379    
380     Send one or more argument strings to the process, to prepare a call to
381     C<run>. The strings can be any octet string.
382    
383     Returns the process object for easy chaining of emthod calls.
384    
385     =cut
386 root 1.1
387 root 1.4 sub send_arg {
388     my ($self, @arg) = @_;
389 root 1.1
390 root 1.4 $self->_cmd (a => @arg);
391 root 1.1
392     $self
393     }
394    
395 root 1.4 =item $proc->run ($func, $cb->($fh))
396    
397     Enter the function specified by the fully qualified name in C<$func> in
398     the process. The function is called with the communication socket as first
399     argument, followed by all file handles and string arguments sent earlier
400     via C<send_fh> and C<send_arg> methods, in the order they were called.
401    
402     If the called function returns, the process exits.
403    
404     Preparing the process can take time - when the process is ready, the
405     callback is invoked with the local communications socket as argument.
406    
407     The process object becomes unusable on return from this function.
408    
409     If the communication socket isn't used, it should be closed on both sides,
410     to save on kernel memory.
411    
412     The socket is non-blocking in the parent, and blocking in the newly
413     created process. The close-on-exec flag is set on both. Even if not used
414     otherwise, the socket can be a good indicator for the existance of the
415     process - if the othe rprocess exits, you get a readable event on it,
416     because exiting the process closes the socket (if it didn't create any
417     children using fork).
418    
419     =cut
420    
421     sub run {
422     my ($self, $func, $cb) = @_;
423    
424     $self->[0] = $cb;
425     $self->_cmd ("r", $func);
426     }
427    
428 root 1.1 =back
429    
430     =head1 AUTHOR
431    
432     Marc Lehmann <schmorp@schmorp.de>
433     http://home.schmorp.de/
434    
435     =cut
436    
437     1
438