/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.8
Committed: Thu Apr 4 01:54:40 2013 UTC by root
Branch: MAIN
Changes since 1.7: +9 -2 lines
Log Message:
*** empty log message ***

=head1 NAME

AnyEvent::Fork - everything you wanted to use fork() for, but couldn't

=head1 SYNOPSIS

   use AnyEvent::Fork;

=head1 DESCRIPTION

This module allows you to create new processes, without actually forking
them from your current process (avoiding the problems of forking), but
preserving most of the advantages of fork.

It can be used to create new worker processes or new independent
subprocesses for short- and long-running jobs, process pools (e.g. for use
in pre-forked servers) but also to spawn new external processes (such as
CGI scripts from a webserver), which can be faster (and better behaved)
than using fork+exec in big processes.

Special care has been taken to make this module useful from other modules,
while still supporting specialised environments such as L<App::Staticperl>
or L<PAR::Packer>.

=head1 PROBLEM STATEMENT

There are two ways to implement parallel processing on UNIX-like operating
systems - fork and process, and fork+exec and process. They have different
advantages and disadvantages that I describe below, together with how this
module tries to mitigate the disadvantages.

=over 4

=item Forking from a big process can be very slow (a 5GB process needs
0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
is often shared with exec (because you have to fork first), but in some
circumstances (e.g. when vfork is used), fork+exec can be much faster.

This module can help here by telling a small(er) helper process to fork,
or fork+exec instead.

=item Forking usually creates a copy-on-write copy of the parent
process. Memory that is not modified afterwards (for example, modules or
data files that have been loaded) is shared and will not take additional
memory. When exec'ing a new process, modules and data files might need to
be loaded again, at extra CPU and memory cost. Likewise when forking, all
data structures are copied as well - if the program frees them and
replaces them by new data, the child processes will retain the memory even
if it isn't used.

This module allows the main program to do a controlled fork, and allows
modules to exec processes safely at any time. When creating a custom
process pool you can take advantage of data sharing via fork without the
risk of sharing large dynamic data structures that will blow up child
memory usage.

=item Exec'ing a new perl process might be difficult and slow. For
example, it is not easy to find the correct path to the perl interpreter,
and all modules have to be loaded from disk again. Long-running processes
might run into problems when perl is upgraded, for example.

This module supports creating pre-initialised perl processes to be used
as a template, and also tries hard to identify the correct path to the
perl interpreter. With a cooperative main program, exec'ing the
interpreter might not even be necessary.

=item Forking might be impossible in some programs. For example, POSIX
makes it almost impossible to fork from a multithreaded program and do
anything useful in the child - strictly speaking, if your perl program
uses POSIX threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
you cannot call fork on the perl level anymore, at all.

This module can safely fork helper processes at any time, by calling
fork+exec in C, in a POSIX-compatible way.

=item Parallel processing with fork might be inconvenient or difficult
to implement. For example, when a program uses an event loop and creates
watchers it becomes very hard to use the event loop from a child
program, as the watchers already exist but are only meaningful in the
parent. Worse, a module might want to use such a system, not knowing
whether another module or the main program also does, leading to problems.

This module only lets the main program create pools by forking (because
only the main program can know when it is still safe to do so) - all other
pools are created by fork+exec, after which such modules can again be
loaded.

=back
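
The cost of forking a large process (the first point above) is easy to
observe directly. The following is a minimal sketch using only core perl
(C<fork>, L<Time::HiRes> and L<POSIX>), not AnyEvent::Fork itself: it
times one fork, grows the process by about 64MB, and times another. The
helper name C<time_fork> and the 64MB figure are arbitrary choices for
illustration, and the actual timings depend entirely on the OS and
hardware.

```perl

   use strict;
   use warnings;
   use POSIX ();
   use Time::HiRes qw(time);

   # Fork once, let the child exit immediately, and return the wall-clock
   # time the parent spent until fork() returned.
   sub time_fork {
      my $t0  = time;
      my $pid = fork;
      die "fork failed: $!" unless defined $pid;
      POSIX::_exit(0) unless $pid;    # child: leave at once, skipping END blocks
      my $elapsed = time - $t0;
      waitpid $pid, 0;                # reap the child
      return $elapsed;
   }

   my $small = time_fork();

   # Grow the process by about 64MB of touched memory, then fork again.
   my $big   = "x" x (64 * 1024 * 1024);
   my $large = time_fork();

   printf "small: %.6fs, after growing: %.6fs\n", $small, $large;

```

On systems where fork has to copy page tables, the second timing is
usually the larger one; that growing cost is what handing forks to a small
helper process avoids.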

=head1 CONCEPTS

This module can create new processes either by executing a new perl
process, or by forking from an existing "template" process.

Each such process comes with its own file handle that can be used to
communicate with it (it's actually a socket - one end in the new process,
one end in the main process), and among the things you can do in it are
load modules, fork new processes, send file handles to it, and execute
functions.
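
To illustrate the kind of channel meant here, this sketch uses plain
C<socketpair> and C<fork> from core perl (not this module's API) to start
a child that owns one end of a socket pair while the parent keeps the
other. The one-line "ping"/"pong" exchange is made up for the example.

```perl

   use strict;
   use warnings;
   use IO::Handle;
   use POSIX ();
   use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

   socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
      or die "socketpair: $!";
   $_->autoflush(1) for $parent_end, $child_end;

   my $pid = fork;
   die "fork: $!" unless defined $pid;

   if ($pid == 0) {
      # child: keep only its own end and answer a single request
      close $parent_end;
      chomp(my $req = <$child_end> // "");
      print {$child_end} $req eq "ping" ? "pong from $$\n" : "unknown\n";
      close $child_end;
      POSIX::_exit(0);
   }

   # parent: keep only its own end, send a request and read the reply
   close $child_end;
   print {$parent_end} "ping\n";
   my $reply = <$parent_end>;
   close $parent_end;
   waitpid $pid, 0;

   print $reply;    # e.g. "pong from 12345"

```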

There are multiple ways to create additional processes to execute some
jobs:

=over 4

=item fork a new process from the "default" template process, load code,
run it

This module has a "default" template process which it executes when it is
needed the first time. Forking from this process shares the memory used
for the perl interpreter with the new process, but loading modules takes
time, and the memory is not shared with anything else.

This is ideal when you only need one extra process of a kind, with the
option of starting and stopping it on demand.

=item fork a new template process, load code, then fork processes off of
it and run the code

When you need to have a bunch of processes that all execute the same (or
very similar) tasks, then a good way is to create a new template process
for them, loading all the modules you need, and then create your worker
processes from this new template process.

This way, all code (and data structures) that can be shared (e.g. the
modules you loaded) is shared between the processes, and each new process
consumes relatively little memory of its own.

The disadvantage of this approach is that you need to create a template
process for the sole purpose of forking new processes from it, but if you
only need a fixed number of processes you can create them, and then
destroy the template process.

=item execute a new perl interpreter, load some code, run it

This is relatively slow, and doesn't allow you to share memory between
multiple processes.

The only advantage is that you don't have to have a template process
hanging around all the time to fork off some new processes, which might be
an advantage when there are long time spans where no extra processes are
needed.

=back
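
The second approach can be sketched with plain C<fork>, independently of
this module: a template process loads code (here L<List::Util>) and builds
some data once, forks a fixed number of workers that inherit both
copy-on-write, and then exits. The worker count, the C<@table> data and
the pipe used to collect results are assumptions made for the example.

```perl

   use strict;
   use warnings;
   use POSIX ();

   my $workers = 3;
   pipe(my $results_r, my $results_w) or die "pipe: $!";

   # the template: load code and build data once, then fork workers from it
   my $template = fork;
   die "fork: $!" unless defined $template;

   if ($template == 0) {
      close $results_r;
      require List::Util;                       # "load code" in the template
      my @table = map { $_ * $_ } 1 .. 1000;    # shared copy-on-write

      my @kids;
      for my $n (1 .. $workers) {
         my $pid = fork;
         die "fork: $!" unless defined $pid;
         if ($pid == 0) {
            # worker: use the inherited code and data, report one line, exit
            my $sum = List::Util::sum(@table[0 .. $n - 1]);
            syswrite $results_w, "worker $n: $sum\n";
            POSIX::_exit(0);
         }
         push @kids, $pid;
      }

      waitpid $_, 0 for @kids;
      POSIX::_exit(0);                          # the template is not needed anymore
   }

   # main program: one line per worker arrives, then EOF once all have exited
   close $results_w;
   my @lines = <$results_r>;
   @lines = sort @lines;
   waitpid $template, 0;
   print @lines;

```

Because the workers are forked after the C<require> and after C<@table>
is built, neither is loaded or computed again per worker.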

=head1 FUNCTIONS

=over 4

=cut

package AnyEvent::Fork;

use common::sense;

use Socket ();

use AnyEvent;
use AnyEvent::Fork::Util;
use AnyEvent::Util ();

our $PERL; # the path to the perl interpreter, deduced with various forms of magic

=item my $pool = new AnyEvent::Fork key => value...

Create a new process pool. The following named parameters are supported:

=over 4

=back

=cut

# the early fork template process
our $EARLY;

# the empty template process
our $TEMPLATE;

sub _cmd {
   my $self = shift;

   # ideally, we would want to use "a (w/a)*" as format string, but perl versions
   # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
   push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;

   $self->[3] ||= AE::io $self->[1], 1, sub {
      if (ref $self->[2][0]) {
         AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
            and shift @{ $self->[2] };

      } else {
         my $len = syswrite $self->[1], $self->[2][0]
            or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };

         substr $self->[2][0], 0, $len, "";
         shift @{ $self->[2] } unless length $self->[2][0];
      }

      unless (@{ $self->[2] }) {
         undef $self->[3];
         $self->[0]->($self->[1]) if $self->[0];
      }
   };
}

sub _new {
   my ($self, $fh) = @_;

   AnyEvent::Util::fh_nonblocking $fh, 1;

   $self = bless [
      undef, # run callback
      $fh,
      [],    # write queue - strings or fd's
      undef, # AE watcher
   ], $self;

   # my ($a, $b) = AnyEvent::Util::portable_socketpair;

   # queue_cmd $template, "Iabc";
   # push @{ $template->[2] }, \$b;

   # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
   # undef $b;
   # die "x" . <$a>;

   $self
}

# fork template from current process, used by AnyEvent::Fork::Early/Template
sub _new_fork {
   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
   my $parent = $$;

   my $pid = fork;

   if ($pid eq 0) {
      require AnyEvent::Fork::Serve;
      $AnyEvent::Fork::Serve::OWNER = $parent;
      close $fh;
      $0 = "$_[1] of $parent";
      AnyEvent::Fork::Serve::serve ($slave);
      AnyEvent::Fork::Util::_exit 0;
   } elsif (!$pid) {
      die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
   }

   AnyEvent::Fork->_new ($fh)
}

=item my $proc = new AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

The new process is forked from a template process that is kept around
for this purpose. When it doesn't exist yet, it is created by a call to
C<new_exec> and kept around for future calls.
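
The lazy creation described above boils down to a cached template object.
In this minimal sketch, C<spawn_template> and C<fork_from> are
hypothetical stand-ins for C<new_exec> and C<fork>; only the caching
pattern mirrors what C<new> does.

```perl

   use strict;
   use warnings;

   # hypothetical stand-ins: spawn_template() plays the part of the
   # expensive new_exec call, fork_from() the part of $template->fork
   my $spawned = 0;
   sub spawn_template { $spawned++; return { id => "template-$spawned" } }
   sub fork_from      { my ($tmpl) = @_; return { forked_from => $tmpl->{id} } }

   my $TEMPLATE;    # created on first use, then reused

   sub new_proc {
      $TEMPLATE ||= spawn_template();
      return fork_from($TEMPLATE);
   }

   my @procs = map { new_proc() } 1 .. 3;
   print "template spawned $spawned time(s)\n";    # prints "template spawned 1 time(s)"

```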

=cut

sub new {
   my $class = shift;

   $TEMPLATE ||= $class->new_exec;
   $TEMPLATE->fork
}

=item $new_proc = $proc->fork

Forks C<$proc>, creating a new process, and returns the process object
of the new process.

If any of the C<send_> functions have been called before fork, then the
handles and arguments sent by them will be cloned in the child. For
example, in a pre-forked server, you might C<send_fh> the listening socket
into the template process, and then keep calling C<fork> and C<run>.
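
The pre-forked server idea relies on every forked worker holding a copy of
the listening socket. This sketch shows that effect with core perl only
(L<IO::Socket::UNIX>, L<File::Temp>), creating the socket in the main
process instead of sending it to a template with C<send_fh>; the socket
path, the two workers and the "served by" reply are assumptions for the
example.

```perl

   use strict;
   use warnings;
   use File::Temp qw(tempdir);
   use IO::Socket::UNIX;
   use POSIX ();
   use Socket qw(SOCK_STREAM);

   my $path = tempdir(CLEANUP => 1) . "/server.sock";

   # the listening socket is created once, before any worker exists
   my $listen = IO::Socket::UNIX->new(
      Type   => SOCK_STREAM,
      Local  => $path,
      Listen => 5,
   ) or die "listen: $!";

   # every forked worker inherits $listen and accepts one client
   my @workers;
   for (1 .. 2) {
      my $pid = fork;
      die "fork: $!" unless defined $pid;
      if ($pid == 0) {
         my $client = $listen->accept or POSIX::_exit(1);
         print {$client} "served by $$\n";
         close $client;
         POSIX::_exit(0);
      }
      push @workers, $pid;
   }
   close $listen;    # the main process itself never accepts

   # connect twice; each connection is picked up by one of the workers
   my @replies;
   for (1 .. 2) {
      my $conn = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => $path)
         or die "connect: $!";
      push @replies, scalar <$conn>;
   }
   waitpid $_, 0 for @workers;
   print @replies;

```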

=cut

sub fork {
   my ($self) = @_;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;

   $self->send_fh ($slave);
   $self->_cmd ("f");

   AnyEvent::Fork->_new ($fh)
}

=item my $proc = new_exec AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

Unlike the C<new> method, this method I<always> spawns a new perl process
(except in some cases, see L<AnyEvent::Fork::Early> for details). This
reduces the amount of memory sharing that is possible, and is also slower.

You should use C<new> whenever possible, except when having a template
process around is unacceptable.

The path to the perl interpreter is divined using various methods - first
C<$^X> is investigated to see if the path ends with something that looks
like the perl interpreter. Failing this, the module falls back to
using C<$Config::Config{perlpath}>.
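
The heuristic described above can be reproduced in a standalone function.
This is a sketch, not the module's code: C<find_perl> is a hypothetical
name, C<$^O eq "MSWin32"> stands in for the module's
C<AnyEvent::Fork::Util::WIN32> constant, and the executable suffix is
appended with a plain check rather than the module's substitution.

```perl

   use strict;
   use warnings;
   use Config;

   # Trust $^X only if it is absolute (always assumed on Win32) and ends in
   # something perl-like, otherwise fall back to $Config{perlpath} with the
   # platform's executable suffix appended if it is missing.
   sub find_perl {
      my $perl = $^X;

      my $absolute = $^O eq "MSWin32" || $perl =~ m%^/%;
      my $perlish  = $perl =~ m%[/\\]perl(?:[0-9]+(?:\.[0-9]+)+)?(?:\.exe)?$%i;

      unless ($absolute && $perlish) {
         my $exe = $Config{_exe} // "";
         $perl = $Config{perlpath};
         $perl .= $exe unless $perl =~ /\Q$exe\E\z/;
      }

      return $perl;
   }

   print find_perl(), "\n";

```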

=cut

sub new_exec {
   my ($self) = @_;

   return $EARLY->fork
      if $EARLY;

   # first find path of perl
   my $perl = $^X;

   # first we try $^X, but the path must be absolute (always on win32), and end in sth.
   # that looks like perl. this obviously only works for posix and win32
   unless (
      (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
      && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
   ) {
      # if it doesn't look perlish enough, try Config
      require Config;
      $perl = $Config::Config{perlpath};
      $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
   }

   require Proc::FastSpawn;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
   Proc::FastSpawn::fd_inherit (fileno $slave);

   # quick. also doesn't work in win32. of course. what did you expect
   #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
   my %env = %ENV;
   $env{PERL5LIB} = join +(AnyEvent::Fork::Util::WIN32 ? ";" : ":"), grep !ref, @INC;

   Proc::FastSpawn::spawn (
      $perl,
      ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$],
      [map "$_=$env{$_}", keys %env],
   ) or die "unable to spawn AnyEvent::Fork server: $!";

   $self->_new ($fh)
}

=item $proc = $proc->require ($module, ...)

Tries to load the given modules into the process.

Returns the process object for easy chaining of method calls.
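
Loading modules in the other process comes down to turning each module
name into a file name that perl's C<require> accepts. A minimal sketch of
that step, assuming the names arrive as plain strings; C<load_modules> is
a hypothetical helper, not part of this module, and the name check is an
extra safeguard added for the example.

```perl

   use strict;
   use warnings;

   # Load each named module, e.g. "List::Util" -> require "List/Util.pm".
   # Dies on a malformed name, or with perl's own error if loading fails.
   sub load_modules {
      for my $module (@_) {
         die "invalid module name '$module'\n"
            unless $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
         (my $file = "$module.pm") =~ s{::}{/}g;
         require $file;
      }
      return scalar @_;
   }

   load_modules("List::Util", "Scalar::Util");
   my $max = List::Util::max(3, 9, 4);
   print "$max\n";    # prints 9

```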

=item $proc = $proc->send_fh ($handle, ...)

Send one or more file handles (I<not> file descriptors) to the process,
to prepare a call to C<run>.

The process object keeps a reference to the handles until they have been
sent, so you must not explicitly close the handles. This is most easily
accomplished by simply not storing the file handles anywhere after passing
them to this method.

Returns the process object for easy chaining of method calls.
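
Keeping the handles alive until they are sent can be done by queueing
references to them next to the command strings, which is what this
module's write queue does. The sketch below imitates that queue with
hypothetical C<queue_cmd>, C<queue_fh> and C<flush_queue> helpers; the
actual descriptor transfer (done in this module by
C<AnyEvent::Fork::Util::fd_send>) is replaced by a caller-supplied
callback.

```perl

   use strict;
   use warnings;

   # Strings are command bytes; scalar references wrap file handles that
   # still have to be passed on. Holding the reference keeps a handle open
   # even after the caller has dropped its own copy.
   my @queue;

   sub queue_cmd { my ($bytes) = @_; push @queue, $bytes }
   sub queue_fh  { my ($fh)    = @_; push @queue, \$fh }

   # Drain the queue in order; the two callbacks stand in for syswrite and
   # real descriptor passing.
   sub flush_queue {
      my ($send_bytes, $send_fd) = @_;
      while (@queue) {
         my $item = shift @queue;
         if (ref $item) { $send_fd->(fileno $$item) }
         else           { $send_bytes->($item) }
      }
   }

   pipe(my $r, my $w) or die "pipe: $!";
   queue_cmd("h");
   queue_fh($r);
   undef $r;    # the caller forgets the handle, the queue still holds it

   my @log;
   flush_queue(
      sub { push @log, "bytes:$_[0]" },
      sub { push @log, defined $_[0] ? "fd:open" : "fd:closed" },
   );
   print "$_\n" for @log;    # prints "bytes:h" then "fd:open"

```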

=cut

sub send_fh {
   my ($self, @fh) = @_;

   for my $fh (@fh) {
      $self->_cmd ("h");
      push @{ $self->[2] }, \$fh;
      push @$self, $fh; # dire hack
   }

   $self
}

=item $proc = $proc->send_arg ($string, ...)

Send one or more argument strings to the process, to prepare a call to
C<run>. Each string can be an arbitrary octet string.

Returns the process object for easy chaining of method calls.
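
Internally, C<_cmd> frames every command as a 32-bit big-endian length
followed by BER-length-prefixed fields (C<pack "N/a", pack "(w/a)*", ...>),
so newlines and NUL bytes in arguments need no escaping. The sketch below
reproduces that encoding with core C<pack>; the decoder is an assumption
(the receiving side lives in AnyEvent::Fork::Serve, which is not shown
here) and unpacks one field at a time, in line with the source comment
about C<(w/a)*> unpacking bugs in older perls.

```perl

   use strict;
   use warnings;

   # Encode like _cmd: each field as "w/a" (BER length + bytes), the
   # concatenation wrapped as "N/a" (32-bit big-endian length + bytes).
   sub encode_cmd {
      my (@fields) = @_;
      return pack "N/a", join "", map { pack "w/a", $_ } @fields;
   }

   # Decode one such packet, peeling off one field at a time rather
   # than unpacking "(w/a)*" in a single call.
   sub decode_cmd {
      my ($packet) = @_;
      my ($body) = unpack "N/a", $packet;
      my @fields;
      while (length $body) {
         (my $field, $body) = unpack "w/a a*", $body;
         push @fields, $field;
      }
      return @fields;
   }

   my @args = ("a", "line one\n", "nul\0byte", "");
   my @back = decode_cmd(encode_cmd(@args));
   print scalar(@back), " fields\n";    # prints "4 fields"

```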

=cut

sub send_arg {
   my ($self, @arg) = @_;

   $self->_cmd (a => @arg);

   $self
}

=item $proc->run ($func, $cb->($fh))

Enters the function specified by the fully qualified name in C<$func> in
the process. The function is called with the communication socket as first
argument, followed by all file handles and string arguments sent earlier
via the C<send_fh> and C<send_arg> methods, in the order they were called.

If the called function returns, the process exits.

Preparing the process can take time - when the process is ready, the
callback is invoked with the local communications socket as argument.

The process object becomes unusable on return from this function.

If the communication socket isn't used, it should be closed on both sides,
to save on kernel memory.

The socket is non-blocking in the parent, and blocking in the newly
created process. The close-on-exec flag is set on both. Even if not used
otherwise, the socket can be a good indicator for the existence of the
process - if the other process exits, you get a readable event on it,
because exiting the process closes the socket (if it didn't create any
children using fork).
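
Two details above, calling a function by its fully qualified name with
the socket as first argument and noticing the process exit as end-of-file
on the socket, can be sketched with core perl alone. C<My::Worker::run>
and its arguments are invented for the example, and the parent uses a
blocking read instead of an AnyEvent watcher.

```perl

   use strict;
   use warnings;
   use POSIX ();
   use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

   # a worker function, called with the socket first, then the arguments
   sub My::Worker::run {
      my ($fh, @args) = @_;
      syswrite $fh, "got @args\n";
   }

   socketpair(my $mine, my $theirs, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
      or die "socketpair: $!";

   my $pid = fork;
   die "fork: $!" unless defined $pid;

   if ($pid == 0) {
      close $mine;
      # look the function up by its fully qualified name and call it
      my $func = \&{"My::Worker::run"};
      $func->($theirs, "a", "b");
      POSIX::_exit(0);    # the function returned, so the process exits
   }

   close $theirs;
   my $line = <$mine>;
   my $eof  = !defined(readline($mine));    # child exit closed its end: EOF
   waitpid $pid, 0;
   print $line;    # prints "got a b"

```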

=cut

sub run {
   my ($self, $func, $cb) = @_;

   $self->[0] = $cb;
   $self->_cmd ("r", $func);
}

=back

=head1 PORTABILITY NOTES

Win32 is a loser - code has been written for this platform, pain has been
felt, but in the end, this platform is just too broken - maybe a later
version can do it.

=head1 AUTHOR

Marc Lehmann <schmorp@schmorp.de>
http://home.schmorp.de/

=cut

1