/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.4
Committed: Wed Apr 3 07:35:57 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
Changes since 1.3: +216 -67 lines
Log Message:
phew

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3 root 1.4 AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
4 root 1.1
5     =head1 SYNOPSIS
6    
7 root 1.4 use AnyEvent::Fork;
8 root 1.1
9     =head1 DESCRIPTION
10    
11 root 1.4 This module allows you to create new processes, without actually forking
12     them from your current process (avoiding the problems of forking), but
13     preserving most of the advantages of fork.
14    
15     It can be used to create new worker processes or new independent
16     subprocesses for short- and long-running jobs, and process pools (e.g. for
17     use in pre-forked servers), but also to spawn new external processes (such
18     as CGI scripts from a webserver), which can be faster (and better behaved)
19     than using fork+exec in big processes.
20 root 1.1
21     =head1 PROBLEM STATEMENT
22    
23     There are two ways to implement parallel processing on UNIX-like operating
24     systems - fork and process, and fork+exec and process. They have different
25     advantages and disadvantages that I describe below, together with how this
26     module tries to mitigate the disadvantages.
27    
28     =over 4
29    
30     =item Forking from a big process can be very slow (a 5GB process needs
31     0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
32     is often shared with exec (because you have to fork first), but in some
33     circumstances (e.g. when vfork is used), fork+exec can be much faster.
34    
35     This module can help here by telling a small(er) helper process to fork,
36     or fork+exec instead.
37    
38     =item Forking usually creates a copy-on-write copy of the parent
39     process. Memory already in use (for example, modules or data files that
40     have been loaded) will not take additional memory. When exec'ing a new
41     process, modules and data files might need to be loaded again, at extra cpu
42     and memory cost. Likewise when forking, all data structures are copied as
43     well - if the program frees them and replaces them by new data, the child
44     processes will retain the memory even if it isn't used.
45    
46     This module allows the main program to do a controlled fork, and allows
47     modules to exec processes safely at any time. When creating a custom
48     process pool you can take advantage of data sharing via fork without
49     the risk of sharing large dynamic data structures that will blow up child
50     memory usage.
51    
52     =item Exec'ing a new perl process might be difficult and slow. For
53     example, it is not easy to find the correct path to the perl interpreter,
54     and all modules have to be loaded from disk again. Long running processes
55     might run into problems when perl is upgraded for example.
56    
57     This module supports creating pre-initialised perl processes to be used
58     as templates, and also tries hard to identify the correct path to the perl
59     interpreter. With a cooperative main program, exec'ing the interpreter
60     might not even be necessary.
61    
62     =item Forking might be impossible when a program is running. For example,
63     POSIX makes it almost impossible to fork from a multithreaded program and
64     do anything useful in the child - strictly speaking, if your perl program
65     uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
66     you cannot call fork on the perl level anymore, at all.
67    
68     This module can safely fork helper processes at any time, by calling
69     fork+exec in C, in a POSIX-compatible way.
70    
71     =item Parallel processing with fork might be inconvenient or difficult
72     to implement. For example, when a program uses an event loop and creates
73     watchers it becomes very hard to use the event loop from a child
74     program, as the watchers already exist but are only meaningful in the
75     parent. Worse, a module might want to use such a system, not knowing
76     whether another module or the main program also does, leading to problems.
77    
78     This module only lets the main program create pools by forking (because
79     only the main program can know when it is still safe to do so) - all other
80     pools are created by fork+exec, after which such modules can again be
81     loaded.
82    
83     =back
84    
85 root 1.3 =head1 CONCEPTS
86    
87     This module can create new processes either by executing a new perl
88     process, or by forking from an existing "template" process.
89    
90     Each such process comes with its own file handle that can be used to
91     communicate with it (it's actually a socket - one end in the new process,
92     one end in the main process), and among the things you can do in it are
93     load modules, fork new processes, send file handles to it, and execute
94     functions.
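
The idea of one socket with an end in each process can be sketched with
nothing but core Perl. This is only an illustration of the concept using a
plain C<fork>, not how this module creates its processes; the variable names
are made up for the example.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use POSIX ();

# Create a connected pair of sockets: one end for each process.
socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: keep only its own end, answer one request, then exit.
    close $parent_end;
    my $request = <$child_end>;
    chomp $request;
    syswrite $child_end, "child got: $request\n";
    POSIX::_exit (0); # skip END blocks and destructors inherited from the parent
}

# Parent: keep only its own end and talk to the child through it.
close $child_end;
syswrite $parent_end, "ping\n";
my $reply = <$parent_end>;
chomp $reply;
waitpid $pid, 0;

print "$reply\n"; # prints "child got: ping"
```

Each side closes the end it does not use, so that an exit on either side is
visible to the other as end-of-file.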
95    
96     There are multiple ways to create additional processes to execute some
97     jobs:
98    
99     =over 4
100    
101     =item fork a new process from the "default" template process, load code,
102     run it
103    
104     This module has a "default" template process which it executes when it is
105     needed the first time. Forking from this process shares the memory used
106     for the perl interpreter with the new process, but loading modules takes
107     time, and the memory is not shared with anything else.
108    
109     This is ideal for when you only need one extra process of a kind, with the
110     option of starting and stopping it on demand.
111    
112     =item fork a new template process, load code, then fork processes off of
113     it and run the code
114    
115     When you need to have a bunch of processes that all execute the same (or
116     very similar) tasks, then a good way is to create a new template process
117     for them, loading all the modules you need, and then create your worker
118     processes from this new template process.
119    
120     This way, all code (and data structures) that can be shared (e.g. the
121     modules you loaded) is shared between the processes, and each new process
122     consumes relatively little memory of its own.
123    
124     The disadvantage of this approach is that you need to create a template
125     process for the sole purpose of forking new processes from it, but if you
126     only need a fixed number of processes you can create them, and then destroy
127     the template process.
128    
129     =item execute a new perl interpreter, load some code, run it
130    
131     This is relatively slow, and doesn't allow you to share memory between
132     multiple processes.
133    
134     The only advantage is that you don't have to have a template process
135     hanging around all the time to fork off some new processes, which might be
136     an advantage when there are long time spans where no extra processes are
137     needed.
138    
139     =back
140    
141     =head1 FUNCTIONS
142    
143 root 1.1 =over 4
144    
145     =cut
146    
147 root 1.4 package AnyEvent::Fork;
148 root 1.1
149     use common::sense;
150    
151     use Socket ();
152    
153     use AnyEvent;
154 root 1.4 use AnyEvent::Fork::Util;
155 root 1.1 use AnyEvent::Util ();
156    
157 root 1.4 our $PERL; # the path to the perl interpreter, deduced with various forms of magic
158 root 1.1
159 root 1.4 =item my $pool = new AnyEvent::Fork key => value...
160 root 1.1
161     Create a new process pool. The following named parameters are supported:
162    
163     =over 4
164    
165     =back
166    
167     =cut
168    
169 root 1.4 # the empty template process
170     our $TEMPLATE;
171    
172     sub _cmd {
173     my $self = shift;
174    
175     # ideally, we would want to use "a (w/a)*" as format string, but perl versions
176     # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
177     push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;
178    
179     $self->[3] ||= AE::io $self->[1], 1, sub {
180     if (ref $self->[2][0]) {
181     AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
182     and shift @{ $self->[2] };
183     } else {
184     my $len = syswrite $self->[1], $self->[2][0]
185     or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
186     substr $self->[2][0], 0, $len, "";
187     shift @{ $self->[2] } unless length $self->[2][0];
188     }
189    
190     unless (@{ $self->[2] }) {
191     undef $self->[3];
192     $self->[0]->($self->[1]) if $self->[0];
193     }
194     };
195     }
196 root 1.1
197 root 1.4 sub _new {
198     my ($self, $fh) = @_;
199 root 1.1
200 root 1.4 $self = bless [
201     undef, # run callback
202 root 1.1 $fh,
203 root 1.4 [], # write queue - strings or fd's
204     undef, # AE watcher
205     ], $self;
206    
207     # my ($a, $b) = AnyEvent::Util::portable_socketpair;
208    
209     # queue_cmd $template, "Iabc";
210     # push @{ $template->[2] }, \$b;
211    
212     # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
213     # undef $b;
214     # die "x" . <$a>;
215    
216     $self
217 root 1.1 }
218    
219 root 1.4 =item my $proc = new AnyEvent::Fork
220 root 1.1
221 root 1.4 Creates a new "empty" perl interpreter process and returns its process
222     object for further manipulation.
223 root 1.1
224 root 1.4 The new process is forked from a template process that is kept around
225     for this purpose. When it doesn't exist yet, it is created by a call to
226     C<new_exec> and kept around for future calls.
227    
228     =cut
229    
230     sub new {
231     my $class = shift;
232 root 1.1
233 root 1.4 $TEMPLATE ||= $class->new_exec;
234     $TEMPLATE->fork
235 root 1.1 }
236    
237 root 1.4 =item $new_proc = $proc->fork
238    
239     Forks C<$proc>, creating a new process, and returns the process object
240     of the new process.
241    
242     If any of the C<send_> functions have been called before fork, then the
243     handles and arguments sent so far will be cloned in the child. For example,
244     in a pre-forked server, you might C<send_fh> the listening socket into the
245     template process, and then keep calling C<fork> and C<run>.
246    
247     =cut
248    
249     sub fork {
250     my ($self) = @_;
251 root 1.1
252     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
253 root 1.4
254     $self->send_fh ($slave);
255     $self->_cmd ("f");
256    
257 root 1.1 AnyEvent::Util::fh_nonblocking $fh, 1;
258    
259 root 1.4 AnyEvent::Fork->_new ($fh)
260     }
261    
262     =item my $proc = new_exec AnyEvent::Fork
263    
264     Creates a new "empty" perl interpreter process and returns its process
265     object for further manipulation.
266    
267     Unlike the C<new> method, this method I<always> spawns a new perl process
268     (except in some cases, see L<AnyEvent::Fork::Early> for details). This
269     reduces the amount of memory sharing that is possible, and is also slower.
270    
271     You should use C<new> whenever possible, except when having a template
272     process around is unacceptable.
273    
274     The path to the perl interpreter is divined using various methods - first
275     C<$^X> is investigated to see if the path ends with something that sounds
276     as if it were the perl interpreter. Failing this, the module falls back to
277     using C<$Config::Config{perlpath}>.
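
The lookup described above can be sketched as a small stand-alone script. The
regular expressions are the ones used by C<new_exec>; the helper names
C<looks_like_perl> and C<find_perl>, and passing the win32 flag as a
parameter, are assumptions made for this example only.

```perl
use strict;
use warnings;
use Config ();

# Returns true if $path looks like an absolute path to a perl interpreter.
# $is_win32 is passed in explicitly (an assumption of this sketch) so the
# check can be exercised on any platform.
sub looks_like_perl {
    my ($path, $is_win32) = @_;

    my $absolute = $is_win32 || $path =~ m%^/%;
    my $perlish  = $path =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i;

    return $absolute && $perlish;
}

# Use $^X if it passes the check, else fall back to Config's perlpath.
sub find_perl {
    my $perl = $^X;

    unless (looks_like_perl ($perl, $^O eq "MSWin32")) {
        $perl = $Config::Config{perlpath};
        # make sure the executable suffix (e.g. ".exe") is present exactly once
        $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
    }

    return $perl;
}

print find_perl (), "\n";
```

A path such as C</usr/bin/perl> or C</opt/perl/bin/perl5.16.3> passes the
check, while a relative C<perl> or an embedding program's own path does not
and triggers the fallback.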
278    
279     =cut
280    
281     sub new_exec {
282     my ($self) = @_;
283    
284     # first find path of perl
285     my $perl = $^X;
286    
287     # first we try $^X, but the path must be absolute (always on win32), and end in sth.
288     # that looks like perl. this obviously only works for posix and win32
289     unless (
290     (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
291     && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
292     ) {
293     # if it doesn't look perlish enough, try Config
294     require Config;
295     $perl = $Config::Config{perlpath};
296     $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
297     }
298    
299     require Proc::FastSpawn;
300    
301     my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
302     AnyEvent::Util::fh_nonblocking $fh, 1;
303     Proc::FastSpawn::fd_inherit (fileno $slave);
304    
305     # quick. also doesn't work in win32. of course. what did you expect
306     #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
307 root 1.1 my %env = %ENV;
308     $env{PERL5LIB} = join ":", grep !ref, @INC;
309    
310 root 1.4 Proc::FastSpawn::spawn (
311     $perl,
312     ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
313     [map "$_=$env{$_}", keys %env],
314     ) or die "unable to spawn AnyEvent::Fork server: $!";
315    
316     $self->_new ($fh)
317     }
318    
319     =item $proc = $proc->require ($module, ...)
320 root 1.1
321 root 1.4 Tries to load the given modules into the process.
322 root 1.1
323 root 1.4 Returns the process object for easy chaining of method calls.
324 root 1.1
325 root 1.4 =item $proc = $proc->send_fh ($handle, ...)
326 root 1.1
327 root 1.4 Send one or more file handles (I<not> file descriptors) to the process,
328     to prepare a call to C<run>.
329 root 1.1
330 root 1.4 The process object keeps a reference to the handles until this is done,
331     so you must not explicitly close the handles. This is most easily
332     accomplished by simply not storing the file handles anywhere after passing
333     them to this method.
334    
335     Returns the process object for easy chaining of method calls.
336    
337     =cut
338    
339     sub send_fh {
340     my ($self, @fh) = @_;
341    
342     for my $fh (@fh) {
343     $self->_cmd ("h");
344     push @{ $self->[2] }, \$fh;
345     }
346    
347     $self
348 root 1.1 }
349    
350 root 1.4 =item $proc = $proc->send_arg ($string, ...)
351    
352     Send one or more argument strings to the process, to prepare a call to
353     C<run>. The strings can be any octet string.
354    
355     Returns the process object for easy chaining of method calls.
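
Because every string is sent with a length prefix, arbitrary octets survive
the trip. The following sketch shows the framing that C<_cmd> uses (C<pack
"N/a", pack "(w/a)*", ...>) together with one possible way to decode it. The
decoder and the names C<encode_cmd>, C<decode_cmd> and C<My::Worker::run> are
hypothetical; the real receiving side is not part of this file.

```perl
use strict;
use warnings;

# Encode one command: a 32-bit big-endian length, followed by the fields,
# each prefixed with its own BER-compressed length ("w").
sub encode_cmd {
    return pack "N/a", pack "(w/a)*", @_;
}

# Decode one command from the front of $$bufref and remove it from the
# buffer. Returns the fields, or an empty list if the buffer does not yet
# hold a complete command. (Hypothetical helper, see the text above.)
sub decode_cmd {
    my ($bufref) = @_;

    return if length ($$bufref) < 4;
    my $len = unpack "N", $$bufref;
    return if length ($$bufref) < 4 + $len;

    my $payload = substr $$bufref, 4, $len;
    substr $$bufref, 0, 4 + $len, "";

    return unpack "(w/a)*", $payload;
}

# Two commands back to back on the same "wire".
my $wire = encode_cmd (a => "hello", "\x00\xff binary");
$wire   .= encode_cmd ("r", "My::Worker::run");

my @first  = decode_cmd (\$wire);
my @second = decode_cmd (\$wire);

printf "%d fields, then %d fields, %d octets left\n",
    scalar @first, scalar @second, length $wire;
```

The outer length lets a reader know when a whole command has arrived before
it tries to split the command into its fields.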
356    
357     =cut
358 root 1.1
359 root 1.4 sub send_arg {
360     my ($self, @arg) = @_;
361 root 1.1
362 root 1.4 $self->_cmd (a => @arg);
363 root 1.1
364     $self
365     }
366    
367 root 1.4 =item $proc->run ($func, $cb->($fh))
368    
369     Enter the function specified by the fully qualified name in C<$func> in
370     the process. The function is called with the communication socket as first
371     argument, followed by all file handles and string arguments sent earlier
372     via C<send_fh> and C<send_arg> methods, in the order they were called.
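
A chained call sequence might look like the following sketch. It assumes that
the module is installed and that the methods behave as documented here;
C<My::Worker>, C<My::Worker::run>, C<start_worker> and the argument values are
hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical names for illustration; My::Worker is not part of this module.
#
# The worker side would look roughly like this, living in My/Worker.pm:
#
#    sub My::Worker::run {
#       my ($fh, $log_fh, $job, $mode) = @_;
#       # $fh is the communication socket, the rest arrive in send order
#    }

sub start_worker {
    my ($log_fh, $on_ready) = @_;

    require AnyEvent::Fork; # loaded lazily so this sketch compiles without it

    AnyEvent::Fork
        ->new
        ->require ("My::Worker")
        ->send_fh ($log_fh)
        ->send_arg ("job-1", "verbose")
        ->run ("My::Worker::run", $on_ready);
}
```

C<$on_ready> would be a code reference that receives the parent's end of the
communication socket once the process has been prepared.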
373    
374     If the called function returns, the process exits.
375    
376     Preparing the process can take time - when the process is ready, the
377     callback is invoked with the local communications socket as argument.
378    
379     The process object becomes unusable on return from this function.
380    
381     If the communication socket isn't used, it should be closed on both sides,
382     to save on kernel memory.
383    
384     The socket is non-blocking in the parent, and blocking in the newly
385     created process. The close-on-exec flag is set on both. Even if not used
386     otherwise, the socket can be a good indicator for the existence of the
387     process - if the other process exits, you get a readable event on it,
388     because exiting the process closes the socket (if it didn't create any
389     children using fork).
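
The exit detection described above can be tried out with core Perl alone. The
sketch below uses a plain C<fork> and C<IO::Select> instead of this module and
an event loop, so it only demonstrates the socket behaviour; the variable
names are made up for the example.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Select ();
use POSIX ();

socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: do nothing and exit; the kernel closes its end of the socket.
    POSIX::_exit (0);
}

# The parent must close its copy of the child's end, or EOF never arrives.
close $child_end;

# Wait (up to 10 seconds) for the socket to become readable.
my @ready = IO::Select->new ($parent_end)->can_read (10);

# A readable socket that yields zero bytes means the peer is gone.
my $got = sysread $parent_end, my $buf, 1;
my $child_exited = @ready && defined $got && $got == 0;

waitpid $pid, 0;
print $child_exited ? "child is gone\n" : "child still alive\n";
```

If the child had forked further children that kept the socket open, the
end-of-file would only arrive once the last of them exited, which is the
caveat mentioned above.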
390    
391     =cut
392    
393     sub run {
394     my ($self, $func, $cb) = @_;
395    
396     $self->[0] = $cb;
397     $self->_cmd ("r", $func);
398     }
399    
400 root 1.1 =back
401    
402     =head1 AUTHOR
403    
404     Marc Lehmann <schmorp@schmorp.de>
405     http://home.schmorp.de/
406    
407     =cut
408    
409     1
410