ViewVC Help
View File | Revision Log | Show Annotations | Download File
/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.6
Committed: Wed Apr 3 08:47:44 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
Changes since 1.5: +19 -3 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
4
5 =head1 SYNOPSIS
6
7 use AnyEvent::Fork;
8
9 =head1 DESCRIPTION
10
11 This module allows you to create new processes, without actually forking
12 them from your current process (avoiding the problems of forking), but
13 preserving most of the advantages of fork.
14
15 It can be used to create new worker processes or new independent
16 subprocesses for short- and long-running jobs, process pools (e.g. for use
17 in pre-forked servers) but also to spawn new external processes (such as
18 CGI scripts from a webserver), which can be faster (and more well behaved)
19 than using fork+exec in big processes.
20
21 Special care has been taken to make this module useful from other modules,
22 while still supporting specialised environments such as L<App::Staticperl>
23 or L<PAR::Packer>.
24
25 =head1 PROBLEM STATEMENT
26
27 There are two ways to implement parallel processing on UNIX like operating
28 systems - fork and process, and fork+exec and process. They have different
29 advantages and disadvantages that I describe below, together with how this
30 module tries to mitigate the disadvantages.
31
32 =over 4
33
34 =item Forking from a big process can be very slow (a 5GB process needs
35 0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
36 is often shared with exec (because you have to fork first), but in some
37 circumstances (e.g. when vfork is used), fork+exec can be much faster.
38
39 This module can help here by telling a small(er) helper process to fork,
40 or fork+exec instead.
41
42 =item Forking usually creates a copy-on-write copy of the parent
43 process. Memory (for example, modules or data files that have been
44 will not take additional memory). When exec'ing a new process, modules
45 and data files might need to be loaded again, at extra cpu and memory
46 cost. Likewise when forking, all data structures are copied as well - if
47 the program frees them and replaces them by new data, the child processes
48 will retain the memory even if it isn't used.
49
50 This module allows the main program to do a controlled fork, and allows
51 modules to exec processes safely at any time. When creating a custom
52 process pool you can take advantage of data sharing via fork without
53 risking to share large dynamic data structures that will blow up child
54 memory usage.
55
56 =item Exec'ing a new perl process might be difficult and slow. For
57 example, it is not easy to find the correct path to the perl interpreter,
58 and all modules have to be loaded from disk again. Long running processes
59 might run into problems when perl is upgraded for example.
60
61 This module supports creating pre-initialised perl processes to be used
62 as template, and also tries hard to identify the correct path to the perl
63 interpreter. With a cooperative main program, exec'ing the interpreter
64 might not even be necessary.
65
66 =item Forking might be impossible when a program is running. For example,
67 POSIX makes it almost impossible to fork from a multithreaded program and
68 do anything useful in the child - strictly speaking, if your perl program
69 uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
70 you cannot call fork on the perl level anymore, at all.
71
72 This module can safely fork helper processes at any time, by caling
73 fork+exec in C, in a POSIX-compatible way.
74
75 =item Parallel processing with fork might be inconvenient or difficult
76 to implement. For example, when a program uses an event loop and creates
77 watchers it becomes very hard to use the event loop from a child
78 program, as the watchers already exist but are only meaningful in the
79 parent. Worse, a module might want to use such a system, not knowing
80 whether another module or the main program also does, leading to problems.
81
82 This module only lets the main program create pools by forking (because
83 only the main program can know when it is still safe to do so) - all other
84 pools are created by fork+exec, after which such modules can again be
85 loaded.
86
87 =back
88
89 =head1 CONCEPTS
90
91 This module can create new processes either by executing a new perl
92 process, or by forking from an existing "template" process.
93
94 Each such process comes with its own file handle that can be used to
95 communicate with it (it's actually a socket - one end in the new process,
96 one end in the main process), and among the things you can do in it are
97 load modules, fork new processes, send file handles to it, and execute
98 functions.
99
100 There are multiple ways to create additional processes to execute some
101 jobs:
102
103 =over 4
104
105 =item fork a new process from the "default" template process, load code,
106 run it
107
108 This module has a "default" template process which it executes when it is
109 needed the first time. Forking from this process shares the memory used
110 for the perl interpreter with the new process, but loading modules takes
111 time, and the memory is not shared with anything else.
112
113 This is ideal for when you only need one extra process of a kind, with the
114 option of starting and stipping it on demand.
115
116 =item fork a new template process, load code, then fork processes off of
117 it and run the code
118
119 When you need to have a bunch of processes that all execute the same (or
120 very similar) tasks, then a good way is to create a new template process
121 for them, loading all the modules you need, and then create your worker
122 processes from this new template process.
123
124 This way, all code (and data structures) that can be shared (e.g. the
125 modules you loaded) is shared between the processes, and each new process
126 consumes relatively little memory of its own.
127
128 The disadvantage of this approach is that you need to create a template
129 process for the sole purpose of forking new processes from it, but if you
130 only need a fixed number of proceses you can create them, and then destroy
131 the template process.
132
133 =item execute a new perl interpreter, load some code, run it
134
135 This is relatively slow, and doesn't allow you to share memory between
136 multiple processes.
137
138 The only advantage is that you don't have to have a template process
139 hanging around all the time to fork off some new processes, which might be
140 an advantage when there are long time spans where no extra processes are
141 needed.
142
143 =back
144
145 =head1 FUNCTIONS
146
147 =over 4
148
149 =cut
150
151 package AnyEvent::Fork;
152
153 use common::sense;
154
155 use Socket ();
156
157 use AnyEvent;
158 use AnyEvent::Fork::Util;
159 use AnyEvent::Util ();
160
161 our $PERL; # the path to the perl interpreter, deduces with various forms of magic
162
163 =item my $pool = new AnyEvent::Fork key => value...
164
165 Create a new process pool. The following named parameters are supported:
166
167 =over 4
168
169 =back
170
171 =cut
172
173 # the early fork template process
174 our $EARLY;
175
176 # the empty template process
177 our $TEMPLATE;
178
179 sub _cmd {
180 my $self = shift;
181
182 # ideally, we would want to use "a (w/a)*" as format string, but perl versions
183 # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
184 push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;
185
186 $self->[3] ||= AE::io $self->[1], 1, sub {
187 if (ref $self->[2][0]) {
188 AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
189 and shift @{ $self->[2] };
190
191 } else {
192 my $len = syswrite $self->[1], $self->[2][0]
193 or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
194
195 substr $self->[2][0], 0, $len, "";
196 shift @{ $self->[2] } unless length $self->[2][0];
197 }
198
199 unless (@{ $self->[2] }) {
200 undef $self->[3];
201 $self->[0]->($self->[1]) if $self->[0];
202 }
203 };
204 }
205
206 sub _new {
207 my ($self, $fh) = @_;
208
209 AnyEvent::Util::fh_nonblocking $fh, 1;
210
211 $self = bless [
212 undef, # run callback
213 $fh,
214 [], # write queue - strings or fd's
215 undef, # AE watcher
216 ], $self;
217
218 # my ($a, $b) = AnyEvent::Util::portable_socketpair;
219
220 # queue_cmd $template, "Iabc";
221 # push @{ $template->[2] }, \$b;
222
223 # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
224 # undef $b;
225 # die "x" . <$a>;
226
227 $self
228 }
229
230 # fork template from current process, used by AnyEvent::Fork::Early/Template
231 sub _new_fork {
232 my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
233 my $pid = fork;
234
235 if ($pid eq 0) {
236 require AnyEvent::Fork::Serve;
237 close $fh;
238 AnyEvent::Fork::Serve::serve ($slave);
239 AnyEvent::Fork::Util::_exit 0;
240 } elsif (!$pid) {
241 die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
242 }
243
244 AnyEvent::Fork->_new ($fh)
245 }
246
247 =item my $proc = new AnyEvent::Fork
248
249 Create a new "empty" perl interpreter process and returns its process
250 object for further manipulation.
251
252 The new process is forked from a template process that is kept around
253 for this purpose. When it doesn't exist yet, it is created by a call to
254 C<new_exec> and kept around for future calls.
255
256 =cut
257
258 sub new {
259 my $class = shift;
260
261 $TEMPLATE ||= $class->new_exec;
262 $TEMPLATE->fork
263 }
264
265 =item $new_proc = $proc->fork
266
267 Forks C<$proc>, creating a new process, and returns the process object
268 of the new process.
269
270 If any of the C<send_> functions have been called before fork, then they
271 will be cloned in the child. For example, in a pre-forked server, you
272 might C<send_fh> the listening socket into the template process, and then
273 keep calling C<fork> and C<run>.
274
275 =cut
276
277 sub fork {
278 my ($self) = @_;
279
280 my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
281
282 $self->send_fh ($slave);
283 $self->_cmd ("f");
284
285 AnyEvent::Fork->_new ($fh)
286 }
287
288 =item my $proc = new_exec AnyEvent::Fork
289
290 Create a new "empty" perl interpreter process and returns its process
291 object for further manipulation.
292
293 Unlike the C<new> method, this method I<always> spawns a new perl process
294 (except in some cases, see L<AnyEvent::Fork::Early> for details). This
295 reduces the amount of memory sharing that is possible, and is also slower.
296
297 You should use C<new> whenever possible, except when having a template
298 process around is unacceptable.
299
300 The path to the perl interpreter is divined usign various methods - first
301 C<$^X> is investigated to see if the path ends with something that sounds
302 as if it were the perl interpreter. Failing this, the module falls back to
303 using C<$Config::Config{perlpath}>.
304
305 =cut
306
307 sub new_exec {
308 my ($self) = @_;
309
310 return $EARLY->fork
311 if $EARLY;
312
313 # first find path of perl
314 my $perl = $;
315
316 # first we try $^X, but the path must be absolute (always on win32), and end in sth.
317 # that looks like perl. this obviously only works for posix and win32
318 unless (
319 (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
320 && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
321 ) {
322 # if it doesn't look perlish enough, try Config
323 require Config;
324 $perl = $Config::Config{perlpath};
325 $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
326 }
327
328 require Proc::FastSpawn;
329
330 my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
331 Proc::FastSpawn::fd_inherit (fileno $slave);
332
333 # quick. also doesn't work in win32. of course. what did you expect
334 #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
335 my %env = %ENV;
336 $env{PERL5LIB} = join ":", grep !ref, @INC;
337
338 Proc::FastSpawn::spawn (
339 $perl,
340 ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
341 [map "$_=$env{$_}", keys %env],
342 ) or die "unable to spawn AnyEvent::Fork server: $!";
343
344 $self->_new ($fh)
345 }
346
347 =item $proc = $proc->require ($module, ...)
348
349 Tries to load the given modules into the process
350
351 Returns the process object for easy chaining of method calls.
352
353 =item $proc = $proc->send_fh ($handle, ...)
354
355 Send one or more file handles (I<not> file descriptors) to the process,
356 to prepare a call to C<run>.
357
358 The process object keeps a reference to the handles until this is done,
359 so you must not explicitly close the handles. This is most easily
360 accomplished by simply not storing the file handles anywhere after passing
361 them to this method.
362
363 Returns the process object for easy chaining of method calls.
364
365 =cut
366
367 sub send_fh {
368 my ($self, @fh) = @_;
369
370 for my $fh (@fh) {
371 $self->_cmd ("h");
372 push @{ $self->[2] }, \$fh;
373 }
374
375 $self
376 }
377
378 =item $proc = $proc->send_arg ($string, ...)
379
380 Send one or more argument strings to the process, to prepare a call to
381 C<run>. The strings can be any octet string.
382
383 Returns the process object for easy chaining of emthod calls.
384
385 =cut
386
387 sub send_arg {
388 my ($self, @arg) = @_;
389
390 $self->_cmd (a => @arg);
391
392 $self
393 }
394
395 =item $proc->run ($func, $cb->($fh))
396
397 Enter the function specified by the fully qualified name in C<$func> in
398 the process. The function is called with the communication socket as first
399 argument, followed by all file handles and string arguments sent earlier
400 via C<send_fh> and C<send_arg> methods, in the order they were called.
401
402 If the called function returns, the process exits.
403
404 Preparing the process can take time - when the process is ready, the
405 callback is invoked with the local communications socket as argument.
406
407 The process object becomes unusable on return from this function.
408
409 If the communication socket isn't used, it should be closed on both sides,
410 to save on kernel memory.
411
412 The socket is non-blocking in the parent, and blocking in the newly
413 created process. The close-on-exec flag is set on both. Even if not used
414 otherwise, the socket can be a good indicator for the existance of the
415 process - if the othe rprocess exits, you get a readable event on it,
416 because exiting the process closes the socket (if it didn't create any
417 children using fork).
418
419 =cut
420
421 sub run {
422 my ($self, $func, $cb) = @_;
423
424 $self->[0] = $cb;
425 $self->_cmd ("r", $func);
426 }
427
428 =back
429
430 =head1 AUTHOR
431
432 Marc Lehmann <schmorp@schmorp.de>
433 http://home.schmorp.de/
434
435 =cut
436
437 1
438