/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.7
Committed: Thu Apr 4 00:27:06 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
Changes since 1.6: +5 -1 lines

=head1 NAME

AnyEvent::Fork - everything you wanted to use fork() for, but couldn't

=head1 SYNOPSIS

   use AnyEvent::Fork;

=head1 DESCRIPTION

This module allows you to create new processes, without actually forking
them from your current process (avoiding the problems of forking), but
preserving most of the advantages of fork.

It can be used to create new worker processes or new independent
subprocesses for short- and long-running jobs, process pools (e.g. for use
in pre-forked servers) but also to spawn new external processes (such as
CGI scripts from a webserver), which can be faster (and more well behaved)
than using fork+exec in big processes.

Special care has been taken to make this module useful from other modules,
while still supporting specialised environments such as L<App::Staticperl>
or L<PAR::Packer>.

=head1 PROBLEM STATEMENT

There are two ways to implement parallel processing on UNIX-like operating
systems - fork and process, and fork+exec and process. They have different
advantages and disadvantages that I describe below, together with how this
module tries to mitigate the disadvantages.

=over 4

=item Forking from a big process can be very slow (a 5GB process needs
0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
is often shared with exec (because you have to fork first), but in some
circumstances (e.g. when vfork is used), fork+exec can be much faster.

This module can help here by telling a small(er) helper process to fork,
or fork+exec instead.

=item Forking usually creates a copy-on-write copy of the parent
process. Memory that is already in use (for example, by modules or data
files that have been loaded) will not take additional memory. When exec'ing
a new process, modules and data files might need to be loaded again, at
extra CPU and memory cost. Likewise when forking, all data structures are
copied as well - if the program frees them and replaces them by new data,
the child processes will retain the memory even if it isn't used.

This module allows the main program to do a controlled fork, and allows
modules to exec processes safely at any time. When creating a custom
process pool you can take advantage of data sharing via fork without
the risk of sharing large dynamic data structures that will blow up child
memory usage.

=item Exec'ing a new perl process might be difficult and slow. For
example, it is not easy to find the correct path to the perl interpreter,
and all modules have to be loaded from disk again. Long-running processes
might run into problems when perl is upgraded, for example.

This module supports creating pre-initialised perl processes to be used
as templates, and also tries hard to identify the correct path to the perl
interpreter. With a cooperative main program, exec'ing the interpreter
might not even be necessary.

=item Forking might be impossible when a program is running. For example,
POSIX makes it almost impossible to fork from a multithreaded program and
do anything useful in the child - strictly speaking, if your perl program
uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
you cannot call fork on the perl level anymore, at all.

This module can safely fork helper processes at any time, by calling
fork+exec in C, in a POSIX-compatible way.

=item Parallel processing with fork might be inconvenient or difficult
to implement. For example, when a program uses an event loop and creates
watchers it becomes very hard to use the event loop from a child
program, as the watchers already exist but are only meaningful in the
parent. Worse, a module might want to use such a system, not knowing
whether another module or the main program also does, leading to problems.

This module only lets the main program create pools by forking (because
only the main program can know when it is still safe to do so) - all other
pools are created by fork+exec, after which such modules can again be
loaded.

=back

=head1 CONCEPTS

This module can create new processes either by executing a new perl
process, or by forking from an existing "template" process.

Each such process comes with its own file handle that can be used to
communicate with it (it's actually a socket - one end in the new process,
one end in the main process), and among the things you can do in it are
load modules, fork new processes, send file handles to it, and execute
functions.

There are multiple ways to create additional processes to execute some
jobs:

=over 4

=item fork a new process from the "default" template process, load code,
run it

This module has a "default" template process which it executes when it is
needed the first time. Forking from this process shares the memory used
for the perl interpreter with the new process, but loading modules takes
time, and the memory is not shared with anything else.

This is ideal for when you only need one extra process of a kind, with the
option of starting and stopping it on demand.

=item fork a new template process, load code, then fork processes off of
it and run the code

When you need to have a bunch of processes that all execute the same (or
very similar) tasks, then a good way is to create a new template process
for them, loading all the modules you need, and then create your worker
processes from this new template process.

This way, all code (and data structures) that can be shared (e.g. the
modules you loaded) is shared between the processes, and each new process
consumes relatively little memory of its own.

The disadvantage of this approach is that you need to create a template
process for the sole purpose of forking new processes from it, but if you
only need a fixed number of processes you can create them, and then destroy
the template process.

=item execute a new perl interpreter, load some code, run it

This is relatively slow, and doesn't allow you to share memory between
multiple processes.

The only advantage is that you don't have to have a template process
hanging around all the time to fork off some new processes, which might be
an advantage when there are long time spans where no extra processes are
needed.

=back

=head1 FUNCTIONS

=over 4

=cut

package AnyEvent::Fork;

use common::sense;

use Socket ();

use AnyEvent;
use AnyEvent::Fork::Util;
use AnyEvent::Util ();

our $PERL; # the path to the perl interpreter, deduced with various forms of magic

=item my $pool = new AnyEvent::Fork key => value...

Create a new process pool. The following named parameters are supported:

=over 4

=back

=cut

# the early fork template process
our $EARLY;

# the empty template process
our $TEMPLATE;

# queue a command for the process and make sure the queue gets written
sub _cmd {
   my $self = shift;

   # ideally, we would want to use "a (w/a)*" as format string, but perl versions
   # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
   push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;

   # start a write watcher unless one is already active
   $self->[3] ||= AE::io $self->[1], 1, sub {
      if (ref $self->[2][0]) {
         # a reference in the queue is a file handle to pass over the socket
         AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
            and shift @{ $self->[2] };

      } else {
         # a plain string is command data, which might be written only partially
         my $len = syswrite $self->[1], $self->[2][0]
            or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };

         substr $self->[2][0], 0, $len, "";
         shift @{ $self->[2] } unless length $self->[2][0];
      }

      # queue drained: stop the watcher and invoke the run callback, if any
      unless (@{ $self->[2] }) {
         undef $self->[3];
         $self->[0]->($self->[1]) if $self->[0];
      }
   };
}

sub _new {
   my ($self, $fh) = @_;

   AnyEvent::Util::fh_nonblocking $fh, 1;

   $self = bless [
      undef, # run callback
      $fh,
      [],    # write queue - strings or fd's
      undef, # AE watcher
   ], $self;

#   my ($a, $b) = AnyEvent::Util::portable_socketpair;

#   queue_cmd $template, "Iabc";
#   push @{ $template->[2] }, \$b;

#   use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
#   undef $b;
#   die "x" . <$a>;

   $self
}

# fork template from current process, used by AnyEvent::Fork::Early/Template
sub _new_fork {
   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
   my $parent = $$;

   my $pid = fork;

   if ($pid eq 0) {
      # child: serve commands arriving on our end of the socket pair
      require AnyEvent::Fork::Serve;
      $AnyEvent::Fork::Serve::OWNER = $parent;
      close $fh;
      $0 = "$_[1] of $parent";
      AnyEvent::Fork::Serve::serve ($slave);
      AnyEvent::Fork::Util::_exit 0;
   } elsif (!$pid) {
      # fork returned undef
      die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
   }

   AnyEvent::Fork->_new ($fh)
}

=item my $proc = new AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

The new process is forked from a template process that is kept around
for this purpose. When it doesn't exist yet, it is created by a call to
C<new_exec> and kept around for future calls.

=cut

sub new {
   my $class = shift;

   $TEMPLATE ||= $class->new_exec;
   $TEMPLATE->fork
}

=item $new_proc = $proc->fork

Forks C<$proc>, creating a new process, and returns the process object
of the new process.

If any of the C<send_> functions have been called before fork, then they
will be cloned in the child. For example, in a pre-forked server, you
might C<send_fh> the listening socket into the template process, and then
keep calling C<fork> and C<run>.

=cut

sub fork {
   my ($self) = @_;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;

   $self->send_fh ($slave);
   $self->_cmd ("f");

   AnyEvent::Fork->_new ($fh)
}

=item my $proc = new_exec AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

Unlike the C<new> method, this method I<always> spawns a new perl process
(except in some cases, see L<AnyEvent::Fork::Early> for details). This
reduces the amount of memory sharing that is possible, and is also slower.

You should use C<new> whenever possible, except when having a template
process around is unacceptable.

The path to the perl interpreter is divined using various methods - first
C<$^X> is investigated to see if the path ends with something that sounds
as if it were the perl interpreter. Failing this, the module falls back to
using C<$Config::Config{perlpath}>.
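
The lookup described above can be sketched in isolation as follows. This
is only an illustration, assuming the same pattern that C<new_exec>
applies to C<$^X>; the helper names C<looks_like_perl> and C<find_perl>
are made up for this example and are not part of the module.

```perl
use strict;
use warnings;
use Config ();

# Hypothetical helper (not part of the module): true when $path is absolute
# and its last component looks like a perl binary, e.g. "perl",
# "perl5.16.3" or "perl.exe".
sub looks_like_perl {
   my ($path) = @_;

   # the module treats every path as acceptable on win32, mirrored here via $^O
   my $absolute = $^O eq "MSWin32" || $path =~ m{^/};

   $absolute && $path =~ m{[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$}i
}

# Hypothetical helper: prefer $^X, otherwise fall back to Config.
sub find_perl {
   my $perl = $^X;

   unless (looks_like_perl ($perl)) {
      $perl = $Config::Config{perlpath};
      # append the executable suffix (e.g. ".exe") unless it is already there
      $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
   }

   $perl
}

print find_perl (), "\n";
```

A relative C<$^X> such as C<perl> fails the first check on POSIX systems,
which is why the fallback to C<Config> exists at all.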

=cut

sub new_exec {
   my ($self) = @_;

   return $EARLY->fork
      if $EARLY;

   # first find path of perl
   my $perl = $^X;

   # first we try $^X, but the path must be absolute (always on win32), and end in sth.
   # that looks like perl. this obviously only works for posix and win32
   unless (
      (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
      && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
   ) {
      # if it doesn't look perlish enough, try Config
      require Config;
      $perl = $Config::Config{perlpath};
      $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
   }

   require Proc::FastSpawn;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
   Proc::FastSpawn::fd_inherit (fileno $slave);

   # quick. also doesn't work in win32. of course. what did you expect
   #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
   my %env = %ENV;
   $env{PERL5LIB} = join ":", grep !ref, @INC;

   Proc::FastSpawn::spawn (
      $perl,
      ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$],
      [map "$_=$env{$_}", keys %env],
   ) or die "unable to spawn AnyEvent::Fork server: $!";

   $self->_new ($fh)
}

=item $proc = $proc->require ($module, ...)

Tries to load the given modules into the process.

Returns the process object for easy chaining of method calls.

=item $proc = $proc->send_fh ($handle, ...)

Send one or more file handles (I<not> file descriptors) to the process,
to prepare a call to C<run>.

The process object keeps a reference to the handles until this is done,
so you must not explicitly close the handles. This is most easily
accomplished by simply not storing the file handles anywhere after passing
them to this method.

Returns the process object for easy chaining of method calls.

=cut

sub send_fh {
   my ($self, @fh) = @_;

   for my $fh (@fh) {
      $self->_cmd ("h");
      push @{ $self->[2] }, \$fh;
   }

   $self
}

=item $proc = $proc->send_arg ($string, ...)

Send one or more argument strings to the process, to prepare a call to
C<run>. The strings can be any octet string.

Returns the process object for easy chaining of method calls.
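
Arbitrary octet strings are safe to send because, in this revision, the
internal C<_cmd> helper length-prefixes every field instead of relying on
delimiters. A minimal sketch of that framing, reusing the module's C<pack>
template: C<encode_frame> and C<decode_frame> are hypothetical names, and
the decoder is only one possible way to reverse the encoding, not
necessarily what L<AnyEvent::Fork::Serve> does.

```perl
use strict;
use warnings;

# Encode one command: every field gets a BER-compressed length prefix
# ("w/a"), and the whole command gets a 32-bit big-endian length prefix
# ("N/a"). This is the template used by _cmd in this file.
sub encode_frame {
   pack "N/a", pack "(w/a)*", @_
}

# Hypothetical decoder, for illustration only: strip the outer length
# prefix, then peel off one length-prefixed field at a time.
sub decode_frame {
   my ($frame) = @_;

   my ($payload) = unpack "N/a", $frame;

   my @fields;
   while (length $payload) {
      # "w/a" extracts the next field, "a*" keeps the rest for the next round
      (my $field, $payload) = unpack "w/a a*", $payload;
      push @fields, $field;
   }

   @fields
}

# the command letter "a" followed by two arguments, one of them binary
my @sent = ("a", "hello world", "\x00\xff\n");
my @got  = decode_frame (encode_frame (@sent));

my $same = @got == @sent && !grep $got[$_] ne $sent[$_], 0 .. $#sent;
print "round trip ", ($same ? "ok" : "FAILED"), "\n";
```

Because the lengths travel with the data, NUL bytes, newlines and empty
strings need no escaping.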

=cut

sub send_arg {
   my ($self, @arg) = @_;

   $self->_cmd (a => @arg);

   $self
}

=item $proc->run ($func, $cb->($fh))

Enter the function specified by the fully qualified name in C<$func> in
the process. The function is called with the communication socket as first
argument, followed by all file handles and string arguments sent earlier
via C<send_fh> and C<send_arg> methods, in the order they were called.

If the called function returns, the process exits.

Preparing the process can take time - when the process is ready, the
callback is invoked with the local communications socket as argument.

The process object becomes unusable on return from this function.

If the communication socket isn't used, it should be closed on both sides,
to save on kernel memory.

The socket is non-blocking in the parent, and blocking in the newly
created process. The close-on-exec flag is set on both. Even if not used
otherwise, the socket can be a good indicator for the existence of the
process - if the other process exits, you get a readable event on it,
because exiting the process closes the socket (if it didn't create any
children using fork).
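
The end-of-file behaviour can be observed without this module. The
following sketch uses only core perl, with a plain blocking C<socketpair>
and C<fork> standing in for the process object and its communication
socket (so it assumes a POSIX system). The variable names are illustrative
only.

```perl
use strict;
use warnings;
use Socket;
use POSIX ();

# create a connected pair of unix sockets: one end per process
socketpair my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC
   or die "socketpair: $!";

my $pid = fork;
defined $pid or die "fork: $!";

if ($pid == 0) {
   # child: keep only its own end, report something, then exit
   close $parent_end;
   syswrite $child_end, "done\n";
   POSIX::_exit (0); # exiting closes $child_end
}

# parent: drop the child's end, otherwise EOF would never be seen
close $child_end;

my $data = "";
while (1) {
   my $len = sysread $parent_end, my $buf, 4096;
   die "read error: $!" unless defined $len;
   last unless $len; # reading 0 bytes means EOF: the peer has gone away
   $data .= $buf;
}

waitpid $pid, 0;
print "child said: $data";
print "child has exited\n";
```

With an event loop, the blocking C<sysread> loop would be replaced by a
read watcher on the parent's end of the socket.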

=cut

sub run {
   my ($self, $func, $cb) = @_;

   $self->[0] = $cb;
   $self->_cmd ("r", $func);
}

=back

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1
