/cvs/cvsroot/AnyEvent-Fork/Fork.pm
Revision: 1.5
Committed: Wed Apr 3 08:29:21 2013 UTC (11 years, 3 months ago) by root
Branch: MAIN
Changes since 1.4: +13 -1 lines
Log Message:
*** empty log message ***

File Contents

1 =head1 NAME
2
3 AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
4
5 =head1 SYNOPSIS
6
7 use AnyEvent::Fork;
8
9 =head1 DESCRIPTION
10
11 This module allows you to create new processes, without actually forking
12 them from your current process (avoiding the problems of forking), but
13 preserving most of the advantages of fork.
14
15 It can be used to create new worker processes or new independent
16 subprocesses for short- and long-running jobs, process pools (e.g. for use
17 in pre-forked servers) but also to spawn new external processes (such as
18 CGI scripts from a webserver), which can be faster (and more well behaved)
19 than using fork+exec in big processes.
20
21 Special care has been taken to make this module useful from other modules,
22 while still supporting specialised environments such as L<App::Staticperl>
23 or L<PAR::Packer>.
24
25 =head1 PROBLEM STATEMENT
26
27 There are two ways to implement parallel processing on UNIX-like operating
28 systems - fork and process, and fork+exec and process. They have different
29 advantages and disadvantages that I describe below, together with how this
30 module tries to mitigate the disadvantages.
31
32 =over 4
33
34 =item Forking from a big process can be very slow (a 5GB process needs
35 0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
36 is often shared with exec (because you have to fork first), but in some
37 circumstances (e.g. when vfork is used), fork+exec can be much faster.
38
39 This module can help here by telling a small(er) helper process to fork,
40 or fork+exec instead.
41
42 =item Forking usually creates a copy-on-write copy of the parent
43 process. Memory already in use (for example, modules or data files that have
44 been loaded) will not take additional memory. When exec'ing a new process, modules
45 and data files might need to be loaded again, at extra cpu and memory
46 cost. Likewise when forking, all data structures are copied as well - if
47 the program frees them and replaces them by new data, the child processes
48 will retain the memory even if it isn't used.
49
50 This module allows the main program to do a controlled fork, and allows
51 modules to exec processes safely at any time. When creating a custom
52 process pool you can take advantage of data sharing via fork without
53 the risk of sharing large dynamic data structures that will blow up child
54 memory usage.
55
56 =item Exec'ing a new perl process might be difficult and slow. For
57 example, it is not easy to find the correct path to the perl interpreter,
58 and all modules have to be loaded from disk again. Long running processes
59 might run into problems when perl is upgraded for example.
60
61 This module supports creating pre-initialised perl processes to be used
62 as template, and also tries hard to identify the correct path to the perl
63 interpreter. With a cooperative main program, exec'ing the interpreter
64 might not even be necessary.
65
66 =item Forking might be impossible when a program is running. For example,
67 POSIX makes it almost impossible to fork from a multithreaded program and
68 do anything useful in the child - strictly speaking, if your perl program
69 uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
70 you cannot call fork on the perl level anymore, at all.
71
72 This module can safely fork helper processes at any time, by calling
73 fork+exec in C, in a POSIX-compatible way.
74
75 =item Parallel processing with fork might be inconvenient or difficult
76 to implement. For example, when a program uses an event loop and creates
77 watchers it becomes very hard to use the event loop from a child
78 program, as the watchers already exist but are only meaningful in the
79 parent. Worse, a module might want to use such a system, not knowing
80 whether another module or the main program also does, leading to problems.
81
82 This module only lets the main program create pools by forking (because
83 only the main program can know when it is still safe to do so) - all other
84 pools are created by fork+exec, after which such modules can again be
85 loaded.
86
87 =back
88
89 =head1 CONCEPTS
90
91 This module can create new processes either by executing a new perl
92 process, or by forking from an existing "template" process.
93
94 Each such process comes with its own file handle that can be used to
95 communicate with it (it's actually a socket - one end in the new process,
96 one end in the main process), and among the things you can do in it are
97 load modules, fork new processes, send file handles to it, and execute
98 functions.
99
100 There are multiple ways to create additional processes to execute some
101 jobs:
102
103 =over 4
104
105 =item fork a new process from the "default" template process, load code,
106 run it
107
108 This module has a "default" template process which it executes when it is
109 needed the first time. Forking from this process shares the memory used
110 for the perl interpreter with the new process, but loading modules takes
111 time, and the memory is not shared with anything else.
112
113 This is ideal for when you only need one extra process of a kind, with the
114 option of starting and stopping it on demand.
115
116 =item fork a new template process, load code, then fork processes off of
117 it and run the code
118
119 When you need to have a bunch of processes that all execute the same (or
120 very similar) tasks, then a good way is to create a new template process
121 for them, loading all the modules you need, and then create your worker
122 processes from this new template process.
123
124 This way, all code (and data structures) that can be shared (e.g. the
125 modules you loaded) is shared between the processes, and each new process
126 consumes relatively little memory of its own.
127
128 The disadvantage of this approach is that you need to create a template
129 process for the sole purpose of forking new processes from it, but if you
130 only need a fixed number of processes you can create them, and then destroy
131 the template process.
132
133 =item execute a new perl interpreter, load some code, run it
134
135 This is relatively slow, and doesn't allow you to share memory between
136 multiple processes.
137
138 The only advantage is that you don't have to have a template process
139 hanging around all the time to fork off some new processes, which might be
140 an advantage when there are long time spans where no extra processes are
141 needed.
142
143 =back
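The three approaches above can be sketched with the constructors documented under FUNCTIONS. This is a minimal sketch, not a tested recipe: the helper names C<one_off_process>, C<worker_pool> and C<fresh_interpreter> are hypothetical, and the module is loaded lazily inside each helper so the file compiles even where AnyEvent::Fork is not installed.

```perl
use strict;
use warnings;

# 1. Fork from the implicit "default" template process.
sub one_off_process {
    require AnyEvent::Fork;
    return AnyEvent::Fork->new;
}

# 2. Create a dedicated template process, then fork workers off of it.
#    Code shared by all workers would be loaded into $template before
#    the forks (see the documented require method).
sub worker_pool {
    my ($count) = @_;
    require AnyEvent::Fork;
    my $template = AnyEvent::Fork->new;
    return map { $template->fork } 1 .. $count;
}

# 3. Always execute a new perl interpreter; no template is kept around.
sub fresh_interpreter {
    require AnyEvent::Fork;
    return AnyEvent::Fork->new_exec;
}
```

Each helper returns process objects that still need a C<run> call before they do any work.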
144
145 =head1 FUNCTIONS
146
147 =over 4
148
149 =cut
150
151 package AnyEvent::Fork;
152
153 use common::sense;
154
155 use Socket ();
156
157 use AnyEvent;
158 use AnyEvent::Fork::Util;
159 use AnyEvent::Util ();
160
161 our $PERL; # the path to the perl interpreter, deduced with various forms of magic
162
163 =item my $pool = new AnyEvent::Fork key => value...
164
165 Create a new process pool. The following named parameters are supported:
166
167 =over 4
168
169 =back
170
171 =cut
172
173 # the early fork template process
174 our $EARLY;
175
176 # the empty template process
177 our $TEMPLATE;
178
179 sub _cmd {
180 my $self = shift;
181
182 # ideally, we would want to use "a (w/a)*" as format string, but perl versions
183 # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
184 push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;
185
186 $self->[3] ||= AE::io $self->[1], 1, sub {
187 if (ref $self->[2][0]) {
188 AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
189 and shift @{ $self->[2] };
190
191 } else {
192 my $len = syswrite $self->[1], $self->[2][0]
193 or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
194
195 substr $self->[2][0], 0, $len, "";
196 shift @{ $self->[2] } unless length $self->[2][0];
197 }
198
199 unless (@{ $self->[2] }) {
200 undef $self->[3];
201 $self->[0]->($self->[1]) if $self->[0];
202 }
203 };
204 }
205
206 sub _new {
207 my ($self, $fh) = @_;
208
209 $self = bless [
210 undef, # run callback
211 $fh,
212 [], # write queue - strings or fd's
213 undef, # AE watcher
214 ], $self;
215
216 # my ($a, $b) = AnyEvent::Util::portable_socketpair;
217
218 # queue_cmd $template, "Iabc";
219 # push @{ $template->[2] }, \$b;
220
221 # use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
222 # undef $b;
223 # die "x" . <$a>;
224
225 $self
226 }
227
228 =item my $proc = new AnyEvent::Fork
229
230 Creates a new "empty" perl interpreter process and returns its process
231 object for further manipulation.
232
233 The new process is forked from a template process that is kept around
234 for this purpose. When it doesn't exist yet, it is created by a call to
235 C<new_exec> and kept around for future calls.
236
237 =cut
238
239 sub new {
240 my $class = shift;
241
242 $TEMPLATE ||= $class->new_exec;
243 $TEMPLATE->fork
244 }
245
246 =item $new_proc = $proc->fork
247
248 Forks C<$proc>, creating a new process, and returns the process object
249 of the new process.
250
251 If any of the C<send_> methods have been called before C<fork>, then the
252 handles and arguments they sent are cloned in the child. For example, in a pre-forked server, you
253 might C<send_fh> the listening socket into the template process, and then
254 keep calling C<fork> and C<run>.
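The pre-forked server pattern mentioned above might look roughly like this on the parent side. It is a sketch under assumptions: C<start_prefork_server> is a hypothetical helper, and C<MyServer::serve> is a hypothetical function that would have to be loadable inside the child processes; only C<new>, C<send_fh>, C<fork> and C<run> come from this documentation.

```perl
use strict;
use warnings;
use IO::Socket::INET ();

# Hypothetical helper: open a listening socket, hand it to a template
# process, then fork $workers children that all receive a copy of it.
sub start_prefork_server {
    my ($port, $workers) = @_;
    require AnyEvent::Fork;

    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "unable to listen on port $port: $!";

    # The listening socket is sent once, before any fork.
    my $template = AnyEvent::Fork->new->send_fh($listener);

    my @comm;    # filled asynchronously, one socket per ready worker
    for (1 .. $workers) {
        $template->fork->run("MyServer::serve", sub {
            my ($fh) = @_;
            push @comm, $fh;    # keep the communication socket alive
        });
    }

    return \@comm;
}
```

The callbacks only fire once an event loop is running, so the returned array starts out empty.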
255
256 =cut
257
258 sub fork {
259 my ($self) = @_;
260
261 my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
262
263 $self->send_fh ($slave);
264 $self->_cmd ("f");
265
266 AnyEvent::Util::fh_nonblocking $fh, 1;
267
268 AnyEvent::Fork->_new ($fh)
269 }
270
271 =item my $proc = new_exec AnyEvent::Fork
272
273 Creates a new "empty" perl interpreter process and returns its process
274 object for further manipulation.
275
276 Unlike the C<new> method, this method I<always> spawns a new perl process
277 (except in some cases, see L<AnyEvent::Fork::Early> for details). This
278 reduces the amount of memory sharing that is possible, and is also slower.
279
280 You should use C<new> whenever possible, except when having a template
281 process around is unacceptable.
282
283 The path to the perl interpreter is divined using various methods - first
284 C<$^X> is investigated to see if the path ends with something that sounds
285 as if it were the perl interpreter. Failing this, the module falls back to
286 using C<$Config::Config{perlpath}>.
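The heuristic described above can be sketched in isolation. This is a minimal sketch, not the module's own code: C<looks_like_perl> and C<guess_perl_path> are hypothetical names, C<$^O eq 'MSWin32'> stands in for the module's platform check, and the executable-suffix handling of the real code is left out.

```perl
use strict;
use warnings;

# True if $path is absolute (assumed to always hold on win32) and its last
# component is "perl", optionally followed by a dotted version and ".exe".
sub looks_like_perl {
    my ($path) = @_;
    my $win32 = $^O eq 'MSWin32';    # stand-in for the platform check

    return 0 unless $win32 || $path =~ m%^/%;
    return $path =~ m%[/\\]perl(?:[0-9]+(?:\.[0-9]+)+)?(?:\.exe)?$%i ? 1 : 0;
}

# Prefer the candidate (normally $^X) when it looks trustworthy,
# otherwise fall back to the path recorded in Config.
sub guess_perl_path {
    my ($candidate) = @_;
    return $candidate if looks_like_perl($candidate);

    require Config;
    return $Config::Config{perlpath};
}
```

A caller would typically pass C<$^X> as the candidate. An embedded interpreter whose binary is not named like perl falls through to the Config lookup.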
287
288 =cut
289
290 sub new_exec {
291 my ($self) = @_;
292
293 return $EARLY->fork
294 if $EARLY;
295
296 # first find path of perl
297 my $perl = $^X;
298
299 # first we try $^X, but the path must be absolute (always on win32), and end in sth.
300 # that looks like perl. this obviously only works for posix and win32
301 unless (
302 (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
303 && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
304 ) {
305 # if it doesn't look perlish enough, try Config
306 require Config;
307 $perl = $Config::Config{perlpath};
308 $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
309 }
310
311 require Proc::FastSpawn;
312
313 my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
314 AnyEvent::Util::fh_nonblocking $fh, 1;
315 Proc::FastSpawn::fd_inherit (fileno $slave);
316
317 # quick. also doesn't work in win32. of course. what did you expect
318 #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
319 my %env = %ENV;
320 $env{PERL5LIB} = join ":", grep !ref, @INC;
321
322 Proc::FastSpawn::spawn (
323 $perl,
324 ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
325 [map "$_=$env{$_}", keys %env],
326 ) or die "unable to spawn AnyEvent::Fork server: $!";
327
328 $self->_new ($fh)
329 }
330
331 =item $proc = $proc->require ($module, ...)
332
333 Tries to load the given modules into the process.
334
335 Returns the process object for easy chaining of method calls.
336
337 =item $proc = $proc->send_fh ($handle, ...)
338
339 Send one or more file handles (I<not> file descriptors) to the process,
340 to prepare a call to C<run>.
341
342 The process object keeps a reference to the handles until this is done,
343 so you must not explicitly close the handles. This is most easily
344 accomplished by simply not storing the file handles anywhere after passing
345 them to this method.
346
347 Returns the process object for easy chaining of method calls.
348
349 =cut
350
351 sub send_fh {
352 my ($self, @fh) = @_;
353
354 for my $fh (@fh) {
355 $self->_cmd ("h");
356 push @{ $self->[2] }, \$fh;
357 }
358
359 $self
360 }
361
362 =item $proc = $proc->send_arg ($string, ...)
363
364 Send one or more argument strings to the process, to prepare a call to
365 C<run>. The strings can be any octet string.
366
367 Returns the process object for easy chaining of method calls.
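How such strings travel to the process can be read off the C<_cmd> helper, which packs each command as a list of BER-length-prefixed strings inside a frame with a 32-bit length prefix. The sketch below reproduces that C<pack> call and adds a matching decoder. C<encode_command> and C<decode_command> are hypothetical names, and the decoder is an assumption about the receiving side, which is not part of this file.

```perl
use strict;
use warnings;

# Pack a command letter plus arguments the way _cmd does: every field gets
# a BER-compressed length ("w/a"), the whole payload a 32-bit one ("N/a").
sub encode_command {
    my (@fields) = @_;
    return pack "N/a", pack("(w/a)*", @fields);
}

# Assumed inverse: strip the outer length, then split the payload
# back into its length-prefixed fields.
sub decode_command {
    my ($frame) = @_;
    my ($payload) = unpack "N/a", $frame;
    return unpack "(w/a)*", $payload;
}
```

Because every field carries its own length, arbitrary octets (including NUL bytes) survive the round trip, which is consistent with the statement that any octet string is allowed.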
368
369 =cut
370
371 sub send_arg {
372 my ($self, @arg) = @_;
373
374 $self->_cmd (a => @arg);
375
376 $self
377 }
378
379 =item $proc->run ($func, $cb->($fh))
380
381 Enter the function specified by the fully qualified name in C<$func> in
382 the process. The function is called with the communication socket as first
383 argument, followed by all file handles and string arguments sent earlier
384 via C<send_fh> and C<send_arg> methods, in the order they were called.
385
386 If the called function returns, the process exits.
387
388 Preparing the process can take time - when the process is ready, the
389 callback is invoked with the local communications socket as argument.
390
391 The process object becomes unusable on return from this function.
392
393 If the communication socket isn't used, it should be closed on both sides,
394 to save on kernel memory.
395
396 The socket is non-blocking in the parent, and blocking in the newly
397 created process. The close-on-exec flag is set on both. Even if not used
398 otherwise, the socket can be a good indicator for the existence of the
399 process - if the other process exits, you get a readable event on it,
400 because exiting the process closes the socket (if it didn't create any
401 children using fork).
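The liveness property described above follows from ordinary socket semantics and can be demonstrated without this module. The sketch below uses a plain C<fork> and a blocking read purely as a stand-in; C<peer_exit_is_visible> is a hypothetical name, and an event-driven program would watch the handle with an I/O watcher instead of blocking.

```perl
use strict;
use warnings;
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Returns true if the parent sees end-of-file on its end of a socket pair
# once the child, the only holder of the other end, has exited.
sub peer_exit_is_visible {
    socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        close $parent_end;
        POSIX::_exit(0);    # exit at once; this closes the child's end
    }

    # The parent must drop its own copy of the child's end,
    # otherwise the socket never reports end-of-file.
    close $child_end;

    my $n = sysread $parent_end, my $buf, 1;    # blocks until EOF
    waitpid $pid, 0;

    return defined $n && $n == 0;
}
```

The explicit C<close $child_end> mirrors the caveat in the text: any process that still holds a copy of the socket keeps it open, so the readable event is delayed until the last copy is gone.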
402
403 =cut
404
405 sub run {
406 my ($self, $func, $cb) = @_;
407
408 $self->[0] = $cb;
409 $self->_cmd ("r", $func);
410 }
411
412 =back
413
414 =head1 AUTHOR
415
416 Marc Lehmann <schmorp@schmorp.de>
417 http://home.schmorp.de/
418
419 =cut
420
421 1
422