/cvs/AnyEvent-Fork/Fork.pm
Revision: 1.4
Committed: Wed Apr 3 07:35:57 2013 UTC (11 years, 2 months ago) by root
Branch: MAIN
Changes since 1.3: +216 -67 lines
Log Message:
phew

File Contents

=head1 NAME

AnyEvent::Fork - everything you wanted to use fork() for, but couldn't

=head1 SYNOPSIS

   use AnyEvent::Fork;

=head1 DESCRIPTION

This module allows you to create new processes, without actually forking
them from your current process (avoiding the problems of forking), but
preserving most of the advantages of fork.

It can be used to create new worker processes or new independent
subprocesses for short- and long-running jobs, process pools (e.g. for use
in pre-forked servers) but also to spawn new external processes (such as
CGI scripts from a webserver), which can be faster (and more well behaved)
than using fork+exec in big processes.

=head1 PROBLEM STATEMENT

There are two ways to implement parallel processing on UNIX-like operating
systems - fork and process, and fork+exec and process. They have different
advantages and disadvantages that I describe below, together with how this
module tries to mitigate the disadvantages.

=over 4

=item Forking from a big process can be very slow (a 5GB process needs
0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
is often shared with exec (because you have to fork first), but in some
circumstances (e.g. when vfork is used), fork+exec can be much faster.

This module can help here by telling a small(er) helper process to fork,
or fork+exec instead.

=item Forking usually creates a copy-on-write copy of the parent
process. Memory that is already in use (for example, modules or data files
that have been loaded) will not take additional memory. When exec'ing a new
process, modules and data files might need to be loaded again, at extra cpu
and memory cost. Likewise when forking, all data structures are copied as
well - if the program frees them and replaces them with new data, the child
processes will retain the memory even if it isn't used.

This module allows the main program to do a controlled fork, and allows
modules to exec processes safely at any time. When creating a custom
process pool you can take advantage of data sharing via fork without
the risk of sharing large dynamic data structures that will blow up child
memory usage.

=item Exec'ing a new perl process might be difficult and slow. For
example, it is not easy to find the correct path to the perl interpreter,
and all modules have to be loaded from disk again. Long running processes
might run into problems when perl is upgraded, for example.

This module supports creating pre-initialised perl processes to be used
as templates, and also tries hard to identify the correct path to the perl
interpreter. With a cooperative main program, exec'ing the interpreter
might not even be necessary.

=item Forking might be impossible when a program is running. For example,
POSIX makes it almost impossible to fork from a multithreaded program and
do anything useful in the child - strictly speaking, if your perl program
uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
you cannot call fork on the perl level anymore, at all.

This module can safely fork helper processes at any time, by calling
fork+exec in C, in a POSIX-compatible way.

=item Parallel processing with fork might be inconvenient or difficult
to implement. For example, when a program uses an event loop and creates
watchers it becomes very hard to use the event loop from a child
program, as the watchers already exist but are only meaningful in the
parent. Worse, a module might want to use such a system, not knowing
whether another module or the main program also does, leading to problems.

This module only lets the main program create pools by forking (because
only the main program can know when it is still safe to do so) - all other
pools are created by fork+exec, after which such modules can again be
loaded.

=back

=head1 CONCEPTS

This module can create new processes either by executing a new perl
process, or by forking from an existing "template" process.

Each such process comes with its own file handle that can be used to
communicate with it (it's actually a socket - one end in the new process,
one end in the main process), and among the things you can do in it are
load modules, fork new processes, send file handles to it, and execute
functions.

There are multiple ways to create additional processes to execute some
jobs:

=over 4

=item fork a new process from the "default" template process, load code,
run it

This module has a "default" template process which it executes when it is
needed the first time. Forking from this process shares the memory used
for the perl interpreter with the new process, but loading modules takes
time, and the memory is not shared with anything else.

This is ideal for when you only need one extra process of a kind, with the
option of starting and stopping it on demand.

=item fork a new template process, load code, then fork processes off of
it and run the code

When you need to have a bunch of processes that all execute the same (or
very similar) tasks, then a good way is to create a new template process
for them, loading all the modules you need, and then create your worker
processes from this new template process.

This way, all code (and data structures) that can be shared (e.g. the
modules you loaded) is shared between the processes, and each new process
consumes relatively little memory of its own.

The disadvantage of this approach is that you need to create a template
process for the sole purpose of forking new processes from it, but if you
only need a fixed number of processes you can create them, and then destroy
the template process.

=item execute a new perl interpreter, load some code, run it

This is relatively slow, and doesn't allow you to share memory between
multiple processes.

The only advantage is that you don't have to have a template process
hanging around all the time to fork off some new processes, which might be
an advantage when there are long time spans where no extra processes are
needed.

=back

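As a rough illustration of the three approaches above, the following
sketch uses only methods documented in the FUNCTIONS section. The module
C<MyWorker> and its function C<MyWorker::run> are hypothetical names made
up for this example, and the calls to C<require> assume that the method
behaves as documented there:

   use AnyEvent;
   use AnyEvent::Fork;

   # invoked in the parent once a child is ready to enter MyWorker::run
   my $ready = sub {
      my ($fh) = @_;
      # $fh is the parent end of the communication socket
   };

   # 1. fork from the default template, load code, run it
   AnyEvent::Fork
      ->new
      ->require ("MyWorker")
      ->run ("MyWorker::run", $ready);

   # 2. create a dedicated template, then fork workers off of it
   my $template = AnyEvent::Fork->new->require ("MyWorker");
   $template->fork->run ("MyWorker::run", $ready)
      for 1 .. 4;

   # 3. execute a fresh perl interpreter, load code, run it
   AnyEvent::Fork
      ->new_exec
      ->require ("MyWorker")
      ->run ("MyWorker::run", $ready);

   # the callbacks only fire while the event loop is running
   AnyEvent->condvar->recv;
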
=head1 FUNCTIONS

=over 4

=cut

package AnyEvent::Fork;

use common::sense;

use Socket ();

use AnyEvent;
use AnyEvent::Fork::Util;
use AnyEvent::Util ();

our $PERL; # the path to the perl interpreter, deduced with various forms of magic

=item my $pool = new AnyEvent::Fork key => value...

Create a new process pool. The following named parameters are supported:

=over 4

=back

=cut

# the empty template process
our $TEMPLATE;

sub _cmd {
   my $self = shift;

   # ideally, we would want to use "a (w/a)*" as format string, but perl versions
   # from at least 5.8.9 to 5.16.3 are all buggy and can't unpack it.
   push @{ $self->[2] }, pack "N/a", pack "(w/a)*", @_;

   # start a write watcher (unless one is active) that drains the queue
   $self->[3] ||= AE::io $self->[1], 1, sub {
      if (ref $self->[2][0]) {
         # a reference in the queue is a file handle to pass
         AnyEvent::Fork::Util::fd_send fileno $self->[1], fileno ${ $self->[2][0] }
            and shift @{ $self->[2] };
      } else {
         my $len = syswrite $self->[1], $self->[2][0]
            or do { undef $self->[3]; die "AnyEvent::Fork: command write failure: $!" };
         substr $self->[2][0], 0, $len, "";
         shift @{ $self->[2] } unless length $self->[2][0];
      }

      unless (@{ $self->[2] }) {
         # queue is empty: stop watching and invoke the run callback, if any
         undef $self->[3];
         $self->[0]->($self->[1]) if $self->[0];
      }
   };
}

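=begin comment

The command framing used by C<_cmd> above can be tried out in isolation.
This is only a sketch of the two pack formats, not the reader used on the
server side; C<encode_cmd> and C<decode_cmd> are hypothetical helper names
that do not exist in this module. It decodes with "(w/a)*" only, the format
that C<_cmd> itself uses:

   use strict;
   use warnings;

   # frame = 32 bit big-endian length, then BER-length-prefixed strings
   sub encode_cmd {
      pack "N/a", pack "(w/a)*", @_
   }

   # reverse the framing: strip the length prefix, then split the strings
   sub decode_cmd {
      my ($frame) = @_;
      my ($payload) = unpack "N/a", $frame;
      unpack "(w/a)*", $payload
   }

   my $frame = encode_cmd (a => "hello", "world");
   my @cmd   = decode_cmd ($frame);

   print join (",", @cmd), "\n"; # prints "a,hello,world"

=end comment

=cut
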
sub _new {
   my ($self, $fh) = @_;

   $self = bless [
      undef, # run callback
      $fh,
      [],    # write queue - strings or fd's
      undef, # AE watcher
   ], $self;

#   my ($a, $b) = AnyEvent::Util::portable_socketpair;

#   queue_cmd $template, "Iabc";
#   push @{ $template->[2] }, \$b;

#   use Coro::AnyEvent; Coro::AnyEvent::sleep 1;
#   undef $b;
#   die "x" . <$a>;

   $self
}

=item my $proc = new AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

The new process is forked from a template process that is kept around
for this purpose. When it doesn't exist yet, it is created by a call to
C<new_exec> and kept around for future calls.

=cut

sub new {
   my $class = shift;

   $TEMPLATE ||= $class->new_exec;
   $TEMPLATE->fork
}

=item $new_proc = $proc->fork

Forks C<$proc>, creating a new process, and returns the process object
of the new process.

If any of the C<send_> functions have been called before fork, then the
file handles and arguments sent so far are cloned in the child. For
example, in a pre-forked server, you might C<send_fh> the listening socket
into the template process, and then keep calling C<fork> and C<run>.

=cut

sub fork {
   my ($self) = @_;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;

   $self->send_fh ($slave);
   $self->_cmd ("f");

   AnyEvent::Util::fh_nonblocking $fh, 1;

   AnyEvent::Fork->_new ($fh)
}

=item my $proc = new_exec AnyEvent::Fork

Creates a new "empty" perl interpreter process and returns its process
object for further manipulation.

Unlike the C<new> method, this method I<always> spawns a new perl process
(except in some cases, see L<AnyEvent::Fork::Early> for details). This
reduces the amount of memory sharing that is possible, and is also slower.

You should use C<new> whenever possible, except when having a template
process around is unacceptable.

The path to the perl interpreter is divined using various methods - first
C<$^X> is investigated to see if the path ends with something that sounds
as if it were the perl interpreter. Failing this, the module falls back to
using C<$Config::Config{perlpath}>.

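The lookup described above can be sketched as a standalone function. The
name C<guess_perl> is hypothetical and only used for this illustration,
and C<$^O eq "MSWin32"> is an assumed stand-in for the
C<AnyEvent::Fork::Util::WIN32> constant that the module itself uses:

   use strict;
   use warnings;
   use Config ();

   sub guess_perl {
      my ($candidate) = @_;

      # accept the candidate if it is absolute and its name looks like perl
      return $candidate
         if ($^O eq "MSWin32" || $candidate =~ m%^/%)
            && $candidate =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i;

      # otherwise fall back to the path recorded when perl was built,
      # making sure it carries the platform's executable suffix
      my $perl = $Config::Config{perlpath};
      $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
      $perl
   }

   my $path = guess_perl ($^X);
   print "$path\n";

A path such as F</usr/bin/perl5.16.3> is accepted as is, while a relative
path or a name that does not end in something perl-like results in the
C<Config> fallback.
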
=cut

sub new_exec {
   my ($self) = @_;

   # first find path of perl
   my $perl = $^X;

   # first we try $^X, but the path must be absolute (always on win32), and end in
   # something that looks like perl. this obviously only works for posix and win32
   unless (
      (AnyEvent::Fork::Util::WIN32 || $perl =~ m%^/%)
      && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
   ) {
      # if it doesn't look perlish enough, try Config
      require Config;
      $perl = $Config::Config{perlpath};
      $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
   }

   require Proc::FastSpawn;

   my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
   AnyEvent::Util::fh_nonblocking $fh, 1;
   Proc::FastSpawn::fd_inherit (fileno $slave);

   # quick. also doesn't work in win32. of course. what did you expect
   #local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
   my %env = %ENV;
   $env{PERL5LIB} = join ":", grep !ref, @INC;

   Proc::FastSpawn::spawn (
      $perl,
      ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave],
      [map "$_=$env{$_}", keys %env],
   ) or die "unable to spawn AnyEvent::Fork server: $!";

   $self->_new ($fh)
}

=item $proc = $proc->require ($module, ...)

Tries to load the given modules into the process.

Returns the process object for easy chaining of method calls.

=item $proc = $proc->send_fh ($handle, ...)

Send one or more file handles (I<not> file descriptors) to the process,
to prepare a call to C<run>.

The process object keeps a reference to the handles until this is done,
so you must not explicitly close the handles. This is most easily
accomplished by simply not storing the file handles anywhere after passing
them to this method.

Returns the process object for easy chaining of method calls.

=cut

sub send_fh {
   my ($self, @fh) = @_;

   for my $fh (@fh) {
      $self->_cmd ("h");
      push @{ $self->[2] }, \$fh;
   }

   $self
}

=item $proc = $proc->send_arg ($string, ...)

Send one or more argument strings to the process, to prepare a call to
C<run>. The strings can be any octet string.

Returns the process object for easy chaining of method calls.

=cut

sub send_arg {
   my ($self, @arg) = @_;

   $self->_cmd (a => @arg);

   $self
}

=item $proc->run ($func, $cb->($fh))

Enter the function specified by the fully qualified name in C<$func> in
the process. The function is called with the communication socket as first
argument, followed by all file handles and string arguments sent earlier
via C<send_fh> and C<send_arg> methods, in the order they were called.

If the called function returns, the process exits.

Preparing the process can take time - when the process is ready, the
callback is invoked with the local communications socket as argument.

The process object becomes unusable on return from this function.

If the communication socket isn't used, it should be closed on both sides,
to save on kernel memory.

The socket is non-blocking in the parent, and blocking in the newly
created process. The close-on-exec flag is set on both. Even if not used
otherwise, the socket can be a good indicator for the existence of the
process - if the other process exits, you get a readable event on it,
because exiting the process closes the socket (if it didn't create any
children using fork).

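The end-of-file behaviour that makes the socket usable as a liveness
indicator can be observed without this module. The following sketch
deliberately uses plain C<socketpair> and C<fork> from core perl instead of
AnyEvent::Fork, purely to show the mechanism. An event-driven program
would wait for readability with a watcher such as C<AE::io> rather than
block in C<sysread>:

   use strict;
   use warnings;
   use Socket;

   socketpair my $parent, my $child, AF_UNIX, SOCK_STREAM, PF_UNSPEC
      or die "socketpair: $!";

   my $pid = fork;
   defined $pid or die "fork: $!";

   unless ($pid) {
      # child: keep only its own end of the socket, then exit right away
      close $parent;
      exit 0;
   }

   # parent: drop the child's end, otherwise end of file would never arrive
   close $child;

   # blocks until the child is gone; a read of 0 bytes signals end of file
   my $len = sysread $parent, my $buf, 1;
   print "child is gone\n" if defined $len && $len == 0;

   waitpid $pid, 0;
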
=cut

sub run {
   my ($self, $func, $cb) = @_;

   $self->[0] = $cb;
   $self->_cmd ("r", $func);
}

=back

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1
