/cvs/AnyEvent-Fork/README
Revision: 1.3
Committed: Fri Apr 5 19:10:10 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_2
Changes since 1.2: +57 -3 lines
Log Message:
0.2

NAME
    AnyEvent::Fork - everything you wanted to use fork() for, but couldn't

SYNOPSIS
       use AnyEvent::Fork;

       ##################################################################
       # create a single new process, tell it to run your worker function

       AnyEvent::Fork
          ->new
          ->require ("MyModule")
          ->run ("MyModule::worker", sub {
             my ($master_filehandle) = @_;

             # now $master_filehandle is connected to the
             # $slave_filehandle in the new process.
          });

       # MyModule::worker might look like this
       sub MyModule::worker {
          my ($slave_filehandle) = @_;

          # now $slave_filehandle is connected to the $master_filehandle
          # in the original process. have fun!
       }

       ##################################################################
       # create a pool of server processes all accepting on the same socket

       # create listener socket
       my $listener = ...;

       # create a pool template, initialise it and give it the socket
       my $pool = AnyEvent::Fork
                     ->new
                     ->require ("Some::Stuff", "My::Server")
                     ->send_fh ($listener);

       # now create 10 identical workers
       for my $id (1..10) {
          $pool
             ->fork
             ->send_arg ($id)
             ->run ("My::Server::run");
       }

       # now do other things - maybe use the filehandle provided by run
       # to wait for the processes to die. or whatever.

       # My::Server::run might look like this
       sub My::Server::run {
          my ($slave, $listener, $id) = @_;

          close $slave; # we do not use the socket, so close it to save resources

          # we could go ballistic and use e.g. AnyEvent here, or IO::AIO,
          # or anything we usually couldn't do in a process forked normally.
          while (my $socket = $listener->accept) {
             # do sth. with new socket
          }
       }

DESCRIPTION
    This module allows you to create new processes, without actually forking
    them from your current process (avoiding the problems of forking), but
    preserving most of the advantages of fork.

    It can be used to create new worker processes or new independent
    subprocesses for short- and long-running jobs, process pools (e.g. for
    use in pre-forked servers) but also to spawn new external processes
    (such as CGI scripts from a webserver), which can be faster (and better
    behaved) than using fork+exec in big processes.

    Special care has been taken to make this module useful from other
    modules, while still supporting specialised environments such as
    App::Staticperl or PAR::Packer.

PROBLEM STATEMENT
    There are two ways to implement parallel processing on UNIX-like
    operating systems - fork and process, and fork+exec and process. They
    have different advantages and disadvantages that I describe below,
    together with how this module tries to mitigate the disadvantages.

    Forking from a big process can be very slow (a 5GB process needs 0.05s
    to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead is
    often shared with exec (because you have to fork first), but in some
    circumstances (e.g. when vfork is used), fork+exec can be much faster.
        This module can help here by telling a small(er) helper process to
        fork, or fork+exec instead.

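    A rough way to see this cost on your own machine is to time fork+wait
    in a process that has touched a lot of memory. This sketch uses only
    core modules; fork_cost is a hypothetical helper, and a single large
    string is only an approximation of a real big process with many small
    allocations.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use POSIX ();

# hypothetical helper: time one fork+waitpid after touching $megabytes of memory
sub fork_cost {
   my ($megabytes) = @_;
   my $ballast = "x" x ($megabytes * 1024 * 1024);   # memory the child inherits

   my $start = time;
   my $pid = fork // die "fork: $!";
   POSIX::_exit (0) if $pid == 0;   # child: leave at once, skipping END blocks
   waitpid $pid, 0;
   return time () - $start;
}

for my $mb (1, 16, 64) {
   printf "%3d MB: %.4f s\n", $mb, fork_cost ($mb);
}
```

    On typical systems the time grows with the memory the parent has
    touched, because the page tables must be copied even when the pages
    themselves are shared copy-on-write.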
    Forking usually creates a copy-on-write copy of the parent process.
    Memory that is already in use (for example, modules or data files that
    have been loaded) will not take additional memory. When exec'ing a new
    process, modules and data files might need to be loaded again, at extra
    cpu and memory cost. Likewise, when forking, all data structures are
    copied as well - if the program frees them and replaces them by new
    data, the child processes will retain the memory even if it isn't used.
        This module allows the main program to do a controlled fork, and
        allows modules to exec processes safely at any time. When creating a
        custom process pool you can take advantage of data sharing via fork
        without risking sharing large dynamic data structures that will
        blow up child memory usage.

    Exec'ing a new perl process might be difficult and slow. For example, it
    is not easy to find the correct path to the perl interpreter, and all
    modules have to be loaded from disk again. Long running processes might
    run into problems when perl is upgraded, for example.
        This module supports creating pre-initialised perl processes to be
        used as a template, and also tries hard to identify the correct path
        to the perl interpreter. With a cooperative main program, exec'ing
        the interpreter might not even be necessary.

    Forking might be impossible when a program is running. For example,
    POSIX makes it almost impossible to fork from a multithreaded program
    and do anything useful in the child - strictly speaking, if your perl
    program uses POSIX threads (even indirectly via e.g. IO::AIO or
    threads), you cannot call fork on the perl level anymore, at all.
        This module can safely fork helper processes at any time, by calling
        fork+exec in C, in a POSIX-compatible way.

    Parallel processing with fork might be inconvenient or difficult to
    implement. For example, when a program uses an event loop and creates
    watchers, it becomes very hard to use the event loop from a child
    program, as the watchers already exist but are only meaningful in the
    parent. Worse, a module might want to use such a system, not knowing
    whether another module or the main program also does, leading to
    problems.
        This module only lets the main program create pools by forking
        (because only the main program can know when it is still safe to do
        so) - all other pools are created by fork+exec, after which such
        modules can again be loaded.

CONCEPTS
    This module can create new processes either by executing a new perl
    process, or by forking from an existing "template" process.

    Each such process comes with its own file handle that can be used to
    communicate with it (it's actually a socket - one end in the new
    process, one end in the main process), and among the things you can do
    in it are load modules, fork new processes, send file handles to it, and
    execute functions.

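    To get a feel for what "one end in each process" means, here is a
    sketch using only core Perl (Socket, IO::Handle): a socketpair shared
    across a plain fork, with a made-up one-line echo exchange. It
    illustrates the idea only; it is not how AnyEvent::Fork starts its
    processes.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;

# one end stays in the parent ("master"), the other goes to the child ("slave")
socketpair (my $master, my $slave, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
   or die "socketpair: $!";

my $pid = fork // die "fork: $!";

if ($pid == 0) {
   # child: use only the slave end
   close $master;
   $slave->autoflush (1);
   my $line = <$slave>;          # wait for one line from the parent
   print $slave "echo: $line";   # and send it back (made-up protocol)
   exit 0;
}

# parent: use only the master end
close $slave;
$master->autoflush (1);
print $master "hello\n";
my $reply = <$master>;
waitpid $pid, 0;
print $reply;                    # prints "echo: hello"
```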
    There are multiple ways to create additional processes to execute some
    jobs:

    fork a new process from the "default" template process, load code, run
    it
        This module has a "default" template process which it executes when
        it is needed the first time. Forking from this process shares the
        memory used for the perl interpreter with the new process, but
        loading modules takes time, and the memory is not shared with
        anything else.

        This is ideal for when you only need one extra process of a kind,
        with the option of starting and stopping it on demand.

        Example:

           AnyEvent::Fork
              ->new
              ->require ("Some::Module")
              ->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });

    fork a new template process, load code, then fork processes off of it
    and run the code
        When you need to have a bunch of processes that all execute the same
        (or very similar) tasks, then a good way is to create a new template
        process for them, loading all the modules you need, and then create
        your worker processes from this new template process.

        This way, all code (and data structures) that can be shared (e.g.
        the modules you loaded) is shared between the processes, and each
        new process consumes relatively little memory of its own.

        The disadvantage of this approach is that you need to create a
        template process for the sole purpose of forking new processes from
        it, but if you only need a fixed number of processes you can create
        them, and then destroy the template process.

        Example:

           my $template = AnyEvent::Fork->new->require ("Some::Module");

           for (1..10) {
              $template->fork->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });
           }

           # at this point, you can keep $template around to fork new processes
           # later, or you can destroy it, which causes it to vanish.

    execute a new perl interpreter, load some code, run it
        This is relatively slow, and doesn't allow you to share memory
        between multiple processes.

        The only advantage is that you don't have to have a template process
        hanging around all the time to fork off some new processes, which
        might be an advantage when there are long time spans where no extra
        processes are needed.

        Example:

           AnyEvent::Fork
              ->new_exec
              ->require ("Some::Module")
              ->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });

FUNCTIONS
    my $pool = new AnyEvent::Fork key => value...
        Create a new process pool. The following named parameters are
        supported:

    my $proc = new AnyEvent::Fork
        Creates a new "empty" perl interpreter process and returns its
        process object for further manipulation.

        The new process is forked from a template process that is kept
        around for this purpose. When it doesn't exist yet, it is created by
        a call to "new_exec" and kept around for future calls.

        When the process object is destroyed, it will release the file
        handle that connects it with the new process. When the new process
        has not yet called "run", then the process will exit. Otherwise,
        what happens depends entirely on the code that is executed.

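        The "exit if run was never called" behaviour can be modelled in core
        Perl: the child reads from its end of a socketpair until EOF, which
        arrives as soon as the parent drops its last reference to the other
        end. This is a hypothetical sketch (the empty command loop and exit
        status 7 are invented), not the module's implementation.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair (my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
   or die "socketpair: $!";

my $pid = fork // die "fork: $!";

if ($pid == 0) {
   # child: wait for commands until the parent goes away
   close $parent_end;
   while (defined (my $cmd = <$child_end>)) {
      # a real child would act on $cmd here (this sketch ignores it)
   }
   exit 7;   # EOF: no "run" will ever arrive, so give up (7 is arbitrary)
}

# parent: close our copy of the child's end, then drop the only reference
# to our own end, which closes it
close $child_end;
undef $parent_end;

waitpid $pid, 0;
my $status = $? >> 8;
print "child exited with status $status\n";   # prints "child exited with status 7"
```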
    $new_proc = $proc->fork
        Forks $proc, creating a new process, and returns the process object
        of the new process.

        If any of the "send_" functions have been called before fork, then
        they will be cloned in the child. For example, in a pre-forked
        server, you might "send_fh" the listening socket into the template
        process, and then keep calling "fork" and "run".

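        The cloning follows from fork copying the template's memory. In this
        sketch, @pending is a hypothetical stand-in for whatever the
        template really stores for queued "send_" calls; the child adds one
        more argument and reports what it sees through a pipe.

```perl
use strict;
use warnings;

# hypothetical model of arguments queued in a template before fork
my @pending = ("str1", "str2");

pipe (my $reader, my $writer) or die "pipe: $!";

my $pid = fork // die "fork: $!";

if ($pid == 0) {
   close $reader;
   push @pending, "str3";          # queued after fork, only in this child
   print $writer join (",", @pending), "\n";
   close $writer;                  # flushes the line to the parent
   exit 0;
}

close $writer;
my $seen = <$reader>;
waitpid $pid, 0;
print "child saw: $seen";                 # prints "child saw: str1,str2,str3"
print "template still has: @pending\n";  # prints "template still has: str1 str2"
```

        The template keeps only its own two entries, so it can be forked
        again without inheriting the child's additions.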
    my $proc = new_exec AnyEvent::Fork
        Creates a new "empty" perl interpreter process and returns its
        process object for further manipulation.

        Unlike the "new" method, this method *always* spawns a new perl
        process (except in some cases, see AnyEvent::Fork::Early for
        details). This reduces the amount of memory sharing that is
        possible, and is also slower.

        You should use "new" whenever possible, except when having a
        template process around is unacceptable.

        The path to the perl interpreter is divined using various methods -
        first $^X is investigated to see if the path ends with something
        that sounds as if it were the perl interpreter. Failing this, the
        module falls back to using $Config::Config{perlpath}.

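        A minimal sketch of that lookup in plain Perl. It assumes that
        "sounds like the perl interpreter" means a file name such as perl,
        perl5.36.0 or perl.exe; find_perl and its regular expression are
        hypothetical, not the module's own code.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# hypothetical helper following the strategy described above
sub find_perl {
   my (undef, undef, $file) = File::Spec->splitpath ($^X);

   # does the file name "sound like" perl? (this pattern is an assumption)
   return $^X
      if $file =~ /^perl[\d.]*(?:\.exe)?\z/i && -x $^X;

   # otherwise use the path perl was configured with
   return $Config{perlpath};
}

my $perl = find_perl ();
print "$perl\n";
```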
    $proc = $proc->eval ($perlcode, @args)
        Evaluates the given $perlcode as ... perl code, while setting @_ to
        the strings specified by @args.

        This call is meant to do any custom initialisation that might be
        required (for example, the "require" method uses it). It's not
        supposed to be used to completely take over the process; use "run"
        for that.

        The code will usually be executed after this call returns, and there
        is no way to pass anything back to the calling process. Any
        evaluation errors will be reported to stderr and cause the process
        to exit.

        Returns the process object for easy chaining of method calls.

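        Passing @_ to evaluated code needs no special machinery, because a
        string eval inside a subroutine sees that subroutine's @_. The
        hypothetical eval_with_args below sketches this, together with the
        "report to stderr and exit" error handling; the exit code 255 is an
        arbitrary choice.

```perl
use strict;
use warnings;

# hypothetical helper: evaluate $code with the remaining arguments in @_
sub eval_with_args {
   my $code = shift;   # everything left in @_ is visible to the eval'd code
   eval "$code; 1"
      or do { warn $@; exit 255 };   # report to stderr and exit
}

our $greeting;
eval_with_args ('$main::greeting = "hello, $_[0] and $_[1]"', "alice", "bob");
print "$greeting\n";   # prints "hello, alice and bob"
```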
    $proc = $proc->require ($module, ...)
        Tries to load the given module(s) into the process.

        Returns the process object for easy chaining of method calls.

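        Module names that arrive as strings can be loaded without a string
        eval by mapping them to file names first. require_modules below is
        a hypothetical helper showing that approach; per the "eval" entry,
        the real "require" method is built on "eval" instead.

```perl
use strict;
use warnings;

# hypothetical helper: load modules given as strings, without a string eval
sub require_modules {
   for my $module (@_) {
      die "invalid module name: $module\n"
         unless $module =~ /^\w+(?:::\w+)*\z/;

      (my $file = "$module.pm") =~ s{::}{/}g;   # Some::Module => Some/Module.pm
      require $file;
   }
}

require_modules ("List::Util", "Scalar::Util");

my $sum = List::Util::sum (1, 2, 3);
print "$sum\n";   # prints "6"
```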
    $proc = $proc->send_fh ($handle, ...)
        Sends one or more file handles (*not* file descriptors) to the
        process, to prepare a call to "run".

        The process object keeps a reference to the handles until this is
        done, so you must not explicitly close the handles. This is most
        easily accomplished by simply not storing the file handles anywhere
        after passing them to this method.

        Returns the process object for easy chaining of method calls.

        Example: pass an fh to a process, and release it without closing. It
        will be closed automatically when it is no longer used.

           $proc->send_fh ($my_fh);
           undef $my_fh; # free the reference if you want, but DO NOT CLOSE IT

    $proc = $proc->send_arg ($string, ...)
        Sends one or more argument strings to the process, to prepare a call
        to "run". The strings can be any octet string.

        Returns the process object for easy chaining of method calls.

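        Because the strings may contain any octets, including newlines and
        NUL bytes, they need explicit framing on a stream socket. One common
        choice is a 32-bit length prefix per string, sketched here with core
        pack/unpack. The format AnyEvent::Fork actually uses is not
        described in this document, so encode_args and decode_args are
        hypothetical.

```perl
use strict;
use warnings;

# hypothetical framing: 4-byte big-endian length, then the octets
sub encode_args {
   return join "", map { pack "N/a*", $_ } @_;
}

sub decode_args {
   my ($buf) = @_;
   my @args;
   while (length $buf) {
      die "truncated frame\n" if length $buf < 4;
      my $len = unpack "N", $buf;
      die "truncated frame\n" if length $buf < 4 + $len;
      push @args, substr $buf, 4, $len;
      substr ($buf, 0, 4 + $len) = "";   # drop the frame just decoded
   }
   return @args;
}

my @sent     = ("plain", "with\nnewline", "nul\0byte", "\xff\xfe");
my @received = decode_args (encode_args (@sent));
print scalar (@received), " strings round-tripped\n";   # prints "4 strings round-tripped"
```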
    $proc->run ($func, $cb->($fh))
        Enters the function specified by the fully qualified name in $func
        in the process. The function is called with the communication
        socket as first argument, followed by all file handles and string
        arguments sent earlier via "send_fh" and "send_arg" methods, in the
        order they were called.

        If the called function returns, the process exits.

        Preparing the process can take time - when the process is ready, the
        callback is invoked with the local communications socket as
        argument.

        The process object becomes unusable on return from this function.

        If the communication socket isn't used, it should be closed on both
        sides, to save on kernel memory.

        The socket is non-blocking in the parent, and blocking in the newly
        created process. The close-on-exec flag is set on both. Even if not
        used otherwise, the socket can be a good indicator for the existence
        of the process - if the other process exits, you get a readable
        event on it, because exiting the process closes the socket (if it
        didn't create any children using fork).

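        The socket properties above can be reproduced with core modules,
        which also shows the "readable event on exit" trick. In this
        approximation (not the module's code), IO::Select stands in for an
        event loop watcher: once the child exits, the parent end becomes
        readable and sysread returns 0, meaning EOF.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC F_GETFL F_SETFL O_NONBLOCK);
use IO::Select;

socketpair (my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
   or die "socketpair: $!";

# close-on-exec on both ends (perl usually sets this already for fds above $^F)
for my $fh ($parent_fh, $child_fh) {
   my $fd_flags = fcntl ($fh, F_GETFD, 0) or die "F_GETFD: $!";
   fcntl ($fh, F_SETFD, $fd_flags | FD_CLOEXEC) or die "F_SETFD: $!";
}

# non-blocking only on the parent's end
my $fl_flags = fcntl ($parent_fh, F_GETFL, 0) or die "F_GETFL: $!";
fcntl ($parent_fh, F_SETFL, $fl_flags | O_NONBLOCK) or die "F_SETFL: $!";

my $pid = fork // die "fork: $!";

if ($pid == 0) {
   close $parent_fh;
   sleep 1;   # pretend to work ...
   exit 0;    # ... then exit, which closes $child_fh
}

# the parent must close its copy of the child's end, or it would never see EOF
close $child_fh;

# IO::Select stands in for an event loop watcher here
my @ready = IO::Select->new ($parent_fh)->can_read (10);
my $got   = sysread $parent_fh, my $buf, 1;
waitpid $pid, 0;

my $saw_eof = @ready && defined $got && $got == 0;
print $saw_eof ? "child exited: EOF on the socket\n" : "no EOF seen\n";
```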
        Example: create a template for a process pool, pass a few strings,
        some file handles, then fork, pass one more string, and run some
        code.

           my $pool = AnyEvent::Fork
                         ->new
                         ->send_arg ("str1", "str2")
                         ->send_fh ($fh1, $fh2);

           for my $i (1..2) {
              $pool
                 ->fork
                 ->send_arg ("str3")
                 ->run ("Some::function", sub {
                    my ($fh) = @_;

                    # fh is nonblocking, but we trust that the OS can accept these
                    # few extra octets anyway. use $i, not $_: the callback may
                    # only run after the loop has finished.
                    syswrite $fh, "hi #$i\n";

                    # $fh is being closed here, as we don't store it anywhere
                 });
           }

           # Some::function might look like this - all parameters passed before fork
           # and after will be passed, in order, after the communications socket.
           sub Some::function {
              my ($fh, $str1, $str2, $fh1, $fh2, $str3) = @_;

              print scalar <$fh>; # prints "hi #1\n" in one process, "hi #2\n" in the other
           }

TYPICAL PROBLEMS
    This section lists typical problems that remain. I hope by recognising
    them, most can be avoided.

    "leaked" file descriptors for exec'ed processes
        POSIX systems inherit file descriptors by default when exec'ing a
        new process. While perl itself laudably sets the close-on-exec flags
        on new file handles, most C libraries don't care, and even if all
        cared, it's often not possible to set the flag in a race-free
        manner.

        That means some file descriptors can leak through. And since it
        isn't possible to know which file descriptors are "good" and
        "necessary" (or even to know which file descriptors are open),
        there is no good way to close the ones that might harm.

        As an example of what "harm" can be done, consider a web server that
        accepts connections and afterwards some module uses AnyEvent::Fork
        for the first time, causing it to fork and exec a new process, which
        might inherit the network socket. When the server closes the socket,
        it is still open in the child (which doesn't even know that) and the
        client might conclude that the connection is still fine.

        For the main program, there are multiple remedies available -
        AnyEvent::Fork::Early is one, creating a process early and not using
        "new_exec" is another, as in both cases, the first process can be
        exec'ed well before many random file descriptors are open.

        In general, the solution for these kinds of problems is to fix the
        libraries or the code that leaks those file descriptors.

        Fortunately, most of these leaked descriptors do no harm, other than
        sitting on some resources.

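        Where you do own a handle, marking it close-on-exec takes one fcntl
        call. In this sketch, set_cloexec and has_cloexec are hypothetical
        helpers; the pipe is created with $^F temporarily raised so that
        perl leaves the flag unset, standing in for a descriptor opened by a
        C library.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# hypothetical helpers around fcntl's descriptor flags
sub set_cloexec {
   my ($fh) = @_;
   my $flags = fcntl ($fh, F_GETFD, 0) or die "F_GETFD: $!";
   fcntl ($fh, F_SETFD, $flags | FD_CLOEXEC) or die "F_SETFD: $!";
}

sub has_cloexec {
   my ($fh) = @_;
   my $flags = fcntl ($fh, F_GETFD, 0) or die "F_GETFD: $!";
   my $set = $flags & FD_CLOEXEC;
   return $set ? 1 : 0;
}

# perl decides close-on-exec by comparing the fd with $^F when the handle is
# created; raising $^F temporarily yields descriptors without the flag
my ($r, $w);
{
   local $^F = 10_000;
   pipe ($r, $w) or die "pipe: $!";
}

printf "before: %d\n", has_cloexec ($w);
set_cloexec ($w);
printf "after:  %d\n", has_cloexec ($w);   # prints "after:  1"
```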
    "leaked" file descriptors for fork'ed processes
        Normally, AnyEvent::Fork starts new processes by exec'ing them,
        which closes file descriptors not marked for being inherited.

        However, AnyEvent::Fork::Early and AnyEvent::Fork::Template offer a
        way to create these processes by forking, and this leaks more file
        descriptors than exec'ing them, as there is no way to mark
        descriptors as "close on fork".

        Examples would be modules like EV, IO::AIO or Gtk2. EV and IO::AIO
        both create pipes for internal uses, and Gtk2 might open a
        connection to the X server. EV and IO::AIO can deal with fork, but
        Gtk2 might have trouble with a fork.

        The solution is to either not load these modules before use'ing
        AnyEvent::Fork::Early or AnyEvent::Fork::Template, or to delay
        initialising them, for example, by calling "init Gtk2" manually.

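        The effect is easy to reproduce with a pipe: a forked child holds
        copies of the parent's descriptors, so closing the write end in the
        parent is not enough for the reader to see EOF. This is the same
        thing that keeps the client connection alive in the web server
        example above. This core-Perl sketch only illustrates the problem;
        the child stands in for any forked process that never knew about
        the pipe.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

pipe (my $reader, my $writer) or die "pipe: $!";

my $pid = fork // die "fork: $!";

if ($pid == 0) {
   # a child that inherited $writer without ever using it
   sleep 30;
   POSIX::_exit (0);
}

close $writer;   # the parent is done writing ...

my $sel = IO::Select->new ($reader);

# ... but the reader sees no EOF while the child still holds a copy
my @early     = $sel->can_read (0.5);
my $eof_early = @early ? 1 : 0;

kill KILL => $pid;   # once the last copy of the write end is gone ...
waitpid $pid, 0;

my @late     = $sel->can_read (5);   # ... EOF finally arrives
my $eof_late = @late ? 1 : 0;

# prints "EOF while child alive: 0, after child exit: 1"
printf "EOF while child alive: %d, after child exit: %d\n", $eof_early, $eof_late;
```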
PORTABILITY NOTES
    Native win32 perls are somewhat supported (AnyEvent::Fork::Early is a
    nop, and ::Template is not going to work), and it cost a lot of blood
    and sweat to make it so, mostly due to the bloody broken perl that
    nobody seems to care about. The fork emulation is a bad joke - I have
    yet to see something useful that you can do with it without running into
    memory corruption issues or other braindamage. Hrrrr.

    Cygwin perl is not supported at the moment, as it should implement fd
    passing, but doesn't, and rolling my own is hard, as cygwin doesn't
    support enough functionality to do it.

SEE ALSO
    AnyEvent::Fork::Early (to avoid executing a perl interpreter),
    AnyEvent::Fork::Template (to create a process by forking the main
    program at a convenient time).

AUTHOR
     Marc Lehmann <schmorp@schmorp.de>
     http://home.schmorp.de/
