/cvs/AnyEvent-Fork/README
Revision: 1.2
Committed: Thu Apr 4 07:27:09 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_01
Changes since 1.1: +377 -0 lines
Log Message:
0.01

File Contents

# User Rev Content
1 root 1.2 NAME
2     AnyEvent::Fork - everything you wanted to use fork() for, but couldn't
3    
4     ATTENTION, this is a very early release, and very untested. Consider it
5     a technology preview.
6    
7     SYNOPSIS
8     use AnyEvent::Fork;
9    
10     ##################################################################
11     # create a single new process, tell it to run your worker function
12    
13     AnyEvent::Fork
14     ->new
15     ->require ("MyModule")
16     ->run ("MyModule::worker", sub {
17     my ($master_filehandle) = @_;
18    
19     # now $master_filehandle is connected to the
20     # $slave_filehandle in the new process.
21     });
22    
23     # MyModule::worker might look like this
24     sub MyModule::worker {
25     my ($slave_filehandle) = @_;
26    
27     # now $slave_filehandle is connected to the $master_filehandle
28     # in the original process. Have fun!
29     }
30    
31     ##################################################################
32     # create a pool of server processes all accepting on the same socket
33    
34     # create listener socket
35     my $listener = ...;
36    
37     # create a pool template, initialise it and give it the socket
38     my $pool = AnyEvent::Fork
39     ->new
40     ->require ("Some::Stuff", "My::Server")
41     ->send_fh ($listener);
42    
43     # now create 10 identical workers
44     for my $id (1..10) {
45     $pool
46     ->fork
47     ->send_arg ($id)
48     ->run ("My::Server::run");
49     }
50    
51     # now do other things - maybe use the filehandle provided by run
52     # to wait for the processes to die. or whatever.
53    
54     # My::Server::run might look like this
55     sub My::Server::run {
56     my ($slave, $listener, $id) = @_;
57    
58     close $slave; # we do not use the socket, so close it to save resources
59    
60     # we could go ballistic and use e.g. AnyEvent here, or IO::AIO,
61     # or anything we usually couldn't do in a process forked normally.
62     while (my $socket = $listener->accept) {
63     # do sth. with new socket
64     }
65     }
66    
67     DESCRIPTION
68     This module allows you to create new processes, without actually forking
69     them from your current process (avoiding the problems of forking), but
70     preserving most of the advantages of fork.
71    
72     It can be used to create new worker processes or new independent
73     subprocesses for short- and long-running jobs, process pools (e.g. for
74     use in pre-forked servers) but also to spawn new external processes
75     (such as CGI scripts from a webserver), which can be faster (and more
76     well behaved) than using fork+exec in big processes.
77    
78     Special care has been taken to make this module useful from other
79     modules, while still supporting specialised environments such as
80     App::Staticperl or PAR::Packer.
81    
82     PROBLEM STATEMENT
83     There are two ways to implement parallel processing on UNIX like
84     operating systems - fork and process, and fork+exec and process. They
85     have different advantages and disadvantages that I describe below,
86     together with how this module tries to mitigate the disadvantages.
87    
88     Forking from a big process can be very slow (a 5GB process needs 0.05s
89     to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead is
90     often shared with exec (because you have to fork first), but in some
91     circumstances (e.g. when vfork is used), fork+exec can be much faster.
92     This module can help here by telling a small(er) helper process to
93     fork, or fork+exec instead.
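
To get a feel for the fork overhead described above on a particular machine, a rough measurement can be sketched with core Perl only. The helper name time_fork and the 16MB ballast string are illustrative assumptions and not part of this module; the numbers vary widely with process size, kernel and hardware, so no expected output is given.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Hypothetical helper: returns the wall-clock seconds the parent spent
# inside a single fork call.
sub time_fork {
    my $t0  = time;
    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # child: leave immediately, without running END blocks or
        # flushing stdio buffers inherited from the parent
        POSIX::_exit (0);
    }

    my $elapsed = time - $t0;
    waitpid $pid, 0; # reap the child so it does not linger as a zombie
    return $elapsed;
}

# make the process a bit bigger before measuring (size is arbitrary)
my $ballast = "x" x (16 * 1024 * 1024);

printf "fork took %.6fs\n", time_fork ();
```

Growing the ballast and re-running gives a rough idea of how the cost scales with process size on the system at hand.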
94    
95     Forking usually creates a copy-on-write copy of the parent process.
96     Data that is already loaded (for example, modules or data files) will
97     not take additional memory. When exec'ing a new process, modules and data
98     files might need to be loaded again, at extra cpu and memory cost. Likewise
99     when forking, all data structures are copied as well - if the program
100     frees them and replaces them by new data, the child processes will
101     retain the memory even if it isn't used.
102     This module allows the main program to do a controlled fork, and
103     allows modules to exec processes safely at any time. When creating a
104     custom process pool you can take advantage of data sharing via fork
105     without the risk of sharing large dynamic data structures that will
106     blow up child memory usage.
107    
108     Exec'ing a new perl process might be difficult and slow. For example, it
109     is not easy to find the correct path to the perl interpreter, and all
110     modules have to be loaded from disk again. Long running processes might
111     run into problems when perl is upgraded for example.
112     This module supports creating pre-initialised perl processes to be
113     used as template, and also tries hard to identify the correct path
114     to the perl interpreter. With a cooperative main program, exec'ing
115     the interpreter might not even be necessary.
116    
117     Forking might be impossible when a program is running. For example,
118     POSIX makes it almost impossible to fork from a multithreaded program
119     and do anything useful in the child - strictly speaking, if your perl
120     program uses posix threads (even indirectly via e.g. IO::AIO or
121     threads), you cannot call fork on the perl level anymore, at all.
122     This module can safely fork helper processes at any time, by calling
123     fork+exec in C, in a POSIX-compatible way.
124    
125     Parallel processing with fork might be inconvenient or difficult to
126     implement. For example, when a program uses an event loop and creates
127     watchers it becomes very hard to use the event loop from a child
128     program, as the watchers already exist but are only meaningful in the
129     parent. Worse, a module might want to use such a system, not knowing
130     whether another module or the main program also does, leading to
131     problems.
132     This module only lets the main program create pools by forking
133     (because only the main program can know when it is still safe to do
134     so) - all other pools are created by fork+exec, after which such
135     modules can again be loaded.
136    
137     CONCEPTS
138     This module can create new processes either by executing a new perl
139     process, or by forking from an existing "template" process.
140    
141     Each such process comes with its own file handle that can be used to
142     communicate with it (it's actually a socket - one end in the new
143     process, one end in the main process), and among the things you can do
144     in it are load modules, fork new processes, send file handles to it, and
145     execute functions.
146    
147     There are multiple ways to create additional processes to execute some
148     jobs:
149    
150     fork a new process from the "default" template process, load code, run
151     it
152     This module has a "default" template process which it executes when
153     it is needed the first time. Forking from this process shares the
154     memory used for the perl interpreter with the new process, but
155     loading modules takes time, and the memory is not shared with
156     anything else.
157    
158     This is ideal for when you only need one extra process of a kind,
159     with the option of starting and stopping it on demand.
160    
161     Example:
162    
163     AnyEvent::Fork
164     ->new
165     ->require ("Some::Module")
166     ->run ("Some::Module::run", sub {
167     my ($fork_fh) = @_;
168     });
169    
170     fork a new template process, load code, then fork processes off of it
171     and run the code
172     When you need to have a bunch of processes that all execute the same
173     (or very similar) tasks, then a good way is to create a new template
174     process for them, loading all the modules you need, and then create
175     your worker processes from this new template process.
176    
177     This way, all code (and data structures) that can be shared (e.g.
178     the modules you loaded) is shared between the processes, and each
179     new process consumes relatively little memory of its own.
180    
181     The disadvantage of this approach is that you need to create a
182     template process for the sole purpose of forking new processes from
183     it, but if you only need a fixed number of processes you can create
184     them, and then destroy the template process.
185    
186     Example:
187    
188     my $template = AnyEvent::Fork->new->require ("Some::Module");
189    
190     for (1..10) {
191     $template->fork->run ("Some::Module::run", sub {
192     my ($fork_fh) = @_;
193     });
194     }
195    
196     # at this point, you can keep $template around to fork new processes
197     # later, or you can destroy it, which causes it to vanish.
198    
199     execute a new perl interpreter, load some code, run it
200     This is relatively slow, and doesn't allow you to share memory
201     between multiple processes.
202    
203     The only advantage is that you don't have to have a template process
204     hanging around all the time to fork off some new processes, which
205     might be an advantage when there are long time spans where no extra
206     processes are needed.
207    
208     Example:
209    
210     AnyEvent::Fork
211     ->new_exec
212     ->require ("Some::Module")
213     ->run ("Some::Module::run", sub {
214     my ($fork_fh) = @_;
215     });
216    
217     FUNCTIONS
218     my $pool = new AnyEvent::Fork key => value...
219     Create a new process pool. The following named parameters are
220     supported:
221    
222     my $proc = new AnyEvent::Fork
223     Creates a new "empty" perl interpreter process and returns its
224     process object for further manipulation.
225    
226     The new process is forked from a template process that is kept
227     around for this purpose. When it doesn't exist yet, it is created by
228     a call to "new_exec" and kept around for future calls.
229    
230     When the process object is destroyed, it will release the file
231     handle that connects it with the new process. When the new process
232     has not yet called "run", then the process will exit. Otherwise,
233     what happens depends entirely on the code that is executed.
234    
235     $new_proc = $proc->fork
236     Forks $proc, creating a new process, and returns the process object
237     of the new process.
238    
239     If any of the "send_" functions have been called before fork, then
240     they will be cloned in the child. For example, in a pre-forked
241     server, you might "send_fh" the listening socket into the template
242     process, and then keep calling "fork" and "run".
243    
244     my $proc = new_exec AnyEvent::Fork
245     Creates a new "empty" perl interpreter process and returns its
246     process object for further manipulation.
247    
248     Unlike the "new" method, this method *always* spawns a new perl
249     process (except in some cases, see AnyEvent::Fork::Early for
250     details). This reduces the amount of memory sharing that is
251     possible, and is also slower.
252    
253     You should use "new" whenever possible, except when having a
254     template process around is unacceptable.
255    
256     The path to the perl interpreter is divined using various methods -
257     first $^X is investigated to see if the path ends with something
258     that sounds as if it were the perl interpreter. Failing this, the
259     module falls back to using $Config::Config{perlpath}.
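
The lookup order described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name guess_perl_path and the exact pattern used to decide whether a path "sounds like" perl are made up here, and the module's real check may differ.

```perl
use strict;
use warnings;
use Config ();

# Hypothetical helper: given a candidate path (normally $^X), return it if
# its last component looks like a perl binary ("perl", "perl5.16.3",
# "perl.exe"), otherwise fall back to the configured perl path.
sub guess_perl_path {
    my ($candidate) = @_;

    return $candidate
        if defined $candidate
           && $candidate =~ m{(?:^|[/\\])perl[0-9.]*(?:\.exe)?$}i;

    return $Config::Config{perlpath};
}

print guess_perl_path ($^X), "\n";
```

The fallback matters in embedded or repackaged environments, where $^X may name the host application instead of a perl binary.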
260    
261     $proc = $proc->eval ($perlcode, @args)
262     Evaluates the given $perlcode as ... perl code, while setting @_ to
263     the strings specified by @args.
264    
265     This call is meant to do any custom initialisation that might be
266     required (for example, the "require" method uses it). It's not
267     supposed to be used to completely take over the process, use "run"
268     for that.
269    
270     The code will usually be executed after this call returns, and there
271     is no way to pass anything back to the calling process. Any
272     evaluation errors will be reported to stderr and cause the process
273     to exit.
274    
275     Returns the process object for easy chaining of method calls.
276    
277     $proc = $proc->require ($module, ...)
278     Tries to load the given module(s) into the process.
279    
280     Returns the process object for easy chaining of method calls.
281    
282     $proc = $proc->send_fh ($handle, ...)
283     Send one or more file handles (*not* file descriptors) to the
284     process, to prepare a call to "run".
285    
286     The process object keeps a reference to the handles until this is
287     done, so you must not explicitly close the handles. This is most
288     easily accomplished by simply not storing the file handles anywhere
289     after passing them to this method.
290    
291     Returns the process object for easy chaining of method calls.
292    
293     Example: pass an fh to a process, and release it without closing. It
294     will be closed automatically when it is no longer used.
295    
296     $proc->send_fh ($my_fh);
297     undef $my_fh; # free the reference if you want, but DO NOT CLOSE IT
298    
299     $proc = $proc->send_arg ($string, ...)
300     Send one or more argument strings to the process, to prepare a call
301     to "run". The strings can be any octet string.
302    
303     Returns the process object for easy chaining of method calls.
304    
305     $proc->run ($func, $cb->($fh))
306     Enter the function specified by the fully qualified name in $func in
307     the process. The function is called with the communication socket as
308     first argument, followed by all file handles and string arguments
309     sent earlier via "send_fh" and "send_arg" methods, in the order they
310     were called.
311    
312     If the called function returns, the process exits.
313    
314     Preparing the process can take time - when the process is ready, the
315     callback is invoked with the local communications socket as
316     argument.
317    
318     The process object becomes unusable on return from this function.
319    
320     If the communication socket isn't used, it should be closed on both
321     sides, to save on kernel memory.
322    
323     The socket is non-blocking in the parent, and blocking in the newly
324     created process. The close-on-exec flag is set on both. Even if not
325     used otherwise, the socket can be a good indicator for the existence
326     of the process - if the other process exits, you get a readable
327     event on it, because exiting the process closes the socket (if it
328     didn't create any children using fork).
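
The "socket as a sign of life" idea can be tried out without this module, using a plain socketpair and fork from core Perl. The sketch below is an assumption-laden stand-in: the helper name watch_child is made up, and it uses blocking reads, whereas with this module the parent side is non-blocking and one would normally wait for the readable event in an event loop instead.

```perl
use strict;
use warnings;
use POSIX ();
use Socket;

# Hypothetical helper: returns the child's message and the result of the
# read that follows it (0 means end-of-file, i.e. the child is gone).
sub watch_child {
    socketpair my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC
        or die "socketpair: $!";

    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        close $parent_fh;
        syswrite $child_fh, "working\n";
        POSIX::_exit (0); # exiting closes the child's end of the socket
    }

    close $child_fh; # the parent must not keep the child's end open

    sysread $parent_fh, my $msg, 64;         # the child's message
    my $n = sysread $parent_fh, my $buf, 64; # blocks until the child exits

    waitpid $pid, 0;
    return ($msg, $n);
}

my ($msg, $n) = watch_child ();
print "child said: $msg";
print "child has exited\n" if defined $n && $n == 0;
```

If the parent forgot to close its copy of the child's end, or if the child had forked further processes that inherited it, the second read would not see end-of-file when the child exits. That is the caveat the parenthesised remark above refers to.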
329    
330     Example: create a template for a process pool, pass a few strings,
331     some file handles, then fork, pass one more string, and run some
332     code.
333    
334     my $pool = AnyEvent::Fork
335     ->new
336     ->send_arg ("str1", "str2")
337     ->send_fh ($fh1, $fh2);
338    
339     for my $id (1..2) { # lexical, so the callback can still see it later
340     $pool
341     ->fork
342     ->send_arg ("str3")
343     ->run ("Some::function", sub {
344     my ($fh) = @_;
345    
346     # fh is nonblocking, but we trust that the OS can accept these
347     # few extra octets anyway.
348     syswrite $fh, "hi #$id\n";
349    
350     # $fh is being closed here, as we don't store it anywhere
351     });
352     }
353    
354     # Some::function might look like this - all parameters passed before fork
355     # and after will be passed, in order, after the communications socket.
356     sub Some::function {
357     my ($fh, $str1, $str2, $fh1, $fh2, $str3) = @_;
358    
359     print scalar <$fh>; # prints "hi #1\n" and "hi #2\n"
360     }
361    
362     PORTABILITY NOTES
363     Native win32 perls are somewhat supported (AnyEvent::Fork::Early is a
364     nop, and ::Template is not going to work), and it cost a lot of blood
365     and sweat to make it so, mostly due to the bloody broken perl that
366     nobody seems to care about. The fork emulation is a bad joke - I have
367     yet to see something useful that you can do with it without running into
368     memory corruption issues or other braindamage. Hrrrr.
369    
370     Cygwin perl is not supported at the moment, as it should implement fd
371     passing, but doesn't, and rolling my own is hard, as cygwin doesn't
372     support enough functionality to do it.
373    
374     AUTHOR
375     Marc Lehmann <schmorp@schmorp.de>
376     http://home.schmorp.de/
377