/cvs/AnyEvent-Fork/README
Revision: 1.2
Committed: Thu Apr 4 07:27:09 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
CVS Tags: rel-0_01
Changes since 1.1: +377 -0 lines
Log Message:
0.01

NAME
    AnyEvent::Fork - everything you wanted to use fork() for, but couldn't

    ATTENTION, this is a very early release, and very untested. Consider it
    a technology preview.

SYNOPSIS
       use AnyEvent::Fork;

       ##################################################################
       # create a single new process, tell it to run your worker function

       AnyEvent::Fork
          ->new
          ->require ("MyModule")
          ->run ("MyModule::worker", sub {
             my ($master_filehandle) = @_;

             # now $master_filehandle is connected to the
             # $slave_filehandle in the new process.
          });

       # MyModule::worker might look like this
       sub MyModule::worker {
          my ($slave_filehandle) = @_;

          # now $slave_filehandle is connected to the $master_filehandle
          # in the original process. have fun!
       }

       ##################################################################
       # create a pool of server processes all accepting on the same socket

       # create listener socket
       my $listener = ...;

       # create a pool template, initialise it and give it the socket
       my $pool = AnyEvent::Fork
                     ->new
                     ->require ("Some::Stuff", "My::Server")
                     ->send_fh ($listener);

       # now create 10 identical workers
       for my $id (1..10) {
          $pool
             ->fork
             ->send_arg ($id)
             ->run ("My::Server::run");
       }

       # now do other things - maybe use the filehandle provided by run
       # to wait for the processes to die. or whatever.

       # My::Server::run might look like this
       sub My::Server::run {
          my ($slave, $listener, $id) = @_;

          close $slave; # we do not use the socket, so close it to save resources

          # we could go ballistic and use e.g. AnyEvent here, or IO::AIO,
          # or anything we usually couldn't do in a process forked normally.
          while (my $socket = $listener->accept) {
             # do sth. with new socket
          }
       }

DESCRIPTION
    This module allows you to create new processes, without actually forking
    them from your current process (avoiding the problems of forking), but
    preserving most of the advantages of fork.

    It can be used to create new worker processes or new independent
    subprocesses for short- and long-running jobs, process pools (e.g. for
    use in pre-forked servers) but also to spawn new external processes
    (such as CGI scripts from a webserver), which can be faster (and more
    well behaved) than using fork+exec in big processes.

    Special care has been taken to make this module useful from other
    modules, while still supporting specialised environments such as
    App::Staticperl or PAR::Packer.

PROBLEM STATEMENT
    There are two ways to implement parallel processing on UNIX-like
    operating systems - fork and process, and fork+exec and process. They
    have different advantages and disadvantages that I describe below,
    together with how this module tries to mitigate the disadvantages.

    Forking from a big process can be very slow (a 5GB process needs 0.05s
    to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead is
    often shared with exec (because you have to fork first), but in some
    circumstances (e.g. when vfork is used), fork+exec can be much faster.
        This module can help here by telling a small(er) helper process to
        fork, or fork+exec instead.

    Forking usually creates a copy-on-write copy of the parent process.
    Memory that is already in use (for example, modules or data files
    that have been loaded) will not take additional memory. When exec'ing
    a new process, modules and data files might need to be loaded again,
    at extra CPU and memory cost. On the other hand, when forking, all
    data structures are copied as well - if the program frees them and
    replaces them by new data, the child processes will retain the memory
    even if it isn't used.
        This module allows the main program to do a controlled fork, and
        allows modules to exec processes safely at any time. When creating a
        custom process pool you can take advantage of data sharing via fork
        without the risk of sharing large dynamic data structures that will
        blow up child memory usage.

    Exec'ing a new perl process might be difficult and slow. For example, it
    is not easy to find the correct path to the perl interpreter, and all
    modules have to be loaded from disk again. Long-running processes might
    run into problems when perl is upgraded, for example.
        This module supports creating pre-initialised perl processes to be
        used as templates, and also tries hard to identify the correct path
        to the perl interpreter. With a cooperative main program, exec'ing
        the interpreter might not even be necessary.

    Forking might be impossible when a program is running. For example,
    POSIX makes it almost impossible to fork from a multithreaded program
    and do anything useful in the child - strictly speaking, if your perl
    program uses POSIX threads (even indirectly via e.g. IO::AIO or
    threads), you cannot call fork on the perl level anymore, at all.
        This module can safely fork helper processes at any time, by calling
        fork+exec in C, in a POSIX-compatible way.

    Parallel processing with fork might be inconvenient or difficult to
    implement. For example, when a program uses an event loop and creates
    watchers it becomes very hard to use the event loop from a child
    program, as the watchers already exist but are only meaningful in the
    parent. Worse, a module might want to use such a system, not knowing
    whether another module or the main program also does, leading to
    problems.
        This module only lets the main program create pools by forking
        (because only the main program can know when it is still safe to do
        so) - all other pools are created by fork+exec, after which such
        modules can again be loaded.

CONCEPTS
    This module can create new processes either by executing a new perl
    process, or by forking from an existing "template" process.

    Each such process comes with its own file handle that can be used to
    communicate with it (it's actually a socket - one end in the new
    process, one end in the main process), and among the things you can do
    in it are load modules, fork new processes, send file handles to it, and
    execute functions.

    There are multiple ways to create additional processes to execute some
    jobs:

    fork a new process from the "default" template process, load code, run
    it
        This module has a "default" template process which it executes when
        it is needed the first time. Forking from this process shares the
        memory used for the perl interpreter with the new process, but
        loading modules takes time, and the memory is not shared with
        anything else.

        This is ideal for when you only need one extra process of a kind,
        with the option of starting and stopping it on demand.

        Example:

           AnyEvent::Fork
              ->new
              ->require ("Some::Module")
              ->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });

    fork a new template process, load code, then fork processes off of it
    and run the code
        When you need to have a bunch of processes that all execute the same
        (or very similar) tasks, then a good way is to create a new template
        process for them, loading all the modules you need, and then create
        your worker processes from this new template process.

        This way, all code (and data structures) that can be shared (e.g.
        the modules you loaded) is shared between the processes, and each
        new process consumes relatively little memory of its own.

        The disadvantage of this approach is that you need to create a
        template process for the sole purpose of forking new processes from
        it, but if you only need a fixed number of processes you can create
        them, and then destroy the template process.

        Example:

           my $template = AnyEvent::Fork->new->require ("Some::Module");

           for (1..10) {
              $template->fork->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });
           }

           # at this point, you can keep $template around to fork new processes
           # later, or you can destroy it, which causes it to vanish.

    execute a new perl interpreter, load some code, run it
        This is relatively slow, and doesn't allow you to share memory
        between multiple processes.

        The only advantage is that you don't have to have a template process
        hanging around all the time to fork off some new processes, which
        might be an advantage when there are long time spans where no extra
        processes are needed.

        Example:

           AnyEvent::Fork
              ->new_exec
              ->require ("Some::Module")
              ->run ("Some::Module::run", sub {
                 my ($fork_fh) = @_;
              });

FUNCTIONS
    my $pool = new AnyEvent::Fork key => value...
        Create a new process pool. The following named parameters are
        supported:

    my $proc = new AnyEvent::Fork
        Creates a new "empty" perl interpreter process and returns its
        process object for further manipulation.

        The new process is forked from a template process that is kept
        around for this purpose. When it doesn't exist yet, it is created by
        a call to "new_exec" and kept around for future calls.

        When the process object is destroyed, it will release the file
        handle that connects it with the new process. When the new process
        has not yet called "run", then the process will exit. Otherwise,
        what happens depends entirely on the code that is executed.

    $new_proc = $proc->fork
        Forks $proc, creating a new process, and returns the process object
        of the new process.

        If any of the "send_" functions have been called before fork, then
        what they sent will be cloned in the child. For example, in a
        pre-forked server, you might "send_fh" the listening socket into the
        template process, and then keep calling "fork" and "run".

    my $proc = new_exec AnyEvent::Fork
        Creates a new "empty" perl interpreter process and returns its
        process object for further manipulation.

        Unlike the "new" method, this method *always* spawns a new perl
        process (except in some cases, see AnyEvent::Fork::Early for
        details). This reduces the amount of memory sharing that is
        possible, and is also slower.

        You should use "new" whenever possible, except when having a
        template process around is unacceptable.

        The path to the perl interpreter is divined using various methods -
        first $^X is investigated to see if the path ends with something
        that sounds as if it were the perl interpreter. Failing this, the
        module falls back to using $Config::Config{perlpath}.
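
The lookup order described above can be sketched in plain perl. This is only an illustration of the idea, not the module's actual code: the helper name guess_perl_path and the pattern used to decide what "sounds like" a perl interpreter are assumptions made for this example.

```perl
use strict;
use warnings;
use Config;
use File::Basename qw(basename);

# Hypothetical helper, not part of AnyEvent::Fork: guess the path to the
# perl interpreter, trying $^X first and $Config{perlpath} second.
sub guess_perl_path {
   my ($candidate) = @_;
   $candidate = $^X unless defined $candidate;

   # accept names such as "perl", "perl5.16.3" or "perl.exe"
   # (the pattern is an assumption made for this sketch)
   return $candidate
      if basename ($candidate) =~ /^perl[0-9.]*(?:\.exe)?$/i;

   # otherwise fall back to the path recorded when perl was built
   $Config{perlpath}
}

print guess_perl_path (), "\n";
```

The fallback matters in environments where $^X names the embedding program rather than perl itself, which is the kind of situation the text above alludes to.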

    $proc = $proc->eval ($perlcode, @args)
        Evaluates the given $perlcode as ... perl code, while setting @_ to
        the strings specified by @args.

        This call is meant to do any custom initialisation that might be
        required (for example, the "require" method uses it). It's not
        supposed to be used to completely take over the process, use "run"
        for that.

        The code will usually be executed after this call returns, and there
        is no way to pass anything back to the calling process. Any
        evaluation errors will be reported to stderr and cause the process
        to exit.
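
What "setting @_ to the strings specified by @args" means can be shown without any extra process. The sketch below is a local stand-in written for this example (eval_with_args is a hypothetical name); unlike the real method it runs the code in the current process and returns the result, which "eval" on a process object cannot do.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what the new process does with an "eval"
# request: run a string of perl code with @_ set to the argument strings.
sub eval_with_args {
   my ($perlcode, @args) = @_;

   @_ = @args;                   # the evaluated code sees these as @_
   my @result = eval $perlcode;  # string eval shares this sub's @_

   # the documented behaviour is to report to stderr and exit;
   # this sketch simply dies instead
   die "evaluation failed: $@" if $@;

   @result
}

my ($greeting) = eval_with_args ('join " ", "hello", @_', "fork", "world");
print "$greeting\n";
```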

        Returns the process object for easy chaining of method calls.

    $proc = $proc->require ($module, ...)
        Tries to load the given module(s) into the process.

        Returns the process object for easy chaining of method calls.

    $proc = $proc->send_fh ($handle, ...)
        Send one or more file handles (*not* file descriptors) to the
        process, to prepare a call to "run".

        The process object keeps a reference to the handles until this is
        done, so you must not explicitly close the handles. This is most
        easily accomplished by simply not storing the file handles anywhere
        after passing them to this method.

        Returns the process object for easy chaining of method calls.

        Example: pass an fh to a process, and release it without closing.
        It will be closed automatically when it is no longer used.

           $proc->send_fh ($my_fh);
           undef $my_fh; # free the reference if you want, but DO NOT CLOSE IT

    $proc = $proc->send_arg ($string, ...)
        Send one or more argument strings to the process, to prepare a call
        to "run". The strings can be any octet string.

        Returns the process object for easy chaining of method calls.
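
Since "send_arg" takes octet strings, text and structured data have to be turned into octets first. The following sketch shows one possible way to do that with core modules; the choice of Encode and Storable is an assumption of this example, not something the module prescribes, and the "send_arg" call itself is only shown as a comment.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use Storable qw(nfreeze thaw);

# parent side: turn values into octet strings
my $name_octets = encode_utf8 ("J\x{fc}rgen");                 # text -> UTF-8 octets
my $conf_octets = nfreeze ({ workers => 4, verbose => 1 });    # data -> octets

# $proc->send_arg ($name_octets, $conf_octets);   # as documented above

# child side: the function entered via "run" would reverse the encoding
my $name = decode_utf8 ($name_octets);
my $conf = thaw ($conf_octets);

printf "%d workers for %s\n", $conf->{workers}, encode_utf8 ($name);
```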

    $proc->run ($func, $cb->($fh))
        Enter the function specified by the fully qualified name in $func in
        the process. The function is called with the communication socket as
        first argument, followed by all file handles and string arguments
        sent earlier via "send_fh" and "send_arg" methods, in the order they
        were called.

        If the called function returns, the process exits.

        Preparing the process can take time - when the process is ready, the
        callback is invoked with the local communications socket as
        argument.

        The process object becomes unusable on return from this function.

        If the communication socket isn't used, it should be closed on both
        sides, to save on kernel memory.

        The socket is non-blocking in the parent, and blocking in the newly
        created process. The close-on-exec flag is set on both. Even if not
        used otherwise, the socket can be a good indicator for the existence
        of the process - if the other process exits, you get a readable
        event on it, because exiting the process closes the socket (if it
        didn't create any children using fork).

        Example: create a template for a process pool, pass a few strings,
        some file handles, then fork, pass one more string, and run some
        code.

           my $pool = AnyEvent::Fork
                         ->new
                         ->send_arg ("str1", "str2")
                         ->send_fh ($fh1, $fh2);

           for my $n (1..2) {
              $pool
                 ->fork
                 ->send_arg ("str3")
                 ->run ("Some::function", sub {
                    my ($fh) = @_;

                    # fh is nonblocking, but we trust that the OS can accept these
                    # few octets anyway.
                    syswrite $fh, "hi #$n\n";

                    # $fh is being closed here, as we don't store it anywhere
                 });
           }

           # Some::function might look like this - all parameters passed before fork
           # and after will be passed, in order, after the communications socket.
           sub Some::function {
              my ($fh, $str1, $str2, $fh1, $fh2, $str3) = @_;

              print scalar <$fh>; # prints "hi #1\n" in one process, "hi #2\n" in the other
           }
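
The remark above about the socket indicating the existence of the process can be tried out without this module. The sketch below uses a plain socketpair and a plain fork as stand-ins for the communication socket and the new process (an assumption made purely for illustration; it needs a UNIX-like system): the parent sees end-of-file once the child has exited.

```perl
use strict;
use warnings;
use Socket;
use IO::Select;
use POSIX ();

# a socketpair stands in for the communication socket
socketpair my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC
   or die "socketpair: $!";

my $pid = fork;
defined $pid or die "fork: $!";

if ($pid == 0) {
   close $parent_fh;
   syswrite $child_fh, "working\n";
   POSIX::_exit (0); # exiting closes the child's end of the socket
}

close $child_fh; # the parent must not keep the child's end open

my $select = IO::Select->new ($parent_fh);
my $seen   = "";
my $exited = 0;

until ($exited) {
   my @ready = $select->can_read;   # wait for a readable event
   next unless @ready;

   my $len = sysread $parent_fh, my $buf, 4096;
   defined $len or die "sysread: $!";

   if ($len) { $seen .= $buf } else { $exited = 1 }   # 0 octets means EOF
}

waitpid $pid, 0;
print "child said: $seen", "child is gone\n";
```

Closing the child's end in the parent is what makes this work: as long as any process still holds that end open, no end-of-file is delivered, which is why the text above adds the caveat about children created using fork.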

PORTABILITY NOTES
    Native win32 perls are somewhat supported (AnyEvent::Fork::Early is a
    nop, and ::Template is not going to work), and it cost a lot of blood
    and sweat to make it so, mostly due to the bloody broken perl that
    nobody seems to care about. The fork emulation is a bad joke - I have
    yet to see something useful that you can do with it without running into
    memory corruption issues or other braindamage. Hrrrr.

    Cygwin perl is not supported at the moment, as it should implement fd
    passing, but doesn't, and rolling my own is hard, as cygwin doesn't
    support enough functionality to do it.

AUTHOR
     Marc Lehmann <schmorp@schmorp.de>
     http://home.schmorp.de/