… | |
… | |
4 | |
4 | |
5 | =head1 SYNOPSIS |
5 | =head1 SYNOPSIS |
6 | |
6 | |
7 | use AnyEvent::Fork; |
7 | use AnyEvent::Fork; |
8 | |
8 | |
9 | ################################################################## |
9 | AnyEvent::Fork |
10 | ->new |
11 | ->require ("MyModule") |
12 | ->run ("MyModule::server", my $cv = AE::cv); |
13 | |
14 | my $fh = $cv->recv; |
15 | |
16 | =head1 DESCRIPTION |
17 | |
18 | This module allows you to create new processes, without actually forking |
19 | them from your current process (avoiding the problems of forking), but |
20 | preserving most of the advantages of fork. |
21 | |
22 | It can be used to create new worker processes or new independent |
23 | subprocesses for short- and long-running jobs, process pools (e.g. for use |
24 | in pre-forked servers), but also to spawn new external processes (such as |
25 | CGI scripts from a web server), which can be faster (and better behaved) |
26 | than using fork+exec in big processes. |
27 | |
28 | Special care has been taken to make this module useful from other modules, |
29 | while still supporting specialised environments such as L<App::Staticperl> |
30 | or L<PAR::Packer>. |
31 | |
32 | =head1 WHAT THIS MODULE IS NOT |
33 | |
34 | This module only creates processes and lets you pass file handles and |
35 | strings to them, and run perl code. It does not implement any kind of RPC - |
36 | there is no back channel from the process to you, and there is no RPC |
37 | or message passing going on. |
38 | |
39 | If you need some form of RPC, you can either implement it yourself |
40 | in whatever way you like, use some message-passing module such |
41 | as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use |
42 | L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages, |
43 | and so on. |
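The "implement it yourself" route for RPC mentioned above can be as simple as newline-delimited JSON over a connected socket pair. The following is a blocking, core-Perl sketch that does not use AnyEvent::Fork at all. In an event-driven program you would read and write through AnyEvent::Handle instead, as the text suggests. The message fields (`id`, `args`, `result`) are invented for this example.

```perl
use strict;
use warnings;
use IO::Handle;
use JSON::PP;
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# One connected pair of sockets: the parent keeps $master, the child $slave.
socketpair(my $master, my $slave, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $master, $slave;

my $json = JSON::PP->new->canonical;   # encode() emits one line, no newline

my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: one JSON request per line in, one JSON reply per line out.
    close $master;
    while (my $line = <$slave>) {
        my $req = $json->decode($line);
        my $sum = 0;
        $sum += $_ for @{ $req->{args} };
        print {$slave} $json->encode({ id => $req->{id}, result => $sum }), "\n";
    }
    POSIX::_exit(0);   # EOF from the parent: leave without running END blocks
}

# Parent: send one request, read the matching reply, then hang up.
close $slave;
print {$master} $json->encode({ id => 1, args => [1, 2, 3] }), "\n";
my $reply = $json->decode(scalar <$master>);
close $master;
waitpid $pid, 0;

print "reply $reply->{id}: $reply->{result}\n";
```

Closing the parent's end doubles as the "stop" message. The child's read loop ends at EOF, so no explicit shutdown request is needed.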
44 | |
45 | =head1 PROBLEM STATEMENT |
46 | |
47 | There are two traditional ways to implement parallel processing on UNIX |
48 | like operating systems - fork and process, and fork+exec and process. They |
49 | have different advantages and disadvantages that I describe below, |
50 | together with how this module tries to mitigate the disadvantages. |
51 | |
52 | =over 4 |
53 | |
54 | =item Forking from a big process can be very slow. |
55 | |
56 | A 5GB process needs 0.05s to fork on my 3.6GHz amd64 GNU/Linux box. This |
57 | overhead is often shared with exec (because you have to fork first), but |
58 | in some circumstances (e.g. when vfork is used), fork+exec can be much |
59 | faster. |
60 | |
61 | This module can help here by telling a small(er) helper process to fork, |
62 | which is faster than forking the main process, and also uses vfork where |
63 | possible. This gives the speed of vfork, with the flexibility of fork. |
64 | |
65 | =item Forking usually creates a copy-on-write copy of the parent |
66 | process. |
67 | |
68 | For example, modules or data files that are loaded will not use additional |
69 | memory after a fork. When exec'ing a new process, modules and data files |
70 | might need to be loaded again, at extra CPU and memory cost. But when |
71 | forking, literally all data structures are copied - if the program frees |
72 | them and replaces them with new data, the child processes will retain the |
73 | old version even if it isn't used, which can suddenly and unexpectedly |
74 | increase memory usage when freeing memory. |
75 | |
76 | The trade-off is between more sharing with fork (which can be good or |
77 | bad), and no sharing with exec. |
78 | |
79 | This module allows the main program to do a controlled fork, and allows |
80 | modules to exec processes safely at any time. When creating a custom |
81 | process pool you can take advantage of data sharing via fork without |
82 | risking sharing large dynamic data structures that would blow up child |
83 | memory usage. |
84 | |
85 | In other words, this module puts you in control of what is being |
86 | shared and what isn't, at all times. |
87 | |
88 | =item Exec'ing a new perl process might be difficult. |
89 | |
90 | For example, it is not easy to find the correct path to the perl |
91 | interpreter - C<$^X> might not be a perl interpreter at all. |
92 | |
93 | This module tries hard to identify the correct path to the perl |
94 | interpreter. With a cooperative main program, exec'ing the interpreter |
95 | might not even be necessary, but even without help from the main program, |
96 | it will still work when used from a module. |
97 | |
98 | =item Exec'ing a new perl process might be slow, as all necessary modules |
99 | have to be loaded from disk again, with no guarantees of success. |
100 | |
101 | Long-running processes might run into problems when perl is upgraded |
102 | and modules are no longer loadable because they refer to a different |
103 | perl version, or parts of a distribution are newer than the ones already |
104 | loaded. |
105 | |
106 | This module supports creating pre-initialised perl processes to be used as |
107 | a template for new processes. |
108 | |
109 | =item Forking might be impossible when a program is running. |
110 | |
111 | For example, POSIX makes it almost impossible to fork from a |
112 | multi-threaded program while doing anything useful in the child - in |
113 | fact, if your perl program uses POSIX threads (even indirectly via |
114 | e.g. L<IO::AIO> or L<threads>), you cannot call fork on the perl level |
115 | anymore without risking corruption on a number of operating |
116 | systems. |
117 | |
118 | This module can safely fork helper processes at any time, by calling |
119 | fork+exec in C, in a POSIX-compatible way (via L<Proc::FastSpawn>). |
120 | |
121 | =item Parallel processing with fork might be inconvenient or difficult |
122 | to implement. Modules might not work in both parent and child. |
123 | |
124 | For example, when a program uses an event loop and creates watchers, it |
125 | becomes very hard to use the event loop from a child program, as the |
126 | watchers already exist but are only meaningful in the parent. Worse, a |
127 | module might want to use an event loop, not knowing whether another module |
128 | or the main program also does, leading to problems. |
129 | |
130 | Apart from event loops, graphical toolkits also commonly fall into the |
131 | "unsafe module" category, or just about anything that communicates with |
132 | the external world, such as network libraries and file I/O modules, which |
133 | usually don't like being copied and then allowed to continue in two |
134 | processes. |
135 | |
136 | With this module, only the main program is allowed to create new processes |
137 | by forking (because only the main program can know when it is still safe |
138 | to do so) - all other processes are created via fork+exec, which makes it |
139 | possible to use modules such as event loops or window interfaces safely. |
140 | |
141 | =back |
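The first two points above (fork cost and copy-on-write sharing) can be observed with plain fork. The following core-Perl sketch is not part of AnyEvent::Fork. It times a single fork and shows that the child can read the data it inherited, while its change to `$owner` stays invisible to the parent. The array size is arbitrary, and the printed timing varies widely between systems. Treat it as an illustration, not a benchmark.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

my @data  = (1) x 1_000_000;   # some data for the child to inherit
my $owner = "parent";

pipe(my $reader, my $writer) or die "pipe: $!";

my $t0  = time;
my $pid = fork;
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: it sees (copy-on-write) copies of @data and $owner.
    close $reader;
    $owner = "child";                     # changes only the child's copy
    print {$writer} scalar(@data), "\n";  # the inherited array is all there
    close $writer;
    POSIX::_exit(0);
}

my $fork_seconds = time - $t0;   # time until fork() returned in the parent
close $writer;
chomp(my $child_saw = <$reader>);
close $reader;
waitpid $pid, 0;

printf "fork took %.6fs; child saw %d elements; parent still says '%s'\n",
    $fork_seconds, $child_saw, $owner;
```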
142 | |
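The "correct path to the perl interpreter" problem described above can be approached with a check like the one below. This is a hypothetical helper, not AnyEvent::Fork's actual detection logic. It tries `$^X`, then `$Config{perlpath}`, and accepts a candidate only if running it reports the same `$]` version as the current interpreter.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: return a path to a perl matching the running one.
sub find_perl {
    for my $candidate ($^X, $Config{perlpath}) {
        next unless defined $candidate && length $candidate && -x $candidate;

        # List-form pipe open runs the candidate directly, without a shell.
        open(my $fh, "-|", $candidate, "-e", 'print $]') or next;
        my $version = do { local $/; <$fh> };
        close $fh or next;   # close fails if the candidate exited non-zero

        return $candidate if defined $version && $version eq "$]";
    }
    return undef;
}

my $perl = find_perl();
print defined $perl ? "using $perl\n" : "no usable perl found\n";
```

The version comparison is a deliberately simple stand-in for "is this really the same perl". A real check might also compare `@INC` or the architecture name.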
143 | =head1 EXAMPLES |
144 | |
10 | # create a single new process, tell it to run your worker function |
145 | =head2 Create a single new process, tell it to run your worker function. |
11 | |
146 | |
12 | AnyEvent::Fork |
147 | AnyEvent::Fork |
13 | ->new |
148 | ->new |
14 | ->require ("MyModule") |
149 | ->require ("MyModule") |
15 | ->run ("MyModule::worker, sub { |
150 | ->run ("MyModule::worker", sub { |
… | |
… | |
17 | |
152 | |
18 | # now $master_filehandle is connected to the |
153 | # now $master_filehandle is connected to the |
19 | # $slave_filehandle in the new process. |
154 | # $slave_filehandle in the new process. |
20 | }); |
155 | }); |
21 | |
156 | |
22 | # MyModule::worker might look like this |
157 | MyModule might look like this: |
158 | |
159 | package MyModule; |
160 | |
23 | sub MyModule::worker { |
161 | sub worker { |
24 | my ($slave_filehandle) = @_; |
162 | my ($slave_filehandle) = @_; |
25 | |
163 | |
26 | # now $slave_filehandle is connected to the $master_filehandle |
164 | # now $slave_filehandle is connected to the $master_filehandle |
27 | # in the original prorcess. have fun! |
165 | # in the original process. have fun! |
28 | } |
166 | } |
29 | |
167 | |
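The following blocking core-Perl approximation makes the master/slave filehandle pairing above concrete without AnyEvent::Fork. `run_named` is a hypothetical stand-in for `->run`. It forks, looks up the worker function by its fully qualified name in the child and calls it with one end of a socketpair, then hands the other end to a callback in the parent. AnyEvent::Fork creates the process differently and is event-driven. This sketch only mimics the calling convention.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

package MyModule {
    sub worker {
        my ($slave_filehandle) = @_;
        # Echo each line back in upper case until the master side closes.
        while (my $line = <$slave_filehandle>) {
            print {$slave_filehandle} uc $line;
        }
    }
}

# Hypothetical stand-in for ->run: call the named function in a child process.
sub run_named {
    my ($func_name, $cb) = @_;
    socketpair(my $master, my $slave, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    $_->autoflush(1) for $master, $slave;

    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $master;
        my $func = \&{$func_name};   # \&{"name"} is allowed under strict refs
        $func->($slave);
        POSIX::_exit(0);
    }
    close $slave;
    $cb->($master, $pid);
}

my $answer;
run_named("MyModule::worker", sub {
    my ($master_filehandle, $pid) = @_;
    print {$master_filehandle} "hello\n";
    $answer = <$master_filehandle>;
    close $master_filehandle;   # EOF ends the worker's read loop
    waitpid $pid, 0;
});
print $answer;
```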
30 | ################################################################## |
31 | # create a pool of server processes all accepting on the same socket |
168 | =head2 Create a pool of server processes all accepting on the same socket. |
32 | |
169 | |
33 | # create listener socket |
170 | # create listener socket |
34 | my $listener = ...; |
171 | my $listener = ...; |
35 | |
172 | |
36 | # create a pool template, initialise it and give it the socket |
173 | # create a pool template, initialise it and give it the socket |
… | |
… | |
48 | } |
185 | } |
49 | |
186 | |
50 | # now do other things - maybe use the filehandle provided by run |
187 | # now do other things - maybe use the filehandle provided by run |
51 | # to wait for the processes to die. or whatever. |
188 | # to wait for the processes to die. or whatever. |
52 | |
189 | |
53 | # My::Server::run might look like this |
190 | My::Server might look like this: |
54 | sub My::Server::run { |
191 | |
192 | package My::Server; |
193 | |
194 | sub run { |
55 | my ($slave, $listener, $id) = @_; |
195 | my ($slave, $listener, $id) = @_; |
56 | |
196 | |
57 | close $slave; # we do not use the socket, so close it to save resources |
197 | close $slave; # we do not use the socket, so close it to save resources |
58 | |
198 | |
59 | # we could go ballistic and use e.g. AnyEvent here, or IO::AIO, |
199 | # we could go ballistic and use e.g. AnyEvent here, or IO::AIO, |
… | |
… | |
61 | while (my $socket = $listener->accept) { |
201 | while (my $socket = $listener->accept) { |
62 | # do sth. with new socket |
202 | # do sth. with new socket |
63 | } |
203 | } |
64 | } |
204 | } |
65 | |
205 | |
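The pool example relies on a property of listening sockets: several processes can all accept on the same socket, and the kernel hands each incoming connection to one of the waiting processes. The following core-Perl sketch shows just that mechanism. It uses plain fork instead of AnyEvent::Fork, and three children that each serve one connection on a kernel-chosen localhost port. The worker count and the reply text are arbitrary.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Create the listener once; every forked child inherits the same socket.
my $listener = IO::Socket::INET->new(
    LocalAddr => "127.0.0.1",
    LocalPort => 0,            # let the kernel pick a free port
    Listen    => 16,
    ReuseAddr => 1,
) or die "listen: $@";
my $port = $listener->sockport;

my @pids;
for my $id (1 .. 3) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: accept exactly one connection, identify itself, then exit.
        if (my $socket = $listener->accept) {
            print {$socket} "served by worker $id\n";
            close $socket;
        }
        POSIX::_exit(0);
    }
    push @pids, $pid;
}
close $listener;   # the parent never accepts; the children keep it open

my @replies;
for (1 .. 3) {
    my $client = IO::Socket::INET->new(PeerAddr => "127.0.0.1", PeerPort => $port)
        or die "connect: $@";
    push @replies, scalar <$client>;
}
waitpid $_, 0 for @pids;

print sort @replies;
```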
66 | ################################################################## |
67 | # use AnyEvent::Fork as a faster fork+exec |
206 | =head2 Use AnyEvent::Fork as a faster fork+exec. |
68 | |
207 | |
69 | # this runs /bin/echo hi, with stdout redirected to /tmp/log |
208 | This runs /bin/echo hi, with stdout redirected to /tmp/log and stderr to |
70 | # and stderr to the communications socket. it is usually faster |
209 | the communications socket. It is usually faster than fork+exec, but still |
71 | # than fork+exec, but still let's you prepare the environment. |
210 | lets you prepare the environment. |
72 | |
211 | |
73 | open my $output, ">/tmp/log" or die "$!"; |
212 | open my $output, ">/tmp/log" or die "$!"; |
74 | |
213 | |
75 | AnyEvent::Fork |
214 | AnyEvent::Fork |
76 | ->new |
215 | ->new |
… | |
… | |
88 | ->send_fh ($output) |
227 | ->send_fh ($output) |
89 | ->send_arg ("/bin/echo", "hi") |
228 | ->send_arg ("/bin/echo", "hi") |
90 | ->run ("run", my $cv = AE::cv); |
229 | ->run ("run", my $cv = AE::cv); |
91 | |
230 | |
92 | my $stderr = $cv->recv; |
231 | my $stderr = $cv->recv; |
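For comparison, the following is roughly what a plain fork+exec version of that example does in core Perl. The child points its STDOUT at the already-open log file and then execs `/bin/echo hi`. The sketch writes to a temporary file instead of `/tmp/log`, and it leaves out the step that sends stderr to the communication socket.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# A temporary file stands in for /tmp/log.
my (undef, $log_path) = tempfile(UNLINK => 1);
open my $output, ">", $log_path or die "open $log_path: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # Child: make STDOUT a duplicate of $output, then replace itself with echo.
    open STDOUT, ">&", $output or POSIX::_exit(126);
    exec "/bin/echo", "hi"
        or do { warn "exec failed: $!\n"; POSIX::_exit(127) };
}
close $output;
waitpid $pid, 0;
my $status = $? >> 8;

open my $in, "<", $log_path or die "open $log_path: $!";
my $logged = do { local $/; <$in> };
close $in;
print "exit status $status, log contains: $logged";
```

Every step here runs in the (possibly large) calling process. The example above delegates the same preparation to a small helper.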
93 | |
94 | =head1 DESCRIPTION |
95 | |
96 | This module allows you to create new processes, without actually forking |
97 | them from your current process (avoiding the problems of forking), but |
98 | preserving most of the advantages of fork. |
99 | |
100 | It can be used to create new worker processes or new independent |
101 | subprocesses for short- and long-running jobs, process pools (e.g. for use |
102 | in pre-forked servers) but also to spawn new external processes (such as |
103 | CGI scripts from a web server), which can be faster (and more well behaved) |
104 | than using fork+exec in big processes. |
105 | |
106 | Special care has been taken to make this module useful from other modules, |
107 | while still supporting specialised environments such as L<App::Staticperl> |
108 | or L<PAR::Packer>. |
109 | |
110 | =head1 WHAT THIS MODULE IS NOT |
111 | |
112 | This module only creates processes and lets you pass file handles and |
113 | strings to it, and run perl code. It does not implement any kind of RPC - |
114 | there is no back channel from the process back to you, and there is no RPC |
115 | or message passing going on. |
116 | |
117 | If you need some form of RPC, you can either implement it yourself |
118 | in whatever way you like, use some message-passing module such |
119 | as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use |
120 | L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages, |
121 | and so on. |
122 | |
123 | =head1 PROBLEM STATEMENT |
124 | |
125 | There are two ways to implement parallel processing on UNIX like operating |
126 | systems - fork and process, and fork+exec and process. They have different |
127 | advantages and disadvantages that I describe below, together with how this |
128 | module tries to mitigate the disadvantages. |
129 | |
130 | =over 4 |
131 | |
132 | =item Forking from a big process can be very slow (a 5GB process needs |
133 | 0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead |
134 | is often shared with exec (because you have to fork first), but in some |
135 | circumstances (e.g. when vfork is used), fork+exec can be much faster. |
136 | |
137 | This module can help here by telling a small(er) helper process to fork, |
138 | or fork+exec instead. |
139 | |
140 | =item Forking usually creates a copy-on-write copy of the parent |
141 | process. Memory (for example, modules or data files that have been |
142 | will not take additional memory). When exec'ing a new process, modules |
143 | and data files might need to be loaded again, at extra CPU and memory |
144 | cost. Likewise when forking, all data structures are copied as well - if |
145 | the program frees them and replaces them by new data, the child processes |
146 | will retain the memory even if it isn't used. |
147 | |
148 | This module allows the main program to do a controlled fork, and allows |
149 | modules to exec processes safely at any time. When creating a custom |
150 | process pool you can take advantage of data sharing via fork without |
151 | risking to share large dynamic data structures that will blow up child |
152 | memory usage. |
153 | |
154 | =item Exec'ing a new perl process might be difficult and slow. For |
155 | example, it is not easy to find the correct path to the perl interpreter, |
156 | and all modules have to be loaded from disk again. Long running processes |
157 | might run into problems when perl is upgraded for example. |
158 | |
159 | This module supports creating pre-initialised perl processes to be used |
160 | as template, and also tries hard to identify the correct path to the perl |
161 | interpreter. With a cooperative main program, exec'ing the interpreter |
162 | might not even be necessary. |
163 | |
164 | =item Forking might be impossible when a program is running. For example, |
165 | POSIX makes it almost impossible to fork from a multi-threaded program and |
166 | do anything useful in the child - strictly speaking, if your perl program |
167 | uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>), |
168 | you cannot call fork on the perl level anymore, at all. |
169 | |
170 | This module can safely fork helper processes at any time, by calling |
171 | fork+exec in C, in a POSIX-compatible way. |
172 | |
173 | =item Parallel processing with fork might be inconvenient or difficult |
174 | to implement. For example, when a program uses an event loop and creates |
175 | watchers it becomes very hard to use the event loop from a child |
176 | program, as the watchers already exist but are only meaningful in the |
177 | parent. Worse, a module might want to use such a system, not knowing |
178 | whether another module or the main program also does, leading to problems. |
179 | |
180 | This module only lets the main program create pools by forking (because |
181 | only the main program can know when it is still safe to do so) - all other |
182 | pools are created by fork+exec, after which such modules can again be |
183 | loaded. |
184 | |
185 | =back |
186 | |
232 | |
187 | =head1 CONCEPTS |
233 | =head1 CONCEPTS |
188 | |
234 | |
189 | This module can create new processes either by executing a new perl |
235 | This module can create new processes either by executing a new perl |
190 | process, or by forking from an existing "template" process. |
236 | process, or by forking from an existing "template" process. |
… | |
… | |
269 | my ($fork_fh) = @_; |
315 | my ($fork_fh) = @_; |
270 | }); |
316 | }); |
271 | |
317 | |
272 | =back |
318 | =back |
273 | |
319 | |
274 | =head1 FUNCTIONS |
320 | =head1 THE C<AnyEvent::Fork> CLASS |
321 | |
322 | This module exports nothing, and only implements a single class - |
323 | C<AnyEvent::Fork>. |
324 | |
325 | There are two class constructors that both create new processes - C<new> |
326 | and C<new_exec>. The C<fork> method creates a new process by forking an |
327 | existing one and could be considered a third constructor. |
328 | |
329 | Most of the remaining methods deal with preparing the new process, by |
330 | loading code, evaluating code and sending data to the new process. They |
331 | usually return the process object, so you can chain method calls. |
332 | |
333 | If a process object is destroyed before calling its C<run> method, then |
334 | the process simply exits. After C<run> is called, all responsibility is |
335 | passed to the specified function. |
336 | |
337 | As long as there is any outstanding work to be done, process objects |
338 | resist being destroyed, so there is no reason to store them unless you |
339 | need them later - configure and forget works just fine. |
275 | |
340 | |
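The "destroyed before `run`, the process simply exits" behaviour can be modelled with end-of-file on a control socket. The sketch below assumes that model and is only an illustration, not the module's code. `ProcessHandle` is a made-up class whose destructor closes the parent's end of a socketpair, and the child exits as soon as it reads EOF.

```perl
use strict;
use warnings;
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

{
    # Hypothetical "process object": owns the parent's end of the socket.
    package ProcessHandle;
    sub new { my ($class, $fh) = @_; return bless { fh => $fh }, $class }
    sub DESTROY {
        my $self = shift;
        close $self->{fh} if $self->{fh};   # the child sees EOF after this
    }
}

socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    close $parent_end;
    # Child: wait for instructions; EOF means the process object is gone.
    while (defined(my $cmd = <$child_end>)) {
        # a real implementation would act on $cmd here
    }
    POSIX::_exit(0);
}
close $child_end;

{
    my $proc = ProcessHandle->new($parent_end);
    undef $parent_end;   # the object now holds the parent's only reference
}                        # $proc is destroyed here, closing the socket

my $reaped      = waitpid $pid, 0;
my $exit_status = $? >> 8;
print "child $reaped exited with status $exit_status\n";
```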
276 | =over 4 |
341 | =over 4 |
277 | |
342 | |
278 | =cut |
343 | =cut |
279 | |
344 | |
… | |
… | |
289 | use IO::FDPass; |
354 | use IO::FDPass; |
290 | |
355 | |
291 | our $VERSION = 0.5; |
356 | our $VERSION = 0.5; |
292 | |
357 | |
293 | our $PERL; # the path to the perl interpreter, deduces with various forms of magic |
358 | our $PERL; # the path to the perl interpreter, deduced with various forms of magic |
294 | |
295 | =item my $pool = new AnyEvent::Fork key => value... |
296 | |
297 | Create a new process pool. The following named parameters are supported: |
298 | |
359 | |
299 | =over 4 |
360 | =over 4 |
300 | |
361 | |
301 | =back |
362 | =back |
302 | |
363 | |
… | |
… | |
398 | Create a new "empty" perl interpreter process and returns its process |
459 | Create a new "empty" perl interpreter process and returns its process |
399 | object for further manipulation. |
460 | object for further manipulation. |
400 | |
461 | |
401 | The new process is forked from a template process that is kept around |
462 | The new process is forked from a template process that is kept around |
402 | for this purpose. When it doesn't exist yet, it is created by a call to |
463 | for this purpose. When it doesn't exist yet, it is created by a call to |
403 | C<new_exec> and kept around for future calls. |
464 | C<new_exec> first and then stays around for future calls. |
404 | |
405 | When the process object is destroyed, it will release the file handle |
406 | that connects it with the new process. When the new process has not yet |
407 | called C<run>, then the process will exit. Otherwise, what happens depends |
408 | entirely on the code that is executed. |
409 | |
465 | |
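The template idea described above (forking new processes from a long-lived, pre-initialised helper instead of from the main program) can be sketched in core Perl. This is an illustration under simplifying assumptions, not how `new` or `new_exec` are implemented. The template is created with plain fork rather than fork+exec, `List::Util` stands in for whatever modules you would preload, and workers are handled one at a time.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

socketpair(my $control, my $template_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $control, $template_end;

my $template_pid = fork;
die "fork: $!" unless defined $template_pid;

if ($template_pid == 0) {
    # Template process: do the expensive initialisation exactly once...
    close $control;
    require List::Util;

    # ...then fork one short-lived worker per request line.
    while (defined(my $n = <$template_end>)) {
        chomp $n;
        my $worker = fork;
        die "fork: $!" unless defined $worker;
        if ($worker == 0) {
            # Worker: List::Util is already loaded, inherited from the template.
            print {$template_end} "worker ", $$, ": sum=", List::Util::sum(1 .. $n), "\n";
            POSIX::_exit(0);
        }
        waitpid $worker, 0;   # one worker at a time keeps replies in order
    }
    POSIX::_exit(0);          # EOF: the main program no longer needs us
}

# Main program: ask the template for two workers, then shut it down.
close $template_end;
my @results;
for my $n (10, 100) {
    print {$control} "$n\n";
    push @results, scalar <$control>;
}
close $control;
waitpid $template_pid, 0;

print @results;
```

The main program never forks a worker itself. It only talks to the template, which is the property the text above is after.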
410 | =cut |
466 | =cut |
411 | |
467 | |
412 | sub new { |
468 | sub new { |
413 | my $class = shift; |
469 | my $class = shift; |
… | |
… | |
510 | Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and |
566 | Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and |
511 | L<AnyEvent::Fork::Template> are direct children, and you are responsible |
567 | L<AnyEvent::Fork::Template> are direct children, and you are responsible |
512 | to clean up their zombies when they die. |
568 | for cleaning up their zombies when they die. |
513 | |
569 | |
514 | All other processes are not direct children, and will be cleaned up by |
570 | All other processes are not direct children, and will be cleaned up by |
515 | AnyEvent::Fork. |
571 | AnyEvent::Fork itself. |
516 | |
572 | |
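For the direct children mentioned above, cleaning up their zombies means collecting their exit status with `waitpid`. Below is a minimal core-Perl sketch of a non-blocking reaper. The polling loop only keeps the example self-contained. In an AnyEvent program, a child watcher (`AnyEvent->child`) would normally do this job.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %exit_status;   # pid => exit code, filled in by the reaper

# Reap every child that has already exited, without blocking.
sub reap_children {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $? >> 8;
    }
}

my @pids;
for my $code (0, 3) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    POSIX::_exit($code) if $pid == 0;   # child: exit at once with a known code
    push @pids, $pid;
}

# Poll until both children have been reaped.
until (scalar(keys %exit_status) == @pids) {
    reap_children();
    select(undef, undef, undef, 0.01);   # sleep for 10ms between polls
}

print "$_ exited with status $exit_status{$_}\n" for @pids;
```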
517 | =cut |
573 | =cut |
518 | |
574 | |
519 | sub pid { |
575 | sub pid { |
520 | $_[0][0] |
576 | $_[0][0] |