… | |
… | |
39 | If you need some form of RPC, you can either implement it yourself |
40 | in whatever way you like, use some message-passing module such |
41 | as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use |
42 | L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages, |
43 | and so on. |
|
|
44 | |
|
|
45 | =head1 PROBLEM STATEMENT |
|
|
46 | |
|
|
47 | There are two traditional ways to implement parallel processing on |
|
|
48 | UNIX-like operating systems - fork and process, and fork+exec and process. They |
|
|
49 | have different advantages and disadvantages that I describe below, |
|
|
50 | together with how this module tries to mitigate the disadvantages. |
|
|
51 | |
|
|
52 | =over 4 |
|
|
53 | |
|
|
54 | =item Forking from a big process can be very slow. |
|
|
55 | |
|
|
56 | A 5GB process needs 0.05s to fork on my 3.6GHz amd64 GNU/Linux box. This |
|
|
57 | overhead is often shared with exec (because you have to fork first), but |
|
|
58 | in some circumstances (e.g. when vfork is used), fork+exec can be much |
|
|
59 | faster. |
|
|
60 | |
|
|
61 | This module can help here by telling a small(er) helper process to fork, |
|
|
62 | which is faster than forking the main process, and also uses vfork where |
|
|
63 | possible. This gives the speed of vfork, with the flexibility of fork. |
|
|
64 | |
|
|
65 | =item Forking usually creates a copy-on-write copy of the parent |
|
|
66 | process. |
|
|
67 | |
|
|
68 | For example, modules or data files that are loaded will not use additional |
|
|
69 | memory after a fork. When exec'ing a new process, modules and data files |
|
|
70 | might need to be loaded again, at extra CPU and memory cost. But when |
|
|
71 | forking, literally all data structures are copied - if the program frees |
|
|
72 | them and replaces them by new data, the child processes will retain the |
|
|
73 | old version even if it isn't used, which can suddenly and unexpectedly |
|
|
74 | increase memory usage when freeing memory. |
|
|
75 | |
|
|
76 | The trade-off is between more sharing with fork (which can be good or |
|
|
77 | bad), and no sharing with exec. |
|
|
78 | |
|
|
79 | This module allows the main program to do a controlled fork, and allows |
|
|
80 | modules to exec processes safely at any time. When creating a custom |
|
|
81 | process pool you can take advantage of data sharing via fork without |
|
|
82 | risking sharing large dynamic data structures that will blow up child |
|
|
83 | memory usage. |
|
|
84 | |
|
|
85 | In other words, this module puts you in control of what is being |
|
|
86 | shared and what isn't, at all times. |
|
|
87 | |
|
|
88 | =item Exec'ing a new perl process might be difficult. |
|
|
89 | |
|
|
90 | For example, it is not easy to find the correct path to the perl |
|
|
91 | interpreter - C<$^X> might not be a perl interpreter at all. |
|
|
92 | |
|
|
93 | This module tries hard to identify the correct path to the perl |
|
|
94 | interpreter. With a cooperative main program, exec'ing the interpreter |
|
|
95 | might not even be necessary, but even without help from the main program, |
|
|
96 | it will still work when used from a module. |
|
|
97 | |
|
|
98 | =item Exec'ing a new perl process might be slow, as all necessary modules |
|
|
99 | have to be loaded from disk again, with no guarantees of success. |
|
|
100 | |
|
|
101 | Long-running processes might run into problems when perl is upgraded |
|
|
102 | and modules are no longer loadable because they refer to a different |
|
|
103 | perl version, or parts of a distribution are newer than the ones already |
|
|
104 | loaded. |
|
|
105 | |
|
|
106 | This module supports creating pre-initialised perl processes to be used as |
|
|
107 | a template for new processes. |
|
|
108 | |
|
|
109 | =item Forking might be impossible when a program is running. |
|
|
110 | |
|
|
111 | For example, POSIX makes it almost impossible to fork from a |
|
|
112 | multi-threaded program while doing anything useful in the child - in |
|
|
113 | fact, if your perl program uses POSIX threads (even indirectly via |
|
|
114 | e.g. L<IO::AIO> or L<threads>), you cannot call fork on the perl level |
|
|
115 | anymore without risking corruption issues on a number of operating |
|
|
116 | systems. |
|
|
117 | |
|
|
118 | This module can safely fork helper processes at any time, by calling |
|
|
119 | fork+exec in C, in a POSIX-compatible way (via L<Proc::FastSpawn>). |
|
|
120 | |
|
|
121 | =item Parallel processing with fork might be inconvenient or difficult |
|
|
122 | to implement. Modules might not work in both parent and child. |
|
|
123 | |
|
|
124 | For example, when a program uses an event loop and creates watchers it |
|
|
125 | becomes very hard to use the event loop from a child program, as the |
|
|
126 | watchers already exist but are only meaningful in the parent. Worse, a |
|
|
127 | module might want to use such an event loop, not knowing whether another module |
|
|
128 | or the main program also does, leading to problems. |
|
|
129 | |
|
|
130 | With this module only the main program is allowed to create new processes |
|
|
131 | by forking (because only the main program can know when it is still safe |
|
|
132 | to do so) - all other processes are created via fork+exec, which makes it |
|
|
133 | possible to use modules such as event loops or window interfaces safely. |
|
|
134 | |
|
|
135 | =back |
44 | |
136 | |
45 | =head1 EXAMPLES |
137 | =head1 EXAMPLES |
46 | |
138 | |
47 | =head2 Create a single new process, tell it to run your worker function. |
139 | =head2 Create a single new process, tell it to run your worker function. |
48 | |
140 | |
… | |
… | |
123 | ->send_fh ($output) |
215 | ->send_fh ($output) |
124 | ->send_arg ("/bin/echo", "hi") |
216 | ->send_arg ("/bin/echo", "hi") |
125 | ->run ("run", my $cv = AE::cv); |
217 | ->run ("run", my $cv = AE::cv); |
126 | |
218 | |
127 | my $stderr = $cv->recv; |
219 | my $stderr = $cv->recv; |
128 | |
|
|
129 | =head1 PROBLEM STATEMENT |
|
|
130 | |
|
|
131 | There are two ways to implement parallel processing on UNIX-like operating |
|
|
132 | systems - fork and process, and fork+exec and process. They have different |
|
|
133 | advantages and disadvantages that I describe below, together with how this |
|
|
134 | module tries to mitigate the disadvantages. |
|
|
135 | |
|
|
136 | =over 4 |
|
|
137 | |
|
|
138 | =item Forking from a big process can be very slow (a 5GB process needs |
|
|
139 | 0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead |
|
|
140 | is often shared with exec (because you have to fork first), but in some |
|
|
141 | circumstances (e.g. when vfork is used), fork+exec can be much faster. |
|
|
142 | |
|
|
143 | This module can help here by telling a small(er) helper process to fork, |
|
|
144 | or fork+exec instead. |
|
|
145 | |
|
|
146 | =item Forking usually creates a copy-on-write copy of the parent |
|
|
147 | process. Memory (for example, modules or data files that have been |
|
|
148 | loaded) will not take additional memory. When exec'ing a new process, modules |
|
|
149 | and data files might need to be loaded again, at extra CPU and memory |
|
|
150 | cost. Likewise when forking, all data structures are copied as well - if |
|
|
151 | the program frees them and replaces them by new data, the child processes |
|
|
152 | will retain the memory even if it isn't used. |
|
|
153 | |
|
|
154 | This module allows the main program to do a controlled fork, and allows |
|
|
155 | modules to exec processes safely at any time. When creating a custom |
|
|
156 | process pool you can take advantage of data sharing via fork without |
|
|
157 | risking sharing large dynamic data structures that will blow up child |
|
|
158 | memory usage. |
|
|
159 | |
|
|
160 | =item Exec'ing a new perl process might be difficult and slow. For |
|
|
161 | example, it is not easy to find the correct path to the perl interpreter, |
|
|
162 | and all modules have to be loaded from disk again. Long-running processes |
|
|
163 | might run into problems when perl is upgraded for example. |
|
|
164 | |
|
|
165 | This module supports creating pre-initialised perl processes to be used |
|
|
166 | as template, and also tries hard to identify the correct path to the perl |
|
|
167 | interpreter. With a cooperative main program, exec'ing the interpreter |
|
|
168 | might not even be necessary. |
|
|
169 | |
|
|
170 | =item Forking might be impossible when a program is running. For example, |
|
|
171 | POSIX makes it almost impossible to fork from a multi-threaded program and |
|
|
172 | do anything useful in the child - strictly speaking, if your perl program |
|
|
173 | uses POSIX threads (even indirectly via e.g. L<IO::AIO> or L<threads>), |
|
|
174 | you cannot call fork on the perl level anymore, at all. |
|
|
175 | |
|
|
176 | This module can safely fork helper processes at any time, by calling |
|
|
177 | fork+exec in C, in a POSIX-compatible way. |
|
|
178 | |
|
|
179 | =item Parallel processing with fork might be inconvenient or difficult |
|
|
180 | to implement. For example, when a program uses an event loop and creates |
|
|
181 | watchers it becomes very hard to use the event loop from a child |
|
|
182 | program, as the watchers already exist but are only meaningful in the |
|
|
183 | parent. Worse, a module might want to use such a system, not knowing |
|
|
184 | whether another module or the main program also does, leading to problems. |
|
|
185 | |
|
|
186 | This module only lets the main program create pools by forking (because |
|
|
187 | only the main program can know when it is still safe to do so) - all other |
|
|
188 | pools are created by fork+exec, after which such modules can again be |
|
|
189 | loaded. |
|
|
190 | |
|
|
191 | =back |
|
|
192 | |
220 | |
193 | =head1 CONCEPTS |
221 | =head1 CONCEPTS |
194 | |
222 | |
195 | This module can create new processes either by executing a new perl |
223 | This module can create new processes either by executing a new perl |
196 | process, or by forking from an existing "template" process. |
224 | process, or by forking from an existing "template" process. |