--- AnyEvent-Fork/Fork.pm 2013/04/06 08:32:23 1.24
+++ AnyEvent-Fork/Fork.pm 2013/04/06 08:55:16 1.25
@@ -42,6 +42,98 @@
 L on both sides to send e.g. JSON or Storable messages, and so on.

+=head1 PROBLEM STATEMENT
+
+There are two traditional ways to implement parallel processing on UNIX
+like operating systems - fork and process, and fork+exec and process. They
+have different advantages and disadvantages that I describe below,
+together with how this module tries to mitigate the disadvantages.
+
+=over 4
+
+=item Forking from a big process can be very slow.
+
+A 5GB process needs 0.05s to fork on my 3.6GHz amd64 GNU/Linux box. This
+overhead is often shared with exec (because you have to fork first), but
+in some circumstances (e.g. when vfork is used), fork+exec can be much
+faster.
+
+This module can help here by telling a small(er) helper process to fork,
+which is faster then forking the main process, and also uses vfork where
+possible. This gives the speed of vfork, with the flexibility of fork.
+
+=item Forking usually creates a copy-on-write copy of the parent
+process.
+
+For example, modules or data files that are loaded will not use additional
+memory after a fork. When exec'ing a new process, modules and data files
+might need to be loaded again, at extra CPU and memory cost. But when
+forking, literally all data structures are copied - if the program frees
+them and replaces them by new data, the child processes will retain the
+old version even if it isn't used, which can suddenly and unexpectedly
+increase memory usage when freeing memory.
+
+The trade-off is between more sharing with fork (which can be good or
+bad), and no sharing with exec.
+
+This module allows the main program to do a controlled fork, and allows
+modules to exec processes safely at any time. When creating a custom
+process pool you can take advantage of data sharing via fork without
+risking to share large dynamic data structures that will blow up child
+memory usage.
+
+In other words, this module puts you into control over what is being
+shared and what isn't, at all times.
+
+=item Exec'ing a new perl process might be difficult.
+
+For example, it is not easy to find the correct path to the perl
+interpreter - C<$^X> might not be a perl interpreter at all.
+
+This module tries hard to identify the correct path to the perl
+interpreter. With a cooperative main program, exec'ing the interpreter
+might not even be necessary, but even without help from the main program,
+it will still work when used from a module.
+
+=item Exec'ing a new perl process might be slow, as all necessary modules
+have to be loaded from disk again, with no guarantees of success.
+
+Long running processes might run into problems when perl is upgraded
+and modules are no longer loadable because they refer to a different
+perl version, or parts of a distribution are newer than the ones already
+loaded.
+
+This module supports creating pre-initialised perl processes to be used as
+a template for new processes.
+
+=item Forking might be impossible when a program is running.
+
+For example, POSIX makes it almost impossible to fork from a
+multi-threaded program while doing anything useful in the child - in
+fact, if your perl program uses POSIX threads (even indirectly via
+e.g. L or L), you cannot call fork on the perl level
+anymore without risking corruption issues on a number of operating
+systems.
+
+This module can safely fork helper processes at any time, by calling
+fork+exec in C, in a POSIX-compatible way (via L).
+
+=item Parallel processing with fork might be inconvenient or difficult
+to implement. Modules might not work in both parent and child.
+
+For example, when a program uses an event loop and creates watchers it
+becomes very hard to use the event loop from a child program, as the
+watchers already exist but are only meaningful in the parent. Worse, a
+module might want to use such a module, not knowing whether another module
+or the main program also does, leading to problems.
+
+With this module only the main program is allowed to create new processes
+by forking (because only the main program can know when it is still safe
+to do so) - all other processes are created via fork+exec, which makes it
+possible to use modules such as event loops or window interfaces safely.
+
+=back
+
 =head1 EXAMPLES

 =head2 Create a single new process, tell it to run your worker function.
@@ -126,70 +218,6 @@

 my $stderr = $cv->recv;

-=head1 PROBLEM STATEMENT
-
-There are two ways to implement parallel processing on UNIX like operating
-systems - fork and process, and fork+exec and process. They have different
-advantages and disadvantages that I describe below, together with how this
-module tries to mitigate the disadvantages.
-
-=over 4
-
-=item Forking from a big process can be very slow (a 5GB process needs
-0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead
-is often shared with exec (because you have to fork first), but in some
-circumstances (e.g. when vfork is used), fork+exec can be much faster.
-
-This module can help here by telling a small(er) helper process to fork,
-or fork+exec instead.
-
-=item Forking usually creates a copy-on-write copy of the parent
-process. Memory (for example, modules or data files that have been
-will not take additional memory). When exec'ing a new process, modules
-and data files might need to be loaded again, at extra CPU and memory
-cost. Likewise when forking, all data structures are copied as well - if
-the program frees them and replaces them by new data, the child processes
-will retain the memory even if it isn't used.
-
-This module allows the main program to do a controlled fork, and allows
-modules to exec processes safely at any time. When creating a custom
-process pool you can take advantage of data sharing via fork without
-risking to share large dynamic data structures that will blow up child
-memory usage.
-
-=item Exec'ing a new perl process might be difficult and slow. For
-example, it is not easy to find the correct path to the perl interpreter,
-and all modules have to be loaded from disk again. Long running processes
-might run into problems when perl is upgraded for example.
-
-This module supports creating pre-initialised perl processes to be used
-as template, and also tries hard to identify the correct path to the perl
-interpreter. With a cooperative main program, exec'ing the interpreter
-might not even be necessary.
-
-=item Forking might be impossible when a program is running. For example,
-POSIX makes it almost impossible to fork from a multi-threaded program and
-do anything useful in the child - strictly speaking, if your perl program
-uses posix threads (even indirectly via e.g. L or L),
-you cannot call fork on the perl level anymore, at all.
-
-This module can safely fork helper processes at any time, by calling
-fork+exec in C, in a POSIX-compatible way.
-
-=item Parallel processing with fork might be inconvenient or difficult
-to implement. For example, when a program uses an event loop and creates
-watchers it becomes very hard to use the event loop from a child
-program, as the watchers already exist but are only meaningful in the
-parent. Worse, a module might want to use such a system, not knowing
-whether another module or the main program also does, leading to problems.
-
-This module only lets the main program create pools by forking (because
-only the main program can know when it is still safe to do so) - all other
-pools are created by fork+exec, after which such modules can again be
-loaded.
-
-=back
-
 =head1 CONCEPTS

 This module can create new processes either by executing a new perl