--- AnyEvent-Fork/Fork.pm 2013/04/06 20:07:30 1.38 +++ AnyEvent-Fork/Fork.pm 2018/07/25 22:35:00 1.71 @@ -29,20 +29,45 @@ while still supporting specialised environments such as L or L. -=head1 WHAT THIS MODULE IS NOT +=head2 WHAT THIS MODULE IS NOT This module only creates processes and lets you pass file handles and strings to it, and run perl code. It does not implement any kind of RPC - there is no back channel from the process back to you, and there is no RPC or message passing going on. -If you need some form of RPC, you can either implement it yourself -in whatever way you like, use some message-passing module such -as L, some pipe such as L, use -L on both sides to send e.g. JSON or Storable messages, -and so on. +If you need some form of RPC, you could use the L +companion module, which adds simple RPC/job queueing to a process created +by this module. + +And if you need some automatic process pool management on top of +L, you can look at the L +companion module. + +Or you can implement it yourself in whatever way you like: use some +message-passing module such as L, some pipe such as +L, use L on both sides to send +e.g. JSON or Storable messages, and so on. + +=head2 COMPARISON TO OTHER MODULES + +There is an abundance of modules on CPAN that do "something fork", such as +L, L, L +or L. There are modules that implement their own +process management, such as L. + +The problems that all these modules try to solve are real, however, none +of them (from what I have seen) tackle the very real problems of unwanted +memory sharing, efficiency or not being able to use event processing, GUI +toolkits or similar modules in the processes they create. + +This module doesn't try to replace any of them - instead it tries to solve +the problem of creating processes with a minimum of fuss and overhead (and +also luxury). 
Ideally, most of these would use AnyEvent::Fork internally, +except they were written before AnyEvent::Fork was available, so obviously +had to roll their own. -=head1 PROBLEM STATEMENT +=head2 PROBLEM STATEMENT There are two traditional ways to implement parallel processing on UNIX like operating systems - fork and process, and fork+exec and process. They @@ -66,14 +91,25 @@ process. For example, modules or data files that are loaded will not use additional -memory after a fork. When exec'ing a new process, modules and data files -might need to be loaded again, at extra CPU and memory cost. But when -forking, literally all data structures are copied - if the program frees -them and replaces them by new data, the child processes will retain the -old version even if it isn't used, which can suddenly and unexpectedly -increase memory usage when freeing memory. +memory after a fork. Exec'ing a new process, in contrast, means modules +and data files might need to be loaded again, at extra CPU and memory +cost. + +But when forking, you still create a copy of your data structures - if +the program frees them and replaces them by new data, the child processes +will retain the old version even if it isn't used, which can suddenly and +unexpectedly increase memory usage when freeing memory. + +For example, L is an image viewer optimised for large +directories (millions of pictures). It also forks subprocesses for +thumbnail generation, which inherit the data structure that stores all +file information. If the user changes the directory, it gets freed in +the main process, leaving a copy in the thumbnailer processes. This can +lead to many times the memory usage that would actually be required. The +solution is to fork early (and being unable to dynamically generate more +subprocesses or do this from a module)... or to use L. 
-The trade-off is between more sharing with fork (which can be good or +There is a trade-off between more sharing with fork (which can be good or bad), and no sharing with exec. This module allows the main program to do a controlled fork, and allows @@ -88,7 +124,8 @@ =item Exec'ing a new perl process might be difficult. For example, it is not easy to find the correct path to the perl -interpreter - C<$^X> might not be a perl interpreter at all. +interpreter - C<$^X> might not be a perl interpreter at all. Worse, there +might not even be a perl binary installed on the system. This module tries hard to identify the correct path to the perl interpreter. With a cooperative main program, exec'ing the interpreter @@ -104,7 +141,8 @@ loaded. This module supports creating pre-initialised perl processes to be used as -a template for new processes. +a template for new processes at a later time, e.g. for use in a process +pool. =item Forking might be impossible when a program is running. @@ -112,8 +150,8 @@ multi-threaded program while doing anything useful in the child - in fact, if your perl program uses POSIX threads (even indirectly via e.g. L or L), you cannot call fork on the perl level -anymore without risking corruption issues on a number of operating -systems. +anymore without risking memory corruption or worse on a number of +operating systems. This module can safely fork helper processes at any time, by calling fork+exec in C, in a POSIX-compatible way (via L). @@ -142,6 +180,8 @@ =head1 EXAMPLES +This is where the wall of text ends and code speaks. + =head2 Create a single new process, tell it to run your worker function. AnyEvent::Fork @@ -162,7 +202,7 @@ my ($slave_filehandle) = @_; # now $slave_filehandle is connected to the $master_filehandle - # in the original prorcess. have fun! + # in the original process. have fun! } =head2 Create a pool of server processes all accepting on the same socket. 
@@ -205,7 +245,7 @@ =head2 use AnyEvent::Fork as a faster fork+exec -This runs C, with stdandard output redirected to /tmp/log +This runs C, with standard output redirected to F and standard error redirected to the communications socket. It is usually faster than fork+exec, but still lets you prepare the environment. @@ -214,6 +254,7 @@ AnyEvent::Fork ->new ->eval (' + # compile a helper function for later use sub run { my ($fh, $output, @cmd) = @_; @@ -230,11 +271,84 @@ my $stderr = $cv->recv; +=head2 For stingy users: put the worker code into a C section. + +When you want to be stingy with files, you can put your code into the +C section of your module (or program): + + use AnyEvent::Fork; + + AnyEvent::Fork + ->new + ->eval (do { local $/; <DATA> }) + ->run ("doit", sub { ... }); + + __DATA__ + + sub doit { + ... do something! + } + +=head2 For stingy standalone programs: do not rely on external files at +all. + +For single-file scripts it can be inconvenient to rely on external +files - even when using a C section, you still need to C an +external perl interpreter, which might not be available when using +L, L or L for example. + +Two modules help here - L forks a template process +for all further calls to C, and L +forks the main program as a template process. + +Here is what your main program should look like: + + #! perl + + # optional, as the very first thing. + # in case modules want to create their own processes. + use AnyEvent::Fork::Early; + + # next, load all modules you need in your template process + use Example::My::Module; + use Example::Whatever; + + # next, put your run function definition and anything else you + # need, but do not use code outside of BEGIN blocks. + sub worker_run { + my ($fh, @args) = @_; + ... + } + + # now preserve everything so far as an AnyEvent::Fork object + # in $TEMPLATE. 
+ use AnyEvent::Fork::Template; + + # do not put code outside of BEGIN blocks until here + + # now use the $TEMPLATE process in any way you like + + # for example: create 10 worker processes + my @worker; + my $cv = AE::cv; + for (1..10) { + $cv->begin; + $TEMPLATE->fork->send_arg ($_)->run ("worker_run", sub { + push @worker, shift; + $cv->end; + }); + } + $cv->recv; + =head1 CONCEPTS This module can create new processes either by executing a new perl process, or by forking from an existing "template" process. +All these processes are called "child processes" (whether they are direct +children or not), while the process that manages them is called the +"parent process". + Each such process comes with its own file handle that can be used to communicate with it (it's actually a socket - one end in the new process, one end in the main process), and among the things you can do in it are @@ -353,15 +467,7 @@ use IO::FDPass; -our $VERSION = 0.5; - -our $PERL; # the path to the perl interpreter, deduces with various forms of magic - -=over 4 - -=back - -=cut +our $VERSION = 1.31; # the early fork template process our $EARLY; @@ -369,69 +475,78 @@ # the empty template process our $TEMPLATE; +sub QUEUE() { 0 } +sub FH() { 1 } +sub WW() { 2 } +sub PID() { 3 } +sub CB() { 4 } + +sub _new { + my ($self, $fh, $pid) = @_; + + AnyEvent::Util::fh_nonblocking $fh, 1; + + $self = bless [ + [], # write queue - strings or fd's + $fh, + undef, # AE watcher + $pid, + ], $self; + + $self +} + sub _cmd { my $self = shift; # ideally, we would want to use "a (w/a)*" as format string, but perl # versions from at least 5.8.9 to 5.16.3 are all buggy and can't unpack # it. - push @{ $self->[2] }, pack "a L/a*", $_[0], $_[1]; + push @{ $self->[QUEUE] }, pack "a L/a*", $_[0], $_[1]; - $self->[3] ||= AE::io $self->[1], 1, sub { + $self->[WW] ||= AE::io $self->[FH], 1, sub { do { # send the next "thing" in the queue - either a reference to an fh, # or a plain string. 
- if (ref $self->[2][0]) { + if (ref $self->[QUEUE][0]) { # send fh - unless (IO::FDPass::send fileno $self->[1], fileno ${ $self->[2][0] }) { + unless (IO::FDPass::send fileno $self->[FH], fileno ${ $self->[QUEUE][0] }) { return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK; - undef $self->[3]; + undef $self->[WW]; die "AnyEvent::Fork: file descriptor send failure: $!"; } - shift @{ $self->[2] }; + shift @{ $self->[QUEUE] }; } else { # send string - my $len = syswrite $self->[1], $self->[2][0]; + my $len = syswrite $self->[FH], $self->[QUEUE][0]; unless ($len) { return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK; - undef $self->[3]; + undef $self->[WW]; die "AnyEvent::Fork: command write failure: $!"; } - substr $self->[2][0], 0, $len, ""; - shift @{ $self->[2] } unless length $self->[2][0]; + substr $self->[QUEUE][0], 0, $len, ""; + shift @{ $self->[QUEUE] } unless length $self->[QUEUE][0]; } - } while @{ $self->[2] }; + } while @{ $self->[QUEUE] }; # everything written - undef $self->[3]; + undef $self->[WW]; # invoke run callback, if any - $self->[4]->($self->[1]) if $self->[4]; + if ($self->[CB]) { + $self->[CB]->($self->[FH]); + @$self = (); + } }; () # make sure we don't leak the watcher } -sub _new { - my ($self, $fh, $pid) = @_; - - AnyEvent::Util::fh_nonblocking $fh, 1; - - $self = bless [ - $pid, - $fh, - [], # write queue - strings or fd's - undef, # AE watcher - ], $self; - - $self -} - # fork template from current process, used by AnyEvent::Fork::Early/Template sub _new_fork { my ($fh, $slave) = AnyEvent::Util::portable_socketpair; @@ -443,8 +558,7 @@ require AnyEvent::Fork::Serve; $AnyEvent::Fork::Serve::OWNER = $parent; close $fh; - $0 = "$_[1] of $parent"; - $SIG{CHLD} = 'IGNORE'; + $0 = "$parent AnyEvent::Fork/exec"; AnyEvent::Fork::Serve::serve ($slave); exit 0; } elsif (!$pid) { @@ -508,31 +622,41 @@ process around is unacceptable. 
The path to the perl interpreter is divined using various methods - first -C<$^X> is investigated to see if the path ends with something that sounds +C<$^X> is investigated to see if the path ends with something that looks as if it were the perl interpreter. Failing this, the module falls back to using C<$Config::Config{perlpath}>. +The path to perl can also be overridden by setting the global variable +C<$AnyEvent::Fork::PERL> - its value will be used for all subsequent +invocations. + =cut +our $PERL; + sub new_exec { my ($self) = @_; return $EARLY->fork if $EARLY; - # first find path of perl - my $perl = $; + unless (defined $PERL) { + # first find path of perl + my $perl = $^X; + + # first we try $^X, but the path must be absolute (always on win32), and end in sth. + # that looks like perl. this obviously only works for posix and win32 + unless ( + ($^O eq "MSWin32" || $perl =~ m%^/%) + && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i + ) { + # if it doesn't look perlish enough, try Config + require Config; + $perl = $Config::Config{perlpath}; + $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/; + } - # first we try $^X, but the path must be absolute (always on win32), and end in sth. - # that looks like perl. this obviously only works for posix and win32 - unless ( - ($^O eq "MSWin32" || $perl =~ m%^/%) - && $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i - ) { - # if it doesn't look perlish enough, try Config - require Config; - $perl = $Config::Config{perlpath}; - $perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/; + $PERL = $perl; } require Proc::FastSpawn; @@ -550,8 +674,8 @@ $env{PERL5LIB} = join +($^O eq "MSWin32" ? 
";" : ":"), grep !ref, @INC; my $pid = Proc::FastSpawn::spawn ( - $perl, - ["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$], + $PERL, + [$PERL, "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$], [map "$_=$env{$_}", keys %env], ) or die "unable to spawn AnyEvent::Fork server: $!"; @@ -561,24 +685,25 @@ =item $pid = $proc->pid Returns the process id of the process I, and C otherwise. - -Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and -L are direct children, and you are responsible -to clean up their zombies when they die. - -All other processes are not direct children, and will be cleaned up by -AnyEvent::Fork itself. +process running AnyEvent::Fork>, and C otherwise. As a general +rule (that you cannot rely upon), processes created via C, +L or L are direct +children, while all other processes are not. + +Or in other words, you do not normally have to take care of zombies for +processes created via C, but when in doubt, or zombies are a problem, +you need to check whether a process is a diretc child by calling this +method, and possibly creating a child watcher or reap it manually. =cut sub pid { - $_[0][0] + $_[0][PID] } =item $proc = $proc->eval ($perlcode, @args) -Evaluates the given C<$perlcode> as ... perl code, while setting C<@_> to +Evaluates the given C<$perlcode> as ... Perl code, while setting C<@_> to the strings specified by C<@args>, in the "main" package. This call is meant to do any custom initialisation that might be required @@ -597,6 +722,12 @@ Returns the process object for easy chaining of method calls. +It's common to want to call an iniitalisation function with some +arguments. 
Make sure you actually pass C<@_> to that function (for example +by using C<&name> syntax), and do not just specify a function name: + + $proc->eval ('&MyModule::init', $string1, $string2); + =cut sub eval { @@ -650,7 +781,7 @@ for my $fh (@fh) { $self->_cmd ("h"); - push @{ $self->[2] }, \$fh; + push @{ $self->[QUEUE] }, \$fh; } $self @@ -710,6 +841,32 @@ event on it, because exiting the process closes the socket (if it didn't create any children using fork). +=over 4 + +=item Compatibility to L + +If you want to write code that works with both this module and +L, you need to write your code so that it assumes +there are two file handles for communications, which might not be unix +domain sockets. The C function should start like this: + + sub run { + my ($rfh, @args) = @_; # @args are your normal arguments + my $wfh = fileno $rfh ? $rfh : *STDOUT; + + # now use $rfh for reading and $wfh for writing + } + +This checks whether the passed file handle is, in fact, the process +C handle. If it is, then the function was invoked via +L, so STDIN should be used for reading and +C should be used for writing. + +In all other cases, the function was called via this module, and there is +only one file handle that should be used for reading and writing. + +=back + Example: create a template for a process pool, pass a few strings, some file handles, then fork, pass one more string, and run some code. @@ -746,12 +903,83 @@ sub run { my ($self, $func, $cb) = @_; - $self->[4] = $cb; + $self->[CB] = $cb; $self->_cmd (r => $func); } =back + +=head2 CHILD PROCESS INTERFACE + +This module has a limited API for use in child processes. + +=over 4 + +=item @args = AnyEvent::Fork::Serve::run_args + +This function, which only exists before the C method is called, +returns the arguments that would be passed to the run function, and clears +them. + +This is mainly useful to get any file handles passed via C, but +works for any arguments passed via C<< send_I >> methods. 
+ +=back + + +=head2 EXPERIMENTAL METHODS + +These methods might go away completely or change behaviour at any time. + +=over 4 + +=item $proc->to_fh ($cb->($fh)) # EXPERIMENTAL, MIGHT BE REMOVED + +Flushes all commands out to the process and then calls the callback with +the communications socket. + +The process object becomes unusable on return from this function - any +further method calls result in undefined behaviour. + +The point of this method is to give you a file handle that you can pass +to another process. In that other process, you can call C to create a new C object from it, +thereby effectively passing a fork object to another process. + +=cut + +sub to_fh { + my ($self, $cb) = @_; + + $self->[CB] = $cb; + + unless ($self->[WW]) { + $self->[CB]->($self->[FH]); + @$self = (); + } +} + +=item new_from_fh AnyEvent::Fork $fh # EXPERIMENTAL, MIGHT BE REMOVED + +Takes a file handle originally received by the C method and creates +a new C object. The child process itself will not change in +any way, i.e. it will keep all the modifications done to it before calling +C. + +The new object is very much like the original object, except that the +C method will return C even if the process is a direct child. + +=cut + +sub new_from_fh { + my ($class, $fh) = @_; + + $class->_new ($fh) +} + +=back + =head1 PERFORMANCE Now for some unscientific benchmark numbers (all done on an amd64 @@ -767,7 +995,7 @@ Then I did the same thing, but instead of calling fork, I called AnyEvent::Fork->new->run ("CORE::exit") and then again waited for the -socket form the child to close on exit. This does the same thing as manual +socket from the child to close on exit. This does the same thing as manual socket pair + fork, except that what is forked is the template process (2440kB), and the socket needs to be passed to the server at the other end of the socket first. 
@@ -784,7 +1012,7 @@ The difference is simply the process size: forking the 5MB process takes so much longer than forking the 2.5MB template process that the extra -overhead introduced is canceled out. +overhead is canceled out. If the benchmark process grows, the normal fork becomes even slower: @@ -854,7 +1082,7 @@ =item exiting calls object destructors This only applies to users of L and -L, or when initialiasing code creates objects +L, or when initialising code creates objects that reference external resources. When a process created by AnyEvent::Fork exits, it might do so by calling @@ -884,19 +1112,53 @@ useful that you can do with it without running into memory corruption issues or other braindamage. Hrrrr. +Since fork is endlessly broken on win32 perls (it doesn't even remotely +work within its documented limits) and quite obviously it's not getting +improved any time soon, the best way to proceed on windows would be to +always use C and thus never rely on perl's fork "emulation". + Cygwin perl is not supported at the moment due to some hilarious -shortcomings of its API - see L for more details. +shortcomings of its API - see L for more details. If you never +use C and always use C to create processes, it should +work though. + +=head1 USING AnyEvent::Fork IN SUBPROCESSES + +AnyEvent::Fork itself cannot generally be used in subprocesses. As long as +only one process ever forks new processes, sharing the template processes +is possible (you could use a pipe as a lock by writing a byte into it to +unlock, and reading the byte to lock, for example). + +To make concurrent calls possible after fork, you should get rid of the +template and early fork processes. AnyEvent::Fork will create a new +template process as needed. + + undef $AnyEvent::Fork::EARLY; + undef $AnyEvent::Fork::TEMPLATE; + +It doesn't matter whether you get rid of them in the parent or child after +a fork. 
=head1 SEE ALSO -L (to avoid executing a perl interpreter), -L (to create a process by forking the main -program at a convenient time). +L, to avoid executing a perl interpreter at all +(part of this distribution). + +L, to create a process by forking the main +program at a convenient time (part of this distribution). + +L, for another way to create processes that is +mostly compatible with this module and modules building on top of it, but +works better with remote processes. + +L, for simple RPC to child processes (on CPAN). + +L, for a simple worker process pool (on CPAN). -=head1 AUTHOR +=head1 AUTHOR AND CONTACT INFORMATION Marc Lehmann - http://home.schmorp.de/ + http://software.schmorp.de/pkg/AnyEvent-Fork =cut