[ViewVC] Diff of: cvs/AnyEvent-Fork/Fork.pm

Comparing AnyEvent-Fork/Fork.pm (file contents):
Revision 1.31 by root, Sat Apr 6 09:29:26 2013 UTC vs.
Revision 1.49 by root, Fri Apr 19 12:56:53 2013 UTC

 Special care has been taken to make this module useful from other modules,
 while still supporting specialised environments such as L<App::Staticperl>
 or L<PAR::Packer>.
-=head1 WHAT THIS MODULE IS NOT
+=head2 WHAT THIS MODULE IS NOT
 This module only creates processes and lets you pass file handles and
 strings to it, and run perl code. It does not implement any kind of RPC -
 there is no back channel from the process back to you, and there is no RPC
 or message passing going on.
-If you need some form of RPC, you can either implement it yourself
+If you need some form of RPC, you could use the L<AnyEvent::Fork::RPC>
-in whatever way you like, use some message-passing module such
+companion module, which adds simple RPC/job queueing to a process created
-as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use
+by this module.
-L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages,
-and so on.
+Or you can implement it yourself in whatever way you like, use some
+message-passing module such as L<AnyEvent::MP>, some pipe such as
+L<AnyEvent::ZeroMQ>, use L<AnyEvent::Handle> on both sides to send
+e.g. JSON or Storable messages, and so on.
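A hand-rolled back channel of the kind mentioned above might look roughly like the following. This is only a sketch using core modules (plain C<fork>, C<socketpair> and L<JSON::PP>), not AnyEvent::Fork itself; the helper name C<rpc_roundtrip> and the one-JSON-document-per-line framing are assumptions made for illustration.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use JSON::PP;
use POSIX ();

# hypothetical helper: send one request to a forked child, return its reply
sub rpc_roundtrip {
   my ($request) = @_;

   socketpair (my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
      or die "socketpair: $!";
   $_->autoflush (1) for $parent_fh, $child_fh;

   my $pid = fork;
   die "fork: $!" unless defined $pid;

   unless ($pid) {
      # child: read one JSON line, sum the arguments, answer with one JSON line
      close $parent_fh;
      my $line = <$child_fh>;
      my $msg  = decode_json ($line);
      my $sum  = 0;
      $sum += $_ for @{ $msg->{args} };
      print $child_fh encode_json ({ result => $sum }), "\n";
      close $child_fh;
      POSIX::_exit (0);
   }

   # parent: send the request, wait for the single reply line
   close $child_fh;
   print $parent_fh encode_json ($request), "\n";
   my $line  = <$parent_fh>;
   my $reply = decode_json ($line);
   close $parent_fh;
   waitpid $pid, 0;

   $reply
}

my $reply = rpc_roundtrip ({ method => "sum", args => [1, 2, 3] });
print "result: $reply->{result}\n";
```

With AnyEvent::Fork the socket pair and the process creation would be handled by the module, and only the message framing on both ends would remain your job.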
+=head2 COMPARISON TO OTHER MODULES
+There is an abundance of modules on CPAN that do "something fork", such as
+L<Parallel::ForkManager>, L<AnyEvent::ForkManager>, L<AnyEvent::Worker>
+or L<AnyEvent::Subprocess>. There are modules that implement their own
+process management, such as L<AnyEvent::DBI>.
+The problems that all these modules try to solve are real. However, none
+of them (from what I have seen) tackle the very real problems of unwanted
+memory sharing, efficiency, and not being able to use event processing or
+similar modules in the processes they create.
+This module doesn't try to replace any of them - instead it tries to solve
+the problem of creating processes with a minimum of fuss and overhead (and
+also luxury). Ideally, most of these would use AnyEvent::Fork internally,
+except they were written before AnyEvent::Fork was available, so obviously
+had to roll their own.
-=head1 PROBLEM STATEMENT
+=head2 PROBLEM STATEMENT
 There are two traditional ways to implement parallel processing on UNIX
 like operating systems - fork and process, and fork+exec and process. They
 have different advantages and disadvantages that I describe below,
 together with how this module tries to mitigate the disadvantages.
          # now $master_filehandle is connected to the
          # $slave_filehandle in the new process.
       });
-MyModule might look like this:
+C<MyModule> might look like this:
    package MyModule;
    sub worker {
       my ($slave_filehandle) = @_;
    }
    # now do other things - maybe use the filehandle provided by run
    # to wait for the processes to die. or whatever.
-My::Server might look like this:
+C<My::Server> might look like this:
    package My::Server;
    sub run {
       my ($slave, $listener, $id) = @_;
       }
    }
 =head2 use AnyEvent::Fork as a faster fork+exec
-This runs /bin/echo hi, with stdout redirected to /tmp/log and stderr to
+This runs C</bin/echo hi>, with standard output redirected to F</tmp/log>
-the communications socket. It is usually faster than fork+exec, but still
+and standard error redirected to the communications socket. It is usually
-let's you prepare the environment.
+faster than fork+exec, but still lets you prepare the environment.
    open my $output, ">/tmp/log" or die "$!";
    AnyEvent::Fork
       ->new
       ->eval ('
+           # compile a helper function for later use
            sub run {
               my ($fh, $output, @cmd) = @_;
               # perl will clear close-on-exec on STDOUT/STDERR
               open STDOUT, ">&", $output or die;
 =head1 CONCEPTS
 This module can create new processes either by executing a new perl
 process, or by forking from an existing "template" process.
+All these processes are called "child processes" (whether they are direct
+children or not), while the process that manages them is called the
+"parent process".
 Each such process comes with its own file handle that can be used to
 communicate with it (it's actually a socket - one end in the new process,
 one end in the main process), and among the things you can do in it are
 load modules, fork new processes, send file handles to it, and execute
 use AnyEvent;
 use AnyEvent::Util ();
 use IO::FDPass;
-our $VERSION = 0.5;
+our $VERSION = 0.7;
-our $PERL; # the path to the perl interpreter, deduces with various forms of magic
-=over 4
-=back
-=cut
 # the early fork template process
 our $EARLY;
 # the empty template process
 our $TEMPLATE;
+sub QUEUE() { 0 }
+sub FH()    { 1 }
+sub WW()    { 2 }
+sub PID()   { 3 }
+sub CB()    { 4 }
+sub _new {
+   my ($self, $fh, $pid) = @_;
+   AnyEvent::Util::fh_nonblocking $fh, 1;
+   $self = bless [
+      [],    # write queue - strings or fd's
+      $fh,
+      undef, # AE watcher
+      $pid,
+   ], $self;
+   $self
+}
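The named slot constants introduced above replace the bare numeric indices used in the earlier revision. As a minimal standalone sketch of the same technique (the class C<My::Proc> and its methods are hypothetical and not part of AnyEvent::Fork):

```perl
use strict;
use warnings;

package My::Proc;   # hypothetical class, for illustration only

# subs with an empty prototype and a constant body are inlined by perl,
# so $self->[PID] costs the same as $self->[2]
sub QUEUE() { 0 }
sub FH()    { 1 }
sub PID()   { 2 }

sub new {
   my ($class, $fh, $pid) = @_;
   bless [[], $fh, $pid], $class
}

sub pid     { $_[0][PID] }
sub pending { scalar @{ $_[0][QUEUE] } }

sub queue {
   my $self = shift;
   push @{ $self->[QUEUE] }, @_;
   $self
}

package main;

my $proc = My::Proc->new (undef, 1234)->queue ("a", "b");
printf "pid %d, %d queued\n", $proc->pid, $proc->pending;
```

Reordering the slots then only requires changing the constants, not every place that indexes into the object.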
 sub _cmd {
    my $self = shift;
    # ideally, we would want to use "a (w/a)*" as format string, but perl
    # versions from at least 5.8.9 to 5.16.3 are all buggy and can't unpack
    # it.
-   push @{ $self->[2] }, pack "a L/a*", $_[0], $_[1];
+   push @{ $self->[QUEUE] }, pack "a L/a*", $_[0], $_[1];
-   $self->[3] ||= AE::io $self->[1], 1, sub {
+   $self->[WW] ||= AE::io $self->[FH], 1, sub {
       do {
          # send the next "thing" in the queue - either a reference to an fh,
          # or a plain string.
-         if (ref $self->[2][0]) {
+         if (ref $self->[QUEUE][0]) {
             # send fh
-            unless (IO::FDPass::send fileno $self->[1], fileno ${ $self->[2][0] }) {
+            unless (IO::FDPass::send fileno $self->[FH], fileno ${ $self->[QUEUE][0] }) {
                return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;
-               undef $self->[3];
+               undef $self->[WW];
                die "AnyEvent::Fork: file descriptor send failure: $!";
             }
-            shift @{ $self->[2] };
+            shift @{ $self->[QUEUE] };
          } else {
             # send string
-            my $len = syswrite $self->[1], $self->[2][0];
+            my $len = syswrite $self->[FH], $self->[QUEUE][0];
             unless ($len) {
                return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;
                undef $self->[3];
                die "AnyEvent::Fork: command write failure: $!";
             }
-            substr $self->[2][0], 0, $len, "";
+            substr $self->[QUEUE][0], 0, $len, "";
-            shift @{ $self->[2] } unless length $self->[2][0];
+            shift @{ $self->[QUEUE] } unless length $self->[QUEUE][0];
          }
-      } while @{ $self->[2] };
+      } while @{ $self->[QUEUE] };
       # everything written
-      undef $self->[3];
+      undef $self->[WW];
       # invoke run callback, if any
-      $self->[4]->($self->[1]) if $self->[4];
+      if ($self->[CB]) {
+         $self->[CB]->($self->[FH]);
+         @$self = ();
+      }
    };
    () # make sure we don't leak the watcher
-}
-sub _new {
-   my ($self, $fh, $pid) = @_;
-   AnyEvent::Util::fh_nonblocking $fh, 1;
-   $self = bless [
-      $pid,
-      $fh,
-      [],    # write queue - strings or fd's
-      undef, # AE watcher
-   ], $self;
-   $self
 }
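To make the wire format used by C<_cmd> concrete: each command is one type byte followed by a payload that is prefixed with its length as a 32 bit integer. The sketch below only exercises the C<pack> template from the code above; the helper names C<encode_cmd> and C<decode_cmds> are made up for illustration, and the decoder is an assumption about how a receiver could split the stream, not the code the server side actually uses.

```perl
use strict;
use warnings;

# encode one command: a single type byte, then a length-prefixed payload
# ("L" is exactly 32 bits in perl, in native byte order)
sub encode_cmd {
   my ($type, $payload) = @_;
   pack "a L/a*", $type, $payload
}

# hypothetical decoder: split a buffer of complete frames into [type, payload]
sub decode_cmds {
   my ($buf) = @_;
   my @cmds;
   while (length $buf) {
      my ($type, $payload) = unpack "a L/a*", $buf;
      push @cmds, [$type, $payload];
      # drop the frame just decoded: 1 type byte + 4 length bytes + payload
      substr ($buf, 0, 5 + length ($payload), "");
   }
   @cmds
}

my $buf  = encode_cmd ("h", "") . encode_cmd ("r", "MyModule::worker");
my @cmds = decode_cmds ($buf);
printf "%s => '%s'\n", @$_ for @cmds;
```

The native byte order is not a portability problem here, as both ends of the socket run on the same machine.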
 # fork template from current process, used by AnyEvent::Fork::Early/Template
 sub _new_fork {
    my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
    if ($pid eq 0) {
       require AnyEvent::Fork::Serve;
       $AnyEvent::Fork::Serve::OWNER = $parent;
       close $fh;
       $0 = "$_[1] of $parent";
-      $SIG{CHLD} = 'IGNORE';
       AnyEvent::Fork::Serve::serve ($slave);
       exit 0;
    } elsif (!$pid) {
       die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
    }
 }
 =item $pid = $proc->pid
 Returns the process id of the process I<iff it is a direct child of the
-process> running AnyEvent::Fork, and C<undef> otherwise.
+process running AnyEvent::Fork>, and C<undef> otherwise.
 Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and
 L<AnyEvent::Fork::Template> are direct children, and you are responsible
 for cleaning up their zombies when they die.
 AnyEvent::Fork itself.
 =cut
 sub pid {
-   $_[0][0]
+   $_[0][PID]
 }
 =item $proc = $proc->eval ($perlcode, @args)
-Evaluates the given C<$perlcode> as ... perl code, while setting C<@_> to
+Evaluates the given C<$perlcode> as ... Perl code, while setting C<@_> to
 the strings specified by C<@args>, in the "main" package.
 This call is meant to do any custom initialisation that might be required
 (for example, the C<require> method uses it). It's not supposed to be used
 to completely take over the process, use C<run> for that.
 The code will usually be executed after this call returns, and there is no
 way to pass anything back to the calling process. Any evaluation errors
 will be reported to stderr and cause the process to exit.
-If you want to execute some code to take over the process (see the
+If you want to execute some code (that isn't in a module) to take over the
-"fork+exec" example in the SYNOPSIS), you should compile a function via
+process, you should compile a function via C<eval> first, and then call
-C<eval> first, and then call it via C<run>. This also gives you access to
+it via C<run>. This also gives you access to any arguments passed via the
-any arguments passed via the C<send_xxx> methods, such as file handles.
+C<send_xxx> methods, such as file handles. See the L<use AnyEvent::Fork as
+a faster fork+exec> example to see it in action.
 Returns the process object for easy chaining of method calls.
 =cut
 =item $proc = $proc->send_fh ($handle, ...)
 Send one or more file handles (I<not> file descriptors) to the process,
 to prepare a call to C<run>.
-The process object keeps a reference to the handles until this is done,
+The process object keeps a reference to the handles until they have
-so you must not explicitly close the handles. This is most easily
+been passed over to the process, so you must not explicitly close the
-accomplished by simply not storing the file handles anywhere after passing
+handles. This is most easily accomplished by simply not storing the file
-them to this method.
+handles anywhere after passing them to this method - when AnyEvent::Fork
+is finished using them, perl will automatically close them.
 Returns the process object for easy chaining of method calls.
 Example: pass a file handle to a process, and release it without
 closing. It will be closed automatically when it is no longer used.
 sub send_fh {
    my ($self, @fh) = @_;
    for my $fh (@fh) {
       $self->_cmd ("h");
-      push @{ $self->[2] }, \$fh;
+      push @{ $self->[QUEUE] }, \$fh;
    }
    $self
 }
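The reason callers can simply forget about a handle after C<send_fh> is ordinary reference counting: the write queue holds a reference until the handle has been sent. The following sketch shows that mechanism in isolation with core Perl only; C<@queue> and C<queue_fh> merely stand in for the process object and are assumptions for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @queue;   # stands in for the write queue of the process object

sub queue_fh {
   my (@fh) = @_;
   push @queue, \$_ for @fh;   # keep a reference to each handle
}

my $probe;   # weak reference, only used to observe the handle's lifetime
{
   pipe (my $fh, my $writer) or die "pipe: $!";
   $probe = $fh;
   weaken $probe;
   queue_fh ($fh);
}   # the caller's own variables are gone at this point

my $open_while_queued = defined $probe;   # the queue still holds the handle
@queue = ();                              # pretend everything has been sent
my $open_after_send   = defined $probe;   # last reference gone, handle closed

printf "while queued: %s, after send: %s\n",
   $open_while_queued ? "open" : "closed",
   $open_after_send   ? "open" : "closed";
```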
 =item $proc = $proc->send_arg ($string, ...)
 Send one or more argument strings to the process, to prepare a call to
-C<run>. The strings can be any octet string.
+C<run>. The strings can be any octet strings.
 The protocol is optimised to pass a moderate number of relatively short
 strings - while you can pass up to 4GB of data in one go, this is more
 meant to pass some ID information or other startup info, not big chunks of
 data.
 Enter the function specified by the function name in C<$func> in the
 process. The function is called with the communication socket as first
 argument, followed by all file handles and string arguments sent earlier
 via C<send_fh> and C<send_arg> methods, in the order they were called.
+The process object becomes unusable on return from this function - any
+further method calls result in undefined behaviour.
 The function name should be fully qualified, but if it isn't, it will be
-looked up in the main package.
+looked up in the C<main> package.
 If the called function returns, doesn't exist, or any error occurs, the
 process exits.
 Preparing the process is done in the background - when all commands have
 been sent, the callback is invoked with the local communications socket
 as argument. At this point you can start using the socket in any way you
 like.
-The process object becomes unusable on return from this function - any
-further method calls result in undefined behaviour.
 If the communication socket isn't used, it should be closed on both sides,
 to save on kernel memory.
 The socket is non-blocking in the parent, and blocking in the newly
 =cut
 sub run {
    my ($self, $func, $cb) = @_;
-   $self->[4] = $cb;
+   $self->[CB] = $cb;
    $self->_cmd (r => $func);
+}
+=item $proc->to_fh ($cb->($fh))
+Flushes all commands out to the process and then calls the callback with
+the communications socket.
+The process object becomes unusable on return from this function - any
+further method calls result in undefined behaviour.
+The point of this method is to give you a file handle that you can pass
+to another process. In that other process, you can call C<new_from_fh
+AnyEvent::Fork $fh> to create a new C<AnyEvent::Fork> object from it,
+thereby effectively passing a fork object to another process.
+=cut
+sub to_fh {
+   my ($self, $cb) = @_;
+   $self->[CB] = $cb;
+   unless ($self->[WW]) {
+      $self->[CB]->($self->[FH]);
+      @$self = ();
+   }
+}
+=item new_from_fh AnyEvent::Fork $fh
+Takes a file handle originally received by the C<to_fh> method and creates
+a new C<AnyEvent::Fork> object. The child process itself will not change in
+any way, i.e. it will keep all the modifications done to it before calling
+C<to_fh>.
+The new object is very much like the original object, except that the
+C<pid> method will return C<undef> even if the process is a direct child.
+=cut
+sub new_from_fh {
+   my ($class, $fh) = @_;
+   $class->_new ($fh)
 }
 =back
 =head1 PERFORMANCE
    2079 new processes per second, using manual socketpair + fork
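The manual baseline could be measured along the following lines. This is a sketch of the general approach (socket pair, fork, wait for the child's end to close), not the benchmark script that produced the numbers in this section; the function name C<forks_per_second> is an assumption.

```perl
use strict;
use warnings;
use Socket;
use POSIX ();
use Time::HiRes ();

# fork $count children that exit immediately, waiting for each child's
# end of the socket pair to close before forking the next one
sub forks_per_second {
   my ($count) = @_;
   my $start = Time::HiRes::time ();

   for (1 .. $count) {
      socketpair (my $parent_fh, my $child_fh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
         or die "socketpair: $!";

      my $pid = fork;
      die "fork: $!" unless defined $pid;
      POSIX::_exit (0) unless $pid;   # child: exiting closes its socket

      close $child_fh;
      sysread $parent_fh, my $buf, 1;  # returns 0 (EOF) once the child is gone
      waitpid $pid, 0;
   }

   my $elapsed = Time::HiRes::time () - $start;
   $count / ($elapsed || 1e-6)
}

printf "%d new processes per second\n", forks_per_second (100);
```

The absolute number depends heavily on the machine and on the size of the process doing the forking, which is exactly the effect discussed in this section.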
 Then I did the same thing, but instead of calling fork, I called
 AnyEvent::Fork->new->run ("CORE::exit") and then again waited for the
-socket form the child to close on exit. This does the same thing as manual
+socket from the child to close on exit. This does the same thing as manual
 socket pair + fork, except that what is forked is the template process
 (2440kB), and the socket needs to be passed to the server at the other end
 of the socket first.
    2307 new processes per second, using AnyEvent::Fork->new
     479 vfork+execs per second, using AnyEvent::Fork->new_exec
 So how can C<< AnyEvent::Fork->new >> be faster than a standard fork, even
 though it uses the same operations, but adds a lot of overhead?
-The difference is simply the process size: forking the 6MB process takes
+The difference is simply the process size: forking the 5MB process takes
-so much longer than forking the 2.5MB template process that the overhead
+so much longer than forking the 2.5MB template process that the extra
-introduced is canceled out.
+overhead is canceled out.
 If the benchmark process grows, the normal fork becomes even slower:
-   1340 new processes, manual fork in a 20MB process
+   1340 new processes, manual fork of a 20MB process
-    731 new processes, manual fork in a 200MB process
+    731 new processes, manual fork of a 200MB process
-    235 new processes, manual fork in a 2000MB process
+    235 new processes, manual fork of a 2000MB process
-What that means (to me) is that I can use this module without having a
+What that means (to me) is that I can use this module without having a bad
-very bad conscience because of the extra overhead required to start new
+conscience because of the extra overhead required to start new processes.
-processes.
 =head1 TYPICAL PROBLEMS
 This section lists typical problems that remain. I hope by recognising
 them, most can be avoided.
 =over 4
-=item "leaked" file descriptors for exec'ed processes
+=item leaked file descriptors for exec'ed processes
 POSIX systems inherit file descriptors by default when exec'ing a new
 process. While perl itself laudably sets the close-on-exec flags on new
 file handles, most C libraries don't care, and even if all cared, it's
 often not possible to set the flag in a race-free manner.
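Whether a given descriptor would survive an exec can be checked, and changed, with C<fcntl>. The sketch below uses only the core L<Fcntl> module; the helper names are assumptions, and it relies on the default value of C<$^F> (2), under which perl marks newly opened handles other than the standard ones as close-on-exec.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# returns 1 if the handle would be closed by a successful exec, 0 otherwise
sub is_cloexec {
   my ($fh) = @_;
   my $flags = fcntl $fh, F_GETFD, 0;
   defined $flags or die "fcntl F_GETFD: $!";
   $flags & FD_CLOEXEC ? 1 : 0
}

# clear the flag, so that an exec'ed program inherits the descriptor
sub clear_cloexec {
   my ($fh) = @_;
   my $flags = fcntl $fh, F_GETFD, 0;
   defined $flags or die "fcntl F_GETFD: $!";
   fcntl $fh, F_SETFD, $flags & ~FD_CLOEXEC
      or die "fcntl F_SETFD: $!";
}

pipe (my $r, my $w) or die "pipe: $!";
my $before = is_cloexec ($r);   # set by perl when the pipe was created
clear_cloexec ($r);
my $after  = is_cloexec ($r);   # now inherited across exec

print "close-on-exec before: $before, after: $after\n";
```

Descriptors opened by C libraries behind perl's back never pass through this mechanism, which is why they can leak.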
 libraries or the code that leaks those file descriptors.
 Fortunately, most of these leaked descriptors do no harm, other than
 sitting on some resources.
-=item "leaked" file descriptors for fork'ed processes
+=item leaked file descriptors for fork'ed processes
 Normally, L<AnyEvent::Fork> does start new processes by exec'ing them,
 which closes file descriptors not marked for being inherited.
 However, L<AnyEvent::Fork::Early> and L<AnyEvent::Fork::Template> offer
 The solution is to either not load these modules before use'ing
 L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template>, or to delay
 initialising them, for example, by calling C<init Gtk2> manually.
-=item exit runs destructors
+=item exiting calls object destructors
-This only applies to users of Lc<AnyEvent::Fork:Early> and
+This only applies to users of L<AnyEvent::Fork::Early> and
-L<AnyEvent::Fork::Template>.
+L<AnyEvent::Fork::Template>, or when initialising code creates objects
+that reference external resources.
 When a process created by AnyEvent::Fork exits, it might do so by calling
 exit, or simply letting perl reach the end of the program, at which point
 Perl runs all destructors.
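The effect can be demonstrated with plain C<fork>. In this sketch a forked child holds an object whose destructor writes to a pipe; the class C<Guard> and the function C<child_output> are hypothetical names. Leaving via C<exit> runs the destructor, while C<POSIX::_exit> terminates the process without running any destructors.

```perl
use strict;
use warnings;
use POSIX ();

package Guard;   # hypothetical class that reports its own destruction

sub new     { my ($class, $fh) = @_; bless { fh => $fh }, $class }
sub DESTROY { syswrite $_[0]{fh}, "destroyed" }

package main;

# fork a child holding a Guard object, return what its destructor wrote
sub child_output {
   my ($use_posix_exit) = @_;

   pipe (my $r, my $w) or die "pipe: $!";
   my $pid = fork;
   die "fork: $!" unless defined $pid;

   unless ($pid) {
      close $r;
      my $guard = Guard->new ($w);      # still alive when the child exits
      POSIX::_exit (0) if $use_posix_exit;
      exit 0;
   }

   close $w;
   my $out = do { local $/; <$r> };     # read until the child closes the pipe
   waitpid $pid, 0;
   defined $out ? $out : ""
}

my $with_exit  = child_output (0);
my $with__exit = child_output (1);
print "exit: '$with_exit', POSIX::_exit: '$with__exit'\n";
```

In a real program the destructor would more likely close a database connection or remove a temporary file that the parent still needs, which is what makes this a problem.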
 to make it so, mostly due to the bloody broken perl that nobody seems to
 care about. The fork emulation is a bad joke - I have yet to see something
 useful that you can do with it without running into memory corruption
 issues or other braindamage. Hrrrr.
-Cygwin perl is not supported at the moment, as it should implement fd
+Since fork is endlessly broken on win32 perls (it doesn't even remotely
-passing, but doesn't, and rolling my own is hard, as cygwin doesn't
+work within its documented limits) and quite obviously it's not getting
-support enough functionality to do it.
+improved any time soon, the best way to proceed on windows would be to
+always use C<new_exec> and thus never rely on perl's fork "emulation".
+Cygwin perl is not supported at the moment due to some hilarious
+shortcomings of its API - see L<IO::FDPass> for more details. If you never
+use C<send_fh> and always use C<new_exec> to create processes, it should
+work though.
 =head1 SEE ALSO
-L<AnyEvent::Fork::Early> (to avoid executing a perl interpreter),
+L<AnyEvent::Fork::Early>, to avoid executing a perl interpreter at all
+(part of this distribution).
-L<AnyEvent::Fork::Template> (to create a process by forking the main
+L<AnyEvent::Fork::Template>, to create a process by forking the main
-program at a convenient time).
+program at a convenient time (part of this distribution).
-=head1 AUTHOR
+L<AnyEvent::Fork::RPC>, for simple RPC to child processes (on CPAN).
+=head1 AUTHOR AND CONTACT INFORMATION
  Marc Lehmann <schmorp@schmorp.de>
- http://home.schmorp.de/
+ http://software.schmorp.de/pkg/AnyEvent-Fork
 =cut
 1

Diff Legend

-  Removed lines
+  Added lines
<  Changed lines
>  Changed lines
