… | |
… | |
27 | |
27 | |
28 | Special care has been taken to make this module useful from other modules, |
28 | Special care has been taken to make this module useful from other modules, |
29 | while still supporting specialised environments such as L<App::Staticperl> |
29 | while still supporting specialised environments such as L<App::Staticperl> |
30 | or L<PAR::Packer>. |
30 | or L<PAR::Packer>. |
31 | |
31 | |
32 | =head1 WHAT THIS MODULE IS NOT |
32 | =head2 WHAT THIS MODULE IS NOT |
33 | |
33 | |
34 | This module only creates processes and lets you pass file handles and |
34 | This module only creates processes and lets you pass file handles and |
35 | strings to them, and run perl code. It does not implement any kind of RPC - |
35 | strings to them, and run perl code. It does not implement any kind of RPC - |
36 | there is no back channel from the process back to you, and there is no RPC |
36 | there is no back channel from the process back to you, and there is no RPC |
37 | or message passing going on. |
37 | or message passing going on. |
… | |
… | |
40 | in whatever way you like, use some message-passing module such |
40 | in whatever way you like, use some message-passing module such |
41 | as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use |
41 | as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use |
42 | L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages, |
42 | L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages, |
43 | and so on. |
43 | and so on. |
44 | |
44 | |
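The module stops at process creation, so you have to build any back channel yourself. Below is a minimal sketch of one such channel in core Perl. It uses a plain C<fork> and C<socketpair> instead of AnyEvent::Fork, sends newline-delimited JSON through the core L<JSON::PP> module, and blocks on reads. The helper names C<send_msg>, C<recv_msg> and C<roundtrip> are hypothetical and were chosen for illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use JSON::PP ();
use POSIX ();
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

my $json = JSON::PP->new->canonical;

# Write one message as a single line of JSON (the encoder emits no
# newlines unless pretty-printing is enabled).
sub send_msg {
    my ($fh, $msg) = @_;
    print {$fh} $json->encode($msg), "\n" or die "write: $!";
    $fh->flush;
}

# Read one line and decode it; returns undef at end of file.
sub recv_msg {
    my ($fh) = @_;
    my $line = <$fh>;
    return defined $line ? $json->decode($line) : undef;
}

# Hypothetical demo: the child reports back to the parent.
sub roundtrip {
    socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $parent_end;
        send_msg($child_end, { pid => $$, status => "ready" });
        close $child_end;
        POSIX::_exit(0);    # leave without running destructors/END blocks
    }
    close $child_end;
    my $msg = recv_msg($parent_end);
    close $parent_end;
    waitpid($pid, 0);
    return $msg;
}
```

An event-driven program would more likely wrap each end in L<AnyEvent::Handle>, so that reads do not block the event loop.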
45 | =head2 COMPARISON TO OTHER MODULES |
46 | |
47 | There is an abundance of modules on CPAN that do "something fork", such as |
48 | L<Parallel::ForkManager>, L<AnyEvent::ForkManager>, L<AnyEvent::Worker> |
49 | or L<AnyEvent::Subprocess>. There are modules that implement their own |
50 | process management, such as L<AnyEvent::DBI>. |
51 | |
52 | The problems that all these modules try to solve are real; however, none |
53 | of them (from what I have seen) tackle the very real problems of unwanted |
54 | memory sharing, efficiency, and not being able to use event processing or |
55 | similar modules in the processes they create. |
56 | |
57 | This module doesn't try to replace any of them - instead it tries to solve |
58 | the problem of creating processes with a minimum of fuss and overhead (and |
59 | also luxury). Ideally, most of these would use AnyEvent::Fork internally, |
60 | except they were written before AnyEvent::Fork was available, so they |
61 | obviously had to roll their own. |
62 | |
45 | =head1 PROBLEM STATEMENT |
63 | =head2 PROBLEM STATEMENT |
46 | |
64 | |
47 | There are two traditional ways to implement parallel processing on UNIX |
65 | There are two traditional ways to implement parallel processing on UNIX |
48 | like operating systems - fork and process, and fork+exec and process. They |
66 | like operating systems - fork and process, and fork+exec and process. They |
49 | have different advantages and disadvantages that I describe below, |
67 | have different advantages and disadvantages that I describe below, |
50 | together with how this module tries to mitigate the disadvantages. |
68 | together with how this module tries to mitigate the disadvantages. |
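The two traditional approaches fit in a few lines of core Perl. This is a simplified sketch and does not show how AnyEvent::Fork works internally. C<fork_and_process> runs a closure in a forked copy of the current interpreter. C<fork_exec_and_process> starts a fresh interpreter through C<$^X>. Both names are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# fork and process: the child runs a Perl closure directly and inherits
# a copy of the parent's entire memory image.
sub fork_and_process {
    my ($code) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $r;
        syswrite($w, $code->());
        POSIX::_exit(0);
    }
    close $w;
    my $out = do { local $/; <$r> };
    waitpid($pid, 0);
    return $out;
}

# fork+exec and process: a new perl interpreter is executed, so nothing is
# shared with the parent, but all code has to be loaded again.
sub fork_exec_and_process {
    my ($perl_code) = @_;
    open(my $fh, '-|', $^X, '-e', $perl_code) or die "exec $^X: $!";
    my $out = do { local $/; <$fh> };
    close($fh);
    return $out;
}
```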
… | |
… | |
152 | |
170 | |
153 | # now $master_filehandle is connected to the |
171 | # now $master_filehandle is connected to the |
154 | # $slave_filehandle in the new process. |
172 | # $slave_filehandle in the new process. |
155 | }); |
173 | }); |
156 | |
174 | |
157 | # MyModule::worker might look like this |
175 | C<MyModule> might look like this: |
176 | |
177 | package MyModule; |
178 | |
158 | sub MyModule::worker { |
179 | sub worker { |
159 | my ($slave_filehandle) = @_; |
180 | my ($slave_filehandle) = @_; |
160 | |
181 | |
161 | # now $slave_filehandle is connected to the $master_filehandle |
182 | # now $slave_filehandle is connected to the $master_filehandle |
162 | # in the original process. have fun! |
183 | # in the original process. have fun! |
163 | } |
184 | } |
… | |
… | |
182 | } |
203 | } |
183 | |
204 | |
184 | # now do other things - maybe use the filehandle provided by run |
205 | # now do other things - maybe use the filehandle provided by run |
185 | # to wait for the processes to die. or whatever. |
206 | # to wait for the processes to die. or whatever. |
186 | |
207 | |
187 | # My::Server::run might look like this |
208 | C<My::Server> might look like this: |
188 | sub My::Server::run { |
209 | |
210 | package My::Server; |
211 | |
212 | sub run { |
189 | my ($slave, $listener, $id) = @_; |
213 | my ($slave, $listener, $id) = @_; |
190 | |
214 | |
191 | close $slave; # we do not use the socket, so close it to save resources |
215 | close $slave; # we do not use the socket, so close it to save resources |
192 | |
216 | |
193 | # we could go ballistic and use e.g. AnyEvent here, or IO::AIO, |
217 | # we could go ballistic and use e.g. AnyEvent here, or IO::AIO, |
… | |
… | |
197 | } |
221 | } |
198 | } |
222 | } |
199 | |
223 | |
200 | =head2 use AnyEvent::Fork as a faster fork+exec |
224 | =head2 use AnyEvent::Fork as a faster fork+exec |
201 | |
225 | |
202 | This runs /bin/echo hi, with stdout redirected to /tmp/log and stderr to |
226 | This runs C</bin/echo hi>, with standard output redirected to /tmp/log |
203 | the communications socket. It is usually faster than fork+exec, but still |
227 | and standard error redirected to the communications socket. It is usually |
204 | let's you prepare the environment. |
228 | faster than fork+exec, but still lets you prepare the environment. |
205 | |
229 | |
206 | open my $output, ">/tmp/log" or die "$!"; |
230 | open my $output, ">/tmp/log" or die "$!"; |
207 | |
231 | |
208 | AnyEvent::Fork |
232 | AnyEvent::Fork |
209 | ->new |
233 | ->new |
210 | ->eval (' |
234 | ->eval (' |
235 | # compile a helper function for later use |
211 | sub run { |
236 | sub run { |
212 | my ($fh, $output, @cmd) = @_; |
237 | my ($fh, $output, @cmd) = @_; |
213 | |
238 | |
214 | # perl will clear close-on-exec on STDOUT/STDERR |
239 | # perl will clear close-on-exec on STDOUT/STDERR |
215 | open STDOUT, ">&", $output or die; |
240 | open STDOUT, ">&", $output or die; |
… | |
… | |
309 | my ($fork_fh) = @_; |
334 | my ($fork_fh) = @_; |
310 | }); |
335 | }); |
311 | |
336 | |
312 | =back |
337 | =back |
313 | |
338 | |
314 | =head1 FUNCTIONS |
339 | =head1 THE C<AnyEvent::Fork> CLASS |
340 | |
341 | This module exports nothing, and only implements a single class - |
342 | C<AnyEvent::Fork>. |
343 | |
344 | There are two class constructors that both create new processes - C<new> |
345 | and C<new_exec>. The C<fork> method creates a new process by forking an |
346 | existing one and could be considered a third constructor. |
347 | |
348 | Most of the remaining methods deal with preparing the new process, by |
349 | loading code, evaluating code and sending data to the new process. They |
350 | usually return the process object, so you can chain method calls. |
351 | |
352 | If a process object is destroyed before calling its C<run> method, then |
353 | the process simply exits. After C<run> is called, all responsibility is |
354 | passed to the specified function. |
355 | |
356 | As long as there is any outstanding work to be done, process objects |
357 | resist being destroyed, so there is no reason to store them unless you |
358 | need them later - configure and forget works just fine. |
315 | |
359 | |
316 | =over 4 |
360 | =over 4 |
317 | |
361 | |
318 | =cut |
362 | =cut |
319 | |
363 | |
… | |
… | |
326 | use AnyEvent; |
370 | use AnyEvent; |
327 | use AnyEvent::Util (); |
371 | use AnyEvent::Util (); |
328 | |
372 | |
329 | use IO::FDPass; |
373 | use IO::FDPass; |
330 | |
374 | |
331 | our $VERSION = 0.5; |
375 | our $VERSION = 0.6; |
332 | |
333 | our $PERL; # the path to the perl interpreter, deduces with various forms of magic |
334 | |
335 | =item my $pool = new AnyEvent::Fork key => value... |
336 | |
337 | Create a new process pool. The following named parameters are supported: |
338 | |
376 | |
339 | =over 4 |
377 | =over 4 |
340 | |
378 | |
341 | =back |
379 | =back |
342 | |
380 | |
… | |
… | |
421 | if ($pid eq 0) { |
459 | if ($pid eq 0) { |
422 | require AnyEvent::Fork::Serve; |
460 | require AnyEvent::Fork::Serve; |
423 | $AnyEvent::Fork::Serve::OWNER = $parent; |
461 | $AnyEvent::Fork::Serve::OWNER = $parent; |
424 | close $fh; |
462 | close $fh; |
425 | $0 = "$_[1] of $parent"; |
463 | $0 = "$_[1] of $parent"; |
426 | $SIG{CHLD} = 'IGNORE'; |
427 | AnyEvent::Fork::Serve::serve ($slave); |
464 | AnyEvent::Fork::Serve::serve ($slave); |
428 | exit 0; |
465 | exit 0; |
429 | } elsif (!$pid) { |
466 | } elsif (!$pid) { |
430 | die "AnyEvent::Fork::Early/Template: unable to fork template process: $!"; |
467 | die "AnyEvent::Fork::Early/Template: unable to fork template process: $!"; |
431 | } |
468 | } |
… | |
… | |
438 | Create a new "empty" perl interpreter process and return its process |
475 | Create a new "empty" perl interpreter process and return its process |
439 | object for further manipulation. |
476 | object for further manipulation. |
440 | |
477 | |
441 | The new process is forked from a template process that is kept around |
478 | The new process is forked from a template process that is kept around |
442 | for this purpose. When it doesn't exist yet, it is created by a call to |
479 | for this purpose. When it doesn't exist yet, it is created by a call to |
443 | C<new_exec> and kept around for future calls. |
480 | C<new_exec> first and then stays around for future calls. |
444 | |
445 | When the process object is destroyed, it will release the file handle |
446 | that connects it with the new process. When the new process has not yet |
447 | called C<run>, then the process will exit. Otherwise, what happens depends |
448 | entirely on the code that is executed. |
449 | |
481 | |
450 | =cut |
482 | =cut |
451 | |
483 | |
452 | sub new { |
484 | sub new { |
453 | my $class = shift; |
485 | my $class = shift; |
… | |
… | |
543 | } |
575 | } |
544 | |
576 | |
545 | =item $pid = $proc->pid |
577 | =item $pid = $proc->pid |
546 | |
578 | |
547 | Returns the process id of the process I<iff it is a direct child of the |
579 | Returns the process id of the process I<iff it is a direct child of the |
548 | process> running AnyEvent::Fork, and C<undef> otherwise. |
580 | process running AnyEvent::Fork>, and C<undef> otherwise. |
549 | |
581 | |
550 | Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and |
582 | Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and |
551 | L<AnyEvent::Fork::Template> are direct children, and you are responsible |
583 | L<AnyEvent::Fork::Template> are direct children, and you are responsible |
552 | for cleaning up their zombies when they die. |
584 | for cleaning up their zombies when they die. |
553 | |
585 | |
554 | All other processes are not direct children, and will be cleaned up by |
586 | All other processes are not direct children, and will be cleaned up by |
555 | AnyEvent::Fork. |
587 | AnyEvent::Fork itself. |
556 | |
588 | |
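For direct children, meaning those with a defined C<pid>, reaping is up to you. Below is a minimal core-Perl sketch of what that involves. It stands in a plain C<fork> for C<new_exec> and uses a blocking C<waitpid>. C<spawn_and_reap> is a hypothetical name. An AnyEvent program would normally register a child watcher with C<< AnyEvent->child (pid => $pid, cb => sub { ... }) >> rather than block.

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child that exits immediately with $code, then reap it so no
# zombie is left behind. Returns the child's exit status.
sub spawn_and_reap {
    my ($code) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    POSIX::_exit($code) if $pid == 0;
    my $reaped = waitpid($pid, 0);
    die "waitpid: $!" unless $reaped == $pid;
    return $? >> 8;
}
```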
557 | =cut |
589 | =cut |
558 | |
590 | |
559 | sub pid { |
591 | sub pid { |
560 | $_[0][0] |
592 | $_[0][0] |
… | |
… | |
571 | |
603 | |
572 | The code will usually be executed after this call returns, and there is no |
604 | The code will usually be executed after this call returns, and there is no |
573 | way to pass anything back to the calling process. Any evaluation errors |
605 | way to pass anything back to the calling process. Any evaluation errors |
574 | will be reported to stderr and cause the process to exit. |
606 | will be reported to stderr and cause the process to exit. |
575 | |
607 | |
576 | If you want to execute some code to take over the process (see the |
608 | If you want to execute some code (that isn't in a module) to take over the |
577 | "fork+exec" example in the SYNOPSIS), you should compile a function via |
609 | process, you should compile a function via C<eval> first, and then call |
578 | C<eval> first, and then call it via C<run>. This also gives you access to |
610 | it via C<run>. This also gives you access to any arguments passed via the |
579 | any arguments passed via the C<send_xxx> methods, such as file handles. |
611 | C<send_xxx> methods, such as file handles. For an example, see |
612 | L</"use AnyEvent::Fork as a faster fork+exec">. |
580 | |
613 | |
581 | Returns the process object for easy chaining of method calls. |
614 | Returns the process object for easy chaining of method calls. |
582 | |
615 | |
583 | =cut |
616 | =cut |
584 | |
617 | |
… | |
… | |
610 | =item $proc = $proc->send_fh ($handle, ...) |
643 | =item $proc = $proc->send_fh ($handle, ...) |
611 | |
644 | |
612 | Send one or more file handles (I<not> file descriptors) to the process, |
645 | Send one or more file handles (I<not> file descriptors) to the process, |
613 | to prepare a call to C<run>. |
646 | to prepare a call to C<run>. |
614 | |
647 | |
615 | The process object keeps a reference to the handles until this is done, |
648 | The process object keeps a reference to the handles until they have |
616 | so you must not explicitly close the handles. This is most easily |
649 | been passed over to the process, so you must not explicitly close the |
617 | accomplished by simply not storing the file handles anywhere after passing |
650 | handles. This is most easily accomplished by simply not storing the file |
618 | them to this method. |
651 | handles anywhere after passing them to this method - when AnyEvent::Fork |
652 | is finished using them, perl will automatically close them. |
619 | |
653 | |
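The automatic close depends on Perl's reference counting: a lexical file handle is closed once the last reference to it disappears. The small core-Perl sketch below uses a pipe to show this. C<$keep> stands in for whatever internal reference the process object holds, which is an assumption made for illustration, and C<reads_after_last_ref_dropped> is a hypothetical name.

```perl
use strict;
use warnings;

# Returns the byte counts of two successive reads from the pipe.
sub reads_after_last_ref_dropped {
    pipe(my $reader, my $writer) or die "pipe: $!";
    {
        my $keep = $writer;    # a second reference, like one held internally
        undef $writer;         # dropping one reference does not close it
        syswrite($keep, "x") or die "write: $!";
    }                          # $keep leaves scope: last reference gone, closed
    my $buf;
    my $first  = sysread($reader, $buf, 1);    # 1: the byte written above
    my $second = sysread($reader, $buf, 1);    # 0: end of file
    return ($first, $second);
}
```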
620 | Returns the process object for easy chaining of method calls. |
654 | Returns the process object for easy chaining of method calls. |
621 | |
655 | |
622 | Example: pass a file handle to a process, and release it without |
656 | Example: pass a file handle to a process, and release it without |
623 | closing. It will be closed automatically when it is no longer used. |
657 | closing. It will be closed automatically when it is no longer used. |
… | |
… | |
639 | } |
673 | } |
640 | |
674 | |
641 | =item $proc = $proc->send_arg ($string, ...) |
675 | =item $proc = $proc->send_arg ($string, ...) |
642 | |
676 | |
643 | Send one or more argument strings to the process, to prepare a call to |
677 | Send one or more argument strings to the process, to prepare a call to |
644 | C<run>. The strings can be any octet string. |
678 | C<run>. The strings can be any octet strings. |
645 | |
679 | |
646 | The protocol is optimised to pass a moderate number of relatively short |
680 | The protocol is optimised to pass a moderate number of relatively short |
647 | strings - while you can pass up to 4GB of data in one go, this is more |
681 | strings - while you can pass up to 4GB of data in one go, this is more |
648 | meant to pass some ID information or other startup info, not big chunks of |
682 | meant to pass some ID information or other startup info, not big chunks of |
649 | data. |
683 | data. |
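A 4GB ceiling is consistent with each string carrying a 32-bit length prefix. The sketch below shows that style of framing. It is hypothetical and is not necessarily AnyEvent::Fork's actual wire format. C<frame_args> and C<unframe_args> are made-up names.

```perl
use strict;
use warnings;

# Encode each octet string as a 32-bit big-endian length plus its bytes.
sub frame_args {
    my $out = '';
    for my $s (@_) {
        die "octet strings only\n" if $s =~ /[^\x00-\xFF]/;
        die "string too long for a 32-bit length prefix\n"
            if length($s) > 0xFFFF_FFFF;
        $out .= pack('N', length $s) . $s;
    }
    return $out;
}

# Decode a buffer produced by frame_args back into the list of strings.
sub unframe_args {
    my ($buf) = @_;
    my @args;
    my $pos = 0;
    while ($pos < length $buf) {
        die "truncated length prefix\n" if $pos + 4 > length $buf;
        my $len = unpack 'N', substr($buf, $pos, 4);
        $pos += 4;
        die "truncated string\n" if $pos + $len > length $buf;
        push @args, substr($buf, $pos, $len);
        $pos += $len;
    }
    return @args;
}
```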
… | |
… | |
665 | Enter the function specified by the function name in C<$func> in the |
699 | Enter the function specified by the function name in C<$func> in the |
666 | process. The function is called with the communication socket as first |
700 | process. The function is called with the communication socket as first |
667 | argument, followed by all file handles and string arguments sent earlier |
701 | argument, followed by all file handles and string arguments sent earlier |
668 | via C<send_fh> and C<send_arg> methods, in the order they were called. |
702 | via C<send_fh> and C<send_arg> methods, in the order they were called. |
669 | |
703 | |
704 | The process object becomes unusable on return from this function - any |
705 | further method calls result in undefined behaviour. |
706 | |
670 | The function name should be fully qualified, but if it isn't, it will be |
707 | The function name should be fully qualified, but if it isn't, it will be |
671 | looked up in the main package. |
708 | looked up in the C<main> package. |
672 | |
709 | |
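The lookup rule for function names can be sketched in core Perl. C<resolve_func> below is a hypothetical helper that mirrors the documented behaviour, and the module's own lookup code may differ in detail.

```perl
use strict;
use warnings;

# Return a code reference for $name, qualifying bare names with "main::".
# Returns undef if no such function is defined.
sub resolve_func {
    my ($name) = @_;
    $name = "main::$name" unless $name =~ /::/;
    no strict 'refs';
    return defined &{$name} ? \&{$name} : undef;
}

# Two example functions to look up.
sub worker { return "main worker" }

package MyModule;
sub worker { return "MyModule worker" }

package main;
```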
673 | If the called function returns, doesn't exist, or any error occurs, the |
710 | If the called function returns, doesn't exist, or any error occurs, the |
674 | process exits. |
711 | process exits. |
675 | |
712 | |
676 | Preparing the process is done in the background - when all commands have |
713 | Preparing the process is done in the background - when all commands have |
677 | been sent, the callback is invoked with the local communications socket |
714 | been sent, the callback is invoked with the local communications socket |
678 | as argument. At this point you can start using the socket in any way you |
715 | as argument. At this point you can start using the socket in any way you |
679 | like. |
716 | like. |
680 | |
681 | The process object becomes unusable on return from this function - any |
682 | further method calls result in undefined behaviour. |
683 | |
717 | |
684 | If the communication socket isn't used, it should be closed on both sides, |
718 | If the communication socket isn't used, it should be closed on both sides, |
685 | to save on kernel memory. |
719 | to save on kernel memory. |
686 | |
720 | |
687 | The socket is non-blocking in the parent, and blocking in the newly |
721 | The socket is non-blocking in the parent, and blocking in the newly |
… | |
… | |
762 | 479 vfork+execs per second, using AnyEvent::Fork->new_exec |
796 | 479 vfork+execs per second, using AnyEvent::Fork->new_exec |
763 | |
797 | |
764 | So how can C<< AnyEvent::Fork->new >> be faster than a standard fork, even |
798 | So how can C<< AnyEvent::Fork->new >> be faster than a standard fork, even |
765 | though it uses the same operations, but adds a lot of overhead? |
799 | though it uses the same operations, but adds a lot of overhead? |
766 | |
800 | |
767 | The difference is simply the process size: forking the 6MB process takes |
801 | The difference is simply the process size: forking the 5MB process takes |
768 | so much longer than forking the 2.5MB template process that the overhead |
802 | so much longer than forking the 2.5MB template process that the extra |
769 | introduced is canceled out. |
803 | overhead introduced is canceled out. |
770 | |
804 | |
771 | If the benchmark process grows, the normal fork becomes even slower: |
805 | If the benchmark process grows, the normal fork becomes even slower: |
772 | |
806 | |
773 | 1340 new processes, manual fork in a 20MB process |
807 | 1340 new processes, manual fork of a 20MB process |
774 | 731 new processes, manual fork in a 200MB process |
808 | 731 new processes, manual fork of a 200MB process |
775 | 235 new processes, manual fork in a 2000MB process |
809 | 235 new processes, manual fork of a 2000MB process |
776 | |
810 | |
777 | What that means (to me) is that I can use this module without having a |
811 | What that means (to me) is that I can use this module without having a bad |
778 | very bad conscience because of the extra overhead required to start new |
812 | conscience because of the extra overhead required to start new processes. |
779 | processes. |
780 | |
813 | |
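You can get a rough idea of the manual-fork trend on your own machine with a core-Perl measurement like the one below. This is a sketch under stated assumptions and not the benchmark that produced the numbers above. It inflates the process with a string of C<$mb> megabytes and then times C<$count> fork, C<_exit> and C<waitpid> cycles using L<Time::HiRes>. C<forks_per_second> is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Grow the process by roughly $mb megabytes, then measure how many
# fork + immediate child exit + reap cycles complete per second.
sub forks_per_second {
    my ($mb, $count) = @_;
    my $ballast = "x" x ($mb * 1024 * 1024);    # stays allocated during the loop
    my $start = time;
    for (1 .. $count) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        POSIX::_exit(0) if $pid == 0;
        waitpid($pid, 0);
    }
    my $elapsed = time - $start;
    return $elapsed > 0 ? $count / $elapsed : 0;
}
```

Try running C<forks_per_second (20, 1000)> and C<forks_per_second (200, 1000)>. The rate is expected to drop as the process grows, but the absolute figures will vary by system.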
781 | =head1 TYPICAL PROBLEMS |
814 | =head1 TYPICAL PROBLEMS |
782 | |
815 | |
783 | This section lists typical problems that remain. I hope by recognising |
816 | This section lists typical problems that remain. I hope by recognising |
784 | them, most can be avoided. |
817 | them, most can be avoided. |
785 | |
818 | |
786 | =over 4 |
819 | =over 4 |
787 | |
820 | |
788 | =item "leaked" file descriptors for exec'ed processes |
821 | =item leaked file descriptors for exec'ed processes |
789 | |
822 | |
790 | POSIX systems inherit file descriptors by default when exec'ing a new |
823 | POSIX systems inherit file descriptors by default when exec'ing a new |
791 | process. While perl itself laudably sets the close-on-exec flags on new |
824 | process. While perl itself laudably sets the close-on-exec flags on new |
792 | file handles, most C libraries don't care, and even if all cared, it's |
825 | file handles, most C libraries don't care, and even if all cared, it's |
793 | often not possible to set the flag in a race-free manner. |
826 | often not possible to set the flag in a race-free manner. |
… | |
… | |
813 | libraries or the code that leaks those file descriptors. |
846 | libraries or the code that leaks those file descriptors. |
814 | |
847 | |
815 | Fortunately, most of these leaked descriptors do no harm, other than |
848 | Fortunately, most of these leaked descriptors do no harm, other than |
816 | sitting on some resources. |
849 | sitting on some resources. |
817 | |
850 | |
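From Perl you can inspect and change the close-on-exec flag described in the previous item using the core L<Fcntl> module. Perl sets the flag automatically on new descriptors numbered above C<$^F> (2 by default). The sketch below checks the flag and toggles it. The helper names C<has_cloexec> and C<set_cloexec> are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# True if $fh would be closed automatically across exec.
sub has_cloexec {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFD, 0);
    die "fcntl F_GETFD: $!" unless defined $flags;
    return (($flags + 0) & FD_CLOEXEC) ? 1 : 0;
}

# Set or clear the flag, e.g. to let an exec'ed child inherit $fh.
sub set_cloexec {
    my ($fh, $on) = @_;
    my $flags = fcntl($fh, F_GETFD, 0);
    die "fcntl F_GETFD: $!" unless defined $flags;
    $flags += 0;    # "0 but true" becomes a plain 0
    $flags = $on ? ($flags | FD_CLOEXEC) : ($flags & ~FD_CLOEXEC);
    fcntl($fh, F_SETFD, $flags) or die "fcntl F_SETFD: $!";
    return;
}
```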
818 | =item "leaked" file descriptors for fork'ed processes |
851 | =item leaked file descriptors for fork'ed processes |
819 | |
852 | |
820 | Normally, L<AnyEvent::Fork> does start new processes by exec'ing them, |
853 | Normally, L<AnyEvent::Fork> does start new processes by exec'ing them, |
821 | which closes file descriptors not marked for being inherited. |
854 | which closes file descriptors not marked for being inherited. |
822 | |
855 | |
823 | However, L<AnyEvent::Fork::Early> and L<AnyEvent::Fork::Template> offer |
856 | However, L<AnyEvent::Fork::Early> and L<AnyEvent::Fork::Template> offer |
… | |
… | |
832 | |
865 | |
833 | The solution is to either not load these modules before use'ing |
866 | The solution is to either not load these modules before use'ing |
834 | L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template>, or to delay |
867 | L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template>, or to delay |
835 | initialising them, for example, by calling C<init Gtk2> manually. |
868 | initialising them, for example, by calling C<init Gtk2> manually. |
836 | |
869 | |
837 | =item exit runs destructors |
870 | =item exiting calls object destructors |
838 | |
871 | |
839 | This only applies to users of Lc<AnyEvent::Fork:Early> and |
872 | This only applies to users of L<AnyEvent::Fork::Early> and |
840 | L<AnyEvent::Fork::Template>. |
873 | L<AnyEvent::Fork::Template>, or when initialising code creates objects |
874 | that reference external resources. |
841 | |
875 | |
842 | When a process created by AnyEvent::Fork exits, it might do so by calling |
876 | When a process created by AnyEvent::Fork exits, it might do so by calling |
843 | exit, or simply letting perl reach the end of the program, at which point |
877 | exit, or simply letting perl reach the end of the program, at which point |
844 | perl runs all destructors. |
878 | perl runs all destructors. |
845 | |
879 | |
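The difference is easy to observe in a plain forked child: C<exit> runs destructors, while C<POSIX::_exit> skips them, along with END blocks and buffer flushing. In the minimal core-Perl sketch below, the class C<Noisy> and the function C<child_destructor_output> are hypothetical. The destructor writes a marker to a pipe so that the parent can see whether it ran.

```perl
use strict;
use warnings;
use POSIX ();

package Noisy;
sub new     { my ($class, $fh) = @_; return bless { fh => $fh }, $class }
sub DESTROY { my $self = shift; syswrite($self->{fh}, "D") }

package main;

# Fork a child holding a Noisy object; it leaves via exit() or _exit().
# Returns what the child's destructor wrote: "D", or "" if it never ran.
sub child_destructor_output {
    my ($use_posix_exit) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $r;
        my $obj = Noisy->new($w);
        POSIX::_exit(0) if $use_posix_exit;
        exit 0;
    }
    close $w;
    my ($out, $buf) = ('', '');
    while (sysread($r, $buf, 64)) { $out .= $buf }
    waitpid($pid, 0);
    return $out;
}
```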
… | |
… | |
864 | to make it so, mostly due to the bloody broken perl that nobody seems to |
898 | to make it so, mostly due to the bloody broken perl that nobody seems to |
865 | care about. The fork emulation is a bad joke - I have yet to see something |
899 | care about. The fork emulation is a bad joke - I have yet to see something |
866 | useful that you can do with it without running into memory corruption |
900 | useful that you can do with it without running into memory corruption |
867 | issues or other braindamage. Hrrrr. |
901 | issues or other braindamage. Hrrrr. |
868 | |
902 | |
869 | Cygwin perl is not supported at the moment, as it should implement fd |
903 | Cygwin perl is not supported at the moment due to some hilarious |
870 | passing, but doesn't, and rolling my own is hard, as cygwin doesn't |
904 | shortcomings of its API - see L<IO::FDPass> for more details. |
871 | support enough functionality to do it. |
872 | |
905 | |
873 | =head1 SEE ALSO |
906 | =head1 SEE ALSO |
874 | |
907 | |
875 | L<AnyEvent::Fork::Early> (to avoid executing a perl interpreter), |
908 | L<AnyEvent::Fork::Early> (to avoid executing a perl interpreter), |
876 | L<AnyEvent::Fork::Template> (to create a process by forking the main |
909 | L<AnyEvent::Fork::Template> (to create a process by forking the main |