[ViewVC] Diff of: cvs/AnyEvent-Fork/Fork.pm

Comparing AnyEvent-Fork/Fork.pm (file contents):
Revision 1.24 by root, Sat Apr 6 08:32:23 2013 UTC vs.
Revision 1.59 by root, Fri Aug 30 12:06:48 2013 UTC

…		…
27		27
28	Special care has been taken to make this module useful from other modules,	28	Special care has been taken to make this module useful from other modules,
29	while still supporting specialised environments such as L<App::Staticperl>	29	while still supporting specialised environments such as L<App::Staticperl>
30	or L<PAR::Packer>.	30	or L<PAR::Packer>.
31		31
32	=head1 WHAT THIS MODULE IS NOT	32	=head2 WHAT THIS MODULE IS NOT
33		33
34	This module only creates processes and lets you pass file handles and	34	This module only creates processes and lets you pass file handles and
35	strings to it, and run perl code. It does not implement any kind of RPC -	35	strings to it, and run perl code. It does not implement any kind of RPC -
36	there is no back channel from the process back to you, and there is no RPC	36	there is no back channel from the process back to you, and there is no RPC
37	or message passing going on.	37	or message passing going on.
38		38
39	If you need some form of RPC, you can either implement it yourself	39	If you need some form of RPC, you could use the L<AnyEvent::Fork::RPC>
40	in whatever way you like, use some message-passing module such	40	companion module, which adds simple RPC/job queueing to a process created
41	as L<AnyEvent::MP>, some pipe such as L<AnyEvent::ZeroMQ>, use	41	by this module.
42	L<AnyEvent::Handle> on both sides to send e.g. JSON or Storable messages,	42
43	and so on.	43	And if you need some automatic process pool management on top of
		44	L<AnyEvent::Fork::RPC>, you can look at the L<AnyEvent::Fork::Pool>
		45	companion module.
		46
		47	Or you can implement it yourself in whatever way you like: use some
		48	message-passing module such as L<AnyEvent::MP>, some pipe such as
		49	L<AnyEvent::ZeroMQ>, use L<AnyEvent::Handle> on both sides to send
		50	e.g. JSON or Storable messages, and so on.
		51
		52	=head2 COMPARISON TO OTHER MODULES
		53
		54	There is an abundance of modules on CPAN that do "something fork", such as
		55	L<Parallel::ForkManager>, L<AnyEvent::ForkManager>, L<AnyEvent::Worker>
		56	or L<AnyEvent::Subprocess>. There are modules that implement their own
		57	process management, such as L<AnyEvent::DBI>.
		58
		59	The problems that all these modules try to solve are real, however, none
		60	of them (from what I have seen) tackle the very real problems of unwanted
		61	memory sharing, efficiency, not being able to use event processing or
		62	similar modules in the processes they create.
		63
		64	This module doesn't try to replace any of them - instead it tries to solve
		65	the problem of creating processes with a minimum of fuss and overhead (and
		66	also luxury). Ideally, most of these would use AnyEvent::Fork internally,
		67	except they were written before AnyEvent::Fork was available, so obviously
		68	had to roll their own.
		69
		70	=head2 PROBLEM STATEMENT
		71
		72	There are two traditional ways to implement parallel processing on UNIX
		73	like operating systems - fork and process, and fork+exec and process. They
		74	have different advantages and disadvantages that I describe below,
		75	together with how this module tries to mitigate the disadvantages.
		76
		77	=over 4
		78
		79	=item Forking from a big process can be very slow.
		80
		81	A 5GB process needs 0.05s to fork on my 3.6GHz amd64 GNU/Linux box. This
		82	overhead is often shared with exec (because you have to fork first), but
		83	in some circumstances (e.g. when vfork is used), fork+exec can be much
		84	faster.
		85
		86	This module can help here by telling a small(er) helper process to fork,
		87	which is faster than forking the main process, and also uses vfork where
		88	possible. This gives the speed of vfork, with the flexibility of fork.
		89
		90	=item Forking usually creates a copy-on-write copy of the parent
		91	process.
		92
		93	For example, modules or data files that are loaded will not use additional
		94	memory after a fork. When exec'ing a new process, modules and data files
		95	might need to be loaded again, at extra CPU and memory cost. But when
		96	forking, literally all data structures are copied - if the program frees
		97	them and replaces them by new data, the child processes will retain the
		98	old version even if it isn't used, which can suddenly and unexpectedly
		99	increase memory usage when freeing memory.
		100
		101	The trade-off is between more sharing with fork (which can be good or
		102	bad), and no sharing with exec.
		103
		104	This module allows the main program to do a controlled fork, and allows
		105	modules to exec processes safely at any time. When creating a custom
		106	process pool you can take advantage of data sharing via fork without
		107	risking to share large dynamic data structures that will blow up child
		108	memory usage.
		109
		110	In other words, this module puts you in control over what is being
		111	shared and what isn't, at all times.
		112
		113	=item Exec'ing a new perl process might be difficult.
		114
		115	For example, it is not easy to find the correct path to the perl
		116	interpreter - C<$^X> might not be a perl interpreter at all.
		117
		118	This module tries hard to identify the correct path to the perl
		119	interpreter. With a cooperative main program, exec'ing the interpreter
		120	might not even be necessary, but even without help from the main program,
		121	it will still work when used from a module.
		122
		123	=item Exec'ing a new perl process might be slow, as all necessary modules
		124	have to be loaded from disk again, with no guarantees of success.
		125
		126	Long running processes might run into problems when perl is upgraded
		127	and modules are no longer loadable because they refer to a different
		128	perl version, or parts of a distribution are newer than the ones already
		129	loaded.
		130
		131	This module supports creating pre-initialised perl processes to be used as
		132	a template for new processes.
		133
		134	=item Forking might be impossible when a program is running.
		135
		136	For example, POSIX makes it almost impossible to fork from a
		137	multi-threaded program while doing anything useful in the child - in
		138	fact, if your perl program uses POSIX threads (even indirectly via
		139	e.g. L<IO::AIO> or L<threads>), you cannot call fork on the perl level
		140	anymore without risking corruption issues on a number of operating
		141	systems.
		142
		143	This module can safely fork helper processes at any time, by calling
		144	fork+exec in C, in a POSIX-compatible way (via L<Proc::FastSpawn>).
		145
		146	=item Parallel processing with fork might be inconvenient or difficult
		147	to implement. Modules might not work in both parent and child.
		148
		149	For example, when a program uses an event loop and creates watchers it
		150	becomes very hard to use the event loop from a child program, as the
		151	watchers already exist but are only meaningful in the parent. Worse, a
		152	module might want to use such a module, not knowing whether another module
		153	or the main program also does, leading to problems.
		154
		155	Apart from event loops, graphical toolkits also commonly fall into the
		156	"unsafe module" category, or just about anything that communicates with
		157	the external world, such as network libraries and file I/O modules, which
		158	usually don't like being copied and then allowed to continue in two
		159	processes.
		160
		161	With this module only the main program is allowed to create new processes
		162	by forking (because only the main program can know when it is still safe
		163	to do so) - all other processes are created via fork+exec, which makes it
		164	possible to use modules such as event loops or window interfaces safely.
		165
		166	=back
44		167
45	=head1 EXAMPLES	168	=head1 EXAMPLES
46		169
47	=head2 Create a single new process, tell it to run your worker function.	170	=head2 Create a single new process, tell it to run your worker function.
48		171
…		…
54		177
55	# now $master_filehandle is connected to the	178	# now $master_filehandle is connected to the
56	# $slave_filehandle in the new process.	179	# $slave_filehandle in the new process.
57	});	180	});
58		181
59	# MyModule::worker might look like this	182	C<MyModule> might look like this:
		183
		184	package MyModule;
		185
60	sub MyModule::worker {	186	sub worker {
61	my ($slave_filehandle) = @_;	187	my ($slave_filehandle) = @_;
62		188
63	# now $slave_filehandle is connected to the $master_filehandle	189	# now $slave_filehandle is connected to the $master_filehandle
64	# in the original process. have fun!	190	# in the original process. have fun!
65	}	191	}
…		…
84	}	210	}
85		211
86	# now do other things - maybe use the filehandle provided by run	212	# now do other things - maybe use the filehandle provided by run
87	# to wait for the processes to die. or whatever.	213	# to wait for the processes to die. or whatever.
88		214
89	# My::Server::run might look like this	215	C<My::Server> might look like this:
90	sub My::Server::run {	216
		217	package My::Server;
		218
		219	sub run {
91	my ($slave, $listener, $id) = @_;	220	my ($slave, $listener, $id) = @_;
92		221
93	close $slave; # we do not use the socket, so close it to save resources	222	close $slave; # we do not use the socket, so close it to save resources
94		223
95	# we could go ballistic and use e.g. AnyEvent here, or IO::AIO,	224	# we could go ballistic and use e.g. AnyEvent here, or IO::AIO,
…		…
99	}	228	}
100	}	229	}
101		230
102	=head2 use AnyEvent::Fork as a faster fork+exec	231	=head2 use AnyEvent::Fork as a faster fork+exec
103		232
104	This runs /bin/echo hi, with stdout redirected to /tmp/log and stderr to	233	This runs C</bin/echo hi>, with standard output redirected to F</tmp/log>
105	the communications socket. It is usually faster than fork+exec, but still	234	and standard error redirected to the communications socket. It is usually
106	let's you prepare the environment.	235	faster than fork+exec, but still lets you prepare the environment.
107		236
108	open my $output, ">/tmp/log" or die "$!";	237	open my $output, ">/tmp/log" or die "$!";
109		238
110	AnyEvent::Fork	239	AnyEvent::Fork
111	->new	240	->new
112	->eval ('	241	->eval ('
		242	# compile a helper function for later use
113	sub run {	243	sub run {
114	my ($fh, $output, @cmd) = @_;	244	my ($fh, $output, @cmd) = @_;
115		245
116	# perl will clear close-on-exec on STDOUT/STDERR	246	# perl will clear close-on-exec on STDOUT/STDERR
117	open STDOUT, ">&", $output or die;	247	open STDOUT, ">&", $output or die;
…		…
124	->send_arg ("/bin/echo", "hi")	254	->send_arg ("/bin/echo", "hi")
125	->run ("run", my $cv = AE::cv);	255	->run ("run", my $cv = AE::cv);
126		256
127	my $stderr = $cv->recv;	257	my $stderr = $cv->recv;
128		258
129	=head1 PROBLEM STATEMENT	259	=head2 For stingy users: put the worker code into a C<DATA> section.
130		260
131	There are two ways to implement parallel processing on UNIX like operating	261	When you want to be stingy with files, you can put your code into the
132	systems - fork and process, and fork+exec and process. They have different	262	C<DATA> section of your module (or program):
133	advantages and disadvantages that I describe below, together with how this
134	module tries to mitigate the disadvantages.
135		263
136	=over 4	264	use AnyEvent::Fork;
137		265
138	=item Forking from a big process can be very slow (a 5GB process needs	266	AnyEvent::Fork
139	0.05s to fork on my 3.6GHz amd64 GNU/Linux box for example). This overhead	267	->new
140	is often shared with exec (because you have to fork first), but in some	268	->eval (do { local $/; <DATA> })
141	circumstances (e.g. when vfork is used), fork+exec can be much faster.	269	->run ("doit", sub { ... });
142		270
143	This module can help here by telling a small(er) helper process to fork,	271	__DATA__
144	or fork+exec instead.
145		272
146	=item Forking usually creates a copy-on-write copy of the parent	273	sub doit {
147	process. Memory (for example, modules or data files that have been	274	... do something!
148	will not take additional memory). When exec'ing a new process, modules	275	}
149	and data files might need to be loaded again, at extra CPU and memory
150	cost. Likewise when forking, all data structures are copied as well - if
151	the program frees them and replaces them by new data, the child processes
152	will retain the memory even if it isn't used.
153		276
154	This module allows the main program to do a controlled fork, and allows	277	=head2 For stingy standalone programs: do not rely on external files at
155	modules to exec processes safely at any time. When creating a custom	278	all.
156	process pool you can take advantage of data sharing via fork without
157	risking to share large dynamic data structures that will blow up child
158	memory usage.
159		279
160	=item Exec'ing a new perl process might be difficult and slow. For	280	For single-file scripts it can be inconvenient to rely on external
161	example, it is not easy to find the correct path to the perl interpreter,	281	files - even when using a C<DATA> section, you still need to C<exec>
162	and all modules have to be loaded from disk again. Long running processes	282	an external perl interpreter, which might not be available when using
163	might run into problems when perl is upgraded for example.	283	L<App::Staticperl>, L<Urlader> or L<PAR::Packer> for example.
164		284
165	This module supports creating pre-initialised perl processes to be used	285	Two modules help here - L<AnyEvent::Fork::Early> forks a template process
166	as template, and also tries hard to identify the correct path to the perl	286	for all further calls to C<new_exec>, and L<AnyEvent::Fork::Template>
167	interpreter. With a cooperative main program, exec'ing the interpreter	287	forks the main program as a template process.
168	might not even be necessary.
169		288
170	=item Forking might be impossible when a program is running. For example,	289	Here is what your main program should look like:
171	POSIX makes it almost impossible to fork from a multi-threaded program and
172	do anything useful in the child - strictly speaking, if your perl program
173	uses posix threads (even indirectly via e.g. L<IO::AIO> or L<threads>),
174	you cannot call fork on the perl level anymore, at all.
175		290
176	This module can safely fork helper processes at any time, by calling	291	#! perl
177	fork+exec in C, in a POSIX-compatible way.
178		292
179	=item Parallel processing with fork might be inconvenient or difficult	293	# optional, as the very first thing.
180	to implement. For example, when a program uses an event loop and creates	294	# in case modules want to create their own processes.
181	watchers it becomes very hard to use the event loop from a child	295	use AnyEvent::Fork::Early;
182	program, as the watchers already exist but are only meaningful in the
183	parent. Worse, a module might want to use such a system, not knowing
184	whether another module or the main program also does, leading to problems.
185		296
186	This module only lets the main program create pools by forking (because	297	# next, load all modules you need in your template process
187	only the main program can know when it is still safe to do so) - all other	298	use Example::My::Module;
188	pools are created by fork+exec, after which such modules can again be	299	use Example::Whatever;
189	loaded.
190		300
191	=back	301	# next, put your run function definition and anything else you
		302	# need, but do not use code outside of BEGIN blocks.
		303	sub worker_run {
		304	my ($fh, @args) = @_;
		305	...
		306	}
		307
		308	# now preserve everything so far as AnyEvent::Fork object
		309	# in $TEMPLATE.
		310	use AnyEvent::Fork::Template;
		311
		312	# do not put code outside of BEGIN blocks until here
		313
		314	# now use the $TEMPLATE process in any way you like
		315
		316	# for example: create 10 worker processes
		317	my @worker;
		318	my $cv = AE::cv;
		319	for (1..10) {
		320	$cv->begin;
		321	$TEMPLATE->fork->send_arg ($_)->run ("worker_run", sub {
		322	push @worker, shift;
		323	$cv->end;
		324	});
		325	}
		326	$cv->recv;
192		327
193	=head1 CONCEPTS	328	=head1 CONCEPTS
194		329
195	This module can create new processes either by executing a new perl	330	This module can create new processes either by executing a new perl
196	process, or by forking from an existing "template" process.	331	process, or by forking from an existing "template" process.
		332
		333	All these processes are called "child processes" (whether they are direct
		334	children or not), while the process that manages them is called the
		335	"parent process".
197		336
198	Each such process comes with its own file handle that can be used to	337	Each such process comes with its own file handle that can be used to
199	communicate with it (it's actually a socket - one end in the new process,	338	communicate with it (it's actually a socket - one end in the new process,
200	one end in the main process), and among the things you can do in it are	339	one end in the main process), and among the things you can do in it are
201	load modules, fork new processes, send file handles to it, and execute	340	load modules, fork new processes, send file handles to it, and execute
…		…
275	my ($fork_fh) = @_;	414	my ($fork_fh) = @_;
276	});	415	});
277		416
278	=back	417	=back
279		418
280	=head1 FUNCTIONS	419	=head1 THE C<AnyEvent::Fork> CLASS
		420
		421	This module exports nothing, and only implements a single class -
		422	C<AnyEvent::Fork>.
		423
		424	There are two class constructors that both create new processes - C<new>
		425	and C<new_exec>. The C<fork> method creates a new process by forking an
		426	existing one and could be considered a third constructor.
		427
		428	Most of the remaining methods deal with preparing the new process, by
		429	loading code, evaluating code and sending data to the new process. They
		430	usually return the process object, so you can chain method calls.
		431
		432	If a process object is destroyed before calling its C<run> method, then
		433	the process simply exits. After C<run> is called, all responsibility is
		434	passed to the specified function.
		435
		436	As long as there is any outstanding work to be done, process objects
		437	resist being destroyed, so there is no reason to store them unless you
		438	need them later - configure and forget works just fine.
281		439
282	=over 4	440	=over 4
283		441
284	=cut	442	=cut
285		443
…		…
292	use AnyEvent;	450	use AnyEvent;
293	use AnyEvent::Util ();	451	use AnyEvent::Util ();
294		452
295	use IO::FDPass;	453	use IO::FDPass;
296		454
297	our $VERSION = 0.5;	455	our $VERSION = 1.1;
298
299	our $PERL; # the path to the perl interpreter, deduced with various forms of magic
300
301	=item my $pool = new AnyEvent::Fork key => value...
302
303	Create a new process pool. The following named parameters are supported:
304
305	=over 4
306
307	=back
308
309	=cut
310		456
311	# the early fork template process	457	# the early fork template process
312	our $EARLY;	458	our $EARLY;
313		459
314	# the empty template process	460	# the empty template process
315	our $TEMPLATE;	461	our $TEMPLATE;
		462
		463	sub QUEUE() { 0 }
		464	sub FH() { 1 }
		465	sub WW() { 2 }
		466	sub PID() { 3 }
		467	sub CB() { 4 }
		468
		469	sub _new {
		470	my ($self, $fh, $pid) = @_;
		471
		472	AnyEvent::Util::fh_nonblocking $fh, 1;
		473
		474	$self = bless [
		475	[], # write queue - strings or fd's
		476	$fh,
		477	undef, # AE watcher
		478	$pid,
		479	], $self;
		480
		481	$self
		482	}
316		483
317	sub _cmd {	484	sub _cmd {
318	my $self = shift;	485	my $self = shift;
319		486
320	# ideally, we would want to use "a (w/a)*" as format string, but perl	487	# ideally, we would want to use "a (w/a)*" as format string, but perl
321	# versions from at least 5.8.9 to 5.16.3 are all buggy and can't unpack	488	# versions from at least 5.8.9 to 5.16.3 are all buggy and can't unpack
322	# it.	489	# it.
323	push @{ $self->[2] }, pack "a L/a*", $_[0], $_[1];	490	push @{ $self->[QUEUE] }, pack "a L/a*", $_[0], $_[1];
324		491
325	$self->[3] ||= AE::io $self->[1], 1, sub {	492	$self->[WW] ||= AE::io $self->[FH], 1, sub {
326	do {	493	do {
327	# send the next "thing" in the queue - either a reference to an fh,	494	# send the next "thing" in the queue - either a reference to an fh,
328	# or a plain string.	495	# or a plain string.
329		496
330	if (ref $self->[2][0]) {	497	if (ref $self->[QUEUE][0]) {
331	# send fh	498	# send fh
332	unless (IO::FDPass::send fileno $self->[1], fileno ${ $self->[2][0] }) {	499	unless (IO::FDPass::send fileno $self->[FH], fileno ${ $self->[QUEUE][0] }) {
333	return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;	500	return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;
334	undef $self->[3];	501	undef $self->[WW];
335	die "AnyEvent::Fork: file descriptor send failure: $!";	502	die "AnyEvent::Fork: file descriptor send failure: $!";
336	}	503	}
337		504
338	shift @{ $self->[2] };	505	shift @{ $self->[QUEUE] };
339		506
340	} else {	507	} else {
341	# send string	508	# send string
342	my $len = syswrite $self->[1], $self->[2][0];	509	my $len = syswrite $self->[FH], $self->[QUEUE][0];
343		510
344	unless ($len) {	511	unless ($len) {
345	return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;	512	return if $! == Errno::EAGAIN || $! == Errno::EWOULDBLOCK;
346	undef $self->[3];	513	undef $self->[WW];
347	die "AnyEvent::Fork: command write failure: $!";	514	die "AnyEvent::Fork: command write failure: $!";
348	}	515	}
349		516
350	substr $self->[2][0], 0, $len, "";	517	substr $self->[QUEUE][0], 0, $len, "";
351	shift @{ $self->[2] } unless length $self->[2][0];	518	shift @{ $self->[QUEUE] } unless length $self->[QUEUE][0];
352	}	519	}
353	} while @{ $self->[2] };	520	} while @{ $self->[QUEUE] };
354		521
355	# everything written	522	# everything written
356	undef $self->[3];	523	undef $self->[WW];
357		524
358	# invoke run callback, if any	525	# invoke run callback, if any
359	$self->[4]->($self->[1]) if $self->[4];	526	if ($self->[CB]) {
		527	$self->[CB]->($self->[FH]);
		528	@$self = ();
		529	}
360	};	530	};
361		531
362	() # make sure we don't leak the watcher	532	() # make sure we don't leak the watcher
363	}
364
365	sub _new {
366	my ($self, $fh, $pid) = @_;
367
368	AnyEvent::Util::fh_nonblocking $fh, 1;
369
370	$self = bless [
371	$pid,
372	$fh,
373	[], # write queue - strings or fd's
374	undef, # AE watcher
375	], $self;
376
377	$self
378	}	533	}
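
Editor's note: the C<_cmd> routine above frames every command with the pack template "a L/a*". The following is a minimal sketch of what that template produces, assuming nothing beyond core Perl. The C<frame_cmd> and C<unframe_cmd> helpers and the command letter used here are hypothetical names for illustration only; the real decoder lives in AnyEvent::Fork::Serve and is not part of this diff.

```perl
use strict;
use warnings;

# Frame a command the same way _cmd does: one type byte ("a"),
# followed by a 32-bit native-endian length and the payload ("L/a*").
sub frame_cmd {
   my ($type, $payload) = @_;
   pack "a L/a*", $type, $payload
}

# Hypothetical decoder for illustration only, this is NOT the code
# used by AnyEvent::Fork::Serve.
sub unframe_cmd {
   my ($frame) = @_;
   unpack "a L/a*", $frame
}

# the type letter and payload are arbitrary example values
my $frame = frame_cmd ("a", "hello");
my ($type, $payload) = unframe_cmd ($frame);

printf "frame is %d octets, type=%s payload=%s\n",
   length $frame, $type, $payload;
```

The frame is one octet of type, four octets of length and then the payload, which is why a single argument string is limited to what fits into a 32-bit length.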
379		534
380	# fork template from current process, used by AnyEvent::Fork::Early/Template	535	# fork template from current process, used by AnyEvent::Fork::Early/Template
381	sub _new_fork {	536	sub _new_fork {
382	my ($fh, $slave) = AnyEvent::Util::portable_socketpair;	537	my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
…		…
387	if ($pid eq 0) {	542	if ($pid eq 0) {
388	require AnyEvent::Fork::Serve;	543	require AnyEvent::Fork::Serve;
389	$AnyEvent::Fork::Serve::OWNER = $parent;	544	$AnyEvent::Fork::Serve::OWNER = $parent;
390	close $fh;	545	close $fh;
391	$0 = "$_[1] of $parent";	546	$0 = "$_[1] of $parent";
392	$SIG{CHLD} = 'IGNORE';
393	AnyEvent::Fork::Serve::serve ($slave);	547	AnyEvent::Fork::Serve::serve ($slave);
394	exit 0;	548	exit 0;
395	} elsif (!$pid) {	549	} elsif (!$pid) {
396	die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";	550	die "AnyEvent::Fork::Early/Template: unable to fork template process: $!";
397	}	551	}
…		…
404	Create a new "empty" perl interpreter process and returns its process	558	Create a new "empty" perl interpreter process and returns its process
405	object for further manipulation.	559	object for further manipulation.
406		560
407	The new process is forked from a template process that is kept around	561	The new process is forked from a template process that is kept around
408	for this purpose. When it doesn't exist yet, it is created by a call to	562	for this purpose. When it doesn't exist yet, it is created by a call to
409	C<new_exec> and kept around for future calls.	563	C<new_exec> first and then stays around for future calls.
410
411	When the process object is destroyed, it will release the file handle
412	that connects it with the new process. When the new process has not yet
413	called C<run>, then the process will exit. Otherwise, what happens depends
414	entirely on the code that is executed.
415		564
416	=cut	565	=cut
417		566
418	sub new {	567	sub new {
419	my $class = shift;	568	my $class = shift;
…		…
456		605
457	You should use C<new> whenever possible, except when having a template	606	You should use C<new> whenever possible, except when having a template
458	process around is unacceptable.	607	process around is unacceptable.
459		608
460	The path to the perl interpreter is divined using various methods - first	609	The path to the perl interpreter is divined using various methods - first
461	C<$^X> is investigated to see if the path ends with something that sounds	610	C<$^X> is investigated to see if the path ends with something that looks
462	as if it were the perl interpreter. Failing this, the module falls back to	611	as if it were the perl interpreter. Failing this, the module falls back to
463	using C<$Config::Config{perlpath}>.	612	using C<$Config::Config{perlpath}>.
464		613
		614	The path to perl can also be overridden by setting the global variable
		615	C<$AnyEvent::Fork::PERL> - its value will be used for all subsequent
		616	invocations.
		617
465	=cut	618	=cut
		619
		620	our $PERL;
466		621
467	sub new_exec {	622	sub new_exec {
468	my ($self) = @_;	623	my ($self) = @_;
469		624
470	return $EARLY->fork	625	return $EARLY->fork
471	if $EARLY;	626	if $EARLY;
472		627
		628	unless (defined $PERL) {
473	# first find path of perl	629	# first find path of perl
474	my $perl = $^X;	630	my $perl = $^X;
475		631
476	# first we try $^X, but the path must be absolute (always on win32), and end in sth.	632	# first we try $^X, but the path must be absolute (always on win32), and end in sth.
477	# that looks like perl. this obviously only works for posix and win32	633	# that looks like perl. this obviously only works for posix and win32
478	unless (	634	unless (
479	($^O eq "MSWin32" || $perl =~ m%^/%)	635	($^O eq "MSWin32" || $perl =~ m%^/%)
480	&& $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i	636	&& $perl =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i
481	) {	637	) {
482	# if it doesn't look perlish enough, try Config	638	# if it doesn't look perlish enough, try Config
483	require Config;	639	require Config;
484	$perl = $Config::Config{perlpath};	640	$perl = $Config::Config{perlpath};
485	$perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;	641	$perl =~ s/(?:\Q$Config::Config{_exe}\E)?$/$Config::Config{_exe}/;
		642	}
		643
		644	$PERL = $perl;
486	}	645	}
487		646
488	require Proc::FastSpawn;	647	require Proc::FastSpawn;
489		648
490	my ($fh, $slave) = AnyEvent::Util::portable_socketpair;	649	my ($fh, $slave) = AnyEvent::Util::portable_socketpair;
…		…
498	#local $ENV{PERL5LIB} = join ":", grep !ref, @INC;	657	#local $ENV{PERL5LIB} = join ":", grep !ref, @INC;
499	my %env = %ENV;	658	my %env = %ENV;
500	$env{PERL5LIB} = join +($^O eq "MSWin32" ? ";" : ":"), grep !ref, @INC;	659	$env{PERL5LIB} = join +($^O eq "MSWin32" ? ";" : ":"), grep !ref, @INC;
501		660
502	my $pid = Proc::FastSpawn::spawn (	661	my $pid = Proc::FastSpawn::spawn (
503	$perl,	662	$PERL,
504	["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$],	663	["perl", "-MAnyEvent::Fork::Serve", "-e", "AnyEvent::Fork::Serve::me", fileno $slave, $$],
505	[map "$_=$env{$_}", keys %env],	664	[map "$_=$env{$_}", keys %env],
506	) or die "unable to spawn AnyEvent::Fork server: $!";	665	) or die "unable to spawn AnyEvent::Fork server: $!";
507		666
508	$self->_new ($fh, $pid)	667	$self->_new ($fh, $pid)
509	}	668	}
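
Editor's note: the "looks like perl" test in C<new_exec> above can be tried in isolation. This is a small sketch that copies only the regex from the diff into a hypothetical helper named C<looks_like_perl>; it leaves out the separate absolute-path check and the C<Config> fallback, and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Same pattern as in new_exec: a path separator, "perl", an optional
# dotted version suffix and an optional ".exe", anchored at the end.
sub looks_like_perl {
   my ($path) = @_;
   $path =~ m%[/\\]perl(?:[0-9]+(\.[0-9]+)+)?(\.exe)?$%i ? 1 : 0
}

# made-up example paths
for my $path (
   "/usr/bin/perl",
   "/usr/bin/perl5.16.3",
   "C:\\Perl\\bin\\perl.exe",
   "/home/user/bigperl",
   "/opt/app/myapp",
) {
   printf "%-24s %s\n", $path, looks_like_perl ($path) ? "perlish" : "not perlish";
}
```

The last two paths are rejected, which is the case where the module falls back to C<$Config::Config{perlpath}>.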
510		669
511	=item $pid = $proc->pid	670	=item $pid = $proc->pid
512		671
513	Returns the process id of the process I<iff it is a direct child of the	672	Returns the process id of the process I<iff it is a direct child of the
514	process> running AnyEvent::Fork, and C<undef> otherwise.	673	process running AnyEvent::Fork>, and C<undef> otherwise. As a general
		674	rule (that you cannot rely upon), processes created via C<new_exec>,
		675	L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template> are direct
		676	children, while all other processes are not.
515		677
516	Normally, only processes created via C<< AnyEvent::Fork->new_exec >> and	678	Or in other words, you do not normally have to take care of zombies for
517	L<AnyEvent::Fork::Template> are direct children, and you are responsible	679	processes created via C<new>, but when in doubt, or zombies are a problem,
518	to clean up their zombies when they die.	680	you need to check whether a process is a direct child by calling this
519		681	method, and possibly creating a child watcher or reaping it manually.
520	All other processes are not direct children, and will be cleaned up by
521	AnyEvent::Fork.
522		682
523	=cut	683	=cut
524		684
525	sub pid {	685	sub pid {
526	$_[0][0]	686	$_[0][PID]
527	}	687	}
528		688
529	=item $proc = $proc->eval ($perlcode, @args)	689	=item $proc = $proc->eval ($perlcode, @args)
530		690
531	Evaluates the given C<$perlcode> as ... perl code, while setting C<@_> to	691	Evaluates the given C<$perlcode> as ... Perl code, while setting C<@_> to
532	the strings specified by C<@args>, in the "main" package.	692	the strings specified by C<@args>, in the "main" package.
533		693
534	This call is meant to do any custom initialisation that might be required	694	This call is meant to do any custom initialisation that might be required
535	(for example, the C<require> method uses it). It's not supposed to be used	695	(for example, the C<require> method uses it). It's not supposed to be used
536	to completely take over the process, use C<run> for that.	696	to completely take over the process, use C<run> for that.
537		697
538	The code will usually be executed after this call returns, and there is no	698	The code will usually be executed after this call returns, and there is no
539	way to pass anything back to the calling process. Any evaluation errors	699	way to pass anything back to the calling process. Any evaluation errors
540	will be reported to stderr and cause the process to exit.	700	will be reported to stderr and cause the process to exit.
541		701
542	If you want to execute some code to take over the process (see the	702	If you want to execute some code (that isn't in a module) to take over the
543	"fork+exec" example in the SYNOPSIS), you should compile a function via	703	process, you should compile a function via C<eval> first, and then call
544	C<eval> first, and then call it via C<run>. This also gives you access to	704	it via C<run>. This also gives you access to any arguments passed via the
545	any arguments passed via the C<send_xxx> methods, such as file handles.	705	C<send_xxx> methods, such as file handles. See the L<use AnyEvent::Fork as
		706	a faster fork+exec> example to see it in action.
546		707
547	Returns the process object for easy chaining of method calls.	708	Returns the process object for easy chaining of method calls.
548		709
549	=cut	710	=cut
550		711
…		…
576	=item $proc = $proc->send_fh ($handle, ...)	737	=item $proc = $proc->send_fh ($handle, ...)
577		738
578	Send one or more file handles (I<not> file descriptors) to the process,	739	Send one or more file handles (I<not> file descriptors) to the process,
579	to prepare a call to C<run>.	740	to prepare a call to C<run>.
580		741
581	The process object keeps a reference to the handles until this is done,	742	The process object keeps a reference to the handles until they have
582	so you must not explicitly close the handles. This is most easily	743	been passed over to the process, so you must not explicitly close the
583	accomplished by simply not storing the file handles anywhere after passing	744	handles. This is most easily accomplished by simply not storing the file
584	them to this method.	745	handles anywhere after passing them to this method - when AnyEvent::Fork
		746	is finished using them, perl will automatically close them.
585		747
586	Returns the process object for easy chaining of method calls.	748	Returns the process object for easy chaining of method calls.
587		749
588	Example: pass a file handle to a process, and release it without	750	Example: pass a file handle to a process, and release it without
589	closing. It will be closed automatically when it is no longer used.	751	closing. It will be closed automatically when it is no longer used.
…		…
596	sub send_fh {	758	sub send_fh {
597	my ($self, @fh) = @_;	759	my ($self, @fh) = @_;
598		760
599	for my $fh (@fh) {	761	for my $fh (@fh) {
600	$self->_cmd ("h");	762	$self->_cmd ("h");
601	push @{ $self->[2] }, \$fh;	763	push @{ $self->[QUEUE] }, \$fh;
602	}	764	}
603		765
604	$self	766	$self
605	}	767	}
606		768
607	=item $proc = $proc->send_arg ($string, ...)	769	=item $proc = $proc->send_arg ($string, ...)
608		770
609	Send one or more argument strings to the process, to prepare a call to	771	Send one or more argument strings to the process, to prepare a call to
610	C<run>. The strings can be any octet string.	772	C<run>. The strings can be any octet strings.
611		773
612	The protocol is optimised to pass a moderate number of relatively short	774	The protocol is optimised to pass a moderate number of relatively short
613	strings - while you can pass up to 4GB of data in one go, this is more	775	strings - while you can pass up to 4GB of data in one go, this is more
614	meant to pass some ID information or other startup info, not big chunks of	776	meant to pass some ID information or other startup info, not big chunks of
615	data.	777	data.
…		…
631	Enter the function specified by the function name in C<$func> in the	793	Enter the function specified by the function name in C<$func> in the
632	process. The function is called with the communication socket as first	794	process. The function is called with the communication socket as first
633	argument, followed by all file handles and string arguments sent earlier	795	argument, followed by all file handles and string arguments sent earlier
634	via C<send_fh> and C<send_arg> methods, in the order they were called.	796	via C<send_fh> and C<send_arg> methods, in the order they were called.
635		797
		798	The process object becomes unusable on return from this function - any
		799	further method calls result in undefined behaviour.
		800
636	The function name should be fully qualified, but if it isn't, it will be	801	The function name should be fully qualified, but if it isn't, it will be
637	looked up in the main package.	802	looked up in the C<main> package.
638		803
639	If the called function returns, doesn't exist, or any error occurs, the	804	If the called function returns, doesn't exist, or any error occurs, the
640	process exits.	805	process exits.
641		806
642	Preparing the process is done in the background - when all commands have	807	Preparing the process is done in the background - when all commands have
643	been sent, the callback is invoked with the local communications socket	808	been sent, the callback is invoked with the local communications socket
644	as argument. At this point you can start using the socket in any way you	809	as argument. At this point you can start using the socket in any way you
645	like.	810	like.
646		811
647	The process object becomes unusable on return from this function - any
648	further method calls result in undefined behaviour.
649
650	If the communication socket isn't used, it should be closed on both sides,	812	If the communication socket isn't used, it should be closed on both sides,
651	to save on kernel memory.	813	to save on kernel memory.
652		814
653	The socket is non-blocking in the parent, and blocking in the newly	815	The socket is non-blocking in the parent, and blocking in the newly
654	created process. The close-on-exec flag is set in both.	816	created process. The close-on-exec flag is set in both.
655		817
656	Even if not used otherwise, the socket can be a good indicator for the	818	Even if not used otherwise, the socket can be a good indicator for the
657	existence of the process - if the other process exits, you get a readable	819	existence of the process - if the other process exits, you get a readable
658	event on it, because exiting the process closes the socket (if it didn't	820	event on it, because exiting the process closes the socket (if it didn't
659	create any children using fork).	821	create any children using fork).
		822
		823	=over 4
		824
		825	=item Compatibility to L<AnyEvent::Fork::Remote>
		826
		827	If you want to write code that works with both this module and
		828	L<AnyEvent::Fork::Remote>, you need to write your code so that it assumes
		829	there are two file handles for communications, which might not be unix
		830	domain sockets. The C<run> function should start like this:
		831
		832	sub run {
		833	my ($rfh, @args) = @_; # @args is your normal arguments
		834	my $wfh = fileno $rfh ? $rfh : *STDOUT;
		835
		836	# now use $rfh for reading and $wfh for writing
		837	}
		838
		839	This checks whether the passed file handle is, in fact, the process
		840	C<STDIN> handle. If it is, then the function was invoked via
		841	L<AnyEvent::Fork::Remote>, so STDIN should be used for reading and
		842	C<STDOUT> should be used for writing.
		843
		844	In all other cases, the function was called via this module, and there is
		845	only one file handle that should be used for reading and writing.
		846
		847	=back
660		848
661	Example: create a template for a process pool, pass a few strings, some	849	Example: create a template for a process pool, pass a few strings, some
662	file handles, then fork, pass one more string, and run some code.	850	file handles, then fork, pass one more string, and run some code.
663		851
664	my $pool = AnyEvent::Fork	852	my $pool = AnyEvent::Fork
…		…
692	=cut	880	=cut
693		881
694	sub run {	882	sub run {
695	my ($self, $func, $cb) = @_;	883	my ($self, $func, $cb) = @_;
696		884
697	$self->[4] = $cb;	885	$self->[CB] = $cb;
698	$self->_cmd (r => $func);	886	$self->_cmd (r => $func);
		887	}
		888
		889	=back
		890
		891	=head2 EXPERIMENTAL METHODS
		892
		893	These methods might go away completely or change behaviour, at any time.
		894
		895	=over 4
		896
		897	=item $proc->to_fh ($cb->($fh)) # EXPERIMENTAL, MIGHT BE REMOVED
		898
		899	Flushes all commands out to the process and then calls the callback with
		900	the communications socket.
		901
		902	The process object becomes unusable on return from this function - any
		903	further method calls result in undefined behaviour.
		904
		905	The point of this method is to give you a file handle that you can pass
		906	to another process. In that other process, you can call C<new_from_fh
		907	AnyEvent::Fork $fh> to create a new C<AnyEvent::Fork> object from it,
		908	thereby effectively passing a fork object to another process.
		909
		910	=cut
		911
		912	sub to_fh {
		913	my ($self, $cb) = @_;
		914
		915	$self->[CB] = $cb;
		916
		917	unless ($self->[WW]) {
		918	$self->[CB]->($self->[FH]);
		919	@$self = ();
		920	}
		921	}
		922
		923	=item new_from_fh AnyEvent::Fork $fh # EXPERIMENTAL, MIGHT BE REMOVED
		924
		925	Takes a file handle originally received by the C<to_fh> method and creates
		926	a new C<AnyEvent::Fork> object. The child process itself will not change in
		927	any way, i.e. it will keep all the modifications done to it before calling
		928	C<to_fh>.
		929
		930	The new object is very much like the original object, except that the
		931	C<pid> method will return C<undef> even if the process is a direct child.
		932
		933	=cut
		934
		935	sub new_from_fh {
		936	my ($class, $fh) = @_;
		937
		938	$class->_new ($fh)
699	}	939	}
700		940
701	=back	941	=back
702		942
703	=head1 PERFORMANCE	943	=head1 PERFORMANCE
…		…
713		953
714	2079 new processes per second, using manual socketpair + fork	954	2079 new processes per second, using manual socketpair + fork
715		955
716	Then I did the same thing, but instead of calling fork, I called	956	Then I did the same thing, but instead of calling fork, I called
717	AnyEvent::Fork->new->run ("CORE::exit") and then again waited for the	957	AnyEvent::Fork->new->run ("CORE::exit") and then again waited for the
718	socket form the child to close on exit. This does the same thing as manual	958	socket from the child to close on exit. This does the same thing as manual
719	socket pair + fork, except that what is forked is the template process	959	socket pair + fork, except that what is forked is the template process
720	(2440kB), and the socket needs to be passed to the server at the other end	960	(2440kB), and the socket needs to be passed to the server at the other end
721	of the socket first.	961	of the socket first.
722		962
723	2307 new processes per second, using AnyEvent::Fork->new	963	2307 new processes per second, using AnyEvent::Fork->new
…		…
728	479 vfork+execs per second, using AnyEvent::Fork->new_exec	968	479 vfork+execs per second, using AnyEvent::Fork->new_exec
729		969
730	So how can C<< AnyEvent::Fork->new >> be faster than a standard fork, even	970	So how can C<< AnyEvent::Fork->new >> be faster than a standard fork, even
731	though it uses the same operations, but adds a lot of overhead?	971	though it uses the same operations, but adds a lot of overhead?
732		972
733	The difference is simply the process size: forking the 6MB process takes	973	The difference is simply the process size: forking the 5MB process takes
734	so much longer than forking the 2.5MB template process that the overhead	974	so much longer than forking the 2.5MB template process that the extra
735	introduced is canceled out.	975	overhead is canceled out.
736		976
737	If the benchmark process grows, the normal fork becomes even slower:	977	If the benchmark process grows, the normal fork becomes even slower:
738		978
739	1340 new processes, manual fork in a 20MB process	979	1340 new processes, manual fork of a 20MB process
740	731 new processes, manual fork in a 200MB process	980	731 new processes, manual fork of a 200MB process
741	235 new processes, manual fork in a 2000MB process	981	235 new processes, manual fork of a 2000MB process
742		982
743	What that means (to me) is that I can use this module without having a	983	What that means (to me) is that I can use this module without having a bad
744	very bad conscience because of the extra overhead required to start new	984	conscience because of the extra overhead required to start new processes.
745	processes.
746		985
747	=head1 TYPICAL PROBLEMS	986	=head1 TYPICAL PROBLEMS
748		987
749	This section lists typical problems that remain. I hope by recognising	988	This section lists typical problems that remain. I hope by recognising
750	them, most can be avoided.	989	them, most can be avoided.
751		990
752	=over 4	991	=over 4
753		992
754	=item "leaked" file descriptors for exec'ed processes	993	=item leaked file descriptors for exec'ed processes
755		994
756	POSIX systems inherit file descriptors by default when exec'ing a new	995	POSIX systems inherit file descriptors by default when exec'ing a new
757	process. While perl itself laudably sets the close-on-exec flags on new	996	process. While perl itself laudably sets the close-on-exec flags on new
758	file handles, most C libraries don't care, and even if all cared, it's	997	file handles, most C libraries don't care, and even if all cared, it's
759	often not possible to set the flag in a race-free manner.	998	often not possible to set the flag in a race-free manner.
…		…
779	libraries or the code that leaks those file descriptors.	1018	libraries or the code that leaks those file descriptors.
780		1019
781	Fortunately, most of these leaked descriptors do no harm, other than	1020	Fortunately, most of these leaked descriptors do no harm, other than
782	sitting on some resources.	1021	sitting on some resources.
783		1022
784	=item "leaked" file descriptors for fork'ed processes	1023	=item leaked file descriptors for fork'ed processes
785		1024
786	Normally, L<AnyEvent::Fork> does start new processes by exec'ing them,	1025	Normally, L<AnyEvent::Fork> does start new processes by exec'ing them,
787	which closes file descriptors not marked for being inherited.	1026	which closes file descriptors not marked for being inherited.
788		1027
789	However, L<AnyEvent::Fork::Early> and L<AnyEvent::Fork::Template> offer	1028	However, L<AnyEvent::Fork::Early> and L<AnyEvent::Fork::Template> offer
…		…
798		1037
799	The solution is to either not load these modules before use'ing	1038	The solution is to either not load these modules before use'ing
800	L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template>, or to delay	1039	L<AnyEvent::Fork::Early> or L<AnyEvent::Fork::Template>, or to delay
801	initialising them, for example, by calling C<init Gtk2> manually.	1040	initialising them, for example, by calling C<init Gtk2> manually.
802		1041
803	=item exit runs destructors	1042	=item exiting calls object destructors
804		1043
805	This only applies to users of Lc<AnyEvent::Fork:Early> and	1044	This only applies to users of L<AnyEvent::Fork::Early> and
806	L<AnyEvent::Fork::Template>.	1045	L<AnyEvent::Fork::Template>, or when initialising code creates objects
		1046	that reference external resources.
807		1047
808	When a process created by AnyEvent::Fork exits, it might do so by calling	1048	When a process created by AnyEvent::Fork exits, it might do so by calling
809	exit, or simply letting perl reach the end of the program. At which point	1049	exit, or simply letting perl reach the end of the program. At which point
810	Perl runs all destructors.	1050	Perl runs all destructors.
811		1051
…		…
830	to make it so, mostly due to the bloody broken perl that nobody seems to	1070	to make it so, mostly due to the bloody broken perl that nobody seems to
831	care about. The fork emulation is a bad joke - I have yet to see something	1071	care about. The fork emulation is a bad joke - I have yet to see something
832	useful that you can do with it without running into memory corruption	1072	useful that you can do with it without running into memory corruption
833	issues or other braindamage. Hrrrr.	1073	issues or other braindamage. Hrrrr.
834		1074
835	Cygwin perl is not supported at the moment, as it should implement fd	1075	Since fork is endlessly broken on win32 perls (it doesn't even remotely
836	passing, but doesn't, and rolling my own is hard, as cygwin doesn't	1076	work within its documented limits) and quite obviously it's not getting
837	support enough functionality to do it.	1077	improved any time soon, the best way to proceed on windows would be to
		1078	always use C<new_exec> and thus never rely on perl's fork "emulation".
		1079
		1080	Cygwin perl is not supported at the moment due to some hilarious
		1081	shortcomings of its API - see L<IO::FDPoll> for more details. If you never
		1082	use C<send_fh> and always use C<new_exec> to create processes, it should
		1083	work though.
838		1084
839	=head1 SEE ALSO	1085	=head1 SEE ALSO
840		1086
841	L<AnyEvent::Fork::Early> (to avoid executing a perl interpreter),	1087	L<AnyEvent::Fork::Early>, to avoid executing a perl interpreter at all
		1088	(part of this distribution).
		1089
842	L<AnyEvent::Fork::Template> (to create a process by forking the main	1090	L<AnyEvent::Fork::Template>, to create a process by forking the main
843	program at a convenient time).	1091	program at a convenient time (part of this distribution).
844		1092
845	=head1 AUTHOR	1093	L<AnyEvent::Fork::Remote>, for another way to create processes that is
		1094	mostly compatible to this module and modules building on top of it, but
		1095	works better with remote processes.
		1096
		1097	L<AnyEvent::Fork::RPC>, for simple RPC to child processes (on CPAN).
		1098
		1099	L<AnyEvent::Fork::Pool>, for a simple worker process pool (on CPAN).
		1100
		1101	=head1 AUTHOR AND CONTACT INFORMATION
846		1102
847	Marc Lehmann <schmorp@schmorp.de>	1103	Marc Lehmann <schmorp@schmorp.de>
848	http://home.schmorp.de/	1104	http://software.schmorp.de/pkg/AnyEvent-Fork
849		1105
850	=cut	1106	=cut
851		1107
852	1	1108	1
853		1109
