/cvs/AnyEvent-Fork-RPC/README
Revision: 1.3
Committed: Sun Apr 21 12:27:03 2013 UTC (11 years, 7 months ago) by root
Branch: MAIN
CVS Tags: rel-1_1, rel-0_2
Changes since 1.2: +8 -1 lines
Log Message:
0.2

NAME
    AnyEvent::Fork::RPC - simple RPC extension for AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS A BETA RELEASE

SYNOPSIS
        use AnyEvent::Fork::RPC;
        # use AnyEvent::Fork is not needed

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("MyModule")
           ->AnyEvent::Fork::RPC::run (
              "MyModule::server",
           );

        use AnyEvent;

        my $cv = AE::cv;

        $rpc->(1, 2, 3, sub {
           print "MyModule::server returned @_\n";
           $cv->send;
        });

        $cv->recv;

DESCRIPTION
    This module implements a simple RPC protocol and backend for processes
    created via AnyEvent::Fork, allowing you to call a function in the child
    process and receive its return values (up to 4GB serialised).

    It implements two different backends: a synchronous one that works like
    a normal function call, and an asynchronous one that can run multiple
    jobs concurrently in the child, using AnyEvent.

    It also implements an asynchronous event mechanism from the child to the
    parent, that could be used for progress indications or other
    information.

    Loading this module also always loads AnyEvent::Fork, so you can make a
    separate "use AnyEvent::Fork" if you wish, but you don't have to.

EXAMPLES
  Example 1: Synchronous Backend
    Here is a simple example that implements a backend that executes
    "unlink" and "rmdir" calls, and reports their status back. It also
    reports the number of requests it has processed every three requests,
    which is clearly silly, but illustrates the use of events.

    First the parent process:

        use AnyEvent;
        use AnyEvent::Fork::RPC;

        my $done = AE::cv;

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("MyWorker")
           ->AnyEvent::Fork::RPC::run ("MyWorker::run",
              on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
              on_event   => sub { warn "$_[0] requests handled\n" },
              on_destroy => $done,
           );

        for my $id (1..6) {
           $rpc->(rmdir => "/tmp/somepath/$id", sub {
              $_[0]
                 or warn "/tmp/somepath/$id: $_[1]\n";
           });
        }

        undef $rpc;

        $done->recv;

    The parent creates the process, queues a few rmdir's. It then forgets
    about the $rpc object, so that the child exits after it has handled the
    requests, and then it waits till the requests have been handled.

    The child is implemented using a separate module, "MyWorker", shown
    here:

        package MyWorker;

        my $count;

        sub run {
           my ($cmd, $path) = @_;

           AnyEvent::Fork::RPC::event ($count)
              unless ++$count % 3;

           my $status = $cmd eq "rmdir"  ? rmdir $path
                      : $cmd eq "unlink" ? unlink $path
                      : die "fatal error, illegal command '$cmd'";

           $status or (0, "$!")
        }

        1

    The "run" function first sends a "progress" event every three calls, and
    then executes "rmdir" or "unlink", depending on the first parameter (or
    dies with a fatal error - obviously, you must never let this happen :).

    Eventually it returns the status value true if the command was
    successful, or the status value 0 and the stringified error message.

    On my system, running the first code fragment with the given MyWorker.pm
    in the current directory yields:

        /tmp/somepath/1: No such file or directory
        /tmp/somepath/2: No such file or directory
        3 requests handled
        /tmp/somepath/3: No such file or directory
        /tmp/somepath/4: No such file or directory
        /tmp/somepath/5: No such file or directory
        6 requests handled
        /tmp/somepath/6: No such file or directory

    Obviously, none of the directories I am trying to delete even exist.
    Also, the events and responses are processed in exactly the same order
    as they were created in the child, which is true for both synchronous
    and asynchronous backends.

    Note that the parentheses in the call to "AnyEvent::Fork::RPC::event"
    are not optional. That is because the function isn't defined when the
    code is compiled. You can make sure it is visible by pre-loading the
    correct backend module in the call to "require":

        ->require ("AnyEvent::Fork::RPC::Sync", "MyWorker")

    Since the backend module declares the "event" function, loading it first
    ensures that perl will correctly interpret calls to it.

    And as a final remark, there is a fine module on CPAN that can
    asynchronously "rmdir" and "unlink" and a lot more, and more efficiently
    than this example, namely IO::AIO.

  Example 1a: the same with the asynchronous backend
    This example only shows what needs to be changed to use the async
    backend instead. Doing this is not very useful; the purpose of this
    example is to show the minimum amount of change that is required to go
    from the synchronous to the asynchronous backend.

    To use the async backend in the previous example, you need to add the
    "async" parameter to the "AnyEvent::Fork::RPC::run" call:

        ->AnyEvent::Fork::RPC::run ("MyWorker::run",
           async => 1,
           ...

    And since the function call protocol is now changed, you need to adapt
    "MyWorker::run" to the async API.

    First, you need to accept the extra initial $done callback:

        sub run {
           my ($done, $cmd, $path) = @_;

    And since a response is now generated when $done is called, as opposed
    to when the function returns, we need to call the $done function with
    the status:

        $done->($status or (0, "$!"));

    A few remarks are in order. First, it's quite pointless to use the async
    backend for this example - but it *is* possible. Second, you can call
    $done before or after returning from the function. Third, having both
    returned from the function and having called the $done callback, the
    child process may exit at any time, so you should call $done only when
    you really *are* done.
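
    Putting the fragments above together, the adapted worker might look
    like the following. This is only a sketch assembled from the snippets in
    this section, not a listing taken from the distribution; the "use strict"
    and "use warnings" lines and the initial value of $count are additions
    made here for illustration.

```perl
package MyWorker;

use strict;
use warnings;

my $count = 0;

# Async calling convention: the result callback is the first argument.
sub run {
   my ($done, $cmd, $path) = @_;

   # The parentheses are needed unless the backend module was pre-loaded.
   AnyEvent::Fork::RPC::event ($count)
      unless ++$count % 3;

   my $status = $cmd eq "rmdir"  ? rmdir $path
              : $cmd eq "unlink" ? unlink $path
              : die "fatal error, illegal command '$cmd'";

   # The response is generated here, not by the return value of run.
   $done->($status or (0, "$!"));
}

1;
```

    Because the response is tied to $done rather than to the return value,
    the same function could just as well hand $done to a timer or I/O
    watcher and call it later.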

  Example 2: Asynchronous Backend
    This example implements multiple count-downs in the child, using
    AnyEvent timers. While this is a bit silly (one could use timers in the
    parent just as well), it illustrates the ability to use AnyEvent in the
    child and the fact that responses can arrive in a different order than
    the requests.

    It also shows how to embed the actual child code into a "__DATA__"
    section, so it doesn't need any external files at all.

    And when your parent process is often busy, and you have stricter timing
    requirements, then running timers in a child process suddenly doesn't
    look so silly anymore.

    Without further ado, here is the code:

        use AnyEvent;
        use AnyEvent::Fork::RPC;

        my $done = AE::cv;

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("AnyEvent::Fork::RPC::Async")
           ->eval (do { local $/; <DATA> })
           ->AnyEvent::Fork::RPC::run ("run",
              async      => 1,
              on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
              on_event   => sub { print $_[0] },
              on_destroy => $done,
           );

        for my $count (3, 2, 1) {
           $rpc->($count, sub {
              warn "job $count finished\n";
           });
        }

        undef $rpc;

        $done->recv;

        __DATA__

        # this ends up in main, as we don't use a package declaration

        use AnyEvent;

        sub run {
           my ($done, $count) = @_;

           my $n;

           AnyEvent::Fork::RPC::event "starting to count up to $count\n";

           my $w; $w = AE::timer 1, 1, sub {
              ++$n;

              AnyEvent::Fork::RPC::event "count $n of $count\n";

              if ($n == $count) {
                 undef $w;
                 $done->();
              }
           };
        }

    The parent part (the one before the "__DATA__" section) isn't very
    different from the earlier examples. It sets async mode, preloads the
    backend module (so the "AnyEvent::Fork::RPC::event" function is
    declared), uses a slightly different "on_event" handler (which we use
    simply for logging purposes) and then, instead of loading a module with
    the actual worker code, it "eval"'s the code from the data section in
    the child process.

    It then starts three countdowns, from 3 to 1 seconds downwards, destroys
    the rpc object so the example finishes eventually, and then just waits
    for the stuff to trickle in.

    The worker code uses the event function to log some progress messages,
    but mostly just creates a recurring one-second timer.

    The timer callback increments a counter, logs a message, and eventually,
    when the count has been reached, calls the finish callback.

    On my system, this results in the following output. Since all timers
    fire at roughly the same time, the actual order isn't guaranteed, but
    the order shown is very likely what you would get, too.

        starting to count up to 3
        starting to count up to 2
        starting to count up to 1
        count 1 of 3
        count 1 of 2
        count 1 of 1
        job 1 finished
        count 2 of 2
        job 2 finished
        count 2 of 3
        count 3 of 3
        job 3 finished

    While the overall ordering isn't guaranteed, the async backend still
    guarantees that events and responses are delivered to the parent process
    in the exact same ordering as they were generated in the child process.

    And unless your system is *very* busy, it should clearly show that the
    job started last will finish first, as it has the lowest count.

    This concludes the async example. Since AnyEvent::Fork does not actually
    fork, you are free to use about any module in the child, not just
    AnyEvent, but also IO::AIO, or Tk for example.

PARENT PROCESS USAGE
    This module exports nothing, and only implements a single function:

    my $rpc = AnyEvent::Fork::RPC::run $fork, $function, [key => value...]
        The traditional way to call it. But it is way cooler to call it in
        the following way:

    my $rpc = $fork->AnyEvent::Fork::RPC::run ($function, [key => value...])
        This "run" function/method can be used in place of the
        AnyEvent::Fork::run method. Just like that method, it takes over the
        AnyEvent::Fork process, but instead of calling the specified
        $function directly, it runs a server that accepts RPC calls and
        handles responses.

        It returns a function reference that can be used to call the
        function in the child process, handling serialisation and data
        transfers.

        The following key/value pairs are allowed. It is recommended to have
        at least an "on_error" or "on_event" handler set.

        on_error => $cb->($msg)
            Called on (fatal) errors, with a descriptive (hopefully)
            message. If this callback is not provided, but "on_event" is,
            then the "on_event" callback is called with the first argument
            being the string "error", followed by the error message.

            If neither handler is provided it prints the error to STDERR and
            will start failing badly.

        on_event => $cb->(...)
            Called for every call to the "AnyEvent::Fork::RPC::event"
            function in the child, with the arguments of that function
            passed to the callback.

            Also called on errors when no "on_error" handler is provided.

        on_destroy => $cb->()
            Called when the $rpc object has been destroyed and all requests
            have been successfully handled. This is useful when you queue
            some requests and want the child to go away after it has handled
            them. The problem is that the parent must not exit either until
            all requests have been handled, and this can be accomplished by
            waiting for this callback.

        init => $function (default none)
            When specified (by name), this function is called in the child
            as the very first thing when taking over the process, with all
            the arguments normally passed to the "AnyEvent::Fork::run"
            function, except the communications socket.

            It can be used to do one-time things in the child such as
            storing passed parameters or opening database connections.

            It is called very early - before the serialisers are created or
            the $function name is resolved into a function reference, so it
            could be used to load any modules that provide the serialiser or
            function. It can not, however, create events.

        async => $boolean (default: 0)
            The default server used in the child does all I/O blockingly,
            and only allows a single RPC call to execute at a time.

            Setting "async" to a true value switches to another
            implementation that uses AnyEvent in the child and allows
            multiple concurrent RPC calls (it does not support recursion in
            the event loop however, blocking condvar calls will fail).

            The actual API in the child is documented in the section that
            describes the calling semantics of the returned $rpc function.

            If you want to pre-load the actual back-end modules to enable
            memory sharing, then you should load "AnyEvent::Fork::RPC::Sync"
            for synchronous, and "AnyEvent::Fork::RPC::Async" for
            asynchronous mode.

            If you use a template process and want to fork both sync and
            async children, then it is permissible to load both modules.

        serialiser => $string (default: $AnyEvent::Fork::RPC::STRING_SERIALISER)
            All arguments, result data and event data have to be serialised
            to be transferred between the processes. For this, they have to
            be frozen and thawed in both parent and child processes.

            By default, only octet strings can be passed between the
            processes, which is reasonably fast and efficient and requires
            no extra modules.

            For more complicated use cases, you can provide your own freeze
            and thaw functions, by specifying a string with perl source
            code. It's supposed to return two code references when
            evaluated: the first receives a list of perl values and must
            return an octet string. The second receives the octet string and
            must return the original list of values.

            If you need an external module for serialisation, then you can
            either pre-load it into your AnyEvent::Fork process, or you can
            add a "use" or "require" statement into the serialiser string.
            Or both.

            Here are some examples - some of them are also available as
            global variables that make them easier to use.

            octet strings - $AnyEvent::Fork::RPC::STRING_SERIALISER
                This serialiser concatenates length-prefixed octet strings,
                and is the default.

                Implementation:

                    (
                       sub { pack   "(w/a*)*", @_ },
                       sub { unpack "(w/a*)*", shift }
                    )

            json - $AnyEvent::Fork::RPC::JSON_SERIALISER
                This serialiser creates JSON arrays - you have to make sure
                the JSON module is installed for this serialiser to work. It
                can be beneficial for sharing when you preload the JSON
                module in a template process.

                JSON (with JSON::XS installed) is slower than the octet
                string serialiser, but usually much faster than Storable,
                unless big chunks of binary data need to be transferred.

                Implementation:

                    use JSON ();
                    (
                       sub {    JSON::encode_json \@_ },
                       sub { @{ JSON::decode_json shift } }
                    )

            storable - $AnyEvent::Fork::RPC::STORABLE_SERIALISER
                This serialiser uses Storable, which means it has a high
                chance of serialising just about anything you throw at it,
                at the cost of having very high overhead per operation. It
                also comes with perl.

                Implementation:

                    use Storable ();
                    (
                       sub {    Storable::freeze \@_ },
                       sub { @{ Storable::thaw shift } }
                    )
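
            To make the "string that returns two code references" contract
            concrete, here is a small stand-alone sketch that evaluates the
            default octet string serialiser by hand and round-trips a list
            through it. The variable names are made up for this
            illustration, and the module itself is not involved; in real use
            you would pass the string via the "serialiser" parameter.

```perl
use strict;
use warnings;

# The octet string serialiser as source text, in the form that the
# "serialiser" parameter expects.
my $string_serialiser = <<'EOS';
(
   sub { pack   "(w/a*)*", @_ },
   sub { unpack "(w/a*)*", shift }
)
EOS

# Evaluating the source yields the freeze and thaw code references.
my ($freeze, $thaw) = eval $string_serialiser;
die "serialiser failed to compile: $@" if $@;

my @args   = ("rmdir", "/tmp/somepath/1");
my $octets = $freeze->(@args);   # one octet string for the whole list
my @copy   = $thaw->($octets);   # the original list of values again

print "restored: @copy\n";

# With the module, the same string would be passed along these lines:
#    ->AnyEvent::Fork::RPC::run ("MyWorker::run", serialiser => $string_serialiser)
```

            A custom serialiser only has to honour the same contract: the
            first function maps a list to one octet string, and the second
            maps that string back to the list.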

        See the examples section earlier in this document for some actual
        examples.

    $rpc->(..., $cb->(...))
        The RPC object returned by "AnyEvent::Fork::RPC::run" is actually a
        code reference. There are two things you can do with it: call it,
        and let it go out of scope (let it get destroyed).

        If "async" was false when $rpc was created (the default), then, if
        you call $rpc, the $function is invoked with all arguments passed to
        $rpc except the last one (the callback). When the function returns,
        the callback will be invoked with all the return values.

        If "async" was true, then the $function receives an additional
        initial argument, the result callback. In this case, returning from
        $function does nothing - the function only counts as "done" when the
        result callback is called, and any arguments passed to it are
        considered the return values. This makes it possible to "return"
        from event handlers or e.g. Coro threads.

        The other thing that can be done with the RPC object is to destroy
        it. In this case, the child process will execute all remaining RPC
        calls, report their results, and then exit.

        See the examples section earlier in this document for some actual
        examples.
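
    The difference between the two calling conventions can be seen by
    writing the same child function both ways. The function names below are
    hypothetical, and the functions are called directly here as a local
    stand-in for the RPC machinery, purely to show where the "return
    values" come from in each case.

```perl
use strict;
use warnings;

# Synchronous convention: whatever the function returns is the response.
sub add_sync {
   my ($x, $y) = @_;

   $x + $y
}

# Asynchronous convention: the first argument is the result callback, and
# the arguments passed to it are the response.
sub add_async {
   my ($done, $x, $y) = @_;

   $done->($x + $y);

   return; # the return value itself is ignored in async mode
}

# Local stand-in for the parent side: collect the "responses" directly.
my @sync_result = add_sync (1, 2);

my @async_result;
add_async (sub { @async_result = @_ }, 1, 2);

print "sync: @sync_result, async: @async_result\n";
```

    In async mode the call to the result callback does not have to happen
    before the function returns; it can be made later, for example from a
    timer callback as in Example 2.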

CHILD PROCESS USAGE
    The following function is not available in this module itself. It is
    only available in the namespace of this module when the child is
    running, without having to load any extra modules. It is part of the
    child-side API of AnyEvent::Fork::RPC.

    AnyEvent::Fork::RPC::event ...
        Send an event to the parent. Events are a bit like RPC calls made by
        the child process to the parent, except that there is no notion of
        return values.

        See the examples section earlier in this document for some actual
        examples.

ADVANCED TOPICS
  Choosing a backend
    So how do you decide which backend to use? Well, that's your problem to
    solve, but here are some thoughts on the matter:

    Synchronous
        The synchronous backend does not rely on any external modules (well,
        except common::sense, which works around a bug in how perl's warning
        system works). This keeps the process very small, for example, on my
        system, an empty perl interpreter uses 1492kB RSS, which becomes
        2020kB after "use warnings; use strict" (for people who grew up with
        C64s around them this is probably shocking every single time they
        see it). The worker process in the first example in this document
        uses 1792kB.

        Since the calls are done synchronously, slow jobs will keep newer
        jobs from executing.

        The synchronous backend also has no overhead due to running an event
        loop - reading requests is therefore very efficient, while writing
        responses is less so, as every response results in a write syscall.

        If the parent process is busy and a bit slow reading responses, the
        child waits instead of processing further requests. This also limits
        the amount of memory needed for buffering, as never more than one
        response has to be buffered.

        The API in the child is simple - you just have to define a function
        that does something and returns something.

        It's hard to use modules or code that relies on an event loop, as
        the child cannot execute anything while it waits for more input.

    Asynchronous
        The asynchronous backend relies on AnyEvent, which tries to be
        small, but still comes at a price: On my system, the worker from
        example 1a uses 3420kB RSS (for AnyEvent, which loads EV, which
        needs XSLoader which in turn loads a lot of other modules such as
        warnings, strict, vars, Exporter...).

        It batches requests and responses reasonably efficiently, doing only
        as few reads and writes as needed, but needs to poll for events via
        the event loop.

        Responses are queued when the parent process is busy. This means the
        child can continue to execute any queued requests. It also means
        that a child might queue a lot of responses in memory when it
        generates them and the parent process is slow accepting them.

        The API is not a straightforward RPC pattern - you have to call a
        "done" callback to pass return values and signal completion. Also,
        more importantly, the API starts jobs as fast as possible - when
        1000 jobs are queued and the jobs are slow, they will all run
        concurrently. The child must implement some queueing/limiting
        mechanism if this causes problems. Alternatively, the parent could
        limit the number of RPC calls that are outstanding.

        Blocking use of condvars is not supported.

        Using event-based modules such as IO::AIO, Gtk2, Tk and so on is
        easy.
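
    As an illustration of the last option for the asynchronous backend,
    limiting outstanding calls in the parent, here is one possible wrapper.
    It is a sketch only: "limit_outstanding" is a hypothetical helper that
    is not part of this module, and the $fake_rpc code reference merely
    stands in for a real $rpc so the example runs on its own.

```perl
use strict;
use warnings;

# Wrap a callback-style function (such as $rpc) so that at most $max calls
# are outstanding at any time; further calls wait in a queue.
sub limit_outstanding {
   my ($rpc, $max) = @_;

   my @queue;
   my $active = 0;

   # Note: $start refers to itself, so it is never freed. That is fine for
   # a sketch, but long-running code may want to break the cycle.
   my $start; $start = sub {
      while ($active < $max && @queue) {
         my ($args, $cb) = @{ shift @queue };

         ++$active;
         $rpc->(@$args, sub {
            --$active;
            $cb->(@_);
            $start->(); # a slot became free, start the next queued call
         });
      }
   };

   sub {
      my $cb = pop;
      push @queue, [[@_], $cb];
      $start->();
   }
}

# Stand-in for a real $rpc: it records requests instead of sending them.
my @pending;
my $fake_rpc = sub { my $cb = pop; push @pending, [$cb, @_] };

my $limited = limit_outstanding ($fake_rpc, 2);
$limited->($_, sub { print "job $_[0] finished\n" }) for 1 .. 5;

print scalar (@pending), " calls outstanding\n";
```

    The wrapper is called exactly like $rpc itself, so existing call sites
    would not need to change. Limiting inside the child instead would need
    a similar queue around the "done" callbacks.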

  Passing file descriptors
    Unlike AnyEvent::Fork, this module has no in-built file handle or file
    descriptor passing abilities.

    The reason is that passing file descriptors is extraordinarily tricky
    business, and conflicts with efficient batching of messages.

    There still is a method you can use: Create a
    "AnyEvent::Util::portable_socketpair" and "send_fh" one half of it to
    the process before you pass control to "AnyEvent::Fork::RPC::run".

    Whenever you want to pass a file descriptor, send an rpc request to the
    child process (so it expects the descriptor), then send it over the
    other half of the socketpair. The child should fetch the descriptor from
    the half it was passed earlier.

    Here is some (untested) pseudocode to that effect:

        use AnyEvent::Util;
        use AnyEvent::Fork::RPC;
        use IO::FDPass;

        my ($s1, $s2) = AnyEvent::Util::portable_socketpair;

        my $rpc = AnyEvent::Fork
           ->new
           ->send_fh ($s2)
           ->require ("MyWorker")
           ->AnyEvent::Fork::RPC::run ("MyWorker::run",
              init => "MyWorker::init",
           );

        undef $s2; # no need to keep it around

        # pass an fd
        $rpc->("i'll send some fd now, please expect it!", my $cv = AE::cv);

        IO::FDPass::send fileno $s1, fileno $handle_to_pass;

        $cv->recv;

    The MyWorker module could look like this:

        package MyWorker;

        use IO::FDPass;

        my $s2;

        sub init {
           $s2 = $_[0];
        }

        sub run {
           if ($_[0] eq "i'll send some fd now, please expect it!") {
              my $fd = IO::FDPass::recv fileno $s2;
              ...
           }
        }

    Of course, this might be blocking if you pass a lot of file descriptors,
    so you might want to look into AnyEvent::FDpasser which can handle the
    gory details.

EXCEPTIONS
    There are no provisions whatsoever for catching exceptions at this time
    - in the child, exceptions might kill the process, causing calls to be
    lost and the parent encountering a fatal error. In the parent,
    exceptions in the result callback will not be caught and cause undefined
    behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Pool, to manage whole pools of processes.

AUTHOR AND CONTACT INFORMATION
     Marc Lehmann <schmorp@schmorp.de>
     http://software.schmorp.de/pkg/AnyEvent-Fork-RPC