NAME
    AnyEvent::Fork::RPC - simple RPC extension for AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS A BETA RELEASE

SYNOPSIS
        use AnyEvent::Fork::RPC;
        # use AnyEvent::Fork is not needed

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("MyModule")
           ->AnyEvent::Fork::RPC::run (
              "MyModule::server",
           );

        use AnyEvent;

        my $cv = AE::cv;

        $rpc->(1, 2, 3, sub {
           print "MyModule::server returned @_\n";
           $cv->send;
        });

        $cv->recv;

DESCRIPTION
    This module implements a simple RPC protocol and backend for processes
    created via AnyEvent::Fork, allowing you to call a function in the
    child process and receive its return values (up to 4GB serialised).

    It implements two different backends: a synchronous one that works
    like a normal function call, and an asynchronous one that can run
    multiple jobs concurrently in the child, using AnyEvent.

    It also implements an asynchronous event mechanism from the child to
    the parent, that could be used for progress indications or other
    information.

    Loading this module also always loads AnyEvent::Fork, so you can make
    a separate "use AnyEvent::Fork" if you wish, but you don't have to.

EXAMPLES
  Example 1: Synchronous Backend
    Here is a simple example that implements a backend that executes
    "unlink" and "rmdir" calls, and reports their status back. It also
    reports the number of requests it has processed every three requests,
    which is clearly silly, but illustrates the use of events.

    First the parent process:

        use AnyEvent;
        use AnyEvent::Fork::RPC;

        my $done = AE::cv;

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("MyWorker")
           ->AnyEvent::Fork::RPC::run ("MyWorker::run",
              on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
              on_event   => sub { warn "$_[0] requests handled\n" },
              on_destroy => $done,
           );

        for my $id (1..6) {
           $rpc->(rmdir => "/tmp/somepath/$id", sub {
              $_[0]
                 or warn "/tmp/somepath/$id: $_[1]\n";
           });
        }

        undef $rpc;

        $done->recv;

    The parent creates the process and queues a few rmdir requests. It
    then forgets about the $rpc object, so that the child exits after it
    has handled the requests, and finally waits until the requests have
    been handled.

    The child is implemented using a separate module, "MyWorker", shown
    here:

        package MyWorker;

        my $count;

        sub run {
           my ($cmd, $path) = @_;

           AnyEvent::Fork::RPC::event ($count)
              unless ++$count % 3;

           my $status = $cmd eq "rmdir"  ? rmdir  $path
                      : $cmd eq "unlink" ? unlink $path
                      : die "fatal error, illegal command '$cmd'";

           $status or (0, "$!")
        }

        1

    The "run" function first sends a "progress" event every three calls,
    and then executes "rmdir" or "unlink", depending on the first
    parameter (or dies with a fatal error - obviously, you must never let
    this happen :).

    Eventually it returns a true status value if the command was
    successful, or the status value 0 followed by the stringified error
    message.
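    To make that return convention explicit, here is how a caller in the
    parent might unpack it (a minimal sketch; the path is made up for
    illustration):

        $rpc->(unlink => "/tmp/somepath/stale.lock", sub {
           my ($ok, $error) = @_;   # true on success, or (0, "error message")

           $ok
              or warn "unlink failed: $error\n";
        });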
    On my system, running the first code fragment with the given
    MyWorker.pm in the current directory yields:

        /tmp/somepath/1: No such file or directory
        /tmp/somepath/2: No such file or directory
        3 requests handled
        /tmp/somepath/3: No such file or directory
        /tmp/somepath/4: No such file or directory
        /tmp/somepath/5: No such file or directory
        6 requests handled
        /tmp/somepath/6: No such file or directory

    Obviously, none of the directories I am trying to delete even exist.
    Also, the events and responses are processed in exactly the same order
    as they were created in the child, which is true for both synchronous
    and asynchronous backends.

    Note that the parentheses in the call to "AnyEvent::Fork::RPC::event"
    are not optional. That is because the function isn't defined when the
    code is compiled. You can make sure it is visible by pre-loading the
    correct backend module in the call to "require":

        ->require ("AnyEvent::Fork::RPC::Sync", "MyWorker")

    Since the backend module declares the "event" function, loading it
    first ensures that perl will correctly interpret calls to it.

    And as a final remark, there is a fine module on CPAN that can
    asynchronously "rmdir" and "unlink" and a lot more, and does so more
    efficiently than this example, namely IO::AIO.

  Example 1a: the same with the asynchronous backend
    This example only shows what needs to be changed to use the async
    backend instead. Doing this is not very useful; the purpose of this
    example is to show the minimum amount of change that is required to
    go from the synchronous to the asynchronous backend.

    To use the async backend in the previous example, you need to add the
    "async" parameter to the "AnyEvent::Fork::RPC::run" call:

        ->AnyEvent::Fork::RPC::run ("MyWorker::run",
           async => 1,
           ...

    And since the function call protocol is now changed, you need to
    adapt "MyWorker::run" to the async API.

    First, you need to accept the extra initial $done callback:

        sub run {
           my ($done, $cmd, $path) = @_;

    And since a response is now generated when $done is called, as
    opposed to when the function returns, we need to call the $done
    function with the status:

        $done->($status or (0, "$!"));

    A few remarks are in order. First, it's quite pointless to use the
    async backend for this example - but it *is* possible. Second, you
    can call $done before or after returning from the function. Third,
    once you have both returned from the function and called the $done
    callback, the child process may exit at any time, so you should call
    $done only when you really *are* done.

  Example 2: Asynchronous Backend
    This example implements multiple count-downs in the child, using
    AnyEvent timers. While this is a bit silly (one could use timers in
    the parent just as well), it illustrates the ability to use AnyEvent
    in the child and the fact that responses can arrive in a different
    order than the requests.

    It also shows how to embed the actual child code into a "__DATA__"
    section, so it doesn't need any external files at all.

    And when your parent process is often busy, and you have stricter
    timing requirements, then running timers in a child process suddenly
    doesn't look so silly anymore.
    Without further ado, here is the code:

        use AnyEvent;
        use AnyEvent::Fork::RPC;

        my $done = AE::cv;

        my $rpc = AnyEvent::Fork
           ->new
           ->require ("AnyEvent::Fork::RPC::Async")
           ->eval (do { local $/; <DATA> })
           ->AnyEvent::Fork::RPC::run ("run",
              async      => 1,
              on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
              on_event   => sub { print $_[0] },
              on_destroy => $done,
           );

        for my $count (3, 2, 1) {
           $rpc->($count, sub {
              warn "job $count finished\n";
           });
        }

        undef $rpc;

        $done->recv;

        __DATA__

        # this ends up in main, as we don't use a package declaration

        use AnyEvent;

        sub run {
           my ($done, $count) = @_;

           my $n;

           AnyEvent::Fork::RPC::event "starting to count up to $count\n";

           my $w; $w = AE::timer 1, 1, sub {
              ++$n;

              AnyEvent::Fork::RPC::event "count $n of $count\n";

              if ($n == $count) {
                 undef $w;
                 $done->();
              }
           };
        }

    The parent part (the one before the "__DATA__" section) isn't very
    different from the earlier examples. It sets async mode, preloads the
    backend module (so the "AnyEvent::Fork::RPC::event" function is
    declared), uses a slightly different "on_event" handler (which we use
    simply for logging purposes) and then, instead of loading a module
    with the actual worker code, it "eval"'s the code from the data
    section in the child process.

    It then starts three countdowns of 3, 2 and 1 seconds, destroys the
    rpc object so the example finishes eventually, and then just waits
    for the results to trickle in.

    The worker code uses the event function to log some progress
    messages, but mostly just creates a recurring one-second timer.

    The timer callback increments a counter, logs a message, and
    eventually, when the count has been reached, calls the finish
    callback.

    On my system, this results in the following output. Since all timers
    fire at roughly the same time, the actual order isn't guaranteed, but
    the order shown is very likely what you would get, too.

        starting to count up to 3
        starting to count up to 2
        starting to count up to 1
        count 1 of 3
        count 1 of 2
        count 1 of 1
        job 1 finished
        count 2 of 2
        job 2 finished
        count 2 of 3
        count 3 of 3
        job 3 finished

    While the overall ordering isn't guaranteed, the async backend still
    guarantees that events and responses are delivered to the parent
    process in the exact same ordering as they were generated in the
    child process.

    And unless your system is *very* busy, it should clearly show that
    the job started last will finish first, as it has the lowest count.

    This concludes the async example. Since AnyEvent::Fork does not
    actually fork, you are free to use just about any module in the
    child, not just AnyEvent, but also IO::AIO, or Tk for example.

PARENT PROCESS USAGE
    This module exports nothing, and only implements a single function:

    my $rpc = AnyEvent::Fork::RPC::run $fork, $function, [key => value...]
        The traditional way to call it. But it is way cooler to call it
        in the following way:

    my $rpc = $fork->AnyEvent::Fork::RPC::run ($function, [key => value...])
        This "run" function/method can be used in place of the
        AnyEvent::Fork::run method. Just like that method, it takes over
        the AnyEvent::Fork process, but instead of calling the specified
        $function directly, it runs a server that accepts RPC calls and
        handles responses.

        It returns a function reference that can be used to call the
        function in the child process, handling serialisation and data
        transfers.
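        For illustration, a minimal usage sketch (assuming $fork is an
        AnyEvent::Fork object that already has the MyWorker module from
        the examples loaded):

            my $rpc = $fork->AnyEvent::Fork::RPC::run (
               "MyWorker::run",
               on_error => sub { warn "FATAL: $_[0]"; exit 1 },
            );

            # the returned code reference is the only handle you need:
            # call it with the arguments for the child, followed by the
            # result callback
            $rpc->(rmdir => "/tmp/somepath/1", sub {
               $_[0] or warn "rmdir failed: $_[1]\n";
            });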
        The following key/value pairs are allowed. It is recommended to
        have at least an "on_error" or "on_event" handler set.

        on_error => $cb->($msg)
            Called on (fatal) errors, with a descriptive (hopefully)
            message. If this callback is not provided, but "on_event" is,
            then the "on_event" callback is called with the first
            argument being the string "error", followed by the error
            message.

            If neither handler is provided, the error is printed to
            STDERR and things will start failing badly.

        on_event => $cb->(...)
            Called for every call to the "AnyEvent::Fork::RPC::event"
            function in the child, with the arguments of that function
            passed to the callback.

            Also called on errors when no "on_error" handler is provided.

        on_destroy => $cb->()
            Called when the $rpc object has been destroyed and all
            requests have been successfully handled. This is useful when
            you queue some requests and want the child to go away after
            it has handled them. The problem is that the parent must not
            exit either until all requests have been handled, and this
            can be accomplished by waiting for this callback.

        init => $function (default: none)
            When specified (by name), this function is called in the
            child as the very first thing when taking over the process,
            with all the arguments normally passed to the
            "AnyEvent::Fork::run" function, except the communications
            socket.

            It can be used to do one-time things in the child such as
            storing passed parameters or opening database connections.

            It is called very early - before the serialisers are created
            or the $function name is resolved into a function reference,
            so it could be used to load any modules that provide the
            serialiser or function. It cannot, however, create events.

        async => $boolean (default: 0)
            The default server used in the child does all I/O blockingly,
            and only allows a single RPC call to execute concurrently.

            Setting "async" to a true value switches to another
            implementation that uses AnyEvent in the child and allows
            multiple concurrent RPC calls (it does not support recursion
            in the event loop, however, so blocking condvar calls will
            fail).

            The actual API in the child is documented in the section that
            describes the calling semantics of the returned $rpc
            function.

            If you want to pre-load the actual back-end modules to enable
            memory sharing, then you should load
            "AnyEvent::Fork::RPC::Sync" for synchronous mode and
            "AnyEvent::Fork::RPC::Async" for asynchronous mode.

            If you use a template process and want to fork both sync and
            async children, then it is permissible to load both modules.

        serialiser => $string (default: $AnyEvent::Fork::RPC::STRING_SERIALISER)
            All arguments, result data and event data have to be
            serialised to be transferred between the processes. For this,
            they have to be frozen and thawed in both parent and child
            processes.

            By default, only octet strings can be passed between the
            processes, which is reasonably fast and efficient and
            requires no extra modules.

            For more complicated use cases, you can provide your own
            freeze and thaw functions, by specifying a string with perl
            source code. It is supposed to return two code references
            when evaluated: the first receives a list of perl values and
            must return an octet string. The second receives the octet
            string and must return the original list of values.
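            For instance, to pass nested data structures rather than
            plain octet strings, you could select the Storable-based
            serialiser described below (a minimal sketch; the worker name
            and the argument structure are made up for illustration):

                my $rpc = $fork->AnyEvent::Fork::RPC::run (
                   "MyWorker::run",
                   serialiser => $AnyEvent::Fork::RPC::STORABLE_SERIALISER,
                );

                # arguments and return values may now be arbitrary
                # serialisable data structures, not just octet strings
                $rpc->({ cmd => "rmdir", path => "/tmp/somepath/1" }, sub { });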
            If you need an external module for serialisation, then you
            can either pre-load it into your AnyEvent::Fork process, or
            you can add a "use" or "require" statement to the serialiser
            string. Or both.

            Here are some examples - some of them are also available as
            global variables that make them easier to use.

            octet strings - $AnyEvent::Fork::RPC::STRING_SERIALISER
                This serialiser concatenates length-prefixed octet
                strings, and is the default.

                Implementation:

                    (
                       sub { pack "(w/a*)*", @_ },
                       sub { unpack "(w/a*)*", shift }
                    )

            json - $AnyEvent::Fork::RPC::JSON_SERIALISER
                This serialiser creates JSON arrays - you have to make
                sure the JSON module is installed for this serialiser to
                work. It can be beneficial for sharing when you preload
                the JSON module in a template process.

                JSON (with JSON::XS installed) is slower than the octet
                string serialiser, but usually much faster than Storable,
                unless big chunks of binary data need to be transferred.

                Implementation:

                    use JSON ();
                    (
                       sub { JSON::encode_json \@_ },
                       sub { @{ JSON::decode_json shift } }
                    )

            storable - $AnyEvent::Fork::RPC::STORABLE_SERIALISER
                This serialiser uses Storable, which means it has a high
                chance of serialising just about anything you throw at
                it, at the cost of having very high overhead per
                operation. It also comes with perl.

                Implementation:

                    use Storable ();
                    (
                       sub { Storable::freeze \@_ },
                       sub { @{ Storable::thaw shift } }
                    )

        See the examples section earlier in this document for some actual
        examples.

    $rpc->(..., $cb->(...))
        The RPC object returned by "AnyEvent::Fork::RPC::run" is actually
        a code reference. There are two things you can do with it: call
        it, and let it go out of scope (let it get destroyed).

        If "async" was false when $rpc was created (the default), then,
        if you call $rpc, the $function is invoked with all arguments
        passed to $rpc except the last one (the callback). When the
        function returns, the callback will be invoked with all the
        return values.

        If "async" was true, then the $function receives an additional
        initial argument, the result callback. In this case, returning
        from $function does nothing - the function only counts as "done"
        when the result callback is called, and any arguments passed to
        it are considered the return values. This makes it possible to
        "return" from event handlers or e.g. Coro threads.

        The other thing that can be done with the RPC object is to
        destroy it. In this case, the child process will execute all
        remaining RPC calls, report their results, and then exit.

        See the examples section earlier in this document for some actual
        examples.

CHILD PROCESS USAGE
    The following function is not available in this module itself. It is
    only available in the namespace of this module when the child is
    running, without having to load any extra modules, and it is part of
    the child-side API of AnyEvent::Fork::RPC.

    AnyEvent::Fork::RPC::event ...
        Send an event to the parent. Events are a bit like RPC calls made
        by the child process to the parent, except that there is no
        notion of return values.

        See the examples section earlier in this document for some actual
        examples, and the short sketch below.
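    As a sketch of how the two sides fit together, a child might send
    structured events and the parent might dispatch on the first argument
    (the event names here are made up for illustration):

        # in the child (e.g. inside the worker function)
        AnyEvent::Fork::RPC::event (progress => 3);
        AnyEvent::Fork::RPC::event (log => "cleaning up");

        # in the parent, when creating the $rpc object
        on_event => sub {
           my ($type, @args) = @_;

           if ($type eq "progress") {
              warn "$args[0] requests handled\n";
           } elsif ($type eq "log") {
              warn "child says: $args[0]\n";
           }
        },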
ADVANCED TOPICS
  Choosing a backend
    So how do you decide which backend to use? Well, that's your problem
    to solve, but here are some thoughts on the matter:

    Synchronous
        The synchronous backend does not rely on any external modules
        (well, except common::sense, which works around a bug in how
        perl's warning system works). This keeps the process very small,
        for example, on my system, an empty perl interpreter uses 1492kB
        RSS, which becomes 2020kB after "use warnings; use strict" (for
        people who grew up with C64s around them this is probably
        shocking every single time they see it). The worker process in
        the first example in this document uses 1792kB.

        Since the calls are done synchronously, slow jobs will keep newer
        jobs from executing.

        The synchronous backend also has no overhead due to running an
        event loop - reading requests is therefore very efficient, while
        writing responses is less so, as every response results in a
        write syscall.

        If the parent process is busy and a bit slow reading responses,
        the child waits instead of processing further requests. This also
        limits the amount of memory needed for buffering, as never more
        than one response has to be buffered.

        The API in the child is simple - you just have to define a
        function that does something and returns something.

        It's hard to use modules or code that relies on an event loop, as
        the child cannot execute anything while it waits for more input.

    Asynchronous
        The asynchronous backend relies on AnyEvent, which tries to be
        small, but still comes at a price: On my system, the worker from
        example 1a uses 3420kB RSS (for AnyEvent, which loads EV, which
        needs XSLoader which in turn loads a lot of other modules such as
        warnings, strict, vars, Exporter...).

        It batches requests and responses reasonably efficiently, doing
        only as few reads and writes as needed, but needs to poll for
        events via the event loop.

        Responses are queued when the parent process is busy. This means
        the child can continue to execute any queued requests. It also
        means that a child might queue a lot of responses in memory when
        it generates them and the parent process is slow accepting them.

        The API is not a straightforward RPC pattern - you have to call a
        "done" callback to pass return values and signal completion.
        Also, more importantly, the API starts jobs as fast as possible -
        when 1000 jobs are queued and the jobs are slow, they will all
        run concurrently. The child must implement some queueing/limiting
        mechanism if this causes problems. Alternatively, the parent
        could limit the number of RPC calls that are outstanding.

        Blocking use of condvars is not supported.

        Using event-based modules such as IO::AIO, Gtk2, Tk and so on is
        easy.

  Passing file descriptors
    Unlike AnyEvent::Fork, this module has no in-built file handle or
    file descriptor passing abilities.

    The reason is that passing file descriptors is extraordinarily tricky
    business, and conflicts with efficient batching of messages.

    There still is a method you can use: Create an
    "AnyEvent::Util::portable_socketpair" and "send_fh" one half of it to
    the process before you pass control to "AnyEvent::Fork::RPC::run".

    Whenever you want to pass a file descriptor, send an RPC request to
    the child process (so it expects the descriptor), then send it over
    the other half of the socketpair. The child should fetch the
    descriptor from the half that was passed to it earlier.
    Here is some (untested) pseudocode to that effect:

        use AnyEvent::Util;
        use AnyEvent::Fork::RPC;
        use IO::FDPass;

        my ($s1, $s2) = AnyEvent::Util::portable_socketpair;

        my $rpc = AnyEvent::Fork
           ->new
           ->send_fh ($s2)
           ->require ("MyWorker")
           ->AnyEvent::Fork::RPC::run ("MyWorker::run",
              init => "MyWorker::init",
           );

        undef $s2; # no need to keep it around

        # pass an fd
        $rpc->("i'll send some fd now, please expect it!", my $cv = AE::cv);

        IO::FDPass::send fileno $s1, fileno $handle_to_pass;

        $cv->recv;

    The MyWorker module could look like this:

        package MyWorker;

        use IO::FDPass;

        my $s2;

        sub init {
           $s2 = $_[0];
        }

        sub run {
           if ($_[0] eq "i'll send some fd now, please expect it!") {
              my $fd = IO::FDPass::recv fileno $s2;
              ...
           }
        }

    Of course, this might be blocking if you pass a lot of file
    descriptors, so you might want to look into AnyEvent::FDpasser which
    can handle the gory details.

EXCEPTIONS
    There are no provisions whatsoever for catching exceptions at this
    time - in the child, exceptions might kill the process, causing calls
    to be lost and the parent to encounter a fatal error. In the parent,
    exceptions in the result callback will not be caught, and cause
    undefined behaviour.

SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Pool, to manage whole pools of processes.

AUTHOR AND CONTACT INFORMATION
    Marc Lehmann
    http://software.schmorp.de/pkg/AnyEvent-Fork-RPC