/cvs/AnyEvent-Fork-RPC/README
Revision: 1.3
Committed: Sun Apr 21 12:27:03 2013 UTC (11 years, 3 months ago) by root
Branch: MAIN
CVS Tags: rel-1_1, rel-0_2
Changes since 1.2: +8 -1 lines
Log Message:
0.2

NAME
    AnyEvent::Fork::RPC - simple RPC extension for AnyEvent::Fork

    THE API IS NOT FINISHED, CONSIDER THIS A BETA RELEASE

SYNOPSIS
       use AnyEvent::Fork::RPC;
       # use AnyEvent::Fork is not needed

       my $rpc = AnyEvent::Fork
          ->new
          ->require ("MyModule")
          ->AnyEvent::Fork::RPC::run (
             "MyModule::server",
          );

       use AnyEvent;

       my $cv = AE::cv;

       $rpc->(1, 2, 3, sub {
          print "MyModule::server returned @_\n";
          $cv->send;
       });

       $cv->recv;

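    The synopsis refers to a "MyModule::server" function without showing it.
    The following is a minimal sketch of what such a module might look like
    for the default (synchronous) backend. The body of "server" (summing its
    arguments) is purely an assumption made for illustration, it is not
    taken from the distribution.

```perl
package MyModule;

use strict;
use warnings;

# Hypothetical worker function: with the synchronous backend it receives
# the arguments given to $rpc->(...) minus the final callback, and its
# return values are handed to that callback in the parent.
sub server {
   my @numbers = @_;

   my $sum = 0;
   $sum += $_ for @numbers;

   # the default serialiser only transports strings, so return plain scalars
   return ($sum, scalar @numbers);
}

1;
```

    With a module like this, the callback in the synopsis would receive the
    sum and the number of arguments as its two values.
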
DESCRIPTION
    This module implements a simple RPC protocol and backend for processes
    created via AnyEvent::Fork, allowing you to call a function in the child
    process and receive its return values (up to 4GB serialised).

    It implements two different backends: a synchronous one that works like
    a normal function call, and an asynchronous one that can run multiple
    jobs concurrently in the child, using AnyEvent.

    It also implements an asynchronous event mechanism from the child to the
    parent, that could be used for progress indications or other
    information.

    Loading this module also always loads AnyEvent::Fork, so you can make a
    separate "use AnyEvent::Fork" if you wish, but you don't have to.

EXAMPLES
  Example 1: Synchronous Backend
    Here is a simple example that implements a backend that executes
    "unlink" and "rmdir" calls, and reports their status back. It also
    reports the number of requests it has processed every three requests,
    which is clearly silly, but illustrates the use of events.

    First the parent process:

       use AnyEvent;
       use AnyEvent::Fork::RPC;

       my $done = AE::cv;

       my $rpc = AnyEvent::Fork
          ->new
          ->require ("MyWorker")
          ->AnyEvent::Fork::RPC::run ("MyWorker::run",
             on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
             on_event   => sub { warn "$_[0] requests handled\n" },
             on_destroy => $done,
          );

       for my $id (1..6) {
          $rpc->(rmdir => "/tmp/somepath/$id", sub {
             $_[0]
                or warn "/tmp/somepath/$id: $_[1]\n";
          });
       }

       undef $rpc;

       $done->recv;

    The parent creates the process and queues a few "rmdir" requests. It
    then forgets about the $rpc object, so that the child exits after it has
    handled the requests, and finally waits until all requests have been
    handled.

    The child is implemented using a separate module, "MyWorker", shown
    here:

       package MyWorker;

       my $count;

       sub run {
          my ($cmd, $path) = @_;

          AnyEvent::Fork::RPC::event ($count)
             unless ++$count % 3;

          my $status = $cmd eq "rmdir"  ? rmdir $path
                     : $cmd eq "unlink" ? unlink $path
                     : die "fatal error, illegal command '$cmd'";

          $status or (0, "$!")
       }

       1

    The "run" function first sends a "progress" event every three calls, and
    then executes "rmdir" or "unlink", depending on the first parameter (or
    dies with a fatal error - obviously, you must never let this happen :).

    Eventually it returns the status value true if the command was
    successful, or the status value 0 and the stringified error message.

    On my system, running the first code fragment with the given MyWorker.pm
    in the current directory yields:

       /tmp/somepath/1: No such file or directory
       /tmp/somepath/2: No such file or directory
       3 requests handled
       /tmp/somepath/3: No such file or directory
       /tmp/somepath/4: No such file or directory
       /tmp/somepath/5: No such file or directory
       6 requests handled
       /tmp/somepath/6: No such file or directory

    Obviously, none of the directories I am trying to delete even exist.
    Also, the events and responses are processed in exactly the same order
    as they were created in the child, which is true for both synchronous
    and asynchronous backends.

    Note that the parentheses in the call to "AnyEvent::Fork::RPC::event"
    are not optional. That is because the function isn't defined when the
    code is compiled. You can make sure it is visible by pre-loading the
    correct backend module in the call to "require":

       ->require ("AnyEvent::Fork::RPC::Sync", "MyWorker")

    Since the backend module declares the "event" function, loading it first
    ensures that perl will correctly interpret calls to it.

    And as a final remark, there is a fine module on CPAN that can
    asynchronously "rmdir" and "unlink" and a lot more, and more efficiently
    than this example, namely IO::AIO.

  Example 1a: the same with the asynchronous backend
    This example only shows what needs to be changed to use the async
    backend instead. Doing this is not very useful, the purpose of this
    example is to show the minimum amount of change that is required to go
    from the synchronous to the asynchronous backend.

    To use the async backend in the previous example, you need to add the
    "async" parameter to the "AnyEvent::Fork::RPC::run" call:

       ->AnyEvent::Fork::RPC::run ("MyWorker::run",
          async => 1,
          ...

    And since the function call protocol is now changed, you need to adapt
    "MyWorker::run" to the async API.

    First, you need to accept the extra initial $done callback:

       sub run {
          my ($done, $cmd, $path) = @_;

    And since a response is now generated when $done is called, as opposed
    to when the function returns, we need to call the $done function with
    the status:

          $done->($status or (0, "$!"));

    A few remarks are in order. First, it's quite pointless to use the async
    backend for this example - but it *is* possible. Second, you can call
    $done before or after returning from the function. Third, once you have
    both returned from the function and called the $done callback, the child
    process may exit at any time, so you should call $done only when you
    really *are* done.

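    Putting the two fragments above together, the adapted worker might look
    like the following sketch. It is assembled from the code shown in this
    document. The only addition is the "defined" guard around the event
    call, which is an assumption made here so that "run" can also be called
    directly (outside a child process) for testing.

```perl
package MyWorker;

use strict;
use warnings;

my $count = 0;

sub run {
   # async API: the result callback comes first, then the request arguments
   my ($done, $cmd, $path) = @_;

   # send a progress event every three calls; the event function only
   # exists inside a child started via AnyEvent::Fork::RPC::run, so the
   # guard (an addition for this sketch) skips it everywhere else
   AnyEvent::Fork::RPC::event ($count)
      if !(++$count % 3) && defined &AnyEvent::Fork::RPC::event;

   my $status = $cmd eq "rmdir"  ? rmdir $path
              : $cmd eq "unlink" ? unlink $path
              : die "fatal error, illegal command '$cmd'";

   # the response is generated here, not by the return value of run
   $done->($status or (0, "$!"));
}

1;
```

    The parent side stays as in Example 1, apart from the added
    "async => 1" parameter.
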
  Example 2: Asynchronous Backend
    This example implements multiple count-downs in the child, using
    AnyEvent timers. While this is a bit silly (one could use timers in the
    parent just as well), it illustrates the ability to use AnyEvent in the
    child and the fact that responses can arrive in a different order than
    the requests.

    It also shows how to embed the actual child code into a "__DATA__"
    section, so it doesn't need any external files at all.

    And when your parent process is often busy, and you have stricter timing
    requirements, then running timers in a child process suddenly doesn't
    look so silly anymore.

    Without further ado, here is the code:

       use AnyEvent;
       use AnyEvent::Fork::RPC;

       my $done = AE::cv;

       my $rpc = AnyEvent::Fork
          ->new
          ->require ("AnyEvent::Fork::RPC::Async")
          ->eval (do { local $/; <DATA> })
          ->AnyEvent::Fork::RPC::run ("run",
             async      => 1,
             on_error   => sub { warn "FATAL: $_[0]"; exit 1 },
             on_event   => sub { print $_[0] },
             on_destroy => $done,
          );

       for my $count (3, 2, 1) {
          $rpc->($count, sub {
             warn "job $count finished\n";
          });
       }

       undef $rpc;

       $done->recv;

       __DATA__

       # this ends up in main, as we don't use a package declaration

       use AnyEvent;

       sub run {
          my ($done, $count) = @_;

          my $n;

          AnyEvent::Fork::RPC::event "starting to count up to $count\n";

          my $w; $w = AE::timer 1, 1, sub {
             ++$n;

             AnyEvent::Fork::RPC::event "count $n of $count\n";

             if ($n == $count) {
                undef $w;
                $done->();
             }
          };
       }

    The parent part (the one before the "__DATA__" section) isn't very
    different from the earlier examples. It sets async mode, preloads the
    backend module (so the "AnyEvent::Fork::RPC::event" function is
    declared), uses a slightly different "on_event" handler (which we use
    simply for logging purposes) and then, instead of loading a module with
    the actual worker code, it "eval"'s the code from the data section in
    the child process.

    It then starts three countdowns of 3, 2 and 1 seconds, destroys the rpc
    object so the example finishes eventually, and then just waits for the
    stuff to trickle in.

    The worker code uses the event function to log some progress messages,
    but mostly just creates a recurring one-second timer.

    The timer callback increments a counter, logs a message, and eventually,
    when the count has been reached, calls the $done callback.

    On my system, this results in the following output. Since all timers
    fire at roughly the same time, the actual order isn't guaranteed, but
    the order shown is very likely what you would get, too.

       starting to count up to 3
       starting to count up to 2
       starting to count up to 1
       count 1 of 3
       count 1 of 2
       count 1 of 1
       job 1 finished
       count 2 of 2
       job 2 finished
       count 2 of 3
       count 3 of 3
       job 3 finished

    While the overall ordering isn't guaranteed, the async backend still
    guarantees that events and responses are delivered to the parent process
    in the exact same ordering as they were generated in the child process.

    And unless your system is *very* busy, it should clearly show that the
    job started last will finish first, as it has the lowest count.

    This concludes the async example. Since AnyEvent::Fork does not actually
    fork, you are free to use about any module in the child, not just
    AnyEvent, but also IO::AIO, or Tk for example.

PARENT PROCESS USAGE
    This module exports nothing, and only implements a single function:

    my $rpc = AnyEvent::Fork::RPC::run $fork, $function, [key => value...]
        The traditional way to call it. But it is way cooler to call it in
        the following way:

    my $rpc = $fork->AnyEvent::Fork::RPC::run ($function, [key => value...])
        This "run" function/method can be used in place of the
        AnyEvent::Fork::run method. Just like that method, it takes over the
        AnyEvent::Fork process, but instead of calling the specified
        $function directly, it runs a server that accepts RPC calls and
        handles responses.

        It returns a function reference that can be used to call the
        function in the child process, handling serialisation and data
        transfers.

        The following key/value pairs are allowed. It is recommended to have
        at least an "on_error" or "on_event" handler set.

        on_error => $cb->($msg)
            Called on (fatal) errors, with a descriptive (hopefully)
            message. If this callback is not provided, but "on_event" is,
            then the "on_event" callback is called with the first argument
            being the string "error", followed by the error message.

            If neither handler is provided it prints the error to STDERR and
            will start failing badly.

        on_event => $cb->(...)
            Called for every call to the "AnyEvent::Fork::RPC::event"
            function in the child, with the arguments of that function
            passed to the callback.

            Also called on errors when no "on_error" handler is provided.

        on_destroy => $cb->()
            Called when the $rpc object has been destroyed and all requests
            have been successfully handled. This is useful when you queue
            some requests and want the child to go away after it has handled
            them. The problem is that the parent must not exit either until
            all requests have been handled, and this can be accomplished by
            waiting for this callback.

        init => $function (default none)
            When specified (by name), this function is called in the child
            as the very first thing when taking over the process, with all
            the arguments normally passed to the "AnyEvent::Fork::run"
            function, except the communications socket.

            It can be used to do one-time things in the child such as
            storing passed parameters or opening database connections.

            It is called very early - before the serialisers are created or
            the $function name is resolved into a function reference, so it
            could be used to load any modules that provide the serialiser or
            function. It can not, however, create events.

        async => $boolean (default: 0)
            The default server used in the child does all I/O blockingly,
            and only allows a single RPC call to execute at a time.

            Setting "async" to a true value switches to another
            implementation that uses AnyEvent in the child and allows
            multiple concurrent RPC calls (it does not support recursion in
            the event loop however, blocking condvar calls will fail).

            The actual API in the child is documented in the section that
            describes the calling semantics of the returned $rpc function.

            If you want to pre-load the actual back-end modules to enable
            memory sharing, then you should load "AnyEvent::Fork::RPC::Sync"
            for synchronous, and "AnyEvent::Fork::RPC::Async" for
            asynchronous mode.

            If you use a template process and want to fork both sync and
            async children, then it is permissible to load both modules.

        serialiser => $string (default:
        $AnyEvent::Fork::RPC::STRING_SERIALISER)
            All arguments, result data and event data have to be serialised
            to be transferred between the processes. For this, they have to
            be frozen and thawed in both parent and child processes.

            By default, only octet strings can be passed between the
            processes, which is reasonably fast and efficient and requires
            no extra modules.

            For more complicated use cases, you can provide your own freeze
            and thaw functions, by specifying a string with perl source
            code. It's supposed to return two code references when
            evaluated: the first receives a list of perl values and must
            return an octet string. The second receives the octet string and
            must return the original list of values.

            If you need an external module for serialisation, then you can
            either pre-load it into your AnyEvent::Fork process, or you can
            add a "use" or "require" statement into the serialiser string.
            Or both.

            Here are some examples - some of them are also available as
            global variables that make them easier to use.

            octet strings - $AnyEvent::Fork::RPC::STRING_SERIALISER
                This serialiser concatenates length-prefixed octet strings,
                and is the default.

                Implementation:

                   (
                      sub { pack "(w/a*)*", @_ },
                      sub { unpack "(w/a*)*", shift }
                   )

            json - $AnyEvent::Fork::RPC::JSON_SERIALISER
                This serialiser creates JSON arrays - you have to make sure
                the JSON module is installed for this serialiser to work. It
                can be beneficial for sharing when you preload the JSON
                module in a template process.

                JSON (with JSON::XS installed) is slower than the octet
                string serialiser, but usually much faster than Storable,
                unless big chunks of binary data need to be transferred.

                Implementation:

                   use JSON ();
                   (
                      sub { JSON::encode_json \@_ },
                      sub { @{ JSON::decode_json shift } }
                   )

            storable - $AnyEvent::Fork::RPC::STORABLE_SERIALISER
                This serialiser uses Storable, which means it has a high
                chance of serialising just about anything you throw at it,
                at the cost of having very high overhead per operation. It
                also comes with perl.

                Implementation:

                   use Storable ();
                   (
                      sub { Storable::freeze \@_ },
                      sub { @{ Storable::thaw shift } }
                   )

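    To make the serialiser contract above more concrete, here is a small
    sketch that evaluates serialiser strings and round-trips some values
    through the resulting freeze/thaw pair. The "string" and "storable"
    sources are the implementations listed above. The "json_pp" entry and
    the "roundtrip" helper are assumptions made for this illustration (they
    are not part of the module), and a plain string "eval" only approximates
    however the module itself evaluates the source.

```perl
use strict;
use warnings;

# each value is perl source that evaluates to (freeze function, thaw function)
my %serialiser = (
   string => q{
      (
         sub { pack "(w/a*)*", @_ },
         sub { unpack "(w/a*)*", shift }
      )
   },
   storable => q{
      use Storable ();
      (
         sub { Storable::freeze \@_ },
         sub { @{ Storable::thaw shift } }
      )
   },
   # hypothetical custom serialiser based on the core module JSON::PP
   json_pp => q{
      use JSON::PP ();
      my $json = JSON::PP->new->utf8;
      (
         sub { $json->encode ([@_]) },
         sub { @{ $json->decode (shift) } }
      )
   },
);

# hypothetical helper: compile a serialiser string, then freeze and thaw
sub roundtrip {
   my ($source, @values) = @_;

   my ($freeze, $thaw) = eval $source
      or die "serialiser did not compile: $@";

   return $thaw->($freeze->(@values));
}

my @strings = roundtrip ($serialiser{string}, "rmdir", "/tmp/somepath/1");
print "string: @strings\n";          # string: rmdir /tmp/somepath/1

my @nested = roundtrip ($serialiser{storable}, { ids => [1, 2, 3] });
print "storable: $nested[0]{ids}[2]\n";   # storable: 3
```

    The octet string serialiser only round-trips plain strings, which is why
    the nested data structure is sent through Storable in this sketch.
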
        See the examples section earlier in this document for some actual
        examples.

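    None of the earlier examples uses the "init" key in isolation, so here
    is a minimal sketch of a worker that stores parameters once and uses
    them in later requests. It assumes the parent passed key/value strings
    to the process with AnyEvent::Fork's "send_arg" method before calling
    "AnyEvent::Fork::RPC::run" with init => "MyWorker::init". The setting
    names and the reply format are hypothetical.

```perl
package MyWorker;

use strict;
use warnings;

my %config;

# called once in the child, before any request is handled; receives the
# arguments that AnyEvent::Fork::run would pass, minus the socket
sub init {
   %config = @_;
}

# request function (synchronous backend): look up a stored setting and
# return a status flag followed by the value or an error message
sub run {
   my ($name) = @_;

   return exists $config{$name}
      ? (1, $config{$name})
      : (0, "unknown setting '$name'");
}

1;
```

    In the parent, this would correspond to something like
    ->send_arg (basedir => "/tmp/somepath") in the method chain before the
    "AnyEvent::Fork::RPC::run" call.
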
    $rpc->(..., $cb->(...))
        The RPC object returned by "AnyEvent::Fork::RPC::run" is actually a
        code reference. There are two things you can do with it: call it,
        and let it go out of scope (let it get destroyed).

        If "async" was false when $rpc was created (the default), then, if
        you call $rpc, the $function is invoked with all arguments passed to
        $rpc except the last one (the callback). When the function returns,
        the callback will be invoked with all the return values.

        If "async" was true, then the $function receives an additional
        initial argument, the result callback. In this case, returning from
        $function does nothing - the function only counts as "done" when the
        result callback is called, and any arguments passed to it are
        considered the return values. This makes it possible to "return"
        from event handlers or e.g. Coro threads.

        The other thing that can be done with the RPC object is to destroy
        it. In this case, the child process will execute all remaining RPC
        calls, report their results, and then exit.

        See the examples section earlier in this document for some actual
        examples.

CHILD PROCESS USAGE
    The following function is not available in this module itself. It is
    only available in the namespace of this module when the child is
    running, without having to load any extra modules. It is part of the
    child-side API of AnyEvent::Fork::RPC.

    AnyEvent::Fork::RPC::event ...
        Send an event to the parent. Events are a bit like RPC calls made by
        the child process to the parent, except that there is no notion of
        return values.

        See the examples section earlier in this document for some actual
        examples.

ADVANCED TOPICS
  Choosing a backend
    So how do you decide which backend to use? Well, that's your problem to
    solve, but here are some thoughts on the matter:

    Synchronous
        The synchronous backend does not rely on any external modules (well,
        except common::sense, which works around a bug in how perl's warning
        system works). This keeps the process very small, for example, on my
        system, an empty perl interpreter uses 1492kB RSS, which becomes
        2020kB after "use warnings; use strict" (for people who grew up with
        C64s around them this is probably shocking every single time they
        see it). The worker process in the first example in this document
        uses 1792kB.

        Since the calls are done synchronously, slow jobs will keep newer
        jobs from executing.

        The synchronous backend also has no overhead due to running an event
        loop - reading requests is therefore very efficient, while writing
        responses is less so, as every response results in a write syscall.

        If the parent process is busy and a bit slow reading responses, the
        child waits instead of processing further requests. This also limits
        the amount of memory needed for buffering, as never more than one
        response has to be buffered.

        The API in the child is simple - you just have to define a function
        that does something and returns something.

        It's hard to use modules or code that relies on an event loop, as
        the child cannot execute anything while it waits for more input.

    Asynchronous
        The asynchronous backend relies on AnyEvent, which tries to be
        small, but still comes at a price: On my system, the worker from
        example 1a uses 3420kB RSS (for AnyEvent, which loads EV, which
        needs XSLoader which in turn loads a lot of other modules such as
        warnings, strict, vars, Exporter...).

        It batches requests and responses reasonably efficiently, doing only
        as few reads and writes as needed, but needs to poll for events via
        the event loop.

        Responses are queued when the parent process is busy. This means the
        child can continue to execute any queued requests. It also means
        that a child might queue a lot of responses in memory when it
        generates them and the parent process is slow accepting them.

        The API is not a straightforward RPC pattern - you have to call a
        "done" callback to pass return values and signal completion. Also,
        more importantly, the API starts jobs as fast as possible - when
        1000 jobs are queued and the jobs are slow, they will all run
        concurrently. The child must implement some queueing/limiting
        mechanism if this causes problems. Alternatively, the parent could
        limit the number of RPC calls that are outstanding.

        Blocking use of condvars is not supported.

        Using event-based modules such as IO::AIO, Gtk2, Tk and so on is
        easy.

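    The parent-side limiting mentioned for the asynchronous backend could be
    sketched as a thin wrapper around the $rpc code reference. The helper
    name "limit_rpc" and its behaviour are assumptions made for this
    illustration, nothing like it is provided by the module. It keeps at
    most $max calls outstanding and queues the rest in the parent.

```perl
use strict;
use warnings;

# hypothetical helper: returns a code reference with the same calling
# convention as $rpc, but with at most $max unanswered calls at any time
sub limit_rpc {
   my ($rpc, $max) = @_;

   my @queue;        # calls waiting for a free slot, as [\@args, $cb]
   my $active = 0;   # calls sent to the child but not yet answered

   my $start; $start = sub {
      while ($active < $max && @queue) {
         my ($args, $cb) = @{ shift @queue };

         ++$active;
         $rpc->(@$args, sub {
            --$active;
            $cb->(@_);      # hand the results to the original callback
            $start->();     # a slot became free, send the next queued call
         });
      }
   };

   return sub {
      my $cb = pop;         # last argument is the result callback
      push @queue, [[@_], $cb];
      $start->();
   };
}
```

    Usage would look like my $limited = limit_rpc ($rpc, 10), followed by
    $limited->(@args, $cb) instead of $rpc->(@args, $cb). Note that the
    wrapper keeps a reference to $rpc, so the child only exits once both the
    wrapper and $rpc have been dropped.
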
  Passing file descriptors
    Unlike AnyEvent::Fork, this module has no in-built file handle or file
    descriptor passing abilities.

    The reason is that passing file descriptors is extraordinarily tricky
    business, and conflicts with efficient batching of messages.

    There still is a method you can use: Create a
    "AnyEvent::Util::portable_socketpair" and "send_fh" one half of it to
    the process before you pass control to "AnyEvent::Fork::RPC::run".

    Whenever you want to pass a file descriptor, send an rpc request to the
    child process (so it expects the descriptor), then send it over the
    other half of the socketpair. The child should fetch the descriptor from
    the half it was passed earlier.

    Here is some (untested) pseudocode to that effect:

       use AnyEvent::Util;
       use AnyEvent::Fork::RPC;
       use IO::FDPass;

       my ($s1, $s2) = AnyEvent::Util::portable_socketpair;

       my $rpc = AnyEvent::Fork
          ->new
          ->send_fh ($s2)
          ->require ("MyWorker")
          ->AnyEvent::Fork::RPC::run ("MyWorker::run",
             init => "MyWorker::init",
          );

       undef $s2; # no need to keep it around

       # pass an fd
       $rpc->("i'll send some fd now, please expect it!", my $cv = AE::cv);

       # $handle_to_pass is whatever open file handle you want to hand over
       IO::FDPass::send (fileno $s1, fileno $handle_to_pass);

       $cv->recv;

    The MyWorker module could look like this:

       package MyWorker;

       use IO::FDPass;

       my $s2;

       sub init {
          $s2 = $_[0];
       }

       sub run {
          if ($_[0] eq "i'll send some fd now, please expect it!") {
             my $fd = IO::FDPass::recv fileno $s2;
             ...
          }
       }

    Of course, this might be blocking if you pass a lot of file descriptors,
    so you might want to look into AnyEvent::FDpasser which can handle the
    gory details.

EXCEPTIONS
    There are no provisions whatsoever for catching exceptions at this time
    - in the child, exceptions might kill the process, causing calls to be
    lost and the parent to encounter a fatal error. In the parent,
    exceptions in the result callback will not be caught and cause undefined
    behaviour.

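    Since the module does not catch exceptions for you, one possible
    approach in the child is to wrap the real work in "eval" and report
    failures as ordinary return values. The sketch below assumes the
    synchronous backend. The "remove" function and the "ok"/"error" reply
    convention are hypothetical and only serve as an illustration.

```perl
package MyWorker;

use strict;
use warnings;

# the real work; dies on any problem
sub remove {
   my ($cmd, $path) = @_;

   my $status = $cmd eq "rmdir"  ? rmdir $path
              : $cmd eq "unlink" ? unlink $path
              : die "illegal command '$cmd'\n";

   $status or die "$path: $!\n";

   return $status;
}

# the function handed to AnyEvent::Fork::RPC::run: it never dies itself,
# but turns exceptions into an ("error", $message) reply
sub run {
   my @args = @_;
   my @result;

   my $ok = eval { @result = remove (@args); 1 };

   return $ok ? ("ok", @result) : ("error", "$@");
}

1;
```

    This only covers exceptions in the child. Exceptions thrown inside
    result callbacks in the parent still need to be handled by the callbacks
    themselves.
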
SEE ALSO
    AnyEvent::Fork, to create the processes in the first place.

    AnyEvent::Fork::Pool, to manage whole pools of processes.

AUTHOR AND CONTACT INFORMATION
     Marc Lehmann <schmorp@schmorp.de>
     http://software.schmorp.de/pkg/AnyEvent-Fork-RPC
