/cvs/AnyEvent-Fork-RPC/RPC.pm
Revision: 1.13
Committed: Thu Apr 18 11:11:26 2013 UTC (11 years, 1 month ago) by root
Branch: MAIN
Changes since 1.12: +0 -1 lines
Log Message:
*** empty log message ***

1 =head1 NAME
2
3 AnyEvent::Fork::RPC - simple RPC extension for AnyEvent::Fork
4
5 =head1 SYNOPSIS
6
7 use AnyEvent::Fork::RPC;
8 # use AnyEvent::Fork is not needed
9
10 my $rpc = AnyEvent::Fork
11 ->new
12 ->require ("MyModule")
13 ->AnyEvent::Fork::RPC::run (
14 "MyModule::server",
15 );
16
17 my $cv = AE::cv;
18
19 $rpc->(1, 2, 3, sub {
20 print "MyModule::server returned @_\n";
21 $cv->send;
22 });
23
24 $cv->recv;
25
26 =head1 DESCRIPTION
27
28 This module implements a simple RPC protocol and backend for processes
29 created via L<AnyEvent::Fork>, allowing you to call a function in the
30 child process and receive its return values (up to 4GB serialised).
31
32 It implements two different backends: a synchronous one that works like a
33 normal function call, and an asynchronous one that can run multiple jobs
34 concurrently in the child, using AnyEvent.
35
36 It also implements an asynchronous event mechanism from the child to the
37 parent, that could be used for progress indications or other information.
38
39 Loading this module also always loads L<AnyEvent::Fork>, so you can make a
40 separate C<use AnyEvent::Fork> if you wish, but you don't have to.
41
42 =head1 EXAMPLES
43
44 =head2 Example 1: Synchronous Backend
45
46 Here is a simple example that implements a backend that executes C<unlink>
47 and C<rmdir> calls, and reports their status back. It also reports the
48 number of requests it has processed every three requests, which is clearly
49 silly, but illustrates the use of events.
50
51 First the parent process:
52
53 use AnyEvent;
54 use AnyEvent::Fork::RPC;
55
56 my $done = AE::cv;
57
58 my $rpc = AnyEvent::Fork
59 ->new
60 ->require ("MyWorker")
61 ->AnyEvent::Fork::RPC::run ("MyWorker::run",
62 on_error => sub { warn "FATAL: $_[0]"; exit 1 },
63 on_event => sub { warn "$_[0] requests handled\n" },
64 on_destroy => $done,
65 );
66
67 for my $id (1..6) {
68 $rpc->(rmdir => "/tmp/somepath/$id", sub {
69 $_[0]
70 or warn "/tmp/somepath/$id: $_[1]\n";
71 });
72 }
73
74 undef $rpc;
75
76 $done->recv;
77
78 The parent creates the process and queues a few C<rmdir> requests. It then
79 forgets about the C<$rpc> object, so that the child exits after it has handled
80 the requests, and then it waits until the requests have been handled.
81
82 The child is implemented using a separate module, C<MyWorker>, shown here:
83
84 package MyWorker;
85
86 my $count;
87
88 sub run {
89 my ($cmd, $path) = @_;
90
91 AnyEvent::Fork::RPC::event ($count)
92 unless ++$count % 3;
93
94 my $status = $cmd eq "rmdir" ? rmdir $path
95 : $cmd eq "unlink" ? unlink $path
96 : die "fatal error, illegal command '$cmd'";
97
98 $status or (0, "$!")
99 }
100
101 1
102
103 The C<run> function first sends a "progress" event every three calls, and
104 then executes C<rmdir> or C<unlink>, depending on the first parameter (or
105 dies with a fatal error - obviously, you must never let this happen :).
106
107 Eventually it returns the status value true if the command was successful,
108 or the status value 0 and the stringified error message.
109
110 On my system, running the first code fragment with the given
111 F<MyWorker.pm> in the current directory yields:
112
113 /tmp/somepath/1: No such file or directory
114 /tmp/somepath/2: No such file or directory
115 3 requests handled
116 /tmp/somepath/3: No such file or directory
117 /tmp/somepath/4: No such file or directory
118 /tmp/somepath/5: No such file or directory
119 6 requests handled
120 /tmp/somepath/6: No such file or directory
121
122 Obviously, none of the directories I am trying to delete even exist. Also,
123 the events and responses are processed in exactly the same order as
124 they were created in the child, which is true for both synchronous and
125 asynchronous backends.
126
127 Note that the parentheses in the call to C<AnyEvent::Fork::RPC::event> are
128 not optional. That is because the function isn't defined when the code is
129 compiled. You can make sure it is visible by pre-loading the correct
130 backend module in the call to C<require>:
131
132 ->require ("AnyEvent::Fork::RPC::Sync", "MyWorker")
133
134 Since the backend module declares the C<event> function, loading it first
135 ensures that perl will correctly interpret calls to it.
136
137 And as a final remark, there is a fine module on CPAN that can
138 asynchronously C<rmdir> and C<unlink> and a lot more, and more efficiently
139 than this example, namely L<IO::AIO>.
140
141 =head3 Example 1a: the same with the asynchronous backend
142
143 This example only shows what needs to be changed to use the async backend
144 instead. Doing this is not very useful, the purpose of this example is
145 to show the minimum amount of change that is required to go from the
146 synchronous to the asynchronous backend.
147
148 To use the async backend in the previous example, you need to add the
149 C<async> parameter to the C<AnyEvent::Fork::RPC::run> call:
150
151 ->AnyEvent::Fork::RPC::run ("MyWorker::run",
152 async => 1,
153 ...
154
155 And since the function call protocol is now changed, you need to adapt
156 C<MyWorker::run> to the async API.
157
158 First, you need to accept the extra initial C<$done> callback:
159
160 sub run {
161 my ($done, $cmd, $path) = @_;
162
163 And since a response is now generated when C<$done> is called, as opposed
164 to when the function returns, we need to call the C<$done> function with
165 the status:
166
167 $done->($status or (0, "$!"));
168
169 A few remarks are in order. First, it's quite pointless to use the async
170 backend for this example - but it I<is> possible. Second, you can call
171 C<$done> before or after returning from the function. Third, having both
172 returned from the function and having called the C<$done> callback, the
173 child process may exit at any time, so you should call C<$done> only when
174 you really I<are> done.
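
Putting the two changes above together, the whole async worker might look
roughly like the following sketch. The C<BEGIN> block is not part of the
original example: it is a stand-in I added so the file can also be loaded
and called outside a child process, where C<AnyEvent::Fork::RPC::event>
would otherwise be undefined.

```perl
package MyWorker;

use strict;
use warnings;

# Stand-in (an assumption of this sketch): in a real child the backend
# module provides AnyEvent::Fork::RPC::event, so this does nothing there.
BEGIN {
   *AnyEvent::Fork::RPC::event = sub { }
      unless defined &AnyEvent::Fork::RPC::event;
}

my $count;

sub run {
   # the async backend passes the result callback as the first argument
   my ($done, $cmd, $path) = @_;

   AnyEvent::Fork::RPC::event ($count)
      unless ++$count % 3;

   my $status = $cmd eq "rmdir"  ? rmdir  ($path)
              : $cmd eq "unlink" ? unlink ($path)
              :                    die "fatal error, illegal command '$cmd'";

   # the response is generated by this call, not by the return value of run
   $done->($status or (0, "$!"));
}

1;
```

Calling C<MyWorker::run (sub { ... }, rmdir =E<gt> $path)> by hand is a
cheap way to check the worker logic before wiring it up to a child process.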
175
176 =head2 Example 2: Asynchronous Backend
177
178 This example implements multiple count-downs in the child, using
179 L<AnyEvent> timers. While this is a bit silly (one could use timers in the
180 parent just as well), it illustrates the ability to use AnyEvent in the
181 child and the fact that responses can arrive in a different order than the
182 requests.
183
184 It also shows how to embed the actual child code into a C<__DATA__>
185 section, so it doesn't need any external files at all.
186
187 And when your parent process is often busy, and you have stricter timing
188 requirements, then running timers in a child process suddenly doesn't look
189 so silly anymore.
190
191 Without further ado, here is the code:
192
193 use AnyEvent;
194 use AnyEvent::Fork::RPC;
195
196 my $done = AE::cv;
197
198 my $rpc = AnyEvent::Fork
199 ->new
200 ->require ("AnyEvent::Fork::RPC::Async")
201 ->eval (do { local $/; <DATA> })
202 ->AnyEvent::Fork::RPC::run ("run",
203 async => 1,
204 on_error => sub { warn "FATAL: $_[0]"; exit 1 },
205 on_event => sub { print $_[0] },
206 on_destroy => $done,
207 );
208
209 for my $count (3, 2, 1) {
210 $rpc->($count, sub {
211 warn "job $count finished\n";
212 });
213 }
214
215 undef $rpc;
216
217 $done->recv;
218
219 __DATA__
220
221 # this ends up in main, as we don't use a package declaration
222
223 use AnyEvent;
224
225 sub run {
226 my ($done, $count) = @_;
227
228 my $n;
229
230 AnyEvent::Fork::RPC::event "starting to count up to $count\n";
231
232 my $w; $w = AE::timer 1, 1, sub {
233 ++$n;
234
235 AnyEvent::Fork::RPC::event "count $n of $count\n";
236
237 if ($n == $count) {
238 undef $w;
239 $done->();
240 }
241 };
242 }
243
244 The parent part (the one before the C<__DATA__> section) isn't very
245 different from the earlier examples. It sets async mode, preloads
246 the backend module (so the C<AnyEvent::Fork::RPC::event> function is
247 declared), uses a slightly different C<on_event> handler (which we use
248 simply for logging purposes) and then, instead of loading a module with
249 the actual worker code, it C<eval>'s the code from the data section in the
250 child process.
251
252 It then starts three countdowns, from 3 to 1 seconds downwards, destroys
253 the rpc object so the example finishes eventually, and then just waits for
254 the stuff to trickle in.
255
256 The worker code uses the event function to log some progress messages, but
257 mostly just creates a recurring one-second timer.
258
259 The timer callback increments a counter, logs a message, and eventually,
260 when the count has been reached, calls the finish callback.
261
262 On my system, this results in the following output. Since all timers fire
263 at roughly the same time, the actual order isn't guaranteed, but the order
264 shown is very likely what you would get, too.
265
266 starting to count up to 3
267 starting to count up to 2
268 starting to count up to 1
269 count 1 of 3
270 count 1 of 2
271 count 1 of 1
272 job 1 finished
273 count 2 of 2
274 job 2 finished
275 count 2 of 3
276 count 3 of 3
277 job 3 finished
278
279 While the overall ordering isn't guaranteed, the async backend still
280 guarantees that events and responses are delivered to the parent process
281 in the exact same ordering as they were generated in the child process.
282
283 And unless your system is I<very> busy, it should clearly show that the
284 job started last will finish first, as it has the lowest count.
285
286 This concludes the async example. Since L<AnyEvent::Fork> does not
287 actually fork, you are free to use about any module in the child, not just
288 L<AnyEvent>, but also L<IO::AIO>, or L<Tk> for example.
289
290 =head1 PARENT PROCESS USAGE
291
292 This module exports nothing, and only implements a single function:
293
294 =over 4
295
296 =cut
297
298 package AnyEvent::Fork::RPC;
299
300 use common::sense;
301
302 use Errno ();
303 use Guard ();
304
305 use AnyEvent;
306 use AnyEvent::Fork; # we don't actually depend on it, this is for convenience
307
308 our $VERSION = 0.1;
309
310 =item my $rpc = AnyEvent::Fork::RPC::run $fork, $function, [key => value...]
311
312 The traditional way to call it. But it is way cooler to call it in the
313 following way:
314
315 =item my $rpc = $fork->AnyEvent::Fork::RPC::run ($function, [key => value...])
316
317 This C<run> function/method can be used in place of the
318 L<AnyEvent::Fork::run> method. Just like that method, it takes over
319 the L<AnyEvent::Fork> process, but instead of calling the specified
320 C<$function> directly, it runs a server that accepts RPC calls and handles
321 responses.
322
323 It returns a function reference that can be used to call the function in
324 the child process, handling serialisation and data transfers.
325
326 The following key/value pairs are allowed. It is recommended to have at
327 least an C<on_error> or C<on_event> handler set.
328
329 =over 4
330
331 =item on_error => $cb->($msg)
332
333 Called on (fatal) errors, with a descriptive (hopefully) message. If
334 this callback is not provided, but C<on_event> is, then the C<on_event>
335 callback is called with the first argument being the string C<error>,
336 followed by the error message.
337
338 If neither handler is provided it prints the error to STDERR and will
339 start failing badly.
340
341 =item on_event => $cb->(...)
342
343 Called for every call to the C<AnyEvent::Fork::RPC::event> function in the
344 child, with the arguments of that function passed to the callback.
345
346 Also called on errors when no C<on_error> handler is provided.
347
348 =item on_destroy => $cb->()
349
350 Called when the C<$rpc> object has been destroyed and all requests have
351 been successfully handled. This is useful when you queue some requests and
352 want the child to go away after it has handled them. The problem is that
353 the parent must not exit either until all requests have been handled, and
354 this can be accomplished by waiting for this callback.
355
356 =item init => $function (default none)
357
358 When specified (by name), this function is called in the child as the very
359 first thing when taking over the process, with all the arguments normally
360 passed to the C<AnyEvent::Fork::run> function, except the communications
361 socket.
362
363 It can be used to do one-time things in the child such as storing passed
364 parameters or opening database connections.
365
366 It is called very early - before the serialisers are created or the
367 C<$function> name is resolved into a function reference, so it could be
368 used to load any modules that provide the serialiser or function. It can
369 not, however, create events.
370
371 =item async => $boolean (default: 0)
372
373 The default server used in the child does all I/O blockingly, and only
374 allows a single RPC call to execute at a time.
375
376 Setting C<async> to a true value switches to another implementation that
377 uses L<AnyEvent> in the child and allows multiple concurrent RPC calls.
378
379 The actual API in the child is documented in the section that describes
380 the calling semantics of the returned C<$rpc> function.
381
382 If you want to pre-load the actual back-end modules to enable memory
383 sharing, then you should load C<AnyEvent::Fork::RPC::Sync> for
384 synchronous, and C<AnyEvent::Fork::RPC::Async> for asynchronous mode.
385
386 If you use a template process and want to fork both sync and async
387 children, then it is permissible to load both modules.
388
389 =item serialiser => $string (default: '(sub { pack "(w/a*)*", @_ }, sub { unpack "(w/a*)*", shift })')
390
391 All arguments, result data and event data have to be serialised to be
392 transferred between the processes. For this, they have to be frozen and
393 thawed in both parent and child processes.
394
395 By default, only octet strings can be passed between the processes, which
396 is reasonably fast and efficient.
397
398 For more complicated use cases, you can provide your own freeze and thaw
399 functions, by specifying a string with perl source code. It's supposed to
400 return two code references when evaluated: the first receives a list of
401 perl values and must return an octet string. The second receives the octet
402 string and must return the original list of values.
403
404 If you need an external module for serialisation, then you can either
405 pre-load it into your L<AnyEvent::Fork> process, or you can add a C<use>
406 or C<require> statement into the serialiser string. Or both.
407
408 =back
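
To make the C<serialiser> contract a bit more concrete, here is a small
sketch that evaluates two serialiser strings the way C<run> does and
round-trips some values through them. The first string is the default shown
above. The second one, based on L<JSON::PP>, is purely my own illustration
(this module does not load or ship it) of how nested data could be passed,
and the helper name C<serialiser_pair> is hypothetical.

```perl
use strict;
use warnings;

# the default serialiser string, as documented above: octet strings only
my $default = '(sub { pack "(w/a*)*", @_ }, sub { unpack "(w/a*)*", shift })';

# hypothetical alternative that can also carry nested data structures;
# JSON::PP is an assumption of this sketch, not something the module uses
my $json = 'use JSON::PP (); my $j = JSON::PP->new->utf8;'
         . ' (sub { $j->encode (\@_) }, sub { @{ $j->decode (shift) } })';

# evaluate a serialiser string in list context to get (freeze, thaw)
sub serialiser_pair {
   my ($f, $t) = eval $_[0];
   die $@ if $@;
   ($f, $t)
}

my ($f, $t) = serialiser_pair ($default);
my @flat = $t->($f->("rmdir", "/tmp/somepath/1"));

($f, $t) = serialiser_pair ($json);
my @nested = $t->($f->("stat", { path => "/tmp", follow => 1 }));

print "flat:   @flat\n";
print "nested: $nested[0] $nested[1]{path}\n";
```

Whatever string you pick has to evaluate successfully in both the parent
and the child, which is why pre-loading the serialisation module or putting
the C<use> into the string itself matters.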
409
410 See the examples section earlier in this document for some actual
411 examples.
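
The way C<on_error> and C<on_event> fall back on each other can be
sketched in isolation as follows. The defaulting logic mirrors what C<run>
does, but the function name C<default_handlers> and the shortened error
text are made up for this illustration.

```perl
use strict;
use warnings;

# hypothetical helper: resolve the two handlers from key/value pairs
sub default_handlers {
   my (%arg) = @_;
   my ($on_error, $on_event) = @arg{qw(on_error on_event)};

   # without on_error, errors are routed to on_event as (error => $msg);
   # without either handler, an error is fatal
   $on_error ||= $on_event
      ? sub { $on_event->(error => shift) }
      : sub { die "uncaught error: $_[0].\n" };

   # without on_event, receiving an event is itself reported as an error
   $on_event ||= sub { $on_error->("event received, but no on_event handler") };

   ($on_error, $on_event)
}

my @seen;
my ($on_error, $on_event) = default_handlers (
   on_event => sub { push @seen, [@_] },
);

$on_event->("3");               # an ordinary event from the child
$on_error->("unexpected eof");  # arrives as (error => "unexpected eof")

print join (" | ", map "@$_", @seen), "\n";
```

One consequence worth keeping in mind: with only an C<on_event> handler, an
event whose first argument happens to be the string C<error> cannot be told
apart from a real error, so a separate C<on_error> handler is the safer
choice.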
412
413 =cut
414
415 our $STRING_SERIALISER = '(sub { pack "(w/a*)*", @_ }, sub { unpack "(w/a*)*", shift })';
416
417 sub run {
418 my ($self, $function, %arg) = @_;
419
420 my $serialiser = delete $arg{serialiser} || $STRING_SERIALISER;
421 my $on_event = delete $arg{on_event};
422 my $on_error = delete $arg{on_error};
423 my $on_destroy = delete $arg{on_destroy};
424
425 # default for on_error is to on_event, if specified
426 $on_error ||= $on_event
427 ? sub { $on_event->(error => shift) }
428 : sub { die "AnyEvent::Fork::RPC: uncaught error: $_[0].\n" };
429
430 # default for on_event is to raise an error
431 $on_event ||= sub { $on_error->("event received, but no on_event handler") };
432
433 my ($f, $t) = eval $serialiser; die $@ if $@;
434
435 my (@rcb, %rcb, $fh, $shutdown, $wbuf, $ww);
436 my ($rlen, $rbuf, $rw) = 512 - 16;
437
438 my $wcb = sub {
439 my $len = syswrite $fh, $wbuf;
440
441 unless (defined $len) {
442 if ($! != Errno::EAGAIN && $! != Errno::EWOULDBLOCK) {
443 undef $rw; undef $ww; # it ends here
444 $on_error->("$!");
445 }
446 }
447
448 substr $wbuf, 0, $len, "";
449
450 unless (length $wbuf) {
451 undef $ww;
452 $shutdown and shutdown $fh, 1;
453 }
454 };
455
456 my $module = "AnyEvent::Fork::RPC::" . ($arg{async} ? "Async" : "Sync");
457
458 $self->require ($module)
459 ->send_arg ($function, $arg{init}, $serialiser)
460 ->run ("$module\::run", sub {
461 $fh = shift;
462
463 my ($id, $len);
464 $rw = AE::io $fh, 0, sub {
465 $rlen = $rlen * 2 + 16 if $rlen - 128 < length $rbuf;
466 $len = sysread $fh, $rbuf, $rlen - length $rbuf, length $rbuf;
467
468 if ($len) {
469 while (8 <= length $rbuf) {
470 ($id, $len) = unpack "LL", $rbuf;
471 8 + $len <= length $rbuf
472 or last;
473
474 my @r = $t->(substr $rbuf, 8, $len);
475 substr $rbuf, 0, 8 + $len, "";
476
477 if ($id) {
478 if (@rcb) {
479 (shift @rcb)->(@r);
480 } elsif (my $cb = delete $rcb{$id}) {
481 $cb->(@r);
482 } else {
483 undef $rw; undef $ww;
484 $on_error->("unexpected data from child");
485 }
486 } else {
487 $on_event->(@r);
488 }
489 }
490 } elsif (defined $len) {
491 undef $rw; undef $ww; # it ends here
492
493 if (@rcb || %rcb) {
494 $on_error->("unexpected eof");
495 } else {
496 $on_destroy->();
497 }
498 } elsif ($! != Errno::EAGAIN && $! != Errno::EWOULDBLOCK) {
499 undef $rw; undef $ww; # it ends here
500 $on_error->("read: $!");
501 }
502 };
503
504 $ww ||= AE::io $fh, 1, $wcb;
505 });
506
507 my $guard = Guard::guard {
508 $shutdown = 1;
509 $ww ||= $fh && AE::io $fh, 1, $wcb;
510 };
511
512 my $id;
513
514 $arg{async}
515 ? sub {
516 $id = ($id == 0xffffffff ? 0 : $id) + 1;
517 $id = ($id == 0xffffffff ? 0 : $id) + 1 while exists $rcb{$id}; # rarely loops
518
519 $rcb{$id} = pop;
520
521 $guard; # keep it alive
522
523 $wbuf .= pack "LL/a*", $id, &$f;
524 $ww ||= $fh && AE::io $fh, 1, $wcb;
525 }
526 : sub {
527 push @rcb, pop;
528
529 $guard; # keep it alive
530
531 $wbuf .= pack "L/a*", &$f;
532 $ww ||= $fh && AE::io $fh, 1, $wcb;
533 }
534 }
535
536 =item $rpc->(..., $cb->(...))
537
538 The RPC object returned by C<AnyEvent::Fork::RPC::run> is actually a code
539 reference. There are two things you can do with it: call it, and let it go
540 out of scope (let it get destroyed).
541
542 If C<async> was false when C<$rpc> was created (the default), then, if you
543 call C<$rpc>, the C<$function> is invoked with all arguments passed to
544 C<$rpc> except the last one (the callback). When the function returns, the
545 callback will be invoked with all the return values.
546
547 If C<async> was true, then the C<$function> receives an additional
548 initial argument, the result callback. In this case, returning from
549 C<$function> does nothing - the function only counts as "done" when the
550 result callback is called, and any arguments passed to it are considered
551 the return values. This makes it possible to "return" from event handlers
552 or e.g. Coro threads.
553
554 The other thing that can be done with the RPC object is to destroy it. In
555 this case, the child process will execute all remaining RPC calls, report
556 their results, and then exit.
557
558 See the examples section earlier in this document for some actual
559 examples.
560
561 =back
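
For the curious, the parent-side read loop in the code above expects every
message from the child to consist of a native 32-bit id, a native 32-bit
length and then the serialised payload, with id 0 marking an event. The
32-bit length is also consistent with the 4GB limit mentioned in the
description. The following sketch replays that parsing on an in-memory
string. The helper names C<frame> and C<parse_frames> are mine, and the
child side is only assumed to produce frames of this shape.

```perl
use strict;
use warnings;

# build one frame: 32-bit id, 32-bit payload length, payload
sub frame {
   my ($id, $payload) = @_;
   pack "LL/a*", $id, $payload
}

# split a buffer into complete frames, returning the unparsed remainder
sub parse_frames {
   my ($rbuf) = @_;
   my @frames;

   while (8 <= length $rbuf) {
      my ($id, $len) = unpack "LL", $rbuf;
      8 + $len <= length $rbuf
         or last; # incomplete frame, more data is needed

      push @frames, [$id, substr ($rbuf, 8, $len)];
      substr $rbuf, 0, 8 + $len, "";
   }

   (\@frames, $rbuf)
}

# one event, one response, and the first 5 octets of a third message
my $stream = frame (0, "progress")
           . frame (7, "result")
           . substr (frame (8, "late"), 0, 5);

my ($frames, $rest) = parse_frames ($stream);

printf "%s: %s\n", $_->[0] ? "response $_->[0]" : "event", $_->[1]
   for @$frames;
```

In the real code, a response is then handed to the oldest queued callback
in synchronous mode, and to the callback registered under its id in
asynchronous mode.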
562
563 =head1 CHILD PROCESS USAGE
564
565 The following function is not available in this module. It is only
566 available in the namespace of this module when the child is running,
567 without having to load any extra modules. It is part of the child-side
568 API of L<AnyEvent::Fork::RPC>.
569
570 =over 4
571
572 =item AnyEvent::Fork::RPC::event ...
573
574 Send an event to the parent. Events are a bit like RPC calls made by the
575 child process to the parent, except that there is no notion of return
576 values.
577
578 See the examples section earlier in this document for some actual
579 examples.
580
581 =back
582
583 =head1 ADVANCED TOPICS
584
585 =head2 Choosing a backend
586
587 So how do you decide which backend to use? Well, that's your problem to
588 solve, but here are some thoughts on the matter:
589
590 =over 4
591
592 =item Synchronous
593
594 The synchronous backend does not rely on any external modules (well,
595 except L<common::sense>, which works around a bug in how perl's warning
596 system works). This keeps the process very small, for example, on my
597 system, an empty perl interpreter uses 1492kB RSS, which becomes 2020kB
598 after C<use warnings; use strict> (for people who grew up with C64s around
599 them this is probably shocking every single time they see it). The worker
600 process in the first example in this document uses 1792kB.
601
602 Since the calls are done synchronously, slow jobs will keep newer jobs
603 from executing.
604
605 The synchronous backend also has no overhead due to running an event loop
606 - reading requests is therefore very efficient, while writing responses is
607 less so, as every response results in a write syscall.
608
609 If the parent process is busy and a bit slow reading responses, the child
610 waits instead of processing further requests. This also limits the amount
611 of memory needed for buffering, as never more than one response has to be
612 buffered.
613
614 The API in the child is simple - you just have to define a function that
615 does something and returns something.
616
617 It's hard to use modules or code that relies on an event loop, as the
618 child cannot execute anything while it waits for more input.
619
620 =item Asynchronous
621
622 The asynchronous backend relies on L<AnyEvent>, which tries to be small,
623 but still comes at a price: On my system, the worker from example 1a uses
624 3420kB RSS (for L<AnyEvent>, which loads L<EV>, which needs L<XSLoader>
625 which in turn loads a lot of other modules such as L<warnings>, L<strict>,
626 L<vars>, L<Exporter>...).
627
628 It batches requests and responses reasonably efficiently, doing only as
629 few reads and writes as needed, but needs to poll for events via the event
630 loop.
631
632 Responses are queued when the parent process is busy. This means the child
633 can continue to execute any queued requests. It also means that a child
634 might queue a lot of responses in memory when it generates them and the
635 parent process is slow accepting them.
636
637 The API is not a straightforward RPC pattern - you have to call a
638 "done" callback to pass return values and signal completion. Also, more
639 importantly, the API starts jobs as fast as possible - when 1000 jobs
640 are queued and the jobs are slow, they will all run concurrently. The
641 child must implement some queueing/limiting mechanism if this causes
642 problems. Alternatively, the parent could limit the number of RPC calls
643 that are outstanding.
644
645 Using event-based modules such as L<IO::AIO>, L<Gtk2>, L<Tk> and so on is
646 easy.
647
648 =back
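
Limiting the number of outstanding calls in the parent, as suggested for
the asynchronous backend, does not need much code. The following is only a
sketch: C<limit_outstanding> and C<pump> are hypothetical helpers, not part
of this module, and C<$fake_rpc> merely stands in for a real C<$rpc> code
reference so the example runs without a child process.

```perl
use strict;
use warnings;

# submit queued calls while fewer than "max" are outstanding
sub pump {
   my ($st) = @_;

   while ($st->{active} < $st->{max} && @{ $st->{queue} }) {
      my @args = @{ shift @{ $st->{queue} } };
      my $cb   = pop @args; # the last argument is the result callback

      ++$st->{active};
      $st->{rpc}->(@args, sub {
         --$st->{active};
         $cb->(@_);
         pump ($st); # a slot became free, submit the next queued call
      });
   }
}

# wrap an $rpc-style code reference, allowing at most $max calls in flight
sub limit_outstanding {
   my ($rpc, $max) = @_;
   my $st = { rpc => $rpc, max => $max, active => 0, queue => [] };

   sub { push @{ $st->{queue} }, [@_]; pump ($st) }
}

# stand-in for a real $rpc: it just remembers each call and its callback
my @pending;
my $fake_rpc = sub { my $cb = pop; push @pending, [$cb, @_] };

my $limited = limit_outstanding ($fake_rpc, 2);

my @done;
$limited->($_, sub { push @done, $_[0] }) for 1 .. 5;

print scalar @pending, " calls submitted, the rest wait in the parent\n";
```

The state is kept in a plain hash instead of a self-referencing closure on
purpose: once the wrapper is dropped and all callbacks have run, nothing
references the C<$rpc> object anymore, so the child can still exit as
described for C<on_destroy>.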
649
650 =head2 Passing file descriptors
651
652 Unlike L<AnyEvent::Fork>, this module has no in-built file handle or file
653 descriptor passing abilities.
654
655 The reason is that passing file descriptors is extraordinarily tricky
656 business, and conflicts with efficient batching of messages.
657
658 There still is a method you can use: Create a
659 C<AnyEvent::Util::portable_socketpair> and C<send_fh> one half of it to
660 the process before you pass control to C<AnyEvent::Fork::RPC::run>.
661
662 Whenever you want to pass a file descriptor, send an rpc request to the
663 child process (so it expects the descriptor), then send it over the other
664 half of the socketpair. The child should fetch the descriptor from the
665 half it was passed earlier.
666
667 Here is some (untested) pseudocode to that effect:
668
669 use AnyEvent::Util;
670 use AnyEvent::Fork::RPC;
671 use IO::FDPass;
672
673 my ($s1, $s2) = AnyEvent::Util::portable_socketpair;
674
675 my $rpc = AnyEvent::Fork
676 ->new
677 ->send_fh ($s2)
678 ->require ("MyWorker")
679 ->AnyEvent::Fork::RPC::run ("MyWorker::run",
680 init => "MyWorker::init",
681 );
682
683 undef $s2; # no need to keep it around
684
685 # pass an fd
686 $rpc->("i'll send some fd now, please expect it!", my $cv = AE::cv);
687
688 IO::FDPass::send fileno $s1, fileno $handle_to_pass;
689
690 $cv->recv;
691
692 The MyWorker module could look like this:
693
694 package MyWorker;
695
696 use IO::FDPass;
697
698 my $s2;
699
700 sub init {
701 $s2 = $_[0];
702 }
703
704 sub run {
705 if ($_[0] eq "i'll send some fd now, please expect it!") {
706 my $fd = IO::FDPass::recv fileno $s2;
707 ...
708 }
709 }
710
711 Of course, this might be blocking if you pass a lot of file descriptors,
712 so you might want to look into L<AnyEvent::FDpasser> which can handle the
713 gory details.
714
715 =head1 SEE ALSO
716
717 L<AnyEvent::Fork> (to create the processes in the first place),
718 L<AnyEvent::Fork::Pool> (to manage whole pools of processes).
719
720 =head1 AUTHOR AND CONTACT INFORMATION
721
722 Marc Lehmann <schmorp@schmorp.de>
723 http://software.schmorp.de/pkg/AnyEvent-Fork-RPC
724
725 =cut
726
727 1
728