/cvs/AnyEvent-MP/MP/Intro.pod
Revision: 1.41
Committed: Tue Oct 6 18:38:25 2009 UTC (14 years, 7 months ago) by root
Branch: MAIN
Changes since 1.40: +17 -6 lines

File Contents

# User Rev Content
1 root 1.4 =head1 Message Passing for the Non-Blocked Mind
2 elmex 1.1
3 root 1.8 =head1 Introduction and Terminology
4 elmex 1.1
5 root 1.4 This is a tutorial about how to get the swing of the new L<AnyEvent::MP>
6 root 1.23 module, which allows programs to transparently pass messages within the
7     process and to other processes on the same or a different host.
8 elmex 1.1
9 root 1.23 What kind of messages? Basically a message here means a list of Perl
10 root 1.15 strings, numbers, hashes and arrays, anything that can be expressed as a
11 root 1.23 L<JSON> text (as JSON is used by default in the protocol). Here are two
12     examples:
13 elmex 1.1
14 root 1.23 write_log => 1251555874, "action was successful.\n"
15     123, ["a", "b", "c"], { foo => "bar" }
16 elmex 1.21
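To make "anything that can be expressed as a L<JSON> text" concrete, here is a
small sketch that encodes the two example messages as JSON array texts. It uses
the core L<JSON::PP> module purely for illustration; wrapping each message in
an array reference is an assumption of this sketch, not a description of the
exact wire format.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, used here only for illustration

# the two example messages, each wrapped in an array reference
my @messages = (
   [write_log => 1251555874, "action was successful.\n"],
   [123, ["a", "b", "c"], { foo => "bar" }],
);

# every message must survive the trip through a JSON encoder
my @texts = map { encode_json ($_) } @messages;

print "$_\n" for @texts;
```

Anything that cannot be encoded this way (code references, file handles and
so on) cannot be part of a message.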
17 root 1.23 When using L<AnyEvent::MP> it is customary to use a descriptive string as
18     first element of a message, that indicates the type of the message. This
19     element is called a I<tag> in L<AnyEvent::MP>, as some API functions
20     (C<rcv>) support matching it directly.
21    
22     Suppose you want to send a ping message with your current time to
23     somewhere; this is what such a message might look like (in Perl syntax):
24    
25     ping => 1251381636
26    
27     Now that we know what a message is, to which entities are those
28     messages being I<passed>? They are I<passed> to I<ports>. A I<port> is
29     a destination for messages but also a context to execute code: when
30     a runtime error occurs while executing code belonging to a port, the
31     exception will be raised on the port and can even travel to interested
32     parties on other nodes, which makes supervision of distributed processes
33     easy.
34    
35     How do these ports relate to things you know? Each I<port> belongs
36     to a I<node>, and a I<node> is just the UNIX process that runs your
37     L<AnyEvent::MP> application.
38    
39     Each I<node> is distinguished from other I<nodes> running on the same or
40     another host in a network by its I<node ID>. A I<node ID> is simply a
41     unique string chosen manually or assigned by L<AnyEvent::MP> in some way
42     (UNIX nodename, random string...).
43    
44     Here is a diagram about how I<nodes>, I<ports> and UNIX processes relate
45     to each other. The setup consists of two nodes (more are of course
46     possible): Node C<A> (in UNIX process 7066) with the ports C<ABC> and
47     C<DEF>. And the node C<B> (in UNIX process 8321) with the ports C<FOO> and
48     C<BAR>.
49 elmex 1.17
50    
51     |- PID: 7066 -|                  |- PID: 8321 -|
52     |             |                  |             |
53     | Node ID: A  |                  | Node ID: B  |
54     |             |                  |             |
55     |   Port ABC =|= <----\ /-----> =|= Port FOO   |
56     |             |        X         |             |
57     |   Port DEF =|= <----/ \-----> =|= Port BAR   |
58     |             |                  |             |
59     |-------------|                  |-------------|
60    
61 root 1.23 The strings for the I<port IDs> here are just for illustrative
62     purposes: Even though I<ports> in L<AnyEvent::MP> are also identified by
63     strings, they can't be chosen manually and are assigned by the system
64     dynamically. These I<port IDs> are unique within a network and can also be
65     used to identify senders or as message tags for instance.
66    
67     The next sections will explain the API of L<AnyEvent::MP> by going through
68     a few simple examples. Later some more complex idioms are introduced,
69     which are hopefully useful to solve some real world problems.
70 root 1.8
71 root 1.39 =head2 Passing Your First Message
72 elmex 1.16
73 root 1.24 As a start let's have a look at the messaging API. The following example
74     is just a demo to show the basic elements of message passing with
75     L<AnyEvent::MP>.
76    
77     The example should print: C<Ending with: 123>, in a rather complicated
78     way, by passing some message to a port.
79 elmex 1.16
80     use AnyEvent;
81     use AnyEvent::MP;
82    
83     my $end_cv = AnyEvent->condvar;
84    
85     my $port = port;
86    
87     rcv $port, test => sub {
88     my ($data) = @_;
89     $end_cv->send ($data);
90     };
91    
92     snd $port, test => 123;
93    
94     print "Ending with: " . $end_cv->recv . "\n";
95    
96 root 1.24 It already uses most of the essential functions inside
97     L<AnyEvent::MP>: First there is the C<port> function which will create a
98     I<port> and will return its I<port ID>, a simple string.
99    
100     This I<port ID> can be used to send messages to the port and install
101     handlers to receive messages on the port. Since it is a simple string
102     it can be safely passed to other I<nodes> in the network when you want
103     to refer to that specific port (usually used for RPC, where you need
104     to tell the other end which I<port> to send the reply to - messages in
105     L<AnyEvent::MP> have a destination, but no source).
106 elmex 1.17
107 root 1.24 The next function is C<rcv>:
108 elmex 1.16
109 elmex 1.17 rcv $port, test => sub { ... };
110 elmex 1.16
111 root 1.24 It installs a receiver callback on the I<port> that is specified as the first
112     argument (it only works for "local" ports, i.e. ports created on the same
113     node). The next argument, in this example C<test>, specifies a I<tag> to
114     match. This means that whenever a message with the first element being
115     the string C<test> is received, the callback is called with the remaining
116 elmex 1.17 parts of that message.
117    
118 root 1.24 Messages can be sent with the C<snd> function, which is used like this in
119     the example above:
120 elmex 1.17
121     snd $port, test => 123;
122    
123 root 1.24 This will send the message C<'test', 123> to the I<port> with the I<port
124     ID> stored in C<$port>. Since in this case the receiver has a I<tag> match
125     on C<test> it will call the callback with the first argument being the
126     number C<123>.
127    
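Because messages have a destination but no source, a request that expects an
answer has to carry the I<port ID> to reply to. The following is a minimal
sketch of that idea within a single process. The tags C<double> and C<result>
and the variable names are made up for this illustration.

```perl
use AnyEvent;
use AnyEvent::MP;

my $done = AnyEvent->condvar;

# "server" port: doubles a number and replies to the port named in the message
my $server = port;
rcv $server, double => sub {
   my ($reply_port, $number) = @_;
   snd $reply_port, result => $number * 2;
};

# "client" port: receives the reply
my $client = port;
rcv $client, result => sub {
   my ($value) = @_;
   $done->send ($value);
};

# the request carries the client port ID, so the server knows where to reply
snd $server, double => $client, 21;

my $result = $done->recv;
print "Result: $result\n";
```

Since a I<port ID> is just a string, the same pattern should carry over
unchanged once the two ports live on different nodes.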
128     The callback is a typical AnyEvent idiom: the callback just passes
129     that number on to the I<condition variable> C<$end_cv> which will then
130     pass the value to the print. Condition variables are out of the scope
131     of this tutorial and not often used with ports, so please consult the
132 elmex 1.17 L<AnyEvent::Intro> about them.
133    
134 root 1.24 Passing messages inside just one process is boring. Before we can move on
135     and do interprocess message passing we first have to make sure some things
136     have been set up correctly for our nodes to talk to each other.
137 elmex 1.17
138 root 1.39 =head2 System Requirements and System Setup
139 elmex 1.17
140 root 1.25 Before we can start with real IPC we have to make sure some things work on
141     your system.
142 elmex 1.17
143 root 1.25 First we have to set up a I<shared secret>: for two L<AnyEvent::MP>
144     I<nodes> to be able to communicate with each other over the network it is
145     necessary to set up the same I<shared secret> for both of them, so they can
146     prove their trustworthiness to each other.
147 elmex 1.17
148     The easiest way to set this up is to use the F<aemp> utility:
149    
150     aemp gensecret
151    
152 root 1.25 This creates a F<$HOME/.perl-anyevent-mp> config file and generates a
153     random shared secret. You can copy this file to any other system and
154     then communicate over the network (via TCP) with it. You can also select
155     your own shared secret (F<aemp setsecret>) and for increased security
156     requirements you can even create (or configure) a TLS certificate (F<aemp
157     gencert>), causing connections to not just be securely authenticated, but
158     also to be encrypted and protected against tinkering.
159    
160     Connections will only be successfully established when the I<nodes>
161     that want to connect to each other have the same I<shared secret> (or
162     successfully verify the TLS certificate of the other side, in which case
163     no shared secret is required).
164 elmex 1.17
165     B<If something does not work as expected, and for example tcpdump shows
166     that the connections are closed almost immediately, you should make sure
167     that F<~/.perl-anyevent-mp> is the same on all hosts/user accounts that
168     you try to connect with each other!>
169 elmex 1.16
170 root 1.25 That is all for now; you will find some more advanced fiddling with the
171     C<aemp> utility later.
172    
173 root 1.35 =head2 Shooting the Trouble
174    
175     Sometimes things go wrong, and AnyEvent::MP, being a professional module,
176     does not gratuitously spill out messages to your screen.
177    
178     To help troubleshooting any issues, there are two environment variables
179     that you can set. The first, C<PERL_ANYEVENT_MP_WARNLEVEL> sets the
180     logging level. The default is C<5>, which means nothing much is
181     printed. You can increase it to C<8> or C<9> to get more verbose
182     output. This is example output when starting a node:
183    
184     2009-08-31 19:51:50 <8> node anon/5RloFvvYL8jfSScXNL8EpX starting up.
185     2009-08-31 19:51:50 <7> starting global service.
186     2009-08-31 19:51:50 <9> 10.0.0.17:4040 connected as ruth
187     2009-08-31 19:51:50 <7> ruth is up ()
188     2009-08-31 19:51:50 <9> ruth told us it knows about {"doom":["10.0.0.5:45143"],"rain":["10.0.0.19:4040"],"anon/4SYrtJ3ft5l1C16w2hto3t":["10.0.0.1:45920","[2002:58c6:438b:20:21d:60ff:fee8:6e36]:35788","[fd00::a00:1]:37104"],"frank":["10.0.0.18:4040"]}.
189     2009-08-31 19:51:50 <9> connecting to doom with [10.0.0.5:45143]
190     2009-08-31 19:51:50 <9> connecting to anon/4SYrtJ3ft5l1C16w2hto3t with [10.0.0.1:45920 [2002:58c6:438b:20:21d:60ff:fee8:6e36]:35788 [fd00::a00:1]:37104]
191     2009-08-31 19:51:50 <9> ruth told us its addresses (10.0.0.17:4040).
192    
193     A lot of info, but at least you can see that it does something.
194    
195     The other environment variable that can be useful is
196     C<PERL_ANYEVENT_MP_TRACE>, which, when set to a true value, will cause
197     most messages that are sent or received to be printed. In the above
198     example you would see something like:
199    
200     SND ruth <- ["addr",["10.0.0.1:49358","[2002:58c6:438b:20:21d:60ff:fee8:6e36]:58884","[fd00::a00:1]:45006"]]
201     RCV ruth -> ["","AnyEvent::MP::_spawn","20QA7cWubCLTWUhFgBKOx2.x","AnyEvent::MP::Global::connect",0,"ruth"]
202     RCV ruth -> ["","mon1","20QA7cWubCLTWUhFgBKOx2.x"]
203     RCV ruth -> ["20QA7cWubCLTWUhFgBKOx2.x","addr",["10.0.0.17:4040"]]
204     RCV ruth -> ["20QA7cWubCLTWUhFgBKOx2.x","nodes",{"doom":["10.0.0.5:45143"],"rain":["10.0.0.19:4040"],"anon/4SYrtJ3ft5l1C16w2hto3t":["10.0.0.1:45920","[2002:58c6:438b:20:21d:60ff:fee8:6e36]:35788","[fd00::a00:1]:37104"],"frank":["10.0.0.18:4040"]}]
205 elmex 1.18
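Both variables are ordinary environment variables, so one way to use them is
to set them for a single run on the command line. In this sketch
F<receiver.pl> is a hypothetical script name standing in for your own
program.

```shell
# verbose logging plus message tracing for one run of a (hypothetical) script
PERL_ANYEVENT_MP_WARNLEVEL=9 PERL_ANYEVENT_MP_TRACE=1 perl receiver.pl
```
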
206 root 1.30 =head1 PART 1: Passing Messages Between Processes
207 elmex 1.18
208     =head2 The Receiver
209    
210 root 1.25 Let's split the previous example up into two programs: one that contains
211     the sender and one for the receiver. First the receiver application, in
212     full:
213 elmex 1.18
214     use AnyEvent;
215     use AnyEvent::MP;
216     use AnyEvent::MP::Global;
217    
218 root 1.28 configure nodeid => "eg_receiver", binds => ["*:4040"];
219 elmex 1.18
220     my $port = port;
221    
222 root 1.40 grp_reg eg_receivers => $port;
223 elmex 1.18
224     rcv $port, test => sub {
225     my ($data, $reply_port) = @_;
226    
227     print "Received data: " . $data . "\n";
228     };
229    
230     AnyEvent->condvar->recv;
231    
232     =head3 AnyEvent::MP::Global
233    
234 root 1.25 Now, that wasn't too bad, was it? Ok, let's step through the new functions
235     and modules that have been used.
236    
237     For starters, there is now an additional module being
238     used: L<AnyEvent::MP::Global>. This module provides us with a I<global
239     registry>, which lets us register ports in groups that are visible on all
240     I<nodes> in a network.
241    
242     What is this useful for? Well, the I<port IDs> are random-looking strings,
243     assigned by L<AnyEvent::MP>. We cannot know those I<port IDs> in advance,
244     so we don't know which I<port ID> to send messages to, especially when the
245     message is to be passed between different I<nodes> (or UNIX processes). To
246     find the right I<port> of another I<node> in the network we will need
247     to communicate this somehow to the sender. And exactly that is what
248     L<AnyEvent::MP::Global> provides.
249    
250     Especially in larger, more anonymous networks this is handy: imagine you
251     have a few database backends, a few web frontends and some processing
252     distributed over a number of hosts: all of these would simply register
253     themselves in the appropriate group, and your web frontends can start to
254     find some database backend.
255 elmex 1.18
256 root 1.28 =head3 C<configure> and the Network
257 elmex 1.18
258 root 1.28 Now, let's have a look at the new function, C<configure>:
259 elmex 1.18
260 root 1.28 configure nodeid => "eg_receiver", binds => ["*:4040"];
261 elmex 1.18
262     Before we are able to send messages to other nodes we have to initialise
263 root 1.26 ourselves to become a "distributed node". Initialising a node means naming
264     the node, optionally binding some TCP listeners so that other nodes can
265     contact it and connecting to a predefined set of seed addresses so the
266     node can discover the existing network - and the existing network can
267     discover the node!
268    
269 root 1.28 All of this (and more) can be passed to the C<configure> function - later
270     we will see how we can do all this without even passing anything to
271     C<configure>!
272    
273     The first parameter, C<nodeid>, specifies the node ID (in this case
274     C<eg_receiver> - the default is to use the node name of the current host,
275     but for this example we want to be able to run many nodes on the same
276     machine). Node IDs need to be unique within the network and can be almost
277     any string - if you don't care, you can specify a node ID of C<anon/>
278     which will then be replaced by a random node name.
279    
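As a small sketch of the C<anon/> case: the node below does not care about
its name and simply prints whatever node ID it ended up with. This assumes
that C<$NODE>, as exported by L<AnyEvent::MP>, holds the local node ID once
C<configure> has returned.

```perl
use AnyEvent::MP;

# ask for a randomly generated node ID
configure nodeid => "anon/";

# $NODE is assumed to contain the final node ID at this point
print "this node is called $NODE\n";
```
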
280     The second parameter, C<binds>, specifies a list of C<address:port> pairs
281     to bind TCP listeners on. The special "address" of C<*> means to bind on
282     every local IP address.
283    
284     The reason to bind on a TCP port is not just that other nodes can connect
285     to us: if no binds are specified, the node will still bind on a dynamic
286     port on all local addresses - but in this case we won't know the port, and
287     cannot tell other nodes to connect to it as seed node.
288    
289     A I<seed> is a (fixed) TCP address of some other node in the network. To
290     explain the need for seeds we have to look at the topology of a typical
291     L<AnyEvent::MP> network. The topology is called a I<fully connected mesh>,
292     here an example with 4 nodes:
293 elmex 1.18
294     N1--N2
295     | \/ |
296     | /\ |
297     N3--N4
298    
299 root 1.28 Now imagine another node - C<N5> - wants to connect itself to that network:
300 elmex 1.18
301     N1--N2
302     | \/ | N5
303     | /\ |
304     N3--N4
305    
306 root 1.26 The new node needs to know the I<binds> of all nodes already
307     connected. Exactly this is what the I<seeds> are for: Let's assume that
308     the new node (C<N5>) uses the TCP address of the node C<N2> as seed. This
309     causes it to connect to C<N2>:
310 elmex 1.18
311     N1--N2____
312     | \/ | N5
313     | /\ |
314     N3--N4
315    
316 root 1.26 C<N2> then tells C<N5> about the I<binds> of the other nodes it is
317     connected to, and C<N5> creates the rest of the connections:
318 elmex 1.18
319     /--------\
320     N1--N2____|
321     | \/ | N5
322     | /\ | /|
323     N3--N4--- |
324     \________/
325    
326 root 1.26 All done: C<N5> is now happily connected to the rest of the network.
327 elmex 1.18
328 root 1.28 Of course, this process takes time, during which the node is already
329     running. This also means it takes time until the node is fully connected,
330     and global groups and other information are available. The best way to deal
331     with this is to either retry regularly until you find the resource you
332     were looking for, or to only start services on demand after a node has
333     become available.
334 elmex 1.19
335 root 1.28 =head3 Registering the Receiver
336 elmex 1.19
337 root 1.27 Coming back to our example, we have now introduced the basic purpose of
338 root 1.28 L<AnyEvent::MP::Global> and C<configure> and its use of profiles. We
339 root 1.27 also set up our profiles for later use and now we will finally continue
340     talking about the receiver.
341 elmex 1.19
342 root 1.27 Let's look at the next line(s):
343 elmex 1.19
344     my $port = port;
345 root 1.40 grp_reg eg_receivers => $port;
346 elmex 1.19
347 root 1.27 The C<port> function has already been discussed. It simply creates a new
348 root 1.40 I<port> and returns the I<port ID>. The C<grp_reg> function, however, is
349     new: The first argument is the name of a I<global group>, and the second
350     argument is the I<port ID> to register in that group.
351 elmex 1.19
352 root 1.27 You can choose the name of such a I<global group> freely (prefixing your
353 root 1.40 package name is I<highly recommended> however and might be enforced in
354     future versions!). The purpose of such a group is to store a set of port
355     IDs. This set is made available throughout the L<AnyEvent::MP> network,
356     so that each node can see which ports belong to that group.
357 root 1.27
358 root 1.40 Later we will see how the sender looks for the ports in this global
359     group to send messages to them.
360 root 1.27
361     The last step in the example is to set up a receiver callback for those
362     messages, just as was discussed in the first example. We again match
363     for the tag C<test>. The difference is that this time we don't exit the
364     application after receiving the first message. Instead we continue to wait
365     for new messages indefinitely.
366 elmex 1.19
367 elmex 1.20 =head2 The Sender
368 root 1.8
369 root 1.27 Ok, now let's take a look at the sender code:
370 root 1.4
371 elmex 1.1 use AnyEvent;
372     use AnyEvent::MP;
373 elmex 1.20 use AnyEvent::MP::Global;
374 elmex 1.1
375 root 1.28 configure nodeid => "eg_sender", seeds => ["*:4040"];
376 elmex 1.1
377 elmex 1.20 my $find_timer =
378     AnyEvent->timer (after => 0, interval => 1, cb => sub {
379 root 1.40 my $ports = grp_get "eg_receivers"
380 elmex 1.20 or return;
381    
382     snd $_, test => time
383     for @$ports;
384     });
385 elmex 1.1
386     AnyEvent->condvar->recv;
387    
388 root 1.28 It's even less code. The C<configure> serves the same purpose as in the
389     receiver, but instead of specifying binds we specify a list of seeds -
390     which happens to be the same as the binds used by the receiver, which
391     becomes our seed node.
392 root 1.10
393 root 1.27 Next we set up a timer that repeatedly (every second) calls this chunk of
394     code:
395 elmex 1.1
396 root 1.40 my $ports = grp_get "eg_receivers"
397 elmex 1.20 or return;
398 elmex 1.2
399 elmex 1.20 snd $_, test => time
400     for @$ports;
401 elmex 1.1
402 root 1.40 The only new function here is the C<grp_get> function of
403 root 1.27 L<AnyEvent::MP::Global>. It searches in the global group named
404     C<eg_receivers> for ports. If none are found, it returns C<undef>, which
405     makes our code return instantly and wait for the next round, as nobody is
406     interested in our message.
407    
408     As soon as the receiver application has connected and the information
409     about the newly added port in the receiver has propagated to the sender
410 root 1.40 node, C<grp_get> returns an array reference that contains the I<port ID> of
411 root 1.27 the receiver I<port(s)>.
412    
413     We then just send a message with a tag and the current time to every
414     I<port> in the global group.
415    
416 root 1.28 =head3 Splitting Network Configuration and Application Code
417    
418     Ok, so far, this works. In the real world, however, the person configuring
419     your application to run on a specific network (the end user or network
420     administrator) is often different to the person coding the application.
421    
422     Or to put it differently: the arguments passed to configure are usually
423 elmex 1.31 provided not by the programmer, but by whoever is deploying the program.
424 root 1.28
425     To make this easy, AnyEvent::MP supports a simple configuration database,
426     using profiles, which can be managed using the F<aemp> command-line
427 root 1.30 utility (yes, this section is about the advanced tinkering we mentioned
428     before).
429 root 1.28
430     When you change both programs above to simply call
431    
432     configure;
433    
434     then AnyEvent::MP tries to look up a profile using the current node name
435     in its configuration database, falling back to some global default.
436    
437     You can run "generic" nodes using the F<aemp> utility as well, and we will
438     exploit this in the following way: we configure a profile "seed" and run
439     a node using it, whose sole purpose is to be a seed node for our example
440     programs.
441    
442     We bind the seed node to port 4040 on all interfaces:
443    
444 root 1.29 aemp profile seed binds "*:4040"
445 root 1.28
446     And we configure all nodes to use this as seed node (this only works when
447     running on the same host, for multiple machines you would provide the IP
448 root 1.30 address or hostname of the node running the seed), and use a random name
449     (because we want to start multiple nodes on the same host):
450 root 1.28
451 root 1.30 aemp seeds "*:4040" nodeid anon/
452 root 1.28
453     Then we run the seed node:
454    
455     aemp run profile seed
456    
457     After that, we can start as many other nodes as we want, and they will all
458     use our generic seed node to discover each other.
459 root 1.27
460 root 1.28 In fact, starting many receivers nicely illustrates that the time sender
461     can have multiple receivers.
462 elmex 1.7
463 root 1.30 That's all for now - next we will teach you about monitoring by writing a
464     simple chat client and server :)
465    
466     =head1 PART 2: Monitoring, Supervising, Exception Handling and Recovery
467    
468     That's a mouthful, so what does it mean? Our previous example is what one
469     could call "very loosely coupled" - the sender doesn't care about whether
470     there are any receivers, and the receivers do not care if there is any
471     sender.
472    
473     This can work fine for simple services, but most real-world applications
474     want to ensure that the side they are expecting to be there is actually
475     there. Going one step further: most bigger real-world applications even
476     want to ensure that if some component is missing, or has crashed, it will
477     still be there, by recovering and restarting the service.
478    
479     AnyEvent::MP supports this by catching exceptions and network problems,
480     and notifying interested parties of this.
481    
482 root 1.41 =head2 Exceptions, Port Context, Network Errors and Monitors
483 root 1.30
484     =head3 Exceptions
485    
486     Exceptions are handled on a per-port basis: receive callbacks are executed
487 root 1.41 in a special context, the so-called I<port-context>: code that throws an
488     otherwise uncaught exception will cause the port to be C<kil>led. Killed
489     ports are destroyed automatically (killing ports is the only way to free
490     ports, incidentally).
491 root 1.30
492     Ports can be monitored, even from a different host, and when a port is
493     killed any entity monitoring it will be notified.
494    
495     Here is a simple example:
496    
497     use AnyEvent::MP;
498    
499     # create a port, it always dies
500     my $port = port { die "oops" };
501    
502     # monitor it
503     mon $port, sub {
504     warn "$port was killed (with reason @_)";
505     };
506    
507     # now send it some message, causing it to die:
508     snd $port;
509    
510     It first creates a port whose only action is to throw an exception,
511     and then monitors it with the C<mon> function. Afterwards it sends it a
512     message, causing it to die and call the monitoring callback:
513    
514     anon/6WmIpj.a was killed (with reason die oops at xxx line 5.) at xxx line 9.
515    
516     The callback was actually passed two arguments: C<die> (to indicate it did
517     throw an exception as opposed to, say, a network error) and the exception
518     message itself.
519    
520     What happens when a port is killed before we have a chance to monitor
521     it? Granted, this is highly unlikely in our example, but when you program
522     in a network this can easily happen due to races between nodes.
523    
524     use AnyEvent::MP;
525    
526     my $port = port { die "oops" };
527    
528     snd $port;
529    
530     mon $port, sub {
531     warn "$port was killed (with reason @_)";
532     };
533    
534     This time we will get something like:
535    
536     anon/zpX.a was killed (with reason no_such_port cannot monitor nonexistent port)
537    
538     Since the port was already gone, the kill reason is now C<no_such_port>
539     with some descriptive (we hope) error message.
540    
541     In fact, the kill reason is usually some identifier as first argument
542     and a human-readable error message as second argument, but can be about
543     anything (it's a list) or even nothing - which is called a "normal" kill.
544    
545     You can kill ports manually using the C<kil> function, which will be
546     treated like an error when any reason is specified:
547    
548     kil $port, custom_error => "don't like your steenking face";
549    
550     And a clean kill without any reason arguments:
551    
552     kil $port;
553    
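The difference between the two kinds of kill is visible in the arguments
passed to a monitoring callback: an empty argument list means a "normal"
kill, anything else is an error. Here is a minimal sketch of that check. The
helper name C<kill_and_report> and the reason C<disk full> are invented for
this illustration.

```perl
use AnyEvent;
use AnyEvent::MP;

# creates a port, kills it with the given reason and
# returns a description of what the monitor saw
sub kill_and_report {
   my @reason = @_;

   my $done = AnyEvent->condvar;
   my $port = port;

   mon $port, sub {
      my ($tag, @details) = @_;
      # no arguments at all means a "normal" kill
      $done->send (defined $tag ? "error ($tag: @details)" : "normal kill");
   };

   kil $port, @reason;

   $done->recv
}

print kill_and_report (), "\n";
print kill_and_report (custom_error => "disk full"), "\n";
```
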
554     By now you probably wonder what this "normal" kill business is: A common
555     idiom is to not specify a callback to C<mon>, but another port, such as
556     C<$SELF>:
557    
558     mon $port, $SELF;
559    
560     This basically means "monitor $port and kill me when it crashes". And a
561     "normal" kill does not count as a crash. This way you can easily link
562     ports together and make them crash together on errors (but allow you to
563     remove a port silently).
564    
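Linking also works between any two ports you hold the IDs of, not only with
C<$SELF>. The sketch below links a hypothetical C<$helper> port to a
C<$worker> port and then watches the helper to see the kill reason being
passed on. Both names and the reason C<crashed> are made up for this example.

```perl
use AnyEvent;
use AnyEvent::MP;

my $done   = AnyEvent->condvar;
my $worker = port;
my $helper = port;

# link: kill $helper whenever $worker is killed with a reason
mon $worker, $helper;

# observe what happens to $helper
mon $helper, sub {
   $done->send ("helper killed: @_");
};

# crash the worker, which should take the helper down with it
kil $worker, crashed => "boom";

my $report = $done->recv;
print "$report\n";
```

A plain C<kil $worker> without a reason would leave C<$helper> alone, which
is exactly the "remove a port silently" case.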
565 root 1.34 =head3 Port Context
566    
567     When code runs in an environment where C<$SELF> contains its own port ID
568     and exceptions will be caught, it is said to run in a port context.
569    
570     Since AnyEvent::MP is event-based, it is not uncommon to register
571     callbacks from C<rcv> handlers. As an example, assume that the port receive
572     handler wants to C<die> a second later, using C<after>:
573    
574     my $port = port {
575     after 1, sub { die "oops" };
576     };
577    
578     Then you will find it does not work - when the after callback is executed,
579     it does not run in port context anymore, so exceptions will not be caught.
580    
581 root 1.41 For these cases, AnyEvent::MP exports a special "closure constructor"
582     called C<psub>, which works just like perl's builtin C<sub>:
583 root 1.34
584     my $port = port {
585     after 1, psub { die "oops" };
586     };
587    
588     C<psub> stores C<$SELF> and returns a code reference. When the code
589     reference is invoked, it will run the code block within the context of
590     that port, so exception handling once more works as expected.
591    
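To see that the exception really ends up on the port again, the C<psub>
variant can be combined with a monitor. This is only a sketch: the short
delay of C<0.1> seconds and the condition variable are additions made for
the illustration.

```perl
use AnyEvent;
use AnyEvent::MP;

my $done = AnyEvent->condvar;

my $port = port {
   # psub remembers $SELF, so the die below kills this port
   after 0.1, psub { die "oops\n" };
};

# the monitor receives the kill reason, whose first element is the tag
mon $port, sub {
   my ($tag, $message) = @_;
   $done->send ($tag);
};

# any message triggers the receive callback
snd $port;

my $tag = $done->recv;
print "port was killed with reason tag: $tag\n";
```
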
592 root 1.41 There is also a way to temporarily execute code in the context of some
593     port, namely C<peval>:
594    
595     peval $port, sub {
596     # die'ing here will kil $port
597     };
598    
599     The C<peval> function temporarily replaces C<$SELF> by the given C<$port>
600     and then executes the given sub in a port context.
601    
602 root 1.30 =head3 Network Errors and the AEMP Guarantee
603    
604     I mentioned another important source of monitoring failures: network
605     problems. When a node loses connection to another node, it will invoke all
606     monitoring actions as if the port was killed, even if it is possible that
607 elmex 1.31 the port still lives happily on another node (not being able to talk to a
608 root 1.30 node means we have no clue what's going on with it, it could be crashed,
609     but also still running without knowing we lost the connection).
610    
611     So another way to view monitors is "notify me when some of my messages
612     couldn't be delivered". AEMP has a guarantee about message delivery to a
613     port: After starting a monitor, any message sent to a port will either
614     be delivered, or, when it is lost, any further messages will also be lost
615 elmex 1.31 until the monitoring action is invoked. After that, further messages
616 root 1.30 I<might> get delivered again.
617    
618     This doesn't sound like a very big guarantee, but it is kind of the best
619 elmex 1.31 you can get while staying sane: Specifically, it means that there will
620     be no "holes" in the message sequence: all messages sent are delivered
621 root 1.30 in order, without any missing in between, and when some were lost, you
622     I<will> be notified of that, so you can take recovery action.
623    
624     =head3 Supervising
625    
626     Ok, so how is this crashing-everything stuff going to make applications
627     I<more> stable? Well in fact, the goal is not really to make them more
628     stable, but to make them more resilient against actual errors and
629     crashes. And this is not done by crashing I<everything>, but by crashing
630     everything except a supervisor.
631    
632 elmex 1.31 A supervisor is simply some code that ensures that an application (or a
633 root 1.30 part of it) is running, and if it crashes, is restarted properly.
634    
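Before the full chat example, here is a deliberately tiny sketch of the idea,
using only C<port>, C<mon> and C<snd>. The worker here is hypothetical and
crashes on every message, and the limit of three crashes is an arbitrary
choice so that the sketch terminates. A real supervisor would more likely
delay restarts and keep going.

```perl
use AnyEvent;
use AnyEvent::MP;

my $done    = AnyEvent->condvar;
my $crashes = 0;

sub start_worker {
   # hypothetical worker: crashes on every message it receives
   my $worker = port { die "worker crashed\n" };

   # the supervisor part: restart the worker whenever it dies abnormally
   mon $worker, sub {
      return unless @_;          # "normal" kill: nothing to do

      if (++$crashes < 3) {
         my $replacement = start_worker ();
         snd $replacement;       # poke the new worker so it crashes again
      } else {
         $done->send ($crashes); # give up after three crashes
      }
   };

   $worker
}

my $first = start_worker ();
snd $first;

my $count = $done->recv;
print "gave up after $count crashes\n";
```
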
635     To show how to do all this we will create a simple chat server that can
636     handle many chat clients. Both server and clients can be killed and
637     restarted, and even crash, to some extent.
638    
639     =head2 Chatting, the Resilient Way
640    
641     Without further ado, here is the chat server (to run it, we assume the
642     set-up explained earlier, with a separate F<aemp run> seed node):
643    
644     use common::sense;
645     use AnyEvent::MP;
646     use AnyEvent::MP::Global;
647    
648     configure;
649    
650     my %clients;
651    
652     sub msg {
653     print "relaying: $_[0]\n";
654     snd $_, $_[0]
655     for values %clients;
656     }
657    
658     our $server = port;
659    
660     rcv $server, join => sub {
661     my ($client, $nick) = @_;
662    
663     $clients{$client} = $client;
664    
665     mon $client, sub {
666     delete $clients{$client};
667     msg "$nick (quits, @_)";
668     };
669     msg "$nick (joins)";
670     };
671    
672     rcv $server, privmsg => sub {
673     my ($nick, $msg) = @_;
674     msg "$nick: $msg";
675     };
676    
677 root 1.40 grp_reg eg_chat_server => $server;
678 root 1.30
679     warn "server ready.\n";
680    
681     AnyEvent->condvar->recv;
682    
683 elmex 1.31 Looks like a lot, but it is actually quite simple: after your usual
684 root 1.30 preamble (this time we use common sense), we define a helper function that
685     sends some message to every registered chat client:
686    
687     sub msg {
688     print "relaying: $_[0]\n";
689     snd $_, $_[0]
690     for values %clients;
691     }
692    
693     The clients are stored in the hash C<%clients>. Then we define a server
694     port and install two receivers on it, C<join>, which is sent by clients
695     to join the chat, and C<privmsg>, that clients use to send actual chat
696     messages.
697    
698     C<join> is the most complicated. It expects the client port and the nickname
699     to be passed in the message, and registers the client in C<%clients>.
700    
701     rcv $server, join => sub {
702     my ($client, $nick) = @_;
703    
704     $clients{$client} = $client;
705    
706     The next step is to monitor the client. The monitoring action removes the
707     client and sends a quit message with the error to all remaining clients.
708    
709     mon $client, sub {
710     delete $clients{$client};
711     msg "$nick (quits, @_)";
712     };
713    
714     And finally, it creates a join message and sends it to all clients.
715    
716     msg "$nick (joins)";
717     };
718    
719     The C<privmsg> callback simply broadcasts the message to all clients:
720    
721     rcv $server, privmsg => sub {
722     my ($nick, $msg) = @_;
723     msg "$nick: $msg";
724     };
725    
726 elmex 1.31 And finally, the server registers itself in the server group, so that
727 root 1.30 clients can find it:
728    
729 root 1.40 grp_reg eg_chat_server => $server;
730 root 1.30
731     Well, well... and where is this supervisor stuff? Well... we cheated,
732     it's not there. To not overcomplicate the example, we only put it into
733     the..... CLIENT!
734    
735     =head3 The Client, and a Supervisor!
736    
737     Again, here is the client, including supervisor, which makes it a bit
738     longer:
739    
740     use common::sense;
741     use AnyEvent::MP;
742     use AnyEvent::MP::Global;
743    
744     my $nick = shift;
745    
746     configure;
747    
748     my ($client, $server);
749    
750     sub server_connect {
751 root 1.40 my $servernodes = grp_get "eg_chat_server"
752 root 1.30 or return after 1, \&server_connect;
753    
754     print "\rconnecting...\n";
755    
756     $client = port { print "\r \r@_\n> " };
757     mon $client, sub {
758     print "\rdisconnected @_\n";
759     &server_connect;
760     };
761    
762     $server = $servernodes->[0];
763     snd $server, join => $client, $nick;
764     mon $server, $client;
765     }
766    
767     server_connect;
768    
769 root 1.34 my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub {
770 root 1.30 chomp (my $line = <STDIN>);
771     print "> ";
772     snd $server, privmsg => $nick, $line
773     if $server;
774     });
775    
776     $| = 1;
777     print "> ";
778     AnyEvent->condvar->recv;
779    
780     The first thing the client does is to store the nick name (which is
781     expected as the only command line argument) in C<$nick>, for further
782     usage.
783    
784     The next relevant thing is... finally... the supervisor:
785    
786     sub server_connect {
787 root 1.40 my $servernodes = grp_get "eg_chat_server"
788 root 1.30 or return after 1, \&server_connect;
789    
790     This looks up the server in the C<eg_chat_server> global group. If it
791     cannot find it (which is likely when the node is just starting up),
792     it will wait a second and then retry. This "wait a bit and retry"
793     is an important pattern, as distributed programming means lots of
794     things are going on asynchronously. In practice, one should use a more
795     intelligent algorithm, for example one that warns after an excessive number
796     of retries. Hopefully future versions of AnyEvent::MP will offer some
797     predefined supervisors; for now you will have to code them on your own.
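
A slightly more intelligent retry policy could be sketched as follows. This is only an illustration: C<make_backoff> and its parameters (C<min>, C<max>, C<warn_after>) are hypothetical names, not part of AnyEvent::MP, and the way it plugs into C<server_connect> is shown in comments only and is untested.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AnyEvent::MP: counts consecutive
# failures and hands out exponentially growing retry delays.
sub make_backoff {
   my (%arg) = @_;

   my $min        = $arg{min}        || 1;    # first delay, in seconds
   my $max        = $arg{max}        || 30;   # upper bound for the delay
   my $warn_after = $arg{warn_after} || 5;    # warn from this many failures on

   my $failures = 0;

   my %api = (
      # call after a failed attempt: returns the number of seconds to wait
      next => sub {
         my $delay = $min * 2 ** $failures;
         $delay = $max if $delay > $max;

         ++$failures;
         warn "still no server after $failures attempts\n"
            if $failures >= $warn_after;

         $delay
      },
      # call after a successful attempt: start over with the shortest delay
      reset => sub { $failures = 0 },
   );

   return \%api;
}

my $backoff = make_backoff (min => 1, max => 8);

# In the client, server_connect might then start like this (assumption):
#
#    my $servernodes = grp_get "eg_chat_server"
#       or return after $backoff->{next}->(), \&server_connect;
#    $backoff->{reset}->();
```

Keeping the policy in a closure separates the "how long to wait" decision from the message passing code, so it can be swapped out without touching the supervisor itself.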
798    
799     Next it creates a local port for the server to send messages to, and
800     monitors it. When the port is killed, it will print "disconnected" and
801     tell the supervisor function to retry again.
802    
803     $client = port { print "\r \r@_\n> " };
804     mon $client, sub {
805     print "\rdisconnected @_\n";
806     &server_connect;
807     };
808    
809     Then everything is ready: the client will send a C<join> message with its
810     local port to the server, and start monitoring it:
811    
812     $server = $servernodes->[0];
813     snd $server, join => $client, $nick;
814     mon $server, $client;
815     }
816    
817     The monitor will ensure that if the server crashes or goes away, the
818     client port will be killed as well. This in turn tells the user that the
819     client was disconnected, and then tries to connect to the server again.
820    
821     The rest of the program deals with the boring details of actually invoking
822     the supervisor function to start the whole client process and handle the
823     actual terminal input, sending it to the server.
824    
825 elmex 1.31 You should now try to start the server and one or more clients in different
826 root 1.30 terminal windows (and the seed node):
827    
828     perl eg/chat_client nick1
829     perl eg/chat_client nick2
830     perl eg/chat_server
831     aemp run profile seed
832    
833     And then you can experiment with chatting, killing one or more clients, or
834     stopping and restarting the server, to see the monitoring in action.
835    
836 root 1.33 The crucial point you should understand from this example is that
837     monitoring is usually symmetric: when you monitor some other port,
838     potentially on another node, that other port usually should monitor you,
839     too, so when the connection dies, both ports get killed, or at least both
840     sides can take corrective action. Exceptions are "servers" that serve
841     multiple clients at once and might only wish to clean up, and supervisors,
842     who of course should not normally get killed (unless they, too, have a
843     supervisor).
844    
845     If you often think in object-oriented terms, then treat a port as an
846     object, C<port> is the constructor, the receive callbacks set by C<rcv>
847     act as methods, the C<kil> function becomes the explicit destructor and
848     C<mon> installs a destructor hook. Unlike conventional object oriented
849     programming, it can make sense to exchange ports more freely (for example,
850     to monitor one port from another).
851    
852 root 1.30 There is ample room for improvement: the server should probably remember
853     the nickname in the C<join> handler instead of expecting it in every chat
854     message, it should probably monitor itself, and the client should not try
855     to send any messages unless a server is actually connected.
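
The first of these improvements could be sketched like this. The helper names (C<client_join>, C<client_quit>, C<client_says>) and the C<%nick_of> hash are made up for this illustration, and the sketch assumes that clients would send their own port instead of their nickname with each C<privmsg>. The bookkeeping is kept free of any message passing, so the receive callbacks only have to call it.

```perl
use strict;
use warnings;

my %nick_of;   # client port ID => nickname, filled in by the join handler

# to be called from the join handler, returns the text to broadcast
sub client_join {
   my ($client, $nick) = @_;
   $nick_of{$client} = $nick;
   "$nick (joins)"
}

# to be called from the mon callback, forgets the client
sub client_quit {
   my ($client, @reason) = @_;
   my $nick = delete $nick_of{$client};
   defined $nick ? "$nick (quits, @reason)" : undef
}

# to be called from the privmsg handler; unknown clients yield undef
sub client_says {
   my ($client, $text) = @_;
   my $nick = $nick_of{$client};
   defined $nick ? "$nick: $text" : undef
}

# The privmsg receiver might then shrink to something like this
# (assumes clients now send "privmsg => $client, $line"):
#
#    rcv $server, privmsg => sub {
#       my $line = client_says @_;
#       msg $line if defined $line;
#    };
```

Since port IDs are plain strings, they can be used directly as hash keys, just as the server above already does with C<%clients>.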
856    
857     =head1 PART 3: TIMTOWTDI: Virtual Connections
858    
859 root 1.34 The chat system developed in the previous sections is very "traditional"
860     in a way: you start some server(s) and some clients statically and they
861     start talking to each other.
862    
863     Sometimes applications work more like "services": They can run on almost
864     any node and talk to themselves on other nodes. The L<AnyEvent::MP::Global>
865     service for example monitors nodes joining the network and starts itself
866     automatically on other nodes (if it isn't running already).
867    
868     A good way to design such applications is to put them into a module and
869     create "virtual connections" to other nodes - we call this the "bridge
870     head" method, because you start by creating a remote port (the bridge
871     head) and from that you start to bootstrap your application.
872    
873     Since that sounds rather theoretical, let's redesign the chat server and
874     client using this design method.
875    
876     Here is the server:
877    
878     use common::sense;
879     use AnyEvent::MP;
880     use AnyEvent::MP::Global;
881    
882     configure;
883    
884 root 1.40 grp_reg eg_chat_server2 => $NODE;
885 root 1.34
886     my %clients;
887    
888     sub msg {
889     print "relaying: $_[0]\n";
890     snd $_, $_[0]
891     for values %clients;
892     }
893    
894     sub client_connect {
895     my ($client, $nick) = @_;
896    
897     mon $client;
898     mon $client, sub {
899     delete $clients{$client};
900     msg "$nick (quits, @_)";
901     };
902    
903     $clients{$client} = $client;
904    
905     msg "$nick (joins)";
906    
907     rcv $SELF, sub { msg "$nick: $_[0]" };
908     }
909    
910     warn "server ready.\n";
911    
912     AnyEvent->condvar->recv;
913    
914 root 1.39 It starts out not much different than the previous example, except that
915     this time, we register the node port in the global group and not any port
916     we created - the clients only want to know which node the server should be
917     running on. In fact, they could also use some kind of election mechanism,
918     to find the node with lowest load or something like that.
919    
920     The more interesting change is that indeed no server port is created -
921     the server consists only of code, and "does" nothing by itself. All it
922     does is define a function C<client_connect>, which expects a client port
923     and a nick name as arguments. It then monitors the client port and binds
924     a receive callback on C<$SELF>, which expects messages that in turn are
925     broadcast to all clients.
926 root 1.34
927     The two C<mon> calls are a bit tricky - the first C<mon> is a shorthand
928     for C<mon $client, $SELF>. The second does the normal "client has gone
929     away" clean-up action. Both could actually be rolled into one C<mon>
930     action.
931    
932 root 1.39 C<$SELF> is a good hint that something interesting is going on. And
933     indeed, when looking at the client code, there is a new function,
934     C<spawn>:
935 root 1.34
936     use common::sense;
937     use AnyEvent::MP;
938     use AnyEvent::MP::Global;
939    
940     my $nick = shift;
941    
942     configure;
943    
944     $| = 1;
945    
946     my $port = port;
947    
948     my ($client, $server);
949    
950     sub server_connect {
951 root 1.40 my $servernodes = grp_get "eg_chat_server2"
952 root 1.34 or return after 1, \&server_connect;
953    
954     print "\rconnecting...\n";
955    
956     $client = port { print "\r \r@_\n> " };
957     mon $client, sub {
958     print "\rdisconnected @_\n";
959     &server_connect;
960     };
961    
962     $server = spawn $servernodes->[0], "::client_connect", $client, $nick;
963     mon $server, $client;
964     }
965    
966     server_connect;
967    
968     my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub {
969     chomp (my $line = <STDIN>);
970     print "> ";
971     snd $server, $line
972     if $server;
973     });
974    
975     print "> ";
976     AnyEvent->condvar->recv;
977    
978     The client is quite similar to the previous one, but instead of contacting
979 root 1.39 the server I<port> (which no longer exists), it C<spawn>s (creates) a new
980     server I<port on the server node>:
981 root 1.34
982     $server = spawn $servernodes->[0], "::client_connect", $client, $nick;
983     mon $server, $client;
984    
985 root 1.39 And of course the first thing after creating it is monitoring it.
986 root 1.34
987 root 1.39 The C<spawn> function creates a new port on a remote node and returns
988     its port ID. After creating the port it calls a function on the remote
989     node, passing any remaining arguments to it, and - most importantly -
990     executes the function within the context of the new port, so it can be
991     manipulated by referring to C<$SELF>. The init function can reside in a
992     module (actually it normally I<should> reside in a module) - AnyEvent::MP
993     will automatically load the module if the function isn't defined.
994    
995     The C<spawn> function returns immediately, which means you can instantly
996 root 1.34 send messages to the port, long before the remote node has even heard
997     of our request to create a port on it. In fact, the remote node might
998     not even be running. Despite these troubling facts, everything should
999     work just fine: if the node isn't running (or the init function throws an
1000     exception), then the monitor will trigger because the port doesn't exist.
1001    
1002     If the spawn message gets delivered, but the monitoring message is not
1003 root 1.39 because of network problems (extremely unlikely, but monitoring, after
1004     all, is implemented by passing a message, and messages can get lost), then
1005     this connection loss will eventually trigger the monitoring action. On the
1006     remote node (which in return monitors the client) the port will also be
1007     cleaned up on connection loss. When the remote node comes up again and our
1008     monitoring message can be delivered, it will instantly fail because the
1009     port has been cleaned up in the meantime.
1010 root 1.34
1011     If your head is spinning by now, that's fine - just keep in mind, after
1012 root 1.39 creating a port, monitor it on the local node, and monitor "the other
1013     side" from the remote node, and all will be cleaned up just fine.
1014 root 1.34
1015 root 1.36 =head2 Services
1016 root 1.34
1017 root 1.36 Above it was mentioned that C<spawn> automatically loads modules, and this
1018     can be exploited in various ways.
1019    
1020     Assume for a moment you put the server into a file called
1021     F<mymod/chatserver.pm> reachable from the current directory. Then you
1022     could run a node there with:
1023    
1024     aemp run
1025    
1026     The other nodes could C<spawn> the server by using
1027     C<mymod::chatserver::client_connect> as init function.
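
As a rough sketch, F<mymod/chatserver.pm> could simply be the server code from above wrapped in a package. Nothing here is new API; the code is only moved, and how the clients learn which node to spawn on (for example via a global group, as before) is deliberately left open.

```perl
# mymod/chatserver.pm - sketch only: the server code from above, in a package
package mymod::chatserver;

use common::sense;
use AnyEvent::MP;

my %clients;

# relay a line of text to every connected client
sub msg {
   print "relaying: $_[0]\n";
   snd $_, $_[0]
      for values %clients;
}

# init function: runs on this node, inside the freshly spawned port ($SELF)
sub client_connect {
   my ($client, $nick) = @_;

   mon $client;                # shorthand for mon $client, $SELF
   mon $client, sub {
      delete $clients{$client};
      msg "$nick (quits, @_)";
   };

   $clients{$client} = $client;

   msg "$nick (joins)";

   rcv $SELF, sub { msg "$nick: $_[0]" };
}

1;
```

On the client side, only the name of the init function passed to C<spawn> would change, from C<::client_connect> to C<mymod::chatserver::client_connect>.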
1028    
1029     Likewise, when you have some service that starts automatically (similar to
1030     AnyEvent::MP::Global), then you can configure this service statically:
1031    
1032     aemp profile mysrvnode services mymod::service::
1033     aemp run profile mysrvnode
1034    
1035 root 1.39 And the module will automatically be loaded in the node, as specifying a
1036 root 1.38 module name (with C<::>-suffix) will simply load the module, which is then
1037     free to do whatever it wants.
1038 root 1.36
1039     Of course, you can also do it in the much more standard way by writing
1040     a module (e.g. C<BK::Backend::IRC>), installing it as part of a module
1041     distribution and then configuring nodes. For example, if I want to run the
1042     Bummskraut IRC backend on a machine named "ruth", I could do this:
1043    
1044     aemp profile ruth addservice BK::Backend::IRC::
1045    
1046 root 1.39 And any F<aemp run> on that host will automatically have the Bummskraut
1047 root 1.36 IRC backend running.
1048    
1049     That's plenty of possibilities you can use - it's all up to you how you
1050     structure your application.
1051 elmex 1.7
1052 root 1.37 =head1 THE END
1053    
1054     This is the end of this introduction, but hopefully not the end of
1055     your career as an AEMP user. I hope the tutorial was enough to make the
1056     basic concepts clear. Keep in mind that distributed programming is not
1057     completely trivial, that AnyEvent::MP is still in its infancy, and I hope
1058     it will be useful to create exciting new applications.
1059    
1060 elmex 1.1 =head1 SEE ALSO
1061    
1062     L<AnyEvent::MP>
1063    
1064 elmex 1.20 L<AnyEvent::MP::Global>
1065    
1066 root 1.34 L<AnyEvent>
1067    
1068 elmex 1.1 =head1 AUTHOR
1069    
1070     Robin Redeker <elmex@ta-sa.org>
1071 root 1.32 Marc Lehmann <schmorp@schmorp.de>
1072 root 1.4