/cvs/AnyEvent-MP/MP/Intro.pod
Revision: 1.48
Committed: Tue Mar 6 13:33:53 2012 UTC (12 years, 3 months ago) by root
Branch: MAIN
Changes since 1.47: +25 -23 lines

File Contents

# User Rev Content
1 root 1.4 =head1 Message Passing for the Non-Blocked Mind
2 elmex 1.1
3 root 1.8 =head1 Introduction and Terminology
4 elmex 1.1
5 root 1.4 This is a tutorial about how to get the swing of the new L<AnyEvent::MP>
6 root 1.23 module, which allows programs to transparently pass messages within the
7     process and to other processes on the same or a different host.
8 elmex 1.1
9 root 1.23 What kind of messages? Basically a message here means a list of Perl
10 root 1.15 strings, numbers, hashes and arrays, anything that can be expressed as a
11 root 1.43 L<JSON> text (as JSON is the default serialiser in the protocol). Here are
12     two examples:
13 elmex 1.1
14 root 1.23 write_log => 1251555874, "action was successful.\n"
15     123, ["a", "b", "c"], { foo => "bar" }
16 elmex 1.21
17 root 1.23 When using L<AnyEvent::MP> it is customary to use a descriptive string as
18 root 1.46 first element of a message that indicates the type of the message. This
19 root 1.23 element is called a I<tag> in L<AnyEvent::MP>, as some API functions
20     (C<rcv>) support matching it directly.
21    
Suppose you want to send a ping message with your current time to
somewhere - this is what such a message might look like (in Perl syntax):
24    
25     ping => 1251381636
26    
27     Now that we know what a message is, to which entities are those
28     messages being I<passed>? They are I<passed> to I<ports>. A I<port> is
29     a destination for messages but also a context to execute code: when
30     a runtime error occurs while executing code belonging to a port, the
31     exception will be raised on the port and can even travel to interested
32     parties on other nodes, which makes supervision of distributed processes
33     easy.
34    
35     How do these ports relate to things you know? Each I<port> belongs
36     to a I<node>, and a I<node> is just the UNIX process that runs your
37     L<AnyEvent::MP> application.
38    
39     Each I<node> is distinguished from other I<nodes> running on the same or
40     another host in a network by its I<node ID>. A I<node ID> is simply a
41     unique string chosen manually or assigned by L<AnyEvent::MP> in some way
42     (UNIX nodename, random string...).
43    
44     Here is a diagram about how I<nodes>, I<ports> and UNIX processes relate
45     to each other. The setup consists of two nodes (more are of course
46     possible): Node C<A> (in UNIX process 7066) with the ports C<ABC> and
47     C<DEF>. And the node C<B> (in UNIX process 8321) with the ports C<FOO> and
48     C<BAR>.
49 elmex 1.17
50    
51     |- PID: 7066 -| |- PID: 8321 -|
52     | | | |
53     | Node ID: A | | Node ID: B |
54     | | | |
55     | Port ABC =|= <----\ /-----> =|= Port FOO |
56     | | X | |
57     | Port DEF =|= <----/ \-----> =|= Port BAR |
58     | | | |
59     |-------------| |-------------|
60    
61 root 1.23 The strings for the I<port IDs> here are just for illustrative
62     purposes: Even though I<ports> in L<AnyEvent::MP> are also identified by
63 root 1.43 strings, they can't be chosen manually and are assigned by the system
64 root 1.23 dynamically. These I<port IDs> are unique within a network and can also be
65 root 1.46 used to identify senders, or even as message tags for instance.
66 root 1.23
67     The next sections will explain the API of L<AnyEvent::MP> by going through
68     a few simple examples. Later some more complex idioms are introduced,
69     which are hopefully useful to solve some real world problems.
70 root 1.8
71 root 1.39 =head2 Passing Your First Message
72 elmex 1.16
73 root 1.46 For starters, let's have a look at the messaging API. The following
74     example is just a demo to show the basic elements of message passing with
75 root 1.24 L<AnyEvent::MP>.
76    
77     The example should print: C<Ending with: 123>, in a rather complicated
78     way, by passing some message to a port.
79 elmex 1.16
80     use AnyEvent;
81     use AnyEvent::MP;
82    
83     my $end_cv = AnyEvent->condvar;
84    
85     my $port = port;
86    
87     rcv $port, test => sub {
88     my ($data) = @_;
89     $end_cv->send ($data);
90     };
91    
92     snd $port, test => 123;
93    
94     print "Ending with: " . $end_cv->recv . "\n";
95    
96 root 1.24 It already uses most of the essential functions inside
97 root 1.46 L<AnyEvent::MP>: First there is the C<port> function which creates a
I<port> and will return its I<port ID>, a simple string.
99    
100     This I<port ID> can be used to send messages to the port and install
101     handlers to receive messages on the port. Since it is a simple string
102     it can be safely passed to other I<nodes> in the network when you want
103     to refer to that specific port (usually used for RPC, where you need
104     to tell the other end which I<port> to send the reply to - messages in
105     L<AnyEvent::MP> have a destination, but no source).
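
As a small, hedged sketch of that request/reply pattern (C<$some_port> is
hypothetical here and stands for a I<port ID> you obtained in some other
way), the sender simply includes its own port ID in the message, so the
receiving side knows where to send the reply:

    my $reply_port = port;

    rcv $reply_port, pong => sub {
        print "got pong with timestamp $_[0]\n";
    };

    # messages carry no source, so we include our own port ID explicitly
    snd $some_port, ping => time, $reply_port;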
106 elmex 1.17
107 root 1.24 The next function is C<rcv>:
108 elmex 1.16
109 elmex 1.17 rcv $port, test => sub { ... };
110 elmex 1.16
It installs a receiver callback on the I<port> specified as the first
112     argument (it only works for "local" ports, i.e. ports created on the same
113     node). The next argument, in this example C<test>, specifies a I<tag> to
114     match. This means that whenever a message with the first element being
115     the string C<test> is received, the callback is called with the remaining
116 elmex 1.17 parts of that message.
117    
118 root 1.24 Messages can be sent with the C<snd> function, which is used like this in
119     the example above:
120 elmex 1.17
121     snd $port, test => 123;
122    
123 root 1.24 This will send the message C<'test', 123> to the I<port> with the I<port
124     ID> stored in C<$port>. Since in this case the receiver has a I<tag> match
125     on C<test> it will call the callback with the first argument being the
126     number C<123>.
127    
The callback is a typical AnyEvent idiom: it just passes that number on
to the I<condition variable> C<$end_cv>, which will then hand the value
to the C<print>. Condition variables are outside the scope of this
tutorial and not often used with ports, so please consult
L<AnyEvent::Intro> about them.
133    
134 root 1.24 Passing messages inside just one process is boring. Before we can move on
135     and do interprocess message passing we first have to make sure some things
136     have been set up correctly for our nodes to talk to each other.
137 elmex 1.17
138 root 1.39 =head2 System Requirements and System Setup
139 elmex 1.17
140 root 1.25 Before we can start with real IPC we have to make sure some things work on
141     your system.
142 elmex 1.17
First we have to set up a I<shared secret>: for two L<AnyEvent::MP>
I<nodes> to be able to communicate with each other over the network it is
necessary to set up the same I<shared secret> for both of them, so they
can prove their trustworthiness to each other.
147 elmex 1.17
The easiest way to set this up is to use the F<aemp> utility:
149    
150     aemp gensecret
151    
152 root 1.25 This creates a F<$HOME/.perl-anyevent-mp> config file and generates a
153     random shared secret. You can copy this file to any other system and
154     then communicate over the network (via TCP) with it. You can also select
155     your own shared secret (F<aemp setsecret>) and for increased security
156     requirements you can even create (or configure) a TLS certificate (F<aemp
157     gencert>), causing connections to not just be securely authenticated, but
158     also to be encrypted and protected against tinkering.
159    
160     Connections will only be successfully established when the I<nodes>
161     that want to connect to each other have the same I<shared secret> (or
162     successfully verify the TLS certificate of the other side, in which case
163     no shared secret is required).
164 elmex 1.17
165     B<If something does not work as expected, and for example tcpdump shows
166     that the connections are closed almost immediately, you should make sure
167     that F<~/.perl-anyevent-mp> is the same on all hosts/user accounts that
168     you try to connect with each other!>
169 elmex 1.16
That is all for now - you will find some more advanced fiddling with the
171     C<aemp> utility later.
172    
173 root 1.35 =head2 Shooting the Trouble
174    
175     Sometimes things go wrong, and AnyEvent::MP, being a professional module,
176 root 1.43 does not gratuitously spill out messages to your screen.
177 root 1.35
To help troubleshoot any issues, there are two environment variables
that you can set. The first, C<PERL_ANYEVENT_MP_WARNLEVEL>, sets the
180     logging level. The default is C<5>, which means nothing much is
181 root 1.43 printed. You can increase it to C<8> or C<9> to get more verbose
182 root 1.35 output. This is example output when starting a node:
183    
184 root 1.46 2012-03-04 19:41:10 <8> node cerebro starting up.
185     2012-03-04 19:41:10 <8> node listens on [10.0.0.1:4040].
186     2012-03-04 19:41:10 <9> trying connect to seed node 10.0.0.19:4040.
187     2012-03-04 19:41:10 <9> 10.0.0.19:4040 connected as rain
188     2012-03-04 19:41:10 <7> rain is up ()
189 root 1.35
190     A lot of info, but at least you can see that it does something.
191    
192     The other environment variable that can be useful is
193     C<PERL_ANYEVENT_MP_TRACE>, which, when set to a true value, will cause
194 root 1.46 most messages that are sent or received to be printed. For example, F<aemp
195     restart rijk> might output these message exchanges:
196 root 1.35
197 root 1.46 SND rijk <- [null,"eval","AnyEvent::Watchdog::Util::restart; ()","aemp/cerebro/z4kUPp2JT4#b"]
198     SND rain <- [null,"g_slave",{"'l":{"aemp/cerebro/z4kUPp2JT4":["10.0.0.1:48168"]}}]
199     SND rain <- [null,"g_find","rijk"]
200     RCV rain -> ["","g_found","rijk",["10.0.0.23:4040"]]
201     RCV rijk -> ["b",""]
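
Both of these are ordinary environment variables, so a simple way to
enable them for a single run is from the shell - the program name below is
just a placeholder for whatever node you want to debug:

    PERL_ANYEVENT_MP_WARNLEVEL=9 PERL_ANYEVENT_MP_TRACE=1 perl myprog.pl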
202 elmex 1.18
203 root 1.30 =head1 PART 1: Passing Messages Between Processes
204 elmex 1.18
205     =head2 The Receiver
206    
Let's split the previous example up into two programs: one that contains
208     the sender and one for the receiver. First the receiver application, in
209     full:
210 elmex 1.18
211     use AnyEvent;
212     use AnyEvent::MP;
213    
214 root 1.45 configure nodeid => "eg_receiver/%u", binds => ["*:4040"];
215 elmex 1.18
216     my $port = port;
217 root 1.47 db_set eg_receivers => $port;
218 elmex 1.18
219     rcv $port, test => sub {
220     my ($data, $reply_port) = @_;
221    
222     print "Received data: " . $data . "\n";
223     };
224    
225     AnyEvent->condvar->recv;
226    
228    
229 root 1.43 Now, that wasn't too bad, was it? OK, let's step through the new functions
230 root 1.47 that have been used.
231 elmex 1.18
232 root 1.44 =head3 C<configure> and Joining and Maintaining the Network
233 elmex 1.18
234 root 1.47 First let's have a look at C<configure>:
235 elmex 1.18
236 root 1.47 configure nodeid => "eg_receiver/%u", binds => ["*:4040"];
237 elmex 1.18
238     Before we are able to send messages to other nodes we have to initialise
ourselves to become a "distributed node". Initialising a node means naming
240 root 1.47 the node and binding some TCP listeners so that other nodes can
241     contact it.
242    
243     Additionally, to actually link all nodes in a network together, you can
244     specify a number of seed addresses, which will be used by the node to
245     connect itself into an existing network, as we will see shortly.
246 root 1.26
247 root 1.28 All of this (and more) can be passed to the C<configure> function - later
248     we will see how we can do all this without even passing anything to
249     C<configure>!
250    
The first parameter, C<nodeid>, specifies the node ID (in this case
C<eg_receiver/%u>). The default is to use the node name of the current
host plus C</%u>, which gives the node a name with a random suffix to
make it unique, but for this example we want the node to have a bit more
personality, and name it C<eg_receiver> with a random suffix.
256    
257     Why the random suffix? Node IDs need to be unique within the network and
258     appending a random suffix is the easiest way to do that.
259 root 1.28
260     The second parameter, C<binds>, specifies a list of C<address:port> pairs
261     to bind TCP listeners on. The special "address" of C<*> means to bind on
262 root 1.47 every local IP address (this might not work on every OS, so explicit IP
263     addresses are best).
264 root 1.28
265     The reason to bind on a TCP port is not just that other nodes can connect
266     to us: if no binds are specified, the node will still bind on a dynamic
267     port on all local addresses - but in this case we won't know the port, and
268     cannot tell other nodes to connect to it as seed node.
269    
270 root 1.47 Now, a I<seed> is simply the TCP address of some other node in the
271     network, often the same string as used for the C<binds> parameter of the
272     other node. The need for seeds is easy to explain: I<somehow> the nodes
273     of an aemp network have to find each other, and often this means over the
274     internet. So broadcasts are out.
275    
276     Instead, a node usually specifies the addresses of a few (for redundancy)
277     other nodes, some of which should be up. Two nodes can set each other as
278     seeds without any issues. You could even specify all nodes as seeds for
279     all nodes, for total redundancy. But the common case is to have some more
280     or less central, stable servers running seed services for other nodes.
281    
282     All you need to do to ensure that an AnyEvent::MP network connects
283     together is to make sure that all connections from nodes to their seed
284     nodes I<somehow> span the whole network. The simplest way to do that would
285     be for all nodes to specify a single node as seed node, and you would get
286     a star topology. If you specify all nodes as seed nodes, you get a fully
287     meshed network (that's what previous releases of AnyEvent::MP actually
288     did).
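
For illustration, here is roughly what a node with two redundant seeds
might pass to C<configure> - the addresses are made up and would in
practice be the C<binds> addresses of your real seed nodes:

    configure nodeid => "eg_receiver/%u",
              binds  => ["*:4040"],
              seeds  => ["10.0.0.1:4040", "10.0.0.2:4040"];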
289    
A node tries to keep connections open to all of its seed nodes at all
291     times, while other connections are made on demand only.
292    
293     All of this ensures that the network stays one network - even if all the
294     nodes in one half of the net are separated from the nodes in the other
295     half by some network problem, once that is over, they will eventually
296     become a single network again.
297    
298     In addition to creating the network, a node also expects the seed nodes to
299     run the shared database service - if need be, by automatically starting it,
300     so you don't normally need to configure this explicitly.
301    
303     The process of joining a network takes time, during which the node
304     is already running. This means it takes time until the node is
fully connected, and information about services in the network is
306     available. This is why most AnyEvent::MP programs start by waiting a while
307     until the information they need is available.
308    
309     We will see how this is done later, in the sender program.
310 elmex 1.19
311 root 1.28 =head3 Registering the Receiver
312 elmex 1.19
313 root 1.47 Coming back to our example, after the node has been configured for network
314     access, it is time to publish some service, namely the receive service.
315 elmex 1.19
316 root 1.47 For that, let's look at the next lines:
317 elmex 1.19
318     my $port = port;
319 root 1.47 db_set eg_receivers => $port;
320 elmex 1.19
321 root 1.27 The C<port> function has already been discussed. It simply creates a new
322 root 1.47 I<port> and returns the I<port ID>. The C<db_reg> function, however, is
323     new: The first argument is the name of a I<database family> and the second
324     argument is the name of a I<subkey> within that family. The third argument
325     would be the I<value> to be associated with the family and subkey, but,
326     since it is missing, it will simply be C<undef>.
327    
OK, what's this weird talk about families, you wonder? AnyEvent::MP comes
329 root 1.47 with a distributed database. This database runs on so-called "global"
330     nodes, which usually are the seed nodes of your network. The database
331     structure is "simply" a hash of hashes of values.
332    
333     In other words, if the database were stored in C<%DB>, then the C<db_set>
334     function more or less would do this:
335    
336     $DB{eg_receivers}{$port} = undef;
337    
338     So the ominous "family" selects a hash in the database, and the "subkey"
339     is simply the key in this hash. And C<db_set> very much works like an
340     assignment.
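
The value argument is optional - if you do want to store something, simply
pass it as the third argument (the description string below is made up,
purely for illustration):

    # roughly equivalent to $DB{eg_receivers}{$port} = "receiver on $NODE"
    db_set eg_receivers => $port => "receiver on $NODE";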
341    
342     The family namespace is shared by all nodes in a network, so the names
343     should be reasonably unique, for example, they could start with the name
344 root 1.48 of your module, or the name of the program, using your port name or node
345     name as subkey.
346 root 1.27
347 root 1.47 The purpose behind adding this key to the database is that the sender can
348     look it up and find our port. We will shortly see how.
349 root 1.27
350     The last step in the example is to set up a receiver callback for those
351     messages, just as was discussed in the first example. We again match
352     for the tag C<test>. The difference is that this time we don't exit the
353     application after receiving the first message. Instead we continue to wait
354     for new messages indefinitely.
355 elmex 1.19
356 elmex 1.20 =head2 The Sender
357 root 1.8
358 root 1.48 OK, now let's take a look at the sender code:
359 root 1.4
360 elmex 1.1 use AnyEvent;
361     use AnyEvent::MP;
362    
363 root 1.45 configure nodeid => "eg_sender/%u", seeds => ["*:4040"];
364 elmex 1.1
365 root 1.47 my $guard = db_mon eg_receivers => sub {
366     my ($family, $keys) = @_;
367     return unless %$family;
368    
369     # now there are some receivers, send them a message
370     snd $_ => test => time, keys %$family
371     for keys %$family;
372     };
373 elmex 1.1
374     AnyEvent->condvar->recv;
375    
376 root 1.28 It's even less code. The C<configure> serves the same purpose as in the
377 root 1.48 receiver, but instead of specifying binds we specify a list of seeds - the
378     only seed happens to be the same as the bind used by the receiver, which
379 root 1.47 therefore becomes our seed node.
380 root 1.27
381 root 1.48 Remember the part about having to wait till things become available? Well,
382     after configure returns, nothing has been done yet - the node is not
383     connected to the network, knows nothing about the database contents, and
384     it can take ages (for a computer :) for this situation to change.
385 root 1.47
386     Therefore, the sender waits, in this case by using the C<db_mon>
387     function. This function registers an interest in a specific database
388 root 1.48 family (in this case C<eg_receivers>). Each time something inside the
389     family changes (a key is added, changed or deleted), it will call our
390     callback with the family hash as first argument, and the list of keys as
391     second argument.
392    
The callback only checks whether the C<%$family> hash is empty - if it is,
then it doesn't do anything. But eventually the family will contain the
port subkey we set in the receiver. Then it will send a message to it (and
396     any other receiver in the same family). Likewise, should the receiver go
397     away and come back, or should another receiver come up, it will again send
398     a message to all of them.
399 root 1.47
400     You can experiment by having multiple receivers - you have to change the
401     "binds" parameter in the receiver to the seeds used in the sender to start
402     up additional receivers, but then you can start as many as you like. If
403     you specify proper IP addresses for the seeds, you can even run them on
404     different computers.
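
For example (just a sketch), a second receiver on the same host could drop
its own C<binds> and instead use the same C<seeds> setting as the sender,
so it does not conflict with the first receiver already listening on port
4040:

    configure nodeid => "eg_receiver/%u", seeds => ["*:4040"];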
405    
406     Each time you start the sender, it will send a message to all receivers it
407 root 1.48 finds (you have to interrupt it manually afterwards).
408 root 1.47
409 root 1.48 Additional things you could try include using C<PERL_ANYEVENT_MP_TRACE=1>
to see which messages are exchanged, or starting the sender first and
seeing how long it takes to find the receiver.
412 root 1.27
413 root 1.28 =head3 Splitting Network Configuration and Application Code
414    
416 root 1.43 OK, so far, this works. In the real world, however, the person configuring
417 root 1.28 your application to run on a specific network (the end user or network
administrator) is often different from the person coding the application.
419    
420     Or to put it differently: the arguments passed to configure are usually
421 elmex 1.31 provided not by the programmer, but by whoever is deploying the program.
422 root 1.28
423     To make this easy, AnyEvent::MP supports a simple configuration database,
424     using profiles, which can be managed using the F<aemp> command-line
425 root 1.30 utility (yes, this section is about the advanced tinkering we mentioned
426     before).
427 root 1.28
428     When you change both programs above to simply call
429    
430     configure;
431    
432     then AnyEvent::MP tries to look up a profile using the current node name
433     in its configuration database, falling back to some global default.
434    
435     You can run "generic" nodes using the F<aemp> utility as well, and we will
436     exploit this in the following way: we configure a profile "seed" and run
437     a node using it, whose sole purpose is to be a seed node for our example
438     programs.
439    
440     We bind the seed node to port 4040 on all interfaces:
441    
442 root 1.29 aemp profile seed binds "*:4040"
443 root 1.28
444     And we configure all nodes to use this as seed node (this only works when
445     running on the same host, for multiple machines you would provide the IP
446 root 1.30 address or hostname of the node running the seed), and use a random name
447     (because we want to start multiple nodes on the same host):
448 root 1.28
449 root 1.30 aemp seeds "*:4040" nodeid anon/
450 root 1.28
451     Then we run the seed node:
452    
453     aemp run profile seed
454    
455     After that, we can start as many other nodes as we want, and they will all
456     use our generic seed node to discover each other.
457 root 1.27
In fact, starting many receivers nicely illustrates that the sender can
have multiple receivers.
460 elmex 1.7
461 root 1.30 That's all for now - next we will teach you about monitoring by writing a
462     simple chat client and server :)
463    
464     =head1 PART 2: Monitoring, Supervising, Exception Handling and Recovery
465    
466     That's a mouthful, so what does it mean? Our previous example is what one
467     could call "very loosely coupled" - the sender doesn't care about whether
468     there are any receivers, and the receivers do not care if there is any
469     sender.
470    
471     This can work fine for simple services, but most real-world applications
472     want to ensure that the side they are expecting to be there is actually
473     there. Going one step further: most bigger real-world applications even
want to ensure that if some component is missing or has crashed, it will
be brought back up again, by recovering and restarting the service.
476    
477     AnyEvent::MP supports this by catching exceptions and network problems,
478     and notifying interested parties of this.
479    
480 root 1.41 =head2 Exceptions, Port Context, Network Errors and Monitors
481 root 1.30
482     =head3 Exceptions
483    
484     Exceptions are handled on a per-port basis: receive callbacks are executed
485 root 1.41 in a special context, the so-called I<port-context>: code that throws an
486     otherwise uncaught exception will cause the port to be C<kil>led. Killed
487     ports are destroyed automatically (killing ports is the only way to free
488     ports, incidentally).
489 root 1.30
490     Ports can be monitored, even from a different host, and when a port is
491     killed any entity monitoring it will be notified.
492    
493     Here is a simple example:
494    
495     use AnyEvent::MP;
496    
497     # create a port, it always dies
498     my $port = port { die "oops" };
499    
500     # monitor it
501     mon $port, sub {
502     warn "$port was killed (with reason @_)";
503     };
504    
505     # now send it some message, causing it to die:
506     snd $port;
507    
508     It first creates a port whose only action is to throw an exception,
and then monitors it with the C<mon> function. Afterwards it sends it a
510     message, causing it to die and call the monitoring callback:
511    
512     anon/6WmIpj.a was killed (with reason die oops at xxx line 5.) at xxx line 9.
513    
514     The callback was actually passed two arguments: C<die> (to indicate it did
515     throw an exception as opposed to, say, a network error) and the exception
516     message itself.
517    
518     What happens when a port is killed before we have a chance to monitor
519     it? Granted, this is highly unlikely in our example, but when you program
520     in a network this can easily happen due to races between nodes.
521    
522     use AnyEvent::MP;
523    
524     my $port = port { die "oops" };
525    
526     snd $port;
527    
528     mon $port, sub {
529     warn "$port was killed (with reason @_)";
530     };
531    
532     This time we will get something like:
533    
534     anon/zpX.a was killed (with reason no_such_port cannot monitor nonexistent port)
535    
536     Since the port was already gone, the kill reason is now C<no_such_port>
537     with some descriptive (we hope) error message.
538    
539     In fact, the kill reason is usually some identifier as first argument
540     and a human-readable error message as second argument, but can be about
541     anything (it's a list) or even nothing - which is called a "normal" kill.
542    
543     You can kill ports manually using the C<kil> function, which will be
544     treated like an error when any reason is specified:
545    
546     kil $port, custom_error => "don't like your steenking face";
547    
548     And a clean kill without any reason arguments:
549    
550     kil $port;
551    
552     By now you probably wonder what this "normal" kill business is: A common
553     idiom is to not specify a callback to C<mon>, but another port, such as
554     C<$SELF>:
555    
556     mon $port, $SELF;
557    
558     This basically means "monitor $port and kill me when it crashes". And a
559     "normal" kill does not count as a crash. This way you can easily link
ports together and make them crash together on errors (while still
allowing you to remove a port silently).
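
As a tiny sketch of such linking (C<$port_a> and C<$port_b> are
hypothetical port IDs), two ports that should live and die together can
simply monitor each other:

    # if either port is killed abnormally, the other one is killed too
    mon $port_a, $port_b;
    mon $port_b, $port_a;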
562    
563 root 1.34 =head3 Port Context
564    
565     When code runs in an environment where C<$SELF> contains its own port ID
566     and exceptions will be caught, it is said to run in a port context.
567    
568     Since AnyEvent::MP is event-based, it is not uncommon to register
569     callbacks from C<rcv> handlers. As example, assume that the port receive
570     handler wants to C<die> a second later, using C<after>:
571    
572     my $port = port {
573     after 1, sub { die "oops" };
574     };
575    
576     Then you will find it does not work - when the after callback is executed,
577     it does not run in port context anymore, so exceptions will not be caught.
578    
579 root 1.41 For these cases, AnyEvent::MP exports a special "closure constructor"
called C<psub>, which works just like Perl's built-in C<sub>:
581 root 1.34
582     my $port = port {
583     after 1, psub { die "oops" };
584     };
585    
586     C<psub> stores C<$SELF> and returns a code reference. When the code
587     reference is invoked, it will run the code block within the context of
588     that port, so exception handling once more works as expected.
589    
590 root 1.41 There is also a way to temporarily execute code in the context of some
591     port, namely C<peval>:
592    
593     peval $port, sub {
594     # die'ing here will kil $port
595     };
596    
597     The C<peval> function temporarily replaces C<$SELF> by the given C<$port>
598     and then executes the given sub in a port context.
599    
600 root 1.30 =head3 Network Errors and the AEMP Guarantee
601    
602     I mentioned another important source of monitoring failures: network
603     problems. When a node loses connection to another node, it will invoke all
604     monitoring actions as if the port was killed, even if it is possible that
605 elmex 1.31 the port still lives happily on another node (not being able to talk to a
606 root 1.30 node means we have no clue what's going on with it, it could be crashed,
607     but also still running without knowing we lost the connection).
608    
609     So another way to view monitors is "notify me when some of my messages
610     couldn't be delivered". AEMP has a guarantee about message delivery to a
611     port: After starting a monitor, any message sent to a port will either
612     be delivered, or, when it is lost, any further messages will also be lost
613 elmex 1.31 until the monitoring action is invoked. After that, further messages
614 root 1.30 I<might> get delivered again.
615    
616     This doesn't sound like a very big guarantee, but it is kind of the best
617 elmex 1.31 you can get while staying sane: Specifically, it means that there will
618     be no "holes" in the message sequence: all messages sent are delivered
619 root 1.30 in order, without any missing in between, and when some were lost, you
620     I<will> be notified of that, so you can take recovery action.
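
In practical terms this means the guarantee only covers messages sent
I<after> the monitor has been established, so the usual pattern is to
monitor first and send afterwards (a sketch with a made-up port and
message):

    mon $port, sub {
        warn "messages to $port may have been lost: @_\n";
    };

    snd $port, work_item => 42;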
621    
622     =head3 Supervising
623    
Ok, so how is this crashing-everything stuff going to make applications
625     I<more> stable? Well in fact, the goal is not really to make them more
626     stable, but to make them more resilient against actual errors and
627     crashes. And this is not done by crashing I<everything>, but by crashing
628     everything except a supervisor.
629    
630 elmex 1.31 A supervisor is simply some code that ensures that an application (or a
631 root 1.30 part of it) is running, and if it crashes, is restarted properly.
632    
633     To show how to do all this we will create a simple chat server that can
634     handle many chat clients. Both server and clients can be killed and
635     restarted, and even crash, to some extent.
636    
637     =head2 Chatting, the Resilient Way
638    
639     Without further ado, here is the chat server (to run it, we assume the
640     set-up explained earlier, with a separate F<aemp run> seed node):
641    
642     use common::sense;
643     use AnyEvent::MP;
644     use AnyEvent::MP::Global;
645    
646     configure;
647    
648     my %clients;
649    
650     sub msg {
651     print "relaying: $_[0]\n";
652     snd $_, $_[0]
653     for values %clients;
654     }
655    
656     our $server = port;
657    
658     rcv $server, join => sub {
659     my ($client, $nick) = @_;
660    
661     $clients{$client} = $client;
662    
663     mon $client, sub {
664     delete $clients{$client};
665     msg "$nick (quits, @_)";
666     };
667     msg "$nick (joins)";
668     };
669    
670     rcv $server, privmsg => sub {
671     my ($nick, $msg) = @_;
672     msg "$nick: $msg";
673     };
674    
675 root 1.40 grp_reg eg_chat_server => $server;
676 root 1.30
677     warn "server ready.\n";
678    
679     AnyEvent->condvar->recv;
680    
681 elmex 1.31 Looks like a lot, but it is actually quite simple: after your usual
preamble (this time we use L<common::sense>), we define a helper function that
683     sends some message to every registered chat client:
684    
685     sub msg {
686     print "relaying: $_[0]\n";
687     snd $_, $_[0]
688     for values %clients;
689     }
690    
The clients are stored in the hash C<%clients>. Then we define a server
692     port and install two receivers on it, C<join>, which is sent by clients
693     to join the chat, and C<privmsg>, that clients use to send actual chat
694     messages.
695    
696     C<join> is most complicated. It expects the client port and the nickname
697     to be passed in the message, and registers the client in C<%clients>.
698    
699     rcv $server, join => sub {
700     my ($client, $nick) = @_;
701    
702     $clients{$client} = $client;
703    
704     The next step is to monitor the client. The monitoring action removes the
705     client and sends a quit message with the error to all remaining clients.
706    
707     mon $client, sub {
708     delete $clients{$client};
709     msg "$nick (quits, @_)";
710     };
711    
712     And finally, it creates a join message and sends it to all clients.
713    
714     msg "$nick (joins)";
715     };
716    
717     The C<privmsg> callback simply broadcasts the message to all clients:
718    
719     rcv $server, privmsg => sub {
720     my ($nick, $msg) = @_;
721     msg "$nick: $msg";
722     };
723    
724 elmex 1.31 And finally, the server registers itself in the server group, so that
725 root 1.30 clients can find it:
726    
727 root 1.40 grp_reg eg_chat_server => $server;
728 root 1.30
729     Well, well... and where is this supervisor stuff? Well... we cheated,
730     it's not there. To not overcomplicate the example, we only put it into
731     the..... CLIENT!
732    
733     =head3 The Client, and a Supervisor!
734    
735     Again, here is the client, including supervisor, which makes it a bit
736     longer:
737    
738     use common::sense;
739     use AnyEvent::MP;
740     use AnyEvent::MP::Global;
741    
742     my $nick = shift;
743    
744     configure;
745    
746     my ($client, $server);
747    
748     sub server_connect {
749 root 1.40 my $servernodes = grp_get "eg_chat_server"
750 root 1.30 or return after 1, \&server_connect;
751    
752     print "\rconnecting...\n";
753    
754     $client = port { print "\r \r@_\n> " };
755     mon $client, sub {
756     print "\rdisconnected @_\n";
757     &server_connect;
758     };
759    
760     $server = $servernodes->[0];
761     snd $server, join => $client, $nick;
762     mon $server, $client;
763     }
764    
765     server_connect;
766    
767 root 1.34 my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub {
768 root 1.30 chomp (my $line = <STDIN>);
769     print "> ";
770     snd $server, privmsg => $nick, $line
771     if $server;
772     });
773    
774     $| = 1;
775     print "> ";
776     AnyEvent->condvar->recv;
777    
778     The first thing the client does is to store the nick name (which is
779     expected as the only command line argument) in C<$nick>, for further
780     usage.
781    
782     The next relevant thing is... finally... the supervisor:
783    
784     sub server_connect {
785 root 1.40 my $servernodes = grp_get "eg_chat_server"
786 root 1.30 or return after 1, \&server_connect;
787    
788     This looks up the server in the C<eg_chat_server> global group. If it
789     cannot find it (which is likely when the node is just starting up),
790     it will wait a second and then retry. This "wait a bit and retry"
791     is an important pattern, as distributed programming means lots of
things are going on asynchronously. In practice, one should use a more
793     intelligent algorithm, to possibly warn after an excessive number of
794     retries. Hopefully future versions of AnyEvent::MP will offer some
795     predefined supervisors, for now you will have to code it on your own.
796    
797     Next it creates a local port for the server to send messages to, and
798     monitors it. When the port is killed, it will print "disconnected" and
799     tell the supervisor function to retry again.
800    
801     $client = port { print "\r \r@_\n> " };
802     mon $client, sub {
803     print "\rdisconnected @_\n";
804     &server_connect;
805     };
806    
Then everything is ready: the client will send a C<join> message with its
808     local port to the server, and start monitoring it:
809    
810     $server = $servernodes->[0];
811     snd $server, join => $client, $nick;
812     mon $server, $client;
813     }
814    
815     The monitor will ensure that if the server crashes or goes away, the
client will be killed as well. This tells the user that the client was
disconnected, and the supervisor will then start trying to connect to the
server again.
818    
819     The rest of the program deals with the boring details of actually invoking
820     the supervisor function to start the whole client process and handle the
821     actual terminal input, sending it to the server.
822    
823 elmex 1.31 You should now try to start the server and one or more clients in different
824 root 1.30 terminal windows (and the seed node):
825    
826     perl eg/chat_client nick1
827     perl eg/chat_client nick2
828     perl eg/chat_server
829     aemp run profile seed
830    
831     And then you can experiment with chatting, killing one or more clients, or
832     stopping and restarting the server, to see the monitoring in action.
833    
834 root 1.33 The crucial point you should understand from this example is that
835     monitoring is usually symmetric: when you monitor some other port,
836     potentially on another node, that other port usually should monitor you,
837     too, so when the connection dies, both ports get killed, or at least both
838     sides can take corrective action. Exceptions are "servers" that serve
839     multiple clients at once and might only wish to clean up, and supervisors,
840     who of course should not normally get killed (unless they, too, have a
841     supervisor).
842    
843     If you often think in object-oriented terms, then treat a port as an
844     object, C<port> is the constructor, the receive callbacks set by C<rcv>
845     act as methods, the C<kil> function becomes the explicit destructor and
846     C<mon> installs a destructor hook. Unlike conventional object oriented
847     programming, it can make sense to exchange ports more freely (for example,
848     to monitor one port from another).
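
Expressed as a (purely illustrative) snippet, the analogy reads like this:

    my $obj = port;                                      # "constructor"
    rcv $obj, frobnicate => sub { print "frob: @_\n" };  # "method" (made-up tag)
    mon $obj, sub { warn "destroyed: @_\n" };            # "destructor hook"
    kil $obj;                                            # explicit "destructor"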
849    
850 root 1.30 There is ample room for improvement: the server should probably remember
851     the nickname in the C<join> handler instead of expecting it in every chat
852     message, it should probably monitor itself, and the client should not try
853     to send any messages unless a server is actually connected.
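
As a hedged sketch of the first of those improvements (this is not part of
the example code above, and the client would have to stop passing its nick
with every message), the server could remember the nick at join time,
keyed by client port:

    my %nicks;

    # in the join handler, additionally remember the nick:
    #     $nicks{$client} = $nick;

    # privmsg then only needs the client port, not the nick:
    rcv $server, privmsg => sub {
        my ($client, $msg) = @_;
        msg "$nicks{$client}: $msg";
    };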
854    
855     =head1 PART 3: TIMTOWTDI: Virtual Connections
856    
857 root 1.34 The chat system developed in the previous sections is very "traditional"
858     in a way: you start some server(s) and some clients statically and they
859     start talking to each other.
860    
Sometimes applications work more like "services": they can run on almost
any node and talk to instances of themselves on other nodes. The L<AnyEvent::MP::Global>
863     service for example monitors nodes joining the network and starts itself
864     automatically on other nodes (if it isn't running already).
865    
866     A good way to design such applications is to put them into a module and
867     create "virtual connections" to other nodes - we call this the "bridge
868     head" method, because you start by creating a remote port (the bridge
869     head) and from that you start to bootstrap your application.
870    
871     Since that sounds rather theoretical, let's redesign the chat server and
872     client using this design method.
873    
874     Here is the server:
875    
876     use common::sense;
877     use AnyEvent::MP;
878     use AnyEvent::MP::Global;
879    
880     configure;
881    
882 root 1.40 grp_reg eg_chat_server2 => $NODE;
883 root 1.34
884     my %clients;
885    
886     sub msg {
887     print "relaying: $_[0]\n";
888     snd $_, $_[0]
889     for values %clients;
890     }
891    
892     sub client_connect {
893     my ($client, $nick) = @_;
894    
895     mon $client;
896     mon $client, sub {
897     delete $clients{$client};
898     msg "$nick (quits, @_)";
899     };
900    
901     $clients{$client} = $client;
902    
903     msg "$nick (joins)";
904    
905     rcv $SELF, sub { msg "$nick: $_[0]" };
906     }
907    
908     warn "server ready.\n";
909    
910     AnyEvent->condvar->recv;
911    
It starts out not much different than the previous example, except that
913     this time, we register the node port in the global group and not any port
914     we created - the clients only want to know which node the server should be
915     running on. In fact, they could also use some kind of election mechanism,
916     to find the node with lowest load or something like that.
917    
918     The more interesting change is that indeed no server port is created -
919     the server consists only of code, and "does" nothing by itself. All it
920     does is define a function C<client_connect>, which expects a client port
921     and a nick name as arguments. It then monitors the client port and binds
922     a receive callback on C<$SELF>, which expects messages that in turn are
923     broadcast to all clients.
924 root 1.34
925     The two C<mon> calls are a bit tricky - the first C<mon> is a shorthand
926     for C<mon $client, $SELF>. The second does the normal "client has gone
927     away" clean-up action. Both could actually be rolled into one C<mon>
928     action.
929    
930 root 1.39 C<$SELF> is a good hint that something interesting is going on. And
931     indeed, when looking at the client code, there is a new function,
932     C<spawn>:
933 root 1.34
934     use common::sense;
935     use AnyEvent::MP;
936     use AnyEvent::MP::Global;
937    
938     my $nick = shift;
939    
940     configure;
941    
942     $| = 1;
943    
944     my $port = port;
945    
946     my ($client, $server);
947    
948     sub server_connect {
949 root 1.40 my $servernodes = grp_get "eg_chat_server2"
950 root 1.34 or return after 1, \&server_connect;
951    
952     print "\rconnecting...\n";
953    
954     $client = port { print "\r \r@_\n> " };
955     mon $client, sub {
956     print "\rdisconnected @_\n";
957     &server_connect;
958     };
959    
960     $server = spawn $servernodes->[0], "::client_connect", $client, $nick;
961     mon $server, $client;
962     }
963    
964     server_connect;
965    
966     my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub {
967     chomp (my $line = <STDIN>);
968     print "> ";
969     snd $server, $line
970     if $server;
971     });
972    
973     print "> ";
974     AnyEvent->condvar->recv;
975    
976     The client is quite similar to the previous one, but instead of contacting
the server I<port> (which no longer exists), it C<spawn>s (creates) a new
server I<port> on the server I<node>:
979 root 1.34
980     $server = spawn $servernodes->[0], "::client_connect", $client, $nick;
981     mon $server, $client;
982    
983 root 1.39 And of course the first thing after creating it is monitoring it.
984 root 1.34
985 root 1.39 The C<spawn> function creates a new port on a remote node and returns
986     its port ID. After creating the port it calls a function on the remote
987     node, passing any remaining arguments to it, and - most importantly -
988     executes the function within the context of the new port, so it can be
989 root 1.43 manipulated by referring to C<$SELF>. The init function can reside in a
990 root 1.39 module (actually it normally I<should> reside in a module) - AnyEvent::MP
991     will automatically load the module if the function isn't defined.
992    
993     The C<spawn> function returns immediately, which means you can instantly
994 root 1.34 send messages to the port, long before the remote node has even heard
995     of our request to create a port on it. In fact, the remote node might
996     not even be running. Despite these troubling facts, everything should
997     work just fine: if the node isn't running (or the init function throws an
998     exception), then the monitor will trigger because the port doesn't exist.
999    
1000     If the spawn message gets delivered, but the monitoring message is not
1001 root 1.39 because of network problems (extremely unlikely, but monitoring, after
1002     all, is implemented by passing a message, and messages can get lost), then
1003     this connection loss will eventually trigger the monitoring action. On the
1004     remote node (which in return monitors the client) the port will also be
1005     cleaned up on connection loss. When the remote node comes up again and our
1006     monitoring message can be delivered, it will instantly fail because the
1007     port has been cleaned up in the meantime.
1008 root 1.34
1009     If your head is spinning by now, that's fine - just keep in mind, after
1010 root 1.39 creating a port, monitor it on the local node, and monitor "the other
1011     side" from the remote node, and all will be cleaned up just fine.
1012 root 1.34
1013 root 1.36 =head2 Services
1014 root 1.34
1015 root 1.36 Above it was mentioned that C<spawn> automatically loads modules, and this
1016     can be exploited in various ways.
1017    
1018     Assume for a moment you put the server into a file called
1019     F<mymod/chatserver.pm> reachable from the current directory. Then you
1020     could run a node there with:
1021    
1022     aemp run
1023    
1024     The other nodes could C<spawn> the server by using
1025     C<mymod::chatserver::client_connect> as init function.
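
For example (the node name and variables below are made up), a client
running on some other node could then start the service like this:

    my $server = spawn "chatservernode", "mymod::chatserver::client_connect",
                       $client, $nick;
    mon $server, $client;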
1026    
1027     Likewise, when you have some service that starts automatically (similar to
1028     AnyEvent::MP::Global), then you can configure this service statically:
1029    
1030     aemp profile mysrvnode services mymod::service::
1031     aemp run profile mysrvnode
1032    
1033 root 1.39 And the module will automatically be loaded in the node, as specifying a
1034 root 1.38 module name (with C<::>-suffix) will simply load the module, which is then
1035     free to do whatever it wants.
1036 root 1.36
1037     Of course, you can also do it in the much more standard way by writing
1038     a module (e.g. C<BK::Backend::IRC>), installing it as part of a module
distribution and then configuring nodes. For example, if I want to run the
1040     Bummskraut IRC backend on a machine named "ruth", I could do this:
1041    
1042     aemp profile ruth addservice BK::Backend::IRC::
1043    
1044 root 1.43 And any F<aemp run> on that host will automatically have the Bummskraut
1045     IRC backend running.
1046 root 1.36
1047     That's plenty of possibilities you can use - it's all up to you how you
1048     structure your application.
1049 elmex 1.7
1050 root 1.42 =head1 PART 4: Coro::MP - selective receive
1051    
1052     Not all problems lend themselves naturally to an event-based solution:
1053     sometimes things are easier if you can decide in what order you want to
receive messages, regardless of the order in which they were sent.
1055    
1056     In these cases, L<Coro::MP> can provide a nice solution: instead of
1057     registering callbacks for each message type, C<Coro::MP> attached a
registering callbacks for each message type, C<Coro::MP> attaches a
1059     messages it is interested in. Other messages are not lost, but queued, and
1060     can be received at a later time.
1061    
1062 root 1.43 The C<Coro::MP> module is not part of L<AnyEvent::MP>, but a separate
1063 root 1.42 module. It is, however, tightly integrated into C<AnyEvent::MP> - the
ports it creates are fully compatible with C<AnyEvent::MP> ports.
1065    
1066     In fact, C<Coro::MP> is more of an extension than a separate module: all
1067     functions exported by C<AnyEvent::MP> are exported by it as well.
1068    
To illustrate what programming with C<Coro::MP> looks like, consider the
following (slightly contrived) example: let's implement a server that
accepts a C<< (write_file => $port, $path) >> message with a (source)
1072     port and a filename, followed by as many C<< (data => $port, $data) >>
1073     messages as required to fill the file, followed by an empty C<< (data =>
1074     $port) >> message.
1075    
The server only writes a single file at a time; other requests will stay
1077     in the queue until the current file has been finished.
1078    
1079     Here is an example implementation that uses L<Coro::AIO> and largely
1080     ignores error handling:
1081    
1082     my $ioserver = port_async {
1083     while () {
1084     my ($tag, $port, $path) = get_cond;
1085    
1086     $tag eq "write_file"
1087     or die "only write_file messages expected";
1088    
1089     my $fh = aio_open $path, O_WRONLY|O_CREAT, 0666
1090     or die "$path: $!";
1091    
1092     while () {
1093     my (undef, undef, $data) = get_cond {
1094     $_[0] eq "data" && $_[1] eq $port
1095     } 5
1096     or die "timeout waiting for data message from $port\n";
1097    
1098     length $data or last;
1099    
1100     aio_write $fh, undef, undef, $data, 0;
1101     };
1102     }
1103     };
1104    
1105     mon $ioserver, sub {
1106     warn "ioserver was killed: @_\n";
1107     };
1108    
1109     Let's go through it part by part.
1110    
1111     my $ioserver = port_async {
1112    
1113 root 1.43 Ports can be created by attaching a thread to an existing port via
1114     C<rcv_async>, or as here by calling C<port_async> with the code to execute
1115 root 1.42 as a thread. The C<async> component comes from the fact that threads are
1116     created using the C<Coro::async> function.
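
As a minimal sketch of the C<rcv_async> variant (using the same C<Coro::MP>
calls described in this section), attaching a thread to an already
existing port looks like this:

    my $port = port;

    rcv_async $port, sub {
        # runs as its own (coro-)thread, in the port context of $port
        my ($tag, @data) = get_cond;   # fetch the next message, whatever it is
        print "first message: $tag @data\n";
    };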
1117    
1118     The thread runs in a normal port context (so C<$SELF> is set). In
1119     addition, when the thread returns, it will be C<kil> I<normally>, i.e.
1120     without a reason argument.
1121    
    while () {
        my ($tag, $port, $path) = get_cond;

        $tag eq "write_file"
            or die "only write_file messages expected";
1125    
1126     The thread is supposed to serve many file writes, which is why it executes
1127     in a loop. The first thing it does is fetch the next message, using
1128     C<get_cond>, the "conditional message get". Without a condition, it simply
1129     fetches the next message from the queue, which I<must> be a C<write_file>
1130     message.
1131    
1132     The message contains the C<$path> to the file, which is then created:
1133    
1134     my $fh = aio_open $path, O_WRONLY|O_CREAT, 0666
1135     or die "$path: $!";
1136    
1137     Then we enter a loop again, to serve as many C<data> messages as
1138 root 1.43 necessary:
1139 root 1.42
1140     while () {
1141     my (undef, undef, $data) = get_cond {
1142     $_[0] eq "data" && $_[1] eq $port
1143     } 5
1144     or die "timeout waiting for data message from $port\n";
1145    
1146     This time, the condition is not empty, but instead a code block: similarly
1147     to grep, the code block will be called with C<@_> set to each message in
1148     the queue, and it has to return whether it wants to receive the message or
1149     not.
1150    
1151     In this case we are interested in C<data> messages (C<< $_[0] eq "data"
>>), whose second element is the source port we expect (C<< $_[1] eq $port >>).
1153    
1154     The condition must be this strict, as it is possible to receive both
1155     C<write_file> messages and C<data> messages from other ports while we
1156     handle the file writing.
1157    
1158     The lone C<5> at the end is a timeout - when no matching message is
1159     received within C<5> seconds, we assume an error and C<die>.
1160    
1161     When an empty C<data> message is received we are done and can close the
1162     file (which is done automatically as C<$fh> goes out of scope):
1163    
1164     length $data or last;
1165    
1166     Otherwise we need to write the data:
1167    
1168     aio_write $fh, undef, undef, $data, 0;
1169    
1170 root 1.43 That's basically it. Note that every process should have some kind of
1171 root 1.42 supervisor. In our case, the supervisor simply prints any error message:
1172    
1173     mon $ioserver, sub {
1174     warn "ioserver was killed: @_\n";
1175     };
1176    
1177     Here is a usage example:
1178    
1179     port_async {
1180     snd $ioserver, write_file => $SELF, "/tmp/unsafe";
1181     snd $ioserver, data => $SELF, "abc\n";
1182     snd $ioserver, data => $SELF, "def\n";
1183     snd $ioserver, data => $SELF;
1184     };
1185    
1186     The messages are sent without any flow control or acknowledgement (feel
1187     free to improve). Also, the source port does not actually need to be a
1188     port - any unique ID will do - but port identifiers happen to be a simple
1189     source of network-wide unique IDs.
1190    
1191     Apart from C<get_cond> as seen above, there are other ways to receive
1192     messages. The C<write_file> message above could also selectively be
1193     received using a C<get> call:
1194    
1195     my ($port, $path) = get "write_file";
1196    
1197     This is simpler, but when some other code part sends an unexpected message
1198     to the C<$ioserver> it will stay in the queue forever. As a rule of thumb,
1199     every threaded port should have a "fetch next message unconditionally"
1200     somewhere, to avoid filling up the queue.
1201    
It is also possible to use C<get_cond> in a switch-like fashion:
1203    
1204     get_cond {
1205     $_[0] eq "msg1" and return sub {
1206     my (undef, @msg1_data) = @_;
1207     ...;
1208     };
1209    
1210     $_[0] eq "msg2" and return sub {
1211     my (undef, @msg2_data) = @_;
1212     ...;
1213     };
1214    
1215     die "unexpected message $_[0] received";
1216     };
1217    
1218 root 1.37 =head1 THE END
1219    
1220     This is the end of this introduction, but hopefully not the end of
your career as an AEMP user. I hope the tutorial was enough to make the
1222 root 1.37 basic concepts clear. Keep in mind that distributed programming is not
completely trivial, that AnyEvent::MP is still in its infancy, and I hope
1224     it will be useful to create exciting new applications.
1225    
1226 elmex 1.1 =head1 SEE ALSO
1227    
1228     L<AnyEvent::MP>
1229    
1230 elmex 1.20 L<AnyEvent::MP::Global>
1231    
1232 root 1.42 L<Coro::MP>
1233    
1234 root 1.34 L<AnyEvent>
1235    
1236 elmex 1.1 =head1 AUTHOR
1237    
1238     Robin Redeker <elmex@ta-sa.org>
1239 root 1.32 Marc Lehmann <schmorp@schmorp.de>
1240 root 1.4