=head1 Message Passing for the Non-Blocked Mind

=head1 Introduction and Terminology

This is a tutorial about how to get the swing of the new L<AnyEvent::MP>
module, which allows programs to transparently pass messages within the
process and to other processes on the same or a different host.

What kind of messages? Basically, a message here means a list of Perl
strings, numbers, hashes and arrays: anything that can be expressed as a
L<JSON> text (as JSON is used by default in the protocol). Here are two
examples:

   write_log => 1251555874, "action was successful.\n"
   123, ["a", "b", "c"], { foo => "bar" }

When using L<AnyEvent::MP> it is customary to use a descriptive string as
the first element of a message to indicate the type of the message. This
element is called a I<tag> in L<AnyEvent::MP>, as some API functions
(C<rcv>) support matching it directly.

Suppose you want to send a ping message with your current time to
somewhere; this is what such a message might look like (in Perl syntax):

   ping => 1251381636

Now that we know what a message is, to which entities are those messages
being I<passed>? They are I<passed> to I<ports>. A I<port> is a
destination for messages but also a context to execute code: when a
runtime error occurs while executing code belonging to a port, the
exception will be raised on the port and can even travel to interested
parties on other nodes, which makes supervision of distributed processes
easy.

How do these ports relate to things you know? Each I<port> belongs to a
I<node>, and a I<node> is just the UNIX process that runs your
L<AnyEvent::MP> application.

Each I<node> is distinguished from other I<nodes> running on the same or
another host in a network by its I<node ID>. A I<node ID> is simply a
unique string chosen manually or assigned by L<AnyEvent::MP> in some way
(UNIX nodename, random string...).

Here is a diagram of how I<nodes>, I<ports> and UNIX processes relate to
each other. The setup consists of two nodes (more are of course possible):
node C<A> (in UNIX process 7066) with the ports C<ABC> and C<DEF>, and
node C<B> (in UNIX process 8321) with the ports C<FOO> and C<BAR>.

  |- PID: 7066 -|                  |- PID: 8321 -|
  |             |                  |             |
  | Node ID: A  |                  | Node ID: B  |
  |             |                  |             |
  |   Port ABC =|= <----\ /-----> =|= Port FOO   |
  |             |        X         |             |
  |   Port DEF =|= <----/ \-----> =|= Port BAR   |
  |             |                  |             |
  |-------------|                  |-------------|

The strings for the I<port IDs> here are just for illustrative purposes:
even though I<ports> in L<AnyEvent::MP> are also identified by strings,
they can't be chosen manually and are assigned by the system dynamically.
These I<port IDs> are unique within a network and can also be used to
identify senders or as message tags, for instance.

The next sections will explain the API of L<AnyEvent::MP> by going through
a few simple examples. Later some more complex idioms are introduced,
which are hopefully useful to solve some real-world problems.

=head1 Passing Your First Message

As a start, let's have a look at the messaging API. The following example
is just a demo to show the basic elements of message passing with
L<AnyEvent::MP>.

The example should print C<Ending with: 123>, in a rather complicated
way, by passing a message to a port.

   use AnyEvent;
   use AnyEvent::MP;

   my $end_cv = AnyEvent->condvar;

   # create a new local port and remember its port ID
   my $port = port;

   # call this callback for every message tagged "test" sent to $port
   rcv $port, test => sub {
      my ($data) = @_;
      $end_cv->send ($data);
   };

   # send the message ("test", 123) to the port
   snd $port, test => 123;

   # wait until the callback has passed on the data
   print "Ending with: " . $end_cv->recv . "\n";

It already uses most of the essential functions inside L<AnyEvent::MP>:
first there is the C<port> function, which creates a I<port> and returns
its I<port ID>, a simple string.

This I<port ID> can be used to send messages to the port and to install
handlers that receive messages on the port. Since it is a simple string,
it can be safely passed to other I<nodes> in the network when you want to
refer to that specific port (usually for RPC, where you need to tell the
other end which I<port> to send the reply to - messages in
L<AnyEvent::MP> have a destination, but no source).

The next function is C<rcv>:

   rcv $port, test => sub { ... };

It installs a receiver callback on the I<port> specified as the first
argument (this only works for "local" ports, i.e. ports created on the
same node). The next argument, in this example C<test>, specifies a
I<tag> to match.
This means that whenever a message whose first element is the string
C<test> is received, the callback is called with the remaining parts of
that message.

Messages can be sent with the C<snd> function, which is used like this in
the example above:

   snd $port, test => 123;

This will send the message C<'test', 123> to the I<port> with the I<port
ID> stored in C<$port>. Since in this case the receiver has a I<tag> match
on C<test>, it will call the callback with the first argument being the
number C<123>.

The callback is a typical AnyEvent idiom: it just passes that number on to
the I<condition variable> C<$end_cv>, which then hands the value to the
C<print>. Condition variables are out of the scope of this tutorial and
not often used with ports, so please consult the L<AnyEvent>
documentation about them.

Passing messages inside just one process is boring. Before we can move on
and do interprocess message passing, we first have to make sure some
things have been set up correctly for our nodes to talk to each other.

=head1 System Requirements and System Setup

Before we can start with real IPC we have to make sure some things work on
your system.

First we have to set up a I<shared secret>: for two L<AnyEvent::MP>
I<nodes> to be able to communicate with each other over the network, it
is necessary to set up the same I<shared secret> for both of them, so they
can prove their trustworthiness to each other.

The easiest way to set this up is to use the F<aemp> utility:

   aemp gensecret

This creates a F<$HOME/.perl-anyevent-mp> config file and generates a
random shared secret. You can copy this file to any other system and then
communicate over the network (via TCP) with it. You can also select your
own shared secret and, for increased security requirements, even create
(or configure) a TLS certificate (both with further F<aemp> commands),
causing connections to not just be securely authenticated, but also to be
encrypted and protected against tampering.
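
Why does a shared secret let two nodes trust each other? One common
general technique for proving knowledge of a shared secret without
transmitting it is an HMAC-based challenge-response exchange. The
following minimal sketch illustrates that technique with the core
L<Digest::SHA> module. It is an illustration under that assumption only,
not a description of the actual L<AnyEvent::MP> authentication protocol,
and the subroutine names C<make_challenge>, C<respond> and C<verify> are
hypothetical.

```perl
   use strict;
   use warnings;
   use Digest::SHA qw(hmac_sha256_hex);

   # Hypothetical helpers, NOT the real AnyEvent::MP handshake:
   # a node proves it knows the shared secret by keying an HMAC
   # over a random challenge chosen by the other side.
   sub make_challenge {
      # 16 random bytes as hex; rand() is fine for a demo,
      # but not for real security
      return join "", map { sprintf "%02x", int rand 256 } 1 .. 16;
   }

   sub respond {
      my ($secret, $challenge) = @_;
      # hmac_sha256_hex takes ($data, $key)
      return hmac_sha256_hex ($challenge, $secret);
   }

   sub verify {
      my ($secret, $challenge, $response) = @_;
      # a real implementation would compare in constant time
      return respond ($secret, $challenge) eq $response;
   }

   my $challenge = make_challenge;

   # both sides hold the same secret: verification succeeds
   my $ok  = verify ("s3cret", $challenge, respond ("s3cret", $challenge));

   # the responder holds a different secret: verification fails
   my $bad = verify ("s3cret", $challenge, respond ("other", $challenge));

   print "same secret: ",      ($ok  ? "accepted" : "rejected"), "\n";
   print "different secret: ", ($bad ? "accepted" : "rejected"), "\n";
```

The check only succeeds when both sides used the same secret, which is
why every node you want to connect needs the same secret configured.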

Connections will only be successfully established when the I<nodes> that
want to connect to each other have the same I<shared secret> (or
successfully verify the TLS certificate of the other side, in which case
no shared secret is required).

B<Make sure that F<$HOME/.perl-anyevent-mp> is the same on all hosts/user
accounts that you try to connect with each other!>

That's all for now; you will find some more advanced fiddling with the
C<aemp> utility later.

=head1 Passing Messages Between Processes

=head2 The Receiver

Let's split the previous example up into two programs: one for the sender
and one for the receiver. First the receiver application, in full:

   use AnyEvent;
   use AnyEvent::MP;
   use AnyEvent::MP::Global;

   # name this node and listen on TCP port 4040 on all local addresses
   configure nodeid => "eg_receiver", binds => ["*:4040"];

   my $port = port;

   # make the port visible to all nodes in the group "eg_receivers"
   AnyEvent::MP::Global::register $port, "eg_receivers";

   rcv $port, test => sub {
      my ($data) = @_;

      print "Received data: " . $data . "\n";
   };

   # run the event loop forever, waiting for messages
   AnyEvent->condvar->recv;

=head3 AnyEvent::MP::Global

Now, that wasn't too bad, was it? OK, let's step through the new functions
and modules that have been used.

For starters, there is now an additional module being used:
L<AnyEvent::MP::Global>. This module provides us with a I<global
registry>, which lets us register ports in groups that are visible on all
I<nodes> in a network.

What is this useful for? Well, the I<port IDs> are random-looking strings,
assigned by L<AnyEvent::MP>. We cannot know those I<port IDs> in advance,
so we don't know which I<port ID> to send messages to, especially when the
message is to be passed between different I<nodes> (or UNIX processes). To
find the right I<port> of another I<node> in the network, we need to
communicate it to the sender somehow, and that is exactly what
L<AnyEvent::MP::Global> provides.

Especially in larger, more anonymous networks this is handy: imagine you
have a few database backends, a few web frontends and some processing
distributed over a number of hosts. All of these would simply register
themselves in the appropriate group, and your web frontends can then find
a database backend.
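
Conceptually, a global group is just a table from group names to sets of
I<port IDs>. The toy sketch below models such a table inside a single
process, using the illustrative port IDs C<ABC> and C<FOO> from the
diagram above. The helper names C<register_port> and C<find_ports> are
hypothetical; the real L<AnyEvent::MP::Global> also makes this information
visible on all I<nodes> of the network, which a plain hash cannot do.

```perl
   use strict;
   use warnings;

   # Toy model only: group name => { port ID => 1 }.
   # Here this is just a hash inside one process.
   my %groups;

   # hypothetical counterpart of registering a port in a group
   sub register_port {
      my ($port_id, $group) = @_;
      $groups{$group}{$port_id} = 1;
   }

   # hypothetical counterpart of looking a group up: returns an array
   # reference of port IDs, or nothing if the group is unknown
   sub find_ports {
      my ($group) = @_;
      my $members = $groups{$group}
         or return;
      return [ sort keys %$members ];
   }

   register_port ("ABC", "eg_receivers");
   register_port ("FOO", "eg_receivers");

   my $ports = find_ports ("eg_receivers");
   print "eg_receivers: @$ports\n";    # prints "eg_receivers: ABC FOO"

   print "unknown group: ", (find_ports ("nobody") ? "found" : "none"), "\n";
```

Returning nothing for an unknown group lets a caller simply write C<or
return> after the lookup, the same idiom the sender uses later in this
tutorial.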

=head3 C<configure> and the Network

Now, let's have a look at the new function, C<configure>:

   configure nodeid => "eg_receiver", binds => ["*:4040"];

Before we are able to send messages to other nodes, we have to initialise
ourselves to become a "distributed node". Initialising a node means naming
the node, optionally binding some TCP listeners so that other nodes can
contact it, and connecting to a predefined set of seed addresses so the
node can discover the existing network - and the existing network can
discover the node!

All of this (and more) can be passed to the C<configure> function. Later
we will see how we can do all this without even passing anything to
C<configure>!

The first parameter, C<nodeid>, specifies the node ID (in this case
C<eg_receiver>; the default is to use the node name of the current host,
but for this example we want to be able to run many nodes on the same
machine). Node IDs need to be unique within the network and can be almost
any string. If you don't care, you can specify a special placeholder node
ID that will then be replaced by a random node name (see the description
of C<configure> in L<AnyEvent::MP>).

The second parameter, C<binds>, specifies a list of C<address:port> pairs
to bind TCP listeners on. The special "address" of C<*> means to bind on
every local IP address.

Specifying binds is not needed merely so that other nodes can connect to
us: even if no binds are specified, the node will still bind on a dynamic
port on all local addresses. But in that case we won't know the port, and
cannot tell other nodes to connect to it as a seed node.

A I<seed> is the (fixed) TCP address of some other node in the network. To
explain the need for seeds, we have to look at the topology of a typical
L<AnyEvent::MP> network. The topology is called a I<fully connected mesh>;
here is an example with 4 nodes:

   N1--N2
   | \/ |
   | /\ |
   N3--N4

Now imagine another node, C<N5>, wants to connect itself to that network:

   N1--N2
   | \/ |   N5
   | /\ |
   N3--N4

The new node needs to know the I<binds> of all nodes already connected.
This is exactly what the I<seeds> are for: let's assume that the new node
(C<N5>) uses the TCP address of the node C<N2> as seed.
This causes it to connect to C<N2>:

   N1--N2____
   | \/ |    N5
   | /\ |
   N3--N4

C<N2> then tells C<N5> about the I<binds> of the other nodes it is
connected to, and C<N5> creates the rest of the connections:

    /--------\
   N1--N2____|
   | \/ |    N5
   | /\ |   /|
   N3--N4--- |
    \________/

All done: C<N5> is now happily connected to the rest of the network.

Of course, this process takes time, during which the node is already
running. This also means it takes time until the node is fully connected
and global groups and other information become available. The best way to
deal with this is to either retry regularly until you have found the
resource you were looking for, or to only start services on demand after a
node has become available.

=head3 Registering the Receiver

Coming back to our example, we have now introduced the basic purpose of
L<AnyEvent::MP::Global> and C<configure>, so we can finally continue
talking about the receiver.

Let's look at the next lines:

   my $port = port;
   AnyEvent::MP::Global::register $port, "eg_receivers";

The C<port> function has already been discussed: it simply creates a new
I<port> and returns the I<port ID>. The C<register> function, however, is
new: its first argument is the I<port ID> that we want to add to a
I<global group>, and its second argument is the name of that I<global
group>.

You can choose the name of such a I<global group> freely (prefixing your
package name is highly recommended!). The purpose of such a group is to
store a set of I<port IDs>. This set is made available throughout the
L<AnyEvent::MP> network, so that each node can see which ports belong to
that group.

Later we will see how the sender looks for the ports in this I<global
group> to send messages to them.

The last step in the example is to set up a receiver callback for those
messages, just as was discussed in the first example. We again match for
the tag C<test>. The difference is that this time we don't exit the
application after receiving the first message. Instead we continue to
wait for new messages indefinitely.
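
Which callback runs is decided purely by the message tag, just as in the
first example. As a rough mental model only, that matching can be pictured
as a small dispatcher: a hash from tag to callback, where a message is
looked up by its first element and the callback receives the remaining
elements. The names C<my_rcv> and C<deliver> are hypothetical, and this is
not how L<AnyEvent::MP> implements C<rcv>; there is no event loop and no
network involved here.

```perl
   use strict;
   use warnings;

   # Conceptual sketch only: tag => callback, for a single "port".
   my %receivers;

   # hypothetical stand-in for rcv: remember which callback handles a tag
   sub my_rcv {
      my ($tag, $cb) = @_;
      $receivers{$tag} = $cb;
   }

   # hypothetical stand-in for message delivery: match on the first
   # element (the tag) and pass the remaining elements to the callback
   sub deliver {
      my ($tag, @args) = @_;
      my $cb = $receivers{$tag}
         or return 0;    # in this sketch, unmatched messages are dropped
      $cb->(@args);
      return 1;
   }

   my @received;

   my_rcv (test => sub {
      my ($data) = @_;
      push @received, $data;
      print "Received data: $data\n";
   });

   deliver (test => 123);          # prints "Received data: 123"
   deliver (test => 456);          # prints "Received data: 456"
   deliver (other => "ignored");   # no callback for tag "other"
```

Because the tag is just the first list element, supporting a new kind of
message in this model only means registering another tag.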

=head2 The Sender

OK, now let's take a look at the sender code:

   use AnyEvent;
   use AnyEvent::MP;
   use AnyEvent::MP::Global;

   # name this node and use the receiver's listener as seed
   configure nodeid => "eg_sender", seeds => ["*:4040"];

   # once per second, send the current time to every registered receiver
   my $find_timer =
      AnyEvent->timer (after => 0, interval => 1, cb => sub {
         my $ports = AnyEvent::MP::Global::find "eg_receivers"
            or return;

         snd $_, test => time
            for @$ports;
      });

   AnyEvent->condvar->recv;

It's even less code. The C<configure> call serves the same purpose as in
the receiver, but instead of specifying binds we specify a list of seeds,
which happens to be the same as the binds used by the receiver. The
receiver thereby becomes our seed node.

Next we set up a timer that repeatedly (every second) calls this chunk of
code:

   my $ports = AnyEvent::MP::Global::find "eg_receivers"
      or return;

   snd $_, test => time
      for @$ports;

The only new function here is the C<find> function of
L<AnyEvent::MP::Global>. It searches the global group named
C<eg_receivers> for ports. If none are found, it returns a false value,
which makes our callback return instantly and wait for the next round, as
nobody is interested in our message yet.

As soon as the receiver application has connected and the information
about the newly added port in the receiver has propagated to the sender
node, C<find> returns an array reference that contains the I<port ID> of
the receiver I<port(s)>.

We then just send a message with a tag and the current time to every
I<port> in the global group.

=head3 Splitting Network Configuration and Application Code

OK, so far, this works. In the real world, however, the person configuring
your application to run on a specific network (the end user or network
administrator) is often different from the person coding the application.

Or to put it differently: the arguments passed to C<configure> are usually
provided not by the programmer, but by whoever is deploying the program.

To make this easy, AnyEvent::MP supports a simple configuration database,
using profiles, which can be managed using the F<aemp> command-line
utility.

When you change both programs above to simply call

   configure;

then AnyEvent::MP tries to look up a profile using the current node name
in its configuration database, falling back to some global default.

You can run "generic" nodes using the F<aemp> utility as well, and we will
exploit this in the following way: we configure a profile "seed" and run a
node using it, whose sole purpose is to be a seed node for our example
programs.

We bind the seed node to port 4040 on all interfaces:

   aemp profile seed setbinds "*:4040"

And we configure all nodes to use this as seed node (this only works when
running on the same host; for multiple machines you would provide the IP
address or hostname of the node running the seed):

   aemp setseeds "*:4040"

Then we run the seed node:

   aemp run profile seed

After that, we can start as many other nodes as we want, and they will all
use our generic seed node to discover each other.

In fact, starting many receivers nicely illustrates that the time sender
can have multiple receivers.

That's all for now. Next time we will teach you about monitoring by
writing a simple chat client and server :)

=head1 SEE ALSO

L<AnyEvent::MP>

L<AnyEvent::MP::Global>

L<AnyEvent>

=head1 AUTHOR

Robin Redeker