--- AnyEvent-MP/MP.pm 2011/06/30 09:31:58 1.118 +++ AnyEvent-MP/MP.pm 2012/02/26 10:29:59 1.119 @@ -80,6 +80,8 @@ some messages. Messages send to ports will not be queued, regardless of anything was listening for them or not. +Ports are represented by (printable) strings called "port IDs". + =item port ID - C A port ID is the concatenation of a node ID, a hash-mark (C<#>) as @@ -93,49 +95,77 @@ Nodes are either public (have one or more listening ports) or private (no listening ports). Private nodes cannot talk to other private nodes -currently. +currently, but all nodes can talk to public nodes. + +Nodes is represented by (printable) strings called "node IDs". =item node ID - C<[A-Za-z0-9_\-.:]*> A node ID is a string that uniquely identifies the node within a network. Depending on the configuration used, node IDs can look like a hostname, a hostname and a port, or a random string. AnyEvent::MP itself -doesn't interpret node IDs in any way. +doesn't interpret node IDs in any way except to uniquely identify a node. =item binds - C Nodes can only talk to each other by creating some kind of connection to each other. To do this, nodes should listen on one or more local transport -endpoints - binds. Currently, only standard C specifications can -be used, which specify TCP ports to listen on. +endpoints - binds. + +Currently, only standard C specifications can be used, which +specify TCP ports to listen on. So a bind is basically just a tcp socket +in listening mode thta accepts conenctions form other nodes. =item seed nodes -When a node starts, it knows nothing about the network. To teach the node -about the network it first has to contact some other node within the -network. This node is called a seed. - -Apart from the fact that other nodes know them as seed nodes and they have -to have fixed listening addresses, seed nodes are perfectly normal nodes - -any node can function as a seed node for others. +When a node starts, it knows nothing about the network it is in - it +needs to connect to at least one other node that is already in the +network. These other nodes are called "seed nodes". + +Seed nodes themselves are not special - they are seed nodes only because +some other node I them as such, but any node can be used as seed +node for other nodes, and eahc node cna use a different set of seed nodes. In addition to discovering the network, seed nodes are also used to -maintain the network and to connect nodes that otherwise would have -trouble connecting. They form the backbone of an AnyEvent::MP network. +maintain the network - all nodes using the same seed node form are part of +the same network. If a network is split into multiple subnets because e.g. +the network link between the parts goes down, then using the same seed +nodes for all nodes ensures that eventually the subnets get merged again. Seed nodes are expected to be long-running, and at least one seed node should always be available. They should also be relatively responsive - a seed node that blocks for long periods will slow down everybody else. -=item seeds - C +For small networks, it's best if every node uses the same set of seed +nodes. For large networks, it can be useful to specify "regional" seed +nodes for most nodes in an area, and use all seed nodes as seed nodes for +each other. What's important is that all seed nodes connections form a +complete graph, so that the network cannot split into separate subnets +forever. + +Seed nodes are represented by seed IDs. + +=item seed IDs - C -Seeds are transport endpoint(s) (usually a hostname/IP address and a +Seed IDs are transport endpoint(s) (usually a hostname/IP address and a TCP port) of nodes that should be used as seed nodes. -The nodes listening on those endpoints are expected to be long-running, -and at least one of those should always be available. When nodes run out -of connections (e.g. due to a network error), they try to re-establish -connections to some seednodes again to join the network. +=item global nodes + +An AEMP network needs a discovery service - nodes need to know how to +connect to other nodes they only know by name. In addition, AEMP offers a +distributed "group database", which maps group names to a list of strings +- for example, to register worker ports. + +A network needs at least one global node to work, and allows every node to +be a global node. + +Any node that loads the L module becomes a global +node and tries to keep connections to all other nodes. So while it can +make sense to make every node "global" in small networks, it usually makes +sense to only make seed nodes into global nodes in large networks (nodes +keep connections to seed nodes and global nodes, so makign them the same +reduces overhead). =back @@ -238,7 +268,7 @@ =item step 3, connect to seed nodes -As the last step, the seeds list from the profile is passed to the +As the last step, the seed ID list from the profile is passed to the L module, which will then use it to keep connectivity with at least one node at any point in time. @@ -864,19 +894,24 @@ =item * Erlang uses processes and a mailbox, AEMP does not queue. -Erlang uses processes that selectively receive messages, and therefore -needs a queue. AEMP is event based, queuing messages would serve no -useful purpose. For the same reason the pattern-matching abilities of -AnyEvent::MP are more limited, as there is little need to be able to +Erlang uses processes that selectively receive messages out of order, and +therefore needs a queue. AEMP is event based, queuing messages would serve +no useful purpose. For the same reason the pattern-matching abilities +of AnyEvent::MP are more limited, as there is little need to be able to filter messages without dequeuing them. -(But see L for a more Erlang-like process model on top of AEMP). +This is not a philosophical difference, but simply stems from AnyEvent::MP +being event-based, while Erlang is process-based. + +You cna have a look at L for a more Erlang-like process model on +top of AEMP and Coro threads. =item * Erlang sends are synchronous, AEMP sends are asynchronous. -Sending messages in Erlang is synchronous and blocks the process (and -so does not need a queue that can overflow). AEMP sends are immediate, -connection establishment is handled in the background. +Sending messages in Erlang is synchronous and blocks the process until +a conenction has been established and the message sent (and so does not +need a queue that can overflow). AEMP sends return immediately, connection +establishment is handled in the background. =item * Erlang suffers from silent message loss, AEMP does not. @@ -889,13 +924,19 @@ same port are lost as well, until monitoring raises an error, so there are no silent "holes" in the message sequence. +If you want your software to be very reliable, you have to cope with +corrupted and even out-of-order messages in both Erlang and AEMP. AEMP +simply tries to work better in common error cases, such as when a network +link goes down. + =item * Erlang can send messages to the wrong port, AEMP does not. -In Erlang it is quite likely that a node that restarts reuses a process ID -known to other nodes for a completely different process, causing messages -destined for that process to end up in an unrelated process. +In Erlang it is quite likely that a node that restarts reuses an Erlang +process ID known to other nodes for a completely different process, +causing messages destined for that process to end up in an unrelated +process. -AEMP never reuses port IDs, so old messages or old port IDs floating +AEMP does not reuse port IDs, so old messages or old port IDs floating around in the network will not be sent to an unrelated port. =item * Erlang uses unprotected connections, AEMP uses secure @@ -908,7 +949,7 @@ communications. The AEMP protocol, unlike the Erlang protocol, supports both programming -language independent text-only protocols (good for debugging) and binary, +language independent text-only protocols (good for debugging), and binary, language-specific serialisers (e.g. Storable). By default, unless TLS is used, the protocol is actually completely text-based. @@ -918,11 +959,12 @@ =item * AEMP has more flexible monitoring options than Erlang. -In Erlang, you can chose to receive I exit signals as messages -or I, there is no in-between, so monitoring single processes is -difficult to implement. Monitoring in AEMP is more flexible than in -Erlang, as one can choose between automatic kill, exit message or callback -on a per-process basis. +In Erlang, you can chose to receive I exit signals as messages or +I, there is no in-between, so monitoring single Erlang processes is +difficult to implement. + +Monitoring in AEMP is more flexible than in Erlang, as one can choose +between automatic kill, exit message or callback on a per-port basis. =item * Erlang tries to hide remote/local connections, AEMP does not.