… | |
… | |
612 | The C<peval> function temporarily replaces C<$SELF> by the given C<$port> |
612 | The C<peval> function temporarily replaces C<$SELF> by the given C<$port> |
613 | and then executes the given sub in a port context. |
613 | and then executes the given sub in a port context. |
614 | |
614 | |
615 | =head3 Network Errors and the AEMP Guarantee |
615 | =head3 Network Errors and the AEMP Guarantee |
616 | |
616 | |
617 | I mentioned another important source of monitoring failures: network |
617 | Earlier we mentioned another important source of monitoring failures: |
618 | problems. When a node loses connection to another node, it will invoke all |
618 | network problems. When a node loses connection to another node, it will |
619 | monitoring actions as if the port was killed, even if it is possible that |
619 | invoke all monitoring actions, just as if the port was killed, I<even if |
620 | the port is still happily alive on another node (not being able to talk to |
620 | it is possible that the port is still happily alive on another node> (not |
621 | a node means we have no clue what's going on with it, it could be crashed, |
621 | being able to talk to a node means we have no clue what's going on with |
622 | but also still running without knowing we lost the connection). |
622 | it, it could be crashed, but also still running without knowing we lost |
|
|
623 | the connection). |
623 | |
624 | |
624 | So another way to view monitors is "notify me when some of my messages |
625 | So another way to view monitors is: "notify me when some of my messages |
625 | couldn't be delivered". AEMP has a guarantee about message delivery to a |
626 | couldn't be delivered". AEMP has a guarantee about message delivery to a |
626 | port: After starting a monitor, any message sent to a port will either |
627 | port: After starting a monitor, any message sent to a port will either |
627 | be delivered, or, when it is lost, any further messages will also be lost |
628 | be delivered, or, when it is lost, any further messages will also be lost |
628 | until the monitoring action is invoked. After that, further messages |
629 | until the monitoring action is invoked. After that, further messages |
629 | I<might> get delivered again. |
630 | I<might> get delivered again. |
630 | |
631 | |
631 | This doesn't sound like a very big guarantee, but it is kind of the best |
632 | This doesn't sound like a very big guarantee, but it is kind of the best |
632 | you can get while staying sane: Specifically, it means that there will |
633 | you can get while staying sane: Specifically, it means that there will be |
633 | be no "holes" in the message sequence: all messages sent are delivered |
634 | no "holes" in the message sequence: all messages sent are delivered in |
634 | in order, without any missing in between, and when some were lost, you |
635 | order, without any of them missing in between, and when some were lost, |
635 | I<will> be notified of that, so you can take recovery action. |
636 | you I<will> be notified of that, so you can take recovery action. |
636 | |
637 | |
637 | And, obviously, the guarantee only works in the presence of |
638 | And, obviously, the guarantee only works in the presence of |
638 | correctly-working hardware, and no relevant bugs inside AEMP itself. |
639 | correctly-working hardware, and no relevant bugs inside AEMP itself. |
639 | |
640 | |
640 | =head3 Supervising |
641 | =head3 Supervising |
641 | |
642 | |
642 | OK, so how is this crashing-everything-stuff going to make applications |
643 | OK, so how is this crashing-everything-stuff going to make applications |
643 | I<more> stable? Well, in fact, the goal is not really to make them more |
644 | I<more> stable? Well, in fact, the goal is not really to make them |
644 | stable, but to make them more resilient against actual errors and |
645 | more stable, but to make them more resilient against actual errors |
645 | crashes. And this is not done by crashing I<everything>, but by crashing |
646 | and crashes. And this is not done by crashing I<everything>, but by |
646 | everything except a I<supervisor>. |
647 | crashing everything except a I<supervisor> that then cleans up and sgtarts |
|
|
648 | everything again. |
647 | |
649 | |
648 | A supervisor is simply some code that ensures that an application (or a |
650 | A supervisor is simply some code that ensures that an application (or a |
649 | part of it) is running, and if it crashes, is restarted properly. That is, |
651 | part of it) is running, and if it crashes, is restarted properly. That is, |
650 | it supervises a service by starting and restarting it, as necessary. |
652 | it supervises a service by starting and restarting it, as necessary. |
651 | |
653 | |
… | |
… | |
742 | }; |
744 | }; |
743 | |
745 | |
744 | And finally, the server registers itself in the server group, so that |
746 | And finally, the server registers itself in the server group, so that |
745 | clients can find it: |
747 | clients can find it: |
746 | |
748 | |
747 | grp_reg eg_chat_server => $server; |
749 | db_set eg_chat_server => $server; |
748 | |
750 | |
749 | Well, well... and where is this supervisor stuff? Well... we cheated, |
751 | Well, well... and where is this supervisor stuff? Well... we cheated, |
750 | it's not there. To not overcomplicate the example, we only put it into |
752 | it's not there. To not overcomplicate the example, we only put it into |
751 | the..... CLIENT! |
753 | the..... CLIENT! |
752 | |
754 | |
… | |
… | |
802 | expected as the only command line argument) in C<$nick>, for further |
804 | expected as the only command line argument) in C<$nick>, for further |
803 | usage. |
805 | usage. |
804 | |
806 | |
805 | The next relevant thing is... finally... the supervisor: |
807 | The next relevant thing is... finally... the supervisor: |
806 | |
808 | |
807 | #todo#d# |
|
|
808 | sub server_connect { |
809 | sub server_connect { |
809 | my $servernodes = grp_get "eg_chat_server" |
810 | my $db_mon; |
810 | or return after 1, \&server_connect; |
811 | $db_mon = db_mon eg_chat_server => sub { |
|
|
812 | return unless %{ $_[0] }; |
|
|
813 | undef $db_mon; # stop monitoring |
811 | |
814 | |
812 | This looks up the server in the C<eg_chat_server> global group. If it |
815 | This monitors the C<eg_chat_server> database family. It waits until a |
813 | cannot find it (which is likely when the node is just starting up), |
816 | chat server becomes available. When that happens, it "connects" to it |
814 | it will wait a second and then retry. This "wait a bit and retry" |
817 | by creating a client port that receives and prints chat messages, and |
815 | is an important pattern, as distributed programming means lots of |
818 | monitoring it: |
816 | things are going on asynchronously. In practise, one should use a more |
|
|
817 | intelligent algorithm, to possibly warn after an excessive number of |
|
|
818 | retries. Hopefully future versions of AnyEvent::MP will offer some |
|
|
819 | predefined supervisors, for now you will have to code it on your own. |
|
|
820 | |
|
|
821 | Next it creates a local port for the server to send messages to, and |
|
|
822 | monitors it. When the port is killed, it will print "disconnected" and |
|
|
823 | tell the supervisor function to retry again. |
|
|
824 | |
819 | |
825 | $client = port { print "\r \r@_\n> " }; |
820 | $client = port { print "\r \r@_\n> " }; |
826 | mon $client, sub { |
821 | mon $client, sub { |
827 | print "\rdisconnected @_\n"; |
822 | print "\rdisconnected @_\n"; |
828 | &server_connect; |
823 | &server_connect; |
829 | }; |
824 | }; |
830 | |
825 | |
|
|
826 | If the client port dies (for whatever reason), the "supervisor" will start |
|
|
827 | looking for a server again - the semantics of C<db_mon> ensure that it |
|
|
828 | will immediately find it if there is a server port. |
|
|
829 | |
831 | Then everything is ready: the client will send a C<join> message with it's |
830 | After this, everything is ready: the client will send a C<join> message |
832 | local port to the server, and start monitoring it: |
831 | with its local port to the server, and start monitoring it: |
833 | |
832 | |
834 | $server = $servernodes->[0]; |
833 | $server = (keys %{ $_[0] })[0]; |
|
|
834 | |
835 | snd $server, join => $client, $nick; |
835 | snd $server, join => $client, $nick; |
836 | mon $server, $client; |
836 | mon $server, $client; |
837 | } |
837 | } |
838 | |
838 | |
839 | The monitor will ensure that if the server crashes or goes away, the |
839 | This second monitor will ensure that, when the server port crashes or goes |
840 | client will be killed as well. This tells the user that the client was |
840 | away (e.g. due to network problems), the client port will be killed as |
841 | disconnected, and will then start to connect the server again. |
841 | well. This tells the user that the client was disconnected, and will then |
|
|
842 | start to connect the server again. |
842 | |
843 | |
843 | The rest of the program deals with the boring details of actually invoking |
844 | The rest of the program deals with the boring details of actually invoking |
844 | the supervisor function to start the whole client process and handle the |
845 | the supervisor function to start the whole client process and handle the |
845 | actual terminal input, sending it to the server. |
846 | actual terminal input, sending it to the server. |
|
|
847 | |
|
|
848 | Now... the "supervisor" in this example is a bit of a cheat - it doesn't |
|
|
849 | really clean up much (because the cleanup done by AnyEvent::MP suffices), |
|
|
850 | and there isn't much of a restarting action either - if the server isn't |
|
|
851 | there because it crashed, well, it isn't there. |
|
|
852 | |
|
|
853 | In the real world, one would often add a timeout that would trigger when |
|
|
854 | the server couldn't be found within some time limit, and then complain, |
|
|
855 | or even try to start a new server. Or the supervisor would have to do |
|
|
856 | some real cleanups, such as rolling back database transactions when the |
|
|
857 | database thread crashes. For this simple chat server, however, this simple |
|
|
858 | supervisor works fine. Hopefully future versions of AnyEvent::MP will |
|
|
859 | offer some predefined supervisors, for now you will have to code it on |
|
|
860 | your own. |
846 | |
861 | |
847 | You should now try to start the server and one or more clients in different |
862 | You should now try to start the server and one or more clients in different |
848 | terminal windows (and the seed node): |
863 | terminal windows (and the seed node): |
849 | |
864 | |
850 | perl eg/chat_client nick1 |
865 | perl eg/chat_client nick1 |
… | |
… | |
862 | sides can take corrective action. Exceptions are "servers" that serve |
877 | sides can take corrective action. Exceptions are "servers" that serve |
863 | multiple clients at once and might only wish to clean up, and supervisors, |
878 | multiple clients at once and might only wish to clean up, and supervisors, |
864 | who of course should not normally get killed (unless they, too, have a |
879 | who of course should not normally get killed (unless they, too, have a |
865 | supervisor). |
880 | supervisor). |
866 | |
881 | |
867 | If you often think in object-oriented terms, then treat a port as an |
882 | If you often think in object-oriented terms, then you can think of a port |
868 | object, C<port> is the constructor, the receive callbacks set by C<rcv> |
883 | as an object: C<port> is the constructor, the receive callbacks set by |
869 | act as methods, the C<kil> function becomes the explicit destructor and |
884 | C<rcv> act as methods, the C<kil> function becomes the explicit destructor |
870 | C<mon> installs a destructor hook. Unlike conventional object oriented |
885 | and C<mon> installs a destructor hook. Unlike conventional object oriented |
871 | programming, it can make sense to exchange ports more freely (for example, |
886 | programming, it can make sense to exchange port IDs more freely (for |
872 | to monitor one port from another). |
887 | example, to monitor one port from another), because it is cheap to send |
|
|
888 | port IDs over the network, and AnyEvent::MP blurs the distinction between |
|
|
889 | local and remote ports. |
873 | |
890 | |
874 | There is ample room for improvement: the server should probably remember |
891 | Lastly, there is ample room for improvement in this example: the server |
875 | the nickname in the C<join> handler instead of expecting it in every chat |
892 | should probably remember the nickname in the C<join> handler instead of |
876 | message, it should probably monitor itself, and the client should not try |
893 | expecting it in every chat message, it should probably monitor itself, and |
877 | to send any messages unless a server is actually connected. |
894 | the client should not try to send any messages unless a server is actually |
|
|
895 | connected. |
878 | |
896 | |
879 | =head1 PART 3: TIMTOWTDI: Virtual Connections |
897 | =head1 PART 3: TIMTOWTDI: Virtual Connections |
880 | |
898 | |
881 | The chat system developed in the previous sections is very "traditional" |
899 | The chat system developed in the previous sections is very "traditional" |
882 | in a way: you start some server(s) and some clients statically and they |
900 | in a way: you start some server(s) and some clients statically and they |
883 | start talking to each other. |
901 | start talking to each other. |
884 | |
902 | |
885 | Sometimes applications work more like "services": They can run on almost |
903 | Sometimes applications work more like "services": They can run on almost |
886 | any node and talks to itself on other nodes. The L<AnyEvent::MP::Global> |
904 | any node and even talk to copies of themselves on other nodes in case they |
887 | service for example monitors nodes joining the network and starts itself |
905 | are distributed. The L<AnyEvent::MP::Global> service for example monitors |
888 | automatically on other nodes (if it isn't running already). |
906 | nodes joining the network and sometimes even starts itself on other nodes. |
889 | |
907 | |
890 | A good way to design such applications is to put them into a module and |
908 | One good way to design such services is to put them into a module and |
891 | create "virtual connections" to other nodes - we call this the "bridge |
909 | create "virtual connections" to other nodes. We call this the "bridge |
892 | head" method, because you start by creating a remote port (the bridge |
910 | head" method, because you start by I<creating a remote port> (the bridge |
893 | head) and from that you start to bootstrap your application. |
911 | head) and from that you start to bootstrap your application. |
894 | |
912 | |
895 | Since that sounds rather theoretical, let's redesign the chat server and |
913 | Since that sounds rather theoretical, let us redesign the chat server and |
896 | client using this design method. |
914 | client using this design method. |
897 | |
915 | |
898 | Here is the server: |
916 | As usual, we start with the full program - here is the server: |
899 | |
917 | |
900 | use common::sense; |
918 | use common::sense; |
901 | use AnyEvent::MP; |
919 | use AnyEvent::MP; |
902 | use AnyEvent::MP::Global; |
|
|
903 | |
920 | |
904 | configure; |
921 | configure; |
905 | |
922 | |
906 | grp_reg eg_chat_server2 => $NODE; |
923 | db_set eg_chat_server2 => $NODE; |
907 | |
924 | |
908 | my %clients; |
925 | my %clients; |
909 | |
926 | |
910 | sub msg { |
927 | sub msg { |
911 | print "relaying: $_[0]\n"; |
928 | print "relaying: $_[0]\n"; |
… | |
… | |
915 | |
932 | |
916 | sub client_connect { |
933 | sub client_connect { |
917 | my ($client, $nick) = @_; |
934 | my ($client, $nick) = @_; |
918 | |
935 | |
919 | mon $client; |
936 | mon $client; |
920 | mon $client, sub { |
937 | mon $client, psub { |
921 | delete $clients{$client}; |
938 | delete $clients{$client}; |
922 | msg "$nick (quits, @_)"; |
939 | msg "$nick (quits, @_)"; |
923 | }; |
940 | }; |
924 | |
941 | |
925 | $clients{$client} = $client; |
942 | $clients{$client} = $client; |
… | |
… | |
932 | warn "server ready.\n"; |
949 | warn "server ready.\n"; |
933 | |
950 | |
934 | AnyEvent->condvar->recv; |
951 | AnyEvent->condvar->recv; |
935 | |
952 | |
936 | It starts out not much different then the previous example, except that |
953 | It starts out not much different then the previous example, except that |
937 | this time, we register the node port in the global group and not any port |
954 | this time, we register the node port in the database and not a port we |
938 | we created - the clients only want to know which node the server should be |
955 | created - the clients only want to know which node the server should |
|
|
956 | be running on, and there can only be one such server (or service) per |
939 | running on. In fact, they could also use some kind of election mechanism, |
957 | node. In fact, the clients could also use some kind of election mechanism, |
940 | to find the node with lowest load or something like that. |
958 | to find the node with lowest node ID, or lowest load, or something like |
|
|
959 | that. |
941 | |
960 | |
942 | The more interesting change is that indeed no server port is created - |
961 | The much more interesting difference to the previous server is that |
943 | the server consists only of code, and "does" nothing by itself. All it |
962 | indeed no server port is created - the server consists only of code, |
944 | does is define a function C<client_connect>, which expects a client port |
963 | and "does" nothing by itself. All it "does" is to define a function |
945 | and a nick name as arguments. It then monitors the client port and binds |
964 | named C<client_connect>, which expects a client port and a nick name as |
946 | a receive callback on C<$SELF>, which expects messages that in turn are |
965 | arguments. It then monitors the client port and binds a receive callback |
947 | broadcast to all clients. |
966 | on C<$SELF>, which expects messages that in turn are broadcast to all |
|
|
967 | clients. |
948 | |
968 | |
949 | The two C<mon> calls are a bit tricky - the first C<mon> is a shorthand |
969 | The two C<mon> calls are a bit tricky - the first C<mon> is a shorthand |
950 | for C<mon $client, $SELF>. The second does the normal "client has gone |
970 | for C<mon $client, $SELF>. The second does the normal "client has gone |
951 | away" clean-up action. Both could actually be rolled into one C<mon> |
971 | away" clean-up action. |
952 | action. |
|
|
953 | |
972 | |
954 | C<$SELF> is a good hint that something interesting is going on. And |
973 | The last line, the C<rcv $SELF>, is a good hint that something interesting |
955 | indeed, when looking at the client code, there is a new function, |
974 | is going on. And indeed, when looking at the client code, you can see a |
956 | C<spawn>: |
975 | new function, C<spawn>: |
|
|
976 | #todo# |
957 | |
977 | |
958 | use common::sense; |
978 | use common::sense; |
959 | use AnyEvent::MP; |
979 | use AnyEvent::MP; |
960 | use AnyEvent::MP::Global; |
980 | use AnyEvent::MP::Global; |
961 | |
981 | |
… | |
… | |
1004 | $server = spawn $servernodes->[0], "::client_connect", $client, $nick; |
1024 | $server = spawn $servernodes->[0], "::client_connect", $client, $nick; |
1005 | mon $server, $client; |
1025 | mon $server, $client; |
1006 | |
1026 | |
1007 | And of course the first thing after creating it is monitoring it. |
1027 | And of course the first thing after creating it is monitoring it. |
1008 | |
1028 | |
1009 | The C<spawn> function creates a new port on a remote node and returns |
1029 | Phew, let's go through this in slow motion: the C<spawn> function creates |
1010 | its port ID. After creating the port it calls a function on the remote |
1030 | a new port on a remote node and returns its port ID. After creating |
1011 | node, passing any remaining arguments to it, and - most importantly - |
1031 | the port it calls a function on the remote node, passing any remaining |
1012 | executes the function within the context of the new port, so it can be |
1032 | arguments to it, and - most importantly - executes the function within |
1013 | manipulated by referring to C<$SELF>. The init function can reside in a |
1033 | the context of the new port, so it can be manipulated by referring to |
1014 | module (actually it normally I<should> reside in a module) - AnyEvent::MP |
1034 | C<$SELF>. The init function can reside in a module (actually it normally |
1015 | will automatically load the module if the function isn't defined. |
1035 | I<should> reside in a module) - AnyEvent::MP will automatically load the |
|
|
1036 | module if the function isn't defined. |
1016 | |
1037 | |
1017 | The C<spawn> function returns immediately, which means you can instantly |
1038 | The C<spawn> function returns immediately, which means you can instantly |
1018 | send messages to the port, long before the remote node has even heard |
1039 | send messages to the port, long before the remote node has even heard |
1019 | of our request to create a port on it. In fact, the remote node might |
1040 | of our request to create a port on it. In fact, the remote node might |
1020 | not even be running. Despite these troubling facts, everything should |
1041 | not even be running. Despite these troubling facts, everything should |
… | |
… | |
1029 | cleaned up on connection loss. When the remote node comes up again and our |
1050 | cleaned up on connection loss. When the remote node comes up again and our |
1030 | monitoring message can be delivered, it will instantly fail because the |
1051 | monitoring message can be delivered, it will instantly fail because the |
1031 | port has been cleaned up in the meantime. |
1052 | port has been cleaned up in the meantime. |
1032 | |
1053 | |
1033 | If your head is spinning by now, that's fine - just keep in mind, after |
1054 | If your head is spinning by now, that's fine - just keep in mind, after |
1034 | creating a port, monitor it on the local node, and monitor "the other |
1055 | creating a port using C<spawn>, monitor it on the local node, and monitor |
1035 | side" from the remote node, and all will be cleaned up just fine. |
1056 | "the other side" from the remote node, and all will be cleaned up just |
|
|
1057 | fine. |
1036 | |
1058 | |
1037 | =head2 Services |
1059 | =head2 Services |
1038 | |
1060 | |
1039 | Above it was mentioned that C<spawn> automatically loads modules, and this |
1061 | Above it was mentioned that C<spawn> automatically loads modules, and this |
1040 | can be exploited in various ways. |
1062 | can be exploited in various ways. |