
That's all for now - you will find some more advanced fiddling with the
C<aemp> utility later.

=head1 PART 1: Passing Messages Between Processes

=head2 The Receiver

Let's split the previous example up into two programs: one that contains
the sender and one for the receiver. First the receiver application, in
…
Or to put it differently: the arguments passed to configure are usually
provided not by the programmer, but by whoever is deploying the program.

To make this easy, AnyEvent::MP supports a simple configuration database,
using profiles, which can be managed using the F<aemp> command-line
utility (yes, this section is about the advanced tinkering we mentioned
before).

When you change both programs above to simply call

   configure;

…

   aemp profile seed binds "*:4040"

And we configure all nodes to use this as seed node (this only works when
running on the same host; for multiple machines you would provide the IP
address or hostname of the node running the seed), and use a random name
(because we want to start multiple nodes on the same host):

   aemp seeds "*:4040" nodeid anon/

Then we run the seed node:

   aemp run profile seed

…
use our generic seed node to discover each other.

In fact, starting many receivers nicely illustrates that the time sender
can have multiple receivers.

That's all for now - next we will teach you about monitoring by writing a
simple chat client and server :)
|
|

=head1 PART 2: Monitoring, Supervising, Exception Handling and Recovery

That's a mouthful, so what does it mean? Our previous example is what one
could call "very loosely coupled" - the sender doesn't care about whether
there are any receivers, and the receivers do not care if there is any
sender.

This can work fine for simple services, but most real-world applications
want to ensure that the side they are expecting to be there is actually
there. Going one step further: most bigger real-world applications even
want to ensure that if some component is missing, or has crashed, it will
be made available again, by recovering and restarting the service.

AnyEvent::MP supports this by catching exceptions and network problems,
and notifying interested parties of this.

|
|
=head2 Exceptions, Network Errors and Monitors

=head3 Exceptions

Exceptions are handled on a per-port basis: receive callbacks are executed
in a special context, the port-context, and code that throws an uncaught
exception will cause the port to be C<kil>led. Killed ports are destroyed
automatically (killing ports is the only way to free ports, incidentally).

|
|
Ports can be monitored, even from a different host, and when a port is
killed any entity monitoring it will be notified.

Here is a simple example:

   use AnyEvent::MP;

   # create a port, it always dies
   my $port = port { die "oops" };

   # monitor it
   mon $port, sub {
      warn "$port was killed (with reason @_)";
   };

   # now send it some message, causing it to die:
   snd $port;

|
|
It first creates a port whose only action is to throw an exception,
and then monitors it with the C<mon> function. Afterwards it sends it a
message, causing it to die and call the monitoring callback:

   anon/6WmIpj.a was killed (with reason die oops at xxx line 5.) at xxx line 9.

The callback was actually passed two arguments: C<die> (to indicate it did
throw an exception as opposed to, say, a network error) and the exception
message itself.

|
|
What happens when a port is killed before we have a chance to monitor
it? Granted, this is highly unlikely in our example, but when you program
in a network this can easily happen due to races between nodes.

   use AnyEvent::MP;

   my $port = port { die "oops" };

   snd $port;

   mon $port, sub {
      warn "$port was killed (with reason @_)";
   };

|
|
This time we will get something like:

   anon/zpX.a was killed (with reason no_such_port cannot monitor nonexistent port)

Since the port was already gone, the kill reason is now C<no_such_port>
with some descriptive (we hope) error message.

In fact, the kill reason is usually some identifier as first argument
and a human-readable error message as second argument, but can be about
anything (it's a list) or even nothing - which is called a "normal" kill.
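
As an illustration (plain Perl, no AnyEvent::MP required), a monitoring
callback can dispatch on that reason list; the C<die> and C<no_such_port>
identifiers are the ones shown above, everything else here is made up:

```perl
# Sketch of a mon-style callback dispatching on the kill reason.
# An empty list is a "normal" kill; otherwise the first element is
# an identifier and the second a human-readable message.
sub describe_kill {
   my (@reason) = @_;

   return "normal kill" unless @reason;

   my ($id, $msg) = @reason;
   return "exception: $msg"  if $id eq "die";
   return "vanished: $msg"   if $id eq "no_such_port";
   "error: @reason"
}

print describe_kill (), "\n";
print describe_kill (die => "oops at x line 5."), "\n";
```

Passing such a sub to C<mon> would give readable diagnostics for the
cases shown in this section.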
|
|

You can kill ports manually using the C<kil> function, which will be
treated like an error when any reason is specified:

   kil $port, custom_error => "don't like your steenking face";

And a clean kill without any reason arguments:

   kil $port;

|
|
By now you probably wonder what this "normal" kill business is: a common
idiom is to not specify a callback to C<mon>, but another port, such as
C<$SELF>:

   mon $port, $SELF;

This basically means "monitor $port and kill me when it crashes". And a
"normal" kill does not count as a crash. This way you can easily link
ports together and make them crash together on errors (while still
allowing you to remove a port silently).
|
|

=head3 Network Errors and the AEMP Guarantee

I mentioned another important source of monitoring failures: network
problems. When a node loses connection to another node, it will invoke all
monitoring actions as if the port was killed, even if it is possible that
the port still lives happily on another node (not being able to talk to a
node means we have no clue what's going on with it: it could have crashed,
but it could also still be running without knowing we lost the connection).
|
|

So another way to view monitors is "notify me when some of my messages
couldn't be delivered". AEMP has a guarantee about message delivery to a
port: after starting a monitor, any message sent to a port will either
be delivered, or, when it is lost, any further messages will also be lost
until the monitoring action is invoked. After that, further messages
I<might> get delivered again.

This doesn't sound like a very big guarantee, but it is kind of the best
you can get while staying sane: specifically, it means that there will
be no "holes" in the message sequence: all messages sent are delivered
in order, without any missing in between, and when some were lost, you
I<will> be notified of that, so you can take recovery action.
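
To make the guarantee concrete, here is a toy model in plain Perl (purely
illustrative, not AEMP code): messages are delivered in order until the
link breaks, after which everything is dropped until the monitoring action
has fired - so the delivered sequence never has a hole in the middle:

```perl
# Toy model of the AEMP delivery guarantee.
my @delivered;       # what the receiver saw, in order
my $link_up = 1;     # becomes false on a (simulated) network failure

sub send_msg {
   my ($msg) = @_;
   push @delivered, $msg if $link_up;
   # while the link is down, messages are silently lost until the
   # monitoring action fires - exactly what the guarantee allows
}

send_msg $_ for 1 .. 3;
$link_up = 0;                  # connection lost
send_msg $_ for 4 .. 6;        # lost, but no hole within @delivered
print "@delivered\n";          # prints "1 2 3"
```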
|
|

=head3 Supervising

Ok, so how is this crashing-everything stuff going to make applications
I<more> stable? Well, in fact, the goal is not really to make them more
stable, but to make them more resilient against actual errors and
crashes. And this is not done by crashing I<everything>, but by crashing
everything except a supervisor.

A supervisor is simply some code that ensures that an application (or a
part of it) is running, and if it crashes, is restarted properly.

To show how to do all this we will create a simple chat server that can
handle many chat clients. Both server and clients can be killed and
restarted, and even crash, to some extent.
|
|

=head2 Chatting, the Resilient Way

Without further ado, here is the chat server (to run it, we assume the
set-up explained earlier, with a separate F<aemp run> seed node):

   use common::sense;
   use AnyEvent::MP;
   use AnyEvent::MP::Global;

   configure;

   my %clients;

   sub msg {
      print "relaying: $_[0]\n";
      snd $_, $_[0]
         for values %clients;
   }

   our $server = port;

   rcv $server, join => sub {
      my ($client, $nick) = @_;

      $clients{$client} = $client;

      mon $client, sub {
         delete $clients{$client};
         msg "$nick (quits, @_)";
      };
      msg "$nick (joins)";
   };

   rcv $server, privmsg => sub {
      my ($nick, $msg) = @_;
      msg "$nick: $msg";
   };

   AnyEvent::MP::Global::register $server, "eg_chat_server";

   warn "server ready.\n";

   AnyEvent->condvar->recv;

|
|
Looks like a lot, but it is actually quite simple: after your usual
preamble (this time we use common sense), we define a helper function that
sends some message to every registered chat client:

   sub msg {
      print "relaying: $_[0]\n";
      snd $_, $_[0]
         for values %clients;
   }

The clients are stored in the hash C<%clients>. Then we define a server
port and install two receivers on it: C<join>, which is sent by clients
to join the chat, and C<privmsg>, which clients use to send actual chat
messages.

|
|
C<join> is the most complicated. It expects the client port and the
nickname to be passed in the message, and registers the client in
C<%clients>.

   rcv $server, join => sub {
      my ($client, $nick) = @_;

      $clients{$client} = $client;

The next step is to monitor the client. The monitoring action removes the
client and sends a quit message with the error to all remaining clients.

      mon $client, sub {
         delete $clients{$client};
         msg "$nick (quits, @_)";
      };

And finally, it creates a join message and sends it to all clients.

      msg "$nick (joins)";
   };
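
The per-client cleanup can be exercised in isolation - here is a
plain-Perl toy (no AnyEvent::MP) showing that the monitoring action is a
closure over C<$client> and C<$nick>, so every client carries its own
cleanup; the C<add_client> helper is ours, not from the server above:

```perl
my (%clients, @log);

sub msg { push @log, $_[0] }

# register a client and return the cleanup closure a monitor would run
sub add_client {
   my ($client, $nick) = @_;
   $clients{$client} = $client;

   my $on_death = sub {
      delete $clients{$client};
      msg "$nick (quits, @_)";
   };

   msg "$nick (joins)";
   $on_death
}

my $kill_a = add_client "port_a", "nick1";
my $kill_b = add_client "port_b", "nick2";

$kill_a->("custom_error");          # simulate port_a dying
print scalar keys %clients, "\n";   # prints "1"
print $log[-1], "\n";               # prints "nick1 (quits, custom_error)"
```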
|
|

The C<privmsg> callback simply broadcasts the message to all clients:

   rcv $server, privmsg => sub {
      my ($nick, $msg) = @_;
      msg "$nick: $msg";
   };

And finally, the server registers itself in the server group, so that
clients can find it:

   AnyEvent::MP::Global::register $server, "eg_chat_server";

Well, well... and where is this supervisor stuff? Well... we cheated,
it's not there. To not overcomplicate the example, we only put it into
the..... CLIENT!
|
|

=head3 The Client, and a Supervisor!

Again, here is the client, including supervisor, which makes it a bit
longer:

   use common::sense;
   use AnyEvent::MP;
   use AnyEvent::MP::Global;

   my $nick = shift;

   configure;

   my ($client, $server);

   sub server_connect {
      my $servernodes = AnyEvent::MP::Global::find "eg_chat_server"
         or return after 1, \&server_connect;

      print "\rconnecting...\n";

      $client = port { print "\r \r@_\n> " };
      mon $client, sub {
         print "\rdisconnected @_\n";
         &server_connect;
      };

      $server = $servernodes->[0];
      snd $server, join => $client, $nick;
      mon $server, $client;
   }

   server_connect;

   my $w = AnyEvent->io (fh => *STDIN, poll => 'r', cb => sub {
      chomp (my $line = <STDIN>);
      print "> ";
      snd $server, privmsg => $nick, $line
         if $server;
   });

   $| = 1;
   print "> ";
   AnyEvent->condvar->recv;

The first thing the client does is to store the nickname (which is
expected as the only command line argument) in C<$nick>, for later
use.
|
|

The next relevant thing is... finally... the supervisor:

   sub server_connect {
      my $servernodes = AnyEvent::MP::Global::find "eg_chat_server"
         or return after 1, \&server_connect;

This looks up the server in the C<eg_chat_server> global group. If it
cannot find it (which is likely when the node is just starting up),
it will wait a second and then retry. This "wait a bit and retry"
is an important pattern, as distributed programming means lots of
things are going on asynchronously. In practice, one should use a more
intelligent algorithm, which could for example warn after an excessive
number of retries. Hopefully future versions of AnyEvent::MP will offer
some predefined supervisors; for now you will have to code it on your own.
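
For instance, a slightly more intelligent policy could grow the delay and
warn after too many attempts - a plain-Perl sketch (the C<next_delay>
name is made up here, not AnyEvent::MP API):

```perl
my $retries = 0;

# return the next retry delay in seconds: 1, 2, 4, ... capped at 30,
# warning once the number of attempts gets excessive
sub next_delay {
   ++$retries;

   warn "still not connected after $retries attempts\n"
      if $retries == 10;

   my $delay = 2 ** ($retries - 1);
   $delay < 30 ? $delay : 30
}

# in server_connect one would then write something like:
#    or return after next_delay, \&server_connect;
print next_delay, "\n" for 1 .. 3;   # prints 1, 2, 4
```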
|
|

Next it creates a local port for the server to send messages to, and
monitors it. When the port is killed, it will print "disconnected" and
tell the supervisor function to retry.

   $client = port { print "\r \r@_\n> " };
   mon $client, sub {
      print "\rdisconnected @_\n";
      &server_connect;
   };

Then everything is ready: the client will send a C<join> message with its
local port to the server, and start monitoring it:
|
|

   $server = $servernodes->[0];
   snd $server, join => $client, $nick;
   mon $server, $client;

The monitor will ensure that if the server crashes or goes away, the
client will be killed as well. This tells the user that the client was
disconnected, and it will then try to connect to the server again.

The rest of the program deals with the boring details of actually invoking
the supervisor function to start the whole client process and of handling
the actual terminal input, sending it to the server.
|
|

You should now try to start the server and one or more clients in different
terminal windows (and the seed node):

   perl eg/chat_client nick1
   perl eg/chat_client nick2
   perl eg/chat_server
   aemp run profile seed

And then you can experiment with chatting, killing one or more clients, or
stopping and restarting the server, to see the monitoring in action.

There is ample room for improvement: the server should probably remember
the nickname in the C<join> handler instead of expecting it in every chat
message, it should probably monitor itself, and the client should not try
to send any messages unless a server is actually connected.
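
As a sketch of the first of these improvements (our illustration, not
code from the tutorial): C<%clients> could map the client port to its
nickname, recorded once at C<join> time, so chat messages no longer need
to carry it. The data-structure side, in plain Perl with hypothetical
helper names:

```perl
my %clients;   # client port => nickname, recorded at join time

sub register_client {
   my ($client, $nick) = @_;
   $clients{$client} = $nick;
}

# a privmsg now only needs the sending port and the text
sub format_privmsg {
   my ($client, $msg) = @_;
   my $nick = $clients{$client} // "(unknown)";
   "$nick: $msg"
}

register_client "anon/abc.a", "nick1";
print format_privmsg ("anon/abc.a", "hello"), "\n";   # prints "nick1: hello"
```

The server-side C<rcv $server, privmsg => ...> handler would then look up
the nickname itself instead of trusting the client to send it.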
|
|

=head1 PART 3: TIMTOWTDI: Virtual Connections

#TODO

=head1 SEE ALSO

L<AnyEvent>
