… | |
… | |
410 | to see which messages are exchanged, or starting the sender first and see |
410 | to see which messages are exchanged, or starting the sender first and see |
411 | how long it takes it to find the receiver. |
411 | how long it takes it to find the receiver. |
412 | |
412 | |
413 | =head3 Splitting Network Configuration and Application Code |
413 | =head3 Splitting Network Configuration and Application Code |
414 | |
414 | |
415 | #TODO# |
|
|
416 | OK, so far, this works. In the real world, however, the person configuring |
415 | OK, so far, this works reasonably. In the real world, however, the person |
417 | your application to run on a specific network (the end user or network |
416 | configuring your application to run on a specific network (the end user |
418 | administrator) is often different to the person coding the application. |
417 | or network administrator) is often different to the person coding the |
|
|
418 | application. |
419 | |
419 | |
420 | Or to put it differently: the arguments passed to configure are usually |
420 | Or to put it differently: the arguments passed to configure are usually |
421 | provided not by the programmer, but by whoever is deploying the program. |
421 | provided not by the programmer, but by whoever is deploying the program - |
|
|
422 | even in the example above, we would like to be able to just start senders |
|
|
423 | and receivers without having to patch the programs. |
422 | |
424 | |
423 | To make this easy, AnyEvent::MP supports a simple configuration database, |
425 | To make this easy, AnyEvent::MP supports a simple configuration database, |
424 | using profiles, which can be managed using the F<aemp> command-line |
426 | using profiles, which can be managed using the F<aemp> command-line |
425 | utility (yes, this section is about the advanced tinkering we mentioned |
427 | utility (yes, this section is about the advanced tinkering mentioned |
426 | before). |
428 | before). |
427 | |
429 | |
428 | When you change both programs above to simply call |
430 | When you change both programs above to simply call |
429 | |
431 | |
430 | configure; |
432 | configure; |
… | |
… | |
441 | |
443 | |
442 | aemp profile seed binds "*:4040" |
444 | aemp profile seed binds "*:4040" |
443 | |
445 | |
444 | And we configure all nodes to use this as seed node (this only works when |
446 | And we configure all nodes to use this as seed node (this only works when |
445 | running on the same host, for multiple machines you would provide the IP |
447 | running on the same host, for multiple machines you would provide the IP |
446 | address or hostname of the node running the seed), and use a random name |
448 | address or hostname of the node running the seed), by changing the global |
447 | (because we want to start multiple nodes on the same host): |
449 | settings shared between all profiles: |
448 | |
450 | |
449 | aemp seeds "*:4040" nodeid anon/ |
451 | aemp seeds "*:4040" |
450 | |
452 | |
451 | Then we run the seed node: |
453 | Then we run the seed node: |
452 | |
454 | |
453 | aemp run profile seed |
455 | aemp run profile seed |
454 | |
456 | |
455 | After that, we can start as many other nodes as we want, and they will all |
457 | After that, we can start as many other nodes as we want, and they will |
456 | use our generic seed node to discover each other. |
458 | all use our generic seed node to discover each other. The reason we can |
457 | |
459 | start our existing programs even though they specify "incompatible" |
458 | In fact, starting many receivers nicely illustrates that the time sender |
460 | parameters to C<configure> is that the configuration file (by default) |
459 | can have multiple receivers. |
461 | takes precedence over any arguments passed to C<configure>. |
460 | |
462 | |
461 | That's all for now - next we will teach you about monitoring by writing a |
463 | That's all for now - next we will teach you about monitoring by writing a |
462 | simple chat client and server :) |
464 | simple chat client and server :) |
463 | |
465 | |
464 | =head1 PART 2: Monitoring, Supervising, Exception Handling and Recovery |
466 | =head1 PART 2: Monitoring, Supervising, Exception Handling and Recovery |
… | |
… | |
473 | there. Going one step further: most bigger real-world applications even |
475 | there. Going one step further: most bigger real-world applications even |
474 | want to ensure that if some component is missing, or has crashed, it will |
476 | want to ensure that if some component is missing, or has crashed, it will |
475 | still be there, by recovering and restarting the service. |
477 | still be there, by recovering and restarting the service. |
476 | |
478 | |
477 | AnyEvent::MP supports this by catching exceptions and network problems, |
479 | AnyEvent::MP supports this by catching exceptions and network problems, |
478 | and notifying interested parties of this. |
480 | and notifying interested parties of these. |
479 | |
481 | |
480 | =head2 Exceptions, Port Context, Network Errors and Monitors |
482 | =head2 Exceptions, Port Context, Network Errors and Monitors |
481 | |
483 | |
482 | =head3 Exceptions |
484 | =head3 Exceptions |
483 | |
485 | |
484 | Exceptions are handled on a per-port basis: receive callbacks are executed |
486 | Exceptions are handled on a per-port basis: all receive callbacks are |
485 | in a special context, the so-called I<port-context>: code that throws an |
487 | executed in a special context, the so-called I<port-context>: code |
486 | otherwise uncaught exception will cause the port to be C<kil>led. Killed |
488 | that throws an otherwise uncaught exception will cause the port to be |
487 | ports are destroyed automatically (killing ports is the only way to free |
489 | C<kil>led. Killed ports are destroyed automatically (killing ports is |
488 | ports, incidentally). |
490 | actually the only way to free ports). |
489 | |
491 | |
490 | Ports can be monitored, even from a different host, and when a port is |
492 | Ports can be monitored, even from a different node and host, and when a |
491 | killed any entity monitoring it will be notified. |
493 | port is killed, any entity monitoring it will be notified. |
492 | |
494 | |
493 | Here is a simple example: |
495 | Here is a simple example: |
494 | |
496 | |
495 | use AnyEvent::MP; |
497 | use AnyEvent::MP; |
496 | |
498 | |
… | |
… | |
503 | }; |
505 | }; |
504 | |
506 | |
505 | # now send it some message, causing it to die: |
507 | # now send it some message, causing it to die: |
506 | snd $port; |
508 | snd $port; |
507 | |
509 | |
|
|
510 | AnyEvent->condvar->recv; |
|
|
511 | |
508 | It first creates a port whose only action is to throw an exception, |
512 | It first creates a port whose only action is to throw an exception, |
509 | and then monitors it with the C<mon> function. Afterwards it sends it a |
513 | and then monitors it with the C<mon> function. Afterwards it sends it a |
510 | message, causing it to die and call the monitoring callback: |
514 | message, causing it to die and call the monitoring callback: |
511 | |
515 | |
512 | anon/6WmIpj.a was killed (with reason die oops at xxx line 5.) at xxx line 9. |
516 | anon/6WmIpj.a was killed (with reason die oops at xxx line 5.) at xxx line 9. |
513 | |
517 | |
514 | The callback was actually passed two arguments: C<die> (to indicate it did |
518 | The callback was actually passed two arguments: C<die>, to indicate it |
515 | throw an exception as opposed to, say, a network error) and the exception |
519 | did throw an I<exception> as opposed to, say, a network error, and the |
516 | message itself. |
520 | exception message itself. |
517 | |
521 | |
518 | What happens when a port is killed before we have a chance to monitor |
522 | What happens when a port is killed before we have a chance to monitor |
519 | it? Granted, this is highly unlikely in our example, but when you program |
523 | it? Granted, this is highly unlikely in our example, but when you program |
520 | in a network this can easily happen due to races between nodes. |
524 | in a network this can easily happen due to races between nodes. |
521 | |
525 | |
… | |
… | |
527 | |
531 | |
528 | mon $port, sub { |
532 | mon $port, sub { |
529 | warn "$port was killed (with reason @_)"; |
533 | warn "$port was killed (with reason @_)"; |
530 | }; |
534 | }; |
531 | |
535 | |
|
|
536 | AnyEvent->condvar->recv; |
|
|
537 | |
532 | This time we will get something like: |
538 | This time we will get something like: |
533 | |
539 | |
534 | anon/zpX.a was killed (with reason no_such_port cannot monitor nonexistent port) |
540 | anon/zpX.a was killed (with reason no_such_port cannot monitor nonexistent port) |
535 | |
541 | |
536 | Since the port was already gone, the kill reason is now C<no_such_port> |
542 | Since the port was already gone, the kill reason is now C<no_such_port> |
537 | with some descriptive (we hope) error message. |
543 | with some descriptive (we hope) error message. |
538 | |
544 | |
539 | In fact, the kill reason is usually some identifier as first argument |
545 | In fact, the kill reason is usually some identifier as first argument |
540 | and a human-readable error message as second argument, but can be about |
546 | and a human-readable error message as second argument, but can be about |
541 | anything (it's a list) or even nothing - which is called a "normal" kill. |
547 | anything (it is simply a list of values you can choose yourself) or even |
|
|
548 | nothing - which is called a "normal" kill. |
542 | |
549 | |
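To make the convention concrete, here is a minimal sketch of how a monitor callback could interpret its arguments. It assumes the convention described above (identifier first, human-readable message second, empty list for a "normal" kill). The helper C<describe_kill_reason> is hypothetical and not part of AnyEvent::MP:

```perl
use strict;
use warnings;

# Hypothetical helper (not an AnyEvent::MP function): turn the argument
# list that a monitor callback receives into a one-line description.
sub describe_kill_reason {
   my (@reason) = @_;

   # an empty list means a "normal" kill, which is not an error
   return "normal kill" unless @reason;

   my ($tag, @rest) = @reason;

   # by convention the first value is an identifier and the second a
   # human-readable message, but any list of values is allowed
   return @rest ? "$tag: @rest" : "$tag";
}

printf "%s\n", describe_kill_reason();
printf "%s\n", describe_kill_reason(no_such_port => "cannot monitor nonexistent port");
printf "%s\n", describe_kill_reason(custom_error => "don't like your steenking face");
```

Inside a real monitor you would presumably call it as C<describe_kill_reason (@_)> from the callback passed to C<mon>.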
543 | You can kill ports manually using the C<kil> function, which will be |
550 | You can kill ports manually using the C<kil> function, which will be |
544 | treated like an error when any reason is specified: |
551 | treated like an error when any reason is specified: |
545 | |
552 | |
546 | kil $port, custom_error => "don't like your steenking face"; |
553 | kil $port, custom_error => "don't like your steenking face"; |
… | |
… | |
555 | |
562 | |
556 | mon $port, $SELF; |
563 | mon $port, $SELF; |
557 | |
564 | |
558 | This basically means "monitor $port and kill me when it crashes". And a |
565 | This basically means "monitor $port and kill me when it crashes". And a |
559 | "normal" kill does not count as a crash. This way you can easily link |
566 | "normal" kill does not count as a crash. This way you can easily link |
560 | ports together and make them crash together on errors (but allow you to |
567 | ports together and make them crash together on errors, while allowing you |
561 | remove a port silently). |
568 | to remove a port silently. |
562 | |
569 | |
563 | =head3 Port Context |
570 | =head3 Port Context |
564 | |
571 | |
565 | When code runs in an environment where C<$SELF> contains its own port ID |
572 | When code runs in a port context, that means C<$SELF> contains its own |
566 | and exceptions will be caught, it is said to run in a port context. |
573 | port ID and exceptions that the code throws will be caught. |
567 | |
574 | |
568 | Since AnyEvent::MP is event-based, it is not uncommon to register |
575 | Since AnyEvent::MP is event-based, it is not uncommon to register |
569 | callbacks from C<rcv> handlers. As an example, assume that the port receive |
576 | callbacks from C<rcv> handlers. As an example, assume that the port receive |
570 | handler wants to C<die> a second later, using C<after>: |
577 | handler wants to C<die> a second later, using C<after>: |
571 | |
578 | |
… | |
… | |
585 | |
592 | |
586 | C<psub> stores C<$SELF> and returns a code reference. When the code |
593 | C<psub> stores C<$SELF> and returns a code reference. When the code |
587 | reference is invoked, it will run the code block within the context of |
594 | reference is invoked, it will run the code block within the context of |
588 | that port, so exception handling once more works as expected. |
595 | that port, so exception handling once more works as expected. |
589 | |
596 | |
590 | There is also a way to temporarily execute code in the context of some |
597 | There is even a way to temporarily execute code in the context of some |
591 | port, namely C<peval>: |
598 | port, namely C<peval>: |
592 | |
599 | |
593 | peval $port, sub { |
600 | peval $port, sub { |
594 | # die'ing here will kil $port |
601 | # die'ing here will kil $port |
595 | }; |
602 | }; |
… | |
… | |
600 | =head3 Network Errors and the AEMP Guarantee |
607 | =head3 Network Errors and the AEMP Guarantee |
601 | |
608 | |
602 | I mentioned another important source of monitoring failures: network |
609 | I mentioned another important source of monitoring failures: network |
603 | problems. When a node loses connection to another node, it will invoke all |
610 | problems. When a node loses connection to another node, it will invoke all |
604 | monitoring actions as if the port was killed, even if it is possible that |
611 | monitoring actions as if the port was killed, even if it is possible that |
605 | the port still lives happily on another node (not being able to talk to a |
612 | the port is still happily alive on another node (not being able to talk to |
606 | node means we have no clue what's going on with it, it could be crashed, |
613 | a node means we have no clue what's going on with it, it could be crashed, |
607 | but also still running without knowing we lost the connection). |
614 | but also still running without knowing we lost the connection). |
608 | |
615 | |
609 | So another way to view monitors is "notify me when some of my messages |
616 | So another way to view monitors is "notify me when some of my messages |
610 | couldn't be delivered". AEMP has a guarantee about message delivery to a |
617 | couldn't be delivered". AEMP has a guarantee about message delivery to a |
611 | port: After starting a monitor, any message sent to a port will either |
618 | port: After starting a monitor, any message sent to a port will either |
… | |
… | |
617 | you can get while staying sane: Specifically, it means that there will |
624 | you can get while staying sane: Specifically, it means that there will |
618 | be no "holes" in the message sequence: all messages sent are delivered |
625 | be no "holes" in the message sequence: all messages sent are delivered |
619 | in order, without any missing in between, and when some were lost, you |
626 | in order, without any missing in between, and when some were lost, you |
620 | I<will> be notified of that, so you can take recovery action. |
627 | I<will> be notified of that, so you can take recovery action. |
621 | |
628 | |
|
|
629 | And, obviously, the guarantee only works in the presence of |
|
|
630 | correctly working hardware and the absence of relevant bugs in AEMP itself. |
|
|
631 | |
622 | =head3 Supervising |
632 | =head3 Supervising |
623 | |
633 | |
624 | Ok, so what is this crashing-everything-stuff going to make applications |
634 | OK, so how is this crashing-everything-stuff going to make applications |
625 | I<more> stable? Well in fact, the goal is not really to make them more |
635 | I<more> stable? Well, in fact, the goal is not really to make them more |
626 | stable, but to make them more resilient against actual errors and |
636 | stable, but to make them more resilient against actual errors and |
627 | crashes. And this is not done by crashing I<everything>, but by crashing |
637 | crashes. And this is not done by crashing I<everything>, but by crashing |
628 | everything except a supervisor. |
638 | everything except a I<supervisor>. |
629 | |
639 | |
630 | A supervisor is simply some code that ensures that an application (or a |
640 | A supervisor is simply some code that ensures that an application (or a |
631 | part of it) is running, and if it crashes, is restarted properly. |
641 | part of it) is running, and if it crashes, is restarted properly. That is, |
|
|
642 | it supervises a service by starting and restarting it, as necessary. |
632 | |
643 | |
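The idea of "start it, and restart it when it crashes" can be sketched in plain Perl, without any AEMP involvement. This is only a synchronous analogy under stated assumptions: the function C<supervise> and its restart limit are hypothetical, and an AEMP supervisor would rely on monitors rather than C<eval>:

```perl
use strict;
use warnings;

# Hypothetical helper, not an AnyEvent::MP API: run $service and restart
# it whenever it dies, at most $max_restarts times. Returns the number of
# restarts that were needed.
sub supervise {
   my ($service, $max_restarts) = @_;

   my $restarts = 0;

   while (1) {
      # eval catches the "crash"; the trailing 1 signals a clean exit
      return $restarts if eval { $service->(); 1 };

      chomp (my $error = $@);

      die "giving up after $restarts restarts: $error\n"
         if $restarts >= $max_restarts;

      warn "service crashed ($error), restarting\n";
      ++$restarts;
   }
}

# a toy service that crashes twice before it succeeds
my $attempts = 0;
my $restarts = supervise(sub { die "oops\n" if ++$attempts < 3 }, 5);

printf "service finished after %d restarts\n", $restarts;
```

The important property is that only the service crashes. The supervising code survives every failure and decides what happens next.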
633 | To show how to do all this we will create a simple chat server that can |
644 | To show how to do all this we will create a simple chat server that can |
634 | handle many chat clients. Both server and clients can be killed and |
645 | handle many chat clients. Both server and clients can be killed and |
635 | restarted, and even crash, to some extent. |
646 | restarted, and even crash, to some extent, without disturbing the chat |
|
|
647 | functionality. |
636 | |
648 | |
637 | =head2 Chatting, the Resilient Way |
649 | =head2 Chatting, the Resilient Way |
638 | |
650 | |
639 | Without further ado, here is the chat server (to run it, we assume the |
651 | Without further ado, here is the chat server (to run it, we assume the |
640 | set-up explained earlier, with a separate F<aemp run> seed node): |
652 | set-up explained earlier, with a separate F<aemp run seed> node): |
641 | |
653 | |
642 | use common::sense; |
654 | use common::sense; |
643 | use AnyEvent::MP; |
655 | use AnyEvent::MP; |
644 | use AnyEvent::MP::Global; |
656 | use AnyEvent::MP::Global; |
645 | |
657 | |
… | |
… | |
670 | rcv $server, privmsg => sub { |
682 | rcv $server, privmsg => sub { |
671 | my ($nick, $msg) = @_; |
683 | my ($nick, $msg) = @_; |
672 | msg "$nick: $msg"; |
684 | msg "$nick: $msg"; |
673 | }; |
685 | }; |
674 | |
686 | |
675 | grp_reg eg_chat_server => $server; |
687 | db_set eg_chat_server => $server; |
676 | |
688 | |
677 | warn "server ready.\n"; |
689 | warn "server ready.\n"; |
678 | |
690 | |
679 | AnyEvent->condvar->recv; |
691 | AnyEvent->condvar->recv; |
680 | |
692 | |
… | |
… | |
735 | Again, here is the client, including supervisor, which makes it a bit |
747 | Again, here is the client, including supervisor, which makes it a bit |
736 | longer: |
748 | longer: |
737 | |
749 | |
738 | use common::sense; |
750 | use common::sense; |
739 | use AnyEvent::MP; |
751 | use AnyEvent::MP; |
740 | use AnyEvent::MP::Global; |
|
|
741 | |
752 | |
742 | my $nick = shift; |
753 | my $nick = shift || "anonymous"; |
743 | |
754 | |
744 | configure; |
755 | configure; |
745 | |
756 | |
746 | my ($client, $server); |
757 | my ($client, $server); |
747 | |
758 | |
748 | sub server_connect { |
759 | sub server_connect { |
749 | my $servernodes = grp_get "eg_chat_server" |
760 | my $db_mon; |
750 | or return after 1, \&server_connect; |
761 | $db_mon = db_mon eg_chat_server => sub { |
|
|
762 | return unless %{ $_[0] }; |
|
|
763 | undef $db_mon; |
751 | |
764 | |
752 | print "\rconnecting...\n"; |
765 | print "\rconnecting...\n"; |
753 | |
766 | |
754 | $client = port { print "\r \r@_\n> " }; |
767 | $client = port { print "\r \r@_\n> " }; |
755 | mon $client, sub { |
768 | mon $client, sub { |
756 | print "\rdisconnected @_\n"; |
769 | print "\rdisconnected @_\n"; |
757 | &server_connect; |
770 | &server_connect; |
|
|
771 | }; |
|
|
772 | |
|
|
773 | $server = (keys %{ $_[0] })[0]; |
|
|
774 | |
|
|
775 | snd $server, join => $client, $nick; |
|
|
776 | mon $server, $client; |
758 | }; |
777 | }; |
759 | |
|
|
760 | $server = $servernodes->[0]; |
|
|
761 | snd $server, join => $client, $nick; |
|
|
762 | mon $server, $client; |
|
|
763 | } |
778 | } |
764 | |
779 | |
765 | server_connect; |
780 | server_connect; |
766 | |
781 | |
767 | my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub { |
782 | my $w = AnyEvent->io (fh => 0, poll => 'r', cb => sub { |
… | |
… | |
779 | expected as the only command line argument) in C<$nick>, for further |
794 | expected as the only command line argument) in C<$nick>, for further |
780 | usage. |
795 | usage. |
781 | |
796 | |
782 | The next relevant thing is... finally... the supervisor: |
797 | The next relevant thing is... finally... the supervisor: |
783 | |
798 | |
|
|
799 | #todo#d# |
784 | sub server_connect { |
800 | sub server_connect { |
785 | my $servernodes = grp_get "eg_chat_server" |
801 | my $servernodes = grp_get "eg_chat_server" |
786 | or return after 1, \&server_connect; |
802 | or return after 1, \&server_connect; |
787 | |
803 | |
788 | This looks up the server in the C<eg_chat_server> global group. If it |
804 | This looks up the server in the C<eg_chat_server> global group. If it |