… | |
… | |
754 | The C<GET ...> and the empty line were entered manually, the rest of the |
754 | The C<GET ...> and the empty line were entered manually, the rest of the |
755 | telnet output is Google's response, in this case a C<404 not found> one. |
755 | telnet output is Google's response, in this case a C<404 not found> one. |
756 | |
756 | |
757 | So, here is how you would do it with C<AnyEvent::Handle>: |
757 | So, here is how you would do it with C<AnyEvent::Handle>: |
758 | |
758 | |
759 | ###TODO |
759 | sub http_get { |
|
|
760 | my ($host, $uri, $cb) = @_; |
|
|
761 | |
|
|
762 | tcp_connect $host, "http", sub { |
|
|
763 | my ($fh) = @_ |
|
|
764 | or return $cb->("HTTP/1.0 500 $!"); |
|
|
765 | |
|
|
766 | # store results here |
|
|
767 | my ($response, $header, $body); |
|
|
768 | |
|
|
769 | my $handle; $handle = new AnyEvent::Handle |
|
|
770 | fh => $fh, |
|
|
771 | on_error => sub { |
|
|
772 | undef $handle; |
|
|
773 | $cb->("HTTP/1.0 500 $!"); |
|
|
774 | }, |
|
|
775 | on_eof => sub { |
|
|
776 | undef $handle; # keep it alive till eof |
|
|
777 | $cb->($response, $header, $body); |
|
|
778 | }; |
|
|
779 | |
|
|
780 | $handle->push_write ("GET $uri HTTP/1.0\015\012\015\012"); |
|
|
781 | |
|
|
782 | # now fetch response status line |
|
|
783 | $handle->push_read (line => sub { |
|
|
784 | my ($handle, $line) = @_; |
|
|
785 | $response = $line; |
|
|
786 | }); |
|
|
787 | |
|
|
788 | # then the headers |
|
|
789 | $handle->push_read (line => "\015\012\015\012", sub { |
|
|
790 | my ($handle, $line) = @_; |
|
|
791 | $header = $line; |
|
|
792 | }); |
|
|
793 | |
|
|
794 | # and finally handle any remaining data as body |
|
|
795 | $handle->on_read (sub { |
|
|
796 | $body .= $_[0]->rbuf; |
|
|
797 | $_[0]->rbuf = ""; |
|
|
798 | }); |
|
|
799 | }; |
|
|
800 | } |
760 | |
801 | |
761 | And now let's go through it step by step. First, as usual, the overall |
802 | And now let's go through it step by step. First, as usual, the overall |
762 | C<http_get> function structure: |
803 | C<http_get> function structure: |
763 | |
804 | |
764 | sub http_get { |
805 | sub http_get { |
… | |
… | |
860 | |
901 | |
861 | Reading the response is far more interesting: |
902 | Reading the response is far more interesting: |
862 | |
903 | |
863 | =head3 The read queue |
904 | =head3 The read queue |
864 | |
905 | |
865 | the response consists of three parts: a single line of response status, a |
906 | The response consists of three parts: a single line of response status, a |
866 | single paragraph of headers ended by an empty line, and the response body, |
907 | single paragraph of headers ended by an empty line, and the response body, |
867 | which is simply the remaining data on that connection. |
908 | which is simply the remaining data on that connection. |
868 | |
909 | |
869 | For the first two, we push two read requests onto the read queue: |
910 | For the first two, we push two read requests onto the read queue: |
870 | |
911 | |
… | |
… | |
904 | $_[0]->rbuf = ""; |
945 | $_[0]->rbuf = ""; |
905 | }); |
946 | }); |
906 | |
947 | |
907 | This callback is invoked every time data arrives and the read queue is |
948 | This callback is invoked every time data arrives and the read queue is |
908 | empty - which in this example will only be the case when both response and |
949 | empty - which in this example will only be the case when both response and |
909 | header have been read. |
950 | header have been read. The C<on_read> callback could actually have been |
|
|
951 | specified when constructing the object, but doing it this way preserves |
|
|
952 | logical ordering. |
910 | |
953 | |
|
|
954 | The read callback simply adds the current read buffer to its C<$body> |
|
|
955 | variable and, most importantly, I<empties> it by assigning the empty string |
|
|
956 | to it. |
911 | |
957 | |
912 | ############################################################################# |
958 | After AnyEvent::Handle has been so instructed, it will now handle incoming |
|
|
959 | data according to these instructions - if all goes well, the callback will |
|
|
960 | be invoked with the response data, if not, it will get an error. |
913 | |
961 | |
914 | Now let's start with something simple: a program that reads from standard |
962 | In general, you get pipelining very easily with AnyEvent::Handle: If |
915 | input in a non-blocking way, that is, in a way that lets your program do |
963 | you have a protocol with a request/response structure, your request |
916 | other things while it is waiting for input. |
964 | methods/functions will all look like this (simplified): |
917 | |
965 | |
918 | First, the full program listing: |
966 | sub request { |
919 | |
967 | |
920 | #!/usr/bin/perl |
968 | # send the request to the server |
|
|
969 | $handle->push_write (...); |
921 | |
970 | |
922 | use AnyEvent; |
971 | # push some response handlers |
923 | use AnyEvent::Handle; |
972 | $handle->push_read (...); |
|
|
973 | } |
924 | |
974 | |
925 | my $end_prog = AnyEvent->condvar; |
975 | =head3 Using it |
926 | |
976 | |
927 | my $handle = |
977 | And here is how you would use it: |
928 | AnyEvent::Handle->new ( |
978 | |
929 | fh => \*STDIN, |
979 | http_get "www.google.com", "/", sub { |
930 | on_eof => sub { |
980 | my ($response, $header, $body) = @_; |
931 | print "received EOF, exiting...\n"; |
981 | |
932 | $end_prog->broadcast; |
982 | print |
933 | }, |
983 | $response, "\n", |
934 | on_error => sub { |
984 | $body; |
935 | print "error while reading from STDIN: $!\n"; |
985 | }; |
936 | $end_prog->broadcast; |
986 | |
|
|
987 | And of course, you can run as many of these requests in parallel as you |
|
|
988 | want (and your memory supports). |
|
|
989 | |
|
|
990 | =head3 HTTPS |
|
|
991 | |
|
|
992 | Now, as promised, let's implement the same thing for HTTPS, or more |
|
|
993 | correctly, let's change our C<http_get> function into a function that |
|
|
994 | speaks HTTPS instead. |
|
|
995 | |
|
|
996 | HTTPS is, quite simply, a standard TLS connection (B<T>ransport B<L>ayer |
|
|
997 | B<S>ecurity is the official name for what most people refer to as C<SSL>) |
|
|
998 | that contains standard HTTP protocol exchanges. The other difference to |
|
|
999 | HTTP is that it uses port C<443> instead of port C<80>. |
|
|
1000 | |
|
|
1001 | To implement these two differences we need two tiny changes. First, in the C<tcp_connect> call |
|
|
1002 | we replace C<http> by C<https>: |
|
|
1003 | |
|
|
1004 | tcp_connect $host, "https", sub { ... |
|
|
1005 | |
|
|
1006 | The other change deals with TLS, which is something L<AnyEvent::Handle> |
|
|
1007 | does for us, as long as I<you> made sure that the L<Net::SSLeay> module is |
|
|
1008 | around. To enable TLS with L<AnyEvent::Handle>, we simply pass an additional |
|
|
1009 | C<tls> parameter to the call to C<AnyEvent::Handle::new>: |
|
|
1010 | |
|
|
1011 | tls => "connect", |
|
|
1012 | |
|
|
1013 | Specifying C<tls> enables TLS, and the argument specifies whether |
|
|
1014 | AnyEvent::Handle is the server side ("accept") or the client side |
|
|
1015 | ("connect") for the TLS connection, as unlike TCP, there is a clear |
|
|
1016 | server/client relationship in TLS. |
|
|
1017 | |
|
|
1018 | That's all. |
|
|
1019 | |
|
|
1020 | Of course, all this should be handled transparently by C<http_get> after |
|
|
1021 | parsing the URL. See the part about exercising your inspiration earlier in |
|
|
1022 | this document. |
|
|
1023 | |
|
|
1024 | =head3 The read queue - revisited |
|
|
1025 | |
|
|
1026 | HTTP always uses the same structure in its responses, but many protocols |
|
|
1027 | require parsing responses differently depending on the response itself. |
|
|
1028 | |
|
|
1029 | For example, in SMTP, you normally get a single response line: |
|
|
1030 | |
|
|
1031 | 220 mail.example.net Neverusesendmail 8.8.8 <mailme@example.net> |
|
|
1032 | |
|
|
1033 | But SMTP also supports multi-line responses: |
|
|
1034 | |
|
|
1035 | 220-mail.example.net Neverusesendmail 8.8.8 <mailme@example.net> |
|
|
1036 | 220-hey guys |
|
|
1037 | 220 my response is longer than yours |
|
|
1038 | |
|
|
1039 | To handle this, we need C<unshift_read>. As the name (hopefully) implies, |
|
|
1040 | C<unshift_read> will not append your read request to the end of the read |
|
|
1041 | queue, but instead it will prepend it to the queue. |
|
|
1042 | |
|
|
1043 | This is useful in this situation: You push your response-line read |
|
|
1044 | request when sending the SMTP command, and when handling it, you look at |
|
|
1045 | the line to see if more is to come, and C<unshift_read> another reader, |
|
|
1046 | like this: |
|
|
1047 | |
|
|
1048 | my $response; # response lines end up in here |
|
|
1049 | |
|
|
1050 | my $read_response; $read_response = sub { |
|
|
1051 | my ($handle, $line) = @_; |
|
|
1052 | |
|
|
1053 | $response .= "$line\n"; |
|
|
1054 | |
|
|
1055 | # check for continuation lines ("-" as 4th character) |
|
|
1056 | if ($line =~ /^...-/) { |
|
|
1057 | # if yes, then unshift another line read |
|
|
1058 | $handle->unshift_read (line => $read_response); |
|
|
1059 | |
|
|
1060 | } else { |
|
|
1061 | # otherwise we are done |
|
|
1062 | |
|
|
1063 | # free callback |
|
|
1064 | undef $read_response; |
937 | } |
1065 | |
|
|
1066 | print "we are done reading: $response\n"; |
938 | ); |
1067 | } |
|
|
1068 | }; |
|
|
1069 | |
|
|
1070 | $handle->push_read (line => $read_response); |
|
|
1071 | |
|
|
1072 | This recipe can be used for all similar parsing problems, for example in |
|
|
1073 | NNTP, the response code to some commands indicates that more data will be |
|
|
1074 | sent: |
|
|
1075 | |
|
|
1076 | $handle->push_write ("article 42\015\012"); |
|
|
1077 | |
|
|
1078 | # read response line |
|
|
1079 | $handle->push_read (line => sub { |
|
|
1080 | my ($handle, $status) = @_; |
|
|
1081 | |
|
|
1082 | # article data following? |
|
|
1083 | if ($status =~ /^2/) { |
|
|
1084 | # yes, read article body |
|
|
1085 | |
|
|
1086 | $handle->unshift_read (line => "\012.\015\012", sub { |
|
|
1087 | my ($handle, $body) = @_; |
|
|
1088 | |
|
|
1089 | $finish->($status, $body); |
|
|
1090 | }); |
|
|
1091 | |
|
|
1092 | } else { |
|
|
1093 | # some error occurred, no article data |
|
|
1094 | |
|
|
1095 | $finish->($status); |
|
|
1096 | } |
|
|
1097 | }); |
|
|
1098 | |
|
|
1099 | =head3 Your own read queue handler |
|
|
1100 | |
|
|
1101 | Sometimes, your protocol doesn't play nice and uses lines or chunks of |
|
|
1102 | data that the built-in read types cannot parse, so you have to implement your own read parser. |
|
|
1103 | |
|
|
1104 | To make up a contorted example, imagine you are looking for an even |
|
|
1105 | number of characters followed by a colon (":"). Also imagine that |
|
|
1106 | AnyEvent::Handle had no C<regex> read type which could be used, so you'd |
|
|
1107 | have to do it manually. |
|
|
1108 | |
|
|
1109 | To implement this, you would C<push_read> (or C<unshift_read>) just a |
|
|
1110 | single code reference. |
|
|
1111 | |
|
|
1112 | This code reference will then be called each time there is (new) data |
|
|
1113 | available in the read buffer, and is expected to either eat/consume some |
|
|
1114 | of that data (and return true) or to return false to indicate that it |
|
|
1115 | wants to be called again. |
|
|
1116 | |
|
|
1117 | If the code reference returns true, then it will be removed from the read |
|
|
1118 | queue, otherwise it stays in front of it. |
|
|
1119 | |
|
|
1120 | The example above could be coded like this: |
939 | |
1121 | |
940 | $handle->push_read (sub { |
1122 | $handle->push_read (sub { |
941 | my ($handle) = @_; |
1123 | my ($handle) = @_; |
942 | |
1124 | |
943 | if ($handle->rbuf =~ s/^.*?\bend\b.*$//s) { |
1125 | # check for even number of characters + ":" |
944 | print "got 'end', exiting...\n"; |
1126 | # and remove the data if a match is found. |
945 | $end_prog->broadcast; |
1127 | # if not, return false (actually nothing) |
|
|
1128 | |
|
|
1129 | $handle->{rbuf} =~ s/^( (?:..)* ) ://x |
946 | return 1 |
1130 | or return; |
|
|
1131 | |
|
|
1132 | # we got some data in $1, pass it to whoever wants it |
|
|
1133 | $finish->($1); |
|
|
1134 | |
|
|
1135 | # and return true to indicate we are done |
947 | } |
1136 | 1 |
948 | |
|
|
949 | 0 |
|
|
950 | }); |
1137 | }); |
951 | |
1138 | |
952 | $end_prog->recv; |
|
|
953 | |
|
|
954 | That's a mouthful, so let's go through it step by step: |
|
|
955 | |
|
|
956 | #!/usr/bin/perl |
|
|
957 | |
|
|
958 | use AnyEvent; |
|
|
959 | use AnyEvent::Handle; |
|
|
960 | |
|
|
961 | Nothing unexpected here, just load AnyEvent for the event functionality |
|
|
962 | and AnyEvent::Handle for your file handling needs. |
|
|
963 | |
|
|
964 | my $end_prog = AnyEvent->condvar; |
|
|
965 | |
|
|
966 | Here the program creates a so-called 'condition variable': Condition |
|
|
967 | variables are a great way to signal the completion of some event, or to |
|
|
968 | state that some condition became true (thus the name). |
|
|
969 | |
|
|
970 | This condition variable represents the condition that the program wants to |
|
|
971 | terminate. Later in the program, we will 'recv' that condition (call the |
|
|
972 | C<recv> method on it), which will wait until the condition gets signalled |
|
|
973 | (which is done by calling the C<send> method on it). |
|
|
974 | |
|
|
975 | The next step is to create the handle object: |
|
|
976 | |
|
|
977 | my $handle = |
|
|
978 | AnyEvent::Handle->new ( |
|
|
979 | fh => \*STDIN, |
|
|
980 | on_eof => sub { |
|
|
981 | print "received EOF, exiting...\n"; |
|
|
982 | $end_prog->broadcast; |
|
|
983 | }, |
|
|
984 | |
|
|
985 | This handle object will read from standard input. Setting the C<on_eof> |
|
|
986 | callback should be done for every file handle, as that is a condition that |
|
|
987 | we always need to check for when working with file handles, to prevent |
|
|
988 | reading or writing to a closed file handle, or getting stuck indefinitely |
|
|
989 | in case of an error. |
|
|
990 | |
|
|
991 | Speaking of errors: |
|
|
992 | |
|
|
993 | on_error => sub { |
|
|
994 | print "error while reading from STDIN: $!\n"; |
|
|
995 | $end_prog->broadcast; |
|
|
996 | } |
|
|
997 | ); |
|
|
998 | |
|
|
999 | The C<on_error> callback is also not required, but we set it here in case |
|
|
1000 | any error happens when we read from the file handle. It is usually a good |
|
|
1001 | idea to set this callback and at least print some diagnostic message: Even |
|
|
1002 | in our small example an error can happen. More on this later... |
|
|
1003 | |
|
|
1004 | $handle->push_read (sub { |
|
|
1005 | |
|
|
1006 | Next we push a general read callback on the read queue, which |
|
|
1007 | will wait until we have received all the data we wanted to |
|
|
1008 | receive. L<AnyEvent::Handle> has two queues per file handle, a read and a |
|
|
1009 | write queue. The write queue queues pending data that waits to be written |
|
|
1010 | to the file handle. And the read queue queues reading callbacks. For more |
|
|
1011 | details see the documentation L<AnyEvent::Handle> about the READ QUEUE and |
|
|
1012 | WRITE QUEUE. |
|
|
1013 | |
|
|
1014 | my ($handle) = @_; |
|
|
1015 | |
|
|
1016 | if ($handle->rbuf =~ s/^.*?\bend\b.*$//s) { |
|
|
1017 | print "got 'end', exiting...\n"; |
|
|
1018 | $end_prog->broadcast; |
|
|
1019 | return 1 |
|
|
1020 | } |
|
|
1021 | |
|
|
1022 | 0 |
|
|
1023 | }); |
|
|
1024 | |
|
|
1025 | The actual callback waits until the word 'end' has been seen in the data |
|
|
1026 | received on standard input. Once we encounter the stop word 'end' we |
|
|
1027 | remove everything from the read buffer and call the condition variable |
|
|
1028 | we setup earlier, that signals our 'end of program' condition. And the |
|
|
1029 | callback returns with a true value, that signals we are done with reading |
|
|
1030 | all the data we were interested in (all data until the word 'end' has been |
|
|
1031 | seen). |
|
|
1032 | |
|
|
1033 | In all other cases, when the stop word has not been seen yet, we just |
|
|
1034 | return a false value, to indicate that we are not finished yet. |
|
|
1035 | |
|
|
1036 | The C<rbuf> method returns our read buffer, that we can directly modify as |
|
|
1037 | lvalue. Alternatively we also could have written: |
|
|
1038 | |
|
|
1039 | if ($handle->{rbuf} =~ s/^.*?\bend\b.*$//s) { |
|
|
1040 | |
|
|
1041 | The last line will wait for the condition that our program wants to exit: |
|
|
1042 | |
|
|
1043 | $end_prog->recv; |
|
|
1044 | |
|
|
1045 | The call to C<recv> will setup an event loop for us and wait for IO, timer |
|
|
1046 | or signal events and will handle them until the condition gets sent (by |
|
|
1047 | calling its C<send> method). |
|
|
1048 | |
|
|
1049 | The key points to learn from this example are: |
|
|
1050 | |
|
|
1051 | =over 4 |
|
|
1052 | |
|
|
1053 | =item * Condition variables are used to start an event loop. |
|
|
1054 | |
|
|
1055 | =item * How to register some basic callbacks on AnyEvent::Handle objects. |
|
|
1056 | |
|
|
1057 | =item * How to process data in the read buffer. |
|
|
1058 | |
|
|
1059 | =back |
|
|
1060 | |
1139 | |
1061 | =head1 AUTHORS |
1140 | =head1 AUTHORS |
1062 | |
1141 | |
1063 | Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>. |
1142 | Robin Redeker C<< <elmex at ta-sa.org> >>, Marc Lehmann <schmorp@schmorp.de>. |
1064 | |
1143 | |