=head1 Introduction to Coro

This tutorial will introduce you to the main features of the Coro module
family. It first introduces some basic concepts, and later gives a short
overview of the module family.

=head1 What is Coro?

Coro started as a simple module that implemented a specific form of
first class continuations called Coroutines. These basically allow you
to capture the current point of execution and jump to another point,
while allowing you to return at any time, as a kind of non-local
jump. This is nowadays known as a
L<coroutine|https://en.wikipedia.org/wiki/Coroutine>.

The natural application for these is to include a scheduler, resulting
in cooperative threads, which is the main use case for Coro today.

A thread is very much like a stripped-down perl interpreter, or a
process: Unlike a full interpreter process, a thread doesn't have its
own variable or code namespaces - everything is shared. That means that
when one thread modifies a variable (or any value, e.g. through a
reference), then other threads immediately see this change when they
look at the same variable or location.

Cooperative means that these threads must cooperate with each other,
when it comes to CPU usage - only one thread ever has the CPU, and if
another thread wants the CPU, the running thread has to give it up. The
latter is done either explicitly, by calling a function to do so, or
implicitly, when waiting on a resource (such as a semaphore, or the
completion of some I/O request).

Perl itself uses rather confusing terminology - what perl calls a
"thread" is actually called a "process" everywhere else: The so-called
"perl threads" are actually artifacts of the unix process emulation code
used on Windows, which is consequently why they are actually processes
and not threads. The biggest difference is that neither variables nor
code are shared between such processes.

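The difference is easy to demonstrate with plain C<fork> (a minimal
sketch using only core perl, not Coro): each process gets its own copy
of every variable, so a change made in the child is invisible to the
parent - unlike Coro threads, where all threads would see the change.

```perl
use strict;
use warnings;

my $x = 1;

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid) {
   # parent: wait for the child, then look at $x
   waitpid $pid, 0;
   print "parent sees x = $x\n"; # still 1 - the child only changed its copy
} else {
   # child: this modifies the child's private copy of $x only
   $x = 2;
   exit;
}
```

With Coro threads, by contrast, a C<$x = 2> in one thread is immediately
visible to every other thread.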

=head1 Cooperative Threads

Cooperative threads is what the Coro module gives you:

   use Coro;

To create a thread, you can use the C<async> function that automatically
gets exported from that module:

   async {
      print "hello\n";
   };

C<async> expects a code block as first argument (in indirect object
notation). You can pass it more arguments, and these will end up in
C<@_> when executing the code block, but since it is a closure, you can
also just refer to any lexical variables that are currently visible.

If you save the above lines in a file and execute it as a perl program,
you will not get any output. The reason is that, although you created a
thread, and the thread is ready to execute (because C<async> puts it
into the so-called I<ready queue>), it never gets any CPU time to
actually execute, as the main program - which also is a thread almost
like any other - never gives up the CPU but instead exits the whole
program, by running off the end of the file.

To explicitly give up the CPU, use the C<cede> function (which is often
called C<yield> in other thread implementations):

   use Coro;

   async {
      print "hello\n";
   };

   cede;

Running the above prints C<hello> and exits.

This is not very interesting, so let's try a slightly more interesting
program:

   use Coro;

   async {
      print "async 1\n";
      cede;
      print "async 2\n";
   };

   print "main 1\n";
   cede;
   print "main 2\n";

Running this program prints:

   main 1
   async 1
   main 2
   async 2

This nicely illustrates the non-local jump ability: the main program
prints the first line, and then yields the CPU to whatever other
coroutines there are. And there is one other, which runs and prints
"async 1", and itself yields the CPU. Since the only other coroutine
available is the main program, it continues running and so on.

In more detail, C<async> creates a new coroutine. All new coroutines
start in a suspended state. To make them run, they need to be put into
the ready queue, which is the second thing that C<async> does. Each time
a coroutine gives up the CPU, Coro runs a so-called I<scheduler>.
The scheduler selects the next coroutine from the ready queue, removes
it from the queue, and runs it.

C<cede> also does two things: first it puts the running coroutine into
the ready queue, and then it jumps into the scheduler. This has the
effect of giving up the CPU, but also ensures that, eventually, the
coroutine gets run again.

In fact, C<cede> could be implemented like this:

   sub my_cede {
      $Coro::current->ready;
      schedule;
   }

This works because C<$Coro::current> always contains the currently
running coroutine, and the scheduler itself can be called via
C<schedule>.

What's the effect of just calling C<schedule>? Simple, the scheduler
selects the next ready coroutine and runs it - the current coroutine, as
it hasn't been put into the ready queue, will go to sleep until
something wakes it up.

The following example remembers the current coroutine in a variable,
creates a coroutine and then puts the main program to sleep. The newly
created coroutine uses C<rand> to wake up the main coroutine by calling
its C<ready> method - or not:

   use Coro;

   my $wakeme = $Coro::current;

   async {
      $wakeme->ready if 0.5 < rand;
   };

   schedule;

Now, when you run it, one of two things happens: Either the C<async>
coroutine wakes up the main coroutine again, in which case the program
silently exits, or it doesn't, in which case you get:

   FATAL: deadlock detected at - line 0

Why is that? When the C<async> coroutine falls off the end, it will be
terminated (via a call to C<terminate>) and the scheduler gets run
again. Since the C<async> coroutine hasn't woken up the main coroutine,
and there aren't any other coroutines, there is nothing to wake up, and
the program cannot continue. Since there I<are> coroutines that I<could>
be running (the main one) but none are I<ready> to do so, Coro signals a
I<deadlock> - no progress is possible.

In fact, there is an important case where progress I<is>, in fact,
possible - namely in an event-based program. In such a case, the program
could well wait for I<external> events, such as a timeout, or some data
to arrive on a socket.

Since a deadlock in such a case would not be very useful, there is a
module named L<Coro::AnyEvent> that integrates coroutines into an event
loop. It configures Coro so that, instead of dying with an error
message, it instead runs the event loop in the hope of receiving an
event that will wake up some coroutine.

=head2 Semaphores and other locks

Using only C<ready>, C<cede> and C<schedule> to synchronise coroutines
is difficult, especially if many coroutines are ready at the same
time. Coro supports a number of modules to help synchronising coroutines
in easier ways. The first such module is L<Coro::Semaphore>, which
implements counting semaphores (binary semaphores are also available as
L<Coro::Signal>):

   use Coro;
   use Coro::Semaphore;

   my $sem = new Coro::Semaphore 0; # a locked semaphore

   async {
      print "unlocking semaphore\n";
      $sem->up;
   };

   print "trying to lock semaphore\n";
   $sem->down;
   print "we got it!\n";

This program creates a I<locked> semaphore (a semaphore with count
C<0>) and tries to lock it. Since the semaphore is already locked, this
will block the main coroutine until the semaphore becomes
available. This yields the CPU to the C<async> coroutine, which unlocks
the semaphore (and instantly gets terminated). Since the semaphore is
now available, the main program locks it and continues.

Semaphores are most often used to lock resources, or to exclude other
coroutines from accessing or using a resource. For example, consider a
very costly function (one that temporarily allocates a lot of RAM, for
example). You wouldn't want to have many coroutines calling this
function at the same time, so you use a semaphore:

   my $lock = new Coro::Semaphore; # unlocked initially

   sub costly_function {
      $lock->down; # acquire semaphore

      # do costly operation that blocks

      $lock->up; # unlock it
   }

No matter how many coroutines call C<costly_function>, only one will run
the body of it, all others will wait in the C<down> call. Why does the
comment mention an "operation that blocks"?

That's because coro threads are cooperative: unless C<costly_function>
willingly gives up the CPU, other threads of control will simply not
run. This makes locking superfluous in cases where the function itself
never gives up the CPU, but when dealing with the outside world, this is
rare.

Now consider what happens when the code C<die>s after executing
C<down>, but before C<up>. This will leave the semaphore in a locked
state, which usually isn't what you want: normally you would want to
free the lock again. This is what the C<guard> method is for:

   my $lock = new Coro::Semaphore; # unlocked initially

   sub costly_function {
      my $guard = $lock->guard; # acquire guard

      # do costly operation that blocks
   }

This method C<down>s the semaphore and returns a so-called guard
object. Nothing happens as long as there are references to it, but when
all references are gone, for example, when C<costly_function> returns or
throws an exception, it will automatically call C<up> on the semaphore -
no way to forget it. Even when the coroutine gets C<cancel>ed by another
coroutine, the guard object will ensure that the lock is freed.

Apart from L<Coro::Semaphore> and L<Coro::Signal>, there is also a
reader-writer lock (L<Coro::RWLock>) and a semaphore set
(L<Coro::SemaphoreSet>).

=head2 Channels

Semaphores are fine, but usually you want to communicate by exchanging
data as well. This is where L<Coro::Channel> comes in useful: Channels
are the Coro equivalent of a unix pipe - you can put stuff into it on
one side, and read data from it on the other.

Here is a simple example that creates a coroutine and sends it numbers.
The coroutine calculates the square of each number, and puts it into
another channel, from which the main coroutine reads the result:

   use Coro;
   use Coro::Channel;

   my $calculate = new Coro::Channel;
   my $result    = new Coro::Channel;

   async {
      # endless loop
      while () {
         my $num = $calculate->get; # read a number
         $num **= 2; # square it
         $result->put ($num); # put the result into the result queue
      }
   };

   for (1, 2, 5, 10, 77) {
      $calculate->put ($_);
      print "$_ ** 2 = ", $result->get, "\n";
   }

Running it gives:

   1 ** 2 = 1
   2 ** 2 = 4
   5 ** 2 = 25
   10 ** 2 = 100
   77 ** 2 = 5929

Be careful, however: when multiple coroutines put numbers into
C<$calculate> and read from C<$result>, they won't know which result is
theirs. The solution for this is to either use a semaphore, or send not
just the number, but also your own private result channel.

L<Coro::Channel> can buffer some amount of items. It is also very
instructive to read its source code, as it is very simple and uses two
counting semaphores internally for synchronisation.

=head2 Debugging

Sometimes it can be useful to find out what each coroutine is doing (or
which coroutines exist in the first place). The L<Coro::Debug> module
has (among other goodies) a function that allows you to print a
"ps"-like listing:

   use Coro::Debug;

   Coro::Debug::command "ps";

Running it just after C<< $calculate->get >> outputs something similar
to this:

        PID SC  RSS USES Description              Where
    8917312 -C  22k    0 [main::]                 [introscript:20]
    8964448 N-  152    0 [coro manager]           -
    8964520 N-  152    0 [unblock_sub scheduler]  -
    8591752 UC  152    1                          [introscript:12]
   11546944 N-  152    0 [EV idle process]        -

Interesting - there is more going on in the background than one would
expect. Ignoring the extra coroutines, the main coroutine has pid
C<8917312>, and the one started by C<async> has pid C<8591752>.

The latter is also the only coroutine that doesn't have a description,
simply because we haven't set one. Setting one is easy, just put it into
C<< $Coro::current->{desc} >>:

   async {
      $Coro::current->{desc} = "cruncher";
      ...
   };

This can be rather useful when debugging a program, or when using the
interactive debug shell of L<Coro::Debug>.

=head1 The Real World - Event Loops

Coro really wants to run in a program using some event loop. In fact,
most real-world programs using Coro's threads are written in a
combination of event-based and thread-based techniques, as it is easy to
get the best of both worlds with Coro.

Coro integrates well into any event loop supported by L<AnyEvent>,
simply by C<use>ing L<Coro::AnyEvent>, but can take special advantage of
the L<EV> and L<Coro::EV> modules.

Here is a simple finger client, using whatever event loop L<AnyEvent>
comes up with (L<Coro::Socket> automatically initialises all the event
stuff):

   use Coro;
   use Coro::Socket;

   sub finger {
      my ($user, $host) = @_;

      my $fh = new Coro::Socket PeerHost => $host, PeerPort => "finger"
         or die "$user\@$host: $!";

      print $fh "$user\n";

      print "$user\@$host: $_" while <$fh>;
      print "$user\@$host: done\n";
   }

   # now finger a few accounts
   for (
      (async { finger "abc", "cornell.edu" }),
      (async { finger "sebbo", "world.std.com" }),
      (async { finger "trouble", "noc.dfn.de" }),
   ) {
      $_->join; # wait for the result
   }

There are quite a few new things here. First of all, there is
L<Coro::Socket>. This module works much the same way as
L<IO::Socket::INET>, except that it is coroutine-aware. This means that
L<IO::Socket::INET>, when waiting for the network, will block the whole
process - that means all threads, which is clearly undesirable. On the
other hand, L<Coro::Socket> knows how to give up the CPU to other
coroutines when it waits for the network, which makes parallel
processing possible.

The other new thing is the C<join> method: All we want to do in this
example is start three C<finger> coroutines and only exit when they have
done their job. This could be done using a counting semaphore, but it is
much simpler to synchronously wait for them to terminate, which is
exactly what the C<join> method does. It doesn't matter that the three
C<async>s will probably finish in a different order than the for loop
C<join>s them - when a coroutine is still running, C<join> simply waits.
If the coroutine has already terminated, C<join> will simply fetch its
return status.

If you are experienced in event-based programming, you will see that the
above program doesn't quite follow the normal pattern, where you start
some work, and then run the event loop (e.g. C<EV::loop>). In fact,
nontrivial programs follow this pattern even with Coro, so a Coro
program that uses EV usually looks like this:

   use EV;
   use Coro;
   use Coro::AnyEvent;

   # start coroutines or event watchers

   EV::loop; # and loop

In fact, for debugging, you often do something like this:

   use EV;
   use Coro::Debug;

   my $shell = new_unix_server Coro::Debug "/tmp/myshell";

   EV::loop; # and loop

This runs your program, but also an interactive shell on the unix domain
socket F</tmp/myshell>. You can use the F<socat> program to access it:

   # socat readline /tmp/myshell
   coro debug session. use help for more info

   > ps
           PID SC  RSS USES Description        Where
     136672312 RC  19k 177k [main::]           [myprog:28]
     136710424 --  1268   48 [coro manager]     [Coro.pm:349]
   > help
   ps [w|v]              show the list of all coroutines (wide, verbose)
   bt <pid>              show a full backtrace of coroutine <pid>
   eval <pid> <perl>     evaluate <perl> expression in context of <pid>
   trace <pid>           enable tracing for this coroutine
   untrace <pid>         disable tracing for this coroutine
   kill <pid> <reason>   throws the given <reason> string in <pid>
   cancel <pid>          cancels this coroutine
   ready <pid>           force <pid> into the ready queue
   <anything else>       evaluate as perl and print results
   <anything else> &     same as above, but evaluate asynchronously
   you can use (find_coro <pid>) in perl expressions to find the coro
   with the given pid, e.g. (find_coro 9768720)->ready
   loglevel <int>        enable logging for messages of level <int> and lower
   exit                  end this session

Microsoft victims can of course use the even less secure
C<new_tcp_server> constructor.

=head1 The Real World - File I/O

Disk I/O, while often much faster than the network, nevertheless can
take quite a long time, during which the CPU could be doing other
things, if only it were able to. Fortunately, the L<IO::AIO> module on
CPAN allows you to move these I/O calls into the background, letting you
do useful work in the foreground.
It is event-/callback-based, but Coro has a nice interface to it, called
L<Coro::AIO>, which lets you use its functions naturally from within
coroutines:

   use Fcntl;
   use Coro::AIO;

   my $fh = aio_open "$filename~", O_WRONLY | O_CREAT, 0600
      or die "$filename~: $!";

   aio_write $fh, 0, (length $data), $data, 0;
   aio_fsync $fh;
   aio_close $fh;
   aio_rename "$filename~", "$filename";

The above creates a new file, writes data into it, syncs the data to
disk and atomically replaces a base file with the new copy.

=head1 Other Modules

This introduction only mentions a very few methods and modules, Coro has
many other functions (see the L<Coro> manpage) and modules (documented
in the C<SEE ALSO> section of the L<Coro> manpage).

Noteworthy modules are L<Coro::LWP> (for parallel LWP requests, but see
L<AnyEvent::HTTP> for a better, but more limited, alternative),
L<Coro::BDB>, for when you need an asynchronous database,
L<Coro::Handle>, when you need to use any file handle in a coroutine
(popular to access C<STDIN> and C<STDOUT>) and L<Coro::EV>, the
optimised interface to L<EV> (which gets used automatically by
L<Coro::AnyEvent>).

=head1 AUTHOR

   Marc Lehmann
   http://home.schmorp.de/