--- cvsroot/Coro/Coro.pm	2009/04/16 20:07:21	1.252
+++ cvsroot/Coro/Coro.pm	2009/10/01 23:16:27	1.268
@@ -42,14 +42,14 @@
 thread models.
 
 Unlike the so-called "Perl threads" (which are not actually real threads
-but only the windows process emulation ported to unix, and as such act
-as processes), Coro provides a full shared address space, which makes
-communication between threads very easy. And Coro's threads are fast,
-too: disabling the Windows process emulation code in your perl and using
-Coro can easily result in a two to four times speed increase for your
-programs. A parallel matrix multiplication benchmark runs over 300 times
-faster on a single core than perl's pseudo-threads on a quad core using
-all four cores.
+but only the windows process emulation (see the section of the same name for
+more details) ported to unix, and as such act as processes), Coro provides
+a full shared address space, which makes communication between threads
+very easy. And Coro's threads are fast, too: disabling the Windows
+process emulation code in your perl and using Coro can easily result in
+a two to four times speed increase for your programs. A parallel matrix
+multiplication benchmark runs over 300 times faster on a single core than
+perl's pseudo-threads on a quad core using all four cores.
 
 Coro achieves that by supporting multiple running interpreters that share
 data, which is especially useful to code pseudo-parallel processes and
@@ -69,8 +69,9 @@
 
 package Coro;
 
-use strict qw(vars subs);
-no warnings "uninitialized";
+use common::sense;
+
+use Carp ();
 
 use Guard ();
 
@@ -82,7 +83,7 @@
 our $main;    # main coro
 our $current; # current coro
 
-our $VERSION = 5.131;
+our $VERSION = 5.17;
 
 our @EXPORT = qw(async async_pool cede schedule terminate current unblock_sub);
 our %EXPORT_TAGS = (
@@ -155,8 +156,8 @@
 =cut
 
 $idle = sub {
-   require Carp;
-   Carp::croak ("FATAL: deadlock detected");
+   warn "oi\n";#d#
+   Carp::confess ("FATAL: deadlock detected");
 };
 
 # this coro is necessary because a coro
@@ -209,14 +210,6 @@
       print "@_\n";
    } 1,2,3,4;
 
-=cut
-
-sub async(&@) {
-   my $coro = new Coro @_;
-   $coro->ready;
-   $coro
-}
-
 =item async_pool { ... } [@args...]
 
 Similar to C<async>, but uses a coro pool, so you should not call
@@ -340,9 +333,9 @@
 does, and are useful when you want to localise some resource to a specific
 coro.
 
-They slow down coro switching considerably for coros that use
-them (But coro switching is still reasonably fast if the handlers are
-fast).
+They slow down thread switching considerably for coros that use them
+(about 40% for a BLOCK with a single assignment, so thread switching is
+still reasonably fast if the handlers are fast).
 
 These functions are best understood by an example: The following function
 will change the current timezone to "Antarctica/South_Pole", which
@@ -376,6 +369,36 @@
 working directory etc.) to a block, despite the existance of other
 coros.
 
+Another interesting example implements time-sliced multitasking using
+interval timers (this could obviously be optimised, but does the job):
+
+   # "timeslice" the given block
+   sub timeslice(&) {
+      use Time::HiRes ();
+
+      Coro::on_enter {
+         # on entering the thread, we set a VTALRM handler to cede
+         $SIG{VTALRM} = sub { cede };
+         # and then start the interval timer
+         Time::HiRes::setitimer &Time::HiRes::ITIMER_VIRTUAL, 0.01, 0.01;
+      }; 
+      Coro::on_leave {
+         # on leaving the thread, we stop the interval timer again
+         Time::HiRes::setitimer &Time::HiRes::ITIMER_VIRTUAL, 0, 0;
+      }; 
+
+      &{+shift};
+   }  
+
+   # use like this:
+   timeslice {
+      # The following is an endless loop that would normally
+      # monopolise the process. Since it runs in a timesliced
+      # environment, it will regularly cede to other threads.
+      while () { }
+   }; 
+
+
 =item killall
 
 Kills/terminates/cancels all coros except the currently running one.
@@ -723,7 +746,9 @@
 
 As soon as the callback is invoked (or when the callback was invoked
 before C<rouse_wait>), it will return the arguments originally passed to
-the rouse callback.
+the rouse callback. In scalar context, that means you get the I<last>
+argument, just as if C<rouse_wait> had a C<return ($a1, $a2, $a3...)>
+statement at the end.
 
 See the section B<HOW TO WAIT FOR A CALLBACK> for an actual usage example.
 
@@ -832,6 +857,67 @@
 =back
 
 
+=head1 WINDOWS PROCESS EMULATION
+
+A great many people seem to be confused about ithreads (for example, Chip
+Salzenberg called me unintelligent, incapable, stupid and gullible,
+while in the same mail making rather confused statements about perl
+ithreads (for example, that memory or files would be shared), showing his
+lack of understanding of this area - if it is hard to understand for Chip,
+it is probably not obvious to everybody).
+
+What follows is an ultra-condensed version of my talk about threads in
+scripting languages given at the perl workshop 2009:
+
+The so-called "ithreads" were originally implemented for two reasons:
+first, to (badly) emulate unix processes on native win32 perls, and
+secondly, to replace the older, real thread model ("5.005-threads").
+
+It does that by using threads instead of OS processes. The difference
+between processes and threads is that threads share memory (and other
+state, such as files) between threads within a single process, while
+processes do not share anything (at least not semantically). That
+means that modifications done by one thread are seen by others, while
+modifications by one process are not seen by other processes.
+
+The "ithreads" work exactly like that: when creating a new ithreads
+process, all state is copied (memory is copied physically, files and code
+are copied logically). Afterwards, it isolates all modifications. On UNIX,
+the same behaviour can be achieved by using operating system processes,
+except that UNIX typically uses hardware built into the system to do this
+efficiently, while the windows process emulation emulates this hardware in
+software (rather efficiently, but of course it is still much slower than
+dedicated hardware).
+
+As mentioned before, loading code, modifying code, modifying data
+structures and so on is only visible in the ithreads process doing the
+modification, not in other ithread processes within the same OS process.
+
+This is why "ithreads" do not implement threads for perl at all, only
+processes. What makes it so bad is that on non-windows platforms, you can
+actually take advantage of custom hardware for this purpose (as evidenced
+by the forks module, which gives you the (i-) threads API, just much
+faster).
+
+Sharing data in the i-threads model is done by transferring data
+structures between threads using copying semantics, which is very slow -
+shared data simply does not exist. Benchmarks using i-threads which are
+communication-intensive show extremely bad behaviour with i-threads (in
+fact, so bad that Coro, which cannot take direct advantage of multiple
+CPUs, is often orders of magnitude faster because it shares data using
+real threads, refer to my talk for details).
+
+In summary, i-threads *use* threads to implement processes, while
+the compatible forks module *uses* processes to emulate, uhm,
+processes. I-threads slow down every perl program when enabled, and
+outside of windows, serve no (or little) practical purpose, but
+disadvantage every single-threaded Perl program.
+
+This is the reason that I try to avoid the name "ithreads": it is
+misleading, as it implies that it implements some kind of thread model for
+perl. I prefer the name "windows process emulation", which describes the
+actual use and behaviour of it much better.
+
 =head1 SEE ALSO
 
 Event-Loop integration: L<Coro::AnyEvent>, L<Coro::EV>, L<Coro::Event>.