--- libev/ev.pod	2009/12/29 13:11:00	1.276
+++ libev/ev.pod	2011/07/12 23:32:10	1.379
@@ -28,8 +28,8 @@
      // with its corresponding stop function.
      ev_io_stop (EV_A_ w);
 
-     // this causes all nested ev_loop's to stop iterating
-     ev_unloop (EV_A_ EVUNLOOP_ALL);
+     // this causes all nested ev_run's to stop iterating
+     ev_break (EV_A_ EVBREAK_ALL);
    }
 
    // another callback, this time for a time-out
@@ -37,15 +37,15 @@
    timeout_cb (EV_P_ ev_timer *w, int revents)
    {
      puts ("timeout");
-     // this causes the innermost ev_loop to stop iterating
-     ev_unloop (EV_A_ EVUNLOOP_ONE);
+     // this causes the innermost ev_run to stop iterating
+     ev_break (EV_A_ EVBREAK_ONE);
    }
 
    int
    main (void)
    {
      // use the default event loop unless you have special needs
-     struct ev_loop *loop = ev_default_loop (0);
+     struct ev_loop *loop = EV_DEFAULT;
 
      // initialise an io watcher, then start it
      // this one will watch for stdin to become readable
@@ -58,9 +58,9 @@
      ev_timer_start (loop, &timeout_watcher);
 
      // now wait for events to arrive
-     ev_loop (loop, 0);
+     ev_run (loop, 0);
 
-     // unloop was called, so exit
+     // break was called, so exit
      return 0;
    }
 
@@ -77,9 +77,17 @@
 on event-based programming, nor will it introduce event-based programming
 with libev.
 
-Familarity with event based programming techniques in general is assumed
+Familiarity with event based programming techniques in general is assumed
 throughout this document.
 
+=head1 WHAT TO READ WHEN IN A HURRY
+
+This manual tries to be very detailed, but unfortunately, this also makes
+it very long. If you just want to know the basics of libev, I suggest
+reading L<ANATOMY OF A WATCHER>, then the L<EXAMPLE PROGRAM> above and
+looking up the missing functions in L<GLOBAL FUNCTIONS> and the C<ev_io> and
+C<ev_timer> sections in L<WATCHER TYPES>.
+
 =head1 ABOUT LIBEV
 
 Libev is an event loop: you register interest in certain events (such as a
@@ -126,13 +134,14 @@
 =head2 TIME REPRESENTATION
 
 Libev represents time as a single floating point number, representing
-the (fractional) number of seconds since the (POSIX) epoch (somewhere
-near the beginning of 1970, details are complicated, don't ask). This
-type is called C<ev_tstamp>, which is what you should use too. It usually
-aliases to the C<double> type in C. When you need to do any calculations
-on it, you should treat it as some floating point value. Unlike the name
-component C<stamp> might indicate, it is also used for time differences
-throughout libev.
+the (fractional) number of seconds since the (POSIX) epoch (in practice
+somewhere near the beginning of 1970, details are complicated, don't
+ask). This type is called C<ev_tstamp>, which is what you should use
+too. It usually aliases to the C<double> type in C. When you need to do
+any calculations on it, you should treat it as some floating point value.
+
+Unlike the name component C<stamp> might indicate, it is also used for
+time differences (e.g. delays) throughout libev.
 
 =head1 ERROR HANDLING
 
@@ -166,13 +175,20 @@
 
 Returns the current time as libev would use it. Please note that the
 C<ev_now> function is usually faster and also often returns the timestamp
-you actually want to know.
+you actually want to know. Also interesting is the combination of
+C<ev_now_update> and C<ev_now>.
 
 =item ev_sleep (ev_tstamp interval)
 
-Sleep for the given interval: The current thread will be blocked until
-either it is interrupted or the given time interval has passed. Basically
-this is a sub-second-resolution C<sleep ()>.
+Sleep for the given interval: The current thread will be blocked
+until either it is interrupted or the given time interval has
+passed (approximately - it might return a bit earlier even if not
+interrupted). Returns immediately if C<< interval <= 0 >>.
+
+Basically this is a sub-second-resolution C<sleep ()>.
+
+The range of the C<interval> is limited - libev only guarantees to work
+with sleep times of up to one day (C<< interval <= 86400 >>).
 
 =item int ev_version_major ()
 
@@ -193,7 +209,8 @@
 not a problem.
 
 Example: Make sure we haven't accidentally been linked against the wrong
-version.
+version (note, however, that this will not detect other ABI mismatches,
+such as LFS or reentrancy).
 
    assert (("libev version mismatch",
             ev_version_major () == EV_VERSION_MAJOR
@@ -214,24 +231,25 @@
 
 =item unsigned int ev_recommended_backends ()
 
-Return the set of all backends compiled into this binary of libev and also
-recommended for this platform. This set is often smaller than the one
-returned by C<ev_supported_backends>, as for example kqueue is broken on
-most BSDs and will not be auto-detected unless you explicitly request it
-(assuming you know what you are doing). This is the set of backends that
-libev will probe for if you specify no backends explicitly.
+Return the set of all backends compiled into this binary of libev and
+also recommended for this platform, meaning it will work for most file
+descriptor types. This set is often smaller than the one returned by
+C<ev_supported_backends>, as for example kqueue is broken on most BSDs
+and will not be auto-detected unless you explicitly request it (assuming
+you know what you are doing). This is the set of backends that libev will
+probe for if you specify no backends explicitly.
 
 =item unsigned int ev_embeddable_backends ()
 
 Returns the set of backends that are embeddable in other event loops. This
-is the theoretical, all-platform, value. To find which backends
-might be supported on the current system, you would need to look at
-C<ev_embeddable_backends () & ev_supported_backends ()>, likewise for
-recommended ones.
+value is platform-specific but can include backends not available on the
+current system. To find which embeddable backends might be supported on
+the current system, you would need to look at C<ev_embeddable_backends ()
+& ev_supported_backends ()>, likewise for recommended ones.
 
 See the description of C<ev_embed> watchers for more info.
 
-=item ev_set_allocator (void *(*cb)(void *ptr, long size)) [NOT REENTRANT]
+=item ev_set_allocator (void *(*cb)(void *ptr, long size))
 
 Sets the allocation function to use (the prototype is similar - the
 semantics are identical to the C<realloc> C89/SuS/POSIX function). It is
@@ -267,7 +285,7 @@
    ...
    ev_set_allocator (persistent_realloc);
 
-=item ev_set_syserr_cb (void (*cb)(const char *msg)); [NOT REENTRANT]
+=item ev_set_syserr_cb (void (*cb)(const char *msg))
 
 Set the callback function to call on a retryable system call error (such
 as failed select, poll, epoll_wait). The message is a printable string
@@ -289,40 +307,78 @@
    ...
    ev_set_syserr_cb (fatal_error);
 
+=item ev_feed_signal (int signum)
+
+This function can be used to "simulate" a signal receive. It is completely
+safe to call this function at any time, from any context, including signal
+handlers or random threads.
+
+Its main use is to customise signal handling in your process, especially
+in the presence of threads. For example, you could block signals
+by default in all threads (and specify C<EVFLAG_NOSIGMASK> when
+creating any loops), and in one thread, use C<sigwait> or any other
+mechanism to wait for signals, then "deliver" them to libev by calling
+C<ev_feed_signal>.
+
 =back
 
-=head1 FUNCTIONS CONTROLLING THE EVENT LOOP
+=head1 FUNCTIONS CONTROLLING EVENT LOOPS
 
-An event loop is described by a C<struct ev_loop *> (the C<struct>
-is I<not> optional in this case, as there is also an C<ev_loop>
-I<function>).
+An event loop is described by a C<struct ev_loop *> (the C<struct> is
+I<not> optional in this case unless libev 3 compatibility is disabled, as
+libev 3 had an C<ev_loop> function colliding with the struct name).
 
 The library knows two types of such loops, the I<default> loop, which
-supports signals and child events, and dynamically created loops which do
-not.
+supports child process events, and dynamically created event loops which
+do not.
 
 =over 4
 
 =item struct ev_loop *ev_default_loop (unsigned int flags)
 
-This will initialise the default event loop if it hasn't been initialised
-yet and return it. If the default loop could not be initialised, returns
-false. If it already was initialised it simply returns it (and ignores the
-flags. If that is troubling you, check C<ev_backend ()> afterwards).
+This returns the "default" event loop object, which is what you should
+normally use when you just need "the event loop". Event loop objects and
+the C<flags> parameter are described in more detail in the entry for
+C<ev_loop_new>.
+
+If the default loop is already initialised then this function simply
+returns it (and ignores the flags. If that is troubling you, check
+C<ev_backend ()> afterwards). Otherwise it will create it with the given
+flags, which should almost always be C<0>, unless the caller is also the
+one calling C<ev_run> or otherwise qualifies as "the main program".
 
 If you don't know what event loop to use, use the one returned from this
-function.
+function (or via the C<EV_DEFAULT> macro).
 
 Note that this function is I<not> thread-safe, so if you want to use it
-from multiple threads, you have to lock (note also that this is unlikely,
-as loops cannot be shared easily between threads anyway).
+from multiple threads, you have to employ some kind of mutex (note also
+that this case is unlikely, as loops cannot be shared easily between
+threads anyway).
+
+The default loop is the only loop that can handle C<ev_child> watchers,
+and to do this, it always registers a handler for C<SIGCHLD>. If this is
+a problem for your application you can either create a dynamic loop with
+C<ev_loop_new> which doesn't do that, or you can simply overwrite the
+C<SIGCHLD> signal handler I<after> calling C<ev_default_init>.
+
+Example: This is the most typical usage.
+
+   if (!ev_default_loop (0))
+     fatal ("could not initialise libev, bad $LIBEV_FLAGS in environment?");
+
+Example: Restrict libev to the select and poll backends, and do not allow
+environment settings to be taken into account:
+
+   ev_default_loop (EVBACKEND_POLL | EVBACKEND_SELECT | EVFLAG_NOENV);
+
+=item struct ev_loop *ev_loop_new (unsigned int flags)
+
+This will create and initialise a new event loop object. If the loop
+could not be initialised, returns false.
 
-The default loop is the only loop that can handle C<ev_signal> and
-C<ev_child> watchers, and to do this, it always registers a handler
-for C<SIGCHLD>. If this is a problem for your application you can either
-create a dynamic loop with C<ev_loop_new> that doesn't do that, or you
-can simply overwrite the C<SIGCHLD> signal handler I<after> calling
-C<ev_default_init>.
+This function is thread-safe, and one common way to use libev with
+threads is indeed to create one loop per thread, and to use the default
+loop in the "main" or "initial" thread.
 
 The flags argument can be used to specify special behaviour or specific
 backends to use, and is usually specified as C<0> (or C<EVFLAG_AUTO>).
@@ -347,9 +403,8 @@
 
 =item C<EVFLAG_FORKCHECK>
 
-Instead of calling C<ev_default_fork> or C<ev_loop_fork> manually after
-a fork, you can also make libev check for a fork in each iteration by
-enabling this flag.
+Instead of calling C<ev_loop_fork> manually after a fork, you can also
+make libev check for a fork in each iteration by enabling this flag.
 
 This works by calling C<getpid ()> on every iteration of the loop,
 and thus this might slow down your event loop if you do a lot of loop
@@ -368,17 +423,37 @@
 =item C<EVFLAG_NOINOTIFY>
 
 When this flag is specified, then libev will not attempt to use the
-I<inotify> API for it's C<ev_stat> watchers. Apart from debugging and
+I<inotify> API for its C<ev_stat> watchers. Apart from debugging and
 testing, this flag can be useful to conserve inotify file descriptors, as
 otherwise each loop using C<ev_stat> watchers consumes one inotify handle.
 
-=item C<EVFLAG_NOSIGFD>
+=item C<EVFLAG_SIGNALFD>
 
-When this flag is specified, then libev will not attempt to use the
-I<signalfd> API for it's C<ev_signal> (and C<ev_child>) watchers. This is
-probably only useful to work around any bugs in libev. Consequently, this
-flag might go away once the signalfd functionality is considered stable,
-so it's useful mostly in environment variables and not in program code.
+When this flag is specified, then libev will attempt to use the
+I<signalfd> API for its C<ev_signal> (and C<ev_child>) watchers. This API
+delivers signals synchronously, which makes it both faster and might make
+it possible to get the queued signal data. It can also simplify signal
+handling with threads, as long as you properly block signals in your
+threads that are not interested in handling them.
+
+Signalfd will not be used by default as this changes your signal mask, and
+there are a lot of shoddy libraries and programs (glib's threadpool for
+example) that can't properly initialise their signal masks.
+
+=item C<EVFLAG_NOSIGMASK>
+
+When this flag is specified, then libev will avoid modifying the signal
+mask. Specifically, this means you have to make sure signals are unblocked
+when you want to receive them.
+
+This behaviour is useful when you want to do your own signal handling, or
+want to handle signals only in specific threads and want to avoid libev
+unblocking the signals.
+
+It's also required by POSIX in a threaded program, as libev calls
+C<sigprocmask>, whose behaviour is officially unspecified.
+
+This flag's behaviour will become the default in future versions of libev.
 
 =item C<EVBACKEND_SELECT>  (value 1, portable select backend)
 
@@ -416,27 +491,38 @@
 Use the linux-specific epoll(7) interface (for both pre- and post-2.6.9
 kernels).
 
-For few fds, this backend is a bit little slower than poll and select,
-but it scales phenomenally better. While poll and select usually scale
-like O(total_fds) where n is the total number of fds (or the highest fd),
-epoll scales either O(1) or O(active_fds).
+For few fds, this backend is a little bit slower than poll and select, but
+it scales phenomenally better. While poll and select usually scale like
+O(total_fds) where total_fds is the total number of fds (or the highest
+fd), epoll scales either O(1) or O(active_fds).
 
 The epoll mechanism deserves honorable mention as the most misdesigned
 of the more advanced event mechanisms: mere annoyances include silently
 dropping file descriptors, requiring a system call per change per file
-descriptor (and unnecessary guessing of parameters), problems with dup and
-so on. The biggest issue is fork races, however - if a program forks then
-I<both> parent and child process have to recreate the epoll set, which can
-take considerable time (one syscall per file descriptor) and is of course
-hard to detect.
-
-Epoll is also notoriously buggy - embedding epoll fds I<should> work, but
-of course I<doesn't>, and epoll just loves to report events for totally
-I<different> file descriptors (even already closed ones, so one cannot
-even remove them from the set) than registered in the set (especially
-on SMP systems). Libev tries to counter these spurious notifications by
-employing an additional generation counter and comparing that against the
-events to filter out spurious ones, recreating the set when required.
+descriptor (and unnecessary guessing of parameters), problems with dup,
+returning before the timeout value, resulting in additional iterations
+(and only giving 5ms accuracy while select on the same platform gives
+0.1ms) and so on. The biggest issue is fork races, however - if a program
+forks then I<both> parent and child process have to recreate the epoll
+set, which can take considerable time (one syscall per file descriptor)
+and is of course hard to detect.
+
+Epoll is also notoriously buggy - embedding epoll fds I<should> work,
+but of course I<doesn't>, and epoll just loves to report events for
+totally I<different> file descriptors (even already closed ones, so
+one cannot even remove them from the set) than registered in the set
+(especially on SMP systems). Libev tries to counter these spurious
+notifications by employing an additional generation counter and comparing
+that against the events to filter out spurious ones, recreating the set
+when required. Epoll also erroneously rounds down timeouts, but gives you
+no way to know when and by how much, so sometimes you have to busy-wait
+because epoll returns immediately despite a nonzero timeout. And last
+but not least, it also refuses to work with some file descriptors which work
+perfectly fine with C<select> (files, many character devices...).
+
+Epoll is truly the train wreck among event poll mechanisms, a frankenpoll,
+cobbled together in a hurry, no thought to design or interaction with
+others. Oh, the pain, will it ever stop...
 
 While stopping, setting and starting an I/O watcher in the same iteration
 will result in some caching, there is still a system call per such
@@ -512,19 +598,25 @@
 This uses the Solaris 10 event port mechanism. As with everything on Solaris,
 it's really slow, but it still scales very well (O(active_fds)).
 
-Please note that Solaris event ports can deliver a lot of spurious
-notifications, so you need to use non-blocking I/O or other means to avoid
-blocking when no data (or space) is available.
-
 While this backend scales well, it requires one system call per active
 file descriptor per loop iteration. For small and medium numbers of file
 descriptors a "slow" C<EVBACKEND_SELECT> or C<EVBACKEND_POLL> backend
 might perform better.
 
-On the positive side, with the exception of the spurious readiness
-notifications, this backend actually performed fully to specification
-in all tests and is fully embeddable, which is a rare feat among the
-OS-specific backends (I vastly prefer correctness over speed hacks).
+On the positive side, this backend actually performed fully to
+specification in all tests and is fully embeddable, which is a rare feat
+among the OS-specific backends (I vastly prefer correctness over speed
+hacks).
+
+On the negative side, the interface is I<bizarre> - so bizarre that
+even Sun itself gets it wrong in their code examples: The event polling
+function sometimes returns events to the caller even though an error
+occurred, but with no indication whether it has done so or not (yes, it's
+even documented that way) - deadly for edge-triggered interfaces where you
+absolutely have to know whether an event occurred or not because you have
+to re-arm the watcher.
+
+Fortunately libev seems to be able to work around these idiocies.
 
 This backend maps C<EV_READ> and C<EV_WRITE> in the same way as
 C<EVBACKEND_POLL>.
@@ -535,7 +627,15 @@
 with C<EVFLAG_AUTO>). Since this is a mask, you can do stuff such as
 C<EVBACKEND_ALL & ~EVBACKEND_KQUEUE>.
 
-It is definitely not recommended to use this flag.
+It is definitely not recommended to use this flag; use whatever
+C<ev_recommended_backends ()> returns, or simply do not specify a backend
+at all.
+
+=item C<EVBACKEND_MASK>
+
+Not a backend at all, but a mask to select all backend bits from a
+C<flags> value, in case you want to mask out any backends from a flags
+value (e.g. when modifying the C<LIBEV_FLAGS> environment variable).
 
 =back
 
@@ -544,43 +644,20 @@
 here). If none are specified, all backends in C<ev_recommended_backends
 ()> will be tried.
 
-Example: This is the most typical usage.
-
-   if (!ev_default_loop (0))
-     fatal ("could not initialise libev, bad $LIBEV_FLAGS in environment?");
-
-Example: Restrict libev to the select and poll backends, and do not allow
-environment settings to be taken into account:
-
-   ev_default_loop (EVBACKEND_POLL | EVBACKEND_SELECT | EVFLAG_NOENV);
-
-Example: Use whatever libev has to offer, but make sure that kqueue is
-used if available (warning, breaks stuff, best use only with your own
-private event loop and only if you know the OS supports your types of
-fds):
-
-   ev_default_loop (ev_recommended_backends () | EVBACKEND_KQUEUE);
-
-=item struct ev_loop *ev_loop_new (unsigned int flags)
-
-Similar to C<ev_default_loop>, but always creates a new event loop that is
-always distinct from the default loop. Unlike the default loop, it cannot
-handle signal and child watchers, and attempts to do so will be greeted by
-undefined behaviour (or a failed assertion if assertions are enabled).
-
-Note that this function I<is> thread-safe, and the recommended way to use
-libev with threads is indeed to create one loop per thread, and using the
-default loop in the "main" or "initial" thread.
-
 Example: Try to create a event loop that uses epoll and nothing else.
 
    struct ev_loop *epoller = ev_loop_new (EVBACKEND_EPOLL | EVFLAG_NOENV);
    if (!epoller)
      fatal ("no epoll found here, maybe it hides under your chair");
 
-=item ev_default_destroy ()
+Example: Use whatever libev has to offer, but make sure that kqueue is
+used if available.
+
+   struct ev_loop *loop = ev_loop_new (ev_recommended_backends () | EVBACKEND_KQUEUE);
 
-Destroys the default loop again (frees all memory and kernel state
+=item ev_loop_destroy (loop)
+
+Destroys an event loop object (frees all memory and kernel state
 etc.). None of the active event watchers will be stopped in the normal
 sense, so e.g. C<ev_is_active> might still return true. It is your
 responsibility to either stop all watchers cleanly yourself I<before>
@@ -592,68 +669,79 @@
 handlers), will not be freed by this function, and related watchers (such
 as signal and child watchers) would need to be stopped manually.
 
-In general it is not advisable to call this function except in the
-rare occasion where you really need to free e.g. the signal handling
-pipe fds. If you need dynamically allocated loops it is better to use
-C<ev_loop_new> and C<ev_loop_destroy>.
+This function is normally used on loop objects allocated by
+C<ev_loop_new>, but it can also be used on the default loop returned by
+C<ev_default_loop>, in which case it is not thread-safe.
+
+Note that it is not advisable to call this function on the default loop
+except in the rare occasion where you really need to free its resources.
+If you need dynamically allocated loops it is better to use C<ev_loop_new>
+and C<ev_loop_destroy>.
 
-=item ev_loop_destroy (loop)
-
-Like C<ev_default_destroy>, but destroys an event loop created by an
-earlier call to C<ev_loop_new>.
-
-=item ev_default_fork ()
+=item ev_loop_fork (loop)
 
-This function sets a flag that causes subsequent C<ev_loop> iterations
-to reinitialise the kernel state for backends that have one. Despite the
+This function sets a flag that causes subsequent C<ev_run> iterations to
+reinitialise the kernel state for backends that have one. Despite the
 name, you can call it anytime, but it makes most sense after forking, in
-the child process (or both child and parent, but that again makes little
-sense). You I<must> call it in the child before using any of the libev
-functions, and it will only take effect at the next C<ev_loop> iteration.
+the child process. You I<must> call it (or use C<EVFLAG_FORKCHECK>) in the
+child before resuming or calling C<ev_run>.
+
+Again, you I<have> to call it on I<any> loop that you want to re-use after
+a fork, I<even if you do not plan to use the loop in the parent>. This is
+because some kernel interfaces *cough* I<kqueue> *cough* do funny things
+during fork.
 
 On the other hand, you only need to call this function in the child
-process if and only if you want to use the event library in the child. If
-you just fork+exec, you don't have to call it at all.
+process if and only if you want to use the event loop in the child. If
+you just fork+exec or create a new loop in the child, you don't have to
+call it at all (in fact, C<epoll> is so badly broken that it makes a
+difference, but libev will usually detect this case on its own and do a
+costly reset of the backend).
 
 The function itself is quite fast and it's usually not a problem to call
-it just in case after a fork. To make this easy, the function will fit in
-quite nicely into a call to C<pthread_atfork>:
+it just in case after a fork.
 
-    pthread_atfork (0, 0, ev_default_fork);
+Example: Automate calling C<ev_loop_fork> on the default loop when
+using pthreads.
 
-=item ev_loop_fork (loop)
+   static void
+   post_fork_child (void)
+   {
+     ev_loop_fork (EV_DEFAULT);
+   }
 
-Like C<ev_default_fork>, but acts on an event loop created by
-C<ev_loop_new>. Yes, you have to call this on every allocated event loop
-after fork that you want to re-use in the child, and how you do this is
-entirely your own problem.
+   ...
+   pthread_atfork (0, 0, post_fork_child);
 
 =item int ev_is_default_loop (loop)
 
 Returns true when the given loop is, in fact, the default loop, and false
 otherwise.
 
-=item unsigned int ev_loop_count (loop)
+=item unsigned int ev_iteration (loop)
 
-Returns the count of loop iterations for the loop, which is identical to
-the number of times libev did poll for new events. It starts at C<0> and
-happily wraps around with enough iterations.
+Returns the current iteration count for the event loop, which is identical
+to the number of times libev did poll for new events. It starts at C<0>
+and happily wraps around with enough iterations.
 
 This value can sometimes be useful as a generation counter of sorts (it
 "ticks" the number of loop iterations), as it roughly corresponds with
-C<ev_prepare> and C<ev_check> calls.
+C<ev_prepare> and C<ev_check> calls - and is incremented between the
+prepare and check phases.
 
-=item unsigned int ev_loop_depth (loop)
+=item unsigned int ev_depth (loop)
 
-Returns the number of times C<ev_loop> was entered minus the number of
-times C<ev_loop> was exited, in other words, the recursion depth.
+Returns the number of times C<ev_run> was entered minus the number of
+times C<ev_run> was exited normally, in other words, the recursion depth.
 
-Outside C<ev_loop>, this number is zero. In a callback, this number is
-C<1>, unless C<ev_loop> was invoked recursively (or from another thread),
+Outside C<ev_run>, this number is zero. In a callback, this number is
+C<1>, unless C<ev_run> was invoked recursively (or from another thread),
 in which case it is higher.
 
-Leaving C<ev_loop> abnormally (setjmp/longjmp, cancelling the thread
-etc.), doesn't count as exit.
+Leaving C<ev_run> abnormally (setjmp/longjmp, cancelling the thread,
+throwing an exception etc.), doesn't count as "exit" - consider this
+as a hint to avoid such ungentleman-like behaviour unless it's really
+convenient, in which case it is fully supported.
 
 =item unsigned int ev_backend (loop)
 
@@ -672,7 +760,7 @@
 
 Establishes the current time by querying the kernel, updating the time
 returned by C<ev_now ()> in the progress. This is a costly operation and
-is usually done automatically within C<ev_loop ()>.
+is usually done automatically within C<ev_run ()>.
 
 This function is rarely useful, but when some event callback runs for a
 very long time without entering the event loop, updating libev's idea of
@@ -684,8 +772,8 @@
 
 =item ev_resume (loop)
 
-These two functions suspend and resume a loop, for use when the loop is
-not used for a while and timeouts should not be processed.
+These two functions suspend and resume an event loop, for use when the
+loop is not used for a while and timeouts should not be processed.
 
 A typical use case would be an interactive program such as a game:  When
 the user presses C<^Z> to suspend the game and resumes it an hour later it
@@ -697,7 +785,7 @@
 Effectively, all C<ev_timer> watchers will be delayed by the time spend
 between C<ev_suspend> and C<ev_resume>, and all C<ev_periodic> watchers
 will be rescheduled (that is, they will lose any events that would have
-occured while suspended).
+occurred while suspended).
 
 After calling C<ev_suspend> you B<must not> call I<any> function on the
 given loop other than C<ev_resume>, and you B<must not> call C<ev_resume>
@@ -706,28 +794,37 @@
 Calling C<ev_suspend>/C<ev_resume> has the side effect of updating the
 event loop time (see C<ev_now_update>).
 
-=item ev_loop (loop, int flags)
+=item ev_run (loop, int flags)
 
 Finally, this is it, the event handler. This function usually is called
 after you have initialised all your watchers and you want to start
-handling events.
+handling events. It will ask the operating system for any new events, call
+the watcher callbacks, and then repeat the whole process indefinitely: This
+is why event loops are called I<loops>.
 
-If the flags argument is specified as C<0>, it will not return until
-either no event watchers are active anymore or C<ev_unloop> was called.
+If the flags argument is specified as C<0>, it will keep handling events
+until either no event watchers are active anymore or C<ev_break> was
+called.
 
-Please note that an explicit C<ev_unloop> is usually better than
+Please note that an explicit C<ev_break> is usually better than
 relying on all watchers to be stopped when deciding when a program has
 finished (especially in interactive programs), but having a program
 that automatically loops as long as it has to and no longer by virtue
 of relying on its watchers stopping correctly, that is truly a thing of
 beauty.
 
-A flags value of C<EVLOOP_NONBLOCK> will look for new events, will handle
-those events and any already outstanding ones, but will not block your
-process in case there are no events and will return after one iteration of
-the loop.
+This function is also I<mostly> exception-safe - you can break out of
+a C<ev_run> call by calling C<longjmp> in a callback, throwing a C++
+exception and so on. This does not decrement the C<ev_depth> value, nor
+will it clear any outstanding C<EVBREAK_ONE> breaks.
+
+A flags value of C<EVRUN_NOWAIT> will look for new events, will handle
+those events and any already outstanding ones, but will not wait and
+block your process in case there are no events and will return after one
+iteration of the loop. This is sometimes useful to poll and handle new
+events while doing lengthy calculations, to keep the program responsive.
 
-A flags value of C<EVLOOP_ONESHOT> will look for new events (waiting if
+A flags value of C<EVRUN_ONCE> will look for new events (waiting if
 necessary) and will handle those and any already outstanding ones. It
 will block your process until at least one new event arrives (which could
 be an event internal to libev itself, so there is no guarantee that a
@@ -736,55 +833,67 @@
 
 This is useful if you are waiting for some external event in conjunction
 with something not expressible using other libev watchers (i.e. "roll your
-own C<ev_loop>"). However, a pair of C<ev_prepare>/C<ev_check> watchers is
+own C<ev_run>"). However, a pair of C<ev_prepare>/C<ev_check> watchers is
 usually a better approach for this kind of thing.
 
-Here are the gory details of what C<ev_loop> does:
+Here are the gory details of what C<ev_run> does (this is for your
+understanding, not a guarantee that things will work exactly like this in
+future versions):
 
+   - Increment loop depth.
+   - Reset the ev_break status.
    - Before the first iteration, call any pending watchers.
-   * If EVFLAG_FORKCHECK was used, check for a fork.
+   LOOP:
+   - If EVFLAG_FORKCHECK was used, check for a fork.
    - If a fork was detected (by any means), queue and call all fork watchers.
    - Queue and call all prepare watchers.
+   - If ev_break was called, goto FINISH.
    - If we have been forked, detach and recreate the kernel state
      as to not disturb the other process.
    - Update the kernel state with all outstanding changes.
    - Update the "event loop time" (ev_now ()).
    - Calculate for how long to sleep or block, if at all
-     (active idle watchers, EVLOOP_NONBLOCK or not having
+     (active idle watchers, EVRUN_NOWAIT or not having
      any active watchers at all will result in not sleeping).
    - Sleep if the I/O and timer collect interval say so.
+   - Increment loop iteration counter.
    - Block the process, waiting for any events.
    - Queue all outstanding I/O (fd) events.
    - Update the "event loop time" (ev_now ()), and do time jump adjustments.
    - Queue all expired timers.
    - Queue all expired periodics.
-   - Unless any events are pending now, queue all idle watchers.
+   - Queue all idle watchers with priority higher than that of pending events.
    - Queue all check watchers.
    - Call all queued watchers in reverse order (i.e. check watchers first).
      Signals and child watchers are implemented as I/O watchers, and will
      be handled here by queueing them when their watcher gets executed.
-   - If ev_unloop has been called, or EVLOOP_ONESHOT or EVLOOP_NONBLOCK
-     were used, or there are no active watchers, return, otherwise
-     continue with step *.
+   - If ev_break has been called, or EVRUN_ONCE or EVRUN_NOWAIT
+     were used, or there are no active watchers, goto FINISH, otherwise
+     continue with step LOOP.
+   FINISH:
+   - Reset the ev_break status iff it was EVBREAK_ONE.
+   - Decrement the loop depth.
+   - Return.
 
 Example: Queue some jobs and then loop until no events are outstanding
 anymore.
 
    ... queue jobs here, make sure they register event watchers as long
    ... as they still have work to do (even an idle watcher will do..)
-   ev_loop (my_loop, 0);
-   ... jobs done or somebody called unloop. yeah!
+   ev_run (my_loop, 0);
+   ... jobs done or somebody called break. yeah!
 
-=item ev_unloop (loop, how)
+=item ev_break (loop, how)
 
-Can be used to make a call to C<ev_loop> return early (but only after it
+Can be used to make a call to C<ev_run> return early (but only after it
 has processed all outstanding events). The C<how> argument must be either
-C<EVUNLOOP_ONE>, which will make the innermost C<ev_loop> call return, or
-C<EVUNLOOP_ALL>, which will make all nested C<ev_loop> calls return.
+C<EVBREAK_ONE>, which will make the innermost C<ev_run> call return, or
+C<EVBREAK_ALL>, which will make all nested C<ev_run> calls return.
 
-This "unloop state" will be cleared when entering C<ev_loop> again.
+This "break state" will be cleared on the next call to C<ev_run>.
 
-It is safe to call C<ev_unloop> from otuside any C<ev_loop> calls.
+It is safe to call C<ev_break> from outside any C<ev_run> calls, too, in
+which case it will have no effect.
 
 =item ev_ref (loop)
 
@@ -792,15 +901,15 @@
 
 Ref/unref can be used to add or remove a reference count on the event
 loop: Every watcher keeps one reference, and as long as the reference
-count is nonzero, C<ev_loop> will not return on its own.
+count is nonzero, C<ev_run> will not return on its own.
 
 This is useful when you have a watcher that you never intend to
-unregister, but that nevertheless should not keep C<ev_loop> from
+unregister, but that nevertheless should not keep C<ev_run> from
 returning. In such a case, call C<ev_unref> after starting, and C<ev_ref>
 before stopping it.
 
 As an example, libev itself uses this for its internal signal pipe: It
-is not visible to the libev user and should not keep C<ev_loop> from
+is not visible to the libev user and should not keep C<ev_run> from
 exiting if no event watchers registered by it are active. It is also an
 excellent way to do this for generic recurring timers or from within
 third-party libraries. Just remember to I<unref after start> and I<ref
@@ -809,13 +918,13 @@
 (e.g. non-repeating timers) in which case you have to C<ev_ref>
 in the callback).
 
-Example: Create a signal watcher, but keep it from keeping C<ev_loop>
+Example: Create a signal watcher, but keep it from keeping C<ev_run>
 running when nothing else is active.
 
    ev_signal exitsig;
    ev_signal_init (&exitsig, sig_cb, SIGINT);
    ev_signal_start (loop, &exitsig);
-   evf_unref (loop);
+   ev_unref (loop);
 
 Example: For some weird reason, unregister the above signal handler again.
 
@@ -845,10 +954,11 @@
 By setting a higher I<io collect interval> you allow libev to spend more
 time collecting I/O events, so you can handle more events per iteration,
 at the cost of increasing latency. Timeouts (both C<ev_periodic> and
-C<ev_timer>) will be not affected. Setting this to a non-null value will
+C<ev_timer>) will not be affected. Setting this to a non-null value will
 introduce an additional C<ev_sleep ()> call into most loop iterations. The
 sleep time ensures that libev will not poll for I/O events more often then
-once per this interval, on average.
+once per this interval, on average (as long as the host time resolution is
+good enough).
 
 Likewise, by setting a higher I<timeout collect interval> you allow libev
 to spend more time collecting timeouts, at the expense of increased
@@ -864,7 +974,7 @@
 you do transactions with the outside world and you can't increase the
 parallelity, then this setting will limit your transaction rate (if you
 need to poll once per transaction and the I/O collect interval is 0.01,
-then you can't do more than 100 transations per second).
+then you can't do more than 100 transactions per second).
 
 Setting the I<timeout collect interval> can improve the opportunity for
 saving power, as the program will "bundle" timer callback invocations that
@@ -882,8 +992,12 @@
 =item ev_invoke_pending (loop)
 
 This call will simply invoke all pending watchers while resetting their
-pending state. Normally, C<ev_loop> does this automatically when required,
-but when overriding the invoke callback this call comes handy.
+pending state. Normally, C<ev_run> does this automatically when required,
+but when overriding the invoke callback this call comes in handy. This
+function can be invoked from a watcher - this can be useful for example
+when you want to do some lengthy calculation and want to pass further
+event handling to another thread (you still have to make sure only one
+thread executes within C<ev_invoke_pending> or C<ev_run> of course).
 
 =item int ev_pending_count (loop)
 
@@ -893,7 +1007,7 @@
 =item ev_set_invoke_pending_cb (loop, void (*invoke_pending_cb)(EV_P))
 
 This overrides the invoke pending functionality of the loop: Instead of
-invoking all pending watchers when there are any, C<ev_loop> will call
+invoking all pending watchers when there are any, C<ev_run> will call
 this callback instead. This is useful, for example, when you want to
 invoke the actual watchers inside another context (another thread etc.).
 
@@ -906,10 +1020,10 @@
 can be done relatively simply by putting mutex_lock/unlock calls around
 each call to a libev function.
 
-However, C<ev_loop> can run an indefinite time, so it is not feasible to
-wait for it to return. One way around this is to wake up the loop via
-C<ev_unloop> and C<av_async_send>, another way is to set these I<release>
-and I<acquire> callbacks on the loop.
+However, C<ev_run> can run for an indefinite time, so it is not feasible
+to wait for it to return. One way around this is to wake up the event
+loop via C<ev_break> and C<ev_async_send>, another way is to set these
+I<release> and I<acquire> callbacks on the loop.
 
 When set, then C<release> will be called just before the thread is
 suspended waiting for new events, and C<acquire> is called just
@@ -922,10 +1036,10 @@
 C<release> and C<acquire> (that's their only purpose after all), no
 modifications done will affect the event loop, i.e. adding watchers will
 have no effect on the set of file descriptors being watched, or the time
-waited. Use an C<ev_async> watcher to wake up C<ev_loop> when you want it
+waited. Use an C<ev_async> watcher to wake up C<ev_run> when you want it
 to take note of any changes you made.
 
-In theory, threads executing C<ev_loop> will be async-cancel safe between
+In theory, threads executing C<ev_run> will be async-cancel safe between
 invocations of C<release> and C<acquire>.
 
 See also the locking example in the C<THREADS> section later in this
@@ -933,18 +1047,18 @@
 
 =item ev_set_userdata (loop, void *data)
 
-=item ev_userdata (loop)
+=item void *ev_userdata (loop)
 
 Set and retrieve a single C<void *> associated with a loop. When
 C<ev_set_userdata> has never been called, then C<ev_userdata> returns
-C<0.>
+C<0>.
 
 These two functions can be used to associate arbitrary data with a loop,
 and are intended solely for the C<invoke_pending_cb>, C<release> and
 C<acquire> callbacks described above, but of course can be (ab-)used for
 any other purpose as well.
 
-=item ev_loop_verify (loop)
+=item ev_verify (loop)
 
 This function only does something when C<EV_VERIFY> support has been
 compiled in, which is the default for non-minimal builds. It tries to go
@@ -965,14 +1079,15 @@
 watcher type, e.g. C<ev_TYPE_start> can mean C<ev_timer_start> for timer
 watchers and C<ev_io_start> for I/O watchers.
 
-A watcher is a structure that you create and register to record your
-interest in some event. For instance, if you want to wait for STDIN to
-become readable, you would create an C<ev_io> watcher for that:
+A watcher is an opaque structure that you allocate and register to record
+your interest in some event. As a concrete example, if you want to wait
+for STDIN to become readable, you would create an C<ev_io> watcher for
+that:
 
    static void my_cb (struct ev_loop *loop, ev_io *w, int revents)
    {
      ev_io_stop (w);
-     ev_unloop (loop, EVUNLOOP_ALL);
+     ev_break (loop, EVBREAK_ALL);
    }
 
    struct ev_loop *loop = ev_default_loop (0);
@@ -983,7 +1098,7 @@
    ev_io_set (&stdin_watcher, STDIN_FILENO, EV_READ);
    ev_io_start (loop, &stdin_watcher);
 
-   ev_loop (loop, 0);
+   ev_run (loop, 0);
 
 As you can see, you are responsible for allocating the memory for your
 watcher structures (and it is I<usually> a bad idea to do this on the
@@ -992,11 +1107,11 @@
 Each watcher has an associated watcher structure (called C<struct ev_TYPE>
 or simply C<ev_TYPE>, as typedefs are provided for all watcher structs).
 
-Each watcher structure must be initialised by a call to C<ev_init
-(watcher *, callback)>, which expects a callback to be provided. This
-callback gets invoked each time the event occurs (or, in the case of I/O
-watchers, each time the event loop detects that the file descriptor given
-is readable and/or writable).
+Each watcher structure must be initialised by a call to C<ev_init (watcher
+*, callback)>, which expects a callback to be provided. This callback is
+invoked each time the event occurs (or, in the case of I/O watchers, each
+time the event loop detects that the file descriptor given is readable
+and/or writable).
 
 Each watcher type further has its own C<< ev_TYPE_set (watcher *, ...) >>
 macro to configure it, with arguments specific to the watcher type. There
@@ -1029,7 +1144,7 @@
 The file descriptor in the C<ev_io> watcher has become readable and/or
 writable.
 
-=item C<EV_TIMEOUT>
+=item C<EV_TIMER>
 
 The C<ev_timer> watcher has timed out.
 
@@ -1057,13 +1172,13 @@
 
 =item C<EV_CHECK>
 
-All C<ev_prepare> watchers are invoked just I<before> C<ev_loop> starts
+All C<ev_prepare> watchers are invoked just I<before> C<ev_run> starts
 to gather new events, and all C<ev_check> watchers are invoked just after
-C<ev_loop> has gathered them, but before it invokes any callbacks for any
+C<ev_run> has gathered them, but before it invokes any callbacks for any
 received events. Callbacks of both watcher types can start and stop as
 many watchers as they want, and all of them will be taken into account
 (for example, a C<ev_prepare> watcher might start an idle watcher to keep
-C<ev_loop> from blocking).
+C<ev_run> from blocking).
 
 =item C<EV_EMBED>
 
@@ -1074,6 +1189,10 @@
 The event loop has been resumed in the child process after fork (see
 C<ev_fork>).
 
+=item C<EV_CLEANUP>
+
+The event loop is about to be destroyed (see C<ev_cleanup>).
+
 =item C<EV_ASYNC>
 
 The given async watcher has been asynchronously notified (see C<ev_async>).
@@ -1255,71 +1374,70 @@
 
 =back
 
+See also the L<ASSOCIATING CUSTOM DATA WITH A WATCHER> and L<BUILDING YOUR
+OWN COMPOSITE WATCHERS> idioms.
 
-=head2 ASSOCIATING CUSTOM DATA WITH A WATCHER
-
-Each watcher has, by default, a member C<void *data> that you can change
-and read at any time: libev will completely ignore it. This can be used
-to associate arbitrary data with your watcher. If you need more data and
-don't want to allocate memory and store a pointer to it in that data
-member, you can also "subclass" the watcher type and provide your own
-data:
-
-   struct my_io
-   {
-     ev_io io;
-     int otherfd;
-     void *somedata;
-     struct whatever *mostinteresting;
-   };
-
-   ...
-   struct my_io w;
-   ev_io_init (&w.io, my_cb, fd, EV_READ);
-
-And since your callback will be called with a pointer to the watcher, you
-can cast it back to your own type:
+=head2 WATCHER STATES
 
-   static void my_cb (struct ev_loop *loop, ev_io *w_, int revents)
-   {
-     struct my_io *w = (struct my_io *)w_;
-     ...
-   }
+There are various watcher states mentioned throughout this manual -
+active, pending and so on. In this section these states and the rules to
+transition between them will be described in more detail - and while these
+rules might look complicated, they usually do "the right thing".
 
-More interesting and less C-conformant ways of casting your callback type
-instead have been omitted.
+=over 4
 
-Another common scenario is to use some data structure with multiple
-embedded watchers:
+=item initialised
 
-   struct my_biggy
-   {
-     int some_data;
-     ev_timer t1;
-     ev_timer t2;
-   }
+Before a watcher can be registered with the event loop it has to be
+initialised. This can be done with a call to C<ev_TYPE_init>, or calls to
+C<ev_init> followed by the watcher-specific C<ev_TYPE_set> function.
+
+In this state it is simply some block of memory that is suitable for
+use in an event loop. It can be moved around, freed, reused etc. at
+will - as long as you either keep the memory contents intact, or call
+C<ev_TYPE_init> again.
+
+=item started/running/active
+
+Once a watcher has been started with a call to C<ev_TYPE_start> it becomes
+property of the event loop, and is actively waiting for events. While in
+this state it cannot be accessed (except in a few documented ways), moved,
+freed or anything else - the only legal thing is to keep a pointer to it,
+and call libev functions on it that are documented to work on active watchers.
 
-In this case getting the pointer to C<my_biggy> is a bit more
-complicated: Either you store the address of your C<my_biggy> struct
-in the C<data> member of the watcher (for woozies), or you need to use
-some pointer arithmetic using C<offsetof> inside your watchers (for real
-programmers):
+=item pending
 
-   #include <stddef.h>
+If a watcher is active and libev determines that an event it is interested
+in has occurred (such as a timer expiring), it will become pending. It will
+stay in this pending state until either it is stopped or its callback is
+about to be invoked, so it is not normally pending inside the watcher
+callback.
 
-   static void
-   t1_cb (EV_P_ ev_timer *w, int revents)
-   {
-     struct my_biggy big = (struct my_biggy *)
-       (((char *)w) - offsetof (struct my_biggy, t1));
-   }
+The watcher might or might not be active while it is pending (for example,
+an expired non-repeating timer can be pending but no longer active). If it
+is stopped, it can be freely accessed (e.g. by calling C<ev_TYPE_set>),
+but it is still property of the event loop at this time, so cannot be
+moved, freed or reused. And if it is active the rules described in the
+previous item still apply.
+
+It is also possible to feed an event on a watcher that is not active (e.g.
+via C<ev_feed_event>), in which case it becomes pending without being
+active.
+
+=item stopped
+
+A watcher can be stopped implicitly by libev (in which case it might still
+be pending), or explicitly by calling its C<ev_TYPE_stop> function. The
+latter will clear any pending state the watcher might be in, regardless
+of whether it was active or not, so stopping a watcher explicitly before
+freeing it is often a good idea.
+
+While stopped (and not pending) the watcher is essentially in the
+initialised state, that is, it can be reused, moved, modified in any way
+you wish (but when you trash the memory block, you need to C<ev_TYPE_init>
+it again).
 
-   static void
-   t2_cb (EV_P_ ev_timer *w, int revents)
-   {
-     struct my_biggy big = (struct my_biggy *)
-       (((char *)w) - offsetof (struct my_biggy, t2));
-   }
+=back
 
 =head2 WATCHER PRIORITY MODELS
 
@@ -1372,7 +1490,7 @@
 you can associate an C<ev_idle> watcher to each such watcher, and in
 the normal watcher callback, you just start the idle watcher. The real
 processing is done in the idle watcher callback. This causes libev to
-continously poll and process kernel event data for the watcher, but when
+continuously poll and process kernel event data for the watcher, but when
 the lock-out case is known to be rare (which in turn is rare :), this is
 workable.
 
@@ -1396,7 +1514,7 @@
      // are not yet ready to handle it.
      ev_io_stop (EV_A_ w);
 
-     // start the idle watcher to ahndle the actual event.
+     // start the idle watcher to handle the actual event.
      // it will not be executed as long as other watchers
      // with the default priority are receiving events.
      ev_idle_start (EV_A_ &idle);
@@ -1456,26 +1574,19 @@
 descriptors to non-blocking mode is also usually a good idea (but not
 required if you know what you are doing).
 
-If you cannot use non-blocking mode, then force the use of a
-known-to-be-good backend (at the time of this writing, this includes only
-C<EVBACKEND_SELECT> and C<EVBACKEND_POLL>). The same applies to file
-descriptors for which non-blocking operation makes no sense (such as
-files) - libev doesn't guarentee any specific behaviour in that case.
-
 Another thing you have to watch out for is that it is quite easy to
-receive "spurious" readiness notifications, that is your callback might
+receive "spurious" readiness notifications, that is, your callback might
 be called with C<EV_READ> but a subsequent C<read>(2) will actually block
-because there is no data. Not only are some backends known to create a
-lot of those (for example Solaris ports), it is very easy to get into
-this situation even with a relatively standard program structure. Thus
-it is best to always use non-blocking I/O: An extra C<read>(2) returning
-C<EAGAIN> is far preferable to a program hanging until some data arrives.
+because there is no data. It is very easy to get into this situation even
+with a relatively standard program structure. Thus it is best to always
+use non-blocking I/O: An extra C<read>(2) returning C<EAGAIN> is far
+preferable to a program hanging until some data arrives.
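
As a minimal sketch of the advice above (plain POSIX, not part of libev; the helper names C<set_nonblocking> and C<read_available> are made up for illustration), a program could switch the descriptor to non-blocking mode once and then treat C<EAGAIN> as a spurious wakeup rather than an error:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
   Returns 0 on success, -1 on error. */
static int
set_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL, 0);

  if (flags < 0)
    return -1;

  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read whatever is available right now.
   Returns the number of bytes read (> 0), 0 when the readiness
   notification was spurious (nothing to read yet), -1 on end of
   file and -2 on a real error. */
static long
read_available (int fd, void *buf, size_t len)
{
  ssize_t n = read (fd, buf, len);

  if (n > 0)
    return (long)n;

  if (n == 0)
    return -1; /* peer closed the connection */

  if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
    return 0; /* no data after all, just wait for the next event */

  return -2;
}
```

An C<ev_io> callback would typically call something like C<read_available> and simply return when it yields 0, leaving the watcher active.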
 
 If you cannot run the fd in non-blocking mode (for example you should
 not play around with an Xlib connection), then you have to separately
 re-test whether a file descriptor is really ready with a known-to-be good
-interface such as poll (fortunately in our Xlib example, Xlib already
-does this on its own, so its quite safe to use). Some people additionally
+interface such as poll (fortunately in the case of Xlib, it already does
+this on its own, so it's quite safe to use). Some people additionally
 use C<SIGALRM> and an interval timer, just to be sure you won't block
 indefinitely.
 
@@ -1513,16 +1624,48 @@
 for potentially C<dup ()>'ed file descriptors, or to resort to
 C<EVBACKEND_SELECT> or C<EVBACKEND_POLL>.
 
+=head3 The special problem of files
+
+Many people try to use C<select> (or libev) on file descriptors
+representing files, and expect it to become ready when their program
+doesn't block on disk accesses (which can take a long time on their own).
+
+However, this cannot ever work in the "expected" way - you get a readiness
+notification as soon as the kernel knows whether and how much data is
+there, and in the case of open files, that's always the case, so you
+always get a readiness notification instantly, and your read (or possibly
+write) will still block on the disk I/O.
+
+Another way to view it is that in the case of sockets, pipes, character
+devices and so on, there is another party (the sender) that delivers data
+on its own, but in the case of files, there is no such thing: the disk
+will not send data on its own, simply because it doesn't know what you
+wish to read - you would first have to request some data.
+
+Since files are typically not-so-well supported by advanced notification
+mechanisms, libev tries hard to emulate POSIX behaviour with respect
+to files, even though you should not use it. The reason for this is
+convenience: sometimes you want to watch STDIN or STDOUT, which is
+usually a tty, often a pipe, but also sometimes files or special devices
+(for example, C<epoll> on Linux works with F</dev/random> but not with
+F</dev/urandom>), and even though the file might better be served with
+asynchronous I/O instead of with non-blocking I/O, it is still useful when
+it "just works" instead of freezing.
+
+So avoid file descriptors pointing to files when you know it (e.g. use
+libeio), but use them when it is convenient, e.g. for STDIN/STDOUT, or
+when you rarely read from a file instead of from a socket, and want to
+reuse the same code path.
+
 =head3 The special problem of fork
 
 Some backends (epoll, kqueue) do not support C<fork ()> at all or exhibit
 useless behaviour. Libev fully supports fork, but needs to be told about
-it in the child.
+it in the child if you want to continue to use it in the child.
 
-To support fork in your programs, you either have to call
-C<ev_default_fork ()> or C<ev_loop_fork ()> after a fork in the child,
-enable C<EVFLAG_FORKCHECK>, or resort to C<EVBACKEND_SELECT> or
-C<EVBACKEND_POLL>.
+To support fork in your child processes, you have to call C<ev_loop_fork
+()> after a fork in the child, enable C<EVFLAG_FORKCHECK>, or resort to
+C<EVBACKEND_SELECT> or C<EVBACKEND_POLL>.
 
 =head3 The special problem of SIGPIPE
 
@@ -1535,6 +1678,44 @@
 ignore SIGPIPE (and maybe make sure you log the exit status of your daemon
 somewhere, as that would have given you a big clue).
 
+=head3 The special problem of accept()ing when you can't
+
+Many implementations of the POSIX C<accept> function (for example,
+found in post-2004 Linux) have the peculiar behaviour of not removing a
+connection from the pending queue in all error cases.
+
+For example, larger servers often run out of file descriptors (because
+of resource limits), causing C<accept> to fail with C<ENFILE> but not
+rejecting the connection, leading to libev signalling readiness on
+the next iteration again (the connection still exists after all), and
+typically causing the program to loop at 100% CPU usage.
+
+Unfortunately, the set of errors that cause this issue differs between
+operating systems, there is usually little the app can do to remedy the
+situation, and no thread-safe method of removing the connection to cope
+with overload is known (to me).
+
+One of the easiest ways to handle this situation is to just ignore it
+- when the program encounters an overload, it will just loop until the
+situation is over. While this is a form of busy waiting, no OS offers an
+event-based way to handle this situation, so it's the best one can do.
+
+A better way to handle the situation is to log any errors other than
+C<EAGAIN> and C<EWOULDBLOCK>, making sure not to flood the log with such
+messages, and continue as usual, which at least gives the user an idea of
+what could be wrong ("raise the ulimit!"). For extra points one could stop
+the C<ev_io> watcher on the listening fd "for a while", which reduces CPU
+usage.
+
+If your program is single-threaded, then you could also keep a dummy file
+descriptor for overload situations (e.g. by opening F</dev/null>), and
+when you run into C<ENFILE> or C<EMFILE>, close it, run C<accept>,
+close that fd, and create a new dummy fd. This will gracefully refuse
+clients under typical overload conditions.
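
The dummy file descriptor approach could be sketched roughly as follows. This is an illustration under assumptions, not libev code: the names C<spare_fd>, C<reserve_spare_fd> and C<accept_or_shed> are hypothetical, and the sketch is only safe in a single-threaded program, because another thread could grab the freed descriptor between the C<close> and the C<accept>.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical: the descriptor kept in reserve for overload situations. */
static int spare_fd = -1;

/* Open the reserve descriptor if we do not hold one yet.
   Returns 0 on success, -1 on failure. */
static int
reserve_spare_fd (void)
{
  if (spare_fd < 0)
    spare_fd = open ("/dev/null", O_RDONLY);

  return spare_fd < 0 ? -1 : 0;
}

/* Accept one connection from listen_fd. Returns the new descriptor,
   or -1 if there was nothing to accept, an unrelated error occurred,
   or the connection had to be refused because we ran out of
   descriptors. */
static int
accept_or_shed (int listen_fd)
{
  int fd = accept (listen_fd, 0, 0);

  if (fd >= 0 || (errno != EMFILE && errno != ENFILE))
    return fd;

  if (spare_fd >= 0)
    {
      /* free one descriptor, take the connection and drop it again */
      close (spare_fd);
      spare_fd = -1;

      fd = accept (listen_fd, 0, 0);
      if (fd >= 0)
        close (fd);

      /* re-arm the reserve for the next overload */
      reserve_spare_fd ();
    }

  return -1;
}
```

Call C<reserve_spare_fd> once at startup, before the process gets close to its descriptor limit, and use C<accept_or_shed> in place of a bare C<accept> in the listening watcher's callback.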
+
+The last way to handle it is to simply log the error and C<exit>, as
+is often done with C<malloc> failures, but this results in an easy
+opportunity for a DoS attack.
 
 =head3 Watcher-Specific Functions
 
@@ -1576,7 +1757,7 @@
    ev_io stdin_readable;
    ev_io_init (&stdin_readable, stdin_readable_cb, STDIN_FILENO, EV_READ);
    ev_io_start (loop, &stdin_readable);
-   ev_loop (loop, 0);
+   ev_run (loop, 0);
 
 
 =head2 C<ev_timer> - relative and optionally repeating timeouts
@@ -1595,7 +1776,7 @@
 might introduce a small delay). If multiple timers become ready during the
 same loop iteration then the ones with earlier time-out values are invoked
 before ones of the same priority with later time-out values (but this is
-no longer true when a callback calls C<ev_loop> recursively).
+no longer true when a callback calls C<ev_run> recursively).
 
 =head3 Be smart about timeouts
 
@@ -1691,7 +1872,7 @@
      // if last_activity + 60. is older than now, we did time out
      if (timeout < now)
        {
-         // timeout occured, take action
+         // timeout occurred, take action
        }
      else
        {
@@ -1723,12 +1904,12 @@
 
    ev_init (timer, callback);
    last_activity = ev_now (loop);
-   callback (loop, timer, EV_TIMEOUT);
+   callback (loop, timer, EV_TIMER);
 
 And when there is some activity, simply store the current time in
 C<last_activity>, no libev calls at all:
 
-   last_actiivty = ev_now (loop);
+   last_activity = ev_now (loop);
 
 This technique is slightly more complex, but in most cases where the
 time-out is unlikely to be triggered, much more efficient.
@@ -1776,7 +1957,7 @@
 
 Establishing the current time is a costly operation (it usually takes at
 least two system calls): EV therefore updates its idea of the current
-time only before and after C<ev_loop> collects new events, which causes a
+time only before and after C<ev_run> collects new events, which causes a
 growing difference between C<ev_now ()> and C<ev_time ()> when handling
 lots of events in one iteration.
 
@@ -1844,7 +2025,7 @@
 
 =item ev_timer_again (loop, ev_timer *)
 
-This will act as if the timer timed out and restart it again if it is
+This will act as if the timer timed out, and restart it if it is
 repeating. The exact semantics are:
 
 If the timer is pending, its pending status is cleared.
@@ -1864,7 +2045,7 @@
 the timeout value currently configured.
 
 That is, after an C<ev_timer_set (w, 5, 7)>, C<ev_timer_remaining> returns
-C<5>. When the timer is started and one second passes, C<ev_timer_remain>
+C<5>. When the timer is started and one second passes, C<ev_timer_remaining>
 will return C<4>. When the timer expires and is restarted, it will return
 roughly C<7> (likely slightly less as callback invocation takes some time,
 too), and so on.
@@ -1903,7 +2084,7 @@
    ev_timer mytimer;
    ev_timer_init (&mytimer, timeout_cb, 0., 10.); /* note, only repeat used */
    ev_timer_again (&mytimer); /* start timer */
-   ev_loop (loop, 0);
+   ev_run (loop, 0);
 
    // and in some piece of code that gets executed on any "activity":
    // reset the timeout to start ticking again at 10 seconds
@@ -1939,7 +2120,7 @@
 point in time where it is supposed to trigger has passed. If multiple
 timers become ready during the same loop iteration then the ones with
 earlier time-out values are invoked before ones with later time-out values
-(but this is no longer true when a callback calls C<ev_loop> recursively).
+(but this is no longer true when a callback calls C<ev_run> recursively).
 
 =head3 Watcher-Specific Functions and Data Members
 
@@ -1984,9 +2165,12 @@
 C<ev_periodic> will try to run the callback in this mode at the next possible
 time where C<time = offset (mod interval)>, regardless of any time jumps.
 
-For numerical stability it is preferable that the C<offset> value is near
-C<ev_now ()> (the current time), but there is no range requirement for
-this value, and in fact is often specified as zero.
+The C<interval> I<MUST> be positive, and for numerical stability, the
+interval value should be higher than C<1/8192> (which is around 100
+microseconds) and C<offset> should be higher than C<0> and should have
+at most a similar magnitude as the current time (say, within a factor of
+ten). Typical values for offset are, in fact, C<0> or something between
+C<0> and C<interval>, which is also the recommended range.
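
To make the C<time = offset (mod interval)> rule concrete, here is a small worked sketch that computes the next trigger time after a given "now". The helper C<next_periodic> is hypothetical and only illustrates the arithmetic; it is not claimed to be how libev computes it internally. It assumes a positive C<interval>, as required above.

```c
/* Hypothetical helper: return the smallest value of the form
   offset + n * interval (n an integer) that is strictly greater
   than now. interval must be positive. */
static double
next_periodic (double now, double offset, double interval)
{
  /* number of whole intervals between offset and now, truncated
     towards zero, which gets us close to the answer */
  double n = (double)(long long)((now - offset) / interval);
  double at = offset + n * interval;

  /* step forward until we are in the future */
  while (at <= now)
    at += interval;

  return at;
}
```

For example, with C<offset = 0> and C<interval = 3600> this yields the next full hour, and changing C<offset> to C<1800> would yield the next half-past.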
 
 Note also that there is an upper limit to how often a timer can fire (CPU
 speed for example), so if C<interval> is very small then timing stability
@@ -2077,7 +2261,7 @@
 potentially a lot of jitter, but good long-term stability.
 
    static void
-   clock_cb (struct ev_loop *loop, ev_io *w, int revents)
+   clock_cb (struct ev_loop *loop, ev_periodic *w, int revents)
    {
      ... its now a full hour (UTC, or TAI or whatever your clock follows)
    }
@@ -2110,7 +2294,7 @@
 
 Signal watchers will trigger an event when the process receives a specific
 signal one or more times. Even though signals are very asynchronous, libev
-will try it's best to deliver signals synchronously, i.e. as part of the
+will try its best to deliver signals synchronously, i.e. as part of the
 normal event processing, like any other event.
 
 If you want signals to be delivered truly asynchronously, just use
@@ -2134,12 +2318,13 @@
 interrupted by signals you can block all signals in an C<ev_check> watcher
 and unblock them in an C<ev_prepare> watcher.
 
-=head3 The special problem of inheritance over execve
+=head3 The special problem of inheritance over fork/execve/pthread_create
 
 Both the signal mask (C<sigprocmask>) and the signal disposition
 (C<sigaction>) are unspecified after starting a signal watcher (and after
 stopping it again), that is, libev might or might not block the signal,
-and might or might not set or restore the installed signal handler.
+and might or might not set or restore the installed signal handler (but
+see C<EVFLAG_NOSIGMASK>).
 
 While this does not matter for the signal disposition (libev never
 sets signals to C<SIG_IGN>, so handlers will be reset to C<SIG_DFL> on
@@ -2154,10 +2339,28 @@
 to install a fork handler with C<pthread_atfork> that resets it. That will
 catch fork calls done by libraries (such as the libc) as well.
 
-In current versions of libev, you can also ensure that the signal mask is
-not blocking any signals (except temporarily, so thread users watch out)
-by specifying the C<EVFLAG_NOSIGFD> when creating the event loop. This
-is not guaranteed for future versions, however.
+In current versions of libev, the signal will not be blocked indefinitely
+unless you use the C<signalfd> API (C<EVFLAG_SIGNALFD>). While this reduces
+the window of opportunity for problems, it will not go away, as libev
+I<has> to modify the signal mask, at least temporarily.
+
+So I can't stress this enough: I<If you do not reset your signal mask when
+you expect it to be empty, you have a race condition in your code>. This
+is not a libev-specific thing, this is true for most event libraries.
+
+=head3 The special problem of threads signal handling
+
+POSIX threads has problematic signal handling semantics, specifically,
+a lot of functionality (sigfd, sigwait etc.) only really works if all
+threads in a process block signals, which is hard to achieve.
+
+When you want to use sigwait (or mix libev signal handling with your own
+for the same signals), you can tackle this problem by globally blocking
+all signals before creating any threads (or creating them with a fully set
+sigprocmask) and also specifying C<EVFLAG_NOSIGMASK> when creating
+loops. Then designate one thread as "signal receiver thread" which handles
+these signals. You can pass on any signals that libev might be interested
+in by calling C<ev_feed_signal>.
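
The scheme above can be sketched in plain POSIX C. This is only a minimal sketch under assumptions: the helper names C<block_all_signals>, C<wait_for_signal> and C<signal_receiver> are made up for illustration, and the hand-off to libev via C<ev_feed_signal> is shown as a comment so the block compiles without libev.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Call in the main thread before creating any other thread:
   threads created afterwards inherit the fully blocked mask.
   Returns 0 on success. */
static int
block_all_signals (void)
{
  sigset_t all;

  sigfillset (&all);
  return pthread_sigmask (SIG_BLOCK, &all, 0);
}

/* Synchronously wait for one of the signals in "wanted".
   Returns the signal number, or -1 on error. */
static int
wait_for_signal (const sigset_t *wanted)
{
  int signum;

  return sigwait (wanted, &signum) == 0 ? signum : -1;
}

/* Body of the designated signal receiver thread, to be started
   with pthread_create; "arg" points to the set of handled signals. */
static void *
signal_receiver (void *arg)
{
  const sigset_t *wanted = arg;

  for (;;)
    {
      int signum = wait_for_signal (wanted);

      if (signum < 0)
        break;

      /* hand the signal over to libev here, e.g.:
         ev_feed_signal (signum); */
    }

  return 0;
}
```

Because every thread keeps the signals blocked, they stay pending until the receiver thread picks them up with C<sigwait>, so no asynchronous handler ever runs.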
 
 =head3 Watcher-Specific Functions and Data Members
 
@@ -2183,7 +2386,7 @@
    static void
    sigint_cb (struct ev_loop *loop, ev_signal *w, int revents)
    {
-     ev_unloop (loop, EVUNLOOP_ALL);
+     ev_break (loop, EVBREAK_ALL);
    }
 
    ev_signal signal_watcher;
@@ -2579,7 +2782,7 @@
 prepare watchers get invoked before the process blocks and check watchers
 afterwards.
 
-You I<must not> call C<ev_loop> or similar functions that enter
+You I<must not> call C<ev_run> or similar functions that enter
 the current event loop from either C<ev_prepare> or C<ev_check>
 watchers. Other loops than the current one are fine, however. The
 rationale behind this is that you do not need to check for recursion in
@@ -2757,7 +2960,7 @@
        // create/start timer
 
      // poll
-     ev_loop (EV_A_ 0);
+     ev_run (EV_A_ 0);
 
      // stop timer again
      if (timeout >= 0)
@@ -2845,7 +3048,7 @@
 =item ev_embed_sweep (loop, ev_embed *)
 
 Make a single, non-blocking sweep over the embedded loop. This works
-similarly to C<ev_loop (embedded_loop, EVLOOP_NONBLOCK)>, but in the most
+similarly to C<ev_run (embedded_loop, EVRUN_NOWAIT)>, but in the most
 appropriate way for embedded loops.
 
 =item struct ev_loop *other [read-only]
@@ -2915,7 +3118,7 @@
 
 =head3 The special problem of life after fork - how is it possible?
 
-Most uses of C<fork()> consist of forking, then some simple calls to ste
+Most uses of C<fork()> consist of forking, then some simple calls to set
 up/change the process environment, followed by a call to C<exec()>. This
 sequence should be handled by libev without any problems.
 
@@ -2941,43 +3144,83 @@
 
 When this is not possible, or you want to use the default loop for
 other reasons, then in the process that wants to start "fresh", call
-C<ev_default_destroy ()> followed by C<ev_default_loop (...)>. Destroying
-the default loop will "orphan" (not stop) all registered watchers, so you
-have to be careful not to execute code that modifies those watchers. Note
-also that in that case, you have to re-register any signal watchers.
+C<ev_loop_destroy (EV_DEFAULT)> followed by C<ev_default_loop (...)>.
+Destroying the default loop will "orphan" (not stop) all registered
+watchers, so you have to be careful not to execute code that modifies
+those watchers. Note also that in that case, you have to re-register any
+signal watchers.
 
 =head3 Watcher-Specific Functions and Data Members
 
 =over 4
 
-=item ev_fork_init (ev_signal *, callback)
+=item ev_fork_init (ev_fork *, callback)
 
 Initialises and configures the fork watcher - it has no parameters of any
 kind. There is a C<ev_fork_set> macro, but using it is utterly pointless,
-believe me.
+really.
 
 =back
 
 
-=head2 C<ev_async> - how to wake up another event loop
+=head2 C<ev_cleanup> - even the best things end
+
+Cleanup watchers are called just before the event loop is being destroyed
+by a call to C<ev_loop_destroy>.
+
+While there is no guarantee that the event loop gets destroyed, cleanup
+watchers provide a convenient method to install cleanup hooks for your
+program, worker threads and so on - you just have to make sure to destroy
+the loop when you want them to be invoked.
+
+Cleanup watchers are invoked in the same way as any other watcher. Unlike
+all other watchers, they do not keep a reference to the event loop (which
+makes a lot of sense if you think about it). Like all other watchers, you
+can call libev functions in the callback, except C<ev_cleanup_start>.
+
+=head3 Watcher-Specific Functions and Data Members
+
+=over 4
+
+=item ev_cleanup_init (ev_cleanup *, callback)
+
+Initialises and configures the cleanup watcher - it has no parameters of
+any kind. There is a C<ev_cleanup_set> macro, but using it is utterly
+pointless, I assure you.
+
+=back
+
+Example: Register an atexit handler to destroy the default loop, so any
+cleanup functions are called.
+
+   static void
+   program_exits (void)
+   {
+     ev_loop_destroy (EV_DEFAULT_UC);
+   }
+
+   ...
+   atexit (program_exits);
+
+
+=head2 C<ev_async> - how to wake up an event loop
 
 In general, you cannot use an C<ev_loop> from multiple threads or other
 asynchronous sources such as signal handlers (as opposed to multiple event
 loops - those are of course safe to use in different threads).
 
-Sometimes, however, you need to wake up another event loop you do not
-control, for example because it belongs to another thread. This is what
-C<ev_async> watchers do: as long as the C<ev_async> watcher is active, you
-can signal it by calling C<ev_async_send>, which is thread- and signal
-safe.
+Sometimes, however, you need to wake up an event loop you do not control,
+for example because it belongs to another thread. This is what C<ev_async>
+watchers do: as long as the C<ev_async> watcher is active, you can signal
+it by calling C<ev_async_send>, which is thread- and signal safe.
 
 This functionality is very similar to C<ev_signal> watchers, as signals,
 too, are asynchronous in nature, and signals, too, will be compressed
 (i.e. the number of callback invocations may be less than the number of
-C<ev_async_sent> calls).
-
-Unlike C<ev_signal> watchers, C<ev_async> works with any event loop, not
-just the default loop.
+C<ev_async_send> calls). In fact, you could use signal watchers as a kind
+of "global async watchers" by using a watcher on an otherwise unused
+signal, and C<ev_feed_signal> to signal this watcher from another thread,
+even without knowing which loop owns the signal.
 
 =head3 Queueing
 
@@ -3079,19 +3322,24 @@
 =item ev_async_send (loop, ev_async *)
 
 Sends/signals/activates the given C<ev_async> watcher, that is, feeds
-an C<EV_ASYNC> event on the watcher into the event loop. Unlike
-C<ev_feed_event>, this call is safe to do from other threads, signal or
-similar contexts (see the discussion of C<EV_ATOMIC_T> in the embedding
-section below on what exactly this means).
+an C<EV_ASYNC> event on the watcher into the event loop, and instantly
+returns.
+
+Unlike C<ev_feed_event>, this call is safe to do from other threads,
+signal or similar contexts (see the discussion of C<EV_ATOMIC_T> in the
+embedding section below on what exactly this means).
 
 Note that, as with other watchers in libev, multiple events might get
-compressed into a single callback invocation (another way to look at this
-is that C<ev_async> watchers are level-triggered, set on C<ev_async_send>,
-reset when the event loop detects that).
-
-This call incurs the overhead of a system call only once per event loop
-iteration, so while the overhead might be noticeable, it doesn't apply to
-repeated calls to C<ev_async_send> for the same event loop.
+compressed into a single callback invocation (another way to look at
+this is that C<ev_async> watchers are level-triggered: they are set on
+C<ev_async_send>, reset when the event loop detects that).
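
The compression of events can be pictured with a toy model. This is not libev's implementation, only a sketch that assumes a single flag per watcher; the type C<toy_async> and both functions are hypothetical.

```c
#include <signal.h>

/* Toy model of an async watcher: a flag, not a counter. */
struct toy_async
{
  volatile sig_atomic_t sent; /* set by send, reset by the loop */
  int invocations;            /* how often the "callback" ran */
};

/* Models ev_async_send: only marks the watcher,
   no matter how often it is called. */
static void
toy_async_send (struct toy_async *w)
{
  w->sent = 1;
}

/* Models one event loop iteration: reset the flag and
   invoke the callback at most once. */
static void
toy_loop_iteration (struct toy_async *w)
{
  if (w->sent)
    {
      w->sent = 0;
      w->invocations++;
    }
}
```

Calling C<toy_async_send> three times before the next iteration still results in a single invocation, which is why the callback cannot be used to count sends.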
+
+This call incurs the overhead of at most one extra system call per event
+loop iteration, if the event loop is blocked, and no syscall at all if
+the event loop (or your program) is processing events. That means that
+repeated calls are basically free (there is no need to avoid calls for
+performance reasons) and that the overhead becomes smaller (typically
+zero) under load.
 
 =item bool = ev_async_pending (ev_async *)
 
@@ -3134,9 +3382,9 @@
 started. Otherwise an C<ev_timer> watcher with after = C<timeout> (and
 repeat = 0) will be started. C<0> is a valid timeout.
 
-The callback has the type C<void (*cb)(int revents, void *arg)> and gets
+The callback has the type C<void (*cb)(int revents, void *arg)> and is
 passed an C<revents> set like normal event callbacks (a combination of
-C<EV_ERROR>, C<EV_READ>, C<EV_WRITE> or C<EV_TIMEOUT>) and the C<arg>
+C<EV_ERROR>, C<EV_READ>, C<EV_WRITE> or C<EV_TIMER>) and the C<arg>
 value passed to C<ev_once>. Note that it is possible to receive I<both>
 a timeout and an io event at the same time - you probably should give io
 events precedence.
@@ -3147,7 +3395,7 @@
    {
      if (revents & EV_READ)
        /* stdin might have data for us, joy! */;
-     else if (revents & EV_TIMEOUT)
+     else if (revents & EV_TIMER)
        /* doh, nothing entered */;
    }
 
@@ -3160,12 +3408,322 @@
 
 =item ev_feed_signal_event (loop, int signum)
 
-Feed an event as if the given signal occurred (C<loop> must be the default
-loop!).
+Feed an event as if the given signal occurred. See also C<ev_feed_signal>,
+which is async-safe.
 
 =back
 
 
+=head1 COMMON OR USEFUL IDIOMS (OR BOTH)
+
+This section explains some common idioms that are not immediately
+obvious. Note that examples are sprinkled over the whole manual, and this
+section only contains stuff that wouldn't fit anywhere else.
+
+=head2 ASSOCIATING CUSTOM DATA WITH A WATCHER
+
+Each watcher has, by default, a C<void *data> member that you can read
+or modify at any time: libev will completely ignore it. This can be used
+to associate arbitrary data with your watcher. If you need more data and
+don't want to allocate memory separately and store a pointer to it in that
+data member, you can also "subclass" the watcher type and provide your own
+data:
+
+   struct my_io
+   {
+     ev_io io;
+     int otherfd;
+     void *somedata;
+     struct whatever *mostinteresting;
+   };
+
+   ...
+   struct my_io w;
+   ev_io_init (&w.io, my_cb, fd, EV_READ);
+
+And since your callback will be called with a pointer to the watcher, you
+can cast it back to your own type:
+
+   static void my_cb (struct ev_loop *loop, ev_io *w_, int revents)
+   {
+     struct my_io *w = (struct my_io *)w_;
+     ...
+   }
+
+More interesting and less C-conformant ways of casting your callback
+function type instead have been omitted.
+
+=head2 BUILDING YOUR OWN COMPOSITE WATCHERS
+
+Another common scenario is to use some data structure with multiple
+embedded watchers, in effect creating your own watcher that combines
+multiple libev event sources into one "super-watcher":
+
+   struct my_biggy
+   {
+     int some_data;
+     ev_timer t1;
+     ev_timer t2;
+   };
+
+In this case getting the pointer to C<my_biggy> is a bit more
+complicated: Either you store the address of your C<my_biggy> struct in
+the C<data> member of the watcher (for woozies or C++ coders), or you need
+to use some pointer arithmetic using C<offsetof> inside your watchers (for
+real programmers):
+
+   #include <stddef.h>
+
+   static void
+   t1_cb (EV_P_ ev_timer *w, int revents)
+   {
+     struct my_biggy *big = (struct my_biggy *)
+       (((char *)w) - offsetof (struct my_biggy, t1));
+   }
+
+   static void
+   t2_cb (EV_P_ ev_timer *w, int revents)
+   {
+     struct my_biggy *big = (struct my_biggy *)
+       (((char *)w) - offsetof (struct my_biggy, t2));
+   }
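
The pointer arithmetic can be tried without libev. The following is a minimal sketch in which C<fake_timer> is a hypothetical stand-in for C<ev_timer>, and the C<BIGGY_FROM> macro and the two accessor functions are names invented for illustration.

```c
#include <stddef.h>

/* Stand-in for ev_timer so the sketch compiles without libev. */
typedef struct { int active; } fake_timer;

struct my_biggy
{
  int some_data;
  fake_timer t1;
  fake_timer t2;
};

/* Step back from a pointer to a member to the enclosing struct. */
#define BIGGY_FROM(member, ptr) \
  ((struct my_biggy *)((char *)(ptr) - offsetof (struct my_biggy, member)))

/* What a t1 callback would do with its watcher pointer. */
static int
data_from_t1 (fake_timer *w)
{
  return BIGGY_FROM (t1, w)->some_data;
}

/* Same for t2: only the member name passed to offsetof differs. */
static int
data_from_t2 (fake_timer *w)
{
  return BIGGY_FROM (t2, w)->some_data;
}
```

Each callback has to name its own member, because the offset of C<t1> and C<t2> inside the struct differs.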
+
+=head2 MODAL/NESTED EVENT LOOP INVOCATIONS AND EXIT CONDITIONS
+
+Often (especially in GUI toolkits) there are places where you have
+I<modal> interaction, which is most easily implemented by recursively
+invoking C<ev_run>.
+
+This brings the problem of exiting - a callback might want to finish the
+main C<ev_run> call, but not the nested one (e.g. user clicked "Quit", but
+a modal "Are you sure?" dialog is still waiting), or just the nested one
+and not the main one (e.g. user clicked "Ok" in a modal dialog), or some
+other combination: In these cases, C<ev_break> will not work alone.
+
+The solution is to maintain a "break this loop" variable for each C<ev_run>
+invocation, and use a loop around C<ev_run> until the condition is
+triggered, using C<EVRUN_ONCE>:
+
+   // main loop
+   int exit_main_loop = 0;
+
+   while (!exit_main_loop)
+     ev_run (EV_DEFAULT_ EVRUN_ONCE);
+
+   // in a modal watcher
+   int exit_nested_loop = 0;
+
+   while (!exit_nested_loop)
+     ev_run (EV_A_ EVRUN_ONCE);
+
+To exit from any of these loops, just set the corresponding exit variable:
+
+   // exit modal loop
+   exit_nested_loop = 1;
+
+   // exit main program, after modal loop is finished
+   exit_main_loop = 1;
+
+   // exit both
+   exit_main_loop = exit_nested_loop = 1;
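
How the two flags interact can be simulated without libev. In this sketch, C<run_once> is a hypothetical stand-in for C<ev_run (loop, EVRUN_ONCE)> that handles one scripted event per call; the event names and C<run_script> are likewise made up, and every script is assumed to end with C<EVT_QUIT>.

```c
/* Scripted events standing in for real watcher callbacks. */
enum { EVT_OPEN_MODAL, EVT_MODAL_OK, EVT_QUIT };

static const int *script;    /* next scripted event */
static int exit_main_loop;
static int exit_nested_loop;
static int modal_runs;       /* how often a nested loop was entered */

static void run_once (void);

/* A modal dialog: runs its own nested loop until told to stop. */
static void
modal_dialog (void)
{
  exit_nested_loop = 0;
  modal_runs++;

  while (!exit_nested_loop)
    run_once ();
}

/* Stand-in for ev_run (loop, EVRUN_ONCE): handle exactly one event. */
static void
run_once (void)
{
  switch (*script++)
    {
    case EVT_OPEN_MODAL: modal_dialog ();                       break;
    case EVT_MODAL_OK:   exit_nested_loop = 1;                  break;
    case EVT_QUIT:       exit_main_loop = exit_nested_loop = 1; break;
    }
}

/* Main loop; returns the number of main loop iterations. */
static int
run_script (const int *events)
{
  int iterations = 0;

  script = events;
  exit_main_loop = 0;

  while (!exit_main_loop)
    {
      run_once ();
      iterations++;
    }

  return iterations;
}
```

A quit request that arrives while the dialog is open sets both flags, so the nested loop and the main loop both terminate once control returns to them.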
+
+=head2 THREAD LOCKING EXAMPLE
+
+Here is a fictitious example of how to run an event loop in a different
+thread from where callbacks are being invoked and watchers are
+created/added/removed.
+
+For a real-world example, see the C<EV::Loop::Async> perl module,
+which uses exactly this technique (which is suited for many high-level
+languages).
+
+The example uses a pthread mutex to protect the loop data, a condition
+variable to wait for callback invocations, an async watcher to notify the
+event loop thread and an unspecified mechanism to wake up the main thread.
+
+First, you need to associate some data with the event loop:
+
+   typedef struct {
+     pthread_mutex_t lock; /* global loop lock */
+     ev_async async_w;
+     pthread_t tid;
+     pthread_cond_t invoke_cv;
+   } userdata;
+
+   void prepare_loop (EV_P)
+   {
+      // for simplicity, we use a static userdata struct.
+      static userdata u;
+
+      ev_async_init (&u.async_w, async_cb);
+      ev_async_start (EV_A_ &u.async_w);
+
+      pthread_mutex_init (&u.lock, 0);
+      pthread_cond_init (&u.invoke_cv, 0);
+
+      // now associate this with the loop
+      ev_set_userdata (EV_A_ &u);
+      ev_set_invoke_pending_cb (EV_A_ l_invoke);
+      ev_set_loop_release_cb (EV_A_ l_release, l_acquire);
+
+      // then create the thread running ev_run
+      pthread_create (&u.tid, 0, l_run, EV_A);
+   }
+
+The callback for the C<ev_async> watcher does nothing: the watcher is used
+solely to wake up the event loop so it takes notice of any new watchers
+that might have been added:
+
+   static void
+   async_cb (EV_P_ ev_async *w, int revents)
+   {
+      // just used for the side effects
+   }
+
+The C<l_release> and C<l_acquire> callbacks simply unlock/lock the mutex
+protecting the loop data, respectively.
+
+   static void
+   l_release (EV_P)
+   {
+     userdata *u = ev_userdata (EV_A);
+     pthread_mutex_unlock (&u->lock);
+   }
+
+   static void
+   l_acquire (EV_P)
+   {
+     userdata *u = ev_userdata (EV_A);
+     pthread_mutex_lock (&u->lock);
+   }
+
+The event loop thread first acquires the mutex, and then jumps straight
+into C<ev_run>:
+
+   void *
+   l_run (void *thr_arg)
+   {
+     struct ev_loop *loop = (struct ev_loop *)thr_arg;
+
+     l_acquire (EV_A);
+     pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, 0);
+     ev_run (EV_A_ 0);
+     l_release (EV_A);
+
+     return 0;
+   }
+
+Instead of invoking all pending watchers, the C<l_invoke> callback will
+signal the main thread via some unspecified mechanism (signals? pipe
+writes? C<Async::Interrupt>?) and then waits until all pending watchers
+have been called (in a while loop because a) spurious wakeups are possible
+and b) skipping inter-thread-communication when there are no pending
+watchers is very beneficial):
+
+   static void
+   l_invoke (EV_P)
+   {
+     userdata *u = ev_userdata (EV_A);
+
+     while (ev_pending_count (EV_A))
+       {
+         wake_up_other_thread_in_some_magic_or_not_so_magic_way ();
+         pthread_cond_wait (&u->invoke_cv, &u->lock);
+       }
+   }
+
+Now, whenever the main thread gets told to invoke pending watchers, it
+will grab the lock, call C<ev_invoke_pending> and then signal the loop
+thread to continue:
+
+   static void
+   real_invoke_pending (EV_P)
+   {
+     userdata *u = ev_userdata (EV_A);
+
+     pthread_mutex_lock (&u->lock);
+     ev_invoke_pending (EV_A);
+     pthread_cond_signal (&u->invoke_cv);
+     pthread_mutex_unlock (&u->lock);
+   }
+
+Whenever you want to start/stop a watcher or do other modifications to an
+event loop, you will now have to lock:
+
+   ev_timer timeout_watcher;
+   userdata *u = ev_userdata (EV_A);
+
+   ev_timer_init (&timeout_watcher, timeout_cb, 5.5, 0.);
+
+   pthread_mutex_lock (&u->lock);
+   ev_timer_start (EV_A_ &timeout_watcher);
+   ev_async_send (EV_A_ &u->async_w);
+   pthread_mutex_unlock (&u->lock);
+
+Note that sending the C<ev_async> watcher is required because otherwise
+an event loop currently blocking in the kernel will have no knowledge
+about the newly added timer. By waking up the loop it will pick up any new
+watchers in the next event loop iteration.
+
+=head2 THREADS, COROUTINES, CONTINUATIONS, QUEUES... INSTEAD OF CALLBACKS
+
+While the overhead of a callback that e.g. schedules a thread is small, it
+is still an overhead. If you embed libev, and your main usage is with some
+kind of threads or coroutines, you might want to customise libev so that
+it doesn't need callbacks anymore.
+
+Imagine you have coroutines that you can switch to using a function
+C<switch_to (coro)>, that libev runs in a coroutine called C<libev_coro>
+and that due to some magic, the currently active coroutine is stored in a
+global called C<current_coro>. Then you can build your own "wait for libev
+event" primitive by changing C<EV_CB_DECLARE> and C<EV_CB_INVOKE> (note
+the differing C<;> conventions):
+
+   #define EV_CB_DECLARE(type)   struct my_coro *cb;
+   #define EV_CB_INVOKE(watcher) switch_to ((watcher)->cb)
+
+That means instead of having a C callback function, you store the
+coroutine to switch to in each watcher, and instead of having libev call
+your callback, you instead have it switch to that coroutine.
+
+A coroutine might now wait for an event with a function called
+C<wait_for_event> (the watcher needs to be started, as always, but it doesn't
+matter when, or whether the watcher is active or not when this function is
+called):
+
+   void
+   wait_for_event (ev_watcher *w)
+   {
+     ev_set_cb (w, current_coro);
+     switch_to (libev_coro);
+   }
+
+That basically suspends the coroutine inside C<wait_for_event> and
+continues the libev coroutine, which, when appropriate, switches back to
+this or any other coroutine. I am sure you can work out how to use this
+on your own :)
+
+You can do similar tricks if you have, say, threads with an event queue -
+instead of storing a coroutine, you store the queue object and instead of
+switching to a coroutine, you push the watcher onto the queue and notify
+any waiters.
+
+To embed libev, see L<EMBEDDING>, but in short, it's easiest to create two
+files, F<my_ev.h> and F<my_ev.c> that include the respective libev files:
+
+   // my_ev.h
+   #define EV_CB_DECLARE(type)   struct my_coro *cb;
+   #define EV_CB_INVOKE(watcher) switch_to ((watcher)->cb)
+   #include "../libev/ev.h"
+
+   // my_ev.c
+   #define EV_H "my_ev.h"
+   #include "../libev/ev.c"
+
+And then use F<my_ev.h> when you would normally use F<ev.h>, and compile
+F<my_ev.c> into your project. When properly specifying include paths, you
+can even use F<ev.h> as header file name directly.
+
+
 =head1 LIBEVENT EMULATION
 
 Libev offers a compatibility emulation layer for libevent. It cannot
@@ -3173,6 +3731,11 @@
 
 =over 4
 
+=item * Only the libevent-1.4.1-beta API is being emulated.
+
+This was the newest libevent version available when libev was implemented,
+and is still mostly unchanged in 2010.
+
 =item * Use it by including <event.h>, as usual.
 
 =item * The following members are fully supported: ev_base, ev_callback,
@@ -3187,7 +3750,7 @@
 is an ev_pri field.
 
 =item * In libevent, the last base created gets the signals, in libev, the
-first base created (== the default loop) gets the signals.
+base that registered the signal gets the signals.
 
 =item * Other members are not supported.
 
@@ -3216,11 +3779,11 @@
 that the watcher is associated with (or no additional members at all if
 you disable C<EV_MULTIPLICITY> when embedding libev).
 
-Currently, functions, and static and non-static member functions can be
-used as callbacks. Other types should be easy to add as long as they only
-need one additional pointer for context. If you need support for other
-types of functors please contact the author (preferably after implementing
-it).
+Currently, functions, static and non-static member functions and classes
+with C<operator ()> can be used as callbacks. Other types should be easy
+to add as long as they only need one additional pointer for context. If
+you need support for other types of functors please contact the author
+(preferably after implementing it).
 
 Here is a list of things available in the C<ev> namespace:
 
@@ -3292,8 +3855,6 @@
 
 =item w->set (object *)
 
-This is an B<experimental> feature that might go away in a future version.
-
 This is a variation of a method callback - leaving out the method to call
 will default the method to C<operator ()>, which makes it possible to use
 functor objects without having to manually specify the C<operator ()> all
@@ -3342,16 +3903,22 @@
 
 =item w->set ([arguments])
 
-Basically the same as C<ev_TYPE_set>, with the same arguments. Must be
-called at least once. Unlike the C counterpart, an active watcher gets
-automatically stopped and restarted when reconfiguring it with this
-method.
+Basically the same as C<ev_TYPE_set>, with the same arguments. Either this
+method or a suitable start method must be called at least once. Unlike the
+C counterpart, an active watcher gets automatically stopped and restarted
+when reconfiguring it with this method.
 
 =item w->start ()
 
 Starts the watcher. Note that there is no C<loop> argument, as the
 constructor already stores the event loop.
 
+=item w->start ([arguments])
+
+Instead of calling C<set> and C<start> methods separately, it is often
+convenient to wrap them in one call. Uses the same type of arguments as
+the configure C<set> method of the watcher.
+
 =item w->stop ()
 
 Stops the watcher if it is active. Again, no C<loop> argument.
@@ -3373,20 +3940,25 @@
 
 =back
 
-Example: Define a class with an IO and idle watcher, start one of them in
-the constructor.
+Example: Define a class with two I/O and idle watchers, start the I/O
+watchers in the constructor.
 
    class myclass
    {
      ev::io   io  ; void io_cb   (ev::io   &w, int revents);
+     ev::io   io2 ; void io2_cb  (ev::io   &w, int revents);
      ev::idle idle; void idle_cb (ev::idle &w, int revents);
 
      myclass (int fd)
      {
        io  .set <myclass, &myclass::io_cb  > (this);
+       io2 .set <myclass, &myclass::io2_cb > (this);
        idle.set <myclass, &myclass::idle_cb> (this);
 
-       io.start (fd, ev::READ);
+       io.set (fd, ev::WRITE); // configure the watcher
+       io.start ();            // start it whenever convenient
+
+       io2.start (fd, ev::READ); // set + start in one call
      }
    };
 
@@ -3435,7 +4007,7 @@
 =item D
 
 Leandro Lucarella has written a D language binding (F<ev.d>) for libev, to
-be found at L<http://proj.llucax.com.ar/wiki/evd>.
+be found at L<http://www.llucax.com.ar/proj/ev.d/index.html>.
 
 =item Ocaml
 
@@ -3444,8 +4016,8 @@
 
 =item Lua
 
-Brian Maher has written a partial interface to libev
-for lua (only C<ev_io> and C<ev_timer>), to be found at
+Brian Maher has written a partial interface to libev for lua (at the
+time of this writing, only C<ev_io> and C<ev_timer>), to be found at
 L<http://github.com/brimworks/lua-ev>.
 
 =back
@@ -3470,7 +4042,7 @@
 
    ev_unref (EV_A);
    ev_timer_add (EV_A_ watcher);
-   ev_loop (EV_A_ 0);
+   ev_run (EV_A_ 0);
 
 It assumes the variable C<loop> of type C<struct ev_loop *> is in scope,
 which is often provided by the following macro.
@@ -3520,7 +4092,7 @@
    ev_check check;
    ev_check_init (&check, check_cb);
    ev_check_start (EV_DEFAULT_ &check);
-   ev_loop (EV_DEFAULT_ 0);
+   ev_run (EV_DEFAULT_ 0);
 
 =head1 EMBEDDING
 
@@ -3610,12 +4182,35 @@
 =head2 PREPROCESSOR SYMBOLS/MACROS
 
 Libev can be configured via a variety of preprocessor symbols you have to
-define before including any of its files. The default in the absence of
-autoconf is documented for every option.
+define before including (or compiling) any of its files. The default in
+the absence of autoconf is documented for every option.
+
+Symbols marked with "(h)" do not change the ABI, and can have different
+values when compiling libev vs. including F<ev.h>, so it is permissible
+to redefine them before including F<ev.h> without breaking compatibility
+to a compiled library. All other symbols change the ABI, which means all
+users of libev and the libev code itself must be compiled with compatible
+settings.
 
 =over 4
 
-=item EV_STANDALONE
+=item EV_COMPAT3 (h)
+
+Backwards compatibility is a major concern for libev. This is why this
+release of libev comes with wrappers for the functions and symbols that
+have been renamed between libev version 3 and 4.
+
+You can disable these wrappers (to test compatibility with future
+versions) by defining C<EV_COMPAT3> to C<0> when compiling your
+sources. This has the additional advantage that you can drop the C<struct>
+from C<struct ev_loop> declarations, as libev will provide an C<ev_loop>
+typedef in that case.
+
+In some future version, the default for C<EV_COMPAT3> will become C<0>,
+and in some even more future version the compatibility code will be
+removed completely.
+
+=item EV_STANDALONE (h)
 
 Must always be C<1> if you do not use autoconf configuration, which
 keeps libev from including F<config.h>, and it also defines dummy
@@ -3626,6 +4221,15 @@
 In standalone mode, libev will still try to automatically deduce the
 configuration, but has to be more conservative.
 
+=item EV_USE_FLOOR
+
+If defined to be C<1>, libev will use the C<floor ()> function for its
+periodic reschedule calculations, otherwise libev will fall back on a
+portable (slower) implementation. If you enable this, you usually have to
+link against libm or something equivalent. Enabling this when the C<floor>
+function is not available will fail, so the safe default is to not enable
+this.
+
 =item EV_USE_MONOTONIC
 
 If defined to be C<1>, libev will try to detect the availability of the
@@ -3767,32 +4371,35 @@
 =item EV_ATOMIC_T
 
 Libev requires an integer type (suitable for storing C<0> or C<1>) whose
-access is atomic with respect to other threads or signal contexts. No such
-type is easily found in the C language, so you can provide your own type
-that you know is safe for your purposes. It is used both for signal handler "locking"
-as well as for signal and thread safety in C<ev_async> watchers.
+access is atomic and serialised with respect to other threads or signal
+contexts. No such type is easily found in the C language, so you can
+provide your own type that you know is safe for your purposes. It is used
+both for signal handler "locking" as well as for signal and thread safety
+in C<ev_async> watchers.
 
 In the absence of this define, libev will use C<sig_atomic_t volatile>
-(from F<signal.h>), which is usually good enough on most platforms.
+(from F<signal.h>), which is usually good enough on most platforms,
+although strictly speaking using a type that also implies a memory fence
+is required.
 
-=item EV_H
+=item EV_H (h)
 
 The name of the F<ev.h> header file used to include it. The default if
 undefined is C<"ev.h"> in F<event.h>, F<ev.c> and F<ev++.h>. This can be
 used to virtually rename the F<ev.h> header file in case of conflicts.
 
-=item EV_CONFIG_H
+=item EV_CONFIG_H (h)
 
 If C<EV_STANDALONE> isn't C<1>, this variable can be used to override
 F<ev.c>'s idea of where to find the F<config.h> file, similarly to
 C<EV_H>, above.
 
-=item EV_EVENT_H
+=item EV_EVENT_H (h)
 
 Similarly to C<EV_H>, this macro can be used to override F<event.c>'s idea
 of how the F<event.h> header can be found, the default is C<"event.h">.
 
-=item EV_PROTOTYPES
+=item EV_PROTOTYPES (h)
 
 If defined to be C<0>, then F<ev.h> will not define any function
 prototypes, but still define all the structs and other symbols. This is
@@ -3824,55 +4431,106 @@
 If your embedding application does not need any priorities, defining these
 both to C<0> will save some memory and CPU.
 
-=item EV_PERIODIC_ENABLE
+=item EV_PERIODIC_ENABLE, EV_IDLE_ENABLE, EV_EMBED_ENABLE, EV_STAT_ENABLE,
+EV_PREPARE_ENABLE, EV_CHECK_ENABLE, EV_FORK_ENABLE, EV_SIGNAL_ENABLE,
+EV_ASYNC_ENABLE, EV_CHILD_ENABLE.
+
+If undefined or defined to be C<1> (and the platform supports it), then
+the respective watcher type is supported. If defined to be C<0>, then it
+is not. Disabling watcher types mainly saves code size.
 
-If undefined or defined to be C<1>, then periodic timers are supported. If
-defined to be C<0>, then they are not. Disabling them saves a few kB of
-code.
+=item EV_FEATURES
 
-=item EV_IDLE_ENABLE
+If you need to shave off some kilobytes of code at the expense of some
+speed (but with the full API), you can define this symbol to request
+certain subsets of functionality. The default is to enable all features
+that can be enabled on the platform.
+
+A typical way to use this symbol is to define it to C<0> (or to a bitset
+with some broad features you want) and then selectively re-enable
+additional parts you want. For example, if you want everything minimal,
+but with multiple event loop support, async and child watchers and the
+poll backend, use this:
+
+   #define EV_FEATURES 0
+   #define EV_MULTIPLICITY 1
+   #define EV_USE_POLL 1
+   #define EV_CHILD_ENABLE 1
+   #define EV_ASYNC_ENABLE 1
 
-If undefined or defined to be C<1>, then idle watchers are supported. If
-defined to be C<0>, then they are not. Disabling them saves a few kB of
-code.
+The actual value is a bitset, it can be a combination of the following
+values:
 
-=item EV_EMBED_ENABLE
+=over 4
 
-If undefined or defined to be C<1>, then embed watchers are supported. If
-defined to be C<0>, then they are not. Embed watchers rely on most other
-watcher types, which therefore must not be disabled.
+=item C<1> - faster/larger code
 
-=item EV_STAT_ENABLE
+Use larger code to speed up some operations.
 
-If undefined or defined to be C<1>, then stat watchers are supported. If
-defined to be C<0>, then they are not.
+Currently this is used to override some inlining decisions (enlarging the
+code size by roughly 30% on amd64).
 
-=item EV_FORK_ENABLE
+When optimising for size, use of compiler flags such as C<-Os> with
+gcc is recommended, as well as C<-DNDEBUG>, as libev contains a number of
+assertions.
 
-If undefined or defined to be C<1>, then fork watchers are supported. If
-defined to be C<0>, then they are not.
+=item C<2> - faster/larger data structures
 
-=item EV_ASYNC_ENABLE
+Replaces the small 2-heap for timer management by a faster 4-heap, larger
+hash table sizes and so on. This will usually further increase code size
+and can additionally have an effect on the size of data structures at
+runtime.
 
-If undefined or defined to be C<1>, then async watchers are supported. If
-defined to be C<0>, then they are not.
+=item C<4> - full API configuration
 
-=item EV_MINIMAL
+This enables priorities (sets C<EV_MAXPRI>=2 and C<EV_MINPRI>=-2), and
+enables multiplicity (C<EV_MULTIPLICITY>=1).
 
-If you need to shave off some kilobytes of code at the expense of some
-speed (but with the full API), define this symbol to C<1>. Currently this
-is used to override some inlining decisions, saves roughly 30% code size
-on amd64. It also selects a much smaller 2-heap for timer management over
-the default 4-heap.
-
-You can save even more by disabling watcher types you do not need
-and setting C<EV_MAXPRI> == C<EV_MINPRI>. Also, disabling C<assert>
-(C<-DNDEBUG>) will usually reduce code size a lot.
-
-Defining C<EV_MINIMAL> to C<2> will additionally reduce the core API to
-provide a bare-bones event library. See C<ev.h> for details on what parts
-of the API are still available, and do not complain if this subset changes
-over time.
+=item C<8> - full API
+
+This enables a lot of the "lesser used" API functions. See C<ev.h> for
+details on which parts of the API are still available without this
+feature, and do not complain if this subset changes over time.
+
+=item C<16> - enable all optional watcher types
+
+Enables all optional watcher types.  If you want to selectively enable
+only some watcher types other than I/O and timers (e.g. prepare,
+embed, async, child...) you can enable them manually by defining
+C<EV_watchertype_ENABLE> to C<1> instead.
+
+=item C<32> - enable all backends
+
+This enables all backends - without this feature, you need to enable at
+least one backend manually (C<EV_USE_SELECT> is a good choice).
+
+=item C<64> - enable OS-specific "helper" APIs
+
+Enable inotify, eventfd, signalfd and similar OS-specific helper APIs by
+default.
+
+=back
+
+Compiling with C<gcc -Os -DEV_STANDALONE -DEV_USE_EPOLL=1 -DEV_FEATURES=0>
+reduces the compiled size of libev from 24.7Kb code/2.8Kb data to 6.5Kb
+code/0.3Kb data on my GNU/Linux amd64 system, while still giving you I/O
+watchers, timers and monotonic clock support.
+
+With an intelligent-enough linker (gcc+binutils are intelligent enough
+when you use C<-Wl,--gc-sections -ffunction-sections>) functions unused by
+your program might be left out as well - a binary starting a timer and an
+I/O watcher then might come out at only 5Kb.
+
+=item EV_AVOID_STDIO
+
+If this is set to C<1> at compile time, then libev will avoid using stdio
+functions (printf, scanf, perror etc.). This will increase the code size
+somewhat, but if your program doesn't otherwise depend on stdio and your
+libc allows it, this avoids linking in the stdio library which is quite
+big.
+
+Note that error messages might become less precise when this option is
+enabled.
 
 =item EV_NSIG
 
@@ -3880,23 +4538,23 @@
 signals): Normally, libev tries to deduce the maximum number of signals
 automatically, but sometimes this fails, in which case it can be
 specified. Also, using a lower number than detected (C<32> should be
-good for about any system in existance) can save some memory, as libev
+good for about any system in existence) can save some memory, as libev
 statically allocates some 12-24 bytes per signal number.
 
 =item EV_PID_HASHSIZE
 
 C<ev_child> watchers use a small hash table to distribute workload by
-pid. The default size is C<16> (or C<1> with C<EV_MINIMAL>), usually more
-than enough. If you need to manage thousands of children you might want to
-increase this value (I<must> be a power of two).
+pid. The default size is C<16> (or C<1> with C<EV_FEATURES> disabled),
+usually more than enough. If you need to manage thousands of children you
+might want to increase this value (I<must> be a power of two).
 
 =item EV_INOTIFY_HASHSIZE
 
 C<ev_stat> watchers use a small hash table to distribute workload by
-inotify watch id. The default size is C<16> (or C<1> with C<EV_MINIMAL>),
-usually more than enough. If you need to manage thousands of C<ev_stat>
-watchers you might want to increase this value (I<must> be a power of
-two).
+inotify watch id. The default size is C<16> (or C<1> with C<EV_FEATURES>
+disabled), usually more than enough. If you need to manage thousands of
+C<ev_stat> watchers you might want to increase this value (I<must> be a
+power of two).
 
 =item EV_USE_4HEAP
 
@@ -3905,8 +4563,8 @@
 to C<1>. The 4-heap uses more complicated (longer) code but has noticeably
 faster performance with many (thousands) of watchers.
 
-The default is C<1> unless C<EV_MINIMAL> is set in which case it is C<0>
-(disabled).
+The default is C<1>, unless C<EV_FEATURES> overrides it, in which case it
+will be C<0>.
 
 =item EV_HEAP_CACHE_AT
 
@@ -3917,12 +4575,12 @@
 but avoids random read accesses on heap changes. This improves performance
 noticeably with many (hundreds) of watchers.
 
-The default is C<1> unless C<EV_MINIMAL> is set in which case it is C<0>
-(disabled).
+The default is C<1>, unless C<EV_FEATURES> overrides it, in which case it
+will be C<0>.
 
 =item EV_VERIFY
 
-Controls how much internal verification (see C<ev_loop_verify ()>) will
+Controls how much internal verification (see C<ev_verify ()>) will
 be done: If set to C<0>, no internal verification code will be compiled
 in. If set to C<1>, then verification code will be compiled in, but not
 called. If set to C<2>, then the internal verification code will be
@@ -3930,13 +4588,13 @@
 verification code will be called very frequently, which will slow down
 libev considerably.
 
-The default is C<1>, unless C<EV_MINIMAL> is set, in which case it will be
-C<0>.
+The default is C<1>, unless C<EV_FEATURES> overrides it, in which case it
+will be C<0>.
 
 =item EV_COMMON
 
 By default, all watchers have a C<void *data> member. By redefining
-this macro to a something else you can include more and other types of
+this macro to something else you can include more and other types of
 members. You have to define it each time you include one of the files,
 though, and it must be identical each time.
 
@@ -3999,15 +4657,14 @@
 The usage in rxvt-unicode is simpler. It has a F<ev_cpp.h> header file
 that everybody includes and which overrides some configure choices:
 
-   #define EV_MINIMAL 1
-   #define EV_USE_POLL 0
-   #define EV_MULTIPLICITY 0
-   #define EV_PERIODIC_ENABLE 0
-   #define EV_STAT_ENABLE 0
-   #define EV_FORK_ENABLE 0
+   #define EV_FEATURES 8
+   #define EV_USE_SELECT 1
+   #define EV_PREPARE_ENABLE 1
+   #define EV_IDLE_ENABLE 1
+   #define EV_SIGNAL_ENABLE 1
+   #define EV_CHILD_ENABLE 1
+   #define EV_USE_STDEXCEPT 0
    #define EV_CONFIG_H <config.h>
-   #define EV_MINPRI 0
-   #define EV_MAXPRI 0
 
    #include "ev++.h"
 
@@ -4016,7 +4673,7 @@
    #include "ev_cpp.h"
    #include "ev.c"
 
-=head1 INTERACTION WITH OTHER PROGRAMS OR LIBRARIES
+=head1 INTERACTION WITH OTHER PROGRAMS, LIBRARIES OR THE ENVIRONMENT
 
 =head2 THREADS AND COROUTINES
 
@@ -4077,155 +4734,19 @@
 
 =back
 
-=head4 THREAD LOCKING EXAMPLE
-
-Here is a fictitious example of how to run an event loop in a different
-thread than where callbacks are being invoked and watchers are
-created/added/removed.
-
-For a real-world example, see the C<EV::Loop::Async> perl module,
-which uses exactly this technique (which is suited for many high-level
-languages).
-
-The example uses a pthread mutex to protect the loop data, a condition
-variable to wait for callback invocations, an async watcher to notify the
-event loop thread and an unspecified mechanism to wake up the main thread.
-
-First, you need to associate some data with the event loop:
-
-   typedef struct {
-     mutex_t lock; /* global loop lock */
-     ev_async async_w;
-     thread_t tid;
-     cond_t invoke_cv;
-   } userdata;
-
-   void prepare_loop (EV_P)
-   {
-      // for simplicity, we use a static userdata struct.
-      static userdata u;
-
-      ev_async_init (&u->async_w, async_cb);
-      ev_async_start (EV_A_ &u->async_w);
-
-      pthread_mutex_init (&u->lock, 0);
-      pthread_cond_init (&u->invoke_cv, 0);
-
-      // now associate this with the loop
-      ev_set_userdata (EV_A_ u);
-      ev_set_invoke_pending_cb (EV_A_ l_invoke);
-      ev_set_loop_release_cb (EV_A_ l_release, l_acquire);
-
-      // then create the thread running ev_loop
-      pthread_create (&u->tid, 0, l_run, EV_A);
-   }
-
-The callback for the C<ev_async> watcher does nothing: the watcher is used
-solely to wake up the event loop so it takes notice of any new watchers
-that might have been added:
-
-   static void
-   async_cb (EV_P_ ev_async *w, int revents)
-   {
-      // just used for the side effects
-   }
-
-The C<l_release> and C<l_acquire> callbacks simply unlock/lock the mutex
-protecting the loop data, respectively.
-
-   static void
-   l_release (EV_P)
-   {
-     userdata *u = ev_userdata (EV_A);
-     pthread_mutex_unlock (&u->lock);
-   }
-
-   static void
-   l_acquire (EV_P)
-   {
-     userdata *u = ev_userdata (EV_A);
-     pthread_mutex_lock (&u->lock);
-   }
-
-The event loop thread first acquires the mutex, and then jumps straight
-into C<ev_loop>:
-
-   void *
-   l_run (void *thr_arg)
-   {
-     struct ev_loop *loop = (struct ev_loop *)thr_arg;
-
-     l_acquire (EV_A);
-     pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, 0);
-     ev_loop (EV_A_ 0);
-     l_release (EV_A);
-
-     return 0;
-   }
-
-Instead of invoking all pending watchers, the C<l_invoke> callback will
-signal the main thread via some unspecified mechanism (signals? pipe
-writes? C<Async::Interrupt>?) and then waits until all pending watchers
-have been called (in a while loop because a) spurious wakeups are possible
-and b) skipping inter-thread-communication when there are no pending
-watchers is very beneficial):
-
-   static void
-   l_invoke (EV_P)
-   {
-     userdata *u = ev_userdata (EV_A);
-
-     while (ev_pending_count (EV_A))
-       {
-         wake_up_other_thread_in_some_magic_or_not_so_magic_way ();
-         pthread_cond_wait (&u->invoke_cv, &u->lock);
-       }
-   }
-
-Now, whenever the main thread gets told to invoke pending watchers, it
-will grab the lock, call C<ev_invoke_pending> and then signal the loop
-thread to continue:
-
-   static void
-   real_invoke_pending (EV_P)
-   {
-     userdata *u = ev_userdata (EV_A);
-
-     pthread_mutex_lock (&u->lock);
-     ev_invoke_pending (EV_A);
-     pthread_cond_signal (&u->invoke_cv);
-     pthread_mutex_unlock (&u->lock);
-   }
-
-Whenever you want to start/stop a watcher or do other modifications to an
-event loop, you will now have to lock:
-
-   ev_timer timeout_watcher;
-   userdata *u = ev_userdata (EV_A);
-
-   ev_timer_init (&timeout_watcher, timeout_cb, 5.5, 0.);
-
-   pthread_mutex_lock (&u->lock);
-   ev_timer_start (EV_A_ &timeout_watcher);
-   ev_async_send (EV_A_ &u->async_w);
-   pthread_mutex_unlock (&u->lock);
-
-Note that sending the C<ev_async> watcher is required because otherwise
-an event loop currently blocking in the kernel will have no knowledge
-about the newly added timer. By waking up the loop it will pick up any new
-watchers in the next event loop iteration.
+See also L<THREAD LOCKING EXAMPLE>.
 
 =head3 COROUTINES
 
 Libev is very accommodating to coroutines ("cooperative threads"):
 libev fully supports nesting calls to its functions from different
-coroutines (e.g. you can call C<ev_loop> on the same loop from two
+coroutines (e.g. you can call C<ev_run> on the same loop from two
 different coroutines, and switch freely between both coroutines running
 the loop, as long as you don't confuse yourself). The only exception is
 that you must not do this from C<ev_periodic> reschedule callbacks.
 
 Care has been taken to ensure that libev does not keep local state inside
-C<ev_loop>, and other calls do not usually allow for coroutine switches as
+C<ev_run>, and other calls do not usually allow for coroutine switches as
 they do not call any callbacks.
 
 =head2 COMPILER WARNINGS
@@ -4246,7 +4767,7 @@
 And of course, some compiler warnings are just plain stupid, or simply
 wrong (because they don't actually warn about the condition their message
 seems to warn about). For example, certain older gcc versions had some
-warnings that resulted an extreme number of false positives. These have
+warnings that resulted in an extreme number of false positives. These have
 been fixed, but some people still insist on making code warn-free with
 such buggy versions.
 
@@ -4292,19 +4813,109 @@
 
 =head1 PORTABILITY NOTES
 
+=head2 GNU/LINUX 32 BIT LIMITATIONS
+
+GNU/Linux is the only common platform that supports 64 bit file/large file
+interfaces but I<disables> them by default.
+
+That means that libev compiled in the default environment doesn't support
+files larger than 2GiB or so, which mainly affects C<ev_stat> watchers.
+
+Unfortunately, many programs try to work around this GNU/Linux issue
+by enabling the large file API, which makes them incompatible with the
+standard libev compiled for their system.
+
+Likewise, libev cannot enable the large file API itself as this would
+suddenly make it incompatible with the default compile time environment,
+i.e. all programs not using special compile switches.
+
+=head2 OS/X AND DARWIN BUGS
+
+The whole thing is a bug if you ask me - basically any system interface
+you touch is broken, whether it is locales, poll, kqueue or even the
+OpenGL drivers.
+
+=head3 C<kqueue> is buggy
+
+The kqueue syscall is broken in all known versions - most versions support
+only sockets, many support pipes.
+
+Libev tries to work around this by not using C<kqueue> by default on this
+rotten platform, but of course you can still ask for it when creating a
+loop - embedding a socket-only kqueue loop into a select-based one is
+probably going to work well.
+
+=head3 C<poll> is buggy
+
+Instead of fixing C<kqueue>, Apple replaced their (working) C<poll>
+implementation by something calling C<kqueue> internally around the 10.5.6
+release, so now C<kqueue> I<and> C<poll> are broken.
+
+Libev tries to work around this by not using C<poll> by default on
+this rotten platform, but of course you can still ask for it when creating
+a loop.
+
+=head3 C<select> is buggy
+
+All that's left is C<select>, and of course Apple found a way to fuck this
+one up as well: On OS/X, C<select> actively limits the number of file
+descriptors you can pass in to 1024 - your program suddenly crashes when
+you use more.
+
+There is an undocumented "workaround" for this - defining
+C<_DARWIN_UNLIMITED_SELECT>, which libev tries to use, so select I<should>
+work on OS/X.
+
+=head2 SOLARIS PROBLEMS AND WORKAROUNDS
+
+=head3 C<errno> reentrancy
+
+The default compile environment on Solaris is unfortunately so
+thread-unsafe that you can't even use components/libraries compiled
+without C<-D_REENTRANT> in a threaded program, which, of course, isn't
+defined by default. A valid, if stupid, implementation choice.
+
+If you want to use libev in threaded environments you have to make sure
+it's compiled with C<_REENTRANT> defined.
+
+=head3 Event port backend
+
+The scalable event interface for Solaris is called "event
+ports". Unfortunately, this mechanism is very buggy in all major
+releases. If you run into high CPU usage, your program freezes or you get
+a large number of spurious wakeups, make sure you have all the relevant
+and latest kernel patches applied. No, I don't know which ones, but there
+are multiple ones to apply, and afterwards, event ports actually work
+great.
+
+If you can't get it to work, you can try running the program with the
+environment variable C<LIBEV_FLAGS=3> set, which only allows the C<poll>
+and C<select> backends.
+
+=head2 AIX POLL BUG
+
+AIX unfortunately has a broken C<poll.h> header. Libev works around
+this by trying to avoid the poll backend altogether (i.e. it's not even
+compiled in), which normally isn't a big problem as C<select> works fine
+with large bitsets on AIX, and AIX is dead anyway.
+
 =head2 WIN32 PLATFORM LIMITATIONS AND WORKAROUNDS
 
+=head3 General issues
+
 Win32 doesn't support any of the standards (e.g. POSIX) that libev
 requires, and its I/O model is fundamentally incompatible with the POSIX
 model. Libev still offers limited functionality on this platform in
 the form of the C<EVBACKEND_SELECT> backend, and only supports socket
 descriptors. This only applies when using Win32 natively, not when using
-e.g. cygwin.
+e.g. cygwin. Actually, it only applies to Microsoft's own compilers,
+as every compiler comes with a slightly differently broken/incompatible
+environment.
 
 Lifting these limitations would basically require the full
-re-implementation of the I/O system. If you are into these kinds of
-things, then note that glib does exactly that for you in a very portable
-way (note also that glib is the slowest event library known to man).
+re-implementation of the I/O system. If you are into this kind of thing,
+then note that glib does exactly that for you in a very portable way (note
+also that glib is the slowest event library known to man).
 
 There is no supported compilation method available on windows except
 embedding it into other applications.
@@ -4342,9 +4953,7 @@
    #include "evwrap.h"
    #include "ev.c"
 
-=over 4
-
-=item The winsocket select function
+=head3 The winsocket C<select> function
 
 The winsocket C<select> function doesn't follow POSIX in that it
 requires socket I<handles> and not socket I<file descriptors> (it is
@@ -4363,7 +4972,7 @@
 Note that winsockets handling of fd sets is O(n), so you can easily get a
 complexity in the O(n²) range when using win32.
 
-=item Limited number of file descriptors
+=head3 Limited number of file descriptors
 
 Windows has numerous arbitrary (and low) limits on things.
 
@@ -4388,8 +4997,6 @@
 you need to wrap all I/O functions and provide your own fd management, but
 the cost of calling select (O(n²)) will likely make this unworkable.
 
-=back
-
 =head2 PORTABILITY REQUIREMENTS
 
 In addition to a working ISO-C implementation and of course the
@@ -4406,6 +5013,11 @@
 callback: The watcher callbacks have different type signatures, but libev
 calls them using an C<ev_watcher *> internally.
 
+=item pointer accesses must be thread-atomic
+
+Accessing a pointer value must be atomic: it must be both readable and
+writable in one piece. This is the case on all current architectures.
+
 =item C<sig_atomic_t volatile> must be thread-atomic as well
 
 The type C<sig_atomic_t volatile> (or whatever is defined as
@@ -4437,11 +5049,15 @@
 =item C<double> must hold a time value in seconds with enough accuracy
 
 The type C<double> is used to represent timestamps. It is required to
-have at least 51 bits of mantissa (and 9 bits of exponent), which is good
-enough for at least into the year 4000. This requirement is fulfilled by
-implementations implementing IEEE 754, which is basically all existing
-ones. With IEEE 754 doubles, you get microsecond accuracy until at least
-2200.
+have at least 51 bits of mantissa (and 9 bits of exponent), which is
+good enough for at least into the year 4000 with millisecond accuracy
+(the design goal for libev). This requirement is overfulfilled by
+implementations using IEEE 754, which is basically all existing ones.
+
+With IEEE 754 doubles, you get microsecond accuracy until at least the
+year 2255 (and millisecond accuracy till the year 287396 - by then, libev
+is either obsolete or somebody patched it to use C<long double> or
+something like that, just kidding).
 
 =back
 
@@ -4513,8 +5129,69 @@
 =item Processing signals: O(max_signal_number)
 
 Sending involves a system call I<iff> there were no other C<ev_async_send>
-calls in the current loop iteration. Checking for async and signal events
-involves iterating over all running async watchers or all signal numbers.
+calls in the current loop iteration and the loop is currently
+blocked. Checking for async and signal events involves iterating over all
+running async watchers or all signal numbers.
+
+=back
+
+
+=head1 PORTING FROM LIBEV 3.X TO 4.X
+
+The major version 4 introduced some incompatible changes to the API.
+
+At the moment, the C<ev.h> header file provides compatibility definitions
+for all changes, so most programs should still compile. The compatibility
+layer might be removed in later versions of libev, so it is better to
+update to the new API sooner rather than later.
+
+=over 4
+
+=item C<EV_COMPAT3> backwards compatibility mechanism
+
+The backward compatibility mechanism can be controlled by
+C<EV_COMPAT3>. See L<PREPROCESSOR SYMBOLS/MACROS> in the L<EMBEDDING>
+section.
+
+=item C<ev_default_destroy> and C<ev_default_fork> have been removed
+
+These calls can be replaced easily by their C<ev_loop_xxx> counterparts:
+
+   ev_loop_destroy (EV_DEFAULT_UC);
+   ev_loop_fork (EV_DEFAULT);
+
+=item function/symbol renames
+
+A number of functions and symbols have been renamed:
+
+  ev_loop         => ev_run
+  EVLOOP_NONBLOCK => EVRUN_NOWAIT
+  EVLOOP_ONESHOT  => EVRUN_ONCE
+
+  ev_unloop       => ev_break
+  EVUNLOOP_CANCEL => EVBREAK_CANCEL
+  EVUNLOOP_ONE    => EVBREAK_ONE
+  EVUNLOOP_ALL    => EVBREAK_ALL
+
+  EV_TIMEOUT      => EV_TIMER
+
+  ev_loop_count   => ev_iteration
+  ev_loop_depth   => ev_depth
+  ev_loop_verify  => ev_verify
+
+Most functions working on C<struct ev_loop> objects don't have an
+C<ev_loop_> prefix, so it was removed; C<ev_loop>, C<ev_unloop> and
+associated constants have been renamed to not collide with the C<struct
+ev_loop> anymore and C<EV_TIMER> now follows the same naming scheme
+as all other watcher types. Note that C<ev_loop_fork> is still called
+C<ev_loop_fork> because it would otherwise clash with the C<ev_fork>
+typedef.
+
+=item C<EV_MINIMAL> mechanism replaced by C<EV_FEATURES>
+
+The preprocessor symbol C<EV_MINIMAL> has been replaced by a different
+mechanism, C<EV_FEATURES>. Programs using C<EV_MINIMAL> usually compile
+and work, but the library code will of course be larger.
 
 =back
 
@@ -4525,20 +5202,24 @@
 
 =item active
 
-A watcher is active as long as it has been started (has been attached to
-an event loop) but not yet stopped (disassociated from the event loop).
+A watcher is active as long as it has been started and not yet stopped.
+See L<WATCHER STATES> for details.
 
 =item application
 
 In this document, an application is whatever is using libev.
 
+=item backend
+
+The part of the code dealing with the operating system interfaces.
+
 =item callback
 
 The address of a function that is called when some event has been
 detected. Callbacks are being passed the event loop, the watcher that
 received the event, and the actual event bitset.
 
-=item callback invocation
+=item callback/watcher invocation
 
 The act of calling the callback associated with a watcher.
 
@@ -4549,7 +5230,7 @@
 any other events happening anymore.
 
 In libev, events are represented as single bits (such as C<EV_READ> or
-C<EV_TIMEOUT>).
+C<EV_TIMER>).
 
 =item event library
 
@@ -4567,12 +5248,8 @@
 
 =item pending
 
-A watcher is pending as soon as the corresponding event has been detected,
-and stops being pending as soon as the watcher will be invoked or its
-pending status is explicitly cleared by the application.
-
-A watcher can be pending, but not active. Stopping a watcher also clears
-its pending status.
+A watcher is pending as soon as the corresponding event has been
+detected. See L<WATCHER STATES> for details.
 
 =item real time
 
@@ -4581,7 +5258,7 @@
 =item wall-clock time
 
 The time and date as shown on clocks. Unlike real time, it can actually
-be wrong and jump forwards and backwards, e.g. when the you adjust your
+be wrong and jump forwards and backwards, e.g. when you adjust your
 clock.
 
 =item watcher
@@ -4589,13 +5266,10 @@
 A data structure that describes interest in certain events. Watchers need
 to be started (attached to an event loop) before they can receive events.
 
-=item watcher invocation
-
-The act of calling the callback associated with a watcher.
-
 =back
 
 =head1 AUTHOR
 
-Marc Lehmann <libev@schmorp.de>, with repeated corrections by Mikael Magnusson.
+Marc Lehmann <libev@schmorp.de>, with repeated corrections by Mikael
+Magnusson and Emanuele Giaquinta, and minor corrections by many others.