--- libev/ev.pod 2008/10/23 06:30:48 1.198 +++ libev/ev.pod 2008/11/20 00:35:10 1.218 @@ -11,8 +11,10 @@ // a single header file is required #include + #include // for puts + // every watcher type has its own typedef'd struct - // with the name ev_ + // with the name ev_TYPE ev_io stdin_watcher; ev_timer timeout_watcher; @@ -278,9 +280,13 @@ =head1 FUNCTIONS CONTROLLING THE EVENT LOOP -An event loop is described by a C. The library knows two -types of such loops, the I loop, which supports signals and child -events, and dynamically created loops which do not. +An event loop is described by a C (the C +is I optional in this case, as there is also an C +I). + +The library knows two types of such loops, the I loop, which +supports signals and child events, and dynamically created loops which do +not. =over 4 @@ -296,7 +302,7 @@ Note that this function is I thread-safe, so if you want to use it from multiple threads, you have to lock (note also that this is unlikely, -as loops cannot bes hared easily between threads anyway). +as loops cannot be shared easily between threads anyway). The default loop is the only loop that can handle C and C watchers, and to do this, it always registers a handler @@ -382,26 +388,43 @@ For few fds, this backend is a bit little slower than poll and select, but it scales phenomenally better. While poll and select usually scale like O(total_fds) where n is the total number of fds (or the highest fd), -epoll scales either O(1) or O(active_fds). The epoll design has a number -of shortcomings, such as silently dropping events in some hard-to-detect -cases and requiring a system call per fd change, no fork support and bad -support for dup. +epoll scales either O(1) or O(active_fds). 
+ +The epoll mechanism deserves honorable mention as the most misdesigned +of the more advanced event mechanisms: mere annoyances include silently +dropping file descriptors, requiring a system call per change per file +descriptor (and unnecessary guessing of parameters), problems with dup and +so on. The biggest issue is fork races, however - if a program forks then +I parent and child process have to recreate the epoll set, which can +take considerable time (one syscall per file descriptor) and is of course +hard to detect. + +Epoll is also notoriously buggy - embedding epoll fds I work, but +of course I, and epoll just loves to report events for totally +I file descriptors (even already closed ones, so one cannot +even remove them from the set) than registered in the set (especially +on SMP systems). Libev tries to counter these spurious notifications by +employing an additional generation counter and comparing that against the +events to filter out spurious ones, recreating the set when required. While stopping, setting and starting an I/O watcher in the same iteration -will result in some caching, there is still a system call per such incident -(because the fd could point to a different file description now), so its -best to avoid that. Also, C'ed file descriptors might not work -very well if you register events for both fds. - -Please note that epoll sometimes generates spurious notifications, so you -need to use non-blocking I/O or other means to avoid blocking when no data -(or space) is available. +will result in some caching, there is still a system call per such +incident (because the same I could point to a different +I now), so it's best to avoid that. Also, C'ed +file descriptors might not work very well if you register events for both +file descriptors. Best performance from this backend is achieved by not unregistering all watchers for a file descriptor until it has been closed, if possible, i.e. keep at least one watcher active per fd at all times.
Stopping and starting a watcher (without re-setting it) also usually doesn't cause -extra overhead. +extra overhead. A fork can both result in spurious notifications as well +as in libev having to destroy and recreate the epoll object, which can +take considerable time and thus should be avoided. + +All this means that, in practice, C can be as fast or +faster than epoll for maybe up to a hundred file descriptors, depending on +the usage. So sad. While nominally embeddable in other event loops, this feature is broken in all kernel versions tested so far. @@ -411,12 +434,15 @@ =item C (value 8, most BSD clones) -Kqueue deserves special mention, as at the time of this writing, it was -broken on all BSDs except NetBSD (usually it doesn't work reliably with -anything but sockets and pipes, except on Darwin, where of course it's -completely useless). For this reason it's not being "auto-detected" unless -you explicitly specify it in the flags (i.e. using C) or -libev was compiled on a known-to-be-good (-enough) system like NetBSD. +Kqueue deserves special mention, as at the time of this writing, it +was broken on all BSDs except NetBSD (usually it doesn't work reliably +with anything but sockets and pipes, except on Darwin, where of course +it's completely useless). Unlike epoll, however, whose brokenness +is by design, these kqueue bugs can (and eventually will) be fixed +without API changes to existing programs. For this reason it's not being +"auto-detected" unless you explicitly specify it in the flags (i.e. using +C) or libev was compiled on a known-to-be-good (-enough) +system like NetBSD. You still can embed kqueue into a normal poll or select backend and use it only for sockets (after having made sure that sockets work with kqueue on @@ -426,8 +452,9 @@ kernel is more efficient (which says nothing about its actual speed, of course). 
While stopping, setting and starting an I/O watcher never causes an extra system call as with C, it still adds up to -two event changes per incident. Support for C is very bad and it -drops fds silently in similarly hard-to-detect cases. +two event changes per incident. Support for C is very bad (but +sane, unlike epoll) and it drops fds silently in similarly hard-to-detect +cases. This backend usually performs well under most conditions. @@ -466,7 +493,7 @@ On the positive side, with the exception of the spurious readiness notifications, this backend actually performed fully to specification in all tests and is fully embeddable, which is a rare feat among the -OS-specific backends. +OS-specific backends (I vastly prefer correctness over speed hacks). This backend maps C and C in the same way as C. @@ -529,9 +556,9 @@ the easiest thing, you can just ignore the watchers and/or C them for example). -Note that certain global state, such as signal state, will not be freed by -this function, and related watchers (such as signal and child watchers) -would need to be stopped manually. +Note that certain global state, such as signal state (and installed signal +handlers), will not be freed by this function, and related watchers (such +as signal and child watchers) would need to be stopped manually. In general it is not advisable to call this function except in the rare occasion where you really need to free e.g. the signal handling @@ -633,7 +660,7 @@ A flags value of C will look for new events (waiting if necessary) and will handle those and any already outstanding ones. It will block your process until at least one new event arrives (which could -be an event internal to libev itself, so there is no guarentee that a +be an event internal to libev itself, so there is no guarantee that a user-registered callback will be called), and will return after one iteration of the loop.
@@ -770,7 +797,7 @@ =item ev_loop_verify (loop) This function only does something when C support has been -compiled in. which is the default for non-minimal builds. It tries to go +compiled in, which is the default for non-minimal builds. It tries to go through all internal structures and checks them for validity. If anything is found to be inconsistent, it will print an error message to standard error and call C. @@ -784,6 +811,10 @@ =head1 ANATOMY OF A WATCHER +In the following description, uppercase C in names stands for the +watcher type, e.g. C can mean C for timer +watchers and C for I/O watchers. + A watcher is a structure that you create and register to record your interest in some event. For instance, if you want to wait for STDIN to become readable, you would create an C watcher for that: @@ -795,15 +826,21 @@ } struct ev_loop *loop = ev_default_loop (0); + ev_io stdin_watcher; + ev_init (&stdin_watcher, my_cb); ev_io_set (&stdin_watcher, STDIN_FILENO, EV_READ); ev_io_start (loop, &stdin_watcher); + ev_loop (loop, 0); As you can see, you are responsible for allocating the memory for your -watcher structures (and it is usually a bad idea to do this on the stack, -although this can sometimes be quite valid). +watcher structures (and it is I a bad idea to do this on the +stack). + +Each watcher has an associated watcher structure (called C +or simply C, as typedefs are provided for all watcher structs). Each watcher structure must be initialised by a call to C, which expects a callback to be provided. This @@ -811,19 +848,19 @@ watchers, each time the event loop detects that the file descriptor given is readable and/or writable). -Each watcher type has its own C<< ev__set (watcher *, ...) >> macro -with arguments specific to this watcher type. There is also a macro -to combine initialisation and setting in one call: C<< ev__init -(watcher *, callback, ...) >>. +Each watcher type further has its own C<< ev_TYPE_set (watcher *, ...) 
>> +macro to configure it, with arguments specific to the watcher type. There +is also a macro to combine initialisation and setting in one call: C<< +ev_TYPE_init (watcher *, callback, ...) >>. To make the watcher actually watch out for events, you have to start it -with a watcher-specific start function (C<< ev__start (loop, watcher +with a watcher-specific start function (C<< ev_TYPE_start (loop, watcher *) >>), and you can stop watching for events at any time by calling the -corresponding stop function (C<< ev__stop (loop, watcher *) >>. +corresponding stop function (C<< ev_TYPE_stop (loop, watcher *) >>. As long as your watcher is active (has been started but not stopped) you must not touch the values stored in it. Most specifically you must never -reinitialise it or call its C macro. +reinitialise it or call its C macro. Each and every callback receives the event loop pointer as first, the registered watcher structure as second, and a bitset of received events as @@ -914,9 +951,6 @@ =head2 GENERIC WATCHER FUNCTIONS -In the following description, C stands for the watcher type, -e.g. C for C watchers and C for C watchers. - =over 4 =item C (ev_TYPE *watcher, callback) @@ -1034,7 +1068,7 @@ Setting a priority outside the range of C to C is fine, as long as you do not mind that the priority value you query might -or might not have been adjusted to be within valid range. +or might not have been clamped to the valid range. =item ev_invoke (loop, ev_TYPE *watcher, int revents) @@ -1290,20 +1324,20 @@ =head3 Be smart about timeouts -Many real-world problems invole some kind of time-out, usually for error +Many real-world problems involve some kind of timeout, usually for error recovery. A typical example is an HTTP request - if the other side hangs, you want to raise some error after a while. -Here are some ways on how to handle this problem, from simple and -inefficient to very efficient. 
+What follows are some ways to handle this problem, from obvious and +inefficient to smart and efficient. -In the following examples a 60 second activity timeout is assumed - a -timeout that gets reset to 60 seconds each time some data ("a lifesign") -was received. +In the following, a 60 second activity timeout is assumed - a timeout that +gets reset to 60 seconds each time there is activity (e.g. each time some +data or other life sign was received). =over 4 -=item 1. Use a timer and stop, reinitialise, start it on activity. +=item 1. Use a timer and stop, reinitialise and start it on activity. This is the most obvious, but not the most simple way: In the beginning, start the watcher: @@ -1311,55 +1345,61 @@ ev_timer_init (timer, callback, 60., 0.); ev_timer_start (loop, timer); -Then, each time there is some activity, C the timer, -initialise it again, and start it: +Then, each time there is some activity, C it, initialise it +and start it again: ev_timer_stop (loop, timer); ev_timer_set (timer, 60., 0.); ev_timer_start (loop, timer); -This is relatively simple to implement, but means that each time there -is some activity, libev will first have to remove the timer from it's -internal data strcuture and then add it again. +This is relatively simple to implement, but means that each time there is +some activity, libev will first have to remove the timer from its internal +data structure and then add it again. Libev tries to be fast, but it's +still not a constant-time operation. =item 2. Use a timer and re-start it with C inactivity. This is the easiest way, and involves using C instead of C. -For this, configure an C with a C value of C<60> and -then call C at start and each time you successfully read -or write some data. If you go into an idle state where you do not expect -data to travel on the socket, you can C the timer, and -C will automatically restart it if need be. 
- -That means you can ignore the C value and C -altogether and only ever use the C value and C. +To implement this, configure an C with a C value +of C<60> and then call C at start and each time you +successfully read or write some data. If you go into an idle state where +you do not expect data to travel on the socket, you can C +the timer, and C will automatically restart it if need be. + +That means you can ignore both the C function and the +C argument to C, and only ever use the C +member and C. At start: - ev_timer_init (timer, callback, 0., 60.); + ev_init (timer, callback); + timer->repeat = 60.; ev_timer_again (loop, timer); -Each time you receive some data: +Each time there is some activity: ev_timer_again (loop, timer); -It is even possible to change the time-out on the fly: +It is even possible to change the time-out on the fly, regardless of +whether the watcher is active or not: timer->repeat = 30.; ev_timer_again (loop, timer); This is slightly more efficient than stopping/starting the timer each time you want to modify its timeout value, as libev does not have to completely -remove and re-insert the timer from/into it's internal data structure. +remove and re-insert the timer from/into its internal data structure. + +It is, however, even simpler than the "obvious" way to do it. =item 3. Let the timer time out, but then re-arm it as required. This method is more tricky, but usually most efficient: Most timeouts are -relatively long compared to the loop iteration time - in our example, -within 60 seconds, there are usually many I/O events with associated -activity resets. +relatively long compared to the intervals between other activity - in +our example, within 60 seconds, there are usually many I/O events with +associated activity resets.
In this case, it would be more efficient to leave the C alone, but remember the time of last activity, and check for a real timeout only @@ -1370,10 +1410,10 @@ static void callback (EV_P_ ev_timer *w, int revents) { - ev_tstamp now = ev_now (EV_A); + ev_tstamp now = ev_now (EV_A); ev_tstamp timeout = last_activity + 60.; - // if last_activity is older than now - timeout, we did time out + // if last_activity + 60. is older than now, we did time out if (timeout < now) { // timeout occurred, take action @@ -1381,41 +1421,82 @@ else { // callback was invoked, but there was some activity, re-arm - // to fire in last_activity + 60. - w->again = timeout - now; + // the watcher to fire in last_activity + 60, which is + // guaranteed to be in the future, so "repeat" is positive: + w->repeat = timeout - now; ev_timer_again (EV_A_ w); } } -To summarise the callback: first calculate the real time-out (defined as -"60 seconds after the last activity"), then check if that time has been -reached, which means there was a real timeout. Otherwise the callback was -invoked too early (timeout is in the future), so re-schedule the timer to -fire at that future time. +To summarise the callback: first calculate the real timeout (defined +as "60 seconds after the last activity"), then check if that time has +been reached, which means something I, in fact, time out. Otherwise +the callback was invoked too early (C is in the future), so +re-schedule the timer to fire at that future time, to see if maybe we have +a timeout then. Note how C is used, taking advantage of the C optimisation when the timer is already running. -This scheme causes more callback invocations (about one every 60 seconds), -but virtually no calls to libev to change the timeout.
- -To start the timer, simply intiialise the watcher and C, -then call the callback: +This scheme causes more callback invocations (about one every 60 seconds +minus half the average time between activity), but virtually no calls to +libev to change the timeout. + +To start the timer, simply initialise the watcher and set C +to the current time (meaning we just have some activity :), then call the +callback, which will "do the right thing" and start the timer: ev_init (timer, callback); last_activity = ev_now (loop); callback (loop, timer, EV_TIMEOUT); -And when there is some activity, simply remember the time in -C: +And when there is some activity, simply store the current time in +C, no libev calls at all: last_activity = ev_now (loop); This technique is slightly more complex, but in most cases where the time-out is unlikely to be triggered, much more efficient. +Changing the timeout is trivial as well (if it isn't hard-coded in the +callback :) - just change the timeout and invoke the callback, which will +fix things for you. + +=item 4. Wee, just use a double-linked list for your timeouts. + +If there is not one request, but many thousands (millions...), all +employing some kind of timeout with the same timeout value, then one can +do even better: + +When starting the timeout, calculate the timeout value and put the timeout +at the I of the list. + +Then use an C to fire when the timeout at the I of +the list is expected to fire (for example, using the technique #3). + +When there is some activity, remove the timer from the list, recalculate +the timeout, append it to the end of the list again, and make sure to +update the C if it was taken from the beginning of the list. + +This way, one can manage an unlimited number of timeouts in O(1) time for +starting, stopping and updating the timers, at the expense of a major +complication, and having to use a constant timeout. The constant timeout +ensures that the list stays sorted.
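The list handling for method #4 could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from libev: the C<tl_*> names, the C<TIMEOUT> constant and the convention of passing the current time in as a plain C<double> are all made up for this example, and the single C<ev_timer> that would watch the head of the list is deliberately left out. The sketch also assumes that the time passed in never goes backwards, as otherwise the list would not stay sorted.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical: the one constant timeout shared by all entries */
#define TIMEOUT 60.

/* hypothetical list entry, one per request; not a libev type */
typedef struct tl_entry
{
  struct tl_entry *prev, *next;
  double deadline; /* absolute time at which this entry times out */
  int linked;      /* non-zero while the entry is on the list */
} tl_entry;

typedef struct
{
  tl_entry *head, *tail; /* head has the earliest deadline */
} tl_list;

/* stop a timeout: unlink the entry in O(1) */
void
tl_unlink (tl_list *l, tl_entry *e)
{
  assert (e->linked);

  if (e->prev) e->prev->next = e->next; else l->head = e->next;
  if (e->next) e->next->prev = e->prev; else l->tail = e->prev;

  e->prev = e->next = NULL;
  e->linked = 0;
}

/* start a timeout: calculate the deadline and append at the end, O(1) */
void
tl_append (tl_list *l, tl_entry *e, double now)
{
  e->deadline = now + TIMEOUT;
  e->prev     = l->tail;
  e->next     = NULL;

  if (l->tail) l->tail->next = e; else l->head = e;

  l->tail   = e;
  e->linked = 1;
}

/* activity: remove, recalculate and append again, still O(1) */
void
tl_touch (tl_list *l, tl_entry *e, double now)
{
  tl_unlink (l, e);
  tl_append (l, e, now);
}

/* deadline the single timer should be armed for, or -1. if the list is empty */
double
tl_next_deadline (const tl_list *l)
{
  return l->head ? l->head->deadline : -1.;
}

/* unlink all entries whose deadline has been reached, return their number */
int
tl_expire (tl_list *l, double now)
{
  int n = 0;

  while (l->head && l->head->deadline <= now)
    {
      tl_unlink (l, l->head); /* a real program would act on the timeout here */
      ++n;
    }

  return n;
}
```

In a real program, the timer callback would call something like C<tl_expire> with the current loop time and then re-arm the one timer for C<tl_next_deadline>, for example using technique #3.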
+ =back +So which method is the best? + +Method #2 is a simple no-brain-required solution that is adequate in most +situations. Method #3 requires a bit more thinking, but handles many cases +better, and isn't very complicated either. In most cases, choosing either +one is fine, with #3 being better in typical situations. + +Method #1 is almost always a bad idea, and buys you nothing. Method #4 is +rather complicated, but extremely efficient, something that really pays +off after the first million or so of active timers, i.e. it's usually +overkill :) + =head3 The special problem of time updates Establishing the current time is a costly operation (it usually takes at @@ -1854,35 +1935,38 @@ =head2 C - did the file attributes just change? This watches a file system path for attribute changes. That is, it calls -C regularly (or when the OS says it changed) and sees if it changed -compared to the last time, invoking the callback if it did. +C on that path in regular intervals (or when the OS says it changed) +and sees if it changed compared to the last time, invoking the callback if +it did. The path does not need to exist: changing from "path exists" to "path does -not exist" is a status change like any other. The condition "path does -not exist" is signified by the C field being zero (which is -otherwise always forced to be at least one) and all the other fields of -the stat buffer having unspecified contents. - -The path I be absolute and I end in a slash. If it is -relative and your working directory changes, the behaviour is undefined. - -Since there is no standard kernel interface to do this, the portable -implementation simply calls C regularly on the path to see if -it changed somehow. You can specify a recommended polling interval for -this case. If you specify a polling interval of C<0> (highly recommended!) -then a I value will be used (which -you can expect to be around five seconds, although this might change -dynamically).
Libev will also impose a minimum interval which is currently -around C<0.1>, but thats usually overkill. +not exist" is a status change like any other. The condition "path does not +exist" (or more correctly "path cannot be stat'ed") is signified by the +C field being zero (which is otherwise always forced to be at +least one) and all the other fields of the stat buffer having unspecified +contents. + +The path I end in a slash or contain special components such as +C<.> or C<..>. The path I be absolute: If it is relative and +your working directory changes, then the behaviour is undefined. + +Since there is no portable change notification interface available, the +portable implementation simply calls C regularly on the path +to see if it changed somehow. You can specify a recommended polling +interval for this case. If you specify a polling interval of C<0> (highly +recommended!) then a I value will be used +(which you can expect to be around five seconds, although this might +change dynamically). Libev will also impose a minimum interval which is +currently around C<0.1>, but that's usually overkill. This watcher type is not meant for massive numbers of stat watchers, as even with OS-supported change notifications, this can be resource-intensive. At the time of this writing, the only OS-specific interface implemented -is the Linux inotify interface (implementing kqueue support is left as -an exercise for the reader. Note, however, that the author sees no way -of implementing C semantics with kqueue). +is the Linux inotify interface (implementing kqueue support is left as an +exercise for the reader. Note, however, that the author sees no way of +implementing C semantics with kqueue, except as a hint). =head3 ABI Issues (Largefile Support) @@ -1893,7 +1977,7 @@ use 64 bit file offsets the programs will fail. In that case you have to compile libev with the same flags to get binary compatibility. 
This is obviously the case with any flags that change the ABI, but the problem is -most noticeably disabled with ev_stat and large file support. +most noticeably displayed with ev_stat and large file support. The solution for this is to lobby your distribution maker to make large file interfaces available by default (as e.g. FreeBSD does) and not @@ -1903,28 +1987,48 @@ =head3 Inotify and Kqueue -When C support has been compiled into libev (generally -only available with Linux 2.6.25 or above due to bugs in earlier -implementations) and present at runtime, it will be used to speed up -change detection where possible. The inotify descriptor will be created -lazily when the first C watcher is being started. +When C support has been compiled into libev and present at +runtime, it will be used to speed up change detection where possible. The +inotify descriptor will be created lazily when the first C +watcher is being started. Inotify presence does not change the semantics of C watchers except that changes might be detected earlier, and in some cases, to avoid making regular C calls. Even in the presence of inotify support there are many cases where libev has to resort to regular C polling, -but as long as the path exists, libev usually gets away without polling. +but as long as kernel 2.6.25 or newer is used (2.6.24 and older have too +many bugs), the path exists (i.e. stat succeeds), and the path resides on +a local filesystem (libev currently assumes only ext2/3, jfs, reiserfs and +xfs are fully working) libev usually gets away without polling. There is no support for kqueue, as apparently it cannot be used to implement this functionality, due to the requirement of having a file descriptor open on the object at all times, and detecting renames, unlinks etc. is difficult. +=head3 C is a synchronous operation + +Libev doesn't normally do any kind of I/O itself, and so is not blocking +the process. 
The exception are C watchers - those call C, which is a synchronous operation. + +For local paths, this usually doesn't matter: unless the system is very +busy or the intervals between stat's are large, a stat call will be fast, +as the path data is usually in memory already (except when starting the +watcher). + +For networked file systems, calling C can block an indefinite +time due to network issues, and even under good conditions, a stat call +often takes multiple milliseconds. + +Therefore, it is best to avoid using C watchers on networked +paths, although this is fully supported by libev. + =head3 The special problem of stat time resolution -The C system call only supports full-second resolution portably, and -even on systems where the resolution is higher, most file systems still -only support whole seconds. +The C system call only supports full-second resolution portably, +and even on systems where the resolution is higher, most file systems +still only support whole seconds. That means that, if the time is the only thing that changes, you can easily miss updates: on the first update, C detects a change and @@ -2573,7 +2677,7 @@ =item ev_async_init (ev_async *, callback) Initialises and configures the async watcher - it has no parameters of any -kind. There is a C macro, but using it is utterly pointless, +kind. There is a C macro, but using it is utterly pointless, trust me. =item ev_async_send (loop, ev_async *) @@ -2897,11 +3001,19 @@ more on top of it. It can be found via gem servers. Its homepage is at L. +Roger Pack reports that using the link order C<-lws2_32 -lmsvcrt-ruby-190> +makes rev work even on mingw. + =item D Leandro Lucarella has written a D language binding (F) for libev, to be found at L. +=item Ocaml + +Erkki Seppala has written Ocaml bindings for libev, to be found at +L.
+ =back @@ -3011,7 +3123,7 @@ #include "ev.h" Both header files and implementation files can be compiled with a C++ -compiler (at least, thats a stated goal, and breakage will be treated +compiler (at least, that's a stated goal, and breakage will be treated as a bug). You need the following files in your source tree, or in a directory @@ -3077,15 +3189,18 @@ supported). It will also not define any of the structs usually found in F that are not directly supported by the libev core alone. +In standalone mode, libev will still try to automatically deduce the +configuration, but has to be more conservative. + =item EV_USE_MONOTONIC If defined to be C<1>, libev will try to detect the availability of the -monotonic clock option at both compile time and runtime. Otherwise no use -of the monotonic clock option will be attempted. If you enable this, you -usually have to link against librt or something similar. Enabling it when -the functionality isn't available is safe, though, although you have +monotonic clock option at both compile time and runtime. Otherwise no +use of the monotonic clock option will be attempted. If you enable this, +you usually have to link against librt or something similar. Enabling it +when the functionality isn't available is safe, though, although you have to make sure you link against any libraries where the C -function is hiding in (often F<-lrt>). +function is hiding in (often F<-lrt>). See also C. =item EV_USE_REALTIME @@ -3096,6 +3211,16 @@ (CLOCK_REALTIME, ...)> and will not normally affect correctness. See the note about libraries in the description of C, though. +=item EV_USE_CLOCK_SYSCALL + +If defined to be C<1>, libev will try to use a direct syscall instead +of calling the system-provided C function. This option +exists because on GNU/Linux, C is in C, but C +unconditionally pulls in C, slowing down single-threaded +programs needlessly.
Using a direct syscall is slightly slower, because +no optimised vdso implementation can be used, but avoids the pthread +dependency. Defaults to C<1> on GNU/Linux with glibc 2.x or higher. + =item EV_USE_NANOSLEEP If defined to be C<1>, libev will assume that C is available @@ -3120,11 +3245,11 @@ If defined to C<1>, then the select backend will use the system C structure. This is useful if libev doesn't compile due to a missing -C or C definition or it mis-guesses the bitset layout on -exotic systems. This usually limits the range of file descriptors to some -low limit such as 1024 or might have other limitations (winsocket only -allows 64 sockets). The C macro, set before compilation, might -influence the size of the C used. +C or C definition or it mis-guesses the bitset layout +on exotic systems. This usually limits the range of file descriptors to +some low limit such as 1024 or might have other limitations (winsocket +only allows 64 sockets). The C macro, set before compilation, +configures the maximum size of the C. =item EV_SELECT_IS_WINSOCKET @@ -3493,7 +3618,7 @@ Care has been taken to ensure that libev does not keep local state inside C, and other calls do not usually allow for coroutine switches as -they do not clal any callbacks. +they do not call any callbacks. =head2 COMPILER WARNINGS @@ -3537,7 +3662,7 @@ ==2274== still reachable: 256 bytes in 1 blocks. Then there is no memory leak, just as memory accounted to global variables -is not a memleak - the memory is still being refernced, and didn't leak. +is not a memleak - the memory is still being referenced, and didn't leak. Similarly, under some circumstances, valgrind might report kernel bugs as if it were a bug in libev (e.g. in realloc or in the poll backend, @@ -3785,5 +3910,5 @@ =head1 AUTHOR -Marc Lehmann . +Marc Lehmann , with repeated corrections by Mikael Magnusson.