--- libeio/eio.pod 2011/07/05 14:02:15 1.14
+++ libeio/eio.pod 2011/09/26 16:54:25 1.28
@@ -47,20 +47,24 @@
 
 =head2 FORK SUPPORT
 
-Calling C<fork> is fully supported by this module. It is implemented in these steps:
-
-   1. wait till all requests in "execute" state have been handled
-      (basically requests that are already handed over to the kernel).
-   2. fork
-   3. in the parent, continue business as usual, done
-   4. in the child, destroy all ready and pending requests and free the
-      memory used by the worker threads. This gives you a fully empty
-      libeio queue.
-
-Note, however, since libeio does use threads, thr above guarantee doesn't
-cover your libc, for example, malloc and other libc functions are not
-fork-safe, so there is very little you can do after a fork, and in fatc,
-the above might crash, and thus change.
+Usage of pthreads in a program changes the semantics of fork
+considerably. Specifically, only async-safe functions can be called after
+fork. Libeio uses pthreads, so this applies, and makes using fork hard for
+anything but relatively fork + exec uses.
+
+This library only works in the process that initialised it: Forking is
+fully supported, but using libeio in any other process than the one that
+called C<eio_init> is not.
+
+You might get around by not I<using> libeio before (or after) forking in
+the parent, and using it in the child afterwards. You could also try to
+call the L<eio_init> function again in the child, which will brutally
+reinitialise all data structures, which isn't POSIX conformant, but
+typically works.
+
+Otherwise, the only recommendation you should follow is: treat fork code
+the same way you treat signal handlers, and only ever call C<eio_init> in
+the process that uses it, and only once ever.
 
 =head1 INITIALISATION/INTEGRATION
 
@@ -80,6 +84,9 @@
 It accepts two function pointers specifying callbacks as argument, both of
 which can be C<0>, in which case the callback isn't called.
 
+There is currently no way to change these callbacks later, or to
+"uninitialise" the library again.
+
 =item want_poll callback
 
 The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
@@ -126,19 +133,66 @@
 For libev, you would typically use an C<ev_async> watcher: the
 C<want_poll> callback would invoke C<ev_async_send> to wake up the event
 loop. Inside the callback set for the watcher, one would call C<eio_poll
-()> (followed by C<ev_async_send> again if C<eio_poll> indicates that not
-all requests have been handled yet). The race is taken care of because
-libev resets/rearms the async watcher before calling your callback,
-and therefore, before calling C<eio_poll>. This might result in (some)
-spurious wake-ups, but is generally harmless.
+()>.
+
+If C<eio_poll ()> is configured to not handle all results in one go
+(i.e. it returns C<-1>) then you should start an idle watcher that calls
+C<eio_poll> until it returns something C<!= -1>.
+
+A full-featured connector between libeio and libev would look as follows
+(if C<eio_poll> is handling all requests, it can of course be simplified a
+lot by removing the idle watcher logic):
+
+   static struct ev_loop *loop;
+   static ev_idle repeat_watcher;
+   static ev_async ready_watcher;
+
+   /* idle watcher callback, only used when eio_poll */
+   /* didn't handle all results in one call */
+   static void
+   repeat (EV_P_ ev_idle *w, int revents)
+   {
+     if (eio_poll () != -1)
+       ev_idle_stop (EV_A_ w);
+   }
+
+   /* eio has some results, process them */
+   static void
+   ready (EV_P_ ev_async *w, int revents)
+   {
+     if (eio_poll () == -1)
+       ev_idle_start (EV_A_ &repeat_watcher);
+   }
+
+   /* wake up the event loop */
+   static void
+   want_poll (void)
+   {
+     ev_async_send (loop, &ready_watcher);
+   }
+
+   void
+   my_init_eio ()
+   {
+     loop = EV_DEFAULT;
+
+     ev_idle_init (&repeat_watcher, repeat);
+     ev_async_init (&ready_watcher, ready);
+     ev_async_start (loop, &ready_watcher);
+
+     eio_init (want_poll, 0);
+   }
 
 For most other event loops, you would typically use a pipe - the event
 loop should be told to wait for read readiness on the read end.
 In C<want_poll> you would write a single byte, in C<done_poll> you would try
 to read that byte, and in the callback for the read end, you would call
-C<eio_poll>. The race is avoided here because the event loop should invoke
-your callback again and again until the byte has been read (as the pipe
-read callback does not read it, only C<done_poll>).
+C<eio_poll>.
+
+You don't have to take special care in the case C<eio_poll> doesn't handle
+all requests, as the done callback will not be invoked, so the event loop
+will still signal readiness for the pipe until I<all> results have been
+processed.
 
 =head1 HIGH LEVEL REQUEST API
 
@@ -181,13 +235,17 @@
 
 =back
 
+Members not explicitly described as accessible must not be
+accessed. Specifically, there is no guarantee that any members will still
+have the value they had when the request was submitted.
+
 The return value of the callback is normally C<0>, which tells libeio to
 continue normally. If a callback returns a nonzero value, libeio will
 stop processing results (in C<eio_poll>) and will return the value to its
 caller.
 
-Memory areas passed to libeio must stay valid as long as a request
-executes, with the exception of paths, which are being copied
+Memory areas passed to libeio wrappers must stay valid as long as a
+request executes, with the exception of paths, which are being copied
 internally. Any memory libeio itself allocates will be freed after the
 finish callback has been called. If you want to manage all memory passed
 to libeio yourself you can use the low-level API.
@@ -215,11 +273,46 @@
    /* the first three arguments are passed to open(2) */
    /* the remaining are priority, callback and data */
    if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
-     abort (); /* something ent wrong, we will all die!!! */
+     abort (); /* something went wrong, we will all die!!! */
 
 Note that you additionally need to call C<eio_poll> when the C<want_poll>
 indicates that requests are ready to be processed.
 
+=head2 CANCELLING REQUESTS
+
+Sometimes the need for a request goes away before the request is
+finished.
+In that case, one can cancel the request by a call to
+C<eio_cancel>:
+
+=over 4
+
+=item eio_cancel (eio_req *req)
+
+Cancel the request (and all its subrequests). If the request is currently
+executing it might still continue to execute, and in other cases it might
+still take a while till the request is cancelled.
+
+Even if cancelled, the finish callback will still be invoked - the
+callbacks of all cancellable requests need to check whether the request
+has been cancelled by calling C<EIO_CANCELLED (req)>:
+
+   static int
+   my_eio_cb (eio_req *req)
+   {
+     if (EIO_CANCELLED (req))
+       return 0;
+   }
+
+In addition, cancelled requests will I<either> have C<< req->result >>
+set to C<-1> and C<errno> to C<ECANCELED>, or I<otherwise> they were
+successfully executed, despite being cancelled (e.g. when they have
+already been executed at the time they were cancelled).
+
+C<EIO_CANCELLED> is still true for requests that have successfully
+executed, as long as C<eio_cancel> was called on them at some point.
+
+=back
+
 =head2 AVAILABLE REQUESTS
 
 The following request functions are available. I<All> of them return the
@@ -324,9 +417,10 @@
 
 =item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
 
-Similar to the realpath libc function, but unlike that one, result is
-C<-1> on failure and the length of the returned path in C<ptr2> (which is
-not 0-terminated) - this is similar to readlink.
+Similar to the realpath libc function, but unlike that one, C<<
+req->result >> is C<-1> on failure. On success, the result is the length
+of the returned path in C<ptr2> (which is I<not> 0-terminated) - this is
+similar to readlink.
 
 =item eio_stat (const char *path, int pri, eio_cb cb, void *data)
 
@@ -337,7 +431,7 @@
 Stats a file - if C<< req->result >> indicates success, then you can
 access the C<struct stat>-like structure via C<< req->ptr2 >>:
 
-  EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
+   EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
 
 =item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
 
@@ -346,15 +440,15 @@
 Stats a filesystem - if C<< req->result >> indicates success, then you can
 access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
 
-  EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
+   EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
 
 =back
 
 =head3 READING DIRECTORIES
 
 Reading directories sounds simple, but can be rather demanding, especially
-if you want to do stuff such as traversing a diretcory hierarchy or
-processing all files in a directory. Libeio can assist thess complex tasks
+if you want to do stuff such as traversing a directory hierarchy or
+processing all files in a directory. Libeio can assist these complex tasks
 with it's C<eio_readdir> call.
 
 =over 4
@@ -396,14 +490,14 @@
 also an array of C<struct eio_dirent> is returned, in C<ptr1>. A
 C<struct eio_dirent> looks like this:
 
-  struct eio_dirent
-  {
-    int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
-    unsigned short namelen; /* size of filename without trailing 0 */
-    unsigned char type; /* one of EIO_DT_* */
-    signed char score; /* internal use */
-    ino_t inode; /* the inode number, if available, otherwise unspecified */
-  };
+   struct eio_dirent
+   {
+     int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
+     unsigned short namelen; /* size of filename without trailing 0 */
+     unsigned char type; /* one of EIO_DT_* */
+     signed char score; /* internal use */
+     ino_t inode; /* the inode number, if available, otherwise unspecified */
+   };
 
 The only members you normally would access are C<nameofs>, which is the
 byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
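The eio_dirent layout shown in the hunk above can be tried out without libeio. The following is only a sketch under stated assumptions: struct demo_dirent is a hypothetical stand-in that mirrors the documented members except inode, the names buffer plays the role of (char *)req->ptr2, and both helper functions are made up for illustration and are not part of the libeio API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the documented struct eio_dirent; the inode
   member is left out so that the sketch needs no libeio headers. */
struct demo_dirent
{
  int nameofs;            /* offset of the 0-terminated name in the names buffer */
  unsigned short namelen; /* length of the name without the trailing 0 */
  unsigned char type;     /* stands in for one of EIO_DT_* */
  signed char score;      /* internal use */
};

/* Return a pointer to the name of entry i; names plays the role of
   (char *)req->ptr2 (hypothetical helper). */
static const char *
demo_dirent_name (const struct demo_dirent *ents, const char *names, size_t i)
{
  return names + ents [i].nameofs;
}

/* Sum up the lengths of all names, checking each namelen against the
   0-terminated string it describes (hypothetical helper). */
static size_t
demo_total_namelen (const struct demo_dirent *ents, const char *names, size_t count)
{
  size_t total = 0;
  size_t i;

  for (i = 0; i < count; ++i)
    {
      assert (strlen (demo_dirent_name (ents, names, i)) == ents [i].namelen);
      total += ents [i].namelen;
    }

  return total;
}
```

In a real readdir callback the entry array and the name buffer would come from the request. For experiments they can be built by hand, for example a buffer holding "a", "bcd" and "ef" back to back with entries at offsets 0, 2 and 6.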
@@ -456,15 +550,15 @@
 all files in the given directory, then the returned order will likely be
 fastest.
 
-If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then
-the likely dirs come first, resulting in a less optimal stat order.
+If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
+likely directories come first, resulting in a less optimal stat order.
 
 =item EIO_READDIR_FOUND_UNKNOWN
 
 This flag should not be specified when calling C<eio_readdir>. Instead,
 it is being set by C<eio_readdir> (you can access the C<flags> via
 C<< req->int1 >>, when any of the C<type>'s found were C<EIO_DT_UNKNOWN>. The
-absense of this flag therefore indicates that all C<type>'s are known,
+absence of this flag therefore indicates that all C<type>'s are known,
 which can be used to speed up some algorithms.
 
 A typical use case would be to identify all subdirectories within a
@@ -504,6 +598,13 @@
 Calls C<readahead(2)>. If the syscall is missing, then the call is
 emulated by simply reading the data (currently in 64kiB chunks).
 
+=item eio_syncfs (int fd, int pri, eio_cb cb, void *data)
+
+Calls Linux' C<syncfs> syscall, if available. Returns C<-1> and sets
+C<errno> to C<ENOSYS> if the call is missing I<but still calls sync()>,
+if the C<fd> is C<< >= 0 >>, so you can probe for the availability of the
+syscall with a negative C<fd> argument and checking for C<-1/ENOSYS>.
+
 =item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
 
 Calls C<sync_file_range>. If the syscall is missing, then this is the same
@@ -512,6 +613,15 @@
 Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
 C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
 
+=item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
+
+Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
+missing, then it returns failure and sets C<errno> to C<ENOSYS>.
+
+The C<mode> argument can be C<0> (for behaviour similar to
+C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
+of the file unchanged (but still preallocates space beyond end of file).
+
 =back
 
 =head3 LIBEIO-SPECIFIC REQUESTS
@@ -562,7 +672,7 @@
 
 =item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
 
-This is a a request that takes C<delay> seconds to execute, but otherwise
+This is a request that takes C<delay> seconds to execute, but otherwise
 does nothing - it simply puts one of the worker threads to sleep for this
 long.
 
@@ -588,29 +698,96 @@
 cancel the whole request with its subrequests and b) limiting the number
 of "active" requests.
 
-Further below you will find more dicussion of these topics - first follows
-the reference section detailing the request generator and other methods.
+Further below you will find more discussion of these topics - first
+follows the reference section detailing the request generator and other
+methods.
 
 =over 4
 
-=item eio_grp (eio_cb cb, void *data)
+=item eio_req *grp = eio_grp (eio_cb cb, void *data)
+
+Creates, submits and returns a group request. Note that it doesn't have a
+priority, unlike all other requests.
+
+=item eio_grp_add (eio_req *grp, eio_req *req)
 
-Creates and submits a group request.
+Adds a request to the request group.
+
+=item eio_grp_cancel (eio_req *grp)
+
+Cancels all requests I<in> the group, but I<not> the group request
+itself. You can cancel the group request I<and> all subrequests via a
+normal C<eio_cancel> call.
 
 =back
 
+=head4 GROUP REQUEST LIFETIME
 
+Left alone, a group request will instantly move to the pending state and
+will be finished at the next call of C<eio_poll>.
 
-#TODO
+The usefulness stems from the fact that, if a subrequest is added to a
+group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
+will not finish until all the subrequests have finished.
+
+So the usage cycle of a group request is like this: after it is created,
+you normally instantly add a subrequest. If none is added, the group
+request will finish on its own.
+As long as subrequests are added before
+the group request is finished it will be kept from finishing, that is, the
+callbacks of any subrequests can, in turn, add more requests to the group,
+and as long as any requests are active, the group request itself will not
+finish.
+
+=head4 CREATING COMPOSITE REQUESTS
+
+Imagine you wanted to create an C<eio_load> request that opens a file,
+reads it and closes it. This means it has to execute at least three eio
+requests, but for various reasons it might be nice if that request looked
+like any other eio request.
+
+This can be done with groups:
+
+=over 4
+
+=item 1) create the request object
+
+Create a group that contains all further requests. This is the request you
+can return as "the load request".
+
+=item 2) open the file, maybe
 
-/*****************************************************************************/
-/* groups */
+Next, open the file with C<eio_open> and add the request to the group
+request and you are finished setting up the request.
+
+If, for some reason, you cannot C<eio_open> (path is a null ptr?) you
+can set C<< grp->result >> to C<-1> to signal an error and let the group
+request finish on its own.
+
+=item 3) open callback adds more requests
+
+In the open callback, if the open was not successful, copy C<<
+req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
+C<-1> to signal an error.
+
+Otherwise, malloc some memory or so and issue a read request, adding the
+read request to the group.
+
+=item 4) continue issuing requests till finished
+
+In the read callback, check for errors and possibly continue with
+C<eio_close> or any other eio request in the same way.
+
+As soon as no new requests are added the group request will finish. Make
+sure you I<always> set C<< grp->result >> to some sensible value.
+
+=back
+
+=head4 REQUEST LIMITING
+
+
+#TODO
 
-eio_req *eio_grp (eio_cb cb, void *data);
-void eio_grp_feed (eio_req *grp, void (*feed)(eio_req *req), int limit);
 void eio_grp_limit (eio_req *grp, int limit);
-void eio_grp_add (eio_req *grp, eio_req *req);
-void eio_grp_cancel (eio_req *grp); /* cancels all sub requests but not the group */
 
 =back
 
@@ -626,13 +803,13 @@
 A request is represented by a structure of type C<eio_req>. To initialise
 it, clear it to all zero bytes:
 
-  eio_req req;
+   eio_req req;
 
-  memset (&req, 0, sizeof (req));
+   memset (&req, 0, sizeof (req));
 
 A more common way to initialise a new C<eio_req> is to use C<calloc>:
 
-  eio_req *req = calloc (1, sizeof (*req));
+   eio_req *req = calloc (1, sizeof (*req));
 
 In either case, libeio neither allocates, initialises or frees the
 C<eio_req> structure for you - it merely uses it.
@@ -660,14 +837,18 @@
 
 Note that:
 
-a) libeio doesn't know how long your request callbacks take, so the time
-spent in C<eio_poll> is up to one callback invocation longer then this
-interval.
+=over 4
+
+=item a) libeio doesn't know how long your request callbacks take, so the
+time spent in C<eio_poll> is up to one callback invocation longer than
+this interval.
 
-b) this is implemented by calling C<gettimeofday> after each request,
-which can be costly.
+=item b) this is implemented by calling C<gettimeofday> after each
+request, which can be costly.
 
-c) at least one request will be handled.
+=item c) at least one request will be handled.
+
+=back
 
 =item eio_set_max_poll_reqs (unsigned int nreqs)
 
@@ -747,7 +928,7 @@
 requests, you might want to increase this.
 
 If this symbol is undefined (the default) then libeio will use its default
-stack size (C currently). If it is defined, but
+stack size (C currently). If it is defined, but
 C<0>, then the default operating system stack size will be used. In all
 other cases, the value must be an expression that evaluates to the desired
 stack size.
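The pipe-based integration described earlier in this diff (write a byte when libeio wants attention, read it back in the done callback, call eio_poll when the read end becomes readable) can be sketched with plain POSIX calls. This is a minimal sketch, not libeio code: all demo_* names are hypothetical, no libeio function is called, and it assumes the done callback has the same void (void) signature as the want_poll callback used in the libev example.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* demo_pipe [0] is the read end watched by the event loop,
   demo_pipe [1] is the write end (hypothetical names) */
static int demo_pipe [2] = { -1, -1 };

/* create the pipe and make both ends non-blocking; 0 on success */
static int
demo_pipe_init (void)
{
  if (pipe (demo_pipe))
    return -1;

  if (fcntl (demo_pipe [0], F_SETFL, O_NONBLOCK) == -1
      || fcntl (demo_pipe [1], F_SETFL, O_NONBLOCK) == -1)
    return -1;

  return 0;
}

/* would be passed as the first eio_init argument: */
/* make the read end readable */
static void
demo_want_poll (void)
{
  char dummy = 0;

  assert (demo_pipe [1] >= 0);

  /* a full pipe is fine, the read end is readable already */
  if (write (demo_pipe [1], &dummy, 1) < 0)
    return;
}

/* would be passed as the second eio_init argument (assumption): */
/* take the byte out again once all results are handled */
static void
demo_done_poll (void)
{
  char dummy;

  assert (demo_pipe [0] >= 0);

  if (read (demo_pipe [0], &dummy, 1) < 0)
    return;
}

/* 1 if an event loop waiting for read readiness would fire now, else 0 */
static int
demo_pipe_readable (void)
{
  struct pollfd pfd;

  pfd.fd = demo_pipe [0];
  pfd.events = POLLIN;
  pfd.revents = 0;

  return poll (&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}
```

In an actual program the event loop, not demo_pipe_readable, would watch the read end, and its read callback would call eio_poll. Because only the done callback consumes the byte, the pipe stays readable for as long as results remain unprocessed.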