/cvs/libeio/eio.pod
Revision: 1.36
Committed: Sun Jan 24 16:36:20 2016 UTC (8 years, 3 months ago) by root
Branch: MAIN
CVS Tags: rel-4_4, rel-4_5, rel-4_6, rel-4_7, rel-4_81, rel-4_80, rel-4_52, rel-4_53, rel-4_51, rel-4_78, rel-4_79, rel-4_54, rel-4_74, rel-4_75, rel-4_76, rel-4_77, rel-4_71, rel-4_72, rel-4_73, rel-4_34, HEAD
Changes since 1.35: +7 -2 lines
Log Message:
*** empty log message ***

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     libeio - truly asynchronous POSIX I/O
4    
5     =head1 SYNOPSIS
6    
7     #include <eio.h>
8    
9     =head1 DESCRIPTION
10    
11     The newest version of this document is also available as an html-formatted
12     web page you might find easier to navigate when reading it for the first
13     time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
14    
15     Note that this library is a by-product of the C<IO::AIO> perl
16 sf-exg 1.6 module, and many of the subtler points regarding request lifetime
17 root 1.1 and so on are only documented in its documentation at the
18     moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
19    
20     =head2 FEATURES
21    
22     This library provides fully asynchronous versions of most POSIX functions
23 sf-exg 1.6 dealing with I/O. Unlike most asynchronous libraries, this not only
24 root 1.1 includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
25     similar functions, as well as less commonly used ones such as C<mknod>, C<futime>
26     or C<readlink>.
27    
28     It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
29     FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with
30 root 1.33 emulation elsewhere).
31 root 1.1
32 root 1.5 The goal is to enable you to write fully non-blocking programs. For
33 root 1.1 example, in a game server, you would not want to freeze for a few seconds
34     just because the server is running a backup and you happen to call
35     C<readdir>.
36    
37     =head2 TIME REPRESENTATION
38    
39     Libeio represents time as a single floating point number, representing the
40     (fractional) number of seconds since the (POSIX) epoch (somewhere near
41     the beginning of 1970, details are complicated, don't ask). This type is
42 sf-exg 1.6 called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
43 root 1.1 better), so you can freely use C<double> yourself.
44    
45     Unlike the name component C<stamp> might indicate, it is also used for
46     time differences throughout libeio.
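
As a minimal sketch of this representation, the following converts between
a C<struct timespec> and fractional seconds stored in a C<double>. The
names C<demo_tstamp>, C<demo_tstamp_from_timespec> and
C<demo_tstamp_to_timespec> are made up for this illustration and are not
part of libeio; the only assumption taken from the text is that the
timestamp type is a C<double>.

```c
#include <assert.h>
#include <time.h>

/* hypothetical stand-in for eio_tstamp, which the text says is a double */
typedef double demo_tstamp;

/* convert a struct timespec into fractional seconds */
static demo_tstamp
demo_tstamp_from_timespec (const struct timespec *ts)
{
  assert (ts != NULL);

  return (demo_tstamp)ts->tv_sec + (demo_tstamp)ts->tv_nsec / 1e9;
}

/* split fractional seconds back into whole seconds and nanoseconds */
static struct timespec
demo_tstamp_to_timespec (demo_tstamp t)
{
  struct timespec ts;
  time_t sec = (time_t)t;       /* truncates toward zero */

  if ((demo_tstamp)sec > t)     /* round down for negative values */
    --sec;

  ts.tv_sec  = sec;
  ts.tv_nsec = (long)((t - (demo_tstamp)sec) * 1e9);

  return ts;
}
```

Because the same type is used for time differences, subtracting two such
values directly yields a duration in seconds.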
47    
48     =head2 FORK SUPPORT
49    
50 root 1.26 Usage of pthreads in a program changes the semantics of fork
51     considerably. Specifically, only async-safe functions can be called after
52     fork. Libeio uses pthreads, so this applies, and makes using fork hard for
53     anything but relatively simple fork + exec uses.
54    
55     This library only works in the process that initialised it: Forking is
56     fully supported, but using libeio in any other process than the one that
57     called C<eio_init> is not.
58    
59     You might get around this by not I<using> libeio before (or after) forking in
60     the parent, and using it in the child afterwards. You could also try to
61     call the C<eio_init> function again in the child, which will brutally
62     reinitialise all data structures, which isn't POSIX conformant, but
63     typically works.
64    
65     Otherwise, the only recommendation you should follow is: treat fork code
66     the same way you treat signal handlers, and only ever call C<eio_init> in
67     the process that uses it, and only once ever.
68 root 1.7
69 root 1.1 =head1 INITIALISATION/INTEGRATION
70    
71     Before you can call any eio functions you first have to initialise the
72     library. The library integrates into any event loop, but can also be used
73     without one, including in polling mode.
74    
75     You have to provide the necessary glue yourself, however.
76    
77     =over 4
78    
79     =item int eio_init (void (*want_poll)(void), void (*done_poll)(void))
80    
81     This function initialises the library. On success it returns C<0>, on
82     failure it returns C<-1> and sets C<errno> appropriately.
83    
84     It accepts two function pointers specifying callbacks as argument, both of
85     which can be C<0>, in which case the callback isn't called.
86    
87 root 1.26 There is currently no way to change these callbacks later, or to
88     "uninitialise" the library again.
89    
90 root 1.1 =item want_poll callback
91    
92     The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
93     it wants to be polled by calling C<eio_poll>). It is "edge-triggered",
94     that is, it will only be called once when eio wants attention, until all
95     pending requests have been handled.
96    
97     This callback is called while locks are being held, so I<you must
98     not call any libeio functions inside this callback>. That includes
99     C<eio_poll>. What you should do is notify some other thread, or wake up
100     your event loop, and then call C<eio_poll>.
101    
102     =item done_poll callback
103    
104     This callback is invoked when libeio detects that all pending requests
105     have been handled. It is "edge-triggered", that is, it will only be
106     called once after C<want_poll>. To put it differently, C<want_poll> and
107     C<done_poll> are invoked in pairs: after C<want_poll> you have to call
108     C<eio_poll ()> until either C<eio_poll> indicates that everything has been
109 root 1.36 handled or C<done_poll> has been called, which signals the same - only one
110     method is needed.
111 root 1.1
112     Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
113     have been called again, so watch out for races in your code.
114    
115 root 1.36 It is quite common to have an empty C<done_poll> callback and only use
116     the return value from C<eio_poll>, or, when C<eio_poll> is configured to
117     handle all outstanding replies, it's enough to call C<eio_poll> once.
118    
119 sf-exg 1.6 As with C<want_poll>, this callback is called while locks are being held,
120 root 1.36 so you I<must not call any libeio functions from within this callback>.
121 root 1.1
122     =item int eio_poll ()
123    
124     This function has to be called whenever there are pending requests that
125     need finishing. You usually call this after C<want_poll> has indicated
126     that you should do so, but you can also call this function regularly to
127     poll for new results.
128    
129     If any request invocation returns a non-zero value, then C<eio_poll ()>
130     immediately returns with that value as return value.
131    
132     Otherwise, if all requests could be handled, it returns C<0>. If for some
133     reason not all requests have been handled, i.e. some are still pending, it
134     returns C<-1>.
135    
136     =back
137    
138     For libev, you would typically use an C<ev_async> watcher: the
139     C<want_poll> callback would invoke C<ev_async_send> to wake up the event
140     loop. Inside the callback set for the watcher, one would call C<eio_poll
141 root 1.15 ()>.
142    
143     If C<eio_poll ()> is configured to not handle all results in one go
144     (i.e. it returns C<-1>) then you should start an idle watcher that calls
145     C<eio_poll> until it returns something C<!= -1>.
146    
147 sf-exg 1.20 A full-featured connector between libeio and libev would look as follows
148 root 1.16 (if C<eio_poll> is handling all requests, it can of course be simplified a
149     lot by removing the idle watcher logic):
150 root 1.15
151 root 1.17 static struct ev_loop *loop;
152     static ev_idle repeat_watcher;
153     static ev_async ready_watcher;
154 root 1.15
155 root 1.17 /* idle watcher callback, only used when eio_poll */
156     /* didn't handle all results in one call */
157     static void
158     repeat (EV_P_ ev_idle *w, int revents)
159     {
160     if (eio_poll () != -1)
161     ev_idle_stop (EV_A_ w);
162     }
163    
164     /* eio has some results, process them */
165     static void
166     ready (EV_P_ ev_async *w, int revents)
167     {
168     if (eio_poll () == -1)
169     ev_idle_start (EV_A_ &repeat_watcher);
170     }
171    
172     /* wake up the event loop */
173     static void
174     want_poll (void)
175     {
176     ev_async_send (loop, &ready_watcher);
177     }
178    
179     void
180     my_init_eio ()
181     {
182     loop = EV_DEFAULT;
183    
184     ev_idle_init (&repeat_watcher, repeat);
185     ev_async_init (&ready_watcher, ready);
186 root 1.34 ev_async_start (loop, &ready_watcher);
187 root 1.17
188     eio_init (want_poll, 0);
189     }
190 root 1.1
191     For most other event loops, you would typically use a pipe - the event
192 sf-exg 1.6 loop should be told to wait for read readiness on the read end. In
193 root 1.1 C<want_poll> you would write a single byte, in C<done_poll> you would try
194     to read that byte, and in the callback for the read end, you would call
195 root 1.16 C<eio_poll>.
196    
197     You don't have to take special care in the case C<eio_poll> doesn't handle
198     all requests, as the done callback will not be invoked, so the event loop
199 root 1.18 will still signal readiness for the pipe until I<all> results have been
200 root 1.16 processed.
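
The pipe approach described above could be sketched roughly as follows. This
is only an illustration of the wake-up mechanism: it does not call libeio
at all, and the names C<demo_wake_fd>, C<demo_wake_init>,
C<demo_want_poll>, C<demo_done_poll> and C<demo_wake_pending> are
hypothetical. In a real program the two callbacks would be passed to
C<eio_init>, and the event loop (rather than C<demo_wake_pending>) would
watch the read end and call C<eio_poll> when it becomes readable.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* demo_wake_fd[0] is the read end, demo_wake_fd[1] the write end */
static int demo_wake_fd[2] = { -1, -1 };

/* create the pipe and make both ends non-blocking; returns 0 or -1 */
static int
demo_wake_init (void)
{
  int i;

  if (pipe (demo_wake_fd) < 0)
    return -1;

  for (i = 0; i < 2; ++i)
    {
      int fl = fcntl (demo_wake_fd[i], F_GETFL);

      if (fl < 0 || fcntl (demo_wake_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* candidate want_poll callback: writes one byte, calls no eio function */
static void
demo_want_poll (void)
{
  char dummy = 0;
  ssize_t n;

  assert (demo_wake_fd[1] >= 0);

  n = write (demo_wake_fd[1], &dummy, 1);
  (void)n; /* a failed write is ignored in this sketch */
}

/* candidate done_poll callback: takes the byte out of the pipe again */
static void
demo_done_poll (void)
{
  char dummy;
  ssize_t n = read (demo_wake_fd[0], &dummy, 1);

  (void)n; /* an empty pipe is not an error here */
}

/* what an event loop would check: is the read end readable right now? */
static int
demo_wake_pending (void)
{
  struct pollfd pfd;

  pfd.fd      = demo_wake_fd[0];
  pfd.events  = POLLIN;
  pfd.revents = 0;

  return poll (&pfd, 1, 0) > 0 && (pfd.revents & POLLIN) != 0;
}
```

Both callbacks only touch the pipe, which keeps them within the rule that
no libeio function may be called while the library holds its locks.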
201 root 1.1
202    
203 root 1.7 =head1 HIGH LEVEL REQUEST API
204    
205     Libeio has both a high-level API, which consists of calling a request
206     function with a callback to be called on completion, and a low-level API
207     where you fill out request structures and submit them.
208    
209     This section describes the high-level API.
210    
211     =head2 REQUEST SUBMISSION AND RESULT PROCESSING
212    
213     You submit a request by calling the relevant C<eio_TYPE> function with the
214     required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
215     (called C<eio_cb> below) and a freely usable C<void *data> argument.
216    
217 root 1.12 The return value will either be 0, in case something went really wrong
218     (which can basically only happen on very fatal errors, such as C<malloc>
219     returning 0, which is rather unlikely), or a pointer to the newly-created
220     and submitted C<eio_req *>.
221 root 1.7
222     The callback will be called with an C<eio_req *> which contains the
223     results of the request. The members you can access inside that structure
224     vary from request to request, except for:
225    
226     =over 4
227    
228     =item C<ssize_t result>
229    
230     This contains the result value from the call (usually the same as the
231     syscall of the same name).
232    
233     =item C<int errorno>
234    
235     This contains the value of C<errno> after the call.
236    
237     =item C<void *data>
238    
239     The C<void *data> member simply stores the value of the C<data> argument.
240    
241     =back
242    
243 sf-exg 1.29 Members not explicitly described as accessible must not be
244     accessed. Specifically, there is no guarantee that any members will still
245 root 1.28 have the value they had when the request was submitted.
246    
247 root 1.7 The return value of the callback is normally C<0>, which tells libeio to
248     continue normally. If a callback returns a nonzero value, libeio will
249     stop processing results (in C<eio_poll>) and will return the value to its
250     caller.
251    
252 root 1.28 Memory areas passed to libeio wrappers must stay valid as long as a
253     request executes, with the exception of paths, which are being copied
254 root 1.7 internally. Any memory libeio itself allocates will be freed after the
255     finish callback has been called. If you want to manage all memory passed
256     to libeio yourself you can use the low-level API.
257    
258     For example, to open a file, you could do this:
259    
260     static int
261     file_open_done (eio_req *req)
262     {
263     if (req->result < 0)
264     {
265     /* open() returned -1 */
266     errno = req->errorno;
267     perror ("open");
268     }
269     else
270     {
271     int fd = req->result;
272     /* now we have the new fd in fd */
273     }
274    
275     return 0;
276     }
277    
278     /* the first three arguments are passed to open(2) */
279     /* the remaining are priority, callback and data */
280     if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
281 root 1.18 abort (); /* something went wrong, we will all die!!! */
282 root 1.7
283     Note that you additionally need to call C<eio_poll> when the C<want_poll>
284     callback indicates that requests are ready to be processed.
285    
286 root 1.17 =head2 CANCELLING REQUESTS
287    
288     Sometimes the need for a request goes away before the request is
289 root 1.18 finished. In that case, one can cancel the request by a call to
290 root 1.17 C<eio_cancel>:
291    
292     =over 4
293    
294     =item eio_cancel (eio_req *req)
295    
296 root 1.19 Cancel the request (and all its subrequests). If the request is currently
297 root 1.18 executing it might still continue to execute, and in other cases it might
298     still take a while till the request is cancelled.
299 root 1.17
300 root 1.35 When cancelled, the finish callback will not be invoked.
301 root 1.18
302     C<EIO_CANCELLED> is still true for requests that have successfully
303     executed, as long as C<eio_cancel> was called on them at some point.
304 root 1.17
305     =back
306    
307 root 1.7 =head2 AVAILABLE REQUESTS
308    
309     The following request functions are available. I<All> of them return the
310     C<eio_req *> on success and C<0> on failure, and I<all> of them have the
311     same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
312     mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
313     custom data value as C<data>.
314    
315     =head3 POSIX API WRAPPERS
316    
317     These requests simply wrap the POSIX call of the same name, with the same
318 root 1.11 arguments. If a function is not implemented by the OS and cannot be emulated
319 root 1.10 in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>.
320 root 1.7
321     =over 4
322    
323     =item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
324    
325     =item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
326    
327     =item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
328    
329     =item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
330    
331     =item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
332    
333     =item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
334    
335     =item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
336    
337 root 1.10 =item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
338 root 1.7
339     =item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
340    
341     =item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
342    
343     =item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
344    
345     =item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
346    
347     =item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
348    
349     =item eio_close (int fd, int pri, eio_cb cb, void *data)
350    
351     =item eio_sync (int pri, eio_cb cb, void *data)
352    
353     =item eio_fsync (int fd, int pri, eio_cb cb, void *data)
354    
355     =item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
356    
357     =item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
358    
359     =item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
360    
361     =item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
362    
363     =item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
364    
365     =item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
366    
367     These have the same semantics as the syscall of the same name, their
368     return value is available as C<< req->result >> later.
369    
370     =item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
371    
372     =item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
373    
374     These two requests are called C<read> and C<write>, but actually wrap
375     C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
376     libeio uses lseek/read_or_write/lseek and a mutex to serialise the
377     requests, so all these requests run serially and do not disturb each
378     other. However, they still disturb the file offset while they run, so it's
379     not safe to call these functions concurrently with non-libeio functions on
380     the same fd on these systems.
381    
382     Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
383     so it is advised not to submit multiple requests on the same fd on this
384     horrible pile of garbage.
385    
386 root 1.10 =item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
387    
388     Like C<mlockall>, but the flag value constants are called
389     C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>.
390    
391     =item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
392    
393     Just like msync, except that the flag values are called C<EIO_MS_ASYNC>,
394     C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>.
395    
396     =item eio_readlink (const char *path, int pri, eio_cb cb, void *data)
397    
398     If successful, the path read by C<readlink(2)> can be accessed via C<<
399     req->ptr2 >> and is I<NOT> null-terminated, with the length specified as
400     C<< req->result >>.
401    
402     if (req->result >= 0)
403     {
404     char *target = strndup ((char *)req->ptr2, req->result);
405    
406     free (target);
407     }
408    
409 root 1.13 =item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
410    
411 root 1.22 Similar to the realpath libc function, but unlike that one, C<<
412     req->result >> is C<-1> on failure. On success, the result is the length
413     of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is
414     similar to readlink.
415 root 1.13
416 root 1.10 =item eio_stat (const char *path, int pri, eio_cb cb, void *data)
417    
418     =item eio_lstat (const char *path, int pri, eio_cb cb, void *data)
419    
420 root 1.7 =item eio_fstat (int fd, int pri, eio_cb cb, void *data)
421    
422     Stats a file - if C<< req->result >> indicates success, then you can
423     access the C<struct stat>-like structure via C<< req->ptr2 >>:
424    
425 root 1.17 EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
426 root 1.7
427 root 1.10 =item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
428    
429     =item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data)
430 root 1.7
431     Stats a filesystem - if C<< req->result >> indicates success, then you can
432     access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
433    
434 root 1.17 EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
435 root 1.7
436     =back
437    
438     =head3 READING DIRECTORIES
439    
440     Reading directories sounds simple, but can be rather demanding, especially
441 root 1.18 if you want to do stuff such as traversing a directory hierarchy or
442     processing all files in a directory. Libeio can assist these complex tasks
443 root 1.7 with its C<eio_readdir> call.
444    
445     =over 4
446    
447     =item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
448    
449     This is a very complex call. It basically reads through a whole directory
450     (via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
451     the names or an array of C<struct eio_dirent>, depending on the C<flags>
452     argument.
453    
454     The C<< req->result >> indicates either the number of files found, or
455 root 1.10 C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>,
456 root 1.7 and an array of C<struct eio_dirent>, if requested by C<flags>, can be found via C<<
457     req->ptr1 >>.
458    
459     Here is an example that prints all the names:
460    
461     int i;
462     char *names = (char *)req->ptr2;
463    
464     for (i = 0; i < req->result; ++i)
465     {
466     printf ("name #%d: %s\n", i, names);
467    
468     /* move to next name */
469     names += strlen (names) + 1;
470     }
471    
472     Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
473    
474     C<flags> can be any combination of:
475    
476     =over 4
477    
478     =item EIO_READDIR_DENTS
479    
480     If this flag is specified, then, in addition to the names in C<ptr2>,
481     also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
482     eio_dirent> looks like this:
483    
484 root 1.17 struct eio_dirent
485     {
486     int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
487     unsigned short namelen; /* size of filename without trailing 0 */
488     unsigned char type; /* one of EIO_DT_* */
489     signed char score; /* internal use */
490     ino_t inode; /* the inode number, if available, otherwise unspecified */
491     };
492 root 1.7
493     The only members you normally would access are C<nameofs>, which is the
494     byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
495    
496     C<type> can be one of:
497    
498     C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
499     the name yourself if you need to know,
500     one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
501     C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
502     or some OS-specific type (currently
503     C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
504     C<EIO_DT_NAM> - xenix special named file,
505     C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
506     C<EIO_DT_NWK> - HP-UX network special,
507     C<EIO_DT_CMP> - VxFS compressed,
508     C<EIO_DT_DOOR> - solaris door, or
509     C<EIO_DT_WHT>).
510    
511     This example prints all names and their type:
512    
513     int i;
514     struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
515     char *names = (char *)req->ptr2;
516    
517     for (i = 0; i < req->result; ++i)
518     {
519     struct eio_dirent *ent = ents + i;
520     char *name = names + ent->nameofs;
521    
522     printf ("name #%d: %s (type %d)\n", i, name, ent->type);
523     }
524    
525     =item EIO_READDIR_DIRS_FIRST
526    
527     When this flag is specified, then the names will be returned in an order
528     where likely directories come first, in optimal C<stat> order. This is
529     useful when you need to quickly find directories, or you want to find all
530     directories without having to stat() each entry.
531    
532     If the system returns type information in readdir, then this is used
533     to find directories directly. Otherwise, likely directories are names
534     beginning with ".", or otherwise names with no dots, of which names with
535     short names are tried first.
536    
537     =item EIO_READDIR_STAT_ORDER
538    
539     When this flag is specified, then the names will be returned in an order
540     suitable for stat()'ing each one. That is, when you plan to stat()
541     all files in the given directory, then the returned order will likely
542     be fastest.
543    
544 root 1.18 If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
545     likely directories come first, resulting in a less optimal stat order.
546 root 1.7
547     =item EIO_READDIR_FOUND_UNKNOWN
548    
549     This flag should not be specified when calling C<eio_readdir>. Instead,
550     it is set by C<eio_readdir> (you can access the C<flags> via C<<
551     req->int1 >>) when any of the C<type>s found were C<EIO_DT_UNKNOWN>. The
552 root 1.18 absence of this flag therefore indicates that all C<type>s are known,
553 root 1.7 which can be used to speed up some algorithms.
554    
555     A typical use case would be to identify all subdirectories within a
556     directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
557     this flag is I<NOT> set afterwards, then all the entries at the beginning of the
558     returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
559     should start C<stat()>'ing the entries starting at the beginning of the
560     array, stopping as soon as you found all directories (the count can be
561     deduced by the link count of the directory).
562    
563     =back
564    
565     =back
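
The directory-finding use case described for C<EIO_READDIR_FOUND_UNKNOWN>
above might be sketched like this. To keep the example self-contained it
does not include F<eio.h>: C<struct demo_dirent> and the C<DEMO_DT_*>
constants are stand-ins with made-up values for C<struct eio_dirent> and
C<EIO_DT_*>, and both helper functions are hypothetical. The sketch assumes
the array was returned in C<EIO_READDIR_DIRS_FIRST> order and that the
C<EIO_READDIR_FOUND_UNKNOWN> flag was not set.

```c
#include <assert.h>
#include <stddef.h>

/* stand-ins for the libeio EIO_DT_* constants, with made-up values */
enum { DEMO_DT_UNKNOWN = 0, DEMO_DT_DIR = 4, DEMO_DT_REG = 8 };

/* stand-in for the publicly documented part of struct eio_dirent */
struct demo_dirent
{
  int nameofs;            /* offset of the name in the packed name buffer */
  unsigned short namelen; /* length of the name without trailing 0 */
  unsigned char type;     /* one of DEMO_DT_* */
};

/* count the directories at the start of a dirs-first ordered array; */
/* only meaningful when no entry has an unknown type */
static size_t
demo_count_leading_dirs (const struct demo_dirent *ents, size_t count)
{
  size_t n = 0;

  assert (ents != NULL || count == 0);

  while (n < count && ents[n].type == DEMO_DT_DIR)
    ++n;

  return n;
}

/* look up the name of an entry in the packed, 0-terminated name buffer */
static const char *
demo_dirent_name (const char *names, const struct demo_dirent *ent)
{
  return names + ent->nameofs;
}
```

If the flag I<is> set, the leading entries can no longer be trusted to be
the complete set of directories, and the entries have to be C<stat>'ed as
described above.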
566    
567     =head3 OS-SPECIFIC CALL WRAPPERS
568    
569     These wrap OS-specific calls (usually Linux ones), and might or might not
570     be emulated on other operating systems. Calls that are not emulated will
571     return C<-1> and set C<errno> to C<ENOSYS>.
572    
573     =over 4
574    
575     =item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
576    
577     Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
578     libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and
579     Darwin.
580    
581     If the OS doesn't support some sendfile-like call, or the call fails,
582     indicating missing support for the given file descriptor type (for example,
583     Linux's sendfile might not support file to file copies), then libeio will
584     emulate the call in userspace, so there are almost no limitations on its
585     use.
586    
587     =item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
588    
589     Calls C<readahead(2)>. If the syscall is missing, then the call is
590     emulated by simply reading the data (currently in 64kiB chunks).
591    
592 root 1.27 =item eio_syncfs (int fd, int pri, eio_cb cb, void *data)
593    
594     Calls Linux' C<syncfs> syscall, if available. Returns C<-1> and sets
595     C<errno> to C<ENOSYS> if the call is missing I<but still calls sync()>,
596     if the C<fd> is C<< >= 0 >>, so you can probe for the availability of the
597     syscall with a negative C<fd> argument and checking for C<-1/ENOSYS>.
598    
599 root 1.7 =item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
600    
601     Calls C<sync_file_range>. If the syscall is missing, then this is the same
602     as calling C<fdatasync>.
603    
604 root 1.10 Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
605     C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
606    
607 root 1.21 =item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
608    
609     Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
610     missing, then it returns failure and sets C<errno> to C<ENOSYS>.
611    
612     The C<mode> argument can be C<0> (for behaviour similar to
613     C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
614     of the file unchanged (but still preallocates space beyond end of file).
615    
616 root 1.7 =back
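
To give an idea of what a userspace fallback for C<eio_sendfile> involves,
here is a rough sketch of a read/write copy loop. It is I<not> libeio's
actual emulation code, merely an illustration under the assumption that
C<pread> and C<write> are available; C<demo_sendfile_emulation> and
C<demo_sendfile_check> are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* copy up to length bytes from in_fd (starting at in_offset) to out_fd; */
/* returns the number of bytes copied, or -1 if nothing could be copied */
static ssize_t
demo_sendfile_emulation (int out_fd, int in_fd, off_t in_offset, size_t length)
{
  char buf[65536];
  ssize_t total = 0;

  assert (out_fd >= 0 && in_fd >= 0);

  while (length > 0)
    {
      size_t want = length < sizeof (buf) ? length : sizeof (buf);
      ssize_t got = pread (in_fd, buf, want, in_offset);
      ssize_t done = 0;

      if (got < 0 && errno == EINTR)
        continue;
      if (got < 0)
        return total ? total : -1;
      if (got == 0)
        break; /* end of file */

      while (done < got)
        {
          ssize_t put = write (out_fd, buf + done, (size_t)(got - done));

          if (put < 0 && errno == EINTR)
            continue;
          if (put < 0)
            return total ? total : -1;

          done  += put;
          total += put;
        }

      in_offset += got;
      length    -= (size_t)got;
    }

  return total;
}

/* usage sketch: copy "world" out of one temporary file into another; */
/* returns 0 if the copied data matches, -1 otherwise */
static int
demo_sendfile_check (void)
{
  FILE *in = tmpfile (), *out = tmpfile ();
  char got[5];
  int ok = in && out
           && fputs ("hello world", in) != EOF
           && fflush (in) != EOF
           && demo_sendfile_emulation (fileno (out), fileno (in), 6, 5) == 5
           && pread (fileno (out), got, 5, 0) == 5
           && memcmp (got, "world", 5) == 0;

  if (in)  fclose (in);
  if (out) fclose (out);

  return ok ? 0 : -1;
}
```

Like the Linux call the arguments follow, the sketch reads from an explicit
offset and therefore leaves the file offset of C<in_fd> untouched.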
617    
618     =head3 LIBEIO-SPECIFIC REQUESTS
619    
620     These requests are specific to libeio and do not correspond to any OS call.
621    
622     =over 4
623    
624 root 1.9 =item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
625 root 1.7
626 root 1.31 Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
627 root 1.9 memory area, page-wise, that is, it reads (or reads and writes back) the
628     first octet of every page that spans the memory area.
629    
630     This can be used to page in some mmapped file, or dirty some pages. Note
631     that dirtying is an unlocked read-write access, so races can ensue when
632     some other thread modifies the data stored in that memory area.
633    
634     =item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data)
635 root 1.7
636     Executes a custom request, i.e., a user-specified callback.
637    
638     The callback gets the C<eio_req *> as parameter and is expected to read
639     and modify any request-specific members. Specifically, it should set C<<
640     req->result >> to the result value, just like other requests.
641    
642     Here is an example that simply calls C<open>, like C<eio_open>, but it
643     uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
644     you want to pass more/other parameters, you either need to pass some
645     struct or so via C<data> or provide your own wrapper using the low-level
646     API.
647    
648     static int
649     my_open_done (eio_req *req)
650     {
651     int fd = req->result;
652    
653     return 0;
654     }
655    
656     static void
657     my_open (eio_req *req)
658     {
659     req->result = open (req->data, O_RDONLY);
660     }
661    
662     eio_custom (my_open, 0, my_open_done, "/etc/passwd");
663    
664 root 1.9 =item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
665 root 1.7
666 root 1.18 This is a request that takes C<delay> seconds to execute, but otherwise
667 root 1.7 does nothing - it simply puts one of the worker threads to sleep for this
668     long.
669    
670     This request can be used to artificially increase load, e.g. for debugging
671     or benchmarking reasons.
672    
673 root 1.9 =item eio_nop (int pri, eio_cb cb, void *data)
674 root 1.7
675     This request does nothing, except go through the whole request cycle. This
676     can be used to measure latency or in some cases to simplify code, but is
677     not really of much use.
678    
679     =back
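
The page-wise access described for C<eio_mtouch> above can be illustrated
with a small standalone function. This is a sketch of the idea only, not
the library's implementation: C<demo_touch_pages> is a hypothetical name,
and, to stay inside the given area, it touches the octet at C<addr> for the
first page and the first octet of each following page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* touch one octet in every page overlapping [addr, addr + length); */
/* if modify is nonzero, the octet is also written back unchanged. */
/* returns the number of pages touched */
static size_t
demo_touch_pages (void *addr, size_t length, int modify)
{
  long page_size = sysconf (_SC_PAGESIZE);
  uintptr_t page, cur, end;
  size_t touched = 0;

  assert (page_size > 0);

  page = (uintptr_t)page_size;
  cur  = (uintptr_t)addr;
  end  = cur + length;

  while (cur < end)
    {
      volatile unsigned char *p = (volatile unsigned char *)cur;
      unsigned char octet = *p;         /* the read pages the memory in */

      if (modify)
        *p = octet;                     /* unlocked write dirties the page */

      ++touched;
      cur = (cur & ~(page - 1)) + page; /* start of the next page */
    }

  return touched;
}
```

The write-back in the modifying case is exactly the unlocked read-write
access the text warns about: if another thread changes that octet between
the read and the write, its update is lost.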
680    
681     =head3 GROUPING AND LIMITING REQUESTS
682 root 1.1
683 root 1.12 There is one more rather special request, C<eio_grp>. It is a very special
684     eio request: Instead of doing something, it is a container for other eio
685     requests.
686    
687     There are two primary use cases for this: a) bundle many requests into a
688     single, composite, request with a definite callback and the ability to
689     cancel the whole request with its subrequests and b) limiting the number
690     of "active" requests.
691    
692 root 1.18 Further below you will find more discussion of these topics - first
693     follows the reference section detailing the request generator and other
694     methods.
695 root 1.12
696     =over 4
697    
698 root 1.17 =item eio_req *grp = eio_grp (eio_cb cb, void *data)
699    
700 root 1.23 Creates, submits and returns a group request. Note that it doesn't have a
701     priority, unlike all other requests.
702 root 1.17
703     =item eio_grp_add (eio_req *grp, eio_req *req)
704    
705     Adds a request to the request group.
706    
707     =item eio_grp_cancel (eio_req *grp)
708    
709     Cancels all requests I<in> the group, but I<not> the group request
710 root 1.23 itself. You can cancel the group request I<and> all subrequests via a
711     normal C<eio_cancel> call.
712 root 1.17
713 root 1.23 =back
714    
=head4 GROUP REQUEST LIFETIME

Left alone, a group request will instantly move to the pending state and
will be finished at the next call of C<eio_poll>.

The usefulness stems from the fact that, if a subrequest is added to a
group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
will not finish until all the subrequests have finished.

So the usage cycle of a group request is like this: after it is created,
you normally instantly add a subrequest. If none is added, the group
request will finish on its own. As long as subrequests are added before
the group request is finished, it will be kept from finishing. That is,
the callbacks of any subrequests can, in turn, add more requests to the
group, and as long as any requests are active, the group request itself
will not finish.

=head4 CREATING COMPOSITE REQUESTS

Imagine you wanted to create an C<eio_load> request that opens a file,
reads it and closes it. This means it has to execute at least three eio
requests, but for various reasons it might be nice if that request looked
like any other eio request.

This can be done with groups:

=over 4

=item 1) create the request object

Create a group that contains all further requests. This is the request you
can return as "the load request".

=item 2) open the file, maybe

Next, open the file with C<eio_open> and add the request to the group
request. You are then finished setting up the request.

If, for some reason, you cannot C<eio_open> (for example, because the path
is a null pointer), you can set C<< grp->result >> to C<-1> to signal an
error and let the group request finish on its own.

=item 3) open callback adds more requests

In the open callback, if the open was not successful, copy C<<
req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
C<-1> to signal an error.

Otherwise, allocate some memory and issue a read request, adding the
read request to the group.

=item 4) continue issuing requests till finished

In the read callback, check for errors and possibly continue with
C<eio_close> or any other eio request in the same way.

As soon as no new requests are added, the group request will finish. Make
sure you I<always> set C<< grp->result >> to some sensible value.

=back

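The four steps above can be modelled without libeio at all. The following
is a minimal sketch in plain C, not libeio code: C<toy_grp>, C<toy_grp_add>,
C<toy_load> and the C<step_*> functions are hypothetical names, each step
runs synchronously and plays the role of both a request and its callback,
and the C<while> loop stands in for repeated calls to C<eio_poll>. It only
illustrates how a group keeps going as long as callbacks add requests, and
how C<result> and C<errorno> are propagated.

```c
/* Toy model of a composite "load" request. NOT libeio code: struct toy_grp,
 * toy_grp_add and toy_load are hypothetical, and every step runs
 * synchronously instead of in a worker thread. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

typedef struct toy_grp toy_grp;
typedef void (*toy_step) (toy_grp *grp);

struct toy_grp
{
  toy_step queue[4];   /* subrequests that were added but have not run yet */
  int head, tail;
  long result;         /* plays the role of grp->result */
  int errorno;         /* plays the role of grp->errorno */
  const char *path;
  int fd;
  char buf[256];
};

static void
toy_grp_add (toy_grp *grp, toy_step step)
{
  assert (grp->tail < 4);
  grp->queue[grp->tail++] = step;
}

/* step 4: the last request adds nothing, so the group can finish */
static void
step_close (toy_grp *grp)
{
  close (grp->fd);
}

/* steps 3 and 4: read, record the outcome, continue with close */
static void
step_read (toy_grp *grp)
{
  ssize_t len = read (grp->fd, grp->buf, sizeof (grp->buf) - 1);

  if (len < 0)
    {
      grp->errorno = errno;
      grp->result = -1;
    }
  else
    {
      grp->buf[len] = 0;
      grp->result = len;
    }

  toy_grp_add (grp, step_close);
}

/* steps 2 and 3: open; on failure copy the error and add nothing more */
static void
step_open (toy_grp *grp)
{
  grp->fd = open (grp->path, O_RDONLY);

  if (grp->fd < 0)
    {
      grp->errorno = errno;
      grp->result = -1;
      return;
    }

  toy_grp_add (grp, step_read);
}

/* steps 1 and 2, then "poll" until no subrequest is left */
static long
toy_load (toy_grp *grp, const char *path)
{
  memset (grp, 0, sizeof (*grp));
  grp->path = path;

  if (!path)
    grp->result = -1; /* nothing added: the group finishes on its own */
  else
    toy_grp_add (grp, step_open);

  while (grp->head < grp->tail)
    grp->queue[grp->head++] (grp);

  return grp->result;
}
```

Calling C<toy_load (&grp, path)> on a readable file leaves the number of
bytes read in C<< grp.result >> and the data in C<< grp.buf >>; on failure
C<< grp.result >> is C<-1>. A real implementation would instead pass
callbacks to C<eio_open>, C<eio_read> and C<eio_close> and attach each
returned request to the group with C<eio_grp_add>.
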
=head4 REQUEST LIMITING

#TODO

   void eio_grp_limit (eio_req *grp, int limit);

=head1 LOW LEVEL REQUEST API

#TODO

=head1 ANATOMY AND LIFETIME OF AN EIO REQUEST

A request is represented by a structure of type C<eio_req>. To initialise
it, clear it to all zero bytes:

   eio_req req;

   memset (&req, 0, sizeof (req));

A more common way to initialise a new C<eio_req> is to use C<calloc>:

   eio_req *req = calloc (1, sizeof (*req));

In either case, libeio neither allocates, initialises nor frees the
C<eio_req> structure for you - it merely uses it.

zero

#TODO

=head2 CONFIGURATION

The functions in this section can sometimes be useful, but the default
configuration will do in most cases, so you should skip this section on
first reading.

=over 4

=item eio_set_max_poll_time (eio_tstamp nseconds)

This causes C<eio_poll ()> to return after it has detected that it was
running for C<nseconds> seconds or longer (this number can be fractional).

This can be used to limit the amount of time spent handling eio requests.
For example, in interactive programs, you might want to limit this time to
C<0.01> seconds or so.

Note that:

=over 4

=item a) libeio doesn't know how long your request callbacks take, so the
time spent in C<eio_poll> is up to one callback invocation longer than
this interval.

=item b) this is implemented by calling C<gettimeofday> after each
request, which can be costly.

=item c) at least one request will be handled.

=back

=item eio_set_max_poll_reqs (unsigned int nreqs)

When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
C<nreqs> requests per invocation. This is a less costly way to limit the
amount of work done by C<eio_poll> than setting a time limit.

If you know your callbacks are generally fast, you could use this to
encourage interactiveness in your programs by setting it to C<10>, C<100>
or even C<1000>.

=item eio_set_min_parallel (unsigned int nthreads)

Make sure libeio can handle at least this many requests in parallel. It
might be able to handle more.

=item eio_set_max_parallel (unsigned int nthreads)

Set the maximum number of threads that libeio will spawn.

=item eio_set_max_idle (unsigned int nthreads)

Libeio uses threads internally to handle most requests, and will start and
stop threads on demand.

This call can be used to limit the number of idle threads (threads without
work to do): libeio will keep some threads idle in preparation for more
requests, but never more than C<nthreads> threads.

In addition to this, libeio will also stop threads when they are idle for
a few seconds, regardless of this setting.

=item unsigned int eio_nthreads ()

Return the number of worker threads currently running.

=item unsigned int eio_nreqs ()

Return the number of requests currently handled by libeio. This is the
total number of requests that have been submitted to libeio, but not yet
destroyed.

=item unsigned int eio_nready ()

Returns the number of ready requests, i.e. requests that have been
submitted but have not yet entered the execution phase.

=item unsigned int eio_npending ()

Returns the number of pending requests, i.e. requests that have been
executed and have results, but have not been finished yet by a call to
C<eio_poll>.

=back

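To illustrate how the two C<eio_poll> limits described in this section
interact, here is a toy model in plain C. It is a sketch under stated
assumptions, not the libeio implementation: C<toy_poll>,
C<toy_max_poll_reqs>, C<toy_max_poll_time> and C<toy_now> are hypothetical
names, and a "request" is just a counter decrement. It mirrors the
documented behaviour that the limits are checked after each request, so at
least one request is always handled.

```c
/* Toy model of the two eio_poll limits. NOT the libeio implementation:
 * all toy_* names are hypothetical and a "request" is just a counter. */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

static unsigned int toy_max_poll_reqs; /* 0 means no limit */
static double toy_max_poll_time;       /* seconds, 0 means no limit */

static double
toy_now (void)
{
  struct timeval tv;

  gettimeofday (&tv, NULL);
  return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* finishes pending requests and returns how many were handled */
static unsigned int
toy_poll (unsigned int *npending)
{
  unsigned int handled = 0;
  double start = toy_max_poll_time > 0. ? toy_now () : 0.;

  assert (npending);

  while (*npending)
    {
      --*npending; /* a real poll would invoke the request callback here */
      ++handled;

      /* both limits are checked only after a request was handled */
      if (toy_max_poll_reqs && handled >= toy_max_poll_reqs)
        break;

      if (toy_max_poll_time > 0. && toy_now () - start >= toy_max_poll_time)
        break;
    }

  return handled;
}
```

With C<toy_max_poll_reqs> set to C<10> and 25 requests pending, three
calls to C<toy_poll> handle 10, 10 and 5 requests. The request limit costs
one comparison per request, while the time limit costs one clock query per
request, which is why the request limit is the cheaper of the two.
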
=head1 EMBEDDING

Libeio can be embedded directly into programs. This functionality is not
documented and not (yet) officially supported.

Note that, when including C<libeio.m4>, you are responsible for defining
the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).

If you need to know how, check the C<IO::AIO> perl module, which does
exactly that.

=head1 COMPILETIME CONFIGURATION

These symbols, if used, must be defined when compiling F<eio.c>.

=over 4

=item EIO_STACKSIZE

This symbol governs the stack size for each eio thread. Libeio itself
was written to use very little stackspace, but when using C<EIO_CUSTOM>
requests, you might want to increase this.

If this symbol is undefined (the default) then libeio will use its default
stack size (C<sizeof (void *) * 4096> currently). In all other cases, the
value must be an expression that evaluates to the desired stack size.

=back

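As a hedged illustration, one way to supply C<EIO_STACKSIZE> is a small
wrapper translation unit that defines the symbol before pulling in
F<eio.c>; passing C<-DEIO_STACKSIZE=...> to the compiler would have the
same effect. The 256 KiB value, the assumption that the size is given in
bytes, and the C<toy_*> helper names are all illustrative and not taken
from libeio.

```c
/* Sketch of a wrapper translation unit; the 256 KiB figure is only an
 * example and the toy_* helpers are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define EIO_STACKSIZE (256 * 1024) /* must be visible when eio.c is compiled */

/* In a real build the next line would pull in the library source:
 *   #include "eio.c"
 * It is left out here so that the sketch compiles on its own. */

/* the documented default, for comparison */
static size_t
toy_default_stacksize (void)
{
  return sizeof (void *) * 4096;
}

/* the value the expression above evaluates to */
static size_t
toy_checked_stacksize (void)
{
  size_t size = EIO_STACKSIZE;

  /* sanity check for this example only: never shrink below the default */
  assert (size >= toy_default_stacksize ());
  return size;
}
```

On a platform with 8-byte pointers the documented default expression
evaluates to 32768, so the example value above is eight times larger.
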
=head1 PORTABILITY REQUIREMENTS

In addition to a working ISO-C implementation, libeio relies on a few
additional extensions:

=over 4

=item POSIX threads

To be portable, this module uses threads; specifically, the POSIX threads
library must be available (and working, which partially excludes many xBSD
systems, where C<fork ()> is buggy).

=item POSIX-compatible filesystem API

This is actually a harder portability requirement: The libeio API is quite
demanding regarding POSIX API calls (symlinks, user/group management
etc.).

=item C<double> must hold a time value in seconds with enough accuracy

The type C<double> is used to represent timestamps. It is required to
have at least 51 bits of mantissa (and 9 bits of exponent), which is good
enough until at least the year 4000. This requirement is fulfilled by
implementations implementing IEEE 754 (basically all existing ones).

=back

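The C<double> requirement can be checked with a little arithmetic. The
sketch below is not part of libeio and all C<toy_*> names are hypothetical.
It assumes that timestamps count seconds since 1970 and that "bits of
mantissa" means the precision reported by C<DBL_MANT_DIG> (which includes
the implicit leading bit). Under these assumptions it computes the spacing
between adjacent representable timestamps around the year 4000.

```c
/* Back-of-the-envelope check of timestamp resolution; all toy_* names are
 * hypothetical and the epoch (1970) is an assumption. */
#include <assert.h>
#include <float.h>

/* approximate number of seconds from 1970 to the year 4000 */
static const double toy_year_4000 = (4000. - 1970.) * 365.2425 * 86400.;

/* does this platform's double meet the documented mantissa requirement? */
static int
toy_double_is_sufficient (void)
{
  return FLT_RADIX == 2 && DBL_MANT_DIG >= 51;
}

/* spacing between adjacent values near t for a binary floating point
 * type with mant_bits bits of precision */
static double
toy_timestamp_resolution (double t, int mant_bits)
{
  double p = 1.;
  int i;

  assert (t >= 1. && mant_bits >= 1);

  while (p * 2. <= t) /* largest power of two not exceeding t */
    p *= 2.;

  for (i = 1; i < mant_bits; ++i) /* scale down by 2^(mant_bits - 1) */
    p /= 2.;

  return p;
}
```

For the year 4000 this gives a spacing of 2^-15 seconds (about 31
microseconds) with 51 bits and 2^-17 seconds (about 8 microseconds) with
the 53 bits of an IEEE 754 double.
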
If you know of other additional requirements, drop me a note.

=head1 AUTHOR

Marc Lehmann <libeio@schmorp.de>.
