/cvs/libeio/eio.pod
Revision: 1.29
Committed: Mon Sep 26 17:10:10 2011 UTC (12 years, 7 months ago) by sf-exg
Branch: MAIN
CVS Tags: rel-4_1, rel-4_12, rel-4_11
Changes since 1.28: +2 -2 lines
Log Message:
Fix typos.

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3     libeio - truly asynchronous POSIX I/O
4    
5     =head1 SYNOPSIS
6    
7     #include <eio.h>
8    
9     =head1 DESCRIPTION
10    
11     The newest version of this document is also available as an html-formatted
12     web page you might find easier to navigate when reading it for the first
13     time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
14    
15     Note that this library is a by-product of the C<IO::AIO> perl
module, and many of the subtler points regarding request lifetime
17 root 1.1 and so on are only documented in its documentation at the
18     moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
19    
20     =head2 FEATURES
21    
22     This library provides fully asynchronous versions of most POSIX functions
23 sf-exg 1.6 dealing with I/O. Unlike most asynchronous libraries, this not only
24 root 1.1 includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
similar functions, as well as less commonly used ones such as C<mknod>, C<futime>
26     or C<readlink>.
27    
28     It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
29     FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with
emulation elsewhere).
31    
32 root 1.5 The goal is to enable you to write fully non-blocking programs. For
33 root 1.1 example, in a game server, you would not want to freeze for a few seconds
34     just because the server is running a backup and you happen to call
35     C<readdir>.
36    
37     =head2 TIME REPRESENTATION
38    
39     Libeio represents time as a single floating point number, representing the
40     (fractional) number of seconds since the (POSIX) epoch (somewhere near
41     the beginning of 1970, details are complicated, don't ask). This type is
42 sf-exg 1.6 called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
43 root 1.1 better), so you can freely use C<double> yourself.
44    
45     Unlike the name component C<stamp> might indicate, it is also used for
46     time differences throughout libeio.
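
As a minimal sketch of what this representation means in practice, the
following helpers convert between a C<struct timespec> and fractional
seconds. The names C<demo_tstamp>, C<demo_tstamp_from_timespec> and
C<demo_timespec_from_tstamp> are made up for this illustration and are not
part of libeio; C<demo_tstamp> stands in for C<eio_tstamp>, assuming only
that it is a C<double> as stated above.

```c
#include <time.h>

/* Stand-in for eio_tstamp (an assumption for this sketch): a plain double. */
typedef double demo_tstamp;

/* Convert a struct timespec into fractional seconds. */
static demo_tstamp
demo_tstamp_from_timespec (const struct timespec *ts)
{
  return (demo_tstamp)ts->tv_sec + (demo_tstamp)ts->tv_nsec / 1e9;
}

/* Convert non-negative fractional seconds back into a struct timespec. */
/* This works for absolute times as well as for time differences. */
static struct timespec
demo_timespec_from_tstamp (demo_tstamp t)
{
  struct timespec ts;

  ts.tv_sec  = (time_t)t;
  ts.tv_nsec = (long)((t - (demo_tstamp)ts.tv_sec) * 1e9);

  return ts;
}
```

Because the same type is used for points in time and for durations, a delay
of two and a quarter seconds is simply the value C<2.25>.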
47    
48     =head2 FORK SUPPORT
49    
50 root 1.26 Usage of pthreads in a program changes the semantics of fork
51     considerably. Specifically, only async-safe functions can be called after
52     fork. Libeio uses pthreads, so this applies, and makes using fork hard for
anything but relatively simple fork + exec uses.
54    
55     This library only works in the process that initialised it: Forking is
56     fully supported, but using libeio in any other process than the one that
57     called C<eio_init> is not.
58    
You might get around this by not I<using> libeio before (or after) forking in
the parent, and using it in the child afterwards. You could also try to
call the C<eio_init> function again in the child, which will brutally
reinitialise all data structures, which isn't POSIX conformant, but
typically works.
64    
65     Otherwise, the only recommendation you should follow is: treat fork code
66     the same way you treat signal handlers, and only ever call C<eio_init> in
67     the process that uses it, and only once ever.
68 root 1.7
69 root 1.1 =head1 INITIALISATION/INTEGRATION
70    
71     Before you can call any eio functions you first have to initialise the
72     library. The library integrates into any event loop, but can also be used
73     without one, including in polling mode.
74    
75     You have to provide the necessary glue yourself, however.
76    
77     =over 4
78    
79     =item int eio_init (void (*want_poll)(void), void (*done_poll)(void))
80    
81     This function initialises the library. On success it returns C<0>, on
82     failure it returns C<-1> and sets C<errno> appropriately.
83    
It accepts two function pointers specifying callbacks as arguments, both of
85     which can be C<0>, in which case the callback isn't called.
86    
87 root 1.26 There is currently no way to change these callbacks later, or to
88     "uninitialise" the library again.
89    
90 root 1.1 =item want_poll callback
91    
92     The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
93     it wants to be polled by calling C<eio_poll>). It is "edge-triggered",
94     that is, it will only be called once when eio wants attention, until all
95     pending requests have been handled.
96    
97     This callback is called while locks are being held, so I<you must
98     not call any libeio functions inside this callback>. That includes
99     C<eio_poll>. What you should do is notify some other thread, or wake up
100     your event loop, and then call C<eio_poll>.
101    
102     =item done_poll callback
103    
104     This callback is invoked when libeio detects that all pending requests
105     have been handled. It is "edge-triggered", that is, it will only be
106     called once after C<want_poll>. To put it differently, C<want_poll> and
107     C<done_poll> are invoked in pairs: after C<want_poll> you have to call
108     C<eio_poll ()> until either C<eio_poll> indicates that everything has been
109     handled or C<done_poll> has been called, which signals the same.
110    
111     Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
112     have been called again, so watch out for races in your code.
113    
114 sf-exg 1.6 As with C<want_poll>, this callback is called while locks are being held,
so you I<must not call any libeio functions from within this callback>.
116    
117     =item int eio_poll ()
118    
119     This function has to be called whenever there are pending requests that
120     need finishing. You usually call this after C<want_poll> has indicated
121     that you should do so, but you can also call this function regularly to
122     poll for new results.
123    
124     If any request invocation returns a non-zero value, then C<eio_poll ()>
125     immediately returns with that value as return value.
126    
127     Otherwise, if all requests could be handled, it returns C<0>. If for some
128     reason not all requests have been handled, i.e. some are still pending, it
129     returns C<-1>.
130    
131     =back
132    
133     For libev, you would typically use an C<ev_async> watcher: the
134     C<want_poll> callback would invoke C<ev_async_send> to wake up the event
135     loop. Inside the callback set for the watcher, one would call C<eio_poll
136 root 1.15 ()>.
137    
138     If C<eio_poll ()> is configured to not handle all results in one go
139     (i.e. it returns C<-1>) then you should start an idle watcher that calls
140     C<eio_poll> until it returns something C<!= -1>.
141    
142 sf-exg 1.20 A full-featured connector between libeio and libev would look as follows
143 root 1.16 (if C<eio_poll> is handling all requests, it can of course be simplified a
144     lot by removing the idle watcher logic):
145 root 1.15
   #include <ev.h>
   #include <eio.h>

   static struct ev_loop *loop;
   static ev_idle repeat_watcher;
   static ev_async ready_watcher;

   /* idle watcher callback, only used when eio_poll */
   /* didn't handle all results in one call */
   static void
   repeat (EV_P_ ev_idle *w, int revents)
   {
     if (eio_poll () != -1)
       ev_idle_stop (EV_A_ w);
   }

   /* eio has some results, process them */
   static void
   ready (EV_P_ ev_async *w, int revents)
   {
     if (eio_poll () == -1)
       ev_idle_start (EV_A_ &repeat_watcher);
   }

   /* wake up the event loop */
   static void
   want_poll (void)
   {
     ev_async_send (loop, &ready_watcher);
   }

   void
   my_init_eio (void)
   {
     loop = EV_DEFAULT;

     ev_idle_init (&repeat_watcher, repeat);
     ev_async_init (&ready_watcher, ready);
     ev_async_start (loop, &ready_watcher);

     eio_init (want_poll, 0);
   }
185 root 1.1
186     For most other event loops, you would typically use a pipe - the event
187 sf-exg 1.6 loop should be told to wait for read readiness on the read end. In
188 root 1.1 C<want_poll> you would write a single byte, in C<done_poll> you would try
189     to read that byte, and in the callback for the read end, you would call
190 root 1.16 C<eio_poll>.
191    
192     You don't have to take special care in the case C<eio_poll> doesn't handle
193     all requests, as the done callback will not be invoked, so the event loop
194 root 1.18 will still signal readiness for the pipe until I<all> results have been
195 root 1.16 processed.
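
The pipe approach described above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the helper names
C<wake_pipe_init> and C<wake_pipe_readable> are invented for this example,
the event loop itself is replaced by a single C<poll(2)> call, and the
actual C<eio_init> and C<eio_poll> calls are left out so that the fragment
depends on POSIX only.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 }; /* [0] = read end, [1] = write end */

/* Create the pipe and make both ends non-blocking. Returns 0 or -1. */
static int
wake_pipe_init (void)
{
  int i;

  if (pipe (wake_pipe) < 0)
    return -1;

  for (i = 0; i < 2; ++i)
    {
      int fl = fcntl (wake_pipe[i], F_GETFL);

      if (fl < 0 || fcntl (wake_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* want_poll: no libeio calls are allowed here, so only write one byte */
static void
want_poll (void)
{
  char dummy = 0;

  if (write (wake_pipe[1], &dummy, 1) < 0)
    {
      /* nothing sensible can be done here, ignore the error */
    }
}

/* done_poll: all results were handled, try to read the byte back */
static void
done_poll (void)
{
  char dummy;

  if (read (wake_pipe[0], &dummy, 1) < 0)
    {
      /* pipe already empty, ignore */
    }
}

/* what an event loop would test: is the read end readable right now? */
static int
wake_pipe_readable (void)
{
  struct pollfd pfd;

  pfd.fd      = wake_pipe[0];
  pfd.events  = POLLIN;
  pfd.revents = 0;

  return poll (&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

In a real program you would pass C<want_poll> and C<done_poll> to
C<eio_init>, register the read end with your event loop, and call
C<eio_poll> from the read-readiness callback.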
196 root 1.1
197    
198 root 1.7 =head1 HIGH LEVEL REQUEST API
199    
200     Libeio has both a high-level API, which consists of calling a request
201     function with a callback to be called on completion, and a low-level API
202     where you fill out request structures and submit them.
203    
204     This section describes the high-level API.
205    
206     =head2 REQUEST SUBMISSION AND RESULT PROCESSING
207    
208     You submit a request by calling the relevant C<eio_TYPE> function with the
209     required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
210     (called C<eio_cb> below) and a freely usable C<void *data> argument.
211    
212 root 1.12 The return value will either be 0, in case something went really wrong
213     (which can basically only happen on very fatal errors, such as C<malloc>
214     returning 0, which is rather unlikely), or a pointer to the newly-created
215     and submitted C<eio_req *>.
216 root 1.7
217     The callback will be called with an C<eio_req *> which contains the
218     results of the request. The members you can access inside that structure
219     vary from request to request, except for:
220    
221     =over 4
222    
223     =item C<ssize_t result>
224    
225     This contains the result value from the call (usually the same as the
226     syscall of the same name).
227    
228     =item C<int errorno>
229    
230     This contains the value of C<errno> after the call.
231    
232     =item C<void *data>
233    
234     The C<void *data> member simply stores the value of the C<data> argument.
235    
236     =back
237    
238 sf-exg 1.29 Members not explicitly described as accessible must not be
239     accessed. Specifically, there is no guarantee that any members will still
240 root 1.28 have the value they had when the request was submitted.
241    
242 root 1.7 The return value of the callback is normally C<0>, which tells libeio to
243     continue normally. If a callback returns a nonzero value, libeio will
244     stop processing results (in C<eio_poll>) and will return the value to its
245     caller.
246    
247 root 1.28 Memory areas passed to libeio wrappers must stay valid as long as a
248     request executes, with the exception of paths, which are being copied
249 root 1.7 internally. Any memory libeio itself allocates will be freed after the
250     finish callback has been called. If you want to manage all memory passed
251     to libeio yourself you can use the low-level API.
252    
253     For example, to open a file, you could do this:
254    
   static int
   file_open_done (eio_req *req)
   {
     if (req->result < 0)
       {
         /* open() returned -1 */
         errno = req->errorno;
         perror ("open");
       }
     else
       {
         int fd = req->result;
         /* now we have the new fd in fd */
       }

     return 0;
   }

   /* the first three arguments are passed to open(2) */
   /* the remaining are priority, callback and data */
   if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
     abort (); /* something went wrong, we will all die!!! */
277 root 1.7
Note that you additionally need to call C<eio_poll> when the C<want_poll>
callback indicates that requests are ready to be processed.
280    
281 root 1.17 =head2 CANCELLING REQUESTS
282    
283     Sometimes the need for a request goes away before the request is
284 root 1.18 finished. In that case, one can cancel the request by a call to
285 root 1.17 C<eio_cancel>:
286    
287     =over 4
288    
289     =item eio_cancel (eio_req *req)
290    
291 root 1.19 Cancel the request (and all its subrequests). If the request is currently
292 root 1.18 executing it might still continue to execute, and in other cases it might
293     still take a while till the request is cancelled.
294 root 1.17
295     Even if cancelled, the finish callback will still be invoked - the
296     callbacks of all cancellable requests need to check whether the request
297     has been cancelled by calling C<EIO_CANCELLED (req)>:
298    
   static int
   my_eio_cb (eio_req *req)
   {
     if (EIO_CANCELLED (req))
       return 0; /* cancelled, nothing to do */

     /* otherwise process the result as usual */
     return 0;
   }
305    
In addition, cancelled requests will I<either> have C<< req->result >>
set to C<-1> and C<< req->errorno >> to C<ECANCELED>, or I<otherwise> they were
successfully executed, despite being cancelled (e.g. when they have
already been executed at the time they were cancelled).
310    
311     C<EIO_CANCELLED> is still true for requests that have successfully
312     executed, as long as C<eio_cancel> was called on them at some point.
313 root 1.17
314     =back
315    
316 root 1.7 =head2 AVAILABLE REQUESTS
317    
318     The following request functions are available. I<All> of them return the
319     C<eio_req *> on success and C<0> on failure, and I<all> of them have the
320     same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
321     mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
322     custom data value as C<data>.
323    
324     =head3 POSIX API WRAPPERS
325    
326     These requests simply wrap the POSIX call of the same name, with the same
327 root 1.11 arguments. If a function is not implemented by the OS and cannot be emulated
328 root 1.10 in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>.
329 root 1.7
330     =over 4
331    
332     =item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
333    
334     =item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
335    
336     =item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
337    
338     =item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
339    
340     =item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
341    
342     =item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
343    
344     =item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
345    
346 root 1.10 =item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
347 root 1.7
348     =item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
349    
350     =item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
351    
352     =item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
353    
354     =item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
355    
356     =item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
357    
358     =item eio_close (int fd, int pri, eio_cb cb, void *data)
359    
360     =item eio_sync (int pri, eio_cb cb, void *data)
361    
362     =item eio_fsync (int fd, int pri, eio_cb cb, void *data)
363    
364     =item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
365    
366     =item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
367    
368     =item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
369    
370     =item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
371    
372     =item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
373    
374     =item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
375    
These have the same semantics as the syscall of the same name; their
return value is available as C<< req->result >> later.
378    
379     =item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
380    
381     =item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
382    
383     These two requests are called C<read> and C<write>, but actually wrap
384     C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
385     libeio uses lseek/read_or_write/lseek and a mutex to serialise the
386     requests, so all these requests run serially and do not disturb each
387     other. However, they still disturb the file offset while they run, so it's
388     not safe to call these functions concurrently with non-libeio functions on
389     the same fd on these systems.
390    
391     Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
392     so it is advised not to submit multiple requests on the same fd on this
393     horrible pile of garbage.
394    
395 root 1.10 =item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
396    
397     Like C<mlockall>, but the flag value constants are called
398     C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>.
399    
400     =item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
401    
402     Just like msync, except that the flag values are called C<EIO_MS_ASYNC>,
403     C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>.
404    
405     =item eio_readlink (const char *path, int pri, eio_cb cb, void *data)
406    
407     If successful, the path read by C<readlink(2)> can be accessed via C<<
408     req->ptr2 >> and is I<NOT> null-terminated, with the length specified as
409     C<< req->result >>.
410    
   if (req->result >= 0)
     {
       /* ptr2 is not null-terminated, so copy exactly req->result bytes */
       char *target = strndup ((char *)req->ptr2, req->result);

       free (target);
     }
417    
418 root 1.13 =item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
419    
420 root 1.22 Similar to the realpath libc function, but unlike that one, C<<
421     req->result >> is C<-1> on failure. On success, the result is the length
422     of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is
423     similar to readlink.
424 root 1.13
425 root 1.10 =item eio_stat (const char *path, int pri, eio_cb cb, void *data)
426    
427     =item eio_lstat (const char *path, int pri, eio_cb cb, void *data)
428    
429 root 1.7 =item eio_fstat (int fd, int pri, eio_cb cb, void *data)
430    
431     Stats a file - if C<< req->result >> indicates success, then you can
432     access the C<struct stat>-like structure via C<< req->ptr2 >>:
433    
   EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
435 root 1.7
436 root 1.10 =item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
437    
438     =item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data)
439 root 1.7
440     Stats a filesystem - if C<< req->result >> indicates success, then you can
441     access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
442    
   EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
444 root 1.7
445     =back
446    
447     =head3 READING DIRECTORIES
448    
449     Reading directories sounds simple, but can be rather demanding, especially
450 root 1.18 if you want to do stuff such as traversing a directory hierarchy or
451     processing all files in a directory. Libeio can assist these complex tasks
with its C<eio_readdir> call.
453    
454     =over 4
455    
456     =item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
457    
458     This is a very complex call. It basically reads through a whole directory
459     (via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
460     the names or an array of C<struct eio_dirent>, depending on the C<flags>
461     argument.
462    
The C<< req->result >> indicates either the number of files found, or
C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>,
and an array of C<struct eio_dirent>, if requested by C<flags>, can be found via C<<
req->ptr1 >>.
467    
468     Here is an example that prints all the names:
469    
   int i;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       printf ("name #%d: %s\n", i, names);

       /* move to next name */
       names += strlen (names) + 1;
     }
480    
481     Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
482    
483     C<flags> can be any combination of:
484    
485     =over 4
486    
487     =item EIO_READDIR_DENTS
488    
489     If this flag is specified, then, in addition to the names in C<ptr2>,
490     also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
491     eio_dirent> looks like this:
492    
   struct eio_dirent
   {
     int nameofs;            /* offset of null-terminated name string in (char *)req->ptr2 */
     unsigned short namelen; /* size of filename without trailing 0 */
     unsigned char type;     /* one of EIO_DT_* */
     signed char score;      /* internal use */
     ino_t inode;            /* the inode number, if available, otherwise unspecified */
   };
501 root 1.7
502     The only members you normally would access are C<nameofs>, which is the
503     byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
504    
505     C<type> can be one of:
506    
507     C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
508     the name yourself if you need to know,
509     one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
510     C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
511     or some OS-specific type (currently
512     C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
513     C<EIO_DT_NAM> - xenix special named file,
514     C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
515     C<EIO_DT_NWK> - HP-UX network special,
516     C<EIO_DT_CMP> - VxFS compressed,
517     C<EIO_DT_DOOR> - solaris door, or
518     C<EIO_DT_WHT>).
519    
520     This example prints all names and their type:
521    
   int i;
   struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       struct eio_dirent *ent = ents + i;
       char *name = names + ent->nameofs;

       printf ("name #%d: %s (type %d)\n", i, name, ent->type);
     }
533    
534     =item EIO_READDIR_DIRS_FIRST
535    
536     When this flag is specified, then the names will be returned in an order
537     where likely directories come first, in optimal C<stat> order. This is
538     useful when you need to quickly find directories, or you want to find all
directories without having to stat() each entry.
540    
541     If the system returns type information in readdir, then this is used
542     to find directories directly. Otherwise, likely directories are names
543     beginning with ".", or otherwise names with no dots, of which names with
544     short names are tried first.
545    
546     =item EIO_READDIR_STAT_ORDER
547    
548     When this flag is specified, then the names will be returned in an order
549     suitable for stat()'ing each one. That is, when you plan to stat()
550     all files in the given directory, then the returned order will likely
551     be fastest.
552    
553 root 1.18 If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
554     likely directories come first, resulting in a less optimal stat order.
555 root 1.7
556     =item EIO_READDIR_FOUND_UNKNOWN
557    
This flag should not be specified when calling C<eio_readdir>. Instead,
it is set by C<eio_readdir> (you can access the C<flags> via C<<
req->int1 >>) when any of the C<type>s found were C<EIO_DT_UNKNOWN>. The
absence of this flag therefore indicates that all C<type>s are known,
which can be used to speed up some algorithms.
563    
564     A typical use case would be to identify all subdirectories within a
565     directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
566     then this flag is I<NOT> set, then all the entries at the beginning of the
567     returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
568     should start C<stat()>'ing the entries starting at the beginning of the
569     array, stopping as soon as you found all directories (the count can be
570     deduced by the link count of the directory).
571    
572     =back
573    
574     =back
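
The use case described under C<EIO_READDIR_FOUND_UNKNOWN> above, where
C<EIO_READDIR_DIRS_FIRST> was requested and no type is unknown, amounts to
counting the leading directory entries. The following is a minimal sketch
only: C<struct demo_dirent>, the C<DEMO_DT_*> constants and
C<count_leading_dirs> are hypothetical stand-ins so that the fragment
compiles without libeio, and the constant values are made up. In real code
you would use C<struct eio_dirent> and the C<EIO_DT_*> constants instead.

```c
#include <stddef.h>

/* Stand-ins for EIO_DT_UNKNOWN, EIO_DT_DIR and EIO_DT_REG (values made up). */
enum { DEMO_DT_UNKNOWN = 0, DEMO_DT_DIR = 1, DEMO_DT_REG = 2 };

/* Stand-in with only the struct eio_dirent members this sketch needs. */
struct demo_dirent
{
  int nameofs;        /* offset of the name in the names buffer */
  unsigned char type; /* one of DEMO_DT_* */
};

/* Count the directory entries at the start of the array. */
/* Only meaningful if directories were sorted first and all types are known. */
static size_t
count_leading_dirs (const struct demo_dirent *ents, size_t count)
{
  size_t n = 0;

  while (n < count && ents[n].type == DEMO_DT_DIR)
    ++n;

  return n;
}
```

If the flag I<is> set, this shortcut does not apply and the entries have to
be C<stat()>'ed as described above.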
575    
576     =head3 OS-SPECIFIC CALL WRAPPERS
577    
578     These wrap OS-specific calls (usually Linux ones), and might or might not
579     be emulated on other operating systems. Calls that are not emulated will
580     return C<-1> and set C<errno> to C<ENOSYS>.
581    
582     =over 4
583    
584     =item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
585    
586     Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
587     libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and
588     Darwin.
589    
If the OS doesn't support some sendfile-like call, or the call fails,
indicating a lack of support for the given file descriptor type (for example,
592     Linux's sendfile might not support file to file copies), then libeio will
593     emulate the call in userspace, so there are almost no limitations on its
594     use.
595    
596     =item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
597    
598     Calls C<readahead(2)>. If the syscall is missing, then the call is
599     emulated by simply reading the data (currently in 64kiB chunks).
600    
601 root 1.27 =item eio_syncfs (int fd, int pri, eio_cb cb, void *data)
602    
Calls Linux' C<syncfs> syscall, if available. If the syscall is missing, the
request returns C<-1> and sets C<errno> to C<ENOSYS>, I<but still calls
sync()> if the C<fd> is C<< >= 0 >>. You can therefore probe for the
availability of the syscall by passing a negative C<fd> and checking for
C<-1/ENOSYS>.
607    
608 root 1.7 =item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
609    
610     Calls C<sync_file_range>. If the syscall is missing, then this is the same
611     as calling C<fdatasync>.
612    
613 root 1.10 Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
614     C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
615    
616 root 1.21 =item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
617    
618     Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
619     missing, then it returns failure and sets C<errno> to C<ENOSYS>.
620    
621     The C<mode> argument can be C<0> (for behaviour similar to
622     C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
623     of the file unchanged (but still preallocates space beyond end of file).
624    
625 root 1.7 =back
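
To illustrate what "emulated by simply reading the data" means for
C<eio_readahead>, here is a rough sketch of such an emulation. It is not
libeio's actual implementation: the function name C<demo_readahead> is
invented for this example, and only the 64kiB chunk size is taken from the
description above.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/* Read and discard length bytes starting at offset, in 64kiB chunks. */
/* Returns 0 on success (including early end of file), -1 on error. */
static int
demo_readahead (int fd, off_t offset, size_t length)
{
  char buf[65536];

  while (length > 0)
    {
      size_t want = length < sizeof (buf) ? length : sizeof (buf);
      ssize_t got = pread (fd, buf, want, offset);

      if (got < 0)
        return -1; /* errno was set by pread */

      if (got == 0)
        break; /* end of file reached */

      offset += got;
      length -= (size_t)got;
    }

  return 0;
}
```

The data itself is thrown away; the point is only that the kernel has to
bring it into its cache while servicing the reads.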
626    
627     =head3 LIBEIO-SPECIFIC REQUESTS
628    
629     These requests are specific to libeio and do not correspond to any OS call.
630    
631     =over 4
632    
633 root 1.9 =item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
634 root 1.7
Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
636     memory area, page-wise, that is, it reads (or reads and writes back) the
637     first octet of every page that spans the memory area.
638    
639     This can be used to page in some mmapped file, or dirty some pages. Note
640     that dirtying is an unlocked read-write access, so races can ensue when
some other thread modifies the data stored in that memory area.
642    
=item eio_custom (void (*execute)(eio_req *), int pri, eio_cb cb, void *data)
644 root 1.7
645     Executes a custom request, i.e., a user-specified callback.
646    
647     The callback gets the C<eio_req *> as parameter and is expected to read
648     and modify any request-specific members. Specifically, it should set C<<
649     req->result >> to the result value, just like other requests.
650    
651     Here is an example that simply calls C<open>, like C<eio_open>, but it
652     uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
653     you want to pass more/other parameters, you either need to pass some
654     struct or so via C<data> or provide your own wrapper using the low-level
655     API.
656    
   static int
   my_open_done (eio_req *req)
   {
     int fd = req->result;

     return 0;
   }

   static void
   my_open (eio_req *req)
   {
     req->result = open (req->data, O_RDONLY);
   }

   eio_custom (my_open, 0, my_open_done, "/etc/passwd");
672    
673 root 1.9 =item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
674 root 1.7
675 root 1.18 This is a request that takes C<delay> seconds to execute, but otherwise
676 root 1.7 does nothing - it simply puts one of the worker threads to sleep for this
677     long.
678    
679     This request can be used to artificially increase load, e.g. for debugging
680     or benchmarking reasons.
681    
682 root 1.9 =item eio_nop (int pri, eio_cb cb, void *data)
683 root 1.7
684     This request does nothing, except go through the whole request cycle. This
685     can be used to measure latency or in some cases to simplify code, but is
686     not really of much use.
687    
688     =back
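
The page-wise access performed by C<eio_mtouch> (see above) can be pictured
with the following sketch. It is an illustration, not libeio's code: the
name C<demo_mtouch> and its return value are invented, and, unlike the
description above, it touches the first octet of each page I<that lies
inside the area>, so it never accesses memory outside the range it was
given. It also assumes that the page size is a power of two.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Touch one octet in every page overlapping [addr, addr + length). */
/* With modify != 0 the octet is written back, which dirties the page. */
/* Returns the number of pages touched. */
static size_t
demo_mtouch (void *addr, size_t length, int modify)
{
  uintptr_t page = (uintptr_t)sysconf (_SC_PAGESIZE);
  uintptr_t cur  = (uintptr_t)addr;
  uintptr_t end  = cur + length;
  size_t touched = 0;

  while (cur < end)
    {
      volatile unsigned char *p = (volatile unsigned char *)cur;
      unsigned char octet = *p; /* the read pages the data in */

      if (modify)
        *p = octet; /* unlocked read-write access, can race with writers */

      ++touched;
      cur = (cur & ~(page - 1)) + page; /* start of the next page */
    }

  return touched;
}
```

The write-back in the C<modify> case is exactly the unlocked read-write
access that the description of C<eio_mtouch> warns about.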
689    
690     =head3 GROUPING AND LIMITING REQUESTS
691 root 1.1
There is one more rather special request, C<eio_grp>: instead of doing
something itself, it is a container for other eio requests.
695    
696     There are two primary use cases for this: a) bundle many requests into a
697     single, composite, request with a definite callback and the ability to
698     cancel the whole request with its subrequests and b) limiting the number
699     of "active" requests.
700    
701 root 1.18 Further below you will find more discussion of these topics - first
702     follows the reference section detailing the request generator and other
703     methods.
704 root 1.12
705     =over 4
706    
707 root 1.17 =item eio_req *grp = eio_grp (eio_cb cb, void *data)
708    
709 root 1.23 Creates, submits and returns a group request. Note that it doesn't have a
710     priority, unlike all other requests.
711 root 1.17
712     =item eio_grp_add (eio_req *grp, eio_req *req)
713    
714     Adds a request to the request group.
715    
716     =item eio_grp_cancel (eio_req *grp)
717    
718     Cancels all requests I<in> the group, but I<not> the group request
719 root 1.23 itself. You can cancel the group request I<and> all subrequests via a
720     normal C<eio_cancel> call.
721 root 1.17
722 root 1.23 =back
723    
724     =head4 GROUP REQUEST LIFETIME
725    
726     Left alone, a group request will instantly move to the pending state and
727     will be finished at the next call of C<eio_poll>.
728    
729 sf-exg 1.24 The usefulness stems from the fact that, if a subrequest is added to a
730 root 1.23 group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
731     will not finish until all the subrequests have finished.
732    
733     So the usage cycle of a group request is like this: after it is created,
734     you normally instantly add a subrequest. If none is added, the group
735     request will finish on its own. As long as subrequests are added before
736     the group request is finished, it will be kept from finishing. That is, the
737     callbacks of any subrequests can, in turn, add more requests to the group,
738     and as long as any requests are active, the group request itself will not
739     finish.
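The lifetime rules above can be pictured with a small counter model. This is
only a sketch: C<struct toy_grp> and the C<toy_*> functions are hypothetical
stand-ins that do not use libeio at all, and refusing a late add with C<-1>
is a choice of the model, not a statement about what C<eio_grp_add> does.

```c
/* Toy stand-in for a group request; NOT libeio's eio_req. */
struct toy_grp
{
  int pending;   /* subrequests added but not yet finished */
  int finished;  /* set once the group would have run its callback */
};

/* Models adding a subrequest: only possible while the group is unfinished. */
int
toy_grp_add (struct toy_grp *grp)
{
  if (grp->finished)
    return -1;   /* too late (a choice of this model, see above) */

  ++grp->pending;
  return 0;
}

/* Models a subrequest finishing, after its callback had the chance
   to add further subrequests via toy_grp_add. */
void
toy_sub_done (struct toy_grp *grp)
{
  --grp->pending;
}

/* Models the check made at poll time: the group finishes only
   when no subrequest is left. Returns non-zero once finished. */
int
toy_grp_poll (struct toy_grp *grp)
{
  if (!grp->finished && grp->pending == 0)
    grp->finished = 1;

  return grp->finished;
}
```

A group that never receives a subrequest finishes at the first poll, while a
group whose subrequests keep adding new ones stays alive until the last of
them is done.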
740    
741     =head4 CREATING COMPOSITE REQUESTS
742    
743     Imagine you wanted to create an C<eio_load> request that opens a file,
744     reads it and closes it. This means it has to execute at least three eio
745     requests, but for various reasons it might be nice if that request looked
746     like any other eio request.
747    
748     This can be done with groups:
749    
750     =over 4
751    
752     =item 1) create the request object
753    
754     Create a group that contains all further requests. This is the request you
755     can return as "the load request".
756 root 1.12
757 root 1.23 =item 2) open the file, maybe
758    
759     Next, open the file with C<eio_open> and add the request to the group
760 sf-exg 1.24 request and you are finished setting up the request.
761 root 1.23
762     If, for some reason, you cannot C<eio_open> (path is a null ptr?) you
763 sf-exg 1.24 can set C<< grp->result >> to C<-1> to signal an error and let the group
764 root 1.23 request finish on its own.
765    
766     =item 3) open callback adds more requests
767    
768     In the open callback, if the open was not successful, copy C<<
769     req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
770     C<-1> to signal an error.
771    
772     Otherwise, malloc some memory or so and issue a read request, adding the
773     read request to the group.
774    
775 sf-exg 1.24 =item 4) continue issuing requests till finished
776 root 1.23
777     In the read callback, check for errors and possibly continue with
778     C<eio_close> or any other eio request in the same way.
779    
780     As soon as no new requests are added the group request will finish. Make
781     sure you I<always> set C<< grp->result >> to some sensible value.
782 root 1.12
783     =back
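The error handling in steps 3 and 4 can be sketched as follows. To keep the
sketch compilable without libeio, it uses a hypothetical C<struct toy_req>
that carries only the C<result> and C<errorno> fields mentioned above. The
C<toy_*> function names, and the choice of storing the read result (for
example the number of bytes read) in the group on success, are assumptions
of this example.

```c
#include <errno.h>

/* Stand-in with only the two fields the steps above mention. */
struct toy_req
{
  long result;   /* return value of the operation, -1 on error */
  int  errorno;  /* errno value belonging to a failed operation */
};

/* Step 3: called when the open has finished. Returns 1 if the caller
   should go on and add a read request to the group, 0 if the composite
   request has failed and the group should simply be left to finish. */
int
toy_open_done (const struct toy_req *open_req, struct toy_req *grp)
{
  if (open_req->result < 0)
    {
      grp->errorno = open_req->errorno; /* pass the reason on */
      grp->result  = -1;                /* signal the error */
      return 0;
    }

  return 1;
}

/* Step 4: called when the read has finished. Always leaves the group
   with a sensible result, whether the read worked or not. */
void
toy_read_done (const struct toy_req *read_req, struct toy_req *grp)
{
  if (read_req->result < 0)
    {
      grp->errorno = read_req->errorno;
      grp->result  = -1;
    }
  else
    grp->result = read_req->result;
}
```

In a real composite request the same assignments would be made inside the
callbacks of the open and read requests, with the group request taking the
place of C<grp>.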
784    
785 root 1.23 =head4 REQUEST LIMITING
786 root 1.12
787    
788 root 1.1 #TODO
789    
790 root 1.7 void eio_grp_limit (eio_req *grp, int limit);
791 root 1.1
792    
793     =back
794    
795    
796     =head1 LOW LEVEL REQUEST API
797    
798     #TODO
799    
800 root 1.7
801     =head1 ANATOMY AND LIFETIME OF AN EIO REQUEST
802    
803     A request is represented by a structure of type C<eio_req>. To initialise
804     it, clear it to all zero bytes:
805    
806 root 1.17 eio_req req;
807 root 1.7
808 root 1.17 memset (&req, 0, sizeof (req));
809 root 1.7
810     A more common way to initialise a new C<eio_req> is to use C<calloc>:
811    
812 root 1.17 eio_req *req = calloc (1, sizeof (*req));
813 root 1.7
814     In either case, libeio neither allocates, initialises nor frees the
815     C<eio_req> structure for you - it merely uses it.
816    
817     zero
818    
819     #TODO
820    
821 root 1.8 =head2 CONFIGURATION
822    
823     The functions in this section can sometimes be useful, but the default
824     configuration will do in most cases, so you should skip this section on
825     first reading.
826    
827     =over 4
828    
829     =item eio_set_max_poll_time (eio_tstamp nseconds)
830    
831     This causes C<eio_poll ()> to return after it has detected that it was
832     running for C<nseconds> seconds or longer (this number can be fractional).
833    
834     This can be used to limit the amount of time spent handling eio requests,
835     for example, in interactive programs, you might want to limit this time to
836     C<0.01> seconds or so.
837    
838     Note that:
839    
840 root 1.18 =over 4
841    
842     =item a) libeio doesn't know how long your request callbacks take, so the
843     time spent in C<eio_poll> is up to one callback invocation longer than
844     this interval.
845 root 1.8
846 root 1.18 =item b) this is implemented by calling C<gettimeofday> after each
847     request, which can be costly.
848 root 1.8
849 root 1.18 =item c) at least one request will be handled.
850    
851     =back
852 root 1.8
853     =item eio_set_max_poll_reqs (unsigned int nreqs)
854    
855     When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
856     C<nreqs> requests per invocation. This is a less costly way to limit the
857     amount of work done by C<eio_poll> than setting a time limit.
858    
859     If you know your callbacks are generally fast, you could use this to
860     encourage interactiveness in your programs by setting it to C<10>, C<100>
861     or even C<1000>.
862    
863     =item eio_set_min_parallel (unsigned int nthreads)
864    
865     Make sure libeio can handle at least this many requests in parallel. It
866     might be able to handle more.
867    
868     =item eio_set_max_parallel (unsigned int nthreads)
869    
870     Set the maximum number of threads that libeio will spawn.
871    
872     =item eio_set_max_idle (unsigned int nthreads)
873    
874     Libeio uses threads internally to handle most requests, and will start and stop threads on demand.
875    
876     This call can be used to limit the number of idle threads (threads without
877     work to do): libeio will keep some threads idle in preparation for more
878     requests, but never more than C<nthreads> threads.
879    
880     In addition to this, libeio will also stop threads when they are idle for
881     a few seconds, regardless of this setting.
882    
883     =item unsigned int eio_nthreads ()
884    
885     Return the number of worker threads currently running.
886    
887     =item unsigned int eio_nreqs ()
888    
889     Return the number of requests currently handled by libeio. This is the
890     total number of requests that have been submitted to libeio, but not yet
891     destroyed.
892    
893     =item unsigned int eio_nready ()
894    
895     Returns the number of ready requests, i.e. requests that have been
896     submitted but have not yet entered the execution phase.
897    
898     =item unsigned int eio_npending ()
899    
900     Returns the number of pending requests, i.e. requests that have been
901     executed and have results, but have not been finished yet by a call to
902     C<eio_poll>.
903    
904     =back
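The interplay of the two C<eio_poll> limits described above
(C<eio_set_max_poll_time> and C<eio_set_max_poll_reqs>) can be illustrated
with a small model. This is not libeio's implementation: C<toy_poll> is a
hypothetical function, time is simulated by a fixed C<cost> per request
instead of calling C<gettimeofday>, and treating a non-positive C<max_time>
as "no time limit" is an assumption of the model.

```c
/* Model of a single poll call: returns how many of the `pending`
   requests get handled. max_reqs == 0 means no request limit,
   max_time <= 0 means no time limit (assumption of this model).
   `cost` is the simulated time one request plus its callback takes. */
unsigned int
toy_poll (unsigned int pending, unsigned int max_reqs,
          double max_time, double cost)
{
  unsigned int handled = 0;
  double elapsed = 0.;

  while (handled < pending)
    {
      /* limits are only checked after a request, so at least
         one request is handled whenever one is pending */
      ++handled;
      elapsed += cost;

      if (max_reqs && handled >= max_reqs)
        break;

      /* a slow callback can overshoot the limit by its own duration */
      if (max_time > 0. && elapsed >= max_time)
        break;
    }

  return handled;
}
```

With a time limit of C<1.> and a cost of C<0.25> per request, the model
handles four requests per call. With a cost of C<5.> it still handles one,
which corresponds to notes a) and c) above.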
905    
906 root 1.1 =head1 EMBEDDING
907    
908     Libeio can be embedded directly into programs. This functionality is not
909     documented and not (yet) officially supported.
910    
911 root 1.3 Note that, when including C<libeio.m4>, you are responsible for defining
912     the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).
913    
914 root 1.2 If you need to know how, check the C<IO::AIO> perl module, which does
915 root 1.1 exactly that.
916    
917    
918 root 1.4 =head1 COMPILETIME CONFIGURATION
919    
920     These symbols, if used, must be defined when compiling F<eio.c>.
921    
922     =over 4
923    
924     =item EIO_STACKSIZE
925    
926     This symbol governs the stack size for each eio thread. Libeio itself
927     was written to use very little stack space, but when using C<EIO_CUSTOM>
928     requests, you might want to increase this.
929    
930     If this symbol is undefined (the default) then libeio will use its default
931 root 1.25 stack size (C<sizeof (void *) * 4096> currently). If it is defined, but
932 root 1.4 C<0>, then the default operating system stack size will be used. In all
933     other cases, the value must be an expression that evaluates to the desired
934     stack size.
935    
936     =back
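The three cases described for C<EIO_STACKSIZE> can be written down as a
preprocessor selection. This is a sketch of the rule as stated above, not a
copy of the code in F<eio.c>; C<TOY_STACKSIZE> and C<toy_stack_size> are
names made up for this example.

```c
#include <stddef.h>

/* Undefined: use the documented default. Defined: use the given
   expression, where 0 stands for "operating system default". */
#ifdef EIO_STACKSIZE
# define TOY_STACKSIZE ((size_t)(EIO_STACKSIZE))
#else
# define TOY_STACKSIZE (sizeof (void *) * 4096)
#endif

/* Returns the number of bytes to request per thread,
   or 0 if the operating system default should be used. */
size_t
toy_stack_size (void)
{
  return TOY_STACKSIZE;
}
```

With the symbol undefined this yields 16384 bytes where pointers are 4 bytes
wide and 32768 bytes where they are 8 bytes wide. Compiling with, for
example, C<-DEIO_STACKSIZE=65536> (an arbitrary value chosen for
illustration) would select 64 KiB instead.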
937    
938    
939 root 1.1 =head1 PORTABILITY REQUIREMENTS
940    
941     In addition to a working ISO-C implementation, libeio relies on a few
942     additional extensions:
943    
944     =over 4
945    
946     =item POSIX threads
947    
948     To be portable, this module uses threads; specifically, the POSIX threads
949     library must be available (and working, which partially excludes many xBSD
950     systems, where C<fork ()> is buggy).
951    
952     =item POSIX-compatible filesystem API
953    
954     This is actually a harder portability requirement: The libeio API is quite
955     demanding regarding POSIX API calls (symlinks, user/group management
956     etc.).
957    
958     =item C<double> must hold a time value in seconds with enough accuracy
959    
960     The type C<double> is used to represent timestamps. It is required to
961     have at least 51 bits of mantissa (and 9 bits of exponent), which is good
962     enough until at least the year 4000. This requirement is fulfilled by
963     implementations implementing IEEE 754 (basically all existing ones).
964    
965     =back
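The accuracy requirement on C<double> can be checked with a few lines of
standard C. The sketch below estimates an upper bound for the spacing of
adjacent C<double> values near a given timestamp. C<TOY_YEAR_4000> and
C<toy_ts_granularity> are hypothetical names, and the year length of 365.25
days is an approximation.

```c
#include <float.h>

/* Approximate number of seconds between 1970 and the year 4000. */
#define TOY_YEAR_4000 ((4000. - 1970.) * 365.25 * 86400.)

/* Upper bound for the gap between two adjacent double values
   near the positive timestamp t, in seconds. */
double
toy_ts_granularity (double t)
{
  return t * DBL_EPSILON;
}
```

On an IEEE 754 implementation, C<toy_ts_granularity (TOY_YEAR_4000)> comes
out on the order of ten microseconds, so timestamps that far in the future
are still resolved much more finely than a millisecond.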
966    
967     If you know of other additional requirements, drop me a note.
968    
969    
970     =head1 AUTHOR
971    
972     Marc Lehmann <libeio@schmorp.de>.
973