/cvs/libeio/eio.pod
Revision: 1.27
Committed: Sun Jul 24 03:32:54 2011 UTC (12 years, 9 months ago) by root
Branch: MAIN
Changes since 1.26: +7 -0 lines
Log Message:
*** empty log message ***

1 =head1 NAME
2
3 libeio - truly asynchronous POSIX I/O
4
5 =head1 SYNOPSIS
6
7 #include <eio.h>
8
9 =head1 DESCRIPTION
10
11 The newest version of this document is also available as an html-formatted
12 web page you might find easier to navigate when reading it for the first
13 time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
14
15 Note that this library is a by-product of the C<IO::AIO> Perl
16 module, and many of the subtler points regarding request lifetime
17 and so on are only documented in its documentation at the
18 moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
19
20 =head2 FEATURES
21
22 This library provides fully asynchronous versions of most POSIX functions
23 dealing with I/O. Unlike most asynchronous libraries, this not only
24 includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
25 similar functions, as well as less commonly used ones such as C<mknod>, C<futime>
26 or C<readlink>.
27
28 It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
29 FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with
30 emulation elsewhere).
31
32 The goal is to enable you to write fully non-blocking programs. For
33 example, in a game server, you would not want to freeze for a few seconds
34 just because the server is running a backup and you happen to call
35 C<readdir>.
36
37 =head2 TIME REPRESENTATION
38
39 Libeio represents time as a single floating point number, representing the
40 (fractional) number of seconds since the (POSIX) epoch (somewhere near
41 the beginning of 1970, details are complicated, don't ask). This type is
42 called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
43 better), so you can freely use C<double> yourself.
44
45 Unlike the name component C<stamp> might indicate, it is also used for
46 time differences throughout libeio.
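
As an illustration, a timestamp in this representation can be derived from
a C<struct timespec> as sketched below. The names C<my_tstamp>,
C<my_tstamp_from_timespec> and C<my_tstamp_now> are made up for this
example and are not part of libeio; the only thing taken from the text
above is that the type is compatible with C<double>.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Stand-in for eio_tstamp (hypothetical name): fractional seconds as a double. */
typedef double my_tstamp;

/* Convert a struct timespec to fractional seconds. */
static my_tstamp
my_tstamp_from_timespec (const struct timespec *ts)
{
  return (my_tstamp)ts->tv_sec + (my_tstamp)ts->tv_nsec / 1e9;
}

/* Current wall-clock time as fractional seconds since the epoch. */
static my_tstamp
my_tstamp_now (void)
{
  struct timespec ts;

  clock_gettime (CLOCK_REALTIME, &ts);
  return my_tstamp_from_timespec (&ts);
}
```

Because time differences use the same type, an elapsed time is simply
C<my_tstamp_now () - start>.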
47
48 =head2 FORK SUPPORT
49
50 Usage of pthreads in a program changes the semantics of fork
51 considerably. Specifically, only async-safe functions can be called after
52 fork. Libeio uses pthreads, so this applies, and makes using fork hard for
53 anything but relatively simple fork + exec uses.
54
55 This library only works in the process that initialised it: Forking is
56 fully supported, but using libeio in any other process than the one that
57 called C<eio_init> is not.
58
59 You might get around this by not I<using> libeio before (or after) forking in
60 the parent, and using it in the child afterwards. You could also try to
61 call the C<eio_init> function again in the child, which will brutally
62 reinitialise all data structures, which isn't POSIX conformant, but
63 typically works.
64
65 Otherwise, the only recommendation you should follow is: treat fork code
66 the same way you treat signal handlers, and only ever call C<eio_init> in
67 the process that uses it, and only once ever.
68
69 =head1 INITIALISATION/INTEGRATION
70
71 Before you can call any eio functions you first have to initialise the
72 library. The library integrates into any event loop, but can also be used
73 without one, including in polling mode.
74
75 You have to provide the necessary glue yourself, however.
76
77 =over 4
78
79 =item int eio_init (void (*want_poll)(void), void (*done_poll)(void))
80
81 This function initialises the library. On success it returns C<0>, on
82 failure it returns C<-1> and sets C<errno> appropriately.
83
84 It accepts two function pointers specifying callbacks as argument, both of
85 which can be C<0>, in which case the callback isn't called.
86
87 There is currently no way to change these callbacks later, or to
88 "uninitialise" the library again.
89
90 =item want_poll callback
91
92 The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
93 it wants to be polled by calling C<eio_poll>). It is "edge-triggered",
94 that is, it will only be called once when eio wants attention, until all
95 pending requests have been handled.
96
97 This callback is called while locks are being held, so I<you must
98 not call any libeio functions inside this callback>. That includes
99 C<eio_poll>. What you should do is notify some other thread, or wake up
100 your event loop, and then call C<eio_poll>.
101
102 =item done_poll callback
103
104 This callback is invoked when libeio detects that all pending requests
105 have been handled. It is "edge-triggered", that is, it will only be
106 called once after C<want_poll>. To put it differently, C<want_poll> and
107 C<done_poll> are invoked in pairs: after C<want_poll> you have to call
108 C<eio_poll ()> until either C<eio_poll> indicates that everything has been
109 handled or C<done_poll> has been called, which signals the same.
110
111 Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
112 have been called again, so watch out for races in your code.
113
114 As with C<want_poll>, this callback is called while locks are being held,
115 so you I<must not call any libeio functions from within this callback>.
116
117 =item int eio_poll ()
118
119 This function has to be called whenever there are pending requests that
120 need finishing. You usually call this after C<want_poll> has indicated
121 that you should do so, but you can also call this function regularly to
122 poll for new results.
123
124 If any request invocation returns a non-zero value, then C<eio_poll ()>
125 immediately returns with that value as return value.
126
127 Otherwise, if all requests could be handled, it returns C<0>. If for some
128 reason not all requests have been handled, i.e. some are still pending, it
129 returns C<-1>.
130
131 =back
132
133 For libev, you would typically use an C<ev_async> watcher: the
134 C<want_poll> callback would invoke C<ev_async_send> to wake up the event
135 loop. Inside the callback set for the watcher, one would call C<eio_poll
136 ()>.
137
138 If C<eio_poll ()> is configured to not handle all results in one go
139 (i.e. it returns C<-1>) then you should start an idle watcher that calls
140 C<eio_poll> until it returns something C<!= -1>.
141
142 A full-featured connector between libeio and libev would look as follows
143 (if C<eio_poll> is handling all requests, it can of course be simplified a
144 lot by removing the idle watcher logic):
145
   #include <ev.h>
   #include <eio.h>

   static struct ev_loop *loop;
   static ev_idle repeat_watcher;
   static ev_async ready_watcher;

   /* idle watcher callback, only used when eio_poll */
   /* didn't handle all results in one call */
   static void
   repeat (EV_P_ ev_idle *w, int revents)
   {
     if (eio_poll () != -1)
       ev_idle_stop (EV_A_ w);
   }

   /* eio has some results, process them */
   static void
   ready (EV_P_ ev_async *w, int revents)
   {
     if (eio_poll () == -1)
       ev_idle_start (EV_A_ &repeat_watcher);
   }

   /* wake up the event loop */
   static void
   want_poll (void)
   {
     ev_async_send (loop, &ready_watcher);
   }

   void
   my_init_eio (void)
   {
     loop = EV_DEFAULT;

     ev_idle_init (&repeat_watcher, repeat);
     ev_async_init (&ready_watcher, ready);
     ev_async_start (loop, &ready_watcher);

     eio_init (want_poll, 0);
   }
185
186 For most other event loops, you would typically use a pipe - the event
187 loop should be told to wait for read readiness on the read end. In
188 C<want_poll> you would write a single byte, in C<done_poll> you would try
189 to read that byte, and in the callback for the read end, you would call
190 C<eio_poll>.
191
192 You don't have to take special care in the case C<eio_poll> doesn't handle
193 all requests, as the done callback will not be invoked, so the event loop
194 will still signal readiness for the pipe until I<all> results have been
195 processed.
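
A minimal sketch of the pipe glue described above might look like this. It
assumes a POSIX system; C<wake_fd>, C<wake_pipe_init> and
C<wake_pipe_readable> are hypothetical helper names, and the event loop
itself is replaced by a zero-timeout C<poll> purely for demonstration.
Neither callback calls into libeio, as required.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int wake_fd[2] = { -1, -1 }; /* [0] = read end, [1] = write end */

/* Create the pipe and make both ends non-blocking. Returns 0 or -1. */
static int
wake_pipe_init (void)
{
  int i;

  if (pipe (wake_fd) < 0)
    return -1;

  for (i = 0; i < 2; ++i)
    {
      int fl = fcntl (wake_fd[i], F_GETFL);

      if (fl < 0 || fcntl (wake_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* want_poll: write a single byte so the read end becomes readable. */
static void
want_poll (void)
{
  char dummy = 0;
  ssize_t res = write (wake_fd[1], &dummy, 1);

  (void)res; /* nothing sensible can be done on failure here */
}

/* done_poll: try to read that byte again, so the read end goes quiet. */
static void
done_poll (void)
{
  char dummy;
  ssize_t res = read (wake_fd[0], &dummy, 1);

  (void)res;
}

/* Demonstration stand-in for the event loop: is the read end readable? */
static int
wake_pipe_readable (void)
{
  struct pollfd pfd;

  pfd.fd = wake_fd[0];
  pfd.events = POLLIN;
  pfd.revents = 0;

  return poll (&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

You would pass the two callbacks as C<eio_init (want_poll, done_poll)> and
call C<eio_poll> from whatever handler your event loop runs when the read
end becomes readable.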
196
197
198 =head1 HIGH LEVEL REQUEST API
199
200 Libeio has both a high-level API, which consists of calling a request
201 function with a callback to be called on completion, and a low-level API
202 where you fill out request structures and submit them.
203
204 This section describes the high-level API.
205
206 =head2 REQUEST SUBMISSION AND RESULT PROCESSING
207
208 You submit a request by calling the relevant C<eio_TYPE> function with the
209 required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
210 (called C<eio_cb> below) and a freely usable C<void *data> argument.
211
212 The return value will either be 0, in case something went really wrong
213 (which can basically only happen on very fatal errors, such as C<malloc>
214 returning 0, which is rather unlikely), or a pointer to the newly-created
215 and submitted C<eio_req *>.
216
217 The callback will be called with an C<eio_req *> which contains the
218 results of the request. The members you can access inside that structure
219 vary from request to request, except for:
220
221 =over 4
222
223 =item C<ssize_t result>
224
225 This contains the result value from the call (usually the same as the
226 syscall of the same name).
227
228 =item C<int errorno>
229
230 This contains the value of C<errno> after the call.
231
232 =item C<void *data>
233
234 The C<void *data> member simply stores the value of the C<data> argument.
235
236 =back
237
238 The return value of the callback is normally C<0>, which tells libeio to
239 continue normally. If a callback returns a nonzero value, libeio will
240 stop processing results (in C<eio_poll>) and will return the value to its
241 caller.
242
243 Memory areas passed to libeio must stay valid as long as a request
244 executes, with the exception of paths, which are being copied
245 internally. Any memory libeio itself allocates will be freed after the
246 finish callback has been called. If you want to manage all memory passed
247 to libeio yourself you can use the low-level API.
248
249 For example, to open a file, you could do this:
250
   static int
   file_open_done (eio_req *req)
   {
     if (req->result < 0)
       {
         /* open() returned -1 */
         errno = req->errorno;
         perror ("open");
       }
     else
       {
         int fd = req->result;
         /* now we have the new fd in fd */
       }

     return 0;
   }

   /* the first three arguments are passed to open(2) */
   /* the remaining are priority, callback and data */
   if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
     abort (); /* something went wrong, we will all die!!! */
273
274 Note that you additionally need to call C<eio_poll> when the C<want_poll>
275 callback indicates that requests are ready to be processed.
276
277 =head2 CANCELLING REQUESTS
278
279 Sometimes the need for a request goes away before the request is
280 finished. In that case, one can cancel the request by a call to
281 C<eio_cancel>:
282
283 =over 4
284
285 =item eio_cancel (eio_req *req)
286
287 Cancel the request (and all its subrequests). If the request is currently
288 executing it might still continue to execute, and in other cases it might
289 still take a while till the request is cancelled.
290
291 Even if cancelled, the finish callback will still be invoked - the
292 callbacks of all cancellable requests need to check whether the request
293 has been cancelled by calling C<EIO_CANCELLED (req)>:
294
   static int
   my_eio_cb (eio_req *req)
   {
     if (EIO_CANCELLED (req))
       return 0;

     /* not cancelled: process req->result as usual */
     return 0;
   }
301
302 In addition, cancelled requests will I<either> have C<< req->result >>
303 set to C<-1> and C<< req->errorno >> to C<ECANCELED>, or I<otherwise> they were
304 successfully executed, despite being cancelled (e.g. when they have
305 already been executed at the time they were cancelled).
306
307 C<EIO_CANCELLED> is still true for requests that have successfully
308 executed, as long as C<eio_cancel> was called on them at some point.
309
310 =back
311
312 =head2 AVAILABLE REQUESTS
313
314 The following request functions are available. I<All> of them return the
315 C<eio_req *> on success and C<0> on failure, and I<all> of them have the
316 same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
317 mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
318 custom data value as C<data>.
319
320 =head3 POSIX API WRAPPERS
321
322 These requests simply wrap the POSIX call of the same name, with the same
323 arguments. If a function is not implemented by the OS and cannot be emulated
324 in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>.
325
326 =over 4
327
328 =item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
329
330 =item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
331
332 =item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
333
334 =item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
335
336 =item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
337
338 =item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
339
340 =item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
341
342 =item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
343
344 =item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
345
346 =item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
347
348 =item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
349
350 =item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
351
352 =item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
353
354 =item eio_close (int fd, int pri, eio_cb cb, void *data)
355
356 =item eio_sync (int pri, eio_cb cb, void *data)
357
358 =item eio_fsync (int fd, int pri, eio_cb cb, void *data)
359
360 =item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
361
362 =item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
363
364 =item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
365
366 =item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
367
368 =item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
369
370 =item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
371
372 These have the same semantics as the syscall of the same name, their
373 return value is available as C<< req->result >> later.
374
375 =item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
376
377 =item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
378
379 These two requests are called C<read> and C<write>, but actually wrap
380 C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
381 libeio uses lseek/read_or_write/lseek and a mutex to serialise the
382 requests, so all these requests run serially and do not disturb each
383 other. However, they still disturb the file offset while they run, so it's
384 not safe to call these functions concurrently with non-libeio functions on
385 the same fd on these systems.
386
387 Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
388 so it is advised not to submit multiple requests on the same fd on this
389 horrible pile of garbage.
390
391 =item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
392
393 Like C<mlockall>, but the flag value constants are called
394 C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>.
395
396 =item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
397
398 Just like msync, except that the flag values are called C<EIO_MS_ASYNC>,
399 C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>.
400
401 =item eio_readlink (const char *path, int pri, eio_cb cb, void *data)
402
403 If successful, the path read by C<readlink(2)> can be accessed via C<<
404 req->ptr2 >> and is I<NOT> null-terminated, with the length specified as
405 C<< req->result >>.
406
   if (req->result >= 0)
     {
       /* make a null-terminated copy of the link target */
       char *target = strndup ((char *)req->ptr2, req->result);

       /* use target here, then release it */
       free (target);
     }
413
414 =item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
415
416 Similar to the realpath libc function, but unlike that one, C<<
417 req->result >> is C<-1> on failure. On success, the result is the length
418 of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is
419 similar to readlink.
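
Since the returned path is not 0-terminated, a callback usually wants a
terminated copy. The helper below is one way to make one.
C<my_result_to_string> is a hypothetical name, and the sketch relies only
on the pointer and length described above (C<ptr2> and C<result>).

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated, 0-terminated copy of the first len bytes */
/* at ptr, or 0 if len is negative (request failed) or malloc fails. */
static char *
my_result_to_string (const void *ptr, long len)
{
  char *str;

  if (len < 0)
    return 0;

  str = malloc ((size_t)len + 1);
  if (!str)
    return 0;

  memcpy (str, ptr, (size_t)len);
  str[len] = 0;

  return str;
}
```

Inside a callback this could be used as C<< my_result_to_string (req->ptr2,
req->result) >>; the caller is responsible for freeing the copy.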
420
421 =item eio_stat (const char *path, int pri, eio_cb cb, void *data)
422
423 =item eio_lstat (const char *path, int pri, eio_cb cb, void *data)
424
425 =item eio_fstat (int fd, int pri, eio_cb cb, void *data)
426
427 Stats a file - if C<< req->result >> indicates success, then you can
428 access the C<struct stat>-like structure via C<< req->ptr2 >>:
429
   EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
431
432 =item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
433
434 =item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data)
435
436 Stats a filesystem - if C<< req->result >> indicates success, then you can
437 access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
438
   EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
440
441 =back
442
443 =head3 READING DIRECTORIES
444
445 Reading directories sounds simple, but can be rather demanding, especially
446 if you want to do stuff such as traversing a directory hierarchy or
447 processing all files in a directory. Libeio can assist these complex tasks
448 with its C<eio_readdir> call.
449
450 =over 4
451
452 =item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
453
454 This is a very complex call. It basically reads through a whole directory
455 (via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
456 the names or an array of C<struct eio_dirent>, depending on the C<flags>
457 argument.
458
459 The C<< req->result >> indicates either the number of files found, or
460 C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>,
461 and C<struct eio_dirent> entries, if requested by C<flags>, can be found via C<<
462 req->ptr1 >>.
463
464 Here is an example that prints all the names:
465
   int i;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       printf ("name #%d: %s\n", i, names);

       /* move to next name */
       names += strlen (names) + 1;
     }
476
477 Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
478
479 C<flags> can be any combination of:
480
481 =over 4
482
483 =item EIO_READDIR_DENTS
484
485 If this flag is specified, then, in addition to the names in C<ptr2>,
486 also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
487 eio_dirent> looks like this:
488
   struct eio_dirent
   {
     int nameofs;            /* offset of null-terminated name string in (char *)req->ptr2 */
     unsigned short namelen; /* size of filename without trailing 0 */
     unsigned char type;     /* one of EIO_DT_* */
     signed char score;      /* internal use */
     ino_t inode;            /* the inode number, if available, otherwise unspecified */
   };
497
498 The only members you normally would access are C<nameofs>, which is the
499 byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
500
501 C<type> can be one of:
502
503 C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
504 the name yourself if you need to know,
505 one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
506 C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
507 or some OS-specific type (currently
508 C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
509 C<EIO_DT_NAM> - xenix special named file,
510 C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
511 C<EIO_DT_NWK> - HP-UX network special,
512 C<EIO_DT_CMP> - VxFS compressed,
513 C<EIO_DT_DOOR> - solaris door, or
514 C<EIO_DT_WHT>).
515
516 This example prints all names and their type:
517
   int i;
   struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       struct eio_dirent *ent = ents + i;
       char *name = names + ent->nameofs;

       printf ("name #%d: %s (type %d)\n", i, name, ent->type);
     }
529
530 =item EIO_READDIR_DIRS_FIRST
531
532 When this flag is specified, then the names will be returned in an order
533 where likely directories come first, in optimal C<stat> order. This is
534 useful when you need to quickly find directories, or you want to find all
535 directories while avoiding having to stat() each entry.
536
537 If the system returns type information in readdir, then this is used
538 to find directories directly. Otherwise, likely directories are names
539 beginning with ".", or otherwise names with no dots, of which the
540 shortest names are tried first.
541
542 =item EIO_READDIR_STAT_ORDER
543
544 When this flag is specified, then the names will be returned in an order
545 suitable for stat()'ing each one. That is, when you plan to stat()
546 all files in the given directory, then the returned order will likely
547 be fastest.
548
549 If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
550 likely directories come first, resulting in a less optimal stat order.
551
552 =item EIO_READDIR_FOUND_UNKNOWN
553
554 This flag should not be specified when calling C<eio_readdir>. Instead,
555 it is being set by C<eio_readdir> (you can access the C<flags> via C<<
556 req->int1 >>) when any of the C<type>s found were C<EIO_DT_UNKNOWN>. The
557 absence of this flag therefore indicates that all C<type>s are known,
558 which can be used to speed up some algorithms.
559
560 A typical use case would be to identify all subdirectories within a
561 directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
562 then this flag is I<NOT> set, then all the entries at the beginning of the
563 returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
564 should start C<stat()>'ing the entries starting at the beginning of the
565 array, stopping as soon as you found all directories (the count can be
566 deduced by the link count of the directory).
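
The simple case of this use case (flag I<not> set, so all types are known)
can be sketched as a scan over the leading entries. Everything named
C<my_*> or C<MY_*> below is a stand-in invented for this example so that it
compiles on its own: in real code you would use C<struct eio_dirent> and
C<EIO_DT_DIR> from F<eio.h>, and the numeric values shown here are
assumptions.

```c
#include <stddef.h>

/* Local stand-ins for EIO_DT_UNKNOWN and EIO_DT_DIR (values assumed). */
enum { MY_DT_UNKNOWN = 0, MY_DT_DIR = 4 };

/* Reduced stand-in for struct eio_dirent, keeping only what is needed. */
struct my_dirent
{
  int nameofs;
  unsigned char type;
};

/* Count the directories at the start of a DIRS_FIRST-ordered array. */
/* Only meaningful when no entry has an unknown type. */
static size_t
my_count_leading_dirs (const struct my_dirent *ents, size_t n)
{
  size_t i = 0;

  while (i < n && ents[i].type == MY_DT_DIR)
    ++i;

  return i;
}
```

When the flag I<is> set, the same loop would have to C<stat> each entry
instead of trusting C<type>.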
567
568 =back
569
570 =back
571
572 =head3 OS-SPECIFIC CALL WRAPPERS
573
574 These wrap OS-specific calls (usually Linux ones), and might or might not
575 be emulated on other operating systems. Calls that are not emulated will
576 return C<-1> and set C<errno> to C<ENOSYS>.
577
578 =over 4
579
580 =item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
581
582 Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
583 libeio supports and will use similar calls on FreeBSD, HP-UX, Solaris and
584 Darwin.
585
586 If the OS doesn't support some sendfile-like call, or the call fails,
587 indicating a lack of support for the given file descriptor type (for example,
588 Linux's sendfile might not support file to file copies), then libeio will
589 emulate the call in userspace, so there are almost no limitations on its
590 use.
591
592 =item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
593
594 Calls C<readahead(2)>. If the syscall is missing, then the call is
595 emulated by simply reading the data (currently in 64kiB chunks).
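
The emulation can be pictured roughly as follows. This is a sketch of the
idea only, not libeio's actual code: C<my_readahead_emulation> and
C<MY_CHUNK> are made-up names, and the use of C<pread> is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

#define MY_CHUNK 65536 /* 64kiB, as mentioned in the text */

/* Read (and discard) length bytes starting at offset, chunk by chunk, */
/* so the data ends up in the OS cache. Returns 0 on success or end of */
/* file, -1 on error (with errno set by pread). */
static int
my_readahead_emulation (int fd, off_t offset, size_t length)
{
  char buf[MY_CHUNK];

  while (length > 0)
    {
      size_t want = length < sizeof (buf) ? length : sizeof (buf);
      ssize_t got = pread (fd, buf, want, offset);

      if (got < 0)
        return -1;

      if (got == 0)
        break; /* end of file reached early */

      offset += got;
      length -= (size_t)got;
    }

  return 0;
}
```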
596
597 =item eio_syncfs (int fd, int pri, eio_cb cb, void *data)
598
599 Calls Linux' C<syncfs> syscall, if available. Returns C<-1> and sets
600 C<errno> to C<ENOSYS> if the call is missing I<but still calls sync()>,
601 if the C<fd> is C<< >= 0 >>, so you can probe for the availability of the
602 syscall with a negative C<fd> argument and checking for C<-1/ENOSYS>.
603
604 =item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
605
606 Calls C<sync_file_range>. If the syscall is missing, then this is the same
607 as calling C<fdatasync>.
608
609 Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
610 C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
611
612 =item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
613
614 Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
615 missing, then it returns failure and sets C<errno> to C<ENOSYS>.
616
617 The C<mode> argument can be C<0> (for behaviour similar to
618 C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
619 of the file unchanged (but still preallocates space beyond end of file).
620
621 =back
622
623 =head3 LIBEIO-SPECIFIC REQUESTS
624
625 These requests are specific to libeio and do not correspond to any OS call.
626
627 =over 4
628
629 =item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
630
631 Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
632 memory area, page-wise, that is, it reads (or reads and writes back) the
633 first octet of every page that spans the memory area.
634
635 This can be used to page in some mmapped file, or dirty some pages. Note
636 that dirtying is an unlocked read-write access, so races can ensue when
637 some other thread modifies the data stored in that memory area.
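
To make "page-wise" concrete, here is a small sketch of such a touch loop.
It is an illustration under assumptions, not libeio's implementation:
C<my_touch_pages> is a hypothetical name, the page size is taken from
C<sysconf>, and the sketch touches one octet per page I<inside> the given
area rather than rounding the start address down.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Touch one octet in every page overlapped by [addr, addr + length). */
/* With modify != 0 the octet is written back, which dirties the page. */
/* Returns the number of pages touched. */
static size_t
my_touch_pages (void *addr, size_t length, int modify)
{
  size_t page = (size_t)sysconf (_SC_PAGESIZE);
  volatile char *p = addr;
  volatile char *end = p + length;
  size_t touched = 0;

  while (p < end)
    {
      size_t step;
      char c = *p; /* the read pages the memory in */

      if (modify)
        *p = c;    /* unlocked read-write access, as warned above */

      ++touched;

      /* distance to the start of the next page */
      step = page - ((uintptr_t)p & (page - 1));

      if (step >= (size_t)(end - p))
        break;

      p += step;
    }

  return touched;
}
```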
638
639 =item eio_custom (void (*execute)(eio_req *), int pri, eio_cb cb, void *data)
640
641 Executes a custom request, i.e., a user-specified callback.
642
643 The callback gets the C<eio_req *> as parameter and is expected to read
644 and modify any request-specific members. Specifically, it should set C<<
645 req->result >> to the result value, just like other requests.
646
647 Here is an example that simply calls C<open>, like C<eio_open>, but it
648 uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
649 you want to pass more/other parameters, you either need to pass some
650 struct or so via C<data> or provide your own wrapper using the low-level
651 API.
652
   static int
   my_open_done (eio_req *req)
   {
     int fd = req->result; /* the new fd, or -1 if open failed */

     return 0;
   }

   static void
   my_open (eio_req *req)
   {
     /* data carries the filename that was passed to eio_custom */
     req->result = open ((const char *)req->data, O_RDONLY);
   }

   eio_custom (my_open, 0, my_open_done, "/etc/passwd");
668
669 =item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
670
671 This is a request that takes C<delay> seconds to execute, but otherwise
672 does nothing - it simply puts one of the worker threads to sleep for this
673 long.
674
675 This request can be used to artificially increase load, e.g. for debugging
676 or benchmarking reasons.
677
678 =item eio_nop (int pri, eio_cb cb, void *data)
679
680 This request does nothing, except go through the whole request cycle. This
681 can be used to measure latency or in some cases to simplify code, but is
682 not really of much use.
683
684 =back
685
686 =head3 GROUPING AND LIMITING REQUESTS
687
688 There is one more rather special request, C<eio_grp>. It is a very special
689 eio request: instead of doing something, it is a container for other eio
690 requests.
691
692 There are two primary use cases for this: a) bundle many requests into a
693 single, composite, request with a definite callback and the ability to
694 cancel the whole request with its subrequests and b) limiting the number
695 of "active" requests.
696
697 Further below you will find more discussion of these topics - first
698 follows the reference section detailing the request generator and other
699 methods.
700
701 =over 4
702
703 =item eio_req *grp = eio_grp (eio_cb cb, void *data)
704
705 Creates, submits and returns a group request. Note that it doesn't have a
706 priority, unlike all other requests.
707
708 =item eio_grp_add (eio_req *grp, eio_req *req)
709
710 Adds a request to the request group.
711
712 =item eio_grp_cancel (eio_req *grp)
713
714 Cancels all requests I<in> the group, but I<not> the group request
715 itself. You can cancel the group request I<and> all subrequests via a
716 normal C<eio_cancel> call.
717
718 =back
719
720 =head4 GROUP REQUEST LIFETIME
721
722 Left alone, a group request will instantly move to the pending state and
723 will be finished at the next call of C<eio_poll>.
724
725 The usefulness stems from the fact that, if a subrequest is added to a
726 group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
727 will not finish until all the subrequests have finished.
728
729 So the usage cycle of a group request is like this: after it is created,
730 you normally instantly add a subrequest. If none is added, the group
731 request will finish on its own. As long as subrequests are added before
732 the group request is finished, it will be kept from finishing; that is, the
733 callbacks of any subrequests can, in turn, add more requests to the group,
734 and as long as any requests are active, the group request itself will not
735 finish.
736
737 =head4 CREATING COMPOSITE REQUESTS
738
739 Imagine you wanted to create an C<eio_load> request that opens a file,
740 reads it and closes it. This means it has to execute at least three eio
741 requests, but for various reasons it might be nice if that request looked
742 like any other eio request.
743
744 This can be done with groups:
745
746 =over 4
747
748 =item 1) create the request object
749
750 Create a group that contains all further requests. This is the request you
751 can return as "the load request".
752
753 =item 2) open the file, maybe
754
755 Next, open the file with C<eio_open> and add the request to the group
756 request and you are finished setting up the request.
757
758 If, for some reason, you cannot C<eio_open> (path is a null ptr?) you
759 can set C<< grp->result >> to C<-1> to signal an error and let the group
760 request finish on its own.
761
762 =item 3) open callback adds more requests
763
764 In the open callback, if the open was not successful, copy C<<
765 req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
766 C<-1> to signal an error.
767
768 Otherwise, malloc some memory or so and issue a read request, adding the
769 read request to the group.
770
771 =item 4) continue issuing requests till finished
772
773 In the read callback, check for errors and possibly continue with
774 C<eio_close> or any other eio request in the same way.
775
776 As soon as no new requests are added the group request will finish. Make
777 sure you I<always> set C<< grp->result >> to some sensible value.
778
779 =back
780
781 =head4 REQUEST LIMITING
782
783
784 #TODO
785
786 void eio_grp_limit (eio_req *grp, int limit);
787
788
789 =back
790
791
792 =head1 LOW LEVEL REQUEST API
793
794 #TODO
795
796
797 =head1 ANATOMY AND LIFETIME OF AN EIO REQUEST
798
799 A request is represented by a structure of type C<eio_req>. To initialise
800 it, clear it to all zero bytes:
801
   eio_req req;

   memset (&req, 0, sizeof (req));
805
806 A more common way to initialise a new C<eio_req> is to use C<calloc>:
807
   eio_req *req = calloc (1, sizeof (*req));
809
810 In either case, libeio neither allocates, initialises nor frees the
811 C<eio_req> structure for you - it merely uses it.
812
813 zero
814
815 #TODO
816
817 =head2 CONFIGURATION
818
819 The functions in this section can sometimes be useful, but the default
820 configuration will do in most case, so you should skip this section on
821 first reading.
822
823 =over 4
824
825 =item eio_set_max_poll_time (eio_tstamp nseconds)
826
827 This causes C<eio_poll ()> to return after it has detected that it was
828 running for C<nseconds> seconds or longer (this number can be fractional).
829
830 This can be used to limit the amount of time spent handling eio requests,
831 for example, in interactive programs, you might want to limit this time to
832 C<0.01> seconds or so.
833
834 Note that:
835
836 =over 4
837
838 =item a) libeio doesn't know how long your request callbacks take, so the
839 time spent in C<eio_poll> is up to one callback invocation longer than
840 this interval.
841
842 =item b) this is implemented by calling C<gettimeofday> after each
843 request, which can be costly.
844
845 =item c) at least one request will be handled.
846
847 =back
848
849 =item eio_set_max_poll_reqs (unsigned int nreqs)
850
851 When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
852 C<nreqs> requests per invocation. This is a less costly way to limit the
853 amount of work done by C<eio_poll> than setting a time limit.
854
855 If you know your callbacks are generally fast, you could use this to
856 encourage interactiveness in your programs by setting it to C<10>, C<100>
857 or even C<1000>.
858
859 =item eio_set_min_parallel (unsigned int nthreads)
860
861 Make sure libeio can handle at least this many requests in parallel. It
862 might be able to handle more.
863
864 =item eio_set_max_parallel (unsigned int nthreads)
865
866 Set the maximum number of threads that libeio will spawn.
867
868 =item eio_set_max_idle (unsigned int nthreads)
869
870 Libeio uses threads internally to handle most requests, and will start and stop threads on demand.
871
872 This call can be used to limit the number of idle threads (threads without
873 work to do): libeio will keep some threads idle in preparation for more
874 requests, but never more than C<nthreads> threads.
875
876 In addition to this, libeio will also stop threads when they are idle for
877 a few seconds, regardless of this setting.
878
879 =item unsigned int eio_nthreads ()
880
881 Return the number of worker threads currently running.
882
883 =item unsigned int eio_nreqs ()
884
885 Return the number of requests currently handled by libeio. This is the
886 total number of requests that have been submitted to libeio, but not yet
887 destroyed.
888
889 =item unsigned int eio_nready ()
890
891 Returns the number of ready requests, i.e. requests that have been
892 submitted but have not yet entered the execution phase.
893
894 =item unsigned int eio_npending ()
895
896 Returns the number of pending requests, i.e. requests that have been
897 executed and have results, but have not been finished yet by a call to
898 C<eio_poll>.
899
900 =back
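
The interaction of the two C<eio_poll> limits described above can be
sketched with a small stand-alone loop. This is not libeio's actual
implementation: C<poll_limited>, C<tstamp>, C<demo_now> and
C<demo_handle> are hypothetical names, and the clock is simulated (each
handled request "takes" half a second) so that the behaviour is
deterministic.

```c
#include <assert.h>

typedef double tstamp; /* seconds, in the spirit of eio_tstamp */

/* Simulated clock, for illustration only: every handled request
   advances the time by 0.5 seconds. */
static tstamp demo_time;
static tstamp demo_now (void) { return demo_time; }
static void demo_handle (void) { demo_time += 0.5; }

/* Handle up to npending requests. Stops early once maxreqs requests were
   handled (if maxreqs is non-zero) or once maxtime seconds have elapsed
   (if maxtime is positive). Returns the number of requests handled. */
static unsigned int
poll_limited (unsigned int npending, unsigned int maxreqs, tstamp maxtime,
              tstamp (*now) (void), void (*handle) (void))
{
  unsigned int handled = 0;
  tstamp start;

  assert (now && handle);
  start = maxtime > 0. ? now () : 0.;

  while (handled < npending)
    {
      handle (); /* runs before any limit check: at least one request */
      ++handled;

      if (maxreqs && handled >= maxreqs)
        break; /* cheap limit: a simple counter comparison */

      /* the clock is only read after a callback has returned, so a slow
         callback overshoots the limit by its own running time */
      if (maxtime > 0. && now () - start >= maxtime)
        break;
    }

  return handled;
}
```

With the simulated clock, a time limit of C<0.1> seconds still handles
one request (and overshoots to C<0.5> seconds), which corresponds to
notes a) and c) above, while a request limit needs no clock reads at all.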
901
902 =head1 EMBEDDING
903
904 Libeio can be embedded directly into programs. This functionality is not
905 documented and not (yet) officially supported.
906
907 Note that, when including C<libeio.m4>, you are responsible for defining
908 the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).
909
910 If you need to know how, check the C<IO::AIO> perl module, which does
911 exactly that.
912
913
914 =head1 COMPILETIME CONFIGURATION
915
916 These symbols, if used, must be defined when compiling F<eio.c>.
917
918 =over 4
919
920 =item EIO_STACKSIZE
921
922 This symbol governs the stack size for each eio thread. Libeio itself
923 was written to use very little stack space, but when using C<EIO_CUSTOM>
924 requests, you might want to increase this.
925
926 If this symbol is undefined (the default) then libeio will use its default
927 stack size (C<sizeof (void *) * 4096> currently). If it is defined, but
928 C<0>, then the default operating system stack size will be used. In all
929 other cases, the value must be an expression that evaluates to the desired
930 stack size.
931
932 =back
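
The three cases for C<EIO_STACKSIZE> can be illustrated with a few lines
of preprocessor logic. This is a sketch of the rules as described above,
not a copy of what F<eio.c> does; C<STACKSIZE_CHOSEN> and
C<worker_stacksize> are hypothetical names, and the sanity check is an
assumption of this example only.

```c
#include <assert.h>
#include <stddef.h>

/* EIO_STACKSIZE would normally be given on the compiler command line,
   for example -DEIO_STACKSIZE=262144 or -DEIO_STACKSIZE=0. */
#ifndef EIO_STACKSIZE
# define STACKSIZE_CHOSEN (sizeof (void *) * 4096) /* documented default */
#else
# define STACKSIZE_CHOSEN (EIO_STACKSIZE)          /* user-supplied value */
#endif

/* Returns the stack size to request for a worker thread, or 0 meaning
   "do not request anything, keep the operating system default". */
static size_t
worker_stacksize (void)
{
  size_t size = STACKSIZE_CHOSEN;

  /* assumption of this sketch: a non-zero request below 4096 bytes is
     treated as a configuration mistake */
  assert (size == 0 || size >= 4096);

  return size;
}
```

With the symbol undefined, this yields 32768 bytes on a platform with
8-byte pointers and 16384 bytes on one with 4-byte pointers.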
933
934
935 =head1 PORTABILITY REQUIREMENTS
936
937 In addition to a working ISO-C implementation, libeio relies on a few
938 additional extensions:
939
940 =over 4
941
942 =item POSIX threads
943
944 To be portable, this module uses threads; specifically, the POSIX threads
945 library must be available (and working, which partially excludes many xBSD
946 systems, where C<fork ()> is buggy).
947
948 =item POSIX-compatible filesystem API
949
950 This is actually a harder portability requirement: The libeio API is quite
951 demanding regarding POSIX API calls (symlinks, user/group management
952 etc.).
953
954 =item C<double> must hold a time value in seconds with enough accuracy
955
956 The type C<double> is used to represent timestamps. It is required to
957 have at least 51 bits of mantissa (and 9 bits of exponent), which is good
958 enough until at least the year 4000. This requirement is fulfilled by
959 implementations implementing IEEE 754 (basically all existing ones).
960
961 =back
962
963 If you know of other additional requirements, drop me a note.
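
The C<double> requirement above can be checked with standard C alone.
The following sketch assumes a binary floating point format
(C<FLT_RADIX == 2>) and approximates "the year 4000" as 2030 average
Gregorian years after the epoch; C<seconds_until_year_4000> and
C<tstamp_resolves> are hypothetical helper names, not part of libeio.

```c
#include <assert.h>
#include <float.h>

/* Compile-time check: the array size is negative, and compilation
   fails, if double has fewer than 51 mantissa bits. */
typedef char tstamp_mantissa_ok[DBL_MANT_DIG >= 51 ? 1 : -1];

/* Roughly the number of seconds between 1970 and 4000, using an
   average Gregorian year of 31556952 seconds. */
static double
seconds_until_year_4000 (void)
{
  return 2030. * 31556952.;
}

/* Returns non-zero if a double can tell apart two timestamps that are
   step seconds apart at time t. */
static int
tstamp_resolves (double t, double step)
{
  /* volatile forces the sum to be rounded to double precision */
  volatile double next = t + step;

  assert (step > 0.);

  return next != t;
}
```

On an IEEE 754 system, C<tstamp_resolves (seconds_until_year_4000 (),
0.001)> is true, that is, timestamps near the year 4000 still have
better than millisecond resolution.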
964
965
966 =head1 AUTHOR
967
968 Marc Lehmann <libeio@schmorp.de>.
969