/cvs/libeio/eio.pod
Revision: 1.36
Committed: Sun Jan 24 16:36:20 2016 UTC (8 years, 7 months ago) by root
Branch: MAIN
CVS Tags: rel-4_4, rel-4_5, rel-4_6, rel-4_7, rel-4_81, rel-4_80, rel-4_52, rel-4_53, rel-4_51, rel-4_78, rel-4_79, rel-4_54, rel-4_74, rel-4_75, rel-4_76, rel-4_77, rel-4_71, rel-4_72, rel-4_73, rel-4_34, HEAD
Changes since 1.35: +7 -2 lines
Log Message:
*** empty log message ***

1 =head1 NAME
2
3 libeio - truly asynchronous POSIX I/O
4
5 =head1 SYNOPSIS
6
7 #include <eio.h>
8
9 =head1 DESCRIPTION
10
11 The newest version of this document is also available as an html-formatted
12 web page you might find easier to navigate when reading it for the first
13 time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
14
15 Note that this library is a by-product of the C<IO::AIO> perl
16 module, and many of the subtler points regarding requests lifetime
17 and so on are only documented in its documentation at the
18 moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
19
20 =head2 FEATURES
21
22 This library provides fully asynchronous versions of most POSIX functions
23 dealing with I/O. Unlike most asynchronous libraries, this not only
24 includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
25 similar functions, as well as less commonly used ones such as C<mknod>, C<futime>
26 or C<readlink>.
27
28 It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
29 FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with
30 emulation elsewhere).
31
32 The goal is to enable you to write fully non-blocking programs. For
33 example, in a game server, you would not want to freeze for a few seconds
34 just because the server is running a backup and you happen to call
35 C<readdir>.
36
37 =head2 TIME REPRESENTATION
38
39 Libeio represents time as a single floating point number, representing the
40 (fractional) number of seconds since the (POSIX) epoch (somewhere near
41 the beginning of 1970, details are complicated, don't ask). This type is
42 called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
43 better), so you can freely use C<double> yourself.
44
45 Unlike the name component C<stamp> might indicate, it is also used for
46 time differences throughout libeio.
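
As a rough illustration of this representation, the following sketch converts between C<struct timeval> and fractional seconds. The C<tstamp> typedef is only a stand-in for C<eio_tstamp> so that the code compiles without libeio, and the helper names are hypothetical, not part of the library:

```c
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>

/* stand-in for eio_tstamp (assumption: plain double, as the text allows) */
typedef double tstamp;

/* convert a struct timeval into fractional seconds since the epoch */
static tstamp
tstamp_from_timeval (const struct timeval *tv)
{
  return (tstamp)tv->tv_sec + (tstamp)tv->tv_usec / 1e6;
}

/* convert fractional seconds back (non-negative values only) */
static struct timeval
tstamp_to_timeval (tstamp ts)
{
  struct timeval tv;

  tv.tv_sec  = (time_t)ts;
  tv.tv_usec = (suseconds_t)((ts - (tstamp)tv.tv_sec) * 1e6 + 0.5);

  /* rounding may carry into the seconds field */
  if (tv.tv_usec >= 1000000)
    {
      tv.tv_sec  += 1;
      tv.tv_usec -= 1000000;
    }

  return tv;
}

/* time differences use the very same type */
static tstamp
tstamp_diff (tstamp later, tstamp earlier)
{
  return later - earlier;
}
```

A value produced this way could be passed wherever an C<eio_tstamp> is expected, for example as the C<atime> or C<mtime> argument of C<eio_utime>.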
47
48 =head2 FORK SUPPORT
49
50 Usage of pthreads in a program changes the semantics of fork
51 considerably. Specifically, only async-safe functions can be called after
52 fork. Libeio uses pthreads, so this applies, and makes using fork hard for
53 anything but relatively simple fork + exec uses.
54
55 This library only works in the process that initialised it: Forking is
56 fully supported, but using libeio in any other process than the one that
57 called C<eio_init> is not.
58
59 You might get around this by not I<using> libeio before (or after) forking in
60 the parent, and using it in the child afterwards. You could also try to
61 call the C<eio_init> function again in the child, which will brutally
62 reinitialise all data structures, which isn't POSIX conformant, but
63 typically works.
64
65 Otherwise, the only recommendation you should follow is: treat fork code
66 the same way you treat signal handlers, and only ever call C<eio_init> in
67 the process that uses it, and only once ever.
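
The "simple fork + exec" pattern mentioned above could look like the following sketch. It uses only plain POSIX calls; the helper name C<spawn_and_wait> is hypothetical. The point is that the child calls nothing but async-signal-safe functions (C<execv>, C<_exit>) and never touches libeio:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork, exec argv[0] in the child, and return the child's exit status */
/* returns -1 if fork/waitpid failed or the child did not exit normally */
static int
spawn_and_wait (char *const argv[])
{
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* child: treat this like a signal handler, no libeio calls here */
      execv (argv[0], argv);
      _exit (127); /* only reached when exec failed */
    }

  /* parent: libeio state is untouched and remains usable */
  int status;

  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Because the child replaces itself right away, none of the thread state inherited from the parent is ever used.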
68
69 =head1 INITIALISATION/INTEGRATION
70
71 Before you can call any eio functions you first have to initialise the
72 library. The library integrates into any event loop, but can also be used
73 without one, including in polling mode.
74
75 You have to provide the necessary glue yourself, however.
76
77 =over 4
78
79 =item int eio_init (void (*want_poll)(void), void (*done_poll)(void))
80
81 This function initialises the library. On success it returns C<0>, on
82 failure it returns C<-1> and sets C<errno> appropriately.
83
84 It accepts two function pointers specifying callbacks as arguments, both of
85 which can be C<0>, in which case the callback isn't called.
86
87 There is currently no way to change these callbacks later, or to
88 "uninitialise" the library again.
89
90 =item want_poll callback
91
92 The C<want_poll> callback is invoked whenever libeio wants attention (i.e.
93 it wants to be polled by calling C<eio_poll>). It is "edge-triggered",
94 that is, it will only be called once when eio wants attention, until all
95 pending requests have been handled.
96
97 This callback is called while locks are being held, so I<you must
98 not call any libeio functions inside this callback>. That includes
99 C<eio_poll>. What you should do is notify some other thread, or wake up
100 your event loop, and then call C<eio_poll>.
101
102 =item done_poll callback
103
104 This callback is invoked when libeio detects that all pending requests
105 have been handled. It is "edge-triggered", that is, it will only be
106 called once after C<want_poll>. To put it differently, C<want_poll> and
107 C<done_poll> are invoked in pairs: after C<want_poll> you have to call
108 C<eio_poll ()> until either C<eio_poll> indicates that everything has been
109 handled or C<done_poll> has been called, which signals the same - only one
110 method is needed.
111
112 Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
113 have been called again, so watch out for races in your code.
114
115 It is quite common to have an empty C<done_poll> callback and only use
116 the return value from C<eio_poll>, or, when C<eio_poll> is configured to
117 handle all outstanding replies, it's enough to call C<eio_poll> once.
118
119 As with C<want_poll>, this callback is called while locks are being held,
120 so you I<must not call any libeio functions from within this callback>.
121
122 =item int eio_poll ()
123
124 This function has to be called whenever there are pending requests that
125 need finishing. You usually call this after C<want_poll> has indicated
126 that you should do so, but you can also call this function regularly to
127 poll for new results.
128
129 If any request invocation returns a non-zero value, then C<eio_poll ()>
130 immediately returns with that value as return value.
131
132 Otherwise, if all requests could be handled, it returns C<0>. If for some
133 reason not all requests have been handled, i.e. some are still pending, it
134 returns C<-1>.
135
136 =back
137
138 For libev, you would typically use an C<ev_async> watcher: the
139 C<want_poll> callback would invoke C<ev_async_send> to wake up the event
140 loop. Inside the callback set for the watcher, one would call C<eio_poll
141 ()>.
142
143 If C<eio_poll ()> is configured to not handle all results in one go
144 (i.e. it returns C<-1>) then you should start an idle watcher that calls
145 C<eio_poll> until it returns something C<!= -1>.
146
147 A full-featured connector between libeio and libev would look as follows
148 (if C<eio_poll> is handling all requests, it can of course be simplified a
149 lot by removing the idle watcher logic):
150
   #include <ev.h>
   #include <eio.h>

   static struct ev_loop *loop;
   static ev_idle repeat_watcher;
   static ev_async ready_watcher;

   /* idle watcher callback, only used when eio_poll */
   /* didn't handle all results in one call */
   static void
   repeat (EV_P_ ev_idle *w, int revents)
   {
     if (eio_poll () != -1)
       ev_idle_stop (EV_A_ w);
   }

   /* eio has some results, process them */
   static void
   ready (EV_P_ ev_async *w, int revents)
   {
     if (eio_poll () == -1)
       ev_idle_start (EV_A_ &repeat_watcher);
   }

   /* wake up the event loop */
   static void
   want_poll (void)
   {
     ev_async_send (loop, &ready_watcher);
   }

   void
   my_init_eio (void)
   {
     loop = EV_DEFAULT;

     ev_idle_init (&repeat_watcher, repeat);
     ev_async_init (&ready_watcher, ready);
     ev_async_start (loop, &ready_watcher);

     eio_init (want_poll, 0);
   }
190
191 For most other event loops, you would typically use a pipe - the event
192 loop should be told to wait for read readiness on the read end. In
193 C<want_poll> you would write a single byte, in C<done_poll> you would try
194 to read that byte, and in the callback for the read end, you would call
195 C<eio_poll>.
196
197 You don't have to take special care in the case C<eio_poll> doesn't handle
198 all requests, as the done callback will not be invoked, so the event loop
199 will still signal readiness for the pipe until I<all> results have been
200 processed.
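
The pipe-based glue described above might be sketched as follows. Only POSIX calls are used, so the code compiles without libeio; all names (C<wake_pipe>, C<my_want_poll>, C<my_done_poll>, C<wake_pipe_readable>) are hypothetical. Both pipe ends are made non-blocking here, which is an assumption of this sketch so that neither callback can stall:

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };

/* create the pipe and make both ends non-blocking, 0 on success */
static int
wake_pipe_init (void)
{
  if (pipe (wake_pipe) < 0)
    return -1;

  for (int i = 0; i < 2; ++i)
    {
      int fl = fcntl (wake_pipe[i], F_GETFL);

      if (fl < 0 || fcntl (wake_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* want_poll: write one byte, this calls no libeio functions */
static void
my_want_poll (void)
{
  char dummy = 0;

  while (write (wake_pipe[1], &dummy, 1) < 0 && errno == EINTR)
    ;
}

/* done_poll: try to read that byte again */
static void
my_done_poll (void)
{
  char dummy;

  while (read (wake_pipe[0], &dummy, 1) < 0 && errno == EINTR)
    ;
}

/* what the event loop would observe: is the read end readable? */
static int
wake_pipe_readable (void)
{
  struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };

  return poll (&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}
```

In a real program you would call C<wake_pipe_init>, pass the two callbacks to C<eio_init (my_want_poll, my_done_poll)>, register C<wake_pipe[0]> for read readiness with your event loop, and call C<eio_poll> from the readiness callback.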
201
202
203 =head1 HIGH LEVEL REQUEST API
204
205 Libeio has both a high-level API, which consists of calling a request
206 function with a callback to be called on completion, and a low-level API
207 where you fill out request structures and submit them.
208
209 This section describes the high-level API.
210
211 =head2 REQUEST SUBMISSION AND RESULT PROCESSING
212
213 You submit a request by calling the relevant C<eio_TYPE> function with the
214 required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
215 (called C<eio_cb> below) and a freely usable C<void *data> argument.
216
217 The return value will either be 0, in case something went really wrong
218 (which can basically only happen on very fatal errors, such as C<malloc>
219 returning 0, which is rather unlikely), or a pointer to the newly-created
220 and submitted C<eio_req *>.
221
222 The callback will be called with an C<eio_req *> which contains the
223 results of the request. The members you can access inside that structure
224 vary from request to request, except for:
225
226 =over 4
227
228 =item C<ssize_t result>
229
230 This contains the result value from the call (usually the same as the
231 syscall of the same name).
232
233 =item C<int errorno>
234
235 This contains the value of C<errno> after the call.
236
237 =item C<void *data>
238
239 The C<void *data> member simply stores the value of the C<data> argument.
240
241 =back
242
243 Members not explicitly described as accessible must not be
244 accessed. Specifically, there is no guarantee that any members will still
245 have the value they had when the request was submitted.
246
247 The return value of the callback is normally C<0>, which tells libeio to
248 continue normally. If a callback returns a nonzero value, libeio will
249 stop processing results (in C<eio_poll>) and will return the value to its
250 caller.
251
252 Memory areas passed to libeio wrappers must stay valid as long as a
253 request executes, with the exception of paths, which are being copied
254 internally. Any memory libeio itself allocates will be freed after the
255 finish callback has been called. If you want to manage all memory passed
256 to libeio yourself you can use the low-level API.
257
258 For example, to open a file, you could do this:
259
   static int
   file_open_done (eio_req *req)
   {
     if (req->result < 0)
       {
         /* open() returned -1 */
         errno = req->errorno;
         perror ("open");
       }
     else
       {
         int fd = req->result;
         /* now we have the new fd in fd */
       }

     return 0;
   }

   /* the first three arguments are passed to open(2) */
   /* the remaining are priority, callback and data */
   if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
     abort (); /* something went wrong, we will all die!!! */
282
283 Note that you additionally need to call C<eio_poll> when the C<want_poll>
284 callback indicates that requests are ready to be processed.
285
286 =head2 CANCELLING REQUESTS
287
288 Sometimes the need for a request goes away before the request is
289 finished. In that case, one can cancel the request by a call to
290 C<eio_cancel>:
291
292 =over 4
293
294 =item eio_cancel (eio_req *req)
295
296 Cancel the request (and all its subrequests). If the request is currently
297 executing it might still continue to execute, and in other cases it might
298 still take a while till the request is cancelled.
299
300 When cancelled, the finish callback will not be invoked.
301
302 C<EIO_CANCELLED> is still true for requests that have successfully
303 executed, as long as C<eio_cancel> was called on them at some point.
304
305 =back
306
307 =head2 AVAILABLE REQUESTS
308
309 The following request functions are available. I<All> of them return the
310 C<eio_req *> on success and C<0> on failure, and I<all> of them have the
311 same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
312 mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
313 custom data value as C<data>.
314
315 =head3 POSIX API WRAPPERS
316
317 These requests simply wrap the POSIX call of the same name, with the same
318 arguments. If a function is not implemented by the OS and cannot be emulated
319 in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>.
320
321 =over 4
322
323 =item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
324
325 =item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
326
327 =item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
328
329 =item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
330
331 =item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
332
333 =item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
334
335 =item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
336
337 =item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
338
339 =item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
340
341 =item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
342
343 =item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
344
345 =item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
346
347 =item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
348
349 =item eio_close (int fd, int pri, eio_cb cb, void *data)
350
351 =item eio_sync (int pri, eio_cb cb, void *data)
352
353 =item eio_fsync (int fd, int pri, eio_cb cb, void *data)
354
355 =item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
356
357 =item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
358
359 =item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
360
361 =item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
362
363 =item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
364
365 =item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
366
367 These have the same semantics as the syscall of the same name, their
368 return value is available as C<< req->result >> later.
369
370 =item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
371
372 =item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
373
374 These two requests are called C<read> and C<write>, but actually wrap
375 C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
376 libeio uses lseek/read_or_write/lseek and a mutex to serialise the
377 requests, so all these requests run serially and do not disturb each
378 other. However, they still disturb the file offset while they run, so it's
379 not safe to call these functions concurrently with non-libeio functions on
380 the same fd on these systems.
381
382 Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
383 so it is advised not to submit multiple requests on the same fd on this
384 horrible pile of garbage.
385
386 =item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
387
388 Like C<mlockall>, but the flag value constants are called
389 C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>.
390
391 =item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
392
393 Just like msync, except that the flag values are called C<EIO_MS_ASYNC>,
394 C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>.
395
396 =item eio_readlink (const char *path, int pri, eio_cb cb, void *data)
397
398 If successful, the path read by C<readlink(2)> can be accessed via C<<
399 req->ptr2 >> and is I<NOT> null-terminated, with the length specified as
400 C<< req->result >>.
401
   if (req->result >= 0)
     {
       char *target = strndup ((char *)req->ptr2, req->result);

       free (target);
     }
408
409 =item eio_realpath (const char *path, int pri, eio_cb cb, void *data)
410
411 Similar to the realpath libc function, but unlike that one, C<<
412 req->result >> is C<-1> on failure. On success, the result is the length
413 of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is
414 similar to readlink.
415
416 =item eio_stat (const char *path, int pri, eio_cb cb, void *data)
417
418 =item eio_lstat (const char *path, int pri, eio_cb cb, void *data)
419
420 =item eio_fstat (int fd, int pri, eio_cb cb, void *data)
421
422 Stats a file - if C<< req->result >> indicates success, then you can
423 access the C<struct stat>-like structure via C<< req->ptr2 >>:
424
425 EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
426
427 =item eio_statvfs (const char *path, int pri, eio_cb cb, void *data)
428
429 =item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data)
430
431 Stats a filesystem - if C<< req->result >> indicates success, then you can
432 access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
433
434 EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
435
436 =back
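
Since the paths returned by C<eio_readlink> and C<eio_realpath> are not 0-terminated, a callback usually has to copy them before use. The following helper is a minimal sketch of that step; the name C<result_path_dup> is hypothetical, and it takes the pointer and length as plain arguments so that it compiles without libeio:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* copy a (pointer, length) result into a malloc'ed, 0-terminated string */
/* returns 0 when the request failed (len < 0) or memory ran out */
static char *
result_path_dup (const void *ptr, ssize_t len)
{
  char *copy;

  if (len < 0 || !ptr)
    return 0;

  copy = malloc ((size_t)len + 1);

  if (!copy)
    return 0;

  memcpy (copy, ptr, (size_t)len);
  copy[len] = 0;

  return copy;
}
```

Inside a callback this would be used as C<result_path_dup (req-E<gt>ptr2, req-E<gt>result)>, with the caller freeing the returned string.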
437
438 =head3 READING DIRECTORIES
439
440 Reading directories sounds simple, but can be rather demanding, especially
441 if you want to do stuff such as traversing a directory hierarchy or
442 processing all files in a directory. Libeio can assist these complex tasks
443 with its C<eio_readdir> call.
444
445 =over 4
446
447 =item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
448
449 This is a very complex call. It basically reads through a whole directory
450 (via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
451 the names or an array of C<struct eio_dirent>, depending on the C<flags>
452 argument.
453
454 The C<< req->result >> indicates either the number of files found, or
455 C<-1> on error. On success, null-terminated names can be found as C<< req->ptr2 >>,
456 and C<struct eio_dirent>s, if requested by C<flags>, can be found via C<<
457 req->ptr1 >>.
458
459 Here is an example that prints all the names:
460
   int i;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       printf ("name #%d: %s\n", i, names);

       /* move to next name */
       names += strlen (names) + 1;
     }
471
472 Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
473
474 C<flags> can be any combination of:
475
476 =over 4
477
478 =item EIO_READDIR_DENTS
479
480 If this flag is specified, then, in addition to the names in C<ptr2>,
481 also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
482 eio_dirent> looks like this:
483
   struct eio_dirent
   {
     int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
     unsigned short namelen; /* size of filename without trailing 0 */
     unsigned char type; /* one of EIO_DT_* */
     signed char score; /* internal use */
     ino_t inode; /* the inode number, if available, otherwise unspecified */
   };
492
493 The only members you normally would access are C<nameofs>, which is the
494 byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
495
496 C<type> can be one of:
497
498 C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
499 the name yourself if you need to know,
500 one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
501 C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
502 or some OS-specific type (currently
503 C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
504 C<EIO_DT_NAM> - xenix special named file,
505 C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
506 C<EIO_DT_NWK> - HP-UX network special,
507 C<EIO_DT_CMP> - VxFS compressed,
508 C<EIO_DT_DOOR> - solaris door, or
509 C<EIO_DT_WHT>).
510
511 This example prints all names and their type:
512
   int i;
   struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       struct eio_dirent *ent = ents + i;
       char *name = names + ent->nameofs;

       printf ("name #%d: %s (type %d)\n", i, name, ent->type);
     }
524
525 =item EIO_READDIR_DIRS_FIRST
526
527 When this flag is specified, then the names will be returned in an order
528 where likely directories come first, in optimal C<stat> order. This is
529 useful when you need to quickly find directories, or you want to find all
530 directories while avoiding a stat() call on each entry.
531
532 If the system returns type information in readdir, then this is used
533 to find directories directly. Otherwise, likely directories are names
534 beginning with ".", or otherwise names with no dots, of which names with
535 short names are tried first.
536
537 =item EIO_READDIR_STAT_ORDER
538
539 When this flag is specified, then the names will be returned in an order
540 suitable for stat()'ing each one. That is, when you plan to stat()
541 all files in the given directory, then the returned order will likely
542 be fastest.
543
544 If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the
545 likely directories come first, resulting in a less optimal stat order.
546
547 =item EIO_READDIR_FOUND_UNKNOWN
548
549 This flag should not be specified when calling C<eio_readdir>. Instead,
550 it is being set by C<eio_readdir> (you can access the C<flags> via C<<
551 req->int1 >>), when any of the C<type>'s found were C<EIO_DT_UNKNOWN>. The
552 absence of this flag therefore indicates that all C<type>'s are known,
553 which can be used to speed up some algorithms.
554
555 A typical use case would be to identify all subdirectories within a
556 directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
557 then this flag is I<NOT> set, then all the entries at the beginning of the
558 returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
559 should start C<stat()>'ing the entries starting at the beginning of the
560 array, stopping as soon as you found all directories (the count can be
561 deduced by the link count of the directory).
562
563 =back
564
565 =back
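
The subdirectory use case described for C<EIO_READDIR_FOUND_UNKNOWN> can be sketched roughly as below. To keep the code compilable without C<eio.h>, it uses a stand-in C<struct my_dirent> and stand-in C<MY_DT_*> constants whose values are assumptions; real code would use C<struct eio_dirent> and the C<EIO_DT_*> constants instead. The link-count rule (subdirectories = link count minus two) holds on traditional Unix filesystems only, so treat it as a hint:

```c
#include <stddef.h>

/* stand-ins for EIO_DT_* (hypothetical values, use the real constants) */
enum { MY_DT_UNKNOWN = 0, MY_DT_DIR = 4, MY_DT_REG = 8 };

/* stand-in for the members of struct eio_dirent used here */
struct my_dirent
{
  int nameofs;
  unsigned short namelen;
  unsigned char type;
};

/* all types known and directories sorted first: */
/* the directories are exactly the leading run of MY_DT_DIR entries */
static int
count_leading_dirs (const struct my_dirent *ents, int count)
{
  int n = 0;

  while (n < count && ents[n].type == MY_DT_DIR)
    ++n;

  return n;
}

/* some types unknown: how many directories to look for while stat()'ing */
/* returns -1 when the link count cannot be used for this purpose */
static long
expected_subdirs (unsigned long nlink)
{
  return nlink >= 2 ? (long)(nlink - 2) : -1;
}
```

When the flag is absent, C<count_leading_dirs> is all that is needed. When it is set, you would stat the entries in order and stop once C<expected_subdirs> directories have been found.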
566
567 =head3 OS-SPECIFIC CALL WRAPPERS
568
569 These wrap OS-specific calls (usually Linux ones), and might or might not
570 be emulated on other operating systems. Calls that are not emulated will
571 return C<-1> and set C<errno> to C<ENOSYS>.
572
573 =over 4
574
575 =item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
576
577 Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
578 libeio supports and will use similar calls on FreeBSD, HP-UX, Solaris and
579 Darwin.
580
581 If the OS doesn't support some sendfile-like call, or the call fails,
582 indicating a lack of support for the given file descriptor type (for example,
583 Linux's sendfile might not support file to file copies), then libeio will
584 emulate the call in userspace, so there are almost no limitations on its
585 use.
586
587 =item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
588
589 Calls C<readahead(2)>. If the syscall is missing, then the call is
590 emulated by simply reading the data (currently in 64kiB chunks).
591
592 =item eio_syncfs (int fd, int pri, eio_cb cb, void *data)
593
594 Calls Linux' C<syncfs> syscall, if available. If the call is missing, it
595 returns C<-1> and sets C<errno> to C<ENOSYS>, I<but still calls sync()>
596 if the C<fd> is C<< >= 0 >>, so you can probe for the availability of the
597 syscall by passing a negative C<fd> argument and checking for C<-1/ENOSYS>.
598
599 =item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
600
601 Calls C<sync_file_range>. If the syscall is missing, then this is the same
602 as calling C<fdatasync>.
603
604 Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>,
605 C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>.
606
607 =item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data)
608
609 Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is
610 missing, then it returns failure and sets C<errno> to C<ENOSYS>.
611
612 The C<mode> argument can be C<0> (for behaviour similar to
613 C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size
614 of the file unchanged (but still preallocates space beyond end of file).
615
616 =back
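
To make the C<eio_readahead> fallback mentioned above more concrete, here is a sketch of emulating readahead by simply reading the data in 64kiB chunks and discarding it. This is an illustration under that description, not libeio's actual implementation, and the name C<emulated_readahead> is hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum { READAHEAD_CHUNK = 65536 };

/* read [offset, offset + length) and throw the data away */
/* returns 0 on success (including hitting end of file), -1 on error */
static int
emulated_readahead (int fd, off_t offset, size_t length)
{
  char buf[READAHEAD_CHUNK];

  while (length > 0)
    {
      size_t want = length < READAHEAD_CHUNK ? length : READAHEAD_CHUNK;
      ssize_t got = pread (fd, buf, want, offset);

      if (got < 0)
        {
          if (errno == EINTR)
            continue;

          return -1;
        }

      if (got == 0)
        break; /* end of file, nothing more to pull in */

      offset += got;
      length -= (size_t)got;
    }

  return 0;
}
```

Using C<pread> keeps the file offset of C<fd> untouched, so the emulation does not disturb other users of the descriptor.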
617
618 =head3 LIBEIO-SPECIFIC REQUESTS
619
620 These requests are specific to libeio and do not correspond to any OS call.
621
622 =over 4
623
624 =item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
625
626 Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
627 memory area, page-wise, that is, it reads (or reads and writes back) the
628 first octet of every page that spans the memory area.
629
630 This can be used to page in some mmapped file, or dirty some pages. Note
631 that dirtying is an unlocked read-write access, so races can ensue when
632 some other thread modifies the data stored in that memory area.
633
634 =item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data)
635
636 Executes a custom request, i.e., a user-specified callback.
637
638 The callback gets the C<eio_req *> as parameter and is expected to read
639 and modify any request-specific members. Specifically, it should set C<<
640 req->result >> to the result value, just like other requests.
641
642 Here is an example that simply calls C<open>, like C<eio_open>, but it
643 uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
644 you want to pass more/other parameters, you either need to pass some
645 struct or so via C<data> or provide your own wrapper using the low-level
646 API.
647
   static int
   my_open_done (eio_req *req)
   {
     int fd = req->result;

     return 0;
   }

   static void
   my_open (eio_req *req)
   {
     req->result = open (req->data, O_RDONLY);
   }

   eio_custom (my_open, 0, my_open_done, "/etc/passwd");
663
664 =item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
665
666 This is a request that takes C<delay> seconds to execute, but otherwise
667 does nothing - it simply puts one of the worker threads to sleep for this
668 long.
669
670 This request can be used to artificially increase load, e.g. for debugging
671 or benchmarking reasons.
672
673 =item eio_nop (int pri, eio_cb cb, void *data)
674
675 This request does nothing, except go through the whole request cycle. This
676 can be used to measure latency or in some cases to simplify code, but is
677 not really of much use.
678
679 =back
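
The page-wise access that C<eio_mtouch> is described as performing can be pictured with the following sketch, which touches the first octet of every page overlapping a memory area. It is an illustration of the idea only, not libeio's code; the name C<touch_pages> and the C<modify> parameter (standing in for C<EIO_MT_MODIFY>) are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* touch the first octet of every page overlapping [addr, addr + length) */
/* modify == 0 only reads, otherwise the octet is read and written back */
/* returns the number of pages touched */
static size_t
touch_pages (void *addr, size_t length, int modify)
{
  size_t page = (size_t)sysconf (_SC_PAGESIZE);
  size_t touched = 0;

  if (!length)
    return 0;

  /* round the start down to its page boundary */
  uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(page - 1);
  uintptr_t end   = (uintptr_t)addr + length;

  for (uintptr_t p = start; p < end; p += page, ++touched)
    {
      volatile char *octet = (volatile char *)p;
      char value = *octet; /* the read pages the data in */

      if (modify)
        *octet = value; /* unlocked read-write access, dirties the page */
    }

  return touched;
}
```

The write-back in the modifying case is exactly the unlocked read-write access the text warns about: a concurrent writer to the same octet could have its update overwritten.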
680
681 =head3 GROUPING AND LIMITING REQUESTS
682
683 There is one more rather special request, C<eio_grp>: instead of
684 doing something itself, it is a container for other eio
685 requests.
686
687 There are two primary use cases for this: a) bundle many requests into a
688 single, composite, request with a definite callback and the ability to
689 cancel the whole request with its subrequests and b) limiting the number
690 of "active" requests.
691
692 Further below you will find more discussion of these topics - first
693 follows the reference section detailing the request generator and other
694 methods.
695
696 =over 4
697
698 =item eio_req *grp = eio_grp (eio_cb cb, void *data)
699
700 Creates, submits and returns a group request. Note that it doesn't have a
701 priority, unlike all other requests.
702
703 =item eio_grp_add (eio_req *grp, eio_req *req)
704
705 Adds a request to the request group.
706
707 =item eio_grp_cancel (eio_req *grp)
708
709 Cancels all requests I<in> the group, but I<not> the group request
710 itself. You can cancel the group request I<and> all subrequests via a
711 normal C<eio_cancel> call.
712
713 =back
714
715 =head4 GROUP REQUEST LIFETIME
716
717 Left alone, a group request will instantly move to the pending state and
718 will be finished at the next call of C<eio_poll>.
719
720 The usefulness stems from the fact that, if a subrequest is added to a
721 group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
722 will not finish until all the subrequests have finished.
723
724 So the usage cycle of a group request is like this: after it is created,
725 you normally instantly add a subrequest. If none is added, the group
726 request will finish on its own. As long as subrequests are added before
727 the group request is finished it will be kept from finishing, that is the
728 callbacks of any subrequests can, in turn, add more requests to the group,
729 and as long as any requests are active, the group request itself will not
730 finish.
731
732 =head4 CREATING COMPOSITE REQUESTS
733
734 Imagine you wanted to create an C<eio_load> request that opens a file,
735 reads it and closes it. This means it has to execute at least three eio
736 requests, but for various reasons it might be nice if that request looked
737 like any other eio request.
738
739 This can be done with groups:
740
741 =over 4
742
743 =item 1) create the request object
744
745 Create a group that contains all further requests. This is the request you
746 can return as "the load request".
747
748 =item 2) open the file, maybe
749
750 Next, open the file with C<eio_open> and add the request to the group
751 request and you are finished setting up the request.
752
753 If, for some reason, you cannot C<eio_open> (path is a null ptr?) you
754 can set C<< grp->result >> to C<-1> to signal an error and let the group
755 request finish on its own.
756
757 =item 3) open callback adds more requests
758
759 In the open callback, if the open was not successful, copy C<<
760 req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
761 C<-1> to signal an error.
762
763 Otherwise, malloc some memory or so and issue a read request, adding the
764 read request to the group.
765
766 =item 4) continue issuing requests till finished
767
768 In the read callback, check for errors and possibly continue with
769 C<eio_close> or any other eio request in the same way.
770
771 As soon as no new requests are added, the group request will finish. Make
772 sure you I<always> set C<< grp->result >> to some sensible value.
773
774 =back
775
776 =head4 REQUEST LIMITING
777
778
779 #TODO
780
781 void eio_grp_limit (eio_req *grp, int limit);
782
783
784
785 =head1 LOW LEVEL REQUEST API
786
787 #TODO
788
789
790 =head1 ANATOMY AND LIFETIME OF AN EIO REQUEST
791
792 A request is represented by a structure of type C<eio_req>. To initialise
793 it, clear it to all zero bytes:
794
795 eio_req req;
796
797 memset (&req, 0, sizeof (req));
798
799 A more common way to initialise a new C<eio_req> is to use C<calloc>:
800
801 eio_req *req = calloc (1, sizeof (*req));
802
803 In either case, libeio neither allocates, initialises nor frees the
804 C<eio_req> structure for you - it merely uses it.
805
806 zero
807
808 #TODO
809
810 =head2 CONFIGURATION
811
812 The functions in this section can sometimes be useful, but the default
813 configuration will do in most cases, so you should skip this section on
814 first reading.
815
816 =over 4
817
818 =item eio_set_max_poll_time (eio_tstamp nseconds)
819
820 This causes C<eio_poll ()> to return after it has detected that it was
821 running for C<nseconds> seconds or longer (this number can be fractional).
822
823 This can be used to limit the amount of time spent handling eio requests.
824 For example, in interactive programs, you might want to limit this time to
825 C<0.01> seconds or so.
826
827 Note that:
828
829 =over 4
830
831 =item a) libeio doesn't know how long your request callbacks take, so the
832 time spent in C<eio_poll> is up to one callback invocation longer than
833 this interval.
834
835 =item b) this is implemented by calling C<gettimeofday> after each
836 request, which can be costly.
837
838 =item c) at least one request will be handled.
839
840 =back
841
842 =item eio_set_max_poll_reqs (unsigned int nreqs)
843
844 When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
845 C<nreqs> requests per invocation. This is a less costly way to limit the
846 amount of work done by C<eio_poll> than setting a time limit.
847
848 If you know your callbacks are generally fast, you could use this to
849 encourage interactiveness in your programs by setting it to C<10>, C<100>
850 or even C<1000>.
851
852 =item eio_set_min_parallel (unsigned int nthreads)
853
854 Make sure libeio can handle at least this many requests in parallel. It
855 might be able to handle more.
856
857 =item eio_set_max_parallel (unsigned int nthreads)
858
859 Set the maximum number of threads that libeio will spawn.
860
861 =item eio_set_max_idle (unsigned int nthreads)
862
863 Libeio uses threads internally to handle most requests, and will start and stop threads on demand.
864
865 This call can be used to limit the number of idle threads (threads without
866 work to do): libeio will keep some threads idle in preparation for more
867 requests, but never more than C<nthreads> threads.
868
869 In addition to this, libeio will also stop threads when they are idle for
870 a few seconds, regardless of this setting.
871
872 =item unsigned int eio_nthreads ()
873
874 Return the number of worker threads currently running.
875
876 =item unsigned int eio_nreqs ()
877
878 Return the number of requests currently handled by libeio. This is the
879 total number of requests that have been submitted to libeio, but not yet
880 destroyed.
881
882 =item unsigned int eio_nready ()
883
884 Returns the number of ready requests, i.e. requests that have been
885 submitted but have not yet entered the execution phase.
886
887 =item unsigned int eio_npending ()
888
889 Returns the number of pending requests, i.e. requests that have been
890 executed and have results, but have not been finished yet by a call to
891 C<eio_poll>.
892
893 =back
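
The effect of C<eio_set_max_poll_time> and C<eio_set_max_poll_reqs> can be
pictured with the following minimal sketch. It is not libeio's implementation:
C<poll_limited>, C<job_fn>, C<count_job> and C<now> are hypothetical names
introduced for illustration, and treating a time limit of zero as "no limit"
is an assumption of the sketch. It only mirrors the behaviour described above:
at least one request is handled, and the clock is read via C<gettimeofday>
after each request, so the total time can overshoot by one callback.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for the callback of a finished request. */
typedef void (*job_fn) (void);

/* Example job: only counts how often it was invoked. */
static int jobs_run;

static void
count_job (void)
{
  ++jobs_run;
}

/* Current time as fractional seconds since the epoch. */
static double
now (void)
{
  struct timeval tv;

  gettimeofday (&tv, 0);
  return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Run queued jobs in order until the queue is empty or a limit is hit.
 * max_time <= 0 and max_reqs == 0 mean "no limit" (an assumption of
 * this sketch). Returns the number of jobs handled.
 */
static size_t
poll_limited (job_fn *jobs, size_t njobs, double max_time, unsigned int max_reqs)
{
  double start = now ();
  size_t handled = 0;

  while (handled < njobs)
    {
      /* the first job always runs, whatever the limits are */
      jobs[handled++] ();

      if (max_reqs && handled >= max_reqs)
        break;

      /* the clock is only read after a job has run, so the loop */
      /* can exceed max_time by the duration of one job */
      if (max_time > 0. && now () - start >= max_time)
        break;
    }

  return handled;
}
```

With five queued C<count_job> entries, C<poll_limited (jobs, 5, 0., 3)> handles
three of them, while a very small C<max_time> still handles at least one.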
894
895 =head1 EMBEDDING
896
897 Libeio can be embedded directly into programs. This functionality is not
898 documented and not (yet) officially supported.
899
900 Note that, when including C<libeio.m4>, you are responsible for defining
901 the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).
902
903 If you need to know how, check the C<IO::AIO> perl module, which does
904 exactly that.
905
906
907 =head1 COMPILETIME CONFIGURATION
908
909 These symbols, if used, must be defined when compiling F<eio.c>.
910
911 =over 4
912
913 =item EIO_STACKSIZE
914
915 This symbol governs the stack size for each eio thread. Libeio itself
916 was written to use very little stack space, but when using C<EIO_CUSTOM>
917 requests, you might want to increase this.
918
919 If this symbol is undefined (the default) then libeio will use its default
920 stack size (C<sizeof (void *) * 4096> currently). In all other cases, the
921 value must be an expression that evaluates to the desired stack size.
922
923 =back
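
As a rough illustration of the default mentioned above, the following sketch
shows how a fallback of C<sizeof (void *) * 4096> behaves when
C<EIO_STACKSIZE> is left undefined. The C<#ifndef> fallback and the helper
C<thread_stack_size> are assumptions made for this example, not a copy of the
code in F<eio.c>. An override would typically be passed on the compiler
command line when building F<eio.c>, for instance
C<-DEIO_STACKSIZE='(256 * 1024)'>.

```c
#include <stddef.h>

/*
 * Fallback mirroring the documented default. If EIO_STACKSIZE was already
 * defined (for example with -DEIO_STACKSIZE=...), that expression is used.
 */
#ifndef EIO_STACKSIZE
# define EIO_STACKSIZE (sizeof (void *) * 4096)
#endif

/* Hypothetical helper: the stack size, in bytes, a worker would be given. */
static size_t
thread_stack_size (void)
{
  /* 16384 bytes with 4-byte pointers, 32768 bytes with 8-byte pointers */
  return EIO_STACKSIZE;
}
```

Because the default scales with the pointer size, 64-bit builds get twice the
stack of 32-bit builds without any configuration.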
924
925
926 =head1 PORTABILITY REQUIREMENTS
927
928 In addition to a working ISO-C implementation, libeio relies on a few
929 additional extensions:
930
931 =over 4
932
933 =item POSIX threads
934
935 To be portable, this module uses threads. Specifically, the POSIX threads
936 library must be available (and working, which partially excludes many xBSD
937 systems, where C<fork ()> is buggy).
938
939 =item POSIX-compatible filesystem API
940
941 This is actually a harder portability requirement: The libeio API is quite
942 demanding regarding POSIX API calls (symlinks, user/group management
943 etc.).
944
945 =item C<double> must hold a time value in seconds with enough accuracy
946
947 The type C<double> is used to represent timestamps. It is required to
948 have at least 51 bits of mantissa (and 9 bits of exponent), which is good
949 enough until at least the year 4000. This requirement is fulfilled by
950 implementations implementing IEEE 754 (basically all existing ones).
951
952 =back
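
The C<double> requirement can be checked with a few lines of standard C. The
sketch below is only an illustration: C<to_tstamp>, C<double_mantissa_ok> and
C<tstamp_resolution> are hypothetical helpers, not part of libeio, and the
mantissa check assumes a binary floating point format (C<FLT_RADIX == 2>).
The resolution estimate uses C<DBL_EPSILON>, which gives an upper bound on
the spacing between adjacent C<double> values near a given timestamp.

```c
#include <float.h>

/* Convert a (seconds, microseconds) pair to fractional seconds. */
static double
to_tstamp (long sec, long usec)
{
  return (double)sec + (double)usec / 1e6;
}

/* 1 if double has a binary mantissa of at least 51 bits, 0 otherwise. */
static int
double_mantissa_ok (void)
{
  return FLT_RADIX == 2 && DBL_MANT_DIG >= 51;
}

/* Upper bound, in seconds, on the gap between adjacent doubles near t. */
static double
tstamp_resolution (double t)
{
  return t * DBL_EPSILON;
}
```

For a timestamp at the start of the year 4000 (C<64060588800> seconds after
the epoch), C<tstamp_resolution> on an IEEE 754 system yields a bound on the
order of ten microseconds.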
953
954 If you know of other additional requirements, drop me a note.
955
956
957 =head1 AUTHOR
958
959 Marc Lehmann <libeio@schmorp.de>.
960