| 1 |
=head1 NAME |
| 2 |
|
| 3 |
libeio - truly asynchronous POSIX I/O |
| 4 |
|
| 5 |
=head1 SYNOPSIS |
| 6 |
|
| 7 |
#include <eio.h> |
| 8 |
|
| 9 |
=head1 DESCRIPTION |
| 10 |
|
| 11 |
The newest version of this document is also available as an html-formatted |
| 12 |
web page you might find easier to navigate when reading it for the first |
| 13 |
time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>. |
| 14 |
|
| 15 |
Note that this library is a by-product of the C<IO::AIO> perl
module, and many of the subtler points regarding request lifetime
and so on are only documented in its documentation at the
moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
| 19 |
|
| 20 |
=head2 FEATURES |
| 21 |
|
| 22 |
This library provides fully asynchronous versions of most POSIX functions
dealing with I/O. Unlike most asynchronous libraries, this not only
includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
similar functions, as well as less frequently used ones such as C<mknod>,
C<futime> or C<readlink>.
| 27 |
|
| 28 |
It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and |
| 29 |
FreeBSD, with emulation on other platforms) and C<readahead> (Linux, with |
| 30 |
emulation elsewhere). |
| 31 |
|
| 32 |
The goal is to enable you to write fully non-blocking programs. For |
| 33 |
example, in a game server, you would not want to freeze for a few seconds |
| 34 |
just because the server is running a backup and you happen to call |
| 35 |
C<readdir>. |
| 36 |
|
| 37 |
=head2 TIME REPRESENTATION |
| 38 |
|
| 39 |
Libeio represents time as a single floating point number, representing the |
| 40 |
(fractional) number of seconds since the (POSIX) epoch (somewhere near |
| 41 |
the beginning of 1970, details are complicated, don't ask). This type is |
| 42 |
called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or |
| 43 |
better), so you can freely use C<double> yourself. |
| 44 |
|
| 45 |
Unlike the name component C<stamp> might indicate, it is also used for |
| 46 |
time differences throughout libeio. |
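
As a rough illustration of this representation, the following sketch
converts between a C<struct timespec> and a fractional number of seconds.
The C<tstamp> typedef is only a stand-in for C<eio_tstamp> (assumed here to
be a plain C<double>, as stated above), and both helper names are made up
for this example; they are not part of libeio.

```c
#include <time.h>

/* Stand-in for eio_tstamp; assumed to be a plain double. */
typedef double tstamp;

/* Hypothetical helper: struct timespec -> fractional seconds. */
static tstamp
tstamp_from_timespec (const struct timespec *ts)
{
  return (tstamp)ts->tv_sec + (tstamp)ts->tv_nsec / 1e9;
}

/* Hypothetical helper: fractional seconds -> struct timespec. */
static struct timespec
timespec_from_tstamp (tstamp t)
{
  struct timespec ts;
  time_t sec = (time_t)t;        /* truncates towards zero */

  if ((tstamp)sec > t)           /* round down for negative values */
    --sec;

  ts.tv_sec = sec;
  ts.tv_nsec = (long)((t - (tstamp)sec) * 1e9);

  if (ts.tv_nsec > 999999999L)   /* guard against rounding up to a full second */
    ts.tv_nsec = 999999999L;

  return ts;
}
```

Because the same type is used for time differences, subtracting two such
values directly yields an interval in seconds.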
| 47 |
|
| 48 |
=head2 FORK SUPPORT |
| 49 |
|
| 50 |
Usage of pthreads in a program changes the semantics of fork
considerably. Specifically, only async-signal-safe functions can be called
in the child after fork. Libeio uses pthreads, so this applies, and makes
using fork hard for anything but relatively simple fork + exec uses.
| 54 |
|
| 55 |
This library only works in the process that initialised it: Forking is |
| 56 |
fully supported, but using libeio in any other process than the one that |
| 57 |
called C<eio_init> is not. |
| 58 |
|
| 59 |
You might get around this by not I<using> libeio before (or after) forking
in the parent, and using it in the child afterwards. You could also try to
call the C<eio_init> function again in the child, which will brutally
reinitialise all data structures. This isn't POSIX conformant, but
typically works.
| 64 |
|
| 65 |
Otherwise, the only recommendation you should follow is: treat fork code |
| 66 |
the same way you treat signal handlers, and only ever call C<eio_init> in |
| 67 |
the process that uses it, and only once ever. |
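
To make the "simple fork + exec" case concrete, here is a minimal sketch
of a child that does nothing between C<fork> and C<execv> except call
async-signal-safe functions. The helper name C<run_child> is hypothetical,
and the code deliberately does not touch libeio at all.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program and return its exit status,
   or -1 if fork/waitpid failed or the child did not exit normally. */
static int
run_child (const char *path, char *const argv[])
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* child: only async-signal-safe calls from here on */
      execv (path, argv);
      _exit (127); /* only reached when execv failed */
    }

  /* parent: wait for the child, retrying when interrupted */
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Anything more elaborate in the child (allocating memory, taking locks,
submitting eio requests) falls outside what this pattern covers.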
| 68 |
|
| 69 |
=head1 INITIALISATION/INTEGRATION |
| 70 |
|
| 71 |
Before you can call any eio functions you first have to initialise the |
| 72 |
library. The library integrates into any event loop, but can also be used |
| 73 |
without one, including in polling mode. |
| 74 |
|
| 75 |
You have to provide the necessary glue yourself, however. |
| 76 |
|
| 77 |
=over 4 |
| 78 |
|
| 79 |
=item int eio_init (void (*want_poll)(void), void (*done_poll)(void)) |
| 80 |
|
| 81 |
This function initialises the library. On success it returns C<0>, on |
| 82 |
failure it returns C<-1> and sets C<errno> appropriately. |
| 83 |
|
| 84 |
It accepts two function pointers specifying callbacks as argument, both of |
| 85 |
which can be C<0>, in which case the callback isn't called. |
| 86 |
|
| 87 |
There is currently no way to change these callbacks later, or to |
| 88 |
"uninitialise" the library again. |
| 89 |
|
| 90 |
=item want_poll callback |
| 91 |
|
| 92 |
The C<want_poll> callback is invoked whenever libeio wants attention (i.e. |
| 93 |
it wants to be polled by calling C<eio_poll>). It is "edge-triggered", |
| 94 |
that is, it will only be called once when eio wants attention, until all |
| 95 |
pending requests have been handled. |
| 96 |
|
| 97 |
This callback is called while locks are being held, so I<you must |
| 98 |
not call any libeio functions inside this callback>. That includes |
| 99 |
C<eio_poll>. What you should do is notify some other thread, or wake up |
| 100 |
your event loop, and then call C<eio_poll>. |
| 101 |
|
| 102 |
=item done_poll callback |
| 103 |
|
| 104 |
This callback is invoked when libeio detects that all pending requests |
| 105 |
have been handled. It is "edge-triggered", that is, it will only be |
| 106 |
called once after C<want_poll>. To put it differently, C<want_poll> and |
| 107 |
C<done_poll> are invoked in pairs: after C<want_poll> you have to call |
| 108 |
C<eio_poll ()> until either C<eio_poll> indicates that everything has been |
| 109 |
handled or C<done_poll> has been called, which signals the same - only one |
| 110 |
method is needed. |
| 111 |
|
| 112 |
Note that C<eio_poll> might return after C<done_poll> and C<want_poll> |
| 113 |
have been called again, so watch out for races in your code. |
| 114 |
|
| 115 |
It is quite common to have an empty C<done_poll> callback and only use
the return value from C<eio_poll>, or, when C<eio_poll> is configured to
handle all outstanding replies, it's enough to call C<eio_poll> once.
| 118 |
|
| 119 |
As with C<want_poll>, this callback is called while locks are being held, |
| 120 |
so you I<must not call any libeio functions from within this callback>. |
| 121 |
|
| 122 |
=item int eio_poll () |
| 123 |
|
| 124 |
This function has to be called whenever there are pending requests that |
| 125 |
need finishing. You usually call this after C<want_poll> has indicated |
| 126 |
that you should do so, but you can also call this function regularly to |
| 127 |
poll for new results. |
| 128 |
|
| 129 |
If any request invocation returns a non-zero value, then C<eio_poll ()> |
| 130 |
immediately returns with that value as return value. |
| 131 |
|
| 132 |
Otherwise, if all requests could be handled, it returns C<0>. If for some |
| 133 |
reason not all requests have been handled, i.e. some are still pending, it |
| 134 |
returns C<-1>. |
| 135 |
|
| 136 |
=back |
| 137 |
|
| 138 |
For libev, you would typically use an C<ev_async> watcher: the |
| 139 |
C<want_poll> callback would invoke C<ev_async_send> to wake up the event |
| 140 |
loop. Inside the callback set for the watcher, one would call C<eio_poll |
| 141 |
()>. |
| 142 |
|
| 143 |
If C<eio_poll ()> is configured to not handle all results in one go |
| 144 |
(i.e. it returns C<-1>) then you should start an idle watcher that calls |
| 145 |
C<eio_poll> until it returns something C<!= -1>. |
| 146 |
|
| 147 |
A full-featured connector between libeio and libev would look as follows |
| 148 |
(if C<eio_poll> is handling all requests, it can of course be simplified a |
| 149 |
lot by removing the idle watcher logic): |
| 150 |
|
| 151 |
   #include <ev.h>
   #include <eio.h>

   static struct ev_loop *loop;
   static ev_idle repeat_watcher;
   static ev_async ready_watcher;

   /* idle watcher callback, only used when eio_poll */
   /* didn't handle all results in one call */
   static void
   repeat (EV_P_ ev_idle *w, int revents)
   {
     if (eio_poll () != -1)
       ev_idle_stop (EV_A_ w);
   }

   /* eio has some results, process them */
   static void
   ready (EV_P_ ev_async *w, int revents)
   {
     if (eio_poll () == -1)
       ev_idle_start (EV_A_ &repeat_watcher);
   }

   /* wake up the event loop */
   static void
   want_poll (void)
   {
     ev_async_send (loop, &ready_watcher);
   }

   void
   my_init_eio (void)
   {
     loop = EV_DEFAULT;

     ev_idle_init (&repeat_watcher, repeat);
     ev_async_init (&ready_watcher, ready);
     ev_async_start (loop, &ready_watcher);

     eio_init (want_poll, 0);
   }
| 190 |
|
| 191 |
For most other event loops, you would typically use a pipe - the event |
| 192 |
loop should be told to wait for read readiness on the read end. In |
| 193 |
C<want_poll> you would write a single byte, in C<done_poll> you would try |
| 194 |
to read that byte, and in the callback for the read end, you would call |
| 195 |
C<eio_poll>. |
| 196 |
|
| 197 |
You don't have to take special care in the case C<eio_poll> doesn't handle |
| 198 |
all requests, as the done callback will not be invoked, so the event loop |
| 199 |
will still signal readiness for the pipe until I<all> results have been |
| 200 |
processed. |
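
The pipe approach described above can be sketched with plain POSIX calls.
In this sketch, C<wake_signal> is what a C<want_poll> callback would call
and C<wake_drain> is what a C<done_poll> callback would call; all names
here are hypothetical, and registering the read end with your event loop
(as well as the C<eio_init> call itself) is left out. Since C<write> and
C<read> are not libeio functions, calling them from the callbacks does not
conflict with the locking restriction.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* wake_fd[0] is the read end, wake_fd[1] the write end */
static int wake_fd[2] = { -1, -1 };

/* create the pipe and make both ends non-blocking */
static int
wake_init (void)
{
  int i;

  if (pipe (wake_fd) < 0)
    return -1;

  for (i = 0; i < 2; ++i)
    {
      int fl = fcntl (wake_fd[i], F_GETFL);

      if (fl < 0 || fcntl (wake_fd[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* would be called from want_poll: write a single byte */
static void
wake_signal (void)
{
  char byte = 0;
  ssize_t r;

  do
    r = write (wake_fd[1], &byte, 1);
  while (r < 0 && errno == EINTR);
}

/* would be called from done_poll: try to read that byte back */
static void
wake_drain (void)
{
  char byte;
  ssize_t r;

  do
    r = read (wake_fd[0], &byte, 1);
  while (r < 0 && errno == EINTR);
}

/* what the event loop observes: is the read end readable right now? */
static int
wake_pending (void)
{
  struct pollfd p;

  p.fd = wake_fd[0];
  p.events = POLLIN;
  p.revents = 0;

  return poll (&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}
```

The read end stays readable between C<wake_signal> and C<wake_drain>,
which is what keeps the event loop calling back until everything has been
processed.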
| 201 |
|
| 202 |
|
| 203 |
=head1 HIGH LEVEL REQUEST API |
| 204 |
|
| 205 |
Libeio has both a high-level API, which consists of calling a request |
| 206 |
function with a callback to be called on completion, and a low-level API |
| 207 |
where you fill out request structures and submit them. |
| 208 |
|
| 209 |
This section describes the high-level API. |
| 210 |
|
| 211 |
=head2 REQUEST SUBMISSION AND RESULT PROCESSING |
| 212 |
|
| 213 |
You submit a request by calling the relevant C<eio_TYPE> function with the |
| 214 |
required parameters, a callback of type C<int (*eio_cb)(eio_req *req)> |
| 215 |
(called C<eio_cb> below) and a freely usable C<void *data> argument. |
| 216 |
|
| 217 |
The return value will either be 0, in case something went really wrong |
| 218 |
(which can basically only happen on very fatal errors, such as C<malloc> |
| 219 |
returning 0, which is rather unlikely), or a pointer to the newly-created |
| 220 |
and submitted C<eio_req *>. |
| 221 |
|
| 222 |
The callback will be called with an C<eio_req *> which contains the |
| 223 |
results of the request. The members you can access inside that structure |
| 224 |
vary from request to request, except for: |
| 225 |
|
| 226 |
=over 4 |
| 227 |
|
| 228 |
=item C<ssize_t result> |
| 229 |
|
| 230 |
This contains the result value from the call (usually the same as the |
| 231 |
syscall of the same name). |
| 232 |
|
| 233 |
=item C<int errorno> |
| 234 |
|
| 235 |
This contains the value of C<errno> after the call. |
| 236 |
|
| 237 |
=item C<void *data> |
| 238 |
|
| 239 |
The C<void *data> member simply stores the value of the C<data> argument. |
| 240 |
|
| 241 |
=back |
| 242 |
|
| 243 |
Members not explicitly described as accessible must not be |
| 244 |
accessed. Specifically, there is no guarantee that any members will still |
| 245 |
have the value they had when the request was submitted. |
| 246 |
|
| 247 |
The return value of the callback is normally C<0>, which tells libeio to |
| 248 |
continue normally. If a callback returns a nonzero value, libeio will |
| 249 |
stop processing results (in C<eio_poll>) and will return the value to its |
| 250 |
caller. |
| 251 |
|
| 252 |
Memory areas passed to libeio wrappers must stay valid as long as a |
| 253 |
request executes, with the exception of paths, which are being copied |
| 254 |
internally. Any memory libeio itself allocates will be freed after the |
| 255 |
finish callback has been called. If you want to manage all memory passed |
| 256 |
to libeio yourself you can use the low-level API. |
| 257 |
|
| 258 |
For example, to open a file, you could do this: |
| 259 |
|
| 260 |
   static int
   file_open_done (eio_req *req)
   {
     if (req->result < 0)
       {
         /* open() returned -1 */
         errno = req->errorno;
         perror ("open");
       }
     else
       {
         int fd = req->result;
         /* now we have the new fd in fd */
       }

     return 0;
   }

   /* the first three arguments are passed to open(2) */
   /* the remaining are priority, callback and data */
   if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
     abort (); /* something went wrong, we will all die!!! */
| 282 |
|
| 283 |
Note that you additionally need to call C<eio_poll> when the C<want_poll>
callback indicates that requests are ready to be processed.
| 285 |
|
| 286 |
=head2 CANCELLING REQUESTS |
| 287 |
|
| 288 |
Sometimes the need for a request goes away before the request is |
| 289 |
finished. In that case, one can cancel the request by a call to |
| 290 |
C<eio_cancel>: |
| 291 |
|
| 292 |
=over 4 |
| 293 |
|
| 294 |
=item eio_cancel (eio_req *req) |
| 295 |
|
| 296 |
Cancel the request (and all its subrequests). If the request is currently |
| 297 |
executing it might still continue to execute, and in other cases it might |
| 298 |
still take a while till the request is cancelled. |
| 299 |
|
| 300 |
When cancelled, the finish callback will not be invoked. |
| 301 |
|
| 302 |
C<EIO_CANCELLED> is still true for requests that have successfully |
| 303 |
executed, as long as C<eio_cancel> was called on them at some point. |
| 304 |
|
| 305 |
=back |
| 306 |
|
| 307 |
=head2 AVAILABLE REQUESTS |
| 308 |
|
| 309 |
The following request functions are available. I<All> of them return the |
| 310 |
C<eio_req *> on success and C<0> on failure, and I<all> of them have the |
| 311 |
same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is |
| 312 |
mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some |
| 313 |
custom data value as C<data>. |
| 314 |
|
| 315 |
=head3 POSIX API WRAPPERS |
| 316 |
|
| 317 |
These requests simply wrap the POSIX call of the same name, with the same |
| 318 |
arguments. If a function is not implemented by the OS and cannot be emulated |
| 319 |
in some way, then all of these return C<-1> and set C<errorno> to C<ENOSYS>. |
| 320 |
|
| 321 |
=over 4 |
| 322 |
|
| 323 |
=item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data) |
| 324 |
|
| 325 |
=item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data) |
| 326 |
|
| 327 |
=item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data) |
| 328 |
|
| 329 |
=item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data) |
| 330 |
|
| 331 |
=item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data) |
| 332 |
|
| 333 |
=item eio_rmdir (const char *path, int pri, eio_cb cb, void *data) |
| 334 |
|
| 335 |
=item eio_unlink (const char *path, int pri, eio_cb cb, void *data) |
| 336 |
|
| 337 |
=item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data) |
| 338 |
|
| 339 |
=item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data) |
| 340 |
|
| 341 |
=item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
| 342 |
|
| 343 |
=item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
| 344 |
|
| 345 |
=item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data) |
| 346 |
|
| 347 |
=item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data) |
| 348 |
|
| 349 |
=item eio_close (int fd, int pri, eio_cb cb, void *data) |
| 350 |
|
| 351 |
=item eio_sync (int pri, eio_cb cb, void *data) |
| 352 |
|
| 353 |
=item eio_fsync (int fd, int pri, eio_cb cb, void *data) |
| 354 |
|
| 355 |
=item eio_fdatasync (int fd, int pri, eio_cb cb, void *data) |
| 356 |
|
| 357 |
=item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data) |
| 358 |
|
| 359 |
=item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data) |
| 360 |
|
| 361 |
=item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data) |
| 362 |
|
| 363 |
=item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data) |
| 364 |
|
| 365 |
=item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data) |
| 366 |
|
| 367 |
These have the same semantics as the syscall of the same name, their |
| 368 |
return value is available as C<< req->result >> later. |
| 369 |
|
| 370 |
=item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data) |
| 371 |
|
| 372 |
=item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data) |
| 373 |
|
| 374 |
These two requests are called C<read> and C<write>, but actually wrap |
| 375 |
C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin), |
| 376 |
libeio uses lseek/read_or_write/lseek and a mutex to serialise the |
| 377 |
requests, so all these requests run serially and do not disturb each |
| 378 |
other. However, they still disturb the file offset while they run, so it's |
| 379 |
not safe to call these functions concurrently with non-libeio functions on |
| 380 |
the same fd on these systems. |
| 381 |
|
| 382 |
Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X), |
| 383 |
so it is advised not to submit multiple requests on the same fd on this |
| 384 |
horrible pile of garbage. |
| 385 |
|
| 386 |
=item eio_mlockall (int flags, int pri, eio_cb cb, void *data) |
| 387 |
|
| 388 |
Like C<mlockall>, but the flag value constants are called |
| 389 |
C<EIO_MCL_CURRENT> and C<EIO_MCL_FUTURE>. |
| 390 |
|
| 391 |
=item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data) |
| 392 |
|
| 393 |
Just like msync, except that the flag values are called C<EIO_MS_ASYNC>, |
| 394 |
C<EIO_MS_INVALIDATE> and C<EIO_MS_SYNC>. |
| 395 |
|
| 396 |
=item eio_readlink (const char *path, int pri, eio_cb cb, void *data) |
| 397 |
|
| 398 |
If successful, the path read by C<readlink(2)> can be accessed via C<< |
| 399 |
req->ptr2 >> and is I<NOT> null-terminated, with the length specified as |
| 400 |
C<< req->result >>. |
| 401 |
|
| 402 |
   if (req->result >= 0)
     {
       char *target = strndup ((char *)req->ptr2, req->result);

       free (target);
     }
| 408 |
|
| 409 |
=item eio_realpath (const char *path, int pri, eio_cb cb, void *data) |
| 410 |
|
| 411 |
Similar to the realpath libc function, but unlike that one, C<< |
| 412 |
req->result >> is C<-1> on failure. On success, the result is the length |
| 413 |
of the returned path in C<ptr2> (which is I<NOT> 0-terminated) - this is |
| 414 |
similar to readlink. |
| 415 |
|
| 416 |
=item eio_stat (const char *path, int pri, eio_cb cb, void *data) |
| 417 |
|
| 418 |
=item eio_lstat (const char *path, int pri, eio_cb cb, void *data) |
| 419 |
|
| 420 |
=item eio_fstat (int fd, int pri, eio_cb cb, void *data) |
| 421 |
|
| 422 |
Stats a file - if C<< req->result >> indicates success, then you can |
| 423 |
access the C<struct stat>-like structure via C<< req->ptr2 >>: |
| 424 |
|
| 425 |
EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2; |
| 426 |
|
| 427 |
=item eio_statvfs (const char *path, int pri, eio_cb cb, void *data) |
| 428 |
|
| 429 |
=item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data) |
| 430 |
|
| 431 |
Stats a filesystem - if C<< req->result >> indicates success, then you can |
| 432 |
access the C<struct statvfs>-like structure via C<< req->ptr2 >>: |
| 433 |
|
| 434 |
EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2; |
| 435 |
|
| 436 |
=back |
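
Both C<eio_readlink> and C<eio_realpath> hand back a path that is not
null-terminated, so a callback usually has to copy it before treating it
as a C string. The following sketch shows one way to do that without
relying on C<strndup>; the helper name C<dup_result_path> is hypothetical,
and in a real callback you would pass C<< req->ptr2 >> and C<<
req->result >> as arguments.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy len bytes from ptr into a freshly
   allocated, null-terminated string. Returns 0 on error or when
   len is negative (i.e. the request failed). The caller frees it. */
static char *
dup_result_path (const void *ptr, ssize_t len)
{
  char *copy;

  if (!ptr || len < 0)
    return 0;

  copy = malloc ((size_t)len + 1);

  if (!copy)
    return 0;

  memcpy (copy, ptr, (size_t)len);
  copy[len] = 0;

  return copy;
}
```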
| 437 |
|
| 438 |
=head3 READING DIRECTORIES |
| 439 |
|
| 440 |
Reading directories sounds simple, but can be rather demanding, especially
if you want to do stuff such as traversing a directory hierarchy or
processing all files in a directory. Libeio can assist with these complex
tasks through its C<eio_readdir> call.
| 444 |
|
| 445 |
=over 4 |
| 446 |
|
| 447 |
=item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data) |
| 448 |
|
| 449 |
This is a very complex call. It basically reads through a whole directory |
| 450 |
(via the C<opendir>, C<readdir> and C<closedir> calls) and returns either |
| 451 |
the names or an array of C<struct eio_dirent>, depending on the C<flags> |
| 452 |
argument. |
| 453 |
|
| 454 |
The C<< req->result >> indicates either the number of files found, or
C<-1> on error. On success, null-terminated names can be found as C<<
req->ptr2 >>, and C<struct eio_dirent>s, if requested by C<flags>, can be
found via C<< req->ptr1 >>.
| 458 |
|
| 459 |
Here is an example that prints all the names: |
| 460 |
|
| 461 |
   int i;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       printf ("name #%d: %s\n", i, names);

       /* move to next name */
       names += strlen (names) + 1;
     }
| 471 |
|
| 472 |
Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>. |
| 473 |
|
| 474 |
C<flags> can be any combination of: |
| 475 |
|
| 476 |
=over 4 |
| 477 |
|
| 478 |
=item EIO_READDIR_DENTS |
| 479 |
|
| 480 |
If this flag is specified, then, in addition to the names in C<ptr2>, |
| 481 |
also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct |
| 482 |
eio_dirent> looks like this: |
| 483 |
|
| 484 |
   struct eio_dirent
   {
     int nameofs;            /* offset of null-terminated name string in (char *)req->ptr2 */
     unsigned short namelen; /* size of filename without trailing 0 */
     unsigned char type;     /* one of EIO_DT_* */
     signed char score;      /* internal use */
     ino_t inode;            /* the inode number, if available, otherwise unspecified */
   };
| 492 |
|
| 493 |
The only members you normally would access are C<nameofs>, which is the |
| 494 |
byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>. |
| 495 |
|
| 496 |
C<type> can be one of: |
| 497 |
|
| 498 |
C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat> |
| 499 |
the name yourself if you need to know, |
| 500 |
one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>, |
| 501 |
C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>) |
| 502 |
or some OS-specific type (currently |
| 503 |
C<EIO_DT_MPC> - multiplexed char device (v7+coherent), |
| 504 |
C<EIO_DT_NAM> - xenix special named file, |
| 505 |
C<EIO_DT_MPB> - multiplexed block device (v7+coherent), |
| 506 |
C<EIO_DT_NWK> - HP-UX network special, |
| 507 |
C<EIO_DT_CMP> - VxFS compressed, |
| 508 |
C<EIO_DT_DOOR> - solaris door, or |
| 509 |
C<EIO_DT_WHT>). |
| 510 |
|
| 511 |
This example prints all names and their type: |
| 512 |
|
| 513 |
   int i;
   struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
   char *names = (char *)req->ptr2;

   for (i = 0; i < req->result; ++i)
     {
       struct eio_dirent *ent = ents + i;
       char *name = names + ent->nameofs;

       printf ("name #%d: %s (type %d)\n", i, name, ent->type);
     }
| 524 |
|
| 525 |
=item EIO_READDIR_DIRS_FIRST |
| 526 |
|
| 527 |
When this flag is specified, then the names will be returned in an order
where likely directories come first, in optimal C<stat> order. This is
useful when you need to quickly find directories, or you want to find all
directories while avoiding having to stat() each entry.
| 531 |
|
| 532 |
If the system returns type information in readdir, then this is used |
| 533 |
to find directories directly. Otherwise, likely directories are names |
| 534 |
beginning with ".", or otherwise names with no dots, of which names with |
| 535 |
short names are tried first. |
| 536 |
|
| 537 |
=item EIO_READDIR_STAT_ORDER |
| 538 |
|
| 539 |
When this flag is specified, then the names will be returned in an order |
| 540 |
suitable for stat()'ing each one. That is, when you plan to stat() |
| 541 |
all files in the given directory, then the returned order will likely |
| 542 |
be fastest. |
| 543 |
|
| 544 |
If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then the |
| 545 |
likely directories come first, resulting in a less optimal stat order. |
| 546 |
|
| 547 |
=item EIO_READDIR_FOUND_UNKNOWN |
| 548 |
|
| 549 |
This flag should not be specified when calling C<eio_readdir>. Instead,
it is being set by C<eio_readdir> (you can access the C<flags> via C<<
req->int1 >>) when any of the C<type>'s found were C<EIO_DT_UNKNOWN>. The
absence of this flag therefore indicates that all C<type>'s are known,
which can be used to speed up some algorithms.
| 554 |
|
| 555 |
A typical use case would be to identify all subdirectories within a |
| 556 |
directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If |
| 557 |
then this flag is I<NOT> set, then all the entries at the beginning of the |
| 558 |
returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you |
| 559 |
should start C<stat()>'ing the entries starting at the beginning of the |
| 560 |
array, stopping as soon as you found all directories (the count can be |
| 561 |
deduced by the link count of the directory). |
| 562 |
|
| 563 |
=back |
| 564 |
|
| 565 |
=back |
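
The use case described under C<EIO_READDIR_FOUND_UNKNOWN> (directories
sorted to the front, all types known) boils down to counting the leading
directory entries. The sketch below shows only that counting step. To stay
self-contained it uses a stand-in struct and stand-in type constants with
arbitrary values; real code would use C<struct eio_dirent> from
C<< req->ptr1 >> and compare against C<EIO_DT_DIR> from F<eio.h>.

```c
#include <stddef.h>

/* Stand-ins for illustration only; the values are arbitrary and
   are not claimed to match the EIO_DT_* constants. */
enum { SKETCH_DT_UNKNOWN, SKETCH_DT_DIR, SKETCH_DT_REG };

/* Stand-in for struct eio_dirent, reduced to the members used here. */
struct sketch_dirent
{
  int nameofs;
  unsigned char type;
};

/* Count the entries at the beginning of the array that are directories.
   Only meaningful when the entries were sorted directories-first and
   no entry has an unknown type. */
static size_t
leading_dirs (const struct sketch_dirent *ents, size_t count)
{
  size_t n = 0;

  while (n < count && ents[n].type == SKETCH_DT_DIR)
    ++n;

  return n;
}
```

When the unknown-type flag I<is> set, this shortcut does not apply and the
entries have to be C<stat>'ed as described above.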
| 566 |
|
| 567 |
=head3 OS-SPECIFIC CALL WRAPPERS |
| 568 |
|
| 569 |
These wrap OS-specific calls (usually Linux ones), and might or might not |
| 570 |
be emulated on other operating systems. Calls that are not emulated will |
| 571 |
return C<-1> and set C<errno> to C<ENOSYS>. |
| 572 |
|
| 573 |
=over 4 |
| 574 |
|
| 575 |
=item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data) |
| 576 |
|
| 577 |
Wraps the C<sendfile> syscall. The arguments follow the Linux version, but |
| 578 |
libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and |
| 579 |
Darwin. |
| 580 |
|
| 581 |
If the OS doesn't support some sendfile-like call, or the call fails,
indicating missing support for the given file descriptor type (for example,
Linux's sendfile might not support file to file copies), then libeio will
emulate the call in userspace, so there are almost no limitations on its
use.
| 586 |
|
| 587 |
=item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data) |
| 588 |
|
| 589 |
Calls C<readahead(2)>. If the syscall is missing, then the call is |
| 590 |
emulated by simply reading the data (currently in 64kiB chunks). |
| 591 |
|
| 592 |
=item eio_syncfs (int fd, int pri, eio_cb cb, void *data) |
| 593 |
|
| 594 |
Calls Linux's C<syncfs> syscall, if available. If the call is missing, the
request returns C<-1> and sets C<errno> to C<ENOSYS>, I<but still calls
sync()> when the C<fd> is C<< >= 0 >>. You can therefore probe for the
availability of the syscall by passing a negative C<fd> argument and
checking for C<-1/ENOSYS>.
| 598 |
|
| 599 |
=item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data) |
| 600 |
|
| 601 |
Calls C<sync_file_range>. If the syscall is missing, then this is the same |
| 602 |
as calling C<fdatasync>. |
| 603 |
|
| 604 |
Flags can be any combination of C<EIO_SYNC_FILE_RANGE_WAIT_BEFORE>, |
| 605 |
C<EIO_SYNC_FILE_RANGE_WRITE> and C<EIO_SYNC_FILE_RANGE_WAIT_AFTER>. |
| 606 |
|
| 607 |
=item eio_fallocate (int fd, int mode, off_t offset, off_t len, int pri, eio_cb cb, void *data) |
| 608 |
|
| 609 |
Calls C<fallocate> (note: I<NOT> C<posix_fallocate>!). If the syscall is |
| 610 |
missing, then it returns failure and sets C<errno> to C<ENOSYS>. |
| 611 |
|
| 612 |
The C<mode> argument can be C<0> (for behaviour similar to |
| 613 |
C<posix_fallocate>), or C<EIO_FALLOC_FL_KEEP_SIZE>, which keeps the size |
| 614 |
of the file unchanged (but still preallocates space beyond end of file). |
| 615 |
|
| 616 |
=back |
| 617 |
|
| 618 |
=head3 LIBEIO-SPECIFIC REQUESTS |
| 619 |
|
| 620 |
These requests are specific to libeio and do not correspond to any OS call. |
| 621 |
|
| 622 |
=over 4 |
| 623 |
|
| 624 |
=item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data) |
| 625 |
|
| 626 |
Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given |
| 627 |
memory area, page-wise, that is, it reads (or reads and writes back) the |
| 628 |
first octet of every page that spans the memory area. |
| 629 |
|
| 630 |
This can be used to page in some mmapped file, or dirty some pages. Note
that dirtying is an unlocked read-write access, so races can ensue when
some other thread modifies the data stored in that memory area.
| 633 |
|
| 634 |
=item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data) |
| 635 |
|
| 636 |
Executes a custom request, i.e., a user-specified callback. |
| 637 |
|
| 638 |
The callback gets the C<eio_req *> as parameter and is expected to read |
| 639 |
and modify any request-specific members. Specifically, it should set C<< |
| 640 |
req->result >> to the result value, just like other requests. |
| 641 |
|
| 642 |
Here is an example that simply calls C<open>, like C<eio_open>, but it |
| 643 |
uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If |
| 644 |
you want to pass more/other parameters, you either need to pass some |
| 645 |
struct or so via C<data> or provide your own wrapper using the low-level |
| 646 |
API. |
| 647 |
|
| 648 |
   static int
   my_open_done (eio_req *req)
   {
     int fd = req->result;

     return 0;
   }

   static void
   my_open (eio_req *req)
   {
     req->result = open (req->data, O_RDONLY);
   }

   eio_custom (my_open, 0, my_open_done, "/etc/passwd");
| 663 |
|
| 664 |
=item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data) |
| 665 |
|
| 666 |
This is a request that takes C<delay> seconds to execute, but otherwise |
| 667 |
does nothing - it simply puts one of the worker threads to sleep for this |
| 668 |
long. |
| 669 |
|
| 670 |
This request can be used to artificially increase load, e.g. for debugging |
| 671 |
or benchmarking reasons. |
| 672 |
|
| 673 |
=item eio_nop (int pri, eio_cb cb, void *data) |
| 674 |
|
| 675 |
This request does nothing, except go through the whole request cycle. This |
| 676 |
can be used to measure latency or in some cases to simplify code, but is |
| 677 |
not really of much use. |
| 678 |
|
| 679 |
=back |
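
To illustrate what "page-wise" means in the C<eio_mtouch> description
above, here is a synchronous sketch that touches one octet in every page
overlapping a memory area. This is not libeio's implementation: the
function name C<touch_pages> is hypothetical, and unlike the description
it starts at C<addr> itself rather than at the first octet of the first
page, so that it never accesses memory outside the given area.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read (modify == 0) or read and write back
   (modify != 0) one octet per page overlapping [addr, addr + length).
   Returns the number of pages touched. */
static size_t
touch_pages (void *addr, size_t length, int modify)
{
  size_t page = (size_t)sysconf (_SC_PAGESIZE);
  volatile unsigned char *base = addr;
  size_t off = 0;
  size_t touched = 0;

  while (off < length)
    {
      unsigned char c = base[off]; /* read one octet in this page */
      size_t step;

      if (modify)
        base[off] = c;             /* write it back unchanged to dirty the page */

      ++touched;

      /* distance from the current octet to the start of the next page */
      step = page - (size_t)((uintptr_t)(base + off) % page);

      if (step >= length - off)
        break;                     /* the next page lies outside the area */

      off += step;
    }

  return touched;
}
```

The write-back is exactly the unlocked read-write access that the text
warns about: if another thread changes the octet between the read and the
write, that change is lost.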
| 680 |
|
| 681 |
=head3 GROUPING AND LIMITING REQUESTS |
| 682 |
|
| 683 |
There is one more rather special request, C<eio_grp>. Instead of doing
something itself, it is a container for other eio requests.
| 686 |
|
| 687 |
There are two primary use cases for this: a) bundle many requests into a |
| 688 |
single, composite, request with a definite callback and the ability to |
| 689 |
cancel the whole request with its subrequests and b) limiting the number |
| 690 |
of "active" requests. |
| 691 |
|
| 692 |
Further below you will find more discussion of these topics - first |
| 693 |
follows the reference section detailing the request generator and other |
| 694 |
methods. |
| 695 |
|
=over 4

=item eio_req *grp = eio_grp (eio_cb cb, void *data)

Creates, submits and returns a group request. Note that it doesn't have a
priority, unlike all other requests.

=item eio_grp_add (eio_req *grp, eio_req *req)

Adds a request to the request group.

=item eio_grp_cancel (eio_req *grp)

Cancels all requests I<in> the group, but I<not> the group request
itself. You can cancel the group request I<and> all subrequests via a
normal C<eio_cancel> call.

=back

=head4 GROUP REQUEST LIFETIME

Left alone, a group request will instantly move to the pending state and
will be finished at the next call of C<eio_poll>.

The usefulness stems from the fact that, if a subrequest is added to a
group I<before> a call to C<eio_poll>, via C<eio_grp_add>, then the group
will not finish until all the subrequests have finished.

So the usage cycle of a group request is like this: after it is created,
you normally instantly add a subrequest. If none is added, the group
request will finish on its own. As long as subrequests are added before
the group request is finished, it will be kept from finishing. That is,
the callbacks of any subrequests can, in turn, add more requests to the
group, and as long as any requests are active, the group request itself
will not finish.

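The lifetime rule above can be modelled in a few lines of plain C. The
following is a self-contained toy and not libeio code: C<toy_grp>,
C<toy_grp_add>, C<toy_sub_done> and C<toy_poll> are hypothetical names
invented for this sketch, and refusing additions to a finished group is an
assumption of the model. It only illustrates that a group without active
subrequests finishes at the next poll, while every added subrequest defers
that.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a group request; NOT the real eio_req layout. */
typedef struct
{
  int active;   /* subrequests added but not yet finished */
  int finished; /* set once the group itself has finished */
} toy_grp;

/* Model of adding a subrequest to the group. */
int
toy_grp_add (toy_grp *grp)
{
  assert (grp != NULL);

  if (grp->finished)
    return -1; /* assumption of this model: a finished group is closed */

  ++grp->active;
  return 0;
}

/* Model of one subrequest finishing. */
void
toy_sub_done (toy_grp *grp)
{
  assert (grp != NULL && grp->active > 0);
  --grp->active;
}

/* Model of a poll: the group finishes only when nothing is active. */
int
toy_poll (toy_grp *grp)
{
  assert (grp != NULL);

  if (!grp->finished && grp->active == 0)
    grp->finished = 1;

  return grp->finished;
}
```

In this model, a subrequest callback that wants to extend the group would
call C<toy_grp_add> before its own C<toy_sub_done>, so the counter never
drops to zero in between.
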
=head4 CREATING COMPOSITE REQUESTS

Imagine you wanted to create an C<eio_load> request that opens a file,
reads it and closes it. This means it has to execute at least three eio
requests, but for various reasons it might be nice if that request looked
like any other eio request.

This can be done with groups:

=over 4

=item 1) create the request object

Create a group that contains all further requests. This is the request you
can return as "the load request".

=item 2) open the file, maybe

Next, open the file with C<eio_open> and add the request to the group
request. With that, you are finished setting up the request.

If, for some reason, you cannot C<eio_open> (path is a null pointer?), you
can set C<< grp->result >> to C<-1> to signal an error and let the group
request finish on its own.

=item 3) open callback adds more requests

In the open callback, if the open was not successful, copy
C<< req->errorno >> to C<< grp->errorno >> and set C<< grp->result >> to
C<-1> to signal an error.

Otherwise, allocate a buffer (for example with C<malloc>) and issue a read
request, adding the read request to the group.

=item 4) continue issuing requests till finished

In the read callback, check for errors and possibly continue with
C<eio_close> or any other eio request in the same way.

As soon as no new requests are added, the group request will finish. Make
sure you I<always> set C<< grp->result >> to some sensible value.

=back

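The error handling in the steps above can be sketched without libeio. The
following toy runs the three steps synchronously with plain POSIX calls, so
it is a model of the bookkeeping only: a real composite request would issue
C<eio_open>, C<eio_read> and C<eio_close> and add each one to the group.
C<toy_grp>, C<toy_fail> and C<toy_load> are hypothetical names, the C<max>
parameter is invented for the sketch, and using C<EINVAL> for a null path is
an assumption (the text only asks for a result of C<-1>).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for the group request; only the two fields the text names. */
typedef struct
{
  ssize_t result;  /* -1 on error, else number of bytes loaded */
  int     errorno; /* errno of the failing step, if any */
} toy_grp;

/* Record a failed step in the group, as steps 2 and 3 describe. */
static void
toy_fail (toy_grp *grp, int err)
{
  grp->errorno = err;
  grp->result  = -1;
}

/* Runs open, read and close in order; returns a malloc'ed buffer or NULL. */
char *
toy_load (toy_grp *grp, const char *path, size_t max)
{
  assert (grp != NULL);

  grp->result  = 0;
  grp->errorno = 0;

  if (!path) /* step 2: the open cannot even be issued */
    {
      toy_fail (grp, EINVAL); /* EINVAL is this sketch's choice */
      return NULL;
    }

  int fd = open (path, O_RDONLY);

  if (fd < 0) /* step 3: open failed, copy the error to the group */
    {
      toy_fail (grp, errno);
      return NULL;
    }

  char *buf = malloc (max ? max : 1);

  if (!buf)
    toy_fail (grp, ENOMEM);
  else
    {
      ssize_t len = read (fd, buf, max);

      if (len < 0)
        toy_fail (grp, errno);
      else
        grp->result = len; /* always leave a sensible result behind */
    }

  close (fd); /* step 4: the chain ends with the close */

  if (grp->result < 0)
    {
      free (buf); /* free (NULL) is harmless */
      return NULL;
    }

  return buf;
}
```

The caller owns the returned buffer and should C<free> it. Whatever path is
taken, C<result> is set before the function returns, which mirrors the advice
to always leave the group with a sensible result.
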
=head4 REQUEST LIMITING

#TODO

   void eio_grp_limit (eio_req *grp, int limit);

=head1 LOW LEVEL REQUEST API

#TODO

=head1 ANATOMY AND LIFETIME OF AN EIO REQUEST

A request is represented by a structure of type C<eio_req>. To initialise
it, clear it to all zero bytes:

   eio_req req;

   memset (&req, 0, sizeof (req));

A more common way to initialise a new C<eio_req> is to use C<calloc>:

   eio_req *req = calloc (1, sizeof (*req));

In either case, libeio neither allocates, initialises nor frees the
C<eio_req> structure for you - it merely uses it.

zero

#TODO

=head2 CONFIGURATION

The functions in this section can sometimes be useful, but the default
configuration will do in most cases, so you should skip this section on
first reading.

=over 4

=item eio_set_max_poll_time (eio_tstamp nseconds)

This causes C<eio_poll ()> to return after it has detected that it was
running for C<nseconds> seconds or longer (this number can be fractional).

This can be used to limit the amount of time spent handling eio requests.
For example, in interactive programs, you might want to limit this time to
C<0.01> seconds or so.

Note that:

=over 4

=item a) libeio doesn't know how long your request callbacks take, so the
time spent in C<eio_poll> is up to one callback invocation longer than
this interval.

=item b) this is implemented by calling C<gettimeofday> after each
request, which can be costly.

=item c) at least one request will be handled.

=back

=item eio_set_max_poll_reqs (unsigned int nreqs)

When C<nreqs> is non-zero, then C<eio_poll> will not handle more than
C<nreqs> requests per invocation. This is a less costly way to limit the
amount of work done by C<eio_poll> than setting a time limit.

If you know your callbacks are generally fast, you could use this to
encourage interactiveness in your programs by setting it to C<10>, C<100>
or even C<1000>.

=item eio_set_min_parallel (unsigned int nthreads)

Make sure libeio can handle at least this many requests in parallel. It
might be able to handle more.

=item eio_set_max_parallel (unsigned int nthreads)

Set the maximum number of threads that libeio will spawn.

=item eio_set_max_idle (unsigned int nthreads)

Libeio uses threads internally to handle most requests, and will start and
stop threads on demand.

This call can be used to limit the number of idle threads (threads without
work to do): libeio will keep some threads idle in preparation for more
requests, but never more than C<nthreads> threads.

In addition to this, libeio will also stop threads when they are idle for
a few seconds, regardless of this setting.

=item unsigned int eio_nthreads ()

Returns the number of worker threads currently running.

=item unsigned int eio_nreqs ()

Returns the number of requests currently handled by libeio. This is the
total number of requests that have been submitted to libeio, but not yet
destroyed.

=item unsigned int eio_nready ()

Returns the number of ready requests, i.e. requests that have been
submitted but have not yet entered the execution phase.

=item unsigned int eio_npending ()

Returns the number of pending requests, i.e. requests that have been
executed and have results, but have not been finished yet by a call to
C<eio_poll>.

=back

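The two poll limits described for C<eio_set_max_poll_time> and
C<eio_set_max_poll_reqs> can be modelled with a small loop. This is a
self-contained sketch of the documented behaviour, not the library's
implementation: C<toy_limits>, C<toy_now>, C<toy_noop> and C<toy_poll> are
hypothetical names, and treating a time limit of C<0> as "no limit" is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical settings mirroring the two limits; 0 means "no limit". */
typedef struct
{
  double       max_poll_time; /* seconds, may be fractional */
  unsigned int max_poll_reqs; /* requests per poll call */
} toy_limits;

/* Current time as fractional seconds since the epoch. */
double
toy_now (void)
{
  struct timeval tv;

  gettimeofday (&tv, NULL);
  return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* A do-nothing request callback, handy for experiments. */
void
toy_noop (void)
{
}

/*
 * Handles up to npending requests by calling handle () for each one.
 * Returns how many were handled before a limit stopped the loop.
 */
unsigned int
toy_poll (const toy_limits *lim, unsigned int npending, void (*handle) (void))
{
  assert (lim != NULL && handle != NULL);

  unsigned int handled = 0;
  double start = lim->max_poll_time > 0. ? toy_now () : 0.;

  while (handled < npending)
    {
      handle (); /* limits are checked afterwards, so one always runs */
      ++handled;

      if (lim->max_poll_reqs && handled >= lim->max_poll_reqs)
        break;

      /* the clock is read only after a request, so one callback can overrun */
      if (lim->max_poll_time > 0. && toy_now () - start >= lim->max_poll_time)
        break;
    }

  return handled;
}
```

The request limit costs one integer comparison per request, while the time
limit costs one clock read per request. That difference is why the request
limit is described as the less costly option.
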
=head1 EMBEDDING

Libeio can be embedded directly into programs. This functionality is not
documented and not (yet) officially supported.

Note that, when including C<libeio.m4>, you are responsible for defining
the compilation environment (C<_LARGEFILE_SOURCE>, C<_GNU_SOURCE> etc.).

If you need to know how, check the C<IO::AIO> perl module, which does
exactly that.

=head1 COMPILETIME CONFIGURATION

These symbols, if used, must be defined when compiling F<eio.c>.

=over 4

=item EIO_STACKSIZE

This symbol governs the stack size for each eio thread. Libeio itself
was written to use very little stack space, but when using C<EIO_CUSTOM>
requests, you might want to increase this.

If this symbol is undefined (the default) then libeio will use its default
stack size (C<sizeof (void *) * 4096> currently). In all other cases, the
value must be an expression that evaluates to the desired stack size.

=back

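A compile-time symbol with a fallback default, as described for
C<EIO_STACKSIZE>, is commonly written as an C<#ifndef> guard. The sketch
below shows that general pattern only. C<TOY_STACKSIZE> and
C<toy_stack_size> are hypothetical names, and nothing here is taken from
F<eio.c> apart from the default expression quoted in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compile-time symbol such as EIO_STACKSIZE. */
#ifndef TOY_STACKSIZE
# define TOY_STACKSIZE (sizeof (void *) * 4096) /* default quoted in the text */
#endif

/* Returns the stack size, in bytes, that this build would request. */
size_t
toy_stack_size (void)
{
  size_t size = TOY_STACKSIZE;

  assert (size > 0);
  return size;
}
```

With this pattern, a build that passes something like
C<-DTOY_STACKSIZE='(256 * 1024)'> to the compiler overrides the default,
while a build without the option keeps it.
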
=head1 PORTABILITY REQUIREMENTS

In addition to a working ISO-C implementation, libeio relies on a few
additional extensions:

=over 4

=item POSIX threads

To be portable, this module uses threads. Specifically, the POSIX threads
library must be available (and working, which partially excludes many xBSD
systems, where C<fork ()> is buggy).

=item POSIX-compatible filesystem API

This is actually a harder portability requirement: The libeio API is quite
demanding regarding POSIX API calls (symlinks, user/group management
etc.).

=item C<double> must hold a time value in seconds with enough accuracy

The type C<double> is used to represent timestamps. It is required to
have at least 51 bits of mantissa (and 9 bits of exponent), which is good
enough until at least the year 4000. This requirement is fulfilled by
implementations implementing IEEE 754 (basically all existing ones).

=back

If you know of other additional requirements, drop me a note.

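The mantissa requirement on C<double> can be checked with the standard
F<float.h> macros. The following is a minimal sketch with hypothetical names
(C<toy_double_ok>, C<toy_gap_bound>). It checks only the mantissa width, not
the exponent range. The figure of roughly C<6.4e10> seconds for the year
4000 is an approximation computed for this sketch, not a value from libeio.

```c
#include <assert.h>
#include <float.h>

/* Does double meet the documented minimum of 51 mantissa bits? */
int
toy_double_ok (void)
{
  return FLT_RADIX == 2 && DBL_MANT_DIG >= 51;
}

/*
 * Upper bound on the gap between t and the next larger double. For a
 * binary double in the normal range the gap never exceeds t * DBL_EPSILON.
 */
double
toy_gap_bound (double t)
{
  assert (t > 0.);
  return t * DBL_EPSILON;
}
```

Calling C<toy_gap_bound (6.4e10)> gives a rough idea of how finely a
timestamp near the year 4000 can still be resolved. On an IEEE 754 double
the bound is far below a millisecond.
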
=head1 AUTHOR

Marc Lehmann <libeio@schmorp.de>.
