
Comparing libeio/eio.pod (file contents):
Revision 1.5 by root, Sun Nov 29 15:53:48 2009 UTC vs.
Revision 1.9 by root, Sun Jun 5 23:07:46 2011 UTC

11The newest version of this document is also available as an html-formatted 11The newest version of this document is also available as an html-formatted
12web page you might find easier to navigate when reading it for the first 12web page you might find easier to navigate when reading it for the first
13time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>. 13time: L<http://pod.tst.eu/http://cvs.schmorp.de/libeio/eio.pod>.
14 14
15Note that this library is a by-product of the C<IO::AIO> perl 15Note that this library is a by-product of the C<IO::AIO> perl
16module, and many of the subtler points regarding requets lifetime 16module, and many of the subtler points regarding requests lifetime
17and so on are only documented in its documentation at the 17and so on are only documented in its documentation at the
18moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>. 18moment: L<http://pod.tst.eu/http://cvs.schmorp.de/IO-AIO/AIO.pm>.
19 19
20=head2 FEATURES 20=head2 FEATURES
21 21
22This library provides fully asynchronous versions of most POSIX functions 22This library provides fully asynchronous versions of most POSIX functions
23dealign with I/O. Unlike most asynchronous libraries, this not only 23dealing with I/O. Unlike most asynchronous libraries, this not only
24includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and 24includes C<read> and C<write>, but also C<open>, C<stat>, C<unlink> and
25similar functions, as well as less commonly used ones such as C<mknod>, C<futime> 25similar functions, as well as less commonly used ones such as C<mknod>, C<futime>
26or C<readlink>. 26or C<readlink>.
27 27
28It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and 28It also offers wrappers around C<sendfile> (Solaris, Linux, HP-UX and
37=head2 TIME REPRESENTATION 37=head2 TIME REPRESENTATION
38 38
39Libeio represents time as a single floating point number, representing the 39Libeio represents time as a single floating point number, representing the
40(fractional) number of seconds since the (POSIX) epoch (somewhere near 40(fractional) number of seconds since the (POSIX) epoch (somewhere near
41the beginning of 1970, details are complicated, don't ask). This type is 41the beginning of 1970, details are complicated, don't ask). This type is
42called C<eio_tstamp>, but it is guarenteed to be of type C<double> (or 42called C<eio_tstamp>, but it is guaranteed to be of type C<double> (or
43better), so you can freely use C<double> yourself. 43better), so you can freely use C<double> yourself.
44 44
45Unlike the name component C<stamp> might indicate, it is also used for 45Unlike the name component C<stamp> might indicate, it is also used for
46time differences throughout libeio. 46time differences throughout libeio.
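As a rough illustration of this representation, the following sketch converts between a fractional-seconds C<double> and the usual C<struct timeval>. The C<demo_tstamp> typedef and both helper names are hypothetical stand-ins invented for this example and are not part of libeio; only the "seconds as a C<double>" convention is taken from the text above.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical stand-in for eio_tstamp, which the text says is a double. */
typedef double demo_tstamp;

/* Split fractional seconds into whole seconds and microseconds. */
static struct timeval
demo_tstamp_to_timeval (demo_tstamp t)
{
  struct timeval tv;

  assert (t >= 0.); /* this sketch only handles non-negative values */

  tv.tv_sec  = (time_t)t;
  tv.tv_usec = (suseconds_t)((t - (demo_tstamp)tv.tv_sec) * 1e6);

  return tv;
}

/* The reverse direction; works for absolute times and differences alike. */
static demo_tstamp
demo_timeval_to_tstamp (struct timeval tv)
{
  return (demo_tstamp)tv.tv_sec + (demo_tstamp)tv.tv_usec / 1e6;
}
```

Since the same type is used for time differences, a delay of one and a half seconds would simply be written as C<1.5>.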
47 47
55 3. in the parent, continue business as usual, done 55 3. in the parent, continue business as usual, done
56 4. in the child, destroy all ready and pending requests and free the 56 4. in the child, destroy all ready and pending requests and free the
57 memory used by the worker threads. This gives you a fully empty 57 memory used by the worker threads. This gives you a fully empty
58 libeio queue. 58 libeio queue.
59 59
60Note, however, since libeio does use threads, the above guarantee doesn't
61cover your libc: for example, malloc and other libc functions are not
62fork-safe, so there is very little you can do after a fork, and in fact,
63the above might crash, and thus change.
64
60=head1 INITIALISATION/INTEGRATION 65=head1 INITIALISATION/INTEGRATION
61 66
62Before you can call any eio functions you first have to initialise the 67Before you can call any eio functions you first have to initialise the
63library. The library integrates into any event loop, but can also be used 68library. The library integrates into any event loop, but can also be used
64without one, including in polling mode. 69without one, including in polling mode.
97handled or C<done_poll> has been called, which signals the same. 102handled or C<done_poll> has been called, which signals the same.
98 103
99Note that C<eio_poll> might return after C<done_poll> and C<want_poll> 104Note that C<eio_poll> might return after C<done_poll> and C<want_poll>
100have been called again, so watch out for races in your code. 105have been called again, so watch out for races in your code.
101 106
102As with C<want_poll>, this callback is called while lcoks are being held, 107As with C<want_poll>, this callback is called while locks are being held,
103so you I<must not call any libeio functions from within this callback>. 108so you I<must not call any libeio functions from within this callback>.
104 109
105=item int eio_poll () 110=item int eio_poll ()
106 111
107This function has to be called whenever there are pending requests that 112This function has to be called whenever there are pending requests that
126libev resets/rearms the async watcher before calling your callback, 131libev resets/rearms the async watcher before calling your callback,
127and therefore, before calling C<eio_poll>. This might result in (some) 132and therefore, before calling C<eio_poll>. This might result in (some)
128spurious wake-ups, but is generally harmless. 133spurious wake-ups, but is generally harmless.
129 134
130For most other event loops, you would typically use a pipe - the event 135For most other event loops, you would typically use a pipe - the event
131loop should be told to wait for read readyness on the read end. In 136loop should be told to wait for read readiness on the read end. In
132C<want_poll> you would write a single byte, in C<done_poll> you would try 137C<want_poll> you would write a single byte, in C<done_poll> you would try
133to read that byte, and in the callback for the read end, you would call 138to read that byte, and in the callback for the read end, you would call
134C<eio_poll>. The race is avoided here because the event loop should invoke 139C<eio_poll>. The race is avoided here because the event loop should invoke
135your callback again and again until the byte has been read (as the pipe 140your callback again and again until the byte has been read (as the pipe
136read callback does not read it, only C<done_poll>). 141read callback does not read it, only C<done_poll>).
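The pipe scheme described above could look roughly like the following sketch. It deliberately contains no libeio calls: all C<demo_*> names are hypothetical, C<demo_want_poll> and C<demo_done_poll> only show what the bodies of such callbacks might do, and C<demo_wake_pending> stands in for whatever readiness check your event loop performs on the read end.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int demo_pipe[2] = { -1, -1 }; /* [0] = read end, [1] = write end */

/* Create the pipe; both ends non-blocking so neither callback can stall. */
static int
demo_wake_init (void)
{
  int i;

  if (pipe (demo_pipe) < 0)
    return -1;

  for (i = 0; i < 2; ++i)
    {
      int fl = fcntl (demo_pipe[i], F_GETFL);

      if (fl < 0 || fcntl (demo_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    }

  return 0;
}

/* What a want_poll callback might do: make the read end readable. */
static void
demo_want_poll (void)
{
  char dummy = 0;

  assert (demo_pipe[1] >= 0);

  if (write (demo_pipe[1], &dummy, 1) < 0)
    ; /* pipe full or interrupted: a wake-up is already queued or gets retried */
}

/* What a done_poll callback might do: try to take the byte back out. */
static void
demo_done_poll (void)
{
  char dummy;

  assert (demo_pipe[0] >= 0);

  if (read (demo_pipe[0], &dummy, 1) < 0)
    ; /* nothing to read: harmless */
}

/* Stand-in for the event loop's "is the read end readable?" check. */
static int
demo_wake_pending (void)
{
  struct pollfd pfd;

  pfd.fd      = demo_pipe[0];
  pfd.events  = POLLIN;
  pfd.revents = 0;

  return poll (&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
}
```

Note that the two callbacks only use C<read> and C<write>, which fits the rule that no libeio functions may be called from within them. The read-end handler in your event loop is where C<eio_poll> would be called.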
137 142
143
144=head1 HIGH LEVEL REQUEST API
145
146Libeio has both a high-level API, which consists of calling a request
147function with a callback to be called on completion, and a low-level API
148where you fill out request structures and submit them.
149
150This section describes the high-level API.
151
152=head2 REQUEST SUBMISSION AND RESULT PROCESSING
153
154You submit a request by calling the relevant C<eio_TYPE> function with the
155required parameters, a callback of type C<int (*eio_cb)(eio_req *req)>
156(called C<eio_cb> below) and a freely usable C<void *data> argument.
157
158The return value will either be 0 on failure, or the C<eio_req *> of the submitted request on success.
159
160The callback will be called with an C<eio_req *> which contains the
161results of the request. The members you can access inside that structure
162vary from request to request, except for:
163
164=over 4
165
166=item C<ssize_t result>
167
168This contains the result value from the call (usually the same as the
169syscall of the same name).
170
171=item C<int errorno>
172
173This contains the value of C<errno> after the call.
174
175=item C<void *data>
176
177The C<void *data> member simply stores the value of the C<data> argument.
178
179=back
180
181The return value of the callback is normally C<0>, which tells libeio to
182continue normally. If a callback returns a nonzero value, libeio will
183stop processing results (in C<eio_poll>) and will return the value to its
184caller.
185
186Memory areas passed to libeio must stay valid as long as a request
187executes, with the exception of paths, which are being copied
188internally. Any memory libeio itself allocates will be freed after the
189finish callback has been called. If you want to manage all memory passed
190to libeio yourself you can use the low-level API.
191
192For example, to open a file, you could do this:
193
194 static int
195 file_open_done (eio_req *req)
196 {
197 if (req->result < 0)
198 {
199 /* open() returned -1 */
200 errno = req->errorno;
201 perror ("open");
202 }
203 else
204 {
205 int fd = req->result;
206 /* now we have the new fd in fd */
207 }
208
209 return 0;
210 }
211
212 /* the first three arguments are passed to open(2) */
213 /* the remaining are priority, callback and data */
214 if (!eio_open ("/etc/passwd", O_RDONLY, 0, 0, file_open_done, 0))
215 abort (); /* something went wrong, we will all die!!! */
216
217Note that you additionally need to call C<eio_poll> when the C<want_poll>
218callback indicates that requests are ready to be processed.
219
220=head2 AVAILABLE REQUESTS
221
222The following request functions are available. I<All> of them return the
223C<eio_req *> on success and C<0> on failure, and I<all> of them have the
224same three trailing arguments: C<pri>, C<cb> and C<data>. The C<cb> is
225mandatory, but in most cases, you pass in C<0> as C<pri> and C<0> or some
226custom data value as C<data>.
227
228=head3 POSIX API WRAPPERS
229
230These requests simply wrap the POSIX call of the same name, with the same
231arguments:
232
233=over 4
234
235=item eio_open (const char *path, int flags, mode_t mode, int pri, eio_cb cb, void *data)
236
237=item eio_utime (const char *path, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
238
239=item eio_truncate (const char *path, off_t offset, int pri, eio_cb cb, void *data)
240
241=item eio_chown (const char *path, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
242
243=item eio_chmod (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
244
245=item eio_mkdir (const char *path, mode_t mode, int pri, eio_cb cb, void *data)
246
247=item eio_rmdir (const char *path, int pri, eio_cb cb, void *data)
248
249=item eio_unlink (const char *path, int pri, eio_cb cb, void *data)
250
251=item eio_readlink (const char *path, int pri, eio_cb cb, void *data) /* result=ptr2 allocated dynamically */
252
253=item eio_stat (const char *path, int pri, eio_cb cb, void *data) /* stat buffer=ptr2 allocated dynamically */
254
255=item eio_lstat (const char *path, int pri, eio_cb cb, void *data) /* stat buffer=ptr2 allocated dynamically */
256
257=item eio_statvfs (const char *path, int pri, eio_cb cb, void *data) /* stat buffer=ptr2 allocated dynamically */
258
259=item eio_mknod (const char *path, mode_t mode, dev_t dev, int pri, eio_cb cb, void *data)
260
261=item eio_link (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
262
263=item eio_symlink (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
264
265=item eio_rename (const char *path, const char *new_path, int pri, eio_cb cb, void *data)
266
267=item eio_msync (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
268
269=item eio_mlock (void *addr, size_t length, int pri, eio_cb cb, void *data)
270
271=item eio_mlockall (int flags, int pri, eio_cb cb, void *data)
272
273=item eio_close (int fd, int pri, eio_cb cb, void *data)
274
275=item eio_sync (int pri, eio_cb cb, void *data)
276
277=item eio_fsync (int fd, int pri, eio_cb cb, void *data)
278
279=item eio_fdatasync (int fd, int pri, eio_cb cb, void *data)
280
281=item eio_futime (int fd, eio_tstamp atime, eio_tstamp mtime, int pri, eio_cb cb, void *data)
282
283=item eio_ftruncate (int fd, off_t offset, int pri, eio_cb cb, void *data)
284
285=item eio_fchmod (int fd, mode_t mode, int pri, eio_cb cb, void *data)
286
287=item eio_fchown (int fd, uid_t uid, gid_t gid, int pri, eio_cb cb, void *data)
288
289=item eio_dup2 (int fd, int fd2, int pri, eio_cb cb, void *data)
290
291These have the same semantics as the syscall of the same name, their
292return value is available as C<< req->result >> later.
293
294=item eio_read (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
295
296=item eio_write (int fd, void *buf, size_t length, off_t offset, int pri, eio_cb cb, void *data)
297
298These two requests are called C<read> and C<write>, but actually wrap
299C<pread> and C<pwrite>. On systems that lack these calls (such as cygwin),
300libeio uses lseek/read_or_write/lseek and a mutex to serialise the
301requests, so all these requests run serially and do not disturb each
302other. However, they still disturb the file offset while they run, so it's
303not safe to call these functions concurrently with non-libeio functions on
304the same fd on these systems.
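To make the lseek/read/lseek emulation more concrete, here is a minimal sketch of how a C<pread> replacement of that kind might be written. C<demo_emulated_pread> and C<demo_pread_lock> are hypothetical names for this example; this is not libeio's actual code, only an illustration of the technique the paragraph describes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* One global lock serialises all emulated positional reads. */
static pthread_mutex_t demo_pread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: acts like pread, but temporarily moves the offset. */
static ssize_t
demo_emulated_pread (int fd, void *buf, size_t count, off_t offset)
{
  ssize_t res;
  off_t saved;

  assert (buf != 0);

  pthread_mutex_lock (&demo_pread_lock);

  saved = lseek (fd, 0, SEEK_CUR);      /* remember the current offset */

  if (saved < 0 || lseek (fd, offset, SEEK_SET) < 0)
    res = -1;
  else
    {
      int saved_errno;

      res = read (fd, buf, count);      /* the file offset is disturbed here */
      saved_errno = errno;
      lseek (fd, saved, SEEK_SET);      /* put the offset back */
      errno = saved_errno;              /* report read's error, not lseek's */
    }

  pthread_mutex_unlock (&demo_pread_lock);

  return res;
}
```

The mutex only protects emulated calls from each other. Any code that uses the same fd without taking the lock can still observe, or clobber, the temporarily changed offset, which is the hazard the paragraph above warns about.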
305
306Not surprisingly, pread and pwrite are not thread-safe on Darwin (OS/X),
307so it is advised not to submit multiple requests on the same fd on this
308horrible pile of garbage.
309
310=item eio_fstat (int fd, int pri, eio_cb cb, void *data)
311
312Stats a file - if C<< req->result >> indicates success, then you can
313access the C<struct stat>-like structure via C<< req->ptr2 >>:
314
315 EIO_STRUCT_STAT *statdata = (EIO_STRUCT_STAT *)req->ptr2;
316
317=item eio_fstatvfs (int fd, int pri, eio_cb cb, void *data) /* stat buffer=ptr2 allocated dynamically */
318
319Stats a filesystem - if C<< req->result >> indicates success, then you can
320access the C<struct statvfs>-like structure via C<< req->ptr2 >>:
321
322 EIO_STRUCT_STATVFS *statdata = (EIO_STRUCT_STATVFS *)req->ptr2;
323
324=back
325
326=head3 READING DIRECTORIES
327
328Reading directories sounds simple, but can be rather demanding, especially
329if you want to do stuff such as traversing a directory hierarchy or
330processing all files in a directory. Libeio can help with these complex tasks
331through its C<eio_readdir> call.
332
333=over 4
334
335=item eio_readdir (const char *path, int flags, int pri, eio_cb cb, void *data)
336
337This is a very complex call. It basically reads through a whole directory
338(via the C<opendir>, C<readdir> and C<closedir> calls) and returns either
339the names or an array of C<struct eio_dirent>, depending on the C<flags>
340argument.
341
342The C<< req->result >> indicates either the number of files found, or
343C<-1> on error. On success, zero-terminated names can be found as C<< req->ptr2 >>,
344and C<struct eio_dirents>, if requested by C<flags>, can be found via C<<
345req->ptr1 >>.
346
347Here is an example that prints all the names:
348
349 int i;
350 char *names = (char *)req->ptr2;
351
352 for (i = 0; i < req->result; ++i)
353 {
354 printf ("name #%d: %s\n", i, names);
355
356 /* move to next name */
357 names += strlen (names) + 1;
358 }
359
360Pseudo-entries such as F<.> and F<..> are never returned by C<eio_readdir>.
361
362C<flags> can be any combination of:
363
364=over 4
365
366=item EIO_READDIR_DENTS
367
368If this flag is specified, then, in addition to the names in C<ptr2>,
369also an array of C<struct eio_dirent> is returned, in C<ptr1>. A C<struct
370eio_dirent> looks like this:
371
372 struct eio_dirent
373 {
374 int nameofs; /* offset of null-terminated name string in (char *)req->ptr2 */
375 unsigned short namelen; /* size of filename without trailing 0 */
376 unsigned char type; /* one of EIO_DT_* */
377 signed char score; /* internal use */
378 ino_t inode; /* the inode number, if available, otherwise unspecified */
379 };
380
381The only members you normally would access are C<nameofs>, which is the
382byte-offset from C<ptr2> to the start of the name, C<namelen> and C<type>.
383
384C<type> can be one of:
385
386C<EIO_DT_UNKNOWN> - if the type is not known (very common) and you have to C<stat>
387the name yourself if you need to know,
388one of the "standard" POSIX file types (C<EIO_DT_REG>, C<EIO_DT_DIR>, C<EIO_DT_LNK>,
389C<EIO_DT_FIFO>, C<EIO_DT_SOCK>, C<EIO_DT_CHR>, C<EIO_DT_BLK>)
390or some OS-specific type (currently
391C<EIO_DT_MPC> - multiplexed char device (v7+coherent),
392C<EIO_DT_NAM> - xenix special named file,
393C<EIO_DT_MPB> - multiplexed block device (v7+coherent),
394C<EIO_DT_NWK> - HP-UX network special,
395C<EIO_DT_CMP> - VxFS compressed,
396C<EIO_DT_DOOR> - solaris door, or
397C<EIO_DT_WHT>).
398
399This example prints all names and their type:
400
401 int i;
402 struct eio_dirent *ents = (struct eio_dirent *)req->ptr1;
403 char *names = (char *)req->ptr2;
404
405 for (i = 0; i < req->result; ++i)
406 {
407 struct eio_dirent *ent = ents + i;
408 char *name = names + ent->nameofs;
409
410 printf ("name #%d: %s (type %d)\n", i, name, ent->type);
411 }
412
413=item EIO_READDIR_DIRS_FIRST
414
415When this flag is specified, then the names will be returned in an order
416where likely directories come first, in optimal C<stat> order. This is
417useful when you need to quickly find directories, or you want to find all
418directories while avoiding having to stat() each entry.
419
420If the system returns type information in readdir, then this is used
421to find directories directly. Otherwise, likely directories are names
422beginning with ".", or otherwise names with no dots, of which names with
423short names are tried first.
424
425=item EIO_READDIR_STAT_ORDER
426
427When this flag is specified, then the names will be returned in an order
428suitable for stat()'ing each one. That is, when you plan to stat()
429all files in the given directory, then the returned order will likely
430be fastest.
431
432If both this flag and C<EIO_READDIR_DIRS_FIRST> are specified, then
433the likely dirs come first, resulting in a less optimal stat order.
434
435=item EIO_READDIR_FOUND_UNKNOWN
436
437This flag should not be specified when calling C<eio_readdir>. Instead,
438it is being set by C<eio_readdir> (you can access the C<flags> via C<<
439req->int1 >>) when any of the C<type>s found were C<EIO_DT_UNKNOWN>. The
440absence of this flag therefore indicates that all C<type>s are known,
441which can be used to speed up some algorithms.
442
443A typical use case would be to identify all subdirectories within a
444directory - you would ask C<eio_readdir> for C<EIO_READDIR_DIRS_FIRST>. If
445then this flag is I<NOT> set, then all the entries at the beginning of the
446returned array of type C<EIO_DT_DIR> are the directories. Otherwise, you
447should start C<stat()>'ing the entries starting at the beginning of the
448array, stopping as soon as you found all directories (the count can be
449deduced by the link count of the directory).
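The fast path of that use case can be sketched as follows. To keep the example self-contained it does not include the libeio header: C<struct demo_dirent>, the C<DEMO_*> constants and their numeric values are hypothetical placeholders that merely mirror the shape of the definitions described above, so in real code you would use C<struct eio_dirent>, C<EIO_DT_DIR> and C<EIO_READDIR_FOUND_UNKNOWN> instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholders; the numeric values are arbitrary. */
enum { DEMO_DT_UNKNOWN = 0, DEMO_DT_DIR = 1, DEMO_DT_REG = 2 };
#define DEMO_READDIR_FOUND_UNKNOWN 0x100

/* Reduced mirror of the members described for struct eio_dirent. */
struct demo_dirent
{
  int nameofs;
  unsigned short namelen;
  unsigned char type;
};

/*
 * Return the number of directories at the front of ents, or -1 when some
 * types were unknown and the caller has to fall back to stat().
 */
static long
demo_count_leading_dirs (const struct demo_dirent *ents, long count, int flags)
{
  long n = 0;

  assert (ents != 0 || count == 0);

  if (flags & DEMO_READDIR_FOUND_UNKNOWN)
    return -1;

  while (n < count && ents[n].type == DEMO_DT_DIR)
    ++n;

  return n;
}
```

In a completion callback, C<ents> would come from C<< req->ptr1 >>, C<count> from C<< req->result >> and C<flags> from C<< req->int1 >>, as described above.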
450
451=back
452
453=back
454
455=head3 OS-SPECIFIC CALL WRAPPERS
456
457These wrap OS-specific calls (usually Linux ones), and might or might not
458be emulated on other operating systems. Calls that are not emulated will
459return C<-1> and set C<errno> to C<ENOSYS>.
460
461=over 4
462
463=item eio_sendfile (int out_fd, int in_fd, off_t in_offset, size_t length, int pri, eio_cb cb, void *data)
464
465Wraps the C<sendfile> syscall. The arguments follow the Linux version, but
466libeio supports and will use similar calls on FreeBSD, HP/UX, Solaris and
467Darwin.
468
469If the OS doesn't support some sendfile-like call, or the call fails,
470indicating a lack of support for the given file descriptor type (for example,
471Linux's sendfile might not support file to file copies), then libeio will
472emulate the call in userspace, so there are almost no limitations on its
473use.
474
475=item eio_readahead (int fd, off_t offset, size_t length, int pri, eio_cb cb, void *data)
476
477Calls C<readahead(2)>. If the syscall is missing, then the call is
478emulated by simply reading the data (currently in 64kiB chunks).
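An emulation along those lines might look like the sketch below, which simply reads the requested range in 64kiB chunks and throws the data away. C<demo_emulated_readahead> and C<DEMO_CHUNK> are hypothetical names; the code is an assumption about what "simply reading the data" could mean, not a copy of libeio's implementation.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_CHUNK (64 * 1024) /* 64kiB, the chunk size the text mentions */

/*
 * Hypothetical helper: pull [offset, offset + length) through the page
 * cache by reading it into a scratch buffer that is then discarded.
 * Returns the number of bytes actually read, or -1 on error.
 */
static ssize_t
demo_emulated_readahead (int fd, off_t offset, size_t length)
{
  char scratch[DEMO_CHUNK];
  ssize_t total = 0;

  assert (fd >= 0);

  while (length > 0)
    {
      size_t want = length < DEMO_CHUNK ? length : DEMO_CHUNK;
      ssize_t got = pread (fd, scratch, want, offset);

      if (got < 0)
        return -1;

      if (got == 0)
        break; /* end of file reached before length was exhausted */

      offset += got;
      length -= (size_t)got;
      total  += got;
    }

  return total;
}
```

Using C<pread> keeps the file offset untouched, so the sketch does not interfere with other users of the same fd on systems that have a working C<pread>.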
479
480=item eio_sync_file_range (int fd, off_t offset, size_t nbytes, unsigned int flags, int pri, eio_cb cb, void *data)
481
482Calls C<sync_file_range>. If the syscall is missing, then this is the same
483as calling C<fdatasync>.
484
485=back
486
487=head3 LIBEIO-SPECIFIC REQUESTS
488
489These requests are specific to libeio and do not correspond to any OS call.
490
491=over 4
492
493=item eio_mtouch (void *addr, size_t length, int flags, int pri, eio_cb cb, void *data)
494
495Reads (C<flags == 0>) or modifies (C<flags == EIO_MT_MODIFY>) the given
496memory area, page-wise, that is, it reads (or reads and writes back) the
497first octet of every page that spans the memory area.
498
499This can be used to page in some mmapped file, or dirty some pages. Note
500that dirtying is an unlocked read-write access, so races can ensue when
501some other thread modifies the data stored in that memory area.
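The page-wise access can be pictured with the following sketch. C<demo_touch_pages> is a hypothetical helper written for this example, and its C<modify> argument is a plain boolean rather than the real C<flags> value; it only illustrates the "first octet of every page" idea and makes no claim to match libeio's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch the first octet of every page overlapping
 * [addr, addr + length). Returns the number of pages touched.
 */
static size_t
demo_touch_pages (void *addr, size_t length, int modify)
{
  uintptr_t page  = (uintptr_t)sysconf (_SC_PAGESIZE);
  uintptr_t start = (uintptr_t)addr & ~(page - 1); /* round down to page start */
  uintptr_t end   = (uintptr_t)addr + length;
  size_t touched  = 0;
  uintptr_t p;

  assert (page > 0 && (page & (page - 1)) == 0); /* power of two expected */

  if (length == 0)
    return 0;

  for (p = start; p < end; p += page)
    {
      volatile unsigned char *octet = (volatile unsigned char *)p;

      if (modify)
        *octet = *octet; /* unlocked read and write-back: dirties the page */
      else
        (void)*octet;    /* read only: pages the data in */

      ++touched;
    }

  return touched;
}
```

The write-back stores the value that was just read, so it is only harmless as long as no other thread changes that octet in between, which is exactly the race mentioned above.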
502
503=item eio_custom (void (*)(eio_req *) execute, int pri, eio_cb cb, void *data)
504
505Executes a custom request, i.e., a user-specified callback.
506
507The callback gets the C<eio_req *> as parameter and is expected to read
508and modify any request-specific members. Specifically, it should set C<<
509req->result >> to the result value, just like other requests.
510
511Here is an example that simply calls C<open>, like C<eio_open>, but it
512uses the C<data> member as filename and uses a hardcoded C<O_RDONLY>. If
513you want to pass more/other parameters, you either need to pass some
514struct or so via C<data> or provide your own wrapper using the low-level
515API.
516
517 static int
518 my_open_done (eio_req *req)
519 {
520 int fd = req->result;
521
522 return 0;
523 }
524
525 static void
526 my_open (eio_req *req)
527 {
528 req->result = open (req->data, O_RDONLY);
529 }
530
531 eio_custom (my_open, 0, my_open_done, "/etc/passwd");
532
533=item eio_busy (eio_tstamp delay, int pri, eio_cb cb, void *data)
534
535This is a request that takes C<delay> seconds to execute, but otherwise
536does nothing - it simply puts one of the worker threads to sleep for this
537long.
538
539This request can be used to artificially increase load, e.g. for debugging
540or benchmarking reasons.
541
542=item eio_nop (int pri, eio_cb cb, void *data)
543
544This request does nothing, except go through the whole request cycle. This
545can be used to measure latency or in some cases to simplify code, but is
546not really of much use.
547
548=back
549
550=head3 GROUPING AND LIMITING REQUESTS
551
552#TODO
553
554/*****************************************************************************/
555/* groups */
556
557eio_req *eio_grp (eio_cb cb, void *data);
558void eio_grp_feed (eio_req *grp, void (*feed)(eio_req *req), int limit);
559void eio_grp_limit (eio_req *grp, int limit);
560void eio_grp_add (eio_req *grp, eio_req *req);
561void eio_grp_cancel (eio_req *grp); /* cancels all sub requests but not the group */
562
563
564=back
565
566
567=head1 LOW LEVEL REQUEST API
568
569#TODO
570
571
572=head1 ANATOMY AND LIFETIME OF AN EIO REQUEST
573
574A request is represented by a structure of type C<eio_req>. To initialise
575it, clear it to all zero bytes:
576
577 eio_req req;
578
579 memset (&req, 0, sizeof (req));
580
581A more common way to initialise a new C<eio_req> is to use C<calloc>:
582
583 eio_req *req = calloc (1, sizeof (*req));
584
585In either case, libeio neither allocates, initialises nor frees the
586C<eio_req> structure for you - it merely uses it.
587
588zero
589
590#TODO
591
138=head2 CONFIGURATION 592=head2 CONFIGURATION
139 593
140The functions in this section can sometimes be useful, but the default 594The functions in this section can sometimes be useful, but the default
141configuration will do in most cases, so you should skip this section on 595configuration will do in most cases, so you should skip this section on
142first reading. 596first reading.
185=item eio_set_max_idle (unsigned int nthreads) 639=item eio_set_max_idle (unsigned int nthreads)
186 640
187Libeio uses threads internally to handle most requests, and will start and stop threads on demand. 641Libeio uses threads internally to handle most requests, and will start and stop threads on demand.
188 642
189This call can be used to limit the number of idle threads (threads without 643This call can be used to limit the number of idle threads (threads without
190work to do): libeio will keep some threads idle in preperation for more 644work to do): libeio will keep some threads idle in preparation for more
191requests, but never longer than C<nthreads> threads. 645requests, but never longer than C<nthreads> threads.
192 646
193In addition to this, libeio will also stop threads when they are idle for 647In addition to this, libeio will also stop threads when they are idle for
194a few seconds, regardless of this setting. 648a few seconds, regardless of this setting.
195 649
213Returns the number of pending requests, i.e. requests that have been 667Returns the number of pending requests, i.e. requests that have been
214executed and have results, but have not been finished yet by a call to 668executed and have results, but have not been finished yet by a call to
215C<eio_poll>). 669C<eio_poll>).
216 670
217=back 671=back
218
219
220=head1 ANATOMY OF AN EIO REQUEST
221
222#TODO
223
224
225=head1 HIGH LEVEL REQUEST API
226
227#TODO
228
229=back
230
231
232=head1 LOW LEVEL REQUEST API
233
234#TODO
235 672
236=head1 EMBEDDING 673=head1 EMBEDDING
237 674
238Libeio can be embedded directly into programs. This functionality is not 675Libeio can be embedded directly into programs. This functionality is not
239documented and not (yet) officially supported. 676documented and not (yet) officially supported.
