/cvs/IO-AIO/AIO.pm
Revision: 1.53
Committed: Sat Oct 21 23:20:29 2006 UTC (17 years, 7 months ago) by root
Branch: MAIN
Changes since 1.52: +2 -2 lines
Log Message:
*** empty log message ***

1 =head1 NAME
2
3 IO::AIO - Asynchronous Input/Output
4
5 =head1 SYNOPSIS
6
7 use IO::AIO;
8
9 aio_open "/etc/passwd", O_RDONLY, 0, sub {
10 my ($fh) = @_;
11 ...
12 };
13
14 aio_unlink "/tmp/file", sub { };
15
16 aio_read $fh, 30000, 1024, $buffer, 0, sub {
17 $_[0] > 0 or die "read error: $!";
18 };
19
20 use IO::AIO 2; # version has aio objects
21
22 my $req = aio_unlink "/tmp/file", sub { };
23 $req->cancel; # cancel request if still in queue
24
25 # AnyEvent
26 open my $fh, "<&=" . IO::AIO::poll_fileno or die "$!";
27 my $w = AnyEvent->io (fh => $fh, poll => 'r', cb => sub { IO::AIO::poll_cb });
28
29 # Event
30 Event->io (fd => IO::AIO::poll_fileno,
31 poll => 'r',
32 cb => \&IO::AIO::poll_cb);
33
34 # Glib/Gtk2
35 add_watch Glib::IO IO::AIO::poll_fileno,
36 in => sub { IO::AIO::poll_cb; 1 };
37
38 # Tk
39 Tk::Event::IO->fileevent (IO::AIO::poll_fileno, "",
40 readable => \&IO::AIO::poll_cb);
41
42 # Danga::Socket
43 Danga::Socket->AddOtherFds (IO::AIO::poll_fileno =>
44 \&IO::AIO::poll_cb);
45
46
47 =head1 DESCRIPTION
48
49 This module implements asynchronous I/O using whatever means your
50 operating system supports.
51
52 Currently, a number of threads are started that execute your reads/writes
53 and signal their completion. You don't need thread support in your libc or
54 perl, and the threads created by this module will not be visible to the
55 pthreads library. In the future, this module might make use of the native
56 aio functions available on many operating systems. However, they are often
57 not well-supported (Linux doesn't allow them on normal files currently,
58 for example), and they would only support aio_read and aio_write, so the
59 remaining functionality would have to be implemented using threads anyway.
60
61 Although the module will work in the presence of other threads, it is
62 currently not reentrant, so use appropriate locking yourself, always call
63 C<poll_cb> from within the same thread, or never call C<poll_cb> (or other
64 C<aio_> functions) recursively.
65
66 =cut
67
68 package IO::AIO;
69
70 no warnings;
71 use strict 'vars';
72
73 use base 'Exporter';
74
75 BEGIN {
76 our $VERSION = '1.99';
77
78 our @EXPORT = qw(aio_sendfile aio_read aio_write aio_open aio_close aio_stat
79 aio_lstat aio_unlink aio_rmdir aio_readdir aio_scandir aio_symlink
80 aio_fsync aio_fdatasync aio_readahead aio_rename aio_link aio_move);
81 our @EXPORT_OK = qw(poll_fileno poll_cb min_parallel max_parallel max_outstanding nreqs);
82
83 require XSLoader;
84 XSLoader::load ("IO::AIO", $VERSION);
85 }
86
87 =head1 FUNCTIONS
88
89 =head2 AIO FUNCTIONS
90
91 All the C<aio_*> calls are more or less thin wrappers around the syscall
92 with the same name (sans C<aio_>). The arguments are similar or identical,
93 and they all accept an additional (and optional) C<$callback> argument
94 which must be a code reference. This code reference will get called with
95 the syscall return code (e.g. most syscalls return C<-1> on error, unlike
96 perl, which usually delivers "false") as its sole argument when the given
97 syscall has been executed asynchronously.
98
99 All functions expecting a filehandle keep a copy of the filehandle
100 internally until the request has finished.
101
102 All non-composite requests (requests that are not broken down into
103 multiple requests) return objects of type L<IO::AIO::REQ> that allow
104 further manipulation of running requests.
105
106 The pathnames you pass to these routines I<must> be absolute and
107 encoded in byte form. The reason for the former is that at the time the
108 request is being executed, the current working directory could have
109 changed. Alternatively, you can make sure that you never change the
110 current working directory.
111
112 To encode pathnames to byte form, make sure you either: a) always
113 pass in filenames you got from outside (command line, readdir etc.),
114 b) only use filenames that are ASCII or ISO 8859-1, c) use the Encode
115 module and encode your pathnames to the locale (or other) encoding in
116 effect in the user environment, d) use Glib::filename_from_unicode on
117 unicode filenames or e) use something else.
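
As a minimal sketch of option c), a pathname could be encoded to bytes and
made absolute before it is handed to an C<aio_*> function. The helper name
C<aio_path> is hypothetical (it is not part of this module), and the UTF-8
default is an assumption about the user environment:

 use strict;
 use warnings;
 use Encode ();
 use File::Spec ();

 # hypothetical helper, not part of IO::AIO
 sub aio_path {
    my ($path, $encoding) = @_;
    $encoding ||= "UTF-8"; # assumption: the environment uses UTF-8

    # encode characters to bytes first, then resolve against the cwd
    File::Spec->rel2abs (Encode::encode ($encoding, $path))
 }

 # U+00FC becomes the two bytes 0xC3 0xBC
 my $bytes = aio_path "/tmp/\x{fc}ber.txt";

Encoding before calling C<rel2abs> keeps both the working directory and
the pathname in byte form, so the two are never mixed as characters.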
118
119 =over 4
120
121 =item aio_open $pathname, $flags, $mode, $callback->($fh)
122
123 Asynchronously open or create a file and call the callback with a newly
124 created filehandle for the file.
125
126 The pathname passed to C<aio_open> must be absolute. See the notes on
127 pathnames at the start of this section for an explanation.
128
129 The C<$flags> argument is a bitmask. See the C<Fcntl> module for a
130 list. They are the same as used by C<sysopen>.
131
132 Likewise, C<$mode> specifies the mode of the newly created file, if it
133 didn't exist and C<O_CREAT> has been given, just like perl's C<sysopen>,
134 except that it is mandatory (i.e. use C<0> if you don't create new files,
135 and C<0666> or C<0777> if you do).
136
137 Example:
138
139 aio_open "/etc/passwd", O_RDONLY, 0, sub {
140 if ($_[0]) {
141 print "open successful, fh is $_[0]\n";
142 ...
143 } else {
144 die "open failed: $!\n";
145 }
146 };
147
148 =item aio_close $fh, $callback->($status)
149
150 Asynchronously close a file and call the callback with the result
151 code. I<WARNING:> although accepted, you should not pass in a perl
152 filehandle here, as perl will likely close the file descriptor another
153 time when the filehandle is destroyed. Normally, you can safely call perl's
154 C<close> or just let filehandles go out of scope.
155
156 This is considered a bug in the API, so it might change. It's
157 therefore best to avoid this function.
158
159 =item aio_read $fh,$offset,$length, $data,$dataoffset, $callback->($retval)
160
161 =item aio_write $fh,$offset,$length, $data,$dataoffset, $callback->($retval)
162
163 Reads or writes C<$length> bytes from (or to) the specified C<$fh> at
164 C<$offset>, into (or from) the scalar given by C<$data> at offset
165 C<$dataoffset>, and calls the callback with the actual number of bytes
166 read or written (or -1 on error, just like the syscall).
167
168 The C<$data> scalar I<MUST NOT> be modified in any way while the request
169 is outstanding. Modifying it can result in segfaults or WW3 (if the
170 necessary/optional hardware is installed).
171
172 Example: Read 15 bytes at offset 7 into scalar C<$buffer>, starting at
173 offset C<0> within the scalar:
174
175 aio_read $fh, 7, 15, $buffer, 0, sub {
176 $_[0] > 0 or die "read error: $!";
177 print "read $_[0] bytes: <$buffer>\n";
178 };
179
180 =item aio_move $srcpath, $dstpath, $callback->($status)
181
182 Try to move the I<file> (directories not supported as either source or
183 destination) from C<$srcpath> to C<$dstpath> and call the callback with
184 C<0> (success) or C<-1> (error).
185
186 This is a composite request that tries to rename(2) the file first. If
187 rename fails with C<EXDEV>, it creates the destination file with mode 0200
188 and copies the contents of the source file into it using C<aio_sendfile>,
189 followed by restoring atime, mtime, access mode and uid/gid, in that
190 order, and unlinking the C<$srcpath>.
191
192 If an error occurs, the partial destination file will be unlinked, if
193 possible, except when setting atime, mtime, access mode and uid/gid, where
194 errors are ignored.
195
196 =cut
197
198 sub aio_move($$$) {
199 my ($src, $dst, $cb) = @_;
200
201 aio_rename $src, $dst, sub {
202 if ($_[0] && $! == EXDEV) {
203 aio_open $src, O_RDONLY, 0, sub {
204 if (my $src_fh = $_[0]) {
205 my @stat = stat $src_fh;
206
207 aio_open $dst, O_CREAT | O_WRONLY | O_TRUNC, 0200, sub {
208 if (my $dst_fh = $_[0]) {
209 aio_sendfile $dst_fh, $src_fh, 0, $stat[7], sub {
210 close $src_fh;
211
212 if ($_[0] == $stat[7]) {
213 utime $stat[8], $stat[9], $dst;
214 chmod $stat[2] & 07777, $dst_fh;
215 chown $stat[4], $stat[5], $dst_fh;
216 close $dst_fh;
217
218 aio_unlink $src, sub {
219 $cb->($_[0]);
220 };
221 } else {
222 my $errno = $!;
223 aio_unlink $dst, sub {
224 $! = $errno;
225 $cb->(-1);
226 };
227 }
228 };
229 } else {
230 $cb->(-1);
231 }
232 };
233
234 } else {
235 $cb->(-1);
236 }
237 };
238 } else {
239 $cb->($_[0]);
240 }
241 };
242 }
243
244 =item aio_sendfile $out_fh, $in_fh, $in_offset, $length, $callback->($retval)
245
246 Tries to copy C<$length> bytes from C<$in_fh> to C<$out_fh>. It starts
247 reading at byte offset C<$in_offset>, and starts writing at the current
248 file offset of C<$out_fh>. Because of that, it is not safe to issue more
249 than one C<aio_sendfile> per C<$out_fh>, as they will interfere with each
250 other.
251
252 This call tries to make use of a native C<sendfile> syscall to provide
253 zero-copy operation. For this to work, C<$out_fh> should refer to a
254 socket, and C<$in_fh> should refer to an mmap'able file.
255
256 If the native sendfile call fails or is not implemented, it will be
257 emulated, so you can call C<aio_sendfile> on any type of filehandle
258 regardless of the limitations of the operating system.
259
260 Please note, however, that C<aio_sendfile> can read more bytes from
261 C<$in_fh> than are written, and there is no way to find out how many
262 bytes have been read from C<aio_sendfile> alone, as C<aio_sendfile> only
263 provides the number of bytes written to C<$out_fh>. Only if the result
264 value equals C<$length> can one assume that C<$length> bytes have been
265 read.
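
A minimal sketch of checking the result value against C<$length>, copying
a scratch file created just for this example (the file names and the
10000 byte size are made up):

 use strict;
 use warnings;
 use File::Temp ();
 use IO::AIO;

 my $dir = File::Temp::tempdir (CLEANUP => 1);

 # create a small source file to copy
 open my $tmp, ">", "$dir/src" or die "$dir/src: $!";
 print $tmp "x" x 10000;
 close $tmp or die "close: $!";

 open my $in,  "<", "$dir/src" or die "$dir/src: $!";
 open my $out, ">", "$dir/dst" or die "$dir/dst: $!";
 my $length = -s $in;

 my $copied;
 aio_sendfile $out, $in, 0, $length, sub {
    $copied = $_[0]; # bytes written, or -1 on error
 };

 # wait synchronously, for the sake of the example only
 IO::AIO::flush;

 $copied == $length
    or die "short or failed copy ($copied of $length bytes): $!";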
266
267 =item aio_readahead $fh,$offset,$length, $callback->($retval)
268
269 C<aio_readahead> populates the page cache with data from a file so that
270 subsequent reads from that file will not block on disk I/O. The C<$offset>
271 argument specifies the starting point from which data is to be read and
272 C<$length> specifies the number of bytes to be read. I/O is performed in
273 whole pages, so that offset is effectively rounded down to a page boundary
274 and bytes are read up to the next page boundary greater than or equal to
275 (offset+length). C<aio_readahead> does not read beyond the end of the
276 file. The current file offset of the file is left unchanged.
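
The rounding described above can be sketched in plain perl. The page size
of 4096 bytes and the helper name C<readahead_span> are assumptions made
for illustration only; the real page size depends on the system, and the
sketch does not model the clamping at the end of the file:

 use strict;
 use warnings;

 # hypothetical helper: which byte range is covered by ($offset, $length)?
 sub readahead_span {
    my ($offset, $length, $pagesize) = @_;
    $pagesize ||= 4096; # assumption, system dependent

    # round the start down to a page boundary
    my $start = $offset - $offset % $pagesize;

    # round the end up to the next page boundary >= offset+length
    my $end = $offset + $length;
    $end += $pagesize - $end % $pagesize if $end % $pagesize;

    ($start, $end)
 }

 my ($start, $end) = readahead_span 5000, 100; # (4096, 8192)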
277
278 If that syscall doesn't exist (likely if your OS isn't Linux) it will be
279 emulated by simply reading the data, which would have a similar effect.
280
281 =item aio_stat $fh_or_path, $callback->($status)
282
283 =item aio_lstat $fh, $callback->($status)
284
285 Works like perl's C<stat> or C<lstat> in void context. The callback will
286 be called after the stat and the results will be available using C<stat _>
287 or C<-s _> etc...
288
289 The pathname passed to C<aio_stat> must be absolute. See the notes on
290 pathnames at the start of this section for an explanation.
291
292 Currently, the stats are always 64-bit-stats, i.e. instead of returning an
293 error when stat'ing a large file, the results will be silently truncated
294 unless perl itself is compiled with large file support.
295
296 Example: Print the length of F</etc/passwd>:
297
298 aio_stat "/etc/passwd", sub {
299 $_[0] and die "stat failed: $!";
300 print "size is ", -s _, "\n";
301 };
302
303 =item aio_unlink $pathname, $callback->($status)
304
305 Asynchronously unlink (delete) a file and call the callback with the
306 result code.
307
308 =item aio_link $srcpath, $dstpath, $callback->($status)
309
310 Asynchronously create a new link to the existing object at C<$srcpath> at
311 the path C<$dstpath> and call the callback with the result code.
312
313 =item aio_symlink $srcpath, $dstpath, $callback->($status)
314
315 Asynchronously create a new symbolic link to the existing object at C<$srcpath> at
316 the path C<$dstpath> and call the callback with the result code.
317
318 =item aio_rename $srcpath, $dstpath, $callback->($status)
319
320 Asynchronously rename the object at C<$srcpath> to C<$dstpath>, just as
321 rename(2) does, and call the callback with the result code.
322
323 =item aio_rmdir $pathname, $callback->($status)
324
325 Asynchronously rmdir (delete) a directory and call the callback with the
326 result code.
327
328 =item aio_readdir $pathname, $callback->($entries)
329
330 Unlike the POSIX call of the same name, C<aio_readdir> reads an entire
331 directory (i.e. opendir + readdir + closedir). The entries will not be
332 sorted, and will B<NOT> include the C<.> and C<..> entries.
333
334 The callback is passed a single argument, which is either C<undef> or an
335 array-ref with the filenames.
336
337 =item aio_scandir $path, $maxreq, $callback->($dirs, $nondirs)
338
339 Scans a directory (similar to C<aio_readdir>) but additionally tries to
340 separate the entries of directory C<$path> into two sets of names, ones
341 you can recurse into (directories or links to them), and ones you cannot
342 recurse into (everything else).
343
344 C<aio_scandir> is a composite request that consists of many sub
345 requests. C<$maxreq> specifies the maximum number of outstanding aio
346 requests that this function generates. If it is C<< <= 0 >>, then a
347 suitable default will be chosen (currently 8).
348
349 On error, the callback is called without arguments, otherwise it receives
350 two array-refs with path-relative entry names.
351
352 Example:
353
354 aio_scandir $dir, 0, sub {
355 my ($dirs, $nondirs) = @_;
356 print "real directories: @$dirs\n";
357 print "everything else: @$nondirs\n";
358 };
359
360 Implementation notes.
361
362 The C<aio_readdir> cannot be avoided, but C<stat()>'ing every entry can.
363
364 After reading the directory, the modification time, size etc. of the
365 directory before and after the readdir are compared, and if they match
366 (and the modification time isn't the current time), the link count will
367 be used to decide how many entries are directories (if >= 2). Otherwise,
368 no knowledge of the number of subdirectories will be assumed.
369
370 Then entries will be sorted into likely directories (everything without
371 a non-initial dot currently) and likely non-directories (everything
372 else). Then every entry plus an appended C</.> will be C<stat>'ed,
373 likely directories first. If that succeeds, it assumes that the entry
374 is a directory or a symlink to directory (which will be checked
375 separately). This is often faster than stat'ing the entry itself because
376 filesystems might detect the type of the entry without reading the inode
377 data (e.g. ext2fs filetype feature).
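
The ordering step can be tried in isolation. The following sketch reuses
the sort key from the implementation on a made-up list of entry names and
returns the names in the order in which they would be C<stat>'ed. The
helper name C<stat_order> is hypothetical:

 use strict;
 use warnings;

 # hypothetical helper, for illustration only
 sub stat_order {
    # key: "1" if the name has a non-initial dot, else "0", then the length
    my @sorted = map $_->[0],
                 sort { $b->[1] cmp $a->[1] }
                 map [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
                 @_;

    # the implementation pops entries off the end of the sorted list
    reverse @sorted
 }

 my @order = stat_order qw(src README.txt .git Makefile.PL t);
 # t, src, .git, README.txt, Makefile.PL

Names without a non-initial dot come first, shortest first, so the likely
directories are checked before the link count budget runs out.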
378
379 If the known number of directories (link count - 2) has been reached, the
380 rest of the entries are assumed to be non-directories.
381
382 This only works with certainty on POSIX (= UNIX) filesystems, which
383 fortunately are the vast majority of filesystems around.
384
385 It will also likely work on non-POSIX filesystems with reduced efficiency
386 as those tend to return 0 or 1 as link counts, which disables the
387 directory counting heuristic.
388
389 =cut
390
391 sub aio_scandir($$$) {
392 my ($path, $maxreq, $cb) = @_;
393
394 $maxreq = 8 if $maxreq <= 0;
395
396 # stat once
397 aio_stat $path, sub {
398 return $cb->() if $_[0];
399 my $now = time;
400 my $hash1 = join ":", (stat _)[0,1,3,7,9];
401
402 # read the directory entries
403 aio_readdir $path, sub {
404 my $entries = shift
405 or return $cb->();
406
407 # stat the dir another time
408 aio_stat $path, sub {
409 my $hash2 = join ":", (stat _)[0,1,3,7,9];
410
411 my $ndirs;
412
413 # take the slow route if anything looks fishy
414 if ($hash1 ne $hash2 or (stat _)[9] == $now) {
415 $ndirs = -1;
416 } else {
417 # if nlink == 2, we are finished
418 # on non-posix-fs's, we rely on nlink < 2
419 $ndirs = (stat _)[3] - 2
420 or return $cb->([], $entries);
421 }
422
423 # sort into likely dirs and likely nondirs
424 # dirs == files without ".", short entries first
425 $entries = [map $_->[0],
426 sort { $b->[1] cmp $a->[1] }
427 map [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
428 @$entries];
429
430 my (@dirs, @nondirs);
431
432 my ($statcb, $schedcb);
433 my $nreq = 0;
434
435 $schedcb = sub {
436 if (@$entries) {
437 if ($nreq < $maxreq) {
438 my $ent = pop @$entries;
439 $nreq++;
440 aio_stat "$path/$ent/.", sub { $statcb->($_[0], $ent) };
441 }
442 } elsif (!$nreq) {
443 # finished
444 undef $statcb;
445 undef $schedcb;
446 $cb->(\@dirs, \@nondirs) if $cb;
447 undef $cb;
448 }
449 };
450 $statcb = sub {
451 my ($status, $entry) = @_;
452
453 if ($status < 0) {
454 $nreq--;
455 push @nondirs, $entry;
456 &$schedcb;
457 } else {
458 # need to check for real directory
459 aio_lstat "$path/$entry", sub {
460 $nreq--;
461
462 if (-d _) {
463 push @dirs, $entry;
464
465 if (!--$ndirs) {
466 push @nondirs, @$entries;
467 $entries = [];
468 }
469 } else {
470 push @nondirs, $entry;
471 }
472
473 &$schedcb;
474 }
475 }
476 };
477
478 &$schedcb while @$entries && $nreq < $maxreq;
479 };
480 };
481 };
482 }
483
484 =item aio_fsync $fh, $callback->($status)
485
486 Asynchronously call fsync on the given filehandle and call the callback
487 with the fsync result code.
488
489 =item aio_fdatasync $fh, $callback->($status)
490
491 Asynchronously call fdatasync on the given filehandle and call the
492 callback with the fdatasync result code.
493
494 If this call isn't available because your OS lacks it or it couldn't be
495 detected, it will be emulated by calling C<fsync> instead.
496
497 =back
498
499 =head2 IO::AIO::REQ CLASS
500
501 All non-aggregate C<aio_*> functions return an object of this class when
502 called in non-void context.
503
504 A request always moves through the following five states in its lifetime,
505 in order: B<ready> (request has been created, but has not been executed
506 yet), B<execute> (request is currently being executed), B<pending>
507 (request has been executed but callback has not been called yet),
508 B<result> (results are being processed synchronously, includes calling the
509 callback) and B<done> (request has reached the end of its lifetime and
510 holds no resources anymore).
511
512 =over 4
513
514 =item $req->cancel
515
516 Cancels the request, if possible. Has the effect of skipping execution
517 when entering the B<execute> state and skipping calling the callback when
518 entering the B<result> state, but will leave the request otherwise
519 untouched. That means that requests that currently execute will not be
520 stopped and resources held by the request will not be freed prematurely.
521
522 =back
523
524 =head2 SUPPORT FUNCTIONS
525
526 =over 4
527
528 =item $fileno = IO::AIO::poll_fileno
529
530 Return the I<request result pipe file descriptor>. This filehandle must be
531 polled for reading by some mechanism outside this module (e.g. Event or
532 select, see below or the SYNOPSIS). If the pipe becomes readable you have
533 to call C<poll_cb> to check the results.
534
535 See C<poll_cb> for an example.
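
Where no event loop is in use, the descriptor can also be polled with
perl's built-in C<select>. This is only a sketch: it stats F</> as an
arbitrary example request and assumes that nothing else needs servicing
while the program waits:

 use strict;
 use warnings;
 use IO::AIO;

 my $size;
 aio_stat "/", sub {
    $_[0] and die "stat failed: $!";
    $size = -s _;
 };

 my $fd = IO::AIO::poll_fileno;

 while (IO::AIO::nreqs) {
    # build a read set containing only the result pipe
    my $rin = "";
    vec ($rin, $fd, 1) = 1;

    # block until the pipe is readable, then run the pending callbacks
    select $rin, undef, undef, undef;
    IO::AIO::poll_cb;
 }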
536
537 =item IO::AIO::poll_cb
538
539 Process all outstanding events on the result pipe. You have to call this
540 regularly. Returns the number of events processed. Returns immediately
541 when no events are outstanding.
542
543 Example: Install an Event watcher that automatically calls
544 IO::AIO::poll_cb with high priority:
545
546 Event->io (fd => IO::AIO::poll_fileno,
547 poll => 'r', async => 1,
548 cb => \&IO::AIO::poll_cb);
549
550 =item IO::AIO::poll_wait
551
552 Wait till the result filehandle becomes ready for reading (simply does a
553 C<select> on the filehandle). This is useful if you want to synchronously
554 wait for some requests to finish.
555
556 See C<nreqs> for an example.
557
558 =item IO::AIO::nreqs
559
560 Returns the number of requests currently outstanding (i.e. for which their
561 callback has not been invoked yet).
562
563 Example: wait till there are no outstanding requests anymore:
564
565 IO::AIO::poll_wait, IO::AIO::poll_cb
566 while IO::AIO::nreqs;
567
568 =item IO::AIO::flush
569
570 Wait till all outstanding AIO requests have been handled.
571
572 Strictly equivalent to:
573
574 IO::AIO::poll_wait, IO::AIO::poll_cb
575 while IO::AIO::nreqs;
576
577 =item IO::AIO::poll
578
579 Waits until some requests have been handled.
580
581 Strictly equivalent to:
582
583 IO::AIO::poll_wait, IO::AIO::poll_cb
584 if IO::AIO::nreqs;
585
586 =item IO::AIO::min_parallel $nthreads
587
588 Set the minimum number of AIO threads to C<$nthreads>. The current default
589 is C<4>, which means four asynchronous operations can be done at one time
590 (the number of outstanding operations, however, is unlimited).
591
592 IO::AIO starts threads only on demand, when an AIO request is queued and
593 no free thread exists.
594
595 It is recommended to keep the number of threads low, as some Linux
596 kernel versions will scale negatively with the number of threads (higher
597 parallelism => MUCH higher latency). With current Linux 2.6 versions, 4-32
598 threads should be fine.
599
600 Under most circumstances you don't need to call this function, as the
601 module selects a default that is suitable for low to moderate load.
602
603 =item IO::AIO::max_parallel $nthreads
604
605 Sets the maximum number of AIO threads to C<$nthreads>. If more than the
606 specified number of threads are currently running, this function kills
607 them. This function blocks until the limit is reached.
608
609 While C<$nthreads> is zero, aio requests get queued but not executed
610 until the number of threads has been increased again.
611
612 This module automatically runs C<max_parallel 0> at program end, to ensure
613 that all threads are killed and that there are no outstanding requests.
614
615 Under normal circumstances you don't need to call this function.
616
617 =item $oldnreqs = IO::AIO::max_outstanding $nreqs
618
619 Sets the maximum number of outstanding requests to C<$nreqs>. If you
620 try to queue up more than this number of requests, the caller will block until
621 some requests have been handled.
622
623 The default is very large, so normally there is no practical limit. If you
624 queue up many requests in a loop it often improves speed if you set
625 this to a relatively low number, such as C<100>.
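
A sketch of that pattern, queueing many requests in a loop with a low
limit. The list of paths is made up for illustration; the callback only
counts completions and does not care whether the files exist:

 use strict;
 use warnings;
 use IO::AIO;

 # made-up input, for illustration only
 my @paths = map "/tmp/hypothetical-$_", 1 .. 1000;

 IO::AIO::max_outstanding 100; # the relatively low number suggested above

 my $done = 0;

 for my $path (@paths) {
    aio_stat $path, sub { $done++ };
 }

 # wait for the requests that are still outstanding
 IO::AIO::flush;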
626
627 Under normal circumstances you don't need to call this function.
628
629 =back
630
631 =cut
632
633 # support function to convert a fd into a perl filehandle
634 sub _fd2fh {
635 return undef if $_[0] < 0;
636
637 # try to generate nice filehandles
638 my $sym = "IO::AIO::fd#$_[0]";
639 local *$sym;
640
641 open *$sym, "+<&=$_[0]" # usually works under any unix
642 or open *$sym, "<&=$_[0]" # cygwin needs this
643 or open *$sym, ">&=$_[0]" # or this
644 or return undef;
645
646 *$sym
647 }
648
649 min_parallel 4;
650
651 END {
652 max_parallel 0;
653 }
654
655 1;
656
657 =head2 FORK BEHAVIOUR
658
659 This module should do "the right thing" when the process using it forks:
660
661 Before the fork, IO::AIO enters a quiescent state where no requests
662 can be added in other threads and no results will be processed. After
663 the fork the parent simply leaves the quiescent state and continues
664 request/result processing, while the child clears the request/result
665 queue (so the requests started before the fork will only be handled in
666 the parent). Threads will be started on demand until the limit set in the
667 parent process has been reached again.
668
669 In short: the parent will, after a short pause, continue as if fork had
670 not been called, while the child will act as if IO::AIO has not been used
671 yet.
672
673 =head1 SEE ALSO
674
675 L<Coro>, L<Linux::AIO> (obsolete).
676
677 =head1 AUTHOR
678
679 Marc Lehmann <schmorp@schmorp.de>
680 http://home.schmorp.de/
681
682 =cut
683