/cvs/IO-AIO/AIO.pm
Revision: 1.54
Committed: Sun Oct 22 00:19:05 2006 UTC (17 years, 7 months ago) by root
Branch: MAIN
Changes since 1.53: +11 -1 lines
Log Message:
*** empty log message ***

File Contents

# Content
1 =head1 NAME
2
3 IO::AIO - Asynchronous Input/Output
4
5 =head1 SYNOPSIS
6
7 use IO::AIO;
8
9 aio_open "/etc/passwd", O_RDONLY, 0, sub {
10 my ($fh) = @_;
11 ...
12 };
13
14 aio_unlink "/tmp/file", sub { };
15
16 aio_read $fh, 30000, 1024, $buffer, 0, sub {
17 $_[0] > 0 or die "read error: $!";
18 };
19
20 use IO::AIO 2; # version has aio objects
21
22 my $req = aio_unlink "/tmp/file", sub { };
23 $req->cancel; # cancel request if still in queue
24
25 # AnyEvent
26 open my $fh, "<&=" . IO::AIO::poll_fileno or die "$!";
27 my $w = AnyEvent->io (fh => $fh, poll => 'r', cb => sub { IO::AIO::poll_cb });
28
29 # Event
30 Event->io (fd => IO::AIO::poll_fileno,
31 poll => 'r',
32 cb => \&IO::AIO::poll_cb);
33
34 # Glib/Gtk2
35 add_watch Glib::IO IO::AIO::poll_fileno,
36 in => sub { IO::AIO::poll_cb; 1 };
37
38 # Tk
39 Tk::Event::IO->fileevent (IO::AIO::poll_fileno, "",
40 readable => \&IO::AIO::poll_cb);
41
42 # Danga::Socket
43 Danga::Socket->AddOtherFds (IO::AIO::poll_fileno =>
44 \&IO::AIO::poll_cb);
45
46
47 =head1 DESCRIPTION
48
49 This module implements asynchronous I/O using whatever means your
50 operating system supports.
51
52 Currently, a number of threads are started that execute your read/writes
53 and signal their completion. You don't need thread support in your libc or
54 perl, and the threads created by this module will not be visible to the
55 pthreads library. In the future, this module might make use of the native
56 aio functions available on many operating systems. However, they are often
57 not well-supported (Linux doesn't allow them on normal files currently,
58 for example), and they would only support aio_read and aio_write, so the
59 remaining functionality would have to be implemented using threads anyway.
60
61 Although the module will work in the presence of other threads, it is
62 currently not reentrant, so use appropriate locking yourself, always call
63 C<poll_cb> from within the same thread, or never call C<poll_cb> (or other
64 C<aio_> functions) recursively.
65
66 =cut
67
68 package IO::AIO;
69
70 no warnings;
71 use strict 'vars';
72
73 use base 'Exporter';
74
75 BEGIN {
76 our $VERSION = '1.99';
77
78 our @EXPORT = qw(aio_sendfile aio_read aio_write aio_open aio_close aio_stat
79 aio_lstat aio_unlink aio_rmdir aio_readdir aio_scandir aio_symlink
80 aio_fsync aio_fdatasync aio_readahead aio_rename aio_link aio_move
81 aio_group);
82 our @EXPORT_OK = qw(poll_fileno poll_cb min_parallel max_parallel max_outstanding nreqs);
83
84 @IO::AIO::GRP::ISA = 'IO::AIO::REQ';
85
86 require XSLoader;
87 XSLoader::load ("IO::AIO", $VERSION);
88 }
89
90 =head1 FUNCTIONS
91
92 =head2 AIO FUNCTIONS
93
94 All the C<aio_*> calls are more or less thin wrappers around the syscall
95 with the same name (sans C<aio_>). The arguments are similar or identical,
96 and they all accept an additional (and optional) C<$callback> argument
97 which must be a code reference. This code reference will get called with
98 the syscall return code (e.g. most syscalls return C<-1> on error, unlike
99 perl, which usually delivers "false") as its sole argument when the given
100 syscall has been executed asynchronously.
101
102 All functions expecting a filehandle keep a copy of the filehandle
103 internally until the request has finished.
104
105 All non-composite requests (requests that are not broken down into
106 multiple requests) return objects of type L<IO::AIO::REQ> that allow
107 further manipulation of running requests.
108
109 The pathnames you pass to these routines I<must> be absolute and
110 encoded in byte form. The reason for the former is that at the time the
111 request is being executed, the current working directory could have
112 changed. Alternatively, you can make sure that you never change the
113 current working directory.
114
115 To encode pathnames to byte form, either: a) always pass in filenames
116 you got from outside (command line, readdir etc.), b) make sure they
117 are ASCII or ISO 8859-1, c) use the Encode module and encode your
118 pathnames to the locale (or other) encoding in effect in the user
119 environment, d) use Glib::filename_from_unicode on unicode filenames or e)
120 use something else.
121
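Options a) to c) above can be sketched with core modules only. The helper name C<aio_safe_path> and the UTF-8 default are assumptions made for illustration; they are not part of IO::AIO, and the right encoding depends on the user's locale.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;

# Hypothetical helper, not part of IO::AIO: turn a decoded (character)
# pathname into an absolute byte string.
sub aio_safe_path {
    my ($path, $encoding) = @_;
    $encoding ||= "UTF-8";                 # assumption: a UTF-8 locale
    my $bytes = encode($encoding, $path);  # characters -> bytes
    return File::Spec->rel2abs($bytes);    # anchor at the current directory
}

my $path = aio_safe_path("caf\x{e9}.txt");
printf "%s (absolute: %s)\n", $path,
    File::Spec->file_name_is_absolute($path) ? "yes" : "no";
```

Encoding before calling C<rel2abs> keeps the whole path in byte form, since the current working directory is already reported as bytes.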
122 =over 4
123
124 =item aio_open $pathname, $flags, $mode, $callback->($fh)
125
126 Asynchronously open or create a file and call the callback with a newly
127 created filehandle for the file.
128
129 The pathname passed to C<aio_open> must be absolute. See the pathname notes, above,
130 for an explanation.
131
132 The C<$flags> argument is a bitmask. See the C<Fcntl> module for a
133 list. They are the same as used by C<sysopen>.
134
135 Likewise, C<$mode> specifies the mode of the newly created file, if it
136 didn't exist and C<O_CREAT> has been given, just like perl's C<sysopen>,
137 except that it is mandatory (i.e. use C<0> if you don't create new files,
138 and C<0666> or C<0777> if you do).
139
140 Example:
141
142 aio_open "/etc/passwd", O_RDONLY, 0, sub {
143 if ($_[0]) {
144 print "open successful, fh is $_[0]\n";
145 ...
146 } else {
147 die "open failed: $!\n";
148 }
149 };
150
151 =item aio_close $fh, $callback->($status)
152
153 Asynchronously close a file and call the callback with the result
154 code. I<WARNING:> although accepted, you should not pass in a perl
155 filehandle here, as perl will likely close the file descriptor another
156 time when the filehandle is destroyed. Normally, you can safely call perl's
157 C<close> or just let filehandles go out of scope.
158
159 This is considered a bug in the API, so it might change. It's
160 therefore best to avoid this function.
161
162 =item aio_read $fh,$offset,$length, $data,$dataoffset, $callback->($retval)
163
164 =item aio_write $fh,$offset,$length, $data,$dataoffset, $callback->($retval)
165
166 Reads or writes C<$length> bytes from the specified C<$fh> and C<$offset>
167 into (or from) the scalar given by C<$data> at offset C<$dataoffset>, and
168 calls the callback with the actual number of bytes read or written (or
169 -1 on error, just like the syscall).
170
171 The C<$data> scalar I<MUST NOT> be modified in any way while the request
172 is outstanding. Modifying it can result in segfaults or WW3 (if the
173 necessary/optional hardware is installed).
174
175 Example: Read 15 bytes at offset 7 into scalar C<$buffer>, starting at
176 offset C<0> within the scalar:
177
178 aio_read $fh, 7, 15, $buffer, 0, sub {
179 $_[0] > 0 or die "read error: $!";
180 print "read $_[0] bytes: <$buffer>\n";
181 };
182
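To make the roles of C<$offset>, C<$length> and C<$dataoffset> concrete, here is a synchronous analogy built on perl's own C<sysseek> and C<sysread>. The C<read_at> helper is hypothetical and is not how IO::AIO is implemented; it only mirrors the argument order and the return convention described above.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical synchronous stand-in with the same argument order as aio_read.
# Returns the number of bytes read, or -1 on error.
sub read_at {
    my ($fh, $offset, $length, undef, $dataoffset) = @_;
    defined sysseek($fh, $offset, SEEK_SET) or return -1;
    # $_[3] aliases the caller's scalar, so it is filled in place
    my $n = sysread($fh, $_[3], $length, $dataoffset);
    return defined $n ? $n : -1;
}

my ($fh, $name) = tempfile(UNLINK => 1);
syswrite $fh, "0123456789abcdefghijklmnopqrstuvwxyz" or die "write error: $!";

my $buffer = "";
my $n = read_at($fh, 7, 15, $buffer, 0);   # 15 bytes from file offset 7
print "read $n bytes: <$buffer>\n";
```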
183 =item aio_move $srcpath, $dstpath, $callback->($status)
184
185 Try to move the I<file> (directories not supported as either source or
186 destination) from C<$srcpath> to C<$dstpath> and call the callback with
187 C<0> (ok) or C<-1> (error).
188
189 This is a composite request that tries to rename(2) the file first. If
190 rename fails with C<EXDEV>, it creates the destination file with mode 0200
191 and copies the contents of the source file into it using C<aio_sendfile>,
192 followed by restoring atime, mtime, access mode and uid/gid, in that
193 order, and unlinking the C<$srcpath>.
194
195 If an error occurs, the partial destination file will be unlinked, if
196 possible, except when setting atime, mtime, access mode and uid/gid, where
197 errors are being ignored.
198
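The same strategy, written synchronously with core modules, may make the order of steps easier to follow. This is only a sketch: C<move_file> is a hypothetical name, and C<File::Copy::copy> stands in for the C<aio_sendfile> step.

```perl
use strict;
use warnings;
use Errno qw(EXDEV);
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical synchronous equivalent: returns 0 on success, -1 on error.
sub move_file {
    my ($src, $dst) = @_;

    return 0 if rename $src, $dst;       # same filesystem: done
    return -1 unless $! == EXDEV;        # any other error is final

    my @stat = stat($src) or return -1;
    unless (copy($src, $dst)) {
        my $errno = $!;
        unlink $dst;                     # remove the partial destination
        $! = $errno;
        return -1;
    }

    # errors while restoring metadata are ignored, as described above
    utime $stat[8], $stat[9], $dst;
    chmod $stat[2] & 07777, $dst;
    chown $stat[4], $stat[5], $dst;

    return unlink($src) ? 0 : -1;
}

my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/a.txt", "$dir/b.txt");
open my $fh, ">", $src or die "open: $!";
print $fh "hello\n";
close $fh or die "close: $!";

my $status = move_file($src, $dst);
print "status $status\n";
```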
199 =cut
200
201 sub aio_move($$$) {
202 my ($src, $dst, $cb) = @_;
203
204 aio_rename $src, $dst, sub {
205 if ($_[0] && $! == EXDEV) {
206 aio_open $src, O_RDONLY, 0, sub {
207 if (my $src_fh = $_[0]) {
208 my @stat = stat $src_fh;
209
210 aio_open $dst, O_CREAT | O_WRONLY | O_TRUNC, 0200, sub {
211 if (my $dst_fh = $_[0]) {
212 aio_sendfile $dst_fh, $src_fh, 0, $stat[7], sub {
213 close $src_fh;
214
215 if ($_[0] == $stat[7]) {
216 utime $stat[8], $stat[9], $dst;
217 chmod $stat[2] & 07777, $dst_fh;
218 chown $stat[4], $stat[5], $dst_fh;
219 close $dst_fh;
220
221 aio_unlink $src, sub {
222 $cb->($_[0]);
223 };
224 } else {
225 my $errno = $!;
226 aio_unlink $dst, sub {
227 $! = $errno;
228 $cb->(-1);
229 };
230 }
231 };
232 } else {
233 $cb->(-1);
234 }
235 };
236
237 } else {
238 $cb->(-1);
239 }
240 };
241 } else {
242 $cb->($_[0]);
243 }
244 };
245 }
246
247 =item aio_sendfile $out_fh, $in_fh, $in_offset, $length, $callback->($retval)
248
249 Tries to copy C<$length> bytes from C<$in_fh> to C<$out_fh>. It starts
250 reading at byte offset C<$in_offset>, and starts writing at the current
251 file offset of C<$out_fh>. Because of that, it is not safe to issue more
252 than one C<aio_sendfile> per C<$out_fh>, as they will interfere with each
253 other.
254
255 This call tries to make use of a native C<sendfile> syscall to provide
256 zero-copy operation. For this to work, C<$out_fh> should refer to a
257 socket, and C<$in_fh> should refer to an mmap'able file.
258
259 If the native sendfile call fails or is not implemented, it will be
260 emulated, so you can call C<aio_sendfile> on any type of filehandle
261 regardless of the limitations of the operating system.
262
263 Please note, however, that C<aio_sendfile> can read more bytes from
264 C<$in_fh> than are written, and there is no way to find out how many
265 bytes have been read from C<aio_sendfile> alone, as C<aio_sendfile> only
266 provides the number of bytes written to C<$out_fh>. Only if the result
267 value equals C<$length> one can assume that C<$length> bytes have been
268 read.
269
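How the emulation works is not specified here. As a rough illustration, a plain read/write loop could look like the sketch below. The C<copy_range> helper, the 64 KiB chunk size and the error handling are all assumptions, and unlike a real C<sendfile> this version moves the file offset of C<$in_fh>. It also shows why more bytes may be read than written: a chunk is read in full before any of it is written.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical emulation: copy up to $length bytes, starting at $in_offset
# in $in_fh, to the current position of $out_fh.
# Returns the number of bytes written, or -1 on error.
sub copy_range {
    my ($out_fh, $in_fh, $in_offset, $length) = @_;
    defined sysseek($in_fh, $in_offset, SEEK_SET) or return -1;

    my $written = 0;
    while ($written < $length) {
        my $want = $length - $written;
        $want = 65536 if $want > 65536;          # assumed chunk size
        my $got = sysread($in_fh, my $chunk, $want);
        return -1 unless defined $got;
        last if $got == 0;                       # end of input file

        my $off = 0;
        while ($off < $got) {                    # handle short writes
            my $n = syswrite($out_fh, $chunk, $got - $off, $off);
            return -1 unless defined $n;
            $off     += $n;
            $written += $n;
        }
    }
    return $written;
}

my ($in)  = tempfile(UNLINK => 1);
my ($out) = tempfile(UNLINK => 1);
syswrite $in, join("", "a" .. "z") or die "write error: $!";

my $copied = copy_range($out, $in, 5, 10);
print "copied $copied bytes\n";
```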
270 =item aio_readahead $fh,$offset,$length, $callback->($retval)
271
272 C<aio_readahead> populates the page cache with data from a file so that
273 subsequent reads from that file will not block on disk I/O. The C<$offset>
274 argument specifies the starting point from which data is to be read and
275 C<$length> specifies the number of bytes to be read. I/O is performed in
276 whole pages, so that offset is effectively rounded down to a page boundary
277 and bytes are read up to the next page boundary greater than or equal to
278 (offset+length). C<aio_readahead> does not read beyond the end of the
279 file. The current file offset of the file is left unchanged.
280
281 If that syscall doesn't exist (likely if your OS isn't Linux) it will be
282 emulated by simply reading the data, which would have a similar effect.
283
284 =item aio_stat $fh_or_path, $callback->($status)
285
286 =item aio_lstat $fh, $callback->($status)
287
288 Works like perl's C<stat> or C<lstat> in void context. The callback will
289 be called after the stat and the results will be available using C<stat _>
290 or C<-s _> etc...
291
292 The pathname passed to C<aio_stat> must be absolute. See the pathname notes, above,
293 for an explanation.
294
295 Currently, the stats are always 64-bit-stats, i.e. instead of returning an
296 error when stat'ing a large file, the results will be silently truncated
297 unless perl itself is compiled with large file support.
298
299 Example: Print the length of F</etc/passwd>:
300
301 aio_stat "/etc/passwd", sub {
302 $_[0] and die "stat failed: $!";
303 print "size is ", -s _, "\n";
304 };
305
306 =item aio_unlink $pathname, $callback->($status)
307
308 Asynchronously unlink (delete) a file and call the callback with the
309 result code.
310
311 =item aio_link $srcpath, $dstpath, $callback->($status)
312
313 Asynchronously create a new link to the existing object at C<$srcpath> at
314 the path C<$dstpath> and call the callback with the result code.
315
316 =item aio_symlink $srcpath, $dstpath, $callback->($status)
317
318 Asynchronously create a new symbolic link to the existing object at C<$srcpath> at
319 the path C<$dstpath> and call the callback with the result code.
320
321 =item aio_rename $srcpath, $dstpath, $callback->($status)
322
323 Asynchronously rename the object at C<$srcpath> to C<$dstpath>, just like
324 rename(2), and call the callback with the result code.
325
326 =item aio_rmdir $pathname, $callback->($status)
327
328 Asynchronously rmdir (delete) a directory and call the callback with the
329 result code.
330
331 =item aio_readdir $pathname, $callback->($entries)
332
333 Unlike the POSIX call of the same name, C<aio_readdir> reads an entire
334 directory (i.e. opendir + readdir + closedir). The entries will not be
335 sorted, and will B<NOT> include the C<.> and C<..> entries.
336
337 The callback is passed a single argument which is either C<undef> or an array-ref
338 with the filenames.
339
340 =item aio_scandir $path, $maxreq, $callback->($dirs, $nondirs)
341
342 Scans a directory (similar to C<aio_readdir>) but additionally tries to
343 separate the entries of directory C<$path> into two sets of names, ones
344 you can recurse into (directories or links to them), and ones you cannot
345 recurse into (everything else).
346
347 C<aio_scandir> is a composite request that consists of many sub
348 requests. C<$maxreq> specifies the maximum number of outstanding aio
349 requests that this function generates. If it is C<< <= 0 >>, then a
350 suitable default will be chosen (currently 8).
351
352 On error, the callback is called without arguments, otherwise it receives
353 two array-refs with path-relative entry names.
354
355 Example:
356
357 aio_scandir $dir, 0, sub {
358 my ($dirs, $nondirs) = @_;
359 print "real directories: @$dirs\n";
360 print "everything else: @$nondirs\n";
361 };
362
363 Implementation notes.
364
365 The C<aio_readdir> cannot be avoided, but C<stat()>'ing every entry can.
366
367 After reading the directory, the modification time, size etc. of the
368 directory before and after the readdir are compared, and if they match (and
369 the modification time isn't the current time), the link count will be used
370 to decide how many entries are directories (if >= 2). Otherwise, no
371 knowledge of the number of subdirectories will be assumed.
372
373 Then the entries will be sorted into likely directories (everything without
374 a non-initial dot currently) and likely non-directories (everything
375 else). Then every entry plus an appended C</.> will be C<stat>'ed,
376 likely directories first. If that succeeds, it assumes that the entry
377 is a directory or a symlink to a directory (which will be checked
378 separately). This is often faster than stat'ing the entry itself because
379 filesystems might detect the type of the entry without reading the inode
380 data (e.g. ext2fs filetype feature).
381
382 If the known number of directories (link count - 2) has been reached, the
383 rest of the entries are assumed to be non-directories.
384
385 This only works with certainty on POSIX (= UNIX) filesystems, which
386 fortunately are the vast majority of filesystems around.
387
388 It will also likely work on non-POSIX filesystems with reduced efficiency
389 as those tend to return 0 or 1 as link counts, which disables the
390 directory counting heuristic.
391
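The ordering step of the heuristic can be tried in isolation. The sketch below reuses the sort key from the implementation (a "has a non-initial dot" flag followed by the zero-padded name length); the wrapper C<order_entries> and the sample names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the sort used by aio_scandir. The result is
# meant to be consumed with pop(), so the most likely directories (no
# non-initial dot, shortest name) end up at the END of the list.
sub order_entries {
    my @entries = @_;
    return map $_->[0],
           sort { $b->[1] cmp $a->[1] }
           map [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
           @entries;
}

my @queue = order_entries(qw(README.txt src a.out docs .cache));

# stat order: pop takes the likely directories first
my @stat_order = reverse @queue;
print "stat order: @stat_order\n";
```

Note that a leading dot alone, as in C<.cache>, does not count as a non-initial dot, so such entries are still treated as likely directories.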
392 =cut
393
394 sub aio_scandir($$$) {
395 my ($path, $maxreq, $cb) = @_;
396
397 $maxreq = 8 if $maxreq <= 0;
398
399 # stat once
400 aio_stat $path, sub {
401 return $cb->() if $_[0];
402 my $now = time;
403 my $hash1 = join ":", (stat _)[0,1,3,7,9];
404
405 # read the directory entries
406 aio_readdir $path, sub {
407 my $entries = shift
408 or return $cb->();
409
410 # stat the dir another time
411 aio_stat $path, sub {
412 my $hash2 = join ":", (stat _)[0,1,3,7,9];
413
414 my $ndirs;
415
416 # take the slow route if anything looks fishy
417 if ($hash1 ne $hash2 or (stat _)[9] == $now) {
418 $ndirs = -1;
419 } else {
420 # if nlink == 2, we are finished
421 # on non-posix-fs's, we rely on nlink < 2
422 $ndirs = (stat _)[3] - 2
423 or return $cb->([], $entries);
424 }
425
426 # sort into likely dirs and likely nondirs
427 # dirs == files without ".", short entries first
428 $entries = [map $_->[0],
429 sort { $b->[1] cmp $a->[1] }
430 map [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
431 @$entries];
432
433 my (@dirs, @nondirs);
434
435 my ($statcb, $schedcb);
436 my $nreq = 0;
437
438 $schedcb = sub {
439 if (@$entries) {
440 if ($nreq < $maxreq) {
441 my $ent = pop @$entries;
442 $nreq++;
443 aio_stat "$path/$ent/.", sub { $statcb->($_[0], $ent) };
444 }
445 } elsif (!$nreq) {
446 # finished
447 undef $statcb;
448 undef $schedcb;
449 $cb->(\@dirs, \@nondirs) if $cb;
450 undef $cb;
451 }
452 };
453 $statcb = sub {
454 my ($status, $entry) = @_;
455
456 if ($status < 0) {
457 $nreq--;
458 push @nondirs, $entry;
459 &$schedcb;
460 } else {
461 # need to check for real directory
462 aio_lstat "$path/$entry", sub {
463 $nreq--;
464
465 if (-d _) {
466 push @dirs, $entry;
467
468 if (!--$ndirs) {
469 push @nondirs, @$entries;
470 $entries = [];
471 }
472 } else {
473 push @nondirs, $entry;
474 }
475
476 &$schedcb;
477 }
478 }
479 };
480
481 &$schedcb while @$entries && $nreq < $maxreq;
482 };
483 };
484 };
485 }
486
487 =item aio_fsync $fh, $callback->($status)
488
489 Asynchronously call fsync on the given filehandle and call the callback
490 with the fsync result code.
491
492 =item aio_fdatasync $fh, $callback->($status)
493
494 Asynchronously call fdatasync on the given filehandle and call the
495 callback with the fdatasync result code.
496
497 If this call isn't available because your OS lacks it or it couldn't be
498 detected, it will be emulated by calling C<fsync> instead.
499
500 =item aio_group $callback->()
501
502 =item aio_sleep $fractional_seconds, $callback->() *NOT EXPORTED*
503
504 Mainly used for debugging and benchmarking, this aio request puts one of
505 the request workers to sleep for the given time.
506
507 =back
508
509 =head2 IO::AIO::REQ CLASS
510
511 All non-aggregate C<aio_*> functions return an object of this class when
512 called in non-void context.
513
514 A request always moves through the following five states in its lifetime,
515 in order: B<ready> (request has been created, but has not been executed
516 yet), B<execute> (request is currently being executed), B<pending>
517 (request has been executed but callback has not been called yet),
518 B<result> (results are being processed synchronously, includes calling the
519 callback) and B<done> (request has reached the end of its lifetime and
520 holds no resources anymore).
521
522 =over 4
523
524 =item $req->cancel
525
526 Cancels the request, if possible. Has the effect of skipping execution
527 when entering the B<execute> state and skipping calling the callback when
528 entering the B<result> state, but will leave the request otherwise
529 untouched. That means that requests that currently execute will not be
530 stopped and resources held by the request will not be freed prematurely.
531
532 =back
533
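The request lifecycle and the effect of C<cancel> described above can be modelled with a small toy class. C<ToyReq> is purely hypothetical and unrelated to the real C<IO::AIO::REQ> implementation; it only shows that a cancelled request still passes through every state while skipping both the work and the callback.

```perl
use strict;
use warnings;

{
    package ToyReq;    # hypothetical model, NOT the real IO::AIO::REQ

    sub new {
        my ($class, %arg) = @_;
        return bless { state => "ready", cancelled => 0, %arg }, $class;
    }

    sub cancel { $_[0]{cancelled} = 1 }

    # Walk through the remaining states in order and return the trace.
    sub run {
        my ($self) = @_;
        my @trace = ($self->{state});
        for my $state (qw(execute pending result done)) {
            $self->{state} = $state;
            push @trace, $state;
            next if $self->{cancelled};   # skip work and callback
            $self->{retval} = $self->{work}->()  if $state eq "execute";
            $self->{callback}->($self->{retval}) if $state eq "result";
        }
        return @trace;
    }
}

my @log;
my $normal = ToyReq->new(
    work     => sub { 42 },
    callback => sub { push @log, "got $_[0]" },
);
my $cancelled = ToyReq->new(
    work     => sub { push @log, "ran"; 0 },
    callback => sub { push @log, "called" },
);
$cancelled->cancel;

print join(" -> ", $_->run), "\n" for $normal, $cancelled;
print "log: @log\n";
```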
534 =head2 SUPPORT FUNCTIONS
535
536 =over 4
537
538 =item $fileno = IO::AIO::poll_fileno
539
540 Return the I<request result pipe file descriptor>. This file descriptor must be
541 polled for reading by some mechanism outside this module (e.g. Event or
542 select, see below or the SYNOPSIS). If the pipe becomes readable you have
543 to call C<poll_cb> to check the results.
544
545 See C<poll_cb> for an example.
546
547 =item IO::AIO::poll_cb
548
549 Process all outstanding events on the result pipe. You have to call this
550 regularly. Returns the number of events processed. Returns immediately
551 when no events are outstanding.
552
553 Example: Install an Event watcher that automatically calls
554 IO::AIO::poll_cb with high priority:
555
556 Event->io (fd => IO::AIO::poll_fileno,
557 poll => 'r', async => 1,
558 cb => \&IO::AIO::poll_cb);
559
560 =item IO::AIO::poll_wait
561
562 Wait till the result filehandle becomes ready for reading (simply does a
563 C<select> on the filehandle). This is useful if you want to synchronously
564 wait for some requests to finish.
565
566 See C<nreqs> for an example.
567
568 =item IO::AIO::nreqs
569
570 Returns the number of requests currently outstanding (i.e. those whose
571 callback has not been invoked yet).
572
573 Example: wait till there are no outstanding requests anymore:
574
575 IO::AIO::poll_wait, IO::AIO::poll_cb
576 while IO::AIO::nreqs;
577
578 =item IO::AIO::flush
579
580 Wait till all outstanding AIO requests have been handled.
581
582 Strictly equivalent to:
583
584 IO::AIO::poll_wait, IO::AIO::poll_cb
585 while IO::AIO::nreqs;
586
587 =item IO::AIO::poll
588
589 Waits until some requests have been handled.
590
591 Strictly equivalent to:
592
593 IO::AIO::poll_wait, IO::AIO::poll_cb
594 if IO::AIO::nreqs;
595
596 =item IO::AIO::min_parallel $nthreads
597
598 Set the minimum number of AIO threads to C<$nthreads>. The current default
599 is C<4>, which means four asynchronous operations can be done at one time
600 (the number of outstanding operations, however, is unlimited).
601
602 IO::AIO starts threads only on demand, when an AIO request is queued and
603 no free thread exists.
604
605 It is recommended to keep the number of threads low, as some Linux
606 kernel versions will scale negatively with the number of threads (higher
607 parallelism => MUCH higher latency). With current Linux 2.6 versions, 4-32
608 threads should be fine.
609
610 Under most circumstances you don't need to call this function, as the
611 module selects a default that is suitable for low to moderate load.
612
613 =item IO::AIO::max_parallel $nthreads
614
615 Sets the maximum number of AIO threads to C<$nthreads>. If more than the
616 specified number of threads are currently running, this function kills
617 them. This function blocks until the limit is reached.
618
619 While C<$nthreads> is zero, aio requests get queued but not executed
620 until the number of threads has been increased again.
621
622 This module automatically runs C<max_parallel 0> at program end, to ensure
623 that all threads are killed and that there are no outstanding requests.
624
625 Under normal circumstances you don't need to call this function.
626
627 =item $oldnreqs = IO::AIO::max_outstanding $nreqs
628
629 Sets the maximum number of outstanding requests to C<$nreqs>. If you
630 try to queue up more than this number of requests, the caller will block until
631 some requests have been handled.
632
633 The default is very large, so normally there is no practical limit. If you
634 queue up many requests in a loop it often improves speed if you set
635 this to a relatively low number, such as C<100>.
636
637 Under normal circumstances you don't need to call this function.
638
639 =back
640
641 =cut
642
643 # support function to convert a fd into a perl filehandle
644 sub _fd2fh {
645 return undef if $_[0] < 0;
646
647 # try to generate nice filehandles
648 my $sym = "IO::AIO::fd#$_[0]";
649 local *$sym;
650
651 open *$sym, "+<&=$_[0]" # usually works under any unix
652 or open *$sym, "<&=$_[0]" # cygwin needs this
653 or open *$sym, ">&=$_[0]" # or this
654 or return undef;
655
656 *$sym
657 }
658
659 min_parallel 4;
660
661 END {
662 max_parallel 0;
663 }
664
665 1;
666
667 =head2 FORK BEHAVIOUR
668
669 This module should do "the right thing" when the process using it forks:
670
671 Before the fork, IO::AIO enters a quiescent state where no requests
672 can be added in other threads and no results will be processed. After
673 the fork the parent simply leaves the quiescent state and continues
674 request/result processing, while the child clears the request/result
675 queue (so the requests started before the fork will only be handled in
676 the parent). Threads will be started on demand until the limit set in the
677 parent process has been reached again.
678
679 In short: the parent will, after a short pause, continue as if fork had
680 not been called, while the child will act as if IO::AIO has not been used
681 yet.
682
683 =head1 SEE ALSO
684
685 L<Coro>, L<Linux::AIO> (obsolete).
686
687 =head1 AUTHOR
688
689 Marc Lehmann <schmorp@schmorp.de>
690 http://home.schmorp.de/
691
692 =cut
693