/cvs/IO-AIO/AIO.pm
Revision: 1.50
Committed: Sat Jun 24 16:27:02 2006 UTC (17 years, 11 months ago) by root
Branch: MAIN
Changes since 1.49: +82 -2 lines
Log Message:
*** empty log message ***

File Contents

=head1 NAME

IO::AIO - Asynchronous Input/Output

=head1 SYNOPSIS

   use IO::AIO;
   use Fcntl; # O_RDONLY etc.

   aio_open "/etc/passwd", O_RDONLY, 0, sub {
      my ($fh) = @_;
      ...
   };

   aio_unlink "/tmp/file", sub { };

   aio_read $fh, 30000, 1024, $buffer, 0, sub {
      $_[0] > 0 or die "read error: $!";
   };

   # AnyEvent
   open my $fh, "<&=" . IO::AIO::poll_fileno or die "$!";
   my $w = AnyEvent->io (fh => $fh, poll => 'r', cb => sub { IO::AIO::poll_cb });

   # Event
   Event->io (fd => IO::AIO::poll_fileno,
              poll => 'r',
              cb => \&IO::AIO::poll_cb);

   # Glib/Gtk2
   add_watch Glib::IO IO::AIO::poll_fileno,
             in => sub { IO::AIO::poll_cb; 1 };

   # Tk
   Tk::Event::IO->fileevent (IO::AIO::poll_fileno, "",
                             readable => \&IO::AIO::poll_cb);

   # Danga::Socket
   Danga::Socket->AddOtherFds (IO::AIO::poll_fileno =>
                               \&IO::AIO::poll_cb);

=head1 DESCRIPTION

This module implements asynchronous I/O using whatever means your
operating system supports.

Currently, a number of threads are started that execute your reads and writes
and signal their completion. You don't need thread support in your libc or
perl, and the threads created by this module will not be visible to the
pthreads library. In the future, this module might make use of the native
aio functions available on many operating systems. However, they are often
not well-supported (Linux doesn't allow them on normal files currently,
for example), and they would only support aio_read and aio_write, so the
remaining functionality would have to be implemented using threads anyway.

Although the module will work in the presence of other threads, it is
currently not reentrant. Either use appropriate locking yourself, always
call C<poll_cb> from within the same thread, or never call C<poll_cb> (or
other C<aio_> functions) recursively.

=cut

package IO::AIO;

no warnings;

use base 'Exporter';

use Fcntl ();

BEGIN {
   $VERSION = '1.8';

   @EXPORT = qw(aio_sendfile aio_read aio_write aio_open aio_close aio_stat
                aio_lstat aio_unlink aio_rmdir aio_readdir aio_scandir aio_symlink
                aio_fsync aio_fdatasync aio_readahead aio_rename aio_link aio_move);
   @EXPORT_OK = qw(poll_fileno poll_cb min_parallel max_parallel
                   max_outstanding nreqs);

   require XSLoader;
   XSLoader::load IO::AIO, $VERSION;
}

=head1 FUNCTIONS

=head2 AIO FUNCTIONS

All the C<aio_*> calls are more or less thin wrappers around the syscall
with the same name (sans C<aio_>). The arguments are similar or identical,
and they all accept an additional (and optional) C<$callback> argument,
which must be a code reference. This code reference is called with
the syscall return code as its sole argument once the given syscall has
been executed asynchronously. Note that most syscalls return C<-1> on
error, unlike perl, which usually delivers "false".

All functions expecting a filehandle keep a copy of the filehandle
internally until the request has finished.

The pathnames you pass to these routines I<must> be absolute and
encoded in byte form. They must be absolute because the current working
directory could have changed by the time the request is executed.
Alternatively, you can make sure that you never change the current
working directory.

To encode pathnames to byte form, do one of the following: a) always pass
in filenames you got from outside (command line, readdir etc.), b) only use
pathnames that are ASCII or ISO 8859-1, c) use the Encode module to encode
your pathnames to the locale (or other) encoding in effect in the user
environment, d) use Glib::filename_from_unicode on unicode filenames, or
e) use something else.
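
The following helper combines option c) with the absolute-path rule. It is a sketch, not part of IO::AIO. It resolves a relative path with C<< File::Spec->rel2abs >> and then encodes the resulting character string to bytes. The helper name C<to_aio_path> is hypothetical. UTF-8 is an assumption that stands in for whatever encoding the user's environment actually uses.

```perl
use strict;
use warnings;
use File::Spec;
use Encode ();

# Hypothetical helper: make $path absolute (relative to $base, or to the
# current directory if $base is undef) and encode it to a byte string.
# Expects a character string, e.g. decoded user input; names obtained
# from readdir are already bytes and should be passed through unchanged.
# UTF-8 is an assumed target encoding, not something IO::AIO requires.
sub to_aio_path {
    my ($path, $base) = @_;
    my $abs = File::Spec->rel2abs($path, $base);
    return Encode::encode('UTF-8', $abs);
}

# prints the absolute path of notes.txt in the current directory
print to_aio_path("notes.txt"), "\n";
```

Converting the path immediately before queuing the request also protects against a later C<chdir>, because the absolute path is fixed at that point.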

=over 4

=item aio_open $pathname, $flags, $mode, $callback->($fh)

Asynchronously open or create a file and call the callback with a newly
created filehandle for the file.

The pathname passed to C<aio_open> must be absolute. See the notes on
pathnames at the beginning of this section for an explanation.

The C<$flags> argument is a bitmask. See the C<Fcntl> module for a
list. They are the same as used by C<sysopen>.

Likewise, C<$mode> specifies the mode of the newly created file, if it
didn't exist and C<O_CREAT> has been given, just like perl's C<sysopen>,
except that it is mandatory (i.e. use C<0> if you don't create new files,
and C<0666> or C<0777> if you do).

Example:

   aio_open "/etc/passwd", O_RDONLY, 0, sub {
      if ($_[0]) {
         print "open successful, fh is $_[0]\n";
         ...
      } else {
         die "open failed: $!\n";
      }
   };

=item aio_close $fh, $callback->($status)

Asynchronously close a file and call the callback with the result
code. I<WARNING:> although accepted, you should not pass in a perl
filehandle here, as perl will likely close the file descriptor a second
time when the filehandle is destroyed. Normally, you can safely call perl's
C<close> or just let filehandles go out of scope.

This is considered a bug in the API, so this behaviour might change. It's
therefore best to avoid this function.

=item aio_read $fh,$offset,$length, $data,$dataoffset, $callback->($retval)

=item aio_write $fh,$offset,$length, $data,$dataoffset, $callback->($retval)

Reads or writes C<$length> bytes at file offset C<$offset> of C<$fh>.
Reads store the bytes into the scalar C<$data> and writes take them from
it, starting at byte offset C<$dataoffset> within the scalar. The callback
is called with the actual number of bytes read or written (or C<-1> on
error, just like the syscall).

The C<$data> scalar I<MUST NOT> be modified in any way while the request
is outstanding. Modifying it can result in segfaults or WW3 (if the
necessary/optional hardware is installed).

Example: Read 15 bytes at offset 7 into scalar C<$buffer>, starting at
offset C<0> within the scalar:

   aio_read $fh, 7, 15, $buffer, 0, sub {
      $_[0] > 0 or die "read error: $!";
      print "read $_[0] bytes: <$buffer>\n";
   };
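
The following sketch writes a string to a fresh file with C<aio_write> and reads it back with C<aio_read>. It then blocks in C<IO::AIO::flush> (see SUPPORT FUNCTIONS) until every request has completed, including the requests queued from inside callbacks. Using C<File::Temp> for the absolute path is an assumption of the sketch. Note that C<$data> is left untouched while the write is outstanding, as required above.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC);
use File::Temp qw(tempdir);
use IO::AIO;

my $dir    = tempdir(CLEANUP => 1);   # absolute path, removed at exit
my $path   = "$dir/demo.txt";
my $data   = "hello, aio\n";          # must not change while the write runs
my $buffer = "";
my $got;

aio_open $path, O_RDWR | O_CREAT | O_TRUNC, 0600, sub {
    my $fh = shift or die "open failed: $!";

    # write the whole string at file offset 0, from scalar offset 0
    aio_write $fh, 0, length $data, $data, 0, sub {
        $_[0] == length $data or die "write failed: $!";

        # read it back only after the write has completed
        aio_read $fh, 0, length $data, $buffer, 0, sub {
            $_[0] > 0 or die "read failed: $!";
            $got = $buffer;
        };
    };
};

# Block until all requests, including nested ones, have been handled.
IO::AIO::flush;
print $got;
```

The read is nested inside the write callback so that the two requests run in order. Issuing both at once would let them run concurrently on different threads.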

=item aio_move $srcpath, $dstpath, $callback->($status)

[EXPERIMENTAL]

Try to move the I<file> (directories are not supported as either source or
destination) from C<$srcpath> to C<$dstpath> and call the callback with
C<0> on success or C<-1> on error.

This is a composite request that tries to rename(2) the file first. If
rename fails with C<EXDEV>, it creates the destination file with mode 0200
and copies the contents of the source file into it using C<aio_sendfile>.
It then restores atime, mtime, access mode and uid/gid, in that order,
and unlinks C<$srcpath>.

If an error occurs, the partial destination file will be unlinked if
possible. Errors while restoring atime, mtime, access mode and uid/gid
are ignored.

=cut

sub aio_move($$$) {
   my ($src, $dst, $cb) = @_;

   aio_rename $src, $dst, sub {
      # only fall back to copying if rename failed with EXDEV (cross-device);
      # %! loads Errno on demand, whereas a bareword Errno::EXDEV would not
      if ($_[0] && $!{EXDEV}) {
         aio_open $src, Fcntl::O_RDONLY(), 0, sub {
            if (my $src_fh = $_[0]) {
               my @stat = stat $src_fh;

               # O_CREAT is required to actually create the destination file
               aio_open $dst, Fcntl::O_WRONLY() | Fcntl::O_CREAT() | Fcntl::O_TRUNC(), 0200, sub {
                  if (my $dst_fh = $_[0]) {
                     aio_sendfile $dst_fh, $src_fh, 0, $stat[7], sub {
                        close $src_fh;

                        if ($_[0] == $stat[7]) {
                           # restore times, mode and owner; errors are ignored
                           utime $stat[8], $stat[9], $dst;
                           chmod $stat[2] & 07777, $dst_fh;
                           chown $stat[4], $stat[5], $dst_fh;
                           close $dst_fh;

                           aio_unlink $src, sub {
                              $cb->($_[0]);
                           };
                        } else {
                           # copy failed: remove the partial file, keeping
                           # the original errno for the caller
                           my $errno = $!;
                           aio_unlink $dst, sub {
                              $! = $errno;
                              $cb->(-1);
                           };
                        }
                     };
                  } else {
                     $cb->(-1);
                  }
               };
            } else {
               $cb->(-1);
            }
         };
      } else {
         $cb->($_[0]);
      }
   };
}

=item aio_sendfile $out_fh, $in_fh, $in_offset, $length, $callback->($retval)

Tries to copy C<$length> bytes from C<$in_fh> to C<$out_fh>. It starts
reading at byte offset C<$in_offset>, and starts writing at the current
file offset of C<$out_fh>. Because of that, it is not safe to issue more
than one C<aio_sendfile> per C<$out_fh>, as they will interfere with each
other.

This call tries to make use of a native C<sendfile> syscall to provide
zero-copy operation. For this to work, C<$out_fh> should refer to a
socket, and C<$in_fh> should refer to an mmap'able file.

If the native sendfile call fails or is not implemented, it will be
emulated, so you can call C<aio_sendfile> on any type of filehandle
regardless of the limitations of the operating system.

Please note, however, that C<aio_sendfile> can read more bytes from
C<$in_fh> than are written. C<aio_sendfile> alone gives no way to find out
how many bytes have been read, because it only reports the number of bytes
written to C<$out_fh>. Only if the result value equals C<$length> can one
assume that C<$length> bytes have been read.

=item aio_readahead $fh,$offset,$length, $callback->($retval)

C<aio_readahead> populates the page cache with data from a file so that
subsequent reads from that file will not block on disk I/O. The C<$offset>
argument specifies the starting point from which data is to be read and
C<$length> specifies the number of bytes to be read. I/O is performed in
whole pages, so C<$offset> is effectively rounded down to a page boundary
and bytes are read up to the next page boundary greater than or equal to
(C<$offset> + C<$length>). C<aio_readahead> does not read beyond the end of
the file. The current file offset of the file is left unchanged.

If the readahead syscall doesn't exist (likely if your OS isn't Linux), it
will be emulated by simply reading the data, which has a similar effect.
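
The page rounding described above can be checked with a few lines of arithmetic. This pure-Perl sketch computes the byte range that is effectively read. The 4096-byte default page size is an assumption; the real value can be queried with C<POSIX::sysconf(POSIX::_SC_PAGESIZE())>. Unlike the real call, the sketch does not clip the range at the end of the file. The function name C<readahead_range> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($start, $end) of the page-aligned range
# covered by a readahead of $length bytes at $offset. $start is rounded
# down to a page boundary, $end (exclusive) is rounded up to the next one.
# End-of-file clipping is deliberately ignored here.
sub readahead_range {
    my ($offset, $length, $pagesize) = @_;
    $pagesize ||= 4096;                       # assumed page size
    my $start = $offset - $offset % $pagesize;
    my $end   = $offset + $length;
    $end += $pagesize - $end % $pagesize if $end % $pagesize;
    return ($start, $end);
}

my ($start, $end) = readahead_range(5000, 100);
print "pages cover bytes $start..", $end - 1, "\n";   # pages cover bytes 4096..8191
```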

=item aio_stat $fh_or_path, $callback->($status)

=item aio_lstat $fh, $callback->($status)

Works like perl's C<stat> or C<lstat> in void context. The callback will
be called after the stat, and the results will be available using C<stat _>
or C<-s _> etc.

The pathname passed to C<aio_stat> must be absolute. See the notes on
pathnames at the beginning of this section for an explanation.

Currently, the stats are always 64-bit stats. That is, instead of returning
an error when stat'ing a large file, the results will be silently truncated
unless perl itself is compiled with large file support.

Example: Print the length of F</etc/passwd>:

   aio_stat "/etc/passwd", sub {
      $_[0] and die "stat failed: $!";
      print "size is ", -s _, "\n";
   };

=item aio_unlink $pathname, $callback->($status)

Asynchronously unlink (delete) a file and call the callback with the
result code.

=item aio_link $srcpath, $dstpath, $callback->($status)

Asynchronously create a new link to the existing object at C<$srcpath> at
the path C<$dstpath> and call the callback with the result code.

=item aio_symlink $srcpath, $dstpath, $callback->($status)

Asynchronously create a new symbolic link to the existing object at
C<$srcpath> at the path C<$dstpath> and call the callback with the result
code.

=item aio_rename $srcpath, $dstpath, $callback->($status)

Asynchronously rename the object at C<$srcpath> to C<$dstpath>, just like
rename(2), and call the callback with the result code.

=item aio_rmdir $pathname, $callback->($status)

Asynchronously rmdir (delete) a directory and call the callback with the
result code.

=item aio_readdir $pathname, $callback->($entries)

Unlike the POSIX call of the same name, C<aio_readdir> reads an entire
directory (i.e. opendir + readdir + closedir). The entries will not be
sorted, and will B<NOT> include the C<.> and C<..> entries.

The callback is passed a single argument, which is either C<undef> or an
array-ref with the filenames.

=item aio_scandir $path, $maxreq, $callback->($dirs, $nondirs)

Scans a directory (similar to C<aio_readdir>) and tries to separate the
entries of directory C<$path> into two sets of names: ones you can recurse
into (directories), and ones you cannot recurse into (everything else).

C<aio_scandir> is a composite request that consists of many
aio-primitives. C<$maxreq> specifies the maximum number of outstanding
aio requests that this function generates. If it is C<< <= 0 >>, then a
suitable default will be chosen (currently 8).

On error, the callback is called without arguments. Otherwise it receives
two array-refs with path-relative entry names.

Example:

   aio_scandir $dir, 0, sub {
      my ($dirs, $nondirs) = @_;
      print "real directories: @$dirs\n";
      print "everything else: @$nondirs\n";
   };

Implementation notes.

The C<aio_readdir> cannot be avoided, but C<stat()>'ing every entry can.

After reading the directory, the modification time, size etc. of the
directory before and after the readdir is checked, and if they match, the
link count will be used to decide how many entries are directories (if
>= 2). Otherwise, no knowledge of the number of subdirectories will be
assumed.
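
The link-count rule can be tried on its own. On POSIX filesystems a directory starts with a link count of 2: its name in the parent plus its own C<.> entry. Every subdirectory adds one more link through its C<..> entry, so C<nlink - 2> is the number of subdirectories. The sketch below applies only that rule. It omits the before/after comparison that C<aio_scandir> performs, and returns C<undef> where the count cannot be used. The name C<subdir_count_hint> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number of subdirectories implied by the link
# count of $dir, or undef if stat fails or nlink < 2 (as on some
# non-POSIX filesystems, where the count says nothing about subdirs).
sub subdir_count_hint {
    my ($dir) = @_;
    my @st = stat $dir or return undef;
    my $nlink = $st[3];
    return $nlink >= 2 ? $nlink - 2 : undef;
}

my $hint = subdir_count_hint("/");
print defined $hint ? "/ has $hint subdirectories\n" : "no usable link count\n";
```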

Then the entries will be sorted into likely directories (everything without
a non-initial dot) and likely non-directories (everything else). Then every
entry + C</.> will be C<stat>'ed, likely directories first. This is often
faster because filesystems might detect the type of the entry without
reading the inode data (e.g. ext2fs filetype feature). If that succeeds,
it assumes that the entry is a directory or a symlink to a directory (which
will be checked separately).
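
The "likely directories first" ordering is plain Perl and can be reproduced outside the module. This sketch uses the same sort key as C<aio_scandir>: C<0> for names without a non-initial dot and C<1> otherwise, followed by the zero-padded name length. The module sorts in descending order and pops entries off the end. That processes names in the same order as the ascending sort used here, apart from ties. The function name C<stat_order> is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names in the order aio_scandir would
# stat them. Key = "0" (no non-initial dot) or "1", plus the length
# padded to four digits, so short dot-less names come first.
sub stat_order {
    my @entries = @_;
    return map  $_->[0],
           sort { $a->[1] cmp $b->[1] }
           map  [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
           @entries;
}

my @order = stat_order(qw(README.txt lib .git Makefile.PL t));
print "@order\n";   # t lib .git README.txt Makefile.PL
```

C<.git> counts as a likely directory because C</.\./> only matches a dot that is preceded by some other character.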

If the known number of directories has been reached, the rest of the
entries are assumed to be non-directories.

=cut

sub aio_scandir($$$) {
   my ($path, $maxreq, $cb) = @_;

   $maxreq = 8 if $maxreq <= 0;

   # stat once
   aio_stat $path, sub {
      return $cb->() if $_[0];
      my $hash1 = join ":", (stat _)[0,1,3,7,9];

      # read the directory entries
      aio_readdir $path, sub {
         my $entries = shift
            or return $cb->();

         # stat the dir another time
         aio_stat $path, sub {
            my $hash2 = join ":", (stat _)[0,1,3,7,9];

            my $ndirs;

            # take the slow route if anything looks fishy
            if ($hash1 ne $hash2) {
               $ndirs = -1;
            } else {
               # if nlink == 2, we are finished
               # on non-posix-fs's, we rely on nlink < 2
               $ndirs = (stat _)[3] - 2
                  or return $cb->([], $entries);
            }

            # sort into likely dirs and likely nondirs
            # dirs == files without ".", short entries first
            $entries = [map $_->[0],
                           sort { $b->[1] cmp $a->[1] }
                              map [$_, sprintf "%s%04d", (/.\./ ? "1" : "0"), length],
                                 @$entries];

            my (@dirs, @nondirs);

            my ($statcb, $schedcb);
            my $nreq = 0;

            $schedcb = sub {
               if (@$entries) {
                  if ($nreq < $maxreq) {
                     my $ent = pop @$entries;
                     $nreq++;
                     aio_stat "$path/$ent/.", sub { $statcb->($_[0], $ent) };
                  }
               } elsif (!$nreq) {
                  # finished
                  undef $statcb;
                  undef $schedcb;
                  $cb->(\@dirs, \@nondirs) if $cb;
                  undef $cb;
               }
            };
            $statcb = sub {
               my ($status, $entry) = @_;

               if ($status < 0) {
                  $nreq--;
                  push @nondirs, $entry;
                  &$schedcb;
               } else {
                  # need to check for real directory
                  aio_lstat "$path/$entry", sub {
                     $nreq--;

                     if (-d _) {
                        push @dirs, $entry;

                        if (!--$ndirs) {
                           push @nondirs, @$entries;
                           $entries = [];
                        }
                     } else {
                        push @nondirs, $entry;
                     }

                     &$schedcb;
                  }
               }
            };

            &$schedcb while @$entries && $nreq < $maxreq;
         };
      };
   };
}

=item aio_fsync $fh, $callback->($status)

Asynchronously call fsync on the given filehandle and call the callback
with the fsync result code.

=item aio_fdatasync $fh, $callback->($status)

Asynchronously call fdatasync on the given filehandle and call the
callback with the fdatasync result code.

If this call isn't available because your OS lacks it or it couldn't be
detected, it will be emulated by calling C<fsync> instead.

=back

=head2 SUPPORT FUNCTIONS

=over 4

=item $fileno = IO::AIO::poll_fileno

Return the I<request result pipe file descriptor>. This file descriptor
must be polled for reading by some mechanism outside this module (e.g.
Event or select, see below or the SYNOPSIS). If the pipe becomes readable,
you have to call C<poll_cb> to check the results.

See C<poll_cb> for an example.

=item IO::AIO::poll_cb

Process all outstanding events on the result pipe. You have to call this
regularly. Returns the number of events processed. Returns immediately
when no events are outstanding.

Example: Install an Event watcher that automatically calls
IO::AIO::poll_cb with high priority:

   Event->io (fd => IO::AIO::poll_fileno,
              poll => 'r', async => 1,
              cb => \&IO::AIO::poll_cb);
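
A program without an event library can use a plain C<select> loop instead. This sketch wraps the result pipe in a Perl filehandle, using the same C<< <&= >> trick as the AnyEvent line in the SYNOPSIS. It waits on that handle with the core C<IO::Select> module, calls C<poll_cb> whenever the handle becomes readable, and stops once C<nreqs> reports that nothing is outstanding. It is only a sketch: a real loop would add its other filehandles to the same C<IO::Select> set.

```perl
use strict;
use warnings;
use IO::Select;
use IO::AIO;

# "<&=" wraps the existing descriptor instead of duplicating it, so this
# handle must stay open for as long as IO::AIO is in use.
open my $aio_fh, "<&=", IO::AIO::poll_fileno() or die "fdopen: $!";
my $sel = IO::Select->new($aio_fh);

my $size;
aio_stat "/etc/passwd", sub {
    $_[0] and die "stat failed: $!";
    $size = -s _;
};

# Minimal main loop: block until the result pipe is readable, then run
# the callbacks of all finished requests.
while (IO::AIO::nreqs()) {
    IO::AIO::poll_cb() if $sel->can_read;
}

print "size is $size\n";
```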

=item IO::AIO::poll_wait

Wait till the result filehandle becomes ready for reading (this simply does
a C<select> on the filehandle). This is useful if you want to synchronously
wait for some requests to finish.

See C<nreqs> for an example.

=item IO::AIO::nreqs

Returns the number of requests currently outstanding (i.e. requests whose
callbacks have not been invoked yet).

Example: wait till there are no outstanding requests anymore:

   IO::AIO::poll_wait, IO::AIO::poll_cb
      while IO::AIO::nreqs;

=item IO::AIO::flush

Wait till all outstanding AIO requests have been handled.

Strictly equivalent to:

   IO::AIO::poll_wait, IO::AIO::poll_cb
      while IO::AIO::nreqs;

=item IO::AIO::poll

Waits until some requests have been handled.

Strictly equivalent to:

   IO::AIO::poll_wait, IO::AIO::poll_cb
      if IO::AIO::nreqs;

=item IO::AIO::min_parallel $nthreads

Set the minimum number of AIO threads to C<$nthreads>. The current default
is C<4>, which means four asynchronous operations can be done at one time
(the number of outstanding operations, however, is unlimited).

IO::AIO starts threads only on demand, when an AIO request is queued and
no free thread exists.

It is recommended to keep the number of threads low, as some Linux
kernel versions will scale negatively with the number of threads (higher
parallelism => MUCH higher latency). With current Linux 2.6 versions, 4-32
threads should be fine.

Under most circumstances you don't need to call this function, as the
module selects a default that is suitable for low to moderate load.

=item IO::AIO::max_parallel $nthreads

Sets the maximum number of AIO threads to C<$nthreads>. If more than the
specified number of threads are currently running, this function kills
the excess threads. This function blocks until the limit is reached.

While C<$nthreads> is zero, aio requests get queued but not executed
until the number of threads has been increased again.

This module automatically runs C<max_parallel 0> at program end, to ensure
that all threads are killed and that there are no outstanding requests.

Under normal circumstances you don't need to call this function.

=item $oldnreqs = IO::AIO::max_outstanding $nreqs

Sets the maximum number of outstanding requests to C<$nreqs>. If you
try to queue up more than this number of requests, the caller will block
until some requests have been handled.

The default is very large, so normally there is no practical limit. If you
queue up many requests in a loop, setting this to a relatively low number,
such as C<100>, often improves speed.

Under normal circumstances you don't need to call this function.

=back

=cut

# support function to convert a fd into a perl filehandle
sub _fd2fh {
   return undef if $_[0] < 0;

   # try to generate nice filehandles
   my $sym = "IO::AIO::fd#$_[0]";
   local *$sym;

   open *$sym, "+<&=$_[0]"          # usually works under any unix
      or open *$sym, "<&=$_[0]"     # cygwin needs this
      or open *$sym, ">&=$_[0]"     # or this
      or return undef;

   *$sym
}

min_parallel 4;

END {
   max_parallel 0;
}

1;

=head2 FORK BEHAVIOUR

Before the fork, IO::AIO enters a quiescent state where no requests
can be added in other threads and no results will be processed. After
the fork, the parent simply leaves the quiescent state and continues
request/result processing, while the child clears the request/result
queue (so the requests started before the fork will only be handled in
the parent). Threads will be started on demand until the limit set in the
parent process has been reached again.

=head1 SEE ALSO

L<Coro>, L<Linux::AIO>.

=head1 AUTHOR

Marc Lehmann <schmorp@schmorp.de>
http://home.schmorp.de/

=cut
