=head1 NAME

AnyEvent::IO - the DBI of asynchronous I/O implementations

=head1 SYNOPSIS

   use AnyEvent::IO;

   io_load "/etc/passwd", sub {
      my ($data) = @_
         or die "/etc/passwd: $!";

      warn "/etc/passwd contains ", ($data =~ y/://) , " colons.\n";
   };

   # also import O_XXX flags
   use AnyEvent::IO qw(:DEFAULT :flags);

   my $filedata = AE::cv;

   io_open "/etc/passwd", O_RDONLY, 0, sub {
      my ($fh) = @_
         or die "/etc/passwd: $!";

      io_stat $fh, sub {
         @_ or die "/etc/passwd: $!";

         my $size = -s _;

         io_read $fh, $size, sub {
            my ($data) = @_
               or die "/etc/passwd: $!";

            $size == length $data
               or die "/etc/passwd: short read, file changed?";

            # mostly the same as io_load, above - $data contains
            # the file contents now.
            $filedata->($data);
         };
      };
   };

   my $passwd = $filedata->recv;
   warn length $passwd, " octets.\n";

=head1 DESCRIPTION

This module provides functions that do I/O in an asynchronous fashion. It
is to I/O what L<AnyEvent> is to event libraries - it only
I<interfaces> to other implementations or to a portable pure-perl
implementation (that does not, however, do asynchronous I/O).

The only such implementation that is supported (or even known to the
author) is L<IO::AIO>, which is used automatically when it can be
loaded. If it is not available, L<AnyEvent::IO> falls back to its
(synchronous) pure-perl implementation.

Unlike L<AnyEvent>, which model to use is currently decided at module load
time, not at first use. Future releases might change this.

=head2 RATIONALE

While disk I/O often seems "instant" compared to, say, socket I/O, there
are many situations where your program can block for extended time periods
when doing disk I/O. For example, you access a disk on an NFS server and
it is gone - it can take ages to respond again, if ever. Or your system is
extremely busy because it creates or restores a backup - reading data from
disk can then take seconds. Or you use Linux, which for so many years has
had a close-to-broken VM/IO subsystem that can often induce minutes or more
of delay for disk I/O, even under what I would consider light I/O loads.

Whatever the situation, some programs just can't afford to block for long
times (say, half a second or more), because they need to respond as fast
as possible.

For those cases, you need asynchronous I/O.

The problem is, AnyEvent itself sometimes reads disk files (for example,
when looking at F</etc/hosts>), and under the above situations, this can
bring your program to a complete halt even if your program otherwise
takes care to only use asynchronous I/O for everything (e.g. by using
L<IO::AIO>).

On the other hand, requiring L<IO::AIO> for AnyEvent is clearly
impossible, as AnyEvent promises to stay pure-perl, and the overhead of
IO::AIO for small programs would be immense, especially when asynchronous
I/O isn't even needed.

Clearly, this calls for an abstraction layer, and that is what you are
looking at right now :-)

=head2 ASYNCHRONOUS VS. NON-BLOCKING

Many people are continually confused about what the difference is between
asynchronous I/O and non-blocking I/O. Those two terms are not well
defined, which often makes it hard to even talk about the difference. Here
is a short guideline that should leave you less confused:

Non-blocking I/O means that data is delivered by some external means,
automatically - that is, something I<pushes> data towards your file
handle without you having to do anything. Non-blocking means that if your
operating system currently has no data available for you, it will not wait
("block" as it would normally do), but immediately return with an error.

Your program can then wait for data to arrive.

Often, you would expect this to work for disk files as well - if the
data isn't already in memory, one might wait for it. While this is sound
reasoning, the POSIX API does not support this, because the operating
system does not know where and how much data you want to read, and more
so, the OS already knows that the data is there, it doesn't need to "wait"
until it arrives from some external entity.

So basically, while the concept is sound, the existing OS APIs do not
support it. It makes no sense to switch a disk file handle into
non-blocking mode - it will behave exactly the same as in blocking mode,
namely it will block until the data has been read from the disk.

The alternative that actually works is usually called I<asynchronous>
I/O. Asynchronous, because the actual I/O is done while your program does
something else, and only when it is done will you get notified of it: You
only order the operation, it will be executed in the background, and you
will get notified of the outcome.

This works with disk files, and even with sockets and other sources that
you could use with non-blocking I/O instead. It is, however, not very
efficient when used with sources that could be driven in a non-blocking
way; it makes most sense when confronted with disk files.

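As a rough sketch of this "order the operation, get notified later"
pattern, a request could be issued and awaited as follows. The condition
variable C<$done> and the choice of F</etc/hosts> are only illustrative
assumptions, not something this module prescribes:

   use AnyEvent;
   use AnyEvent::IO;

   my $done = AE::cv;

   # order the operation - with an asynchronous backend, this call
   # does not wait for the disk
   io_load "/etc/hosts", sub {
      my ($data) = @_
         or return $done->croak ("/etc/hosts: $!");

      $done->send (length $data);
   };

   # the program is free to do other work here

   # wait for the notification (recv dies if croak was called)
   my $octets = $done->recv;
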
=head1 IMPORT TAGS

By default, this module exports all C<io_>xxx functions. In addition,
the following import tags can be used:

   :io       all io_ functions, same as :DEFAULT
   :flags    the fcntl open flags (O_CREAT, O_RDONLY, ...)

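For illustration, these are the import forms one would typically choose
between. This relies only on standard L<Exporter> behaviour, where an
explicit import list replaces the default one:

   use AnyEvent::IO;                       # the io_ functions only (the default)
   use AnyEvent::IO qw(:flags);            # the O_XXX flags only
   use AnyEvent::IO qw(:DEFAULT :flags);   # both
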
=head1 API NOTES

The functions in this module are not meant to be the most versatile or the
highest performers. They are meant to be easy to use for common cases. You
are advised to use L<IO::AIO> directly when possible, which has a more
extensive and faster API. If, however, you just want to do some I/O with
the option of it being asynchronous when people need it, these functions
are for you.

Every function in this module implements an I/O operation, usually with
the same or a similar name as the Perl builtin that it mimics, but with
an C<io_> prefix.

Each function expects a callback as its last argument. The callback is
usually called with the result data or result code. An error is usually
signalled by passing no arguments to the callback, which is then free to
look at C<$!> for the error code.

This makes all of the following forms of error checking valid:

   io_open ...., sub {
      my $fh = shift   # scalar assignment - will assign undef on error
         or die "...";

      my ($fh) = @_    # list assignment - will be 0 elements on error
         or die "...";

      @_               # check the number of elements directly
         or die "...";

When a path is specified, this path I<must be an absolute> path, unless
you make certain that nothing in your process calls C<chdir> or an
equivalent function while the request executes.

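One possible way to satisfy the absolute-path requirement is to resolve
the path once, before ordering the request. This is a minimal sketch that
uses the core L<File::Spec> module; the relative path F<data/input.txt> is
hypothetical:

   use File::Spec ();
   use AnyEvent::IO;

   # resolved against the current directory now, not when the
   # request eventually executes
   my $path = File::Spec->rel2abs ("data/input.txt");

   io_load $path, sub {
      my ($data) = @_
         or return warn "$path: $!\n";

      warn "$path: ", length $data, " octets\n";
   };
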
Changing the C<umask> while any requests execute that create files (or
otherwise rely on the current umask) results in undefined behaviour -
likewise changing anything else that would change the outcome, such as
your effective user or group ID.

Unlike other functions in the AnyEvent module family, these functions
I<may> call your callback instantly, before returning. This should not be
a real problem, as these functions never return anything useful.

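In practice this means that everything the callback relies on should be
set up before the request is ordered. A small sketch, in which the
C<%state> hash and the condition variable are illustrative assumptions:

   use AnyEvent;
   use AnyEvent::IO;

   my $done = AE::cv;
   my %state;   # initialised BEFORE the request is ordered

   io_stat "/etc/hosts", sub {
      $state{"/etc/hosts"} = @_ ? "ok" : "error: $!";
      $done->send;
   };

   # with a synchronous backend, the callback may already have run
   # here - recv simply returns at once in that case
   $done->recv;
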
=cut

package AnyEvent::IO;

use AnyEvent (); BEGIN { AnyEvent::common_sense }

use base "Exporter";

our @IO_REQ = qw(
   io_load io_open io_close io_seek io_read io_write io_truncate
   io_utime io_chown io_chmod io_stat io_lstat
   io_link io_symlink io_readlink io_rename io_unlink
   io_mkdir io_rmdir io_readdir
);
*EXPORT = \@IO_REQ;
our @FLAGS = qw(O_RDONLY O_WRONLY O_RDWR O_CREAT O_EXCL O_TRUNC O_APPEND);
*EXPORT_OK = \@FLAGS;
our %EXPORT_TAGS = (flags => \@FLAGS, io => \@IO_REQ);

our $MODEL;

if ($MODEL) {
   # a backend was loaded before this module and has set $MODEL already
   AE::log 7 => "Found preloaded IO model '$MODEL', using it.";
} else {
   # a model forced via the environment is tried before autodetection
   if ($ENV{PERL_ANYEVENT_IO_MODEL} =~ /^([a-zA-Z0-9:]+)$/) {
      if (eval { require "AnyEvent/IO/$ENV{PERL_ANYEVENT_IO_MODEL}.pm" }) {
         AE::log 7 => "Loaded IO model '$MODEL' (forced by \$ENV{PERL_ANYEVENT_IO_MODEL}), using it.";
      } else {
         undef $MODEL;
         AE::log 4 => "Unable to load IO model '$ENV{PERL_ANYEVENT_IO_MODEL}' (from \$ENV{PERL_ANYEVENT_IO_MODEL}):\n$@";
      }
   }

   # otherwise prefer IO::AIO when it can be loaded, else fall back to pure perl
   unless ($MODEL) {
      if (eval { require IO::AIO; require AnyEvent::AIO; require AnyEvent::IO::IOAIO }) {
         AE::log 7 => "Autoloaded IO model 'IOAIO', using it.";
      } else {
         require AnyEvent::IO::PP;
         AE::log 7 => "Autoloaded IO model 'Perl', using it.";
      }
   }
}

=head1 GLOBAL VARIABLES AND FUNCTIONS

=over 4

=item $AnyEvent::IO::MODEL

Contains the package name of the backend I/O model in use - at the moment,
this is usually C<AnyEvent::IO::Perl> or C<AnyEvent::IO::IOAIO>.

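For example, a program that wants to report which backend it ended up
with could do something like the following (the wording of the message is
arbitrary):

   use AnyEvent::IO;

   warn "I/O backend in use: $AnyEvent::IO::MODEL\n";
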
=item io_load $path, $cb->($data)

Tries to open C<$path> and read its contents into memory (obviously,
should only be used on files that are "small enough"), then passes them to
the callback as a string.

Example: load F</etc/hosts>.

   io_load "/etc/hosts", sub {
      my ($hosts) = @_
         or die "/etc/hosts: $!";

      AE::log info => "/etc/hosts contains %d lines", ($hosts =~ y/\n//);
   };

=item io_open $path, $flags, $mode, $cb->($fh)

Tries to open the file specified by C<$path> with the O_XXX-flags
C<$flags> (from the Fcntl module, or see below) and the mode C<$mode> (a
good value is 0666 for C<O_CREAT>, and C<0> otherwise).

The (normal, standard, perl) file handle associated with the opened file
is then passed to the callback.

This works very much like perl's C<sysopen> function.

Changing the C<umask> while this request executes results in undefined
behaviour - likewise changing anything else that would change the outcome,
such as your effective user or group ID.

To avoid having to load L<Fcntl>, this module provides constants
for C<O_RDONLY>, C<O_WRONLY>, C<O_RDWR>, C<O_CREAT>, C<O_EXCL>,
C<O_TRUNC> and C<O_APPEND> - you can either access them directly
(C<AnyEvent::IO::O_RDONLY>) or import them by specifying the C<:flags>
import tag (see SYNOPSIS).

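A sketch of opening a file for appending, creating it when it does not
exist yet. The path F</tmp/example.log> is hypothetical, and error
handling is reduced to a warning:

   use AnyEvent::IO qw(:DEFAULT :flags);

   io_open "/tmp/example.log", O_WRONLY | O_CREAT | O_APPEND, 0666, sub {
      my ($fh) = @_
         or return warn "/tmp/example.log: $!\n";

      # $fh is an ordinary perl file handle from here on
      io_close $fh, sub { };
   };
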
=item io_close $fh, $cb->($success)

Closes the file handle (yes, close can block your process indefinitely)
and passes a true value to the callback on success.

Due to idiosyncrasies in perl, instead of calling C<close>, the file
handle might get closed by C<dup2>'ing another file descriptor over
it, that is, the C<$fh> might still be open, but can be closed safely
afterwards and must not be used for anything.

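A minimal sketch of checking the outcome, assuming C<$fh> came from an
earlier C<io_open>:

   io_close $fh, sub {
      @_
         or return warn "close failed: $!\n";

      # the handle must not be used for anything after this point
      undef $fh;
   };
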
=item io_read $fh, $length, $cb->($data)

Tries to read C<$length> octets from the current position from C<$fh> and
passes these bytes to C<$cb>. Otherwise the semantics are very much like
those of perl's C<sysread>.

If fewer than C<$length> octets have been read, C<$data> will contain
only those bytes actually read. At EOF, C<$data> will be a zero-length
string. If an error occurs, then nothing is passed to the callback.

Obviously, multiple C<io_read>'s or C<io_write>'s at the same time on file
handles sharing the underlying open file description result in undefined
behaviour, due to sharing of the current file offset (and less obviously
so, because OS X is not thread safe and corrupts data when you try).

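The three possible outcomes (error, EOF, data) could be told apart roughly
like this. C<$fh> is assumed to come from an earlier C<io_open>, and the
chunk size of 4096 octets is arbitrary. Note that the list assignment
keeps an empty string at EOF distinct from the "no arguments" error case:

   io_read $fh, 4096, sub {
      my ($data) = @_
         or return warn "read error: $!\n";

      if (length $data) {
         warn "got ", length $data, " octets\n";
      } else {
         warn "end of file\n";
      }
   };
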
=item io_seek $fh, $offset, $whence, $callback->($offs)

Seeks the filehandle to the new C<$offset>, similarly to perl's
C<sysseek>. The C<$whence> are the traditional values (C<0> to count from
start, C<1> to count from the current position and C<2> to count from the
end).

The resulting absolute offset will be passed to the callback on success.

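As an illustration, seeking to offset C<0> relative to the end yields the
current file size. C<$fh> is again assumed to come from an earlier
C<io_open>:

   # whence 2 counts from the end of the file
   io_seek $fh, 0, 2, sub {
      my ($size) = @_
         or return warn "seek failed: $!\n";

      warn "file is $size octets long\n";
   };
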
=item io_write $fh, $data, $cb->($length)

Tries to write the octets in C<$data> to the current position of C<$fh>
and passes the actual number of bytes written to the C<$cb>. Otherwise the
semantics are very much like those of perl's C<syswrite>.

If fewer than C<length $data> octets have been written, C<$length> will
reflect that. If an error occurs, then nothing is passed to the callback.

Obviously, multiple C<io_read>'s or C<io_write>'s at the same time on file
handles sharing the underlying open file description result in undefined
behaviour, due to sharing of the current file offset (and less obviously
so, because OS X is not thread safe and corrupts data when you try).

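Since short writes are possible, a careful caller might compare the
reported length against the amount it wanted to write. A sketch, assuming
C<$fh> was opened for writing by an earlier C<io_open>:

   my $data = "hello, world\n";

   io_write $fh, $data, sub {
      my ($length) = @_
         or return warn "write error: $!\n";

      $length == length $data
         or warn "short write: only $length of ", length $data, " octets written\n";
   };
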
=item io_truncate $fh_or_path, $new_length, $cb->($success)

Calls C<truncate> on the path or perl file handle and passes a true value
to the callback on success.

=item io_utime $fh_or_path, $atime, $mtime, $cb->($success)

Calls C<utime> on the path or perl file handle and passes a true value to
the callback on success.

The special case of both C<$atime> and C<$mtime> being C<undef> sets the
times to the current time, on systems that support this.

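For instance, the C<undef> special case can be used much like the
C<touch> command on an existing file (the path F</tmp/example.log> is
hypothetical):

   io_utime "/tmp/example.log", undef, undef, sub {
      @_
         or warn "/tmp/example.log: $!\n";
   };
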
=item io_chown $fh_or_path, $uid, $gid, $cb->($success)

Calls C<chown> on the path or perl file handle and passes a true value to
the callback on success.

C<$uid> or C<$gid> can be specified as C<undef>, in which case the
uid or gid of the file is not changed. This differs from perl's C<chown>
builtin, which wants C<-1> for this.

=item io_chmod $fh_or_path, $perms, $cb->($success)

Calls C<chmod> on the path or perl file handle and passes a true value to
the callback on success.

=item io_stat $fh_or_path, $cb->($success)

=item io_lstat $path, $cb->($success)

Calls C<stat> or C<lstat> on the path or perl file handle and passes a
true value to the callback on success.

The stat data will be available by stat'ing the C<_> file handle
(e.g. C<-x _>, C<stat _> and so on).

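A short sketch of reading the results back through the C<_> file handle
inside the callback, here the size and modification time as returned by
perl's C<stat>:

   io_stat "/etc/hosts", sub {
      @_
         or return warn "/etc/hosts: $!\n";

      # elements 7 and 9 of the stat list are size and mtime
      my ($size, $mtime) = (stat _)[7, 9];

      warn "/etc/hosts: $size octets, modified ", scalar (localtime $mtime), "\n";
   };
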
=item io_link $oldpath, $newpath, $cb->($success)

Calls C<link> on the paths and passes a true value to the callback on
success.

=item io_symlink $oldpath, $newpath, $cb->($success)

Calls C<symlink> on the paths and passes a true value to the callback on
success.

=item io_readlink $path, $cb->($target)

Calls C<readlink> on the path and passes the link target string to the
callback.

=item io_rename $oldpath, $newpath, $cb->($success)

Calls C<rename> on the paths and passes a true value to the callback on
success.

=item io_unlink $path, $cb->($success)

Tries to unlink the object at C<$path> and passes a true value to the
callback on success.

=item io_mkdir $path, $perms, $cb->($success)

Calls C<mkdir> on the path with the given permissions C<$perms> (when in
doubt, C<0777> is a good value) and passes a true value to the callback on
success.

=item io_rmdir $path, $cb->($success)

Tries to remove the directory at C<$path> and passes a true value to the
callback on success.

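These requests are chained by nesting the callbacks. As a sketch, the
following creates a directory and removes it again (the directory name
F</tmp/example.d> is hypothetical):

   io_mkdir "/tmp/example.d", 0777, sub {
      @_
         or return warn "/tmp/example.d: $!\n";

      # the directory exists now - remove it again
      io_rmdir "/tmp/example.d", sub {
         @_
            or warn "/tmp/example.d: $!\n";
      };
   };
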
=item io_readdir $path, $cb->(\@names)

Reads all filenames from the directory specified by C<$path> and passes
them to the callback, as an array reference with the names (without a path
prefix). The F<.> and F<..> names will be filtered out first.

The ordering of the file names is undefined - backends that are capable
of it (e.g. L<IO::AIO>) will return the ordering that most likely is
fastest to C<stat> through, and furthermore put entries that likely are
directories first in the array.

If you need best performance in recursive directory traversal or when
looking at really big directories, you are advised to use L<IO::AIO>
directly, specifically the C<aio_readdirx> and C<aio_scandir> functions,
which have more options to tune performance.

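A minimal sketch of listing a directory. As the order is undefined, the
names are sorted here, and the path prefix is added back by hand:

   io_readdir "/etc", sub {
      my ($names) = @_
         or return warn "/etc: $!\n";

      warn "/etc/$_\n"
         for sort @$names;
   };
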
=back

=head1 ENVIRONMENT VARIABLES

See the description of C<PERL_ANYEVENT_IO_MODEL> in the L<AnyEvent>
manpage.

=head1 AUTHOR

 Marc Lehmann <schmorp@schmorp.de>
 http://home.schmorp.de/

=cut

1
