/cvs/BDB/BDB.pm
Revision: 1.12
Committed: Mon Aug 13 12:07:46 2007 UTC (16 years, 9 months ago) by root
Branch: MAIN
Changes since 1.11: +7 -6 lines
Log Message:
*** empty log message ***

1 =head1 NAME
2
3 BDB - Asynchronous Berkeley DB access
4
5 =head1 SYNOPSIS
6
7 use BDB;
8
9 =head1 DESCRIPTION
10
11 See the BerkeleyDB documentation (L<http://www.oracle.com/technology/documentation/berkeley-db/db/index.html>).
12 The BDB API is very similar to the C API (the translation has been very faithful).
13
14 See also the example sections in the document below and possibly the eg/
15 subdirectory of the BDB distribution. Last but not least, see the IO::AIO
16 documentation, as that module uses almost the same asynchronous request
17 model as this module.
18
19 I know this is woefully inadequate documentation. Send a patch!
20
21
22 =head1 REQUEST ANATOMY AND LIFETIME
23
24 Every request method creates a request, which is a C data structure not
25 directly visible to Perl.
26
27 During their existence, bdb requests travel through the following states,
28 in order:
29
30 =over 4
31
32 =item ready
33
34 Immediately after a request is created it is put into the ready state,
35 waiting for a thread to execute it.
36
37 =item execute
38
39 A thread has accepted the request for processing and is currently
40 executing it (e.g. blocking in read).
41
42 =item pending
43
44 The request has been executed and is waiting for result processing.
45
46 While request submission and execution are fully asynchronous, result
47 processing is not and relies on the perl interpreter calling C<poll_cb>
48 (or another function with the same effect).
49
50 =item result
51
52 The request results are processed synchronously by C<poll_cb>.
53
54 The C<poll_cb> function will process all outstanding aio requests by
55 calling their callbacks, freeing memory associated with them and managing
56 any groups they are contained in.
57
58 =item done
59
60 Request has reached the end of its lifetime and holds no resources anymore
61 (except possibly for the Perl object, but its connection to the actual
62 aio request is severed and calling its methods will either do nothing or
63 result in a runtime error).
64
65 =back
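
The state sequence above can be sketched as a small pure-Perl model. This is
only an illustration under stated assumptions: it does not use BDB, and
C<new_request>, C<state_name> and C<advance> are hypothetical names invented
for the sketch. The real requests are C structures driven by worker threads
and C<poll_cb>.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model of the request states described above.
# It does not use BDB; every name below is illustrative only.
my @STATES = qw(ready execute pending result done);

sub new_request {
    my ($callback) = @_;
    return { state => 0, callback => $callback };
}

sub state_name { $STATES[ $_[0]{state} ] }

# Move a request to its next state. The callback runs on entering "result",
# and the request drops its callback (its resources) on entering "done".
sub advance {
    my ($req) = @_;
    return state_name($req) if state_name($req) eq 'done';
    $req->{state}++;
    $req->{callback}->() if state_name($req) eq 'result';
    delete $req->{callback} if state_name($req) eq 'done';
    return state_name($req);
}

my $called = 0;
my $req    = new_request(sub { $called++ });

my @log = (state_name($req));
push @log, advance($req) for 1 .. 4;
print join(' -> ', @log), "\n";
```

In this model the callback fires exactly once, on the transition into the
result state, and advancing a request that is already done has no effect.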
66
67 =cut
68
69 package BDB;
70
71 no warnings;
72 use strict 'vars';
73
74 use base 'Exporter';
75
76 BEGIN {
77 our $VERSION = '1.0';
78
79 our @BDB_REQ = qw(
80 db_env_open db_env_close db_env_txn_checkpoint db_env_lock_detect
81 db_env_memp_sync db_env_memp_trickle
82 db_open db_close db_compact db_sync db_put db_get db_pget db_del db_key_range
83 db_txn_commit db_txn_abort
84 db_c_close db_c_count db_c_put db_c_get db_c_pget db_c_del
85 db_sequence_open db_sequence_close
86 db_sequence_get db_sequence_remove
87 );
88 our @EXPORT = (@BDB_REQ, qw(dbreq_pri dbreq_nice db_env_create db_create));
89 our @EXPORT_OK = qw(
90 poll_fileno poll_cb poll_wait flush
91 min_parallel max_parallel max_idle
92 nreqs nready npending nthreads
93 max_poll_time max_poll_reqs
94 );
95
96 require XSLoader;
97 XSLoader::load ("BDB", $VERSION);
98 }
99
100 =head2 BERKELEYDB FUNCTIONS
101
102 All of these are functions. The create functions simply return a new
103 object and never block. The remaining functions all take an optional
104 callback as last argument. If it is missing, then the function will be
105 executed synchronously.
106
107 BDB functions that cannot block (mostly functions that manipulate
108 settings) are method calls on the relevant objects, so the rule of thumb
109 is: if it's a method, it's not blocking; if it's a function, it takes a
110 callback as last argument.
111
112 In the following, C<$int> signifies an integer return value,
113 C<octetstring> is a "binary string" (i.e. a perl string with no character
114 indices >255), C<U32> is an unsigned 32 bit integer, C<int> is some
115 integer, C<NV> is a floating point value.
116
117 The C<SV *> types are generic perl scalars (for input and output of data
118 values), and the C<SV *callback> is the optional callback function to call
119 when the request is completed.
120
121 The various C<DB_ENV> etc. arguments are handles returned by
122 C<db_env_create>, C<db_create>, C<txn_begin> and so on. If they have an
123 appended C<_ornull> this means they are optional and you can pass C<undef>
124 for them, resulting in a NULL pointer on the C level.
125
126 =head3 BDB functions
127
128 Functions in the BDB namespace, exported by default:
129
130 $env = db_env_create (U32 env_flags = 0)
131
132 db_env_open (DB_ENV *env, octetstring db_home, U32 open_flags, int mode, SV *callback = &PL_sv_undef)
133 db_env_close (DB_ENV *env, U32 flags = 0, SV *callback = &PL_sv_undef)
134 db_env_txn_checkpoint (DB_ENV *env, U32 kbyte = 0, U32 min = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
135 db_env_lock_detect (DB_ENV *env, U32 flags = 0, U32 atype = DB_LOCK_DEFAULT, SV *dummy = 0, SV *callback = &PL_sv_undef)
136 db_env_memp_sync (DB_ENV *env, SV *dummy = 0, SV *callback = &PL_sv_undef)
137 db_env_memp_trickle (DB_ENV *env, int percent, SV *dummy = 0, SV *callback = &PL_sv_undef)
138
139 $db = db_create (DB_ENV *env = 0, U32 flags = 0)
140
141 db_open (DB *db, DB_TXN_ornull *txnid, octetstring file, octetstring database, int type, U32 flags, int mode, SV *callback = &PL_sv_undef)
142 db_close (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
143 db_compact (DB *db, DB_TXN_ornull *txn = 0, SV *start = 0, SV *stop = 0, SV *unused1 = 0, U32 flags = DB_FREE_SPACE, SV *unused2 = 0, SV *callback = &PL_sv_undef)
144 db_sync (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
145 db_key_range (DB *db, DB_TXN_ornull *txn, SV *key, SV *key_range, U32 flags = 0, SV *callback = &PL_sv_undef)
146 db_put (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
147 db_get (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
148 db_pget (DB *db, DB_TXN_ornull *txn, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
149 db_del (DB *db, DB_TXN_ornull *txn, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
150 db_txn_commit (DB_TXN *txn, U32 flags = 0, SV *callback = &PL_sv_undef)
151 db_txn_abort (DB_TXN *txn, SV *callback = &PL_sv_undef)
152 db_c_close (DBC *dbc, SV *callback = &PL_sv_undef)
153 db_c_count (DBC *dbc, SV *count, U32 flags = 0, SV *callback = &PL_sv_undef)
154 db_c_put (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
155 db_c_get (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
156 db_c_pget (DBC *dbc, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
157 db_c_del (DBC *dbc, U32 flags = 0, SV *callback = &PL_sv_undef)
158
159 db_sequence_open (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
160 db_sequence_close (DB_SEQUENCE *seq, U32 flags = 0, SV *callback = &PL_sv_undef)
161 db_sequence_get (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, int delta, SV *seq_value, U32 flags = DB_TXN_NOSYNC, SV *callback = &PL_sv_undef)
162 db_sequence_remove (DB_SEQUENCE *seq, DB_TXN_ornull *txnid = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
163
164
165 =head3 DB_ENV/database environment methods
166
167 Methods available on DB_ENV/$env handles:
168
169 DESTROY (DB_ENV_ornull *env)
170 CODE:
171 if (env)
172 env->close (env, 0);
173
174 $int = $env->set_data_dir (const char *dir)
175 $int = $env->set_tmp_dir (const char *dir)
176 $int = $env->set_lg_dir (const char *dir)
177 $int = $env->set_shm_key (long shm_key)
178 $int = $env->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
179 $int = $env->set_flags (U32 flags, int onoff)
180 $env->set_errfile (FILE *errfile = 0)
181 $env->set_msgfile (FILE *msgfile = 0)
182 $int = $env->set_verbose (U32 which, int onoff = 1)
183 $int = $env->set_encrypt (const char *password, U32 flags = 0)
184 $int = $env->set_timeout (NV timeout, U32 flags)
185 $int = $env->set_mp_max_openfd (int maxopenfd);
186 $int = $env->set_mp_max_write (int maxwrite, int maxwrite_sleep);
187 $int = $env->set_mp_mmapsize (int mmapsize_mb)
188 $int = $env->set_lk_detect (U32 detect = DB_LOCK_DEFAULT)
189 $int = $env->set_lk_max_lockers (U32 max)
190 $int = $env->set_lk_max_locks (U32 max)
191 $int = $env->set_lk_max_objects (U32 max)
192 $int = $env->set_lg_bsize (U32 max)
193 $int = $env->set_lg_max (U32 max)
194
195 $txn = $env->txn_begin (DB_TXN_ornull *parent = 0, U32 flags = 0)
196
197 =head4 Example:
198
199 use AnyEvent;
200 use BDB;
201
202 our $FH; open $FH, "<&=" . BDB::poll_fileno;
203 our $WATCHER = AnyEvent->io (fh => $FH, poll => 'r', cb => \&BDB::poll_cb);
204
205 BDB::min_parallel 8;
206
207 my $env = db_env_create;
208
209 mkdir "bdtest", 0700;
210 db_env_open
211 $env,
212 "bdtest",
213 BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::RECOVER | BDB::USE_ENVIRON | BDB::CREATE,
214 0600;
215
216 $env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);
217
218
219 =head3 DB/database methods
220
221 Methods available on DB/$db handles:
222
223 DESTROY (DB_ornull *db)
224 CODE:
225 if (db)
226 {
227 SV *env = (SV *)db->app_private;
228 db->close (db, 0);
229 SvREFCNT_dec (env);
230 }
231
232 $int = $db->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
233 $int = $db->set_flags (U32 flags)
234 $int = $db->set_encrypt (const char *password, U32 flags)
235 $int = $db->set_lorder (int lorder)
236 $int = $db->set_bt_minkey (U32 minkey)
237 $int = $db->set_re_delim (int delim)
238 $int = $db->set_re_pad (int re_pad)
239 $int = $db->set_re_source (char *source)
240 $int = $db->set_re_len (U32 re_len)
241 $int = $db->set_h_ffactor (U32 h_ffactor)
242 $int = $db->set_h_nelem (U32 h_nelem)
243 $int = $db->set_q_extentsize (U32 extentsize)
244
245 $dbc = $db->cursor (DB_TXN_ornull *txn = 0, U32 flags = 0)
246 $seq = $db->sequence (U32 flags = 0)
247
248 =head4 Example:
249
250 my $db = db_create $env;
251 db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE | BDB::READ_UNCOMMITTED, 0600;
252
253 for (1..1000) {
254 db_put $db, undef, "key $_", "data $_";
255
256 db_key_range $db, undef, "key $_", my $keyrange;
257 my ($lt, $eq, $gt) = @$keyrange;
258 }
259
260 db_del $db, undef, "key $_" for 1..1000;
261
262 db_sync $db;
263
264
265 =head3 DB_TXN/transaction methods
266
267 Methods available on DB_TXN/$txn handles:
268
269 DESTROY (DB_TXN_ornull *txn)
270 CODE:
271 if (txn)
272 txn->abort (txn);
273
274 $int = $txn->set_timeout (NV timeout, U32 flags)
275
276
277 =head3 DBC/cursor methods
278
279 Methods available on DBC/$dbc handles:
280
281 DESTROY (DBC_ornull *dbc)
282 CODE:
283 if (dbc)
284 dbc->c_close (dbc);
285
286 =head4 Example:
287
288 my $c = $db->cursor;
289
290 for (;;) {
291 db_c_get $c, my $key, my $data, BDB::NEXT;
292 warn "<$!,$key,$data>";
293 last if $!;
294 }
295
296 db_c_close $c;
297
298
299 =head3 DB_SEQUENCE/sequence methods
300
301 Methods available on DB_SEQUENCE/$seq handles:
302
303 DESTROY (DB_SEQUENCE_ornull *seq)
304 CODE:
305 if (seq)
306 seq->close (seq, 0);
307
308 $int = $seq->initial_value (db_seq_t value)
309 $int = $seq->set_cachesize (U32 size)
310 $int = $seq->set_flags (U32 flags)
311 $int = $seq->set_range (db_seq_t min, db_seq_t max)
312
313 =head4 Example:
314
315 my $seq = $db->sequence;
316
317 db_sequence_open $seq, undef, "seq", BDB::CREATE;
318 db_sequence_get $seq, undef, 1, my $value;
319
320
321 =head2 SUPPORT FUNCTIONS
322
323 =head3 EVENT PROCESSING AND EVENT LOOP INTEGRATION
324
325 =over 4
326
327 =item $fileno = BDB::poll_fileno
328
329 Return the I<request result pipe file descriptor>. This filehandle must be
330 polled for reading by some mechanism outside this module (e.g. Event or
331 select, see below or the SYNOPSIS). If the pipe becomes readable you have
332 to call C<poll_cb> to check the results.
333
334 See C<poll_cb> for an example.
335
336 =item BDB::poll_cb
337
338 Process some outstanding events on the result pipe. You have to call this
339 regularly. Returns the number of events processed. Returns immediately
340 when no events are outstanding. The number of events processed depends on
341 the settings of C<BDB::max_poll_reqs> and C<BDB::max_poll_time>.
342
343 If not all requests were processed for whatever reason, the filehandle
344 will still be ready when C<poll_cb> returns.
345
346 Example: Install an Event watcher that automatically calls
347 BDB::poll_cb with high priority:
348
349 Event->io (fd => BDB::poll_fileno,
350 poll => 'r', async => 1,
351 cb => \&BDB::poll_cb);
352
353 =item BDB::max_poll_reqs $nreqs
354
355 =item BDB::max_poll_time $seconds
356
357 These set the maximum number of requests (default C<0>, meaning infinity)
358 that are being processed by C<BDB::poll_cb> in one call, respectively
359 the maximum amount of time (default C<0>, meaning infinity) spent in
360 C<BDB::poll_cb> to process requests (more correctly the minimum amount
361 of time C<poll_cb> is allowed to use).
362
363 Setting C<max_poll_time> to a non-zero value creates an overhead of one
364 syscall per request processed, which is not normally a problem unless your
365 callbacks are really really fast or your OS is really really slow (I am
366 not mentioning Solaris here). Using C<max_poll_reqs> incurs no overhead.
367
368 Setting these is useful if you want to ensure some level of
369 interactiveness when perl is not fast enough to process all requests in
370 time.
371
372 For interactive programs, values such as C<0.01> to C<0.1> should be fine.
373
374 Example: Install an Event watcher that automatically calls
375 BDB::poll_cb with low priority, to ensure that other parts of the
376 program get the CPU sometimes even under high AIO load.
377
378 # try not to spend much more than 0.1s in poll_cb
379 BDB::max_poll_time 0.1;
380
381 # use a low priority so other tasks have priority
382 Event->io (fd => BDB::poll_fileno,
383 poll => 'r', nice => 1,
384 cb => \&BDB::poll_cb);
385
386 =item BDB::poll_wait
387
388 If there are any outstanding requests and none of them in the result
389 phase, wait till the result filehandle becomes ready for reading (simply
390 does a C<select> on the filehandle). This is useful if you want to
391 synchronously wait for some requests to finish.
392
393 See C<nreqs> for an example.
394
395 =item BDB::poll
396
397 Waits until some requests have been handled.
398
399 Returns the number of requests processed, but is otherwise strictly
400 equivalent to:
401
402 BDB::poll_wait, BDB::poll_cb
403
404 =item BDB::flush
405
406 Wait till all outstanding AIO requests have been handled.
407
408 Strictly equivalent to:
409
410 BDB::poll_wait, BDB::poll_cb
411 while BDB::nreqs;
412
413 =back
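
How a request limit interacts with the flush loop can be sketched in pure
Perl. This is a model under stated assumptions, not the BDB implementation:
C<@pending>, C<model_poll_cb> and C<model_flush> are hypothetical stand-ins
for the pending queue, C<BDB::poll_cb> and C<BDB::flush>, and only the
request limit is modelled (the time limit is left out).

```perl
use strict;
use warnings;

# Hypothetical pure-Perl model, not the BDB implementation: @pending stands
# in for results waiting in the pending state.
my @pending;
my $max_poll_reqs = 0;    # 0 means "no limit", as documented above

# Handle queued results, stopping after $max_poll_reqs of them (if set).
# Returns the number of results handled in this call.
sub model_poll_cb {
    my $handled = 0;
    while (@pending) {
        last if $max_poll_reqs && $handled >= $max_poll_reqs;
        my $callback = shift @pending;
        $callback->();
        $handled++;
    }
    return $handled;
}

# Keep polling until nothing is outstanding, like the flush loop above.
# Returns how many poll calls were needed.
sub model_flush {
    my $calls = 0;
    while (@pending) {
        model_poll_cb();
        $calls++;
    }
    return $calls;
}

my $done = 0;
push @pending, sub { $done++ } for 1 .. 10;

$max_poll_reqs = 4;
my $first = model_poll_cb();    # handles at most 4 results
my $calls = model_flush();      # drains the remaining 6 in two calls
print "first=$first calls=$calls done=$done\n";
```

With a limit in place, a single poll call may leave work behind, which is why
the flush loop keeps calling until nothing is outstanding.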
414
415 =head3 CONTROLLING THE NUMBER OF THREADS
416
417 =over 4
418
419 =item BDB::min_parallel $nthreads
420
421 Set the minimum number of AIO threads to C<$nthreads>. The current
422 default is C<8>, which means eight asynchronous operations can execute
423 concurrently at any one time (the number of outstanding requests,
424 however, is unlimited).
425
426 BDB starts threads only on demand, when an AIO request is queued and
427 no free thread exists. Please note that queueing up a hundred requests can
428 create demand for a hundred threads, even if it turns out that everything
429 is in the cache and could have been processed faster by a single thread.
430
431 It is recommended to keep the number of threads relatively low, as some
432 Linux kernel versions will scale negatively with the number of threads
433 (higher parallelity => MUCH higher latency). With current Linux 2.6
434 versions, 4-32 threads should be fine.
435
436 Under most circumstances you don't need to call this function, as the
437 module selects a default that is suitable for low to moderate load.
438
439 =item BDB::max_parallel $nthreads
440
441 Sets the maximum number of AIO threads to C<$nthreads>. If more than the
442 specified number of threads are currently running, this function kills
443 them. This function blocks until the limit is reached.
444
445 While C<$nthreads> is zero, aio requests get queued but not executed
446 until the number of threads has been increased again.
447
448 This module automatically runs C<max_parallel 0> at program end, to ensure
449 that all threads are killed and that there are no outstanding requests.
450
451 Under normal circumstances you don't need to call this function.
452
453 =item BDB::max_idle $nthreads
454
455 Limit the number of threads (default: 4) that are allowed to idle (i.e.,
456 threads that did not get a request to process within 10 seconds). That
457 means if a thread becomes idle while C<$nthreads> other threads are also
458 idle, it will free its resources and exit.
459
460 This is useful when you allow a large number of threads (e.g. 100 or 1000)
461 to allow for extremely high load situations, but want to free resources
462 under normal circumstances (1000 threads can easily consume 30MB of RAM).
463
464 The default is probably ok in most situations, especially if thread
465 creation is fast. If thread creation is very slow on your system you might
466 want to use larger values.
467
468 =item $oldmaxreqs = BDB::max_outstanding $maxreqs
469
470 This is a very bad function to use in interactive programs because it
471 blocks, and a bad way to reduce concurrency because it is inexact: Better
472 use an C<aio_group> together with a feed callback.
473
474 Sets the maximum number of outstanding requests to C<$maxreqs>. If you
475 try to queue up more than this number of requests, the next call to the
476 C<poll_cb> (and C<poll_some> and other functions calling C<poll_cb>)
477 function will block until the limit is no longer exceeded.
478
479 The default value is very large, so there is no practical limit on the
480 number of outstanding requests.
481
482 You can still queue as many requests as you want. Therefore,
483 C<max_outstanding> is mainly useful in simple scripts (with low values) or
484 as a stop gap to shield against fatal memory overflow (with large values).
485
486 =item BDB::set_sync_prepare $cb
487
488 Sets a callback that is called whenever a request is created without an
489 explicit callback. It has to return two code references. The first is used
490 as the request callback, and the second is called to wait until the first
491 callback has been called. The default implementation works like this:
492
493 sub {
494 my $status;
495 (
496 sub { $status = $! },
497 sub { BDB::poll while !defined $status; $! = $status },
498 )
499 }
500
501 =back
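
The pair of code references returned by a C<set_sync_prepare> callback can be
illustrated with a pure-Perl model. The sketch below makes several
assumptions: C<@queue> and C<model_poll> are hypothetical stand-ins for the
result queue and C<BDB::poll>, and the status is passed as an argument
rather than through C<$!> to keep the model self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: @queue models finished requests awaiting result
# processing and model_poll models BDB::poll. Neither is part of BDB.
my @queue;

sub model_poll {
    my $callback = shift @queue or die "nothing to poll";
    $callback->();
}

# Same shape as the default shown above: return a request callback and a
# waiter that polls until that callback has recorded a status.
sub sync_prepare {
    my $status;
    return (
        sub { $status = shift },
        sub { model_poll() while !defined $status; $status },
    );
}

# A "synchronous" request: queue the work, then block in the waiter.
sub model_sync_request {
    my ($result) = @_;
    my ($callback, $wait) = sync_prepare();
    push @queue, sub { $callback->($result) };
    return $wait->();
}

my $status = model_sync_request(0);    # 0 models success
print "status=$status\n";
```

The waiter only returns once the first code reference has run, which is what
makes a request without an explicit callback appear synchronous to the caller.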
502
503 =head3 STATISTICAL INFORMATION
504
505 =over 4
506
507 =item BDB::nreqs
508
509 Returns the number of requests currently in the ready, execute or pending
510 states (i.e. for which their callback has not been invoked yet).
511
512 Example: wait till there are no outstanding requests anymore:
513
514 BDB::poll_wait, BDB::poll_cb
515 while BDB::nreqs;
516
517 =item BDB::nready
518
519 Returns the number of requests currently in the ready state (not yet
520 executed).
521
522 =item BDB::npending
523
524 Returns the number of requests currently in the pending state (executed,
525 but not yet processed by poll_cb).
526
527 =back
528
529 =cut
530
531 set_sync_prepare {
532 my $status;
533 (
534 sub {
535 $status = $!;
536 },
537 sub {
538 BDB::poll while !defined $status;
539 $! = $status;
540 },
541 )
542 };
543
544 min_parallel 8;
545
546 END { flush }
547
548 1;
549
550 =head2 FORK BEHAVIOUR
551
552 This module should do "the right thing" when the process using it forks:
553
554 Before the fork, IO::AIO enters a quiescent state where no requests
555 can be added in other threads and no results will be processed. After
556 the fork the parent simply leaves the quiescent state and continues
557 request/result processing, while the child frees the request/result queue
558 (so that the requests started before the fork will only be handled in the
559 parent). Threads will be started on demand until the limit set in the
560 parent process has been reached again.
561
562 In short: the parent will, after a short pause, continue as if fork had
563 not been called, while the child will act as if IO::AIO has not been used
564 yet.
565
566 =head2 MEMORY USAGE
567
568 Per-request usage:
569
570 Each aio request uses - depending on your architecture - around 100-200
571 bytes of memory. In addition, stat requests need a stat buffer (possibly
572 a few hundred bytes), readdir requires a result buffer and so on. Perl
573 scalars and other data passed into aio requests will also be locked and
574 will consume memory till the request has entered the done state.
575
576 This is not awfully much, so queuing lots of requests is not usually a
577 problem.
578
579 Per-thread usage:
580
581 In the execution phase, some aio requests require more memory for
582 temporary buffers, and each thread requires a stack and other data
583 structures (usually around 16k-128k, depending on the OS).
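
As a rough worked example, the ranges quoted above can be turned into an
estimate. The request and thread counts are made-up inputs rather than
measurements, C<estimate_bytes> is a hypothetical helper, and the sketch
ignores per-request buffers and Perl scalars held by requests.

```perl
use strict;
use warnings;

# Rough estimate built only from the ranges quoted above:
# 100-200 bytes per request, 16k-128k per thread.
sub estimate_bytes {
    my ($requests, $threads) = @_;
    my $low  = $requests * 100 + $threads * 16 * 1024;
    my $high = $requests * 200 + $threads * 128 * 1024;
    return ($low, $high);
}

# Illustrative inputs: 100,000 queued requests and 8 threads.
my ($low, $high) = estimate_bytes(100_000, 8);
printf "%.1f MB to %.1f MB\n", $low / 2**20, $high / 2**20;
```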
584
585 =head1 KNOWN BUGS
586
587 Known bugs will be fixed in the next release.
588
589 =head1 SEE ALSO
590
591 L<Coro::AIO>.
592
593 =head1 AUTHOR
594
595 Marc Lehmann <schmorp@schmorp.de>
596 http://home.schmorp.de/
597
598 =cut
599