/cvs/BDB/BDB.pm
Revision: 1.10
Committed: Mon Aug 13 12:01:45 2007 UTC (16 years, 9 months ago) by root
Branch: MAIN
Changes since 1.9: +230 -5 lines
Log Message:
vastly improved docs

1 =head1 NAME
2
3 BDB - Asynchronous Berkeley DB access
4
5 =head1 SYNOPSIS
6
7 use BDB;
8
9 =head1 DESCRIPTION
10
11 See the BerkeleyDB documentation (L<http://www.oracle.com/technology/documentation/berkeley-db/db/index.html>).
12 The BDB API is very similar to the C API (the translation has been very faithful).
13
14 See also the example sections in the document below and possibly the eg/
15 subdirectory of the BDB distribution. Last but not least, see the IO::AIO
16 documentation, as that module uses almost the same asynchronous request
17 model as this module.
18
19 I know this is woefully inadequate documentation. Send a patch!
20
21
22 =head1 REQUEST ANATOMY AND LIFETIME
23
24 Every request method creates a request, which is a C data structure not
25 directly visible to Perl.
26
27 During their existence, BDB requests travel through the following states,
28 in order:
29
30 =over 4
31
32 =item ready
33
34 Immediately after a request is created it is put into the ready state,
35 waiting for a thread to execute it.
36
37 =item execute
38
39 A thread has accepted the request for processing and is currently
40 executing it (e.g. blocking in read).
41
42 =item pending
43
44 The request has been executed and is waiting for result processing.
45
46 While request submission and execution are fully asynchronous, result
47 processing is not and relies on the perl interpreter calling C<poll_cb>
48 (or another function with the same effect).
49
50 =item result
51
52 The request results are processed synchronously by C<poll_cb>.
53
54 The C<poll_cb> function will process all outstanding aio requests by
55 calling their callbacks, freeing memory associated with them and managing
56 any groups they are contained in.
57
58 =item done
59
60 Request has reached the end of its lifetime and holds no resources anymore
61 (except possibly for the Perl object, but its connection to the actual
62 aio request is severed and calling its methods will either do nothing or
63 result in a runtime error).
64
65 =back
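
As a rough illustration of these states, the following sketch queues a
single request and watches it complete. The directory name
F<bdtest-lifetime> is a hypothetical scratch location, and the sketch
assumes that only C<BDB::nreqs> and C<BDB::flush> are needed to observe
the request from the outside.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-lifetime";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

my $env = db_env_create;

my $result;   # set by the callback, i.e. during the "result" state

# ready: the request is queued as soon as the function returns
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600, sub {
   $result = $! ? "failed: $!" : "ok";
};

# ready, execute or pending: the callback has not been invoked yet
printf "outstanding before flush: %d\n", BDB::nreqs ();

# result: flush waits for completion and runs the callback via poll_cb
BDB::flush ();

# done: nothing is outstanding anymore
printf "outstanding after flush: %d, result: %s\n", BDB::nreqs (), $result;
```

The callback only runs inside C<BDB::flush> here, because nothing else in
this sketch calls C<poll_cb>.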
66
67 =cut
68
69 package BDB;
70
71 no warnings;
72 use strict 'vars';
73
74 use base 'Exporter';
75
76 BEGIN {
77 our $VERSION = '0.6';
78
79 our @BDB_REQ = qw(
80 db_env_open db_env_close db_env_txn_checkpoint db_env_lock_detect
81 db_env_memp_sync db_env_memp_trickle
82 db_open db_close db_compact db_sync db_put db_get db_pget db_del db_key_range
83 db_txn_commit db_txn_abort
84 db_c_close db_c_count db_c_put db_c_get db_c_pget db_c_del
85 db_sequence_open db_sequence_close
86 db_sequence_get db_sequence_remove
87 );
88 our @EXPORT = (@BDB_REQ, qw(dbreq_pri dbreq_nice db_env_create db_create));
89 our @EXPORT_OK = qw(
90 poll_fileno poll_cb poll_wait flush
91 min_parallel max_parallel max_idle
92 nreqs nready npending nthreads
93 max_poll_time max_poll_reqs
94 );
95
96 require XSLoader;
97 XSLoader::load ("BDB", $VERSION);
98 }
99
100 =head2 BERKELEYDB FUNCTIONS
101
102 All of these are functions. The create functions simply return a new
103 object and never block. All remaining functions take an optional
104 callback as last argument. If it is missing, then the function will be
105 executed synchronously.
106
107 BDB functions that cannot block (mostly functions that manipulate
108 settings) are method calls on the relevant objects, so the rule of thumb
109 is: if it's a method, it's not blocking; if it's a function, it takes a
110 callback as last argument.
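
The rule of thumb might look like this in practice. This is a minimal
sketch, assuming a hypothetical scratch directory F<bdtest-sync> and an
environment that only needs a memory pool; the flags argument is passed
explicitly before the callback to match the signatures listed in this
document.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-sync";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

# create functions return a handle at once and never block
my $env = db_env_create;

# a method: only changes a setting, so it cannot block
$env->set_cachesize (0, 1024 * 1024) == 0 or die "set_cachesize failed";

# functions without a callback run synchronously, the status ends up in $!
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600;
die "db_env_open: $!" if $!;

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::CREATE, 0600;
die "db_open: $!" if $!;

# synchronous put: returns only after the data has been stored
db_put $db, undef, "key 1", "data 1";
die "db_put: $!" if $!;

# asynchronous put: returns immediately, the callback runs from poll_cb
my $status;
db_put $db, undef, "key 2", "data 2", 0, sub {
   $status = $! ? "failed: $!" : "ok";
};

BDB::flush ();   # wait for the queued request and run its callback
print "asynchronous put: $status\n";
```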
111
112 In the following, C<$int> signifies an integer return value,
113 C<octetstring> is a "binary string" (i.e. a perl string with no character
114 indices >255), C<U32> is an unsigned 32 bit integer, C<int> is some
115 integer, C<NV> is a floating point value.
116
117 The C<SV *> types are generic perl scalars (for input and output of data
118 values), and the C<SV *callback> is the optional callback function to call
119 when the request is completed.
120
121 The various C<DB_ENV> etc. arguments are handles returned by
122 C<db_env_create>, C<db_create>, C<txn_begin> and so on. If they have an
123 appended C<_ornull> this means they are optional and you can pass C<undef>
124 for them, resulting in a NULL pointer on the C level.
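
To make the C<_ornull> convention concrete, here is a hedged sketch that
passes C<undef> and then a real transaction handle for the same
C<DB_TXN_ornull> argument. The directory F<bdtest-txn> and the key names
are hypothetical, and the environment flags are borrowed from the
examples elsewhere in this document.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-txn";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

my $env = db_env_create;
db_env_open $env, $home,
   BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::CREATE,
   0600;
die "db_env_open: $!" if $!;

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE, 0600;
die "db_open: $!" if $!;

# DB_TXN_ornull given as undef: a NULL transaction pointer on the C level
db_put $db, undef, "plain", "no explicit transaction";
die "db_put: $!" if $!;

# DB_TXN_ornull given as a handle returned by txn_begin
my $txn = $env->txn_begin;
db_put $db, $txn, "wrapped", "inside a transaction";
die "db_put: $!" if $!;
db_txn_commit $txn;
die "db_txn_commit: $!" if $!;

# read the committed value back, again without an explicit transaction
db_get $db, undef, "wrapped", my $data;
die "db_get: $!" if $!;
print "wrapped => $data\n";
```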
125
126 =head3 BDB functions
127
128 Functions in the BDB namespace, exported by default:
129
130 $env = db_env_create (U32 env_flags = 0)
131
132 db_env_open (DB_ENV *env, octetstring db_home, U32 open_flags, int mode, SV *callback = &PL_sv_undef)
133 db_env_close (DB_ENV *env, U32 flags = 0, SV *callback = &PL_sv_undef)
134 db_env_txn_checkpoint (DB_ENV *env, U32 kbyte = 0, U32 min = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
135 db_env_lock_detect (DB_ENV *env, U32 flags = 0, U32 atype = DB_LOCK_DEFAULT, SV *dummy = 0, SV *callback = &PL_sv_undef)
136 db_env_memp_sync (DB_ENV *env, SV *dummy = 0, SV *callback = &PL_sv_undef)
137 db_env_memp_trickle (DB_ENV *env, int percent, SV *dummy = 0, SV *callback = &PL_sv_undef)
138
139 $db = db_create (DB_ENV *env = 0, U32 flags = 0)
140
141 db_open (DB *db, DB_TXN_ornull *txnid, octetstring file, octetstring database, int type, U32 flags, int mode, SV *callback = &PL_sv_undef)
142 db_close (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
143 db_compact (DB *db, DB_TXN_ornull *txn = 0, SV *start = 0, SV *stop = 0, SV *unused1 = 0, U32 flags = DB_FREE_SPACE, SV *unused2 = 0, SV *callback = &PL_sv_unde
144 db_sync (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
145 db_key_range (DB *db, DB_TXN_ornull *txn, SV *key, SV *key_range, U32 flags = 0, SV *callback = &PL_sv_undef)
146 db_put (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
147 db_get (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
148 db_pget (DB *db, DB_TXN_ornull *txn, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
149 db_del (DB *db, DB_TXN_ornull *txn, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
150 db_txn_commit (DB_TXN *txn, U32 flags = 0, SV *callback = &PL_sv_undef)
151 db_txn_abort (DB_TXN *txn, SV *callback = &PL_sv_undef)
152 db_c_close (DBC *dbc, SV *callback = &PL_sv_undef)
153 db_c_count (DBC *dbc, SV *count, U32 flags = 0, SV *callback = &PL_sv_undef)
154 db_c_put (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
155 db_c_get (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
156 db_c_pget (DBC *dbc, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
157 db_c_del (DBC *dbc, U32 flags = 0, SV *callback = &PL_sv_undef)
158
159 db_sequence_open (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
160 db_sequence_close (DB_SEQUENCE *seq, U32 flags = 0, SV *callback = &PL_sv_undef)
161 db_sequence_get (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, int delta, SV *seq_value, U32 flags = DB_TXN_NOSYNC, SV *callback = &PL_sv_undef)
162 db_sequence_remove (DB_SEQUENCE *seq, DB_TXN_ornull *txnid = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
163
164
165 =head3 DB_ENV/database environment methods
166
167 Methods available on DB_ENV/$env handles:
168
169 DESTROY (DB_ENV_ornull *env)
170 CODE:
171 if (env)
172 env->close (env, 0);
173
174 $int = $env->set_data_dir (const char *dir)
175 $int = $env->set_tmp_dir (const char *dir)
176 $int = $env->set_lg_dir (const char *dir)
177 $int = $env->set_shm_key (long shm_key)
178 $int = $env->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
179 $int = $env->set_flags (U32 flags, int onoff)
180 $env->set_errfile (FILE *errfile = 0)
181 $env->set_msgfile (FILE *msgfile = 0)
182 $int = $env->set_verbose (U32 which, int onoff = 1)
183 $int = $env->set_encrypt (const char *password, U32 flags = 0)
184 $int = $env->set_timeout (NV timeout, U32 flags)
185 $int = $env->set_mp_max_openfd (int maxopenfd);
186 $int = $env->set_mp_max_write (int maxwrite, int maxwrite_sleep);
187 $int = $env->set_mp_mmapsize (int mmapsize_mb)
188 $int = $env->set_lk_detect (U32 detect = DB_LOCK_DEFAULT)
189 $int = $env->set_lk_max_lockers (U32 max)
190 $int = $env->set_lk_max_locks (U32 max)
191 $int = $env->set_lk_max_objects (U32 max)
192 $int = $env->set_lg_bsize (U32 max)
193 $int = $env->set_lg_max (U32 max)
194
195 $txn = $env->txn_begin (DB_TXN_ornull *parent = 0, U32 flags = 0)
196
197 =head4 example
198
199 use AnyEvent;
200 use BDB;
201
202 our $FH; open $FH, "<&=" . BDB::poll_fileno;
203 our $WATCHER = AnyEvent->io (fh => $FH, poll => 'r', cb => \&BDB::poll_cb);
204
205 BDB::min_parallel 8;
206
207 my $env = db_env_create;
208
209 mkdir "bdtest", 0700;
210 db_env_open
211 $env,
212 "bdtest",
213 BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::RECOVER | BDB::USE_ENVIRON | BDB::CREATE,
214 0600;
215
216 $env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);
217
218
219 =head3 DB/database methods
220
221 Methods available on DB/$db handles:
222
223 DESTROY (DB_ornull *db)
224 CODE:
225 if (db)
226 {
227 SV *env = (SV *)db->app_private;
228 db->close (db, 0);
229 SvREFCNT_dec (env);
230 }
231
232 $int = $db->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
233 $int = $db->set_flags (U32 flags)
234 $int = $db->set_encrypt (const char *password, U32 flags)
235 $int = $db->set_lorder (int lorder)
236 $int = $db->set_bt_minkey (U32 minkey)
237 $int = $db->set_re_delim (int delim)
238 $int = $db->set_re_pad (int re_pad)
239 $int = $db->set_re_source (char *source)
240 $int = $db->set_re_len (U32 re_len)
241 $int = $db->set_h_ffactor (U32 h_ffactor)
242 $int = $db->set_h_nelem (U32 h_nelem)
243 $int = $db->set_q_extentsize (U32 extentsize)
244
245 $dbc = $db->cursor (DB_TXN_ornull *txn = 0, U32 flags = 0)
246 $seq = $db->sequence (U32 flags = 0)
247
248 =head4 example
249
250 my $db = db_create $env;
251 db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE | BDB::READ_UNCOMMITTED, 0600;
252
253 for (1..1000) {
254 db_put $db, undef, "key $_", "data $_";
255
256 db_key_range $db, undef, "key $_", my $keyrange;
257 my ($lt, $eq, $gt) = @$keyrange;
258 }
259
260 db_del $db, undef, "key $_" for 1..1000;
261
262 db_sync $db;
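
The example above only writes and deletes. Reading a value back could
look like the following sketch, which assumes a hypothetical scratch
directory F<bdtest-get> and relies on C<db_get> storing the value in the
scalar passed as its fourth argument, as the signature list suggests.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-get";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

my $env = db_env_create;
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600;
die "db_env_open: $!" if $!;

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::CREATE, 0600;
die "db_open: $!" if $!;

db_put $db, undef, "key 1", "data 1";
die "db_put: $!" if $!;

# the value is written into the scalar passed as the data argument
db_get $db, undef, "key 1", my $data;
die "db_get: $!" if $!;
print "key 1 => $data\n";

# a failed lookup is reported through $! as well
db_get $db, undef, "no such key", my $missing;
my $lookup_failed = $! ? 1 : 0;
print "lookup of a missing key failed, as expected\n" if $lookup_failed;
```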
263
264
265 =head3 DB_TXN/transaction methods
266
267 Methods available on DB_TXN/$txn handles:
268
269 DESTROY (DB_TXN_ornull *txn)
270 CODE:
271 if (txn)
272 txn->abort (txn);
273
274 $int = $txn->set_timeout (NV timeout, U32 flags)
275
276
277 =head3 DBC/cursor methods
278
279 Methods available on DBC/$dbc handles:
280
281 DESTROY (DBC_ornull *dbc)
282 CODE:
283 if (dbc)
284 dbc->c_close (dbc);
285
286 =head4 example
287
288 my $c = $db->cursor;
289
290 for (;;) {
291 db_c_get $c, my $key, my $data, BDB::NEXT;
292 warn "<$!,$key,$data>";
293 last if $!;
294 }
295
296 db_c_close $c;
297
298 =head3 DB_SEQUENCE/sequence methods
299
300 Methods available on DB_SEQUENCE/$seq handles:
301
302 DESTROY (DB_SEQUENCE_ornull *seq)
303 CODE:
304 if (seq)
305 seq->close (seq, 0);
306
307 $int = $seq->initial_value (db_seq_t value)
308 $int = $seq->set_cachesize (U32 size)
309 $int = $seq->set_flags (U32 flags)
310 $int = $seq->set_range (db_seq_t min, db_seq_t max)
311
312 =head4 example
313
314 my $seq = $db->sequence;
315
316 db_sequence_open $seq, undef, "seq", BDB::CREATE;
317 db_sequence_get $seq, undef, 1, my $value;
318
319
320 =head2 SUPPORT FUNCTIONS
321
322 =head3 EVENT PROCESSING AND EVENT LOOP INTEGRATION
323
324 =over 4
325
326 =item $fileno = BDB::poll_fileno
327
328 Return the I<request result pipe file descriptor>. This filehandle must be
329 polled for reading by some mechanism outside this module (e.g. Event or
330 select, see below or the SYNOPSIS). If the pipe becomes readable you have
331 to call C<poll_cb> to check the results.
332
333 See C<poll_cb> for an example.
334
335 =item BDB::poll_cb
336
337 Process some outstanding events on the result pipe. You have to call this
338 regularly. Returns the number of events processed. Returns immediately
339 when no events are outstanding. The number of events processed depends on
340 the settings of C<BDB::max_poll_reqs> and C<BDB::max_poll_time>.
341
342 If not all requests were processed for whatever reason, the filehandle
343 will still be ready when C<poll_cb> returns.
344
345 Example: Install an Event watcher that automatically calls
346 BDB::poll_cb with high priority:
347
348 Event->io (fd => BDB::poll_fileno,
349 poll => 'r', async => 1,
350 cb => \&BDB::poll_cb);
351
352 =item BDB::max_poll_reqs $nreqs
353
354 =item BDB::max_poll_time $seconds
355
356 These set the maximum number of requests (default C<0>, meaning infinity)
357 that are being processed by C<BDB::poll_cb> in one call, respectively
358 the maximum amount of time (default C<0>, meaning infinity) spent in
359 C<BDB::poll_cb> to process requests (more correctly the minimum amount
360 of time C<poll_cb> is allowed to use).
361
362 Setting C<max_poll_time> to a non-zero value creates an overhead of one
363 syscall per request processed, which is not normally a problem unless your
364 callbacks are really really fast or your OS is really really slow (I am
365 not mentioning Solaris here). Using C<max_poll_reqs> incurs no overhead.
366
367 Setting these is useful if you want to ensure some level of
368 interactiveness when perl is not fast enough to process all requests in
369 time.
370
371 For interactive programs, values such as C<0.01> to C<0.1> should be fine.
372
373 Example: Install an Event watcher that automatically calls
374 BDB::poll_cb with low priority, to ensure that other parts of the
375 program get the CPU sometimes even under high AIO load.
376
377 # try not to spend much more than 0.1s in poll_cb
378 BDB::max_poll_time 0.1;
379
380 # use a low priority so other tasks have priority
381 Event->io (fd => BDB::poll_fileno,
382 poll => 'r', nice => 1,
383 cb => \&BDB::poll_cb);
384
385 =item BDB::poll_wait
386
387 If there are any outstanding requests and none of them in the result
388 phase, wait till the result filehandle becomes ready for reading (simply
389 does a C<select> on the filehandle). This is useful if you want to
390 synchronously wait for some requests to finish.
391
392 See C<nreqs> for an example.
393
394 =item BDB::poll
395
396 Waits until some requests have been handled.
397
398 Returns the number of requests processed, but is otherwise strictly
399 equivalent to:
400
401 BDB::poll_wait, BDB::poll_cb
402
403 =item BDB::flush
404
405 Wait till all outstanding AIO requests have been handled.
406
407 Strictly equivalent to:
408
409 BDB::poll_wait, BDB::poll_cb
410 while BDB::nreqs;
411
412 =back
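
If neither Event nor AnyEvent is available, the same integration can be
sketched with the core C<IO::Select> module. This is only an
illustration under stated assumptions: the directory F<bdtest-poll> is
hypothetical, and the loop stands in for whatever event loop the program
really uses.

```perl
use strict;
use warnings;
use IO::Select;
use BDB;

my $home = "bdtest-poll";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

# wrap the result pipe descriptor in a filehandle without duplicating it
open my $fh, "<&=", BDB::poll_fileno ()
   or die "cannot open the result pipe: $!";
my $select = IO::Select->new ($fh);

# handle at most 16 results per poll_cb call, so other work gets a turn
BDB::max_poll_reqs (16);

my $env = db_env_create;
my $status;
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600, sub {
   $status = $! ? "failed: $!" : "ok";
};

# a hand-written event loop: wait for the pipe, then process results
while (BDB::nreqs ()) {
   BDB::poll_cb () if $select->can_read (1);
   # other periodic work of the program would go here
}

print "db_env_open: $status\n";
```

The one second timeout passed to C<can_read> is arbitrary; it merely
keeps the loop responsive while no results are available.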
413
414 =head3 CONTROLLING THE NUMBER OF THREADS
415
416 =over 4
417
418 =item BDB::min_parallel $nthreads
419
420 Set the minimum number of AIO threads to C<$nthreads>. The current
421 default is C<8>, which means eight asynchronous operations can execute
422 concurrently at any one time (the number of outstanding requests,
423 however, is unlimited).
424
425 BDB starts threads only on demand, when an AIO request is queued and
426 no free thread exists. Please note that queueing up a hundred requests can
427 create demand for a hundred threads, even if it turns out that everything
428 is in the cache and could have been processed faster by a single thread.
429
430 It is recommended to keep the number of threads relatively low, as some
431 Linux kernel versions will scale negatively with the number of threads
432 (higher parallelism => MUCH higher latency). With current Linux 2.6
432 (higher parallelity => MUCH higher latency). With current Linux 2.6
433 versions, 4-32 threads should be fine.
434
435 Under most circumstances you don't need to call this function, as the
436 module selects a default that is suitable for low to moderate load.
437
438 =item BDB::max_parallel $nthreads
439
440 Sets the maximum number of AIO threads to C<$nthreads>. If more than the
441 specified number of threads are currently running, this function kills
442 them. This function blocks until the limit is reached.
443
444 While C<$nthreads> is zero, aio requests get queued but not executed
445 until the number of threads has been increased again.
446
447 This module automatically runs C<max_parallel 0> at program end, to ensure
448 that all threads are killed and that there are no outstanding requests.
449
450 Under normal circumstances you don't need to call this function.
451
452 =item BDB::max_idle $nthreads
453
454 Limit the number of threads (default: 4) that are allowed to idle (i.e.,
455 threads that did not get a request to process within 10 seconds). That
456 means if a thread becomes idle while C<$nthreads> other threads are also
457 idle, it will free its resources and exit.
458
459 This is useful when you allow a large number of threads (e.g. 100 or 1000)
460 to allow for extremely high load situations, but want to free resources
461 under normal circumstances (1000 threads can easily consume 30MB of RAM).
462
463 The default is probably ok in most situations, especially if thread
464 creation is fast. If thread creation is very slow on your system you might
465 want to use larger values.
466
467 =item $oldmaxreqs = BDB::max_outstanding $maxreqs
468
469 This is a very bad function to use in interactive programs because it
470 blocks, and a bad way to reduce concurrency because it is inexact: Better
471 use an C<aio_group> together with a feed callback.
472
473 Sets the maximum number of outstanding requests to C<$maxreqs>. If you
474 do queue up more than this number of requests, the next call to the
475 C<poll_cb> (and C<poll_some> and other functions calling C<poll_cb>)
476 function will block until the limit is no longer exceeded.
477
478 The default value is very large, so there is no practical limit on the
479 number of outstanding requests.
480
481 You can still queue as many requests as you want. Therefore,
482 C<max_outstanding> is mainly useful in simple scripts (with low values) or
483 as a stop gap to shield against fatal memory overflow (with large values).
484
485 =item BDB::set_sync_prepare $cb
486
487 Sets a callback that is called whenever a request is created without an
488 explicit callback. It has to return two code references. The first is used
489 as the request callback, and the second is called to wait until the first
490 callback has been called. The default implementation works like this:
491
492 sub {
493 my $status;
494 (
495 sub { $status = $! },
496 sub { BDB::poll while !defined $status; $! = $status },
497 )
498 }
499
500 =back
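
A custom C<set_sync_prepare> callback might be used to instrument
synchronous calls. The following sketch keeps the behaviour of the
default implementation shown above and merely counts how often it is
invoked; the counter C<$sync_requests> and the directory
F<bdtest-syncprep> are hypothetical names used for illustration.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-syncprep";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

my $sync_requests = 0;   # hypothetical counter, for illustration only

BDB::set_sync_prepare (sub {
   $sync_requests++;

   my $status;
   (
      # request callback: remember the status reported in $!
      sub { $status = $! },
      # wait function: process results until the callback above has run
      sub {
         BDB::poll () while !defined $status;
         $! = $status;
      },
   )
});

my $env = db_env_create;
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600;   # no callback
die "db_env_open: $!" if $!;

print "synchronous requests so far: $sync_requests\n";
```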
501
502 =head3 STATISTICAL INFORMATION
503
504 =over 4
505
506 =item BDB::nreqs
507
508 Returns the number of requests currently in the ready, execute or pending
509 states (i.e. for which their callback has not been invoked yet).
510
511 Example: wait till there are no outstanding requests anymore:
512
513 BDB::poll_wait, BDB::poll_cb
514 while BDB::nreqs;
515
516 =item BDB::nready
517
518 Returns the number of requests currently in the ready state (not yet
519 executed).
520
521 =item BDB::npending
522
523 Returns the number of requests currently in the pending state (executed,
524 but not yet processed by poll_cb).
525
526 =back
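
These counters can be combined into a simple progress report while a
batch of requests drains. The sketch below is written under a few
assumptions: F<bdtest-stats> is a hypothetical scratch directory, and
C<max_parallel> is set to 1 so that the queued writes run one after
another, because the environment is opened without the locking
subsystem.

```perl
use strict;
use warnings;
use BDB;

my $home = "bdtest-stats";   # hypothetical scratch directory
-d $home or mkdir $home, 0700 or die "mkdir $home: $!";

# one worker thread only: this sketch opens no locking subsystem
BDB::max_parallel (1);

my $env = db_env_create;
db_env_open $env, $home, BDB::INIT_MPOOL | BDB::CREATE, 0600;
die "db_env_open: $!" if $!;

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::CREATE, 0600;
die "db_open: $!" if $!;

my $total     = 100;
my $completed = 0;

# queue all requests up front, each with its own callback
for my $n (1 .. $total) {
   db_put $db, undef, "key $n", "data $n", 0, sub {
      warn "db_put key $n: $!" if $!;
      $completed++;
   };
}

# drain the queue, reporting the counters before each round
while (BDB::nreqs ()) {
   printf "outstanding %d (ready %d, pending %d)\n",
      BDB::nreqs (), BDB::nready (), BDB::npending ();
   BDB::poll_wait ();
   BDB::poll_cb ();
}

print "completed $completed of $total puts\n";
```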
527
528 =cut
529
530 set_sync_prepare {
531 my $status;
532 (
533 sub {
534 $status = $!;
535 },
536 sub {
537 BDB::poll while !defined $status;
538 $! = $status;
539 },
540 )
541 };
542
543 min_parallel 8;
544
545 END { flush }
546
547 1;
548
549 =head2 FORK BEHAVIOUR
550
551 This module should do "the right thing" when the process using it forks:
552
553 Before the fork, BDB enters a quiescent state where no requests
554 can be added in other threads and no results will be processed. After
555 the fork the parent simply leaves the quiescent state and continues
556 request/result processing, while the child frees the request/result queue
557 (so that the requests started before the fork will only be handled in the
558 parent). Threads will be started on demand until the limit set in the
559 parent process has been reached again.
560
561 In short: the parent will, after a short pause, continue as if fork had
562 not been called, while the child will act as if BDB has not been used
563 yet.
564
565 =head2 MEMORY USAGE
566
567 Per-request usage:
568
569 Each aio request uses - depending on your architecture - around 100-200
570 bytes of memory. In addition, stat requests need a stat buffer (possibly
571 a few hundred bytes), readdir requires a result buffer and so on. Perl
572 scalars and other data passed into aio requests will also be locked and
573 will consume memory till the request has entered the done state.
574
575 This is not awfully much, so queuing lots of requests is not usually a
576 problem.
577
578 Per-thread usage:
579
580 In the execution phase, some aio requests require more memory for
581 temporary buffers, and each thread requires a stack and other data
582 structures (usually around 16k-128k, depending on the OS).
583
584 =head1 KNOWN BUGS
585
586 Known bugs will be fixed in the next release.
587
588 =head1 SEE ALSO
589
590 L<Coro::AIO>.
591
592 =head1 AUTHOR
593
594 Marc Lehmann <schmorp@schmorp.de>
595 http://home.schmorp.de/
596
597 =cut
598