/cvs/BDB/BDB.pm
Revision: 1.10
Committed: Mon Aug 13 12:01:45 2007 UTC (16 years, 9 months ago) by root
Branch: MAIN
Changes since 1.9: +230 -5 lines
Log Message:
vastly improved docs

File Contents

# User Rev Content
1 root 1.1 =head1 NAME
2    
3 root 1.2 BDB - Asynchronous Berkeley DB access
4 root 1.1
5     =head1 SYNOPSIS
6    
7 root 1.2 use BDB;
8 root 1.1
9     =head1 DESCRIPTION
10    
11 root 1.10 See the BerkeleyDB documentation (L<http://www.oracle.com/technology/documentation/berkeley-db/db/index.html>).
12     The BDB API is very similar to the C API (the translation has been very faithful).
13    
14     See also the example sections in the document below and possibly the eg/
15     subdirectory of the BDB distribution. Last but not least, see the IO::AIO
16     documentation, as that module uses almost the same asynchronous request
17     model as this module.
18    
19     I know this is woefully inadequate documentation. Send a patch!
20 root 1.7
21 root 1.1
22     =head1 REQUEST ANATOMY AND LIFETIME
23    
24     Every request method creates a request, which is a C data structure not
25     directly visible to Perl.
26    
27     During their existence, bdb requests travel through the following states,
28     in order:
29    
30     =over 4
31    
32     =item ready
33    
34     Immediately after a request is created it is put into the ready state,
35     waiting for a thread to execute it.
36    
37     =item execute
38    
39     A thread has accepted the request for processing and is currently
40     executing it (e.g. blocking in read).
41    
42     =item pending
43    
44     The request has been executed and is waiting for result processing.
45    
46     While request submission and execution are fully asynchronous, result
47     processing is not and relies on the perl interpreter calling C<poll_cb>
48     (or another function with the same effect).
49    
50     =item result
51    
52     The request results are processed synchronously by C<poll_cb>.
53    
54     The C<poll_cb> function will process all outstanding aio requests by
55     calling their callbacks, freeing memory associated with them and managing
56     any groups they are contained in.
57    
58     =item done
59    
60     Request has reached the end of its lifetime and holds no resources anymore
61     (except possibly for the Perl object, but its connection to the actual
62     aio request is severed and calling its methods will either do nothing or
63     result in a runtime error).
64    
65     =back
66    
67     =cut
68    
69 root 1.2 package BDB;
70 root 1.1
71     no warnings;
72     use strict 'vars';
73    
74     use base 'Exporter';
75    
76     BEGIN {
77 root 1.10 our $VERSION = '0.6';
78 root 1.1
79 root 1.3 our @BDB_REQ = qw(
80 root 1.6 db_env_open db_env_close db_env_txn_checkpoint db_env_lock_detect
81     db_env_memp_sync db_env_memp_trickle
82     db_open db_close db_compact db_sync db_put db_get db_pget db_del db_key_range
83 root 1.4 db_txn_commit db_txn_abort
84 root 1.5 db_c_close db_c_count db_c_put db_c_get db_c_pget db_c_del
85 root 1.6 db_sequence_open db_sequence_close
86     db_sequence_get db_sequence_remove
87     );
88     our @EXPORT = (@BDB_REQ, qw(dbreq_pri dbreq_nice db_env_create db_create));
89     our @EXPORT_OK = qw(
90     poll_fileno poll_cb poll_wait flush
91     min_parallel max_parallel max_idle
92     nreqs nready npending nthreads
93     max_poll_time max_poll_reqs
94 root 1.3 );
95 root 1.1
96     require XSLoader;
97 root 1.2 XSLoader::load ("BDB", $VERSION);
98 root 1.1 }
99    
100 root 1.10 =head2 BERKELEYDB FUNCTIONS
101    
102     All of these are functions. The create functions simply return a new
103     object and never block. All the remaining functions take an optional
104     callback as last argument. If it is missing, then the function will be
105     executed synchronously.
106    
107     BDB functions that cannot block (mostly functions that manipulate
108     settings) are method calls on the relevant objects, so the rule of thumb
109     is: if it's a method, it's not blocking; if it's a function, it takes a
110     callback as last argument.
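
As a rough illustration of the rule above, the following sketch issues
requests without a callback (synchronous) and one request with a callback
(asynchronous). The directory name C<bdb-demo> and the key and value
strings are made up for the example, and it assumes that the request
status is reported in C<$!>, as the cursor example in this document does.

```perl
use strict;
use warnings;
use BDB;

# Assumption: "bdb-demo" is a scratch directory used only by this sketch.
mkdir "bdb-demo", 0700;

my $env = db_env_create;

# No callback: the call returns only after the environment has been opened.
db_env_open $env, "bdb-demo",
   BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::CREATE,
   0600;
die "db_env_open failed: $!" if $!;

# A method call: changes a setting and does not block.
$env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE, 0600;
die "db_open failed: $!" if $!;

# Callback given: db_put only queues the request and returns at once.
my $done;
db_put $db, undef, "key", "value", 0, sub {
   $done = $! ? "failed: $!" : "ok";
};

# Process results until no request is outstanding, then report.
BDB::flush ();
print "db_put $done\n";
```

The flags argument is passed explicitly as C<0> in the asynchronous call so
that the callback is unambiguously the last argument.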
111    
112     In the following, C<$int> signifies an integer return value,
113     C<octetstring> is a "binary string" (i.e. a perl string with no character
114     indices >255), C<U32> is an unsigned 32 bit integer, C<int> is some
115     integer, C<NV> is a floating point value.
116    
117     The C<SV *> types are generic perl scalars (for input and output of data
118     values), and the C<SV *callback> is the optional callback function to call
119     when the request is completed.
120    
121     The various C<DB_ENV> etc. arguments are handles returned by
122     C<db_env_create>, C<db_create>, C<txn_begin> and so on. If they have an
123     appended C<_ornull> this means they are optional and you can pass C<undef>
124     for them, resulting in a NULL pointer on the C level.
125    
126     =head3 BDB functions
127    
128     Functions in the BDB namespace, exported by default:
129    
130     $env = db_env_create (U32 env_flags = 0)
131    
132     db_env_open (DB_ENV *env, octetstring db_home, U32 open_flags, int mode, SV *callback = &PL_sv_undef)
133     db_env_close (DB_ENV *env, U32 flags = 0, SV *callback = &PL_sv_undef)
134     db_env_txn_checkpoint (DB_ENV *env, U32 kbyte = 0, U32 min = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
135     db_env_lock_detect (DB_ENV *env, U32 flags = 0, U32 atype = DB_LOCK_DEFAULT, SV *dummy = 0, SV *callback = &PL_sv_undef)
136     db_env_memp_sync (DB_ENV *env, SV *dummy = 0, SV *callback = &PL_sv_undef)
137     db_env_memp_trickle (DB_ENV *env, int percent, SV *dummy = 0, SV *callback = &PL_sv_undef)
138    
139     $db = db_create (DB_ENV *env = 0, U32 flags = 0)
140    
141     db_open (DB *db, DB_TXN_ornull *txnid, octetstring file, octetstring database, int type, U32 flags, int mode, SV *callback = &PL_sv_undef)
142     db_close (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
143     db_compact (DB *db, DB_TXN_ornull *txn = 0, SV *start = 0, SV *stop = 0, SV *unused1 = 0, U32 flags = DB_FREE_SPACE, SV *unused2 = 0, SV *callback = &PL_sv_unde
144     db_sync (DB *db, U32 flags = 0, SV *callback = &PL_sv_undef)
145     db_key_range (DB *db, DB_TXN_ornull *txn, SV *key, SV *key_range, U32 flags = 0, SV *callback = &PL_sv_undef)
146     db_put (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
147     db_get (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
148     db_pget (DB *db, DB_TXN_ornull *txn, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
149     db_del (DB *db, DB_TXN_ornull *txn, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
150     db_txn_commit (DB_TXN *txn, U32 flags = 0, SV *callback = &PL_sv_undef)
151     db_txn_abort (DB_TXN *txn, SV *callback = &PL_sv_undef)
152     db_c_close (DBC *dbc, SV *callback = &PL_sv_undef)
153     db_c_count (DBC *dbc, SV *count, U32 flags = 0, SV *callback = &PL_sv_undef)
154     db_c_put (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
155     db_c_get (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
156     db_c_pget (DBC *dbc, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = &PL_sv_undef)
157     db_c_del (DBC *dbc, U32 flags = 0, SV *callback = &PL_sv_undef)
158    
159     db_sequence_open (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, SV *key, U32 flags = 0, SV *callback = &PL_sv_undef)
160     db_sequence_close (DB_SEQUENCE *seq, U32 flags = 0, SV *callback = &PL_sv_undef)
161     db_sequence_get (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, int delta, SV *seq_value, U32 flags = DB_TXN_NOSYNC, SV *callback = &PL_sv_undef)
162     db_sequence_remove (DB_SEQUENCE *seq, DB_TXN_ornull *txnid = 0, U32 flags = 0, SV *callback = &PL_sv_undef)
163    
164    
165     =head3 DB_ENV/database environment methods
166    
167     Methods available on DB_ENV/$env handles:
168    
169     DESTROY (DB_ENV_ornull *env)
170     CODE:
171     if (env)
172     env->close (env, 0);
173    
174     $int = $env->set_data_dir (const char *dir)
175     $int = $env->set_tmp_dir (const char *dir)
176     $int = $env->set_lg_dir (const char *dir)
177     $int = $env->set_shm_key (long shm_key)
178     $int = $env->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
179     $int = $env->set_flags (U32 flags, int onoff)
180     $env->set_errfile (FILE *errfile = 0)
181     $env->set_msgfile (FILE *msgfile = 0)
182     $int = $env->set_verbose (U32 which, int onoff = 1)
183     $int = $env->set_encrypt (const char *password, U32 flags = 0)
184     $int = $env->set_timeout (NV timeout, U32 flags)
185     $int = $env->set_mp_max_openfd (int maxopenfd);
186     $int = $env->set_mp_max_write (int maxwrite, int maxwrite_sleep);
187     $int = $env->set_mp_mmapsize (int mmapsize_mb)
188     $int = $env->set_lk_detect (U32 detect = DB_LOCK_DEFAULT)
189     $int = $env->set_lk_max_lockers (U32 max)
190     $int = $env->set_lk_max_locks (U32 max)
191     $int = $env->set_lk_max_objects (U32 max)
192     $int = $env->set_lg_bsize (U32 max)
193     $int = $env->set_lg_max (U32 max)
194    
195     $txn = $env->txn_begin (DB_TXN_ornull *parent = 0, U32 flags = 0)
196    
197     =head4 example
198    
199     use AnyEvent;
200     use BDB;
201    
202     our $FH; open $FH, "<&=" . BDB::poll_fileno;
203     our $WATCHER = AnyEvent->io (fh => $FH, poll => 'r', cb => \&BDB::poll_cb);
204    
205     BDB::min_parallel 8;
206    
207     my $env = db_env_create;
208    
209     mkdir "bdtest", 0700;
210     db_env_open
211     $env,
212     "bdtest",
213     BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::RECOVER | BDB::USE_ENVIRON | BDB::CREATE,
214     0600;
215    
216     $env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);
217    
218    
219     =head3 DB/database methods
220    
221     Methods available on DB/$db handles:
222    
223     DESTROY (DB_ornull *db)
224     CODE:
225     if (db)
226     {
227     SV *env = (SV *)db->app_private;
228     db->close (db, 0);
229     SvREFCNT_dec (env);
230     }
231    
232     $int = $db->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
233     $int = $db->set_flags (U32 flags)
234     $int = $db->set_encrypt (const char *password, U32 flags)
235     $int = $db->set_lorder (int lorder)
236     $int = $db->set_bt_minkey (U32 minkey)
237     $int = $db->set_re_delim (int delim)
238     $int = $db->set_re_pad (int re_pad)
239     $int = $db->set_re_source (char *source)
240     $int = $db->set_re_len (U32 re_len)
241     $int = $db->set_h_ffactor (U32 h_ffactor)
242     $int = $db->set_h_nelem (U32 h_nelem)
243     $int = $db->set_q_extentsize (U32 extentsize)
244    
245     $dbc = $db->cursor (DB_TXN_ornull *txn = 0, U32 flags = 0)
246     $seq = $db->sequence (U32 flags = 0)
247    
248     =head4 example
249    
250     my $db = db_create $env;
251     db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE | BDB::READ_UNCOMMITTED, 0600;
252    
253     for (1..1000) {
254     db_put $db, undef, "key $_", "data $_";
255    
256     db_key_range $db, undef, "key $_", my $keyrange;
257     my ($lt, $eq, $gt) = @$keyrange;
258     }
259    
260     db_del $db, undef, "key $_" for 1..1000;
261    
262     db_sync $db;
263    
264    
265     =head3 DB_TXN/transaction methods
266    
267     Methods available on DB_TXN/$txn handles:
268    
269     DESTROY (DB_TXN_ornull *txn)
270     CODE:
271     if (txn)
272     txn->abort (txn);
273    
274     $int = $txn->set_timeout (NV timeout, U32 flags)
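
=head4 example

The document gives no transaction example, so here is a minimal sketch of
one explicit transaction built only from calls listed above
(C<txn_begin>, C<db_put>, C<db_txn_commit>, C<db_txn_abort>). The
directory C<bdb-txn-demo> and the key C<balance> are invented for
illustration, and error reporting through C<$!> is assumed to work as in
the cursor example.

```perl
use strict;
use warnings;
use BDB;

# Assumption: "bdb-txn-demo" is a scratch directory used only by this sketch.
mkdir "bdb-txn-demo", 0700;

my $env = db_env_create;
db_env_open $env, "bdb-txn-demo",
   BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::CREATE,
   0600;
die "db_env_open failed: $!" if $!;

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE, 0600;
die "db_open failed: $!" if $!;

# txn_begin is a method, so it does not block and takes no callback.
my $txn = $env->txn_begin;

# All calls below omit the callback and therefore run synchronously.
db_put $db, $txn, "balance", "100";
if ($!) {
   my $error = "$!";   # save the message before the abort overwrites $!
   db_txn_abort $txn;
   die "db_put failed, transaction aborted: $error";
}

db_txn_commit $txn;
die "db_txn_commit failed: $!" if $!;

# Read the value back outside the transaction.
db_get $db, undef, "balance", my $balance;
die "db_get failed: $!" if $!;
print "balance after commit: $balance\n";
```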
275    
276    
277     =head3 DBC/cursor methods
278    
279     Methods available on DBC/$dbc handles:
280    
281     DESTROY (DBC_ornull *dbc)
282     CODE:
283     if (dbc)
284     dbc->c_close (dbc);
285    
286     =head4 example
287    
288     my $c = $db->cursor;
289    
290     for (;;) {
291     db_c_get $c, my $key, my $data, BDB::NEXT;
292     warn "<$!,$key,$data>";
293     last if $!;
294     }
295    
296     db_c_close $c;
297    
298     =head3 DB_SEQUENCE/sequence methods
299    
300     Methods available on DB_SEQUENCE/$seq handles:
301    
302     DESTROY (DB_SEQUENCE_ornull *seq)
303     CODE:
304     if (seq)
305     seq->close (seq, 0);
306    
307     $int = $seq->initial_value (db_seq_t value)
308     $int = $seq->set_cachesize (U32 size)
309     $int = $seq->set_flags (U32 flags)
310     $int = $seq->set_range (db_seq_t min, db_seq_t max)
311    
312     =head4 example
313    
314     my $seq = $db->sequence;
315    
316     db_sequence_open $seq, undef, "seq", BDB::CREATE;
317     db_sequence_get $seq, undef, 1, my $value;
318    
319    
320 root 1.1 =head2 SUPPORT FUNCTIONS
321    
322     =head3 EVENT PROCESSING AND EVENT LOOP INTEGRATION
323    
324     =over 4
325    
326 root 1.2 =item $fileno = BDB::poll_fileno
327 root 1.1
328     Return the I<request result pipe file descriptor>. This filehandle must be
329     polled for reading by some mechanism outside this module (e.g. Event or
330     select, see below or the SYNOPSIS). If the pipe becomes readable you have
331     to call C<poll_cb> to check the results.
332    
333     See C<poll_cb> for an example.
334    
335 root 1.2 =item BDB::poll_cb
336 root 1.1
337     Process some outstanding events on the result pipe. You have to call this
338     regularly. Returns the number of events processed. Returns immediately
339     when no events are outstanding. The amount of events processed depends on
340 root 1.2 the settings of C<BDB::max_poll_reqs> and C<BDB::max_poll_time>.
341 root 1.1
342     If not all requests were processed for whatever reason, the filehandle
343     will still be ready when C<poll_cb> returns.
344    
345     Example: Install an Event watcher that automatically calls
346 root 1.2 BDB::poll_cb with high priority:
347 root 1.1
348 root 1.2 Event->io (fd => BDB::poll_fileno,
349 root 1.1 poll => 'r', async => 1,
350 root 1.2 cb => \&BDB::poll_cb);
351 root 1.1
352 root 1.2 =item BDB::max_poll_reqs $nreqs
353 root 1.1
354 root 1.2 =item BDB::max_poll_time $seconds
355 root 1.1
356     These set the maximum number of requests (default C<0>, meaning infinity)
357 root 1.2 that are being processed by C<BDB::poll_cb> in one call, respectively
358 root 1.1 the maximum amount of time (default C<0>, meaning infinity) spent in
359 root 1.2 C<BDB::poll_cb> to process requests (more correctly the minimum amount
360 root 1.1 of time C<poll_cb> is allowed to use).
361    
362     Setting C<max_poll_time> to a non-zero value creates an overhead of one
363     syscall per request processed, which is not normally a problem unless your
364     callbacks are really really fast or your OS is really really slow (I am
365     not mentioning Solaris here). Using C<max_poll_reqs> incurs no overhead.
366    
367     Setting these is useful if you want to ensure some level of
368     interactiveness when perl is not fast enough to process all requests in
369     time.
370    
371     For interactive programs, values such as C<0.01> to C<0.1> should be fine.
372    
373     Example: Install an Event watcher that automatically calls
374 root 1.2 BDB::poll_cb with low priority, to ensure that other parts of the
375 root 1.1 program get the CPU sometimes even under high AIO load.
376    
377     # try not to spend much more than 0.1s in poll_cb
378 root 1.2 BDB::max_poll_time 0.1;
379 root 1.1
380     # use a low priority so other tasks have priority
381 root 1.2 Event->io (fd => BDB::poll_fileno,
382 root 1.1 poll => 'r', nice => 1,
383 root 1.2 cb => \&BDB::poll_cb);
384 root 1.1
385 root 1.2 =item BDB::poll_wait
386 root 1.1
387     If there are any outstanding requests and none of them in the result
388     phase, wait till the result filehandle becomes ready for reading (simply
389     does a C<select> on the filehandle). This is useful if you want to
390     synchronously wait for some requests to finish.
391    
392     See C<nreqs> for an example.
393    
394 root 1.2 =item BDB::poll
395 root 1.1
396     Waits until some requests have been handled.
397    
398     Returns the number of requests processed, but is otherwise strictly
399     equivalent to:
400    
401 root 1.2 BDB::poll_wait, BDB::poll_cb
402 root 1.1
403 root 1.2 =item BDB::flush
404 root 1.1
405     Wait till all outstanding AIO requests have been handled.
406    
407     Strictly equivalent to:
408    
409 root 1.2 BDB::poll_wait, BDB::poll_cb
410     while BDB::nreqs;
411 root 1.1
412 root 1.8 =back
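
The functions above can also be combined into a hand-written loop when no
event module is available. The following is only a sketch under stated
assumptions: the directory C<bdb-poll-demo>, the keys and the limit of 5
requests per C<poll_cb> call are arbitrary, and plain C<select> stands in
for whatever event loop the program really uses.

```perl
use strict;
use warnings;
use BDB;

# Assumption: "bdb-poll-demo" is a scratch directory used only by this sketch.
mkdir "bdb-poll-demo", 0700;

my $env = db_env_create;
db_env_open $env, "bdb-poll-demo",
   BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::CREATE,
   0600;
die "db_env_open failed: $!" if $!;
$env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);

my $db = db_create $env;
db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE, 0600;
die "db_open failed: $!" if $!;

# Store a few records synchronously (no callback), one at a time.
db_put $db, undef, "key $_", "data $_" for 1 .. 20;

# Queue the reads asynchronously; each callback runs later, inside poll_cb.
my %value;
for my $n (1 .. 20) {
   my $data;
   db_get $db, undef, "key $n", $data, 0, sub {
      $value{$n} = $data unless $!;
   };
}

# Handle at most 5 results per poll_cb call so the loop stays responsive.
BDB::max_poll_reqs (5);

my $rin = '';
vec ($rin, BDB::poll_fileno (), 1) = 1;

while (BDB::nreqs ()) {
   # wait up to one second for the result pipe to become readable
   my $nfound = select (my $rout = $rin, undef, undef, 1);
   BDB::poll_cb () if $nfound > 0;
}

printf "%d of 20 reads completed\n", scalar keys %value;
```

Reads are used for the asynchronous part because several writers running
in parallel threads may need deadlock detection, which this sketch does
not configure.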
413    
414 root 1.1 =head3 CONTROLLING THE NUMBER OF THREADS
415    
416 root 1.8 =over 4
417    
418 root 1.2 =item BDB::min_parallel $nthreads
419 root 1.1
420     Set the minimum number of AIO threads to C<$nthreads>. The current
421     default is C<8>, which means eight asynchronous operations can execute
422     concurrently at any one time (the number of outstanding requests,
423     however, is unlimited).
424    
425 root 1.2 BDB starts threads only on demand, when an AIO request is queued and
426 root 1.1 no free thread exists. Please note that queueing up a hundred requests can
427     create demand for a hundred threads, even if it turns out that everything
428     is in the cache and could have been processed faster by a single thread.
429    
430     It is recommended to keep the number of threads relatively low, as some
431     Linux kernel versions will scale negatively with the number of threads
432     (higher parallelism => MUCH higher latency). With current Linux 2.6
433     versions, 4-32 threads should be fine.
434    
435     Under most circumstances you don't need to call this function, as the
436     module selects a default that is suitable for low to moderate load.
437    
438 root 1.2 =item BDB::max_parallel $nthreads
439 root 1.1
440     Sets the maximum number of AIO threads to C<$nthreads>. If more than the
441     specified number of threads are currently running, this function kills
442     them. This function blocks until the limit is reached.
443    
444     While C<$nthreads> is zero, aio requests get queued but not executed
445     until the number of threads has been increased again.
446    
447     This module automatically runs C<max_parallel 0> at program end, to ensure
448     that all threads are killed and that there are no outstanding requests.
449    
450     Under normal circumstances you don't need to call this function.
451    
452 root 1.2 =item BDB::max_idle $nthreads
453 root 1.1
454     Limit the number of threads (default: 4) that are allowed to idle (i.e.,
455     threads that did not get a request to process within 10 seconds). That
456     means if a thread becomes idle while C<$nthreads> other threads are also
457     idle, it will free its resources and exit.
458    
459     This is useful when you allow a large number of threads (e.g. 100 or 1000)
460     to allow for extremely high load situations, but want to free resources
461     under normal circumstances (1000 threads can easily consume 30MB of RAM).
462    
463     The default is probably ok in most situations, especially if thread
464     creation is fast. If thread creation is very slow on your system you might
465     want to use larger values.
466    
467 root 1.2 =item $oldmaxreqs = BDB::max_outstanding $maxreqs
468 root 1.1
469     This is a very bad function to use in interactive programs because it
470     blocks, and a bad way to reduce concurrency because it is inexact: Better
471     use an C<aio_group> together with a feed callback.
472    
473     Sets the maximum number of outstanding requests to C<$maxreqs>. If you
474     try to queue up more than this number of requests, the next call to the
475     C<poll_cb> (and C<poll_some> and other functions calling C<poll_cb>)
476     function will block until the limit is no longer exceeded.
477    
478     The default value is very large, so there is no practical limit on the
479     number of outstanding requests.
480    
481     You can still queue as many requests as you want. Therefore,
482     C<max_outstanding> is mainly useful in simple scripts (with low values) or
483     as a stop gap to shield against fatal memory overflow (with large values).
484    
485 root 1.3 =item BDB::set_sync_prepare $cb
486    
487     Sets a callback that is called whenever a request is created without an
488     explicit callback. It has to return two code references. The first is used
489     as the request callback, and the second is called to wait until the first
490     callback has been called. The default implementation works like this:
491    
492     sub {
493     my $status;
494     (
495     sub { $status = $! },
496     sub { BDB::poll while !defined $status; $! = $status },
497     )
498     }
499    
500     =back
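
The two-closure contract of C<set_sync_prepare> described above can be
modelled without the module at all. The sketch below uses only core Perl;
C<fake_poll>, C<fake_request> and C<sync_prepare> are hypothetical
stand-ins invented for this illustration and are not part of BDB. It shows
how a request without a callback becomes synchronous: the first closure
records the status, the second keeps polling until that has happened.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical stand-in for the queue of finished requests.
my @queue;

# Plays the role of BDB::poll: handle one finished request, if any.
sub fake_poll {
   my $req = shift @queue or return 0;
   $! = $req->{errno};   # the status is handed to the callback in $!
   $req->{cb}->();
   1
}

# Same shape as the default handler: returns (request callback, waiter).
sub sync_prepare {
   my $status;
   (
      sub { $status = $! + 0 },
      sub { fake_poll () while !defined $status; $! = $status },
   )
}

# Plays the role of a request function with an optional callback.
sub fake_request {
   my ($errno, $cb) = @_;
   my $wait;
   ($cb, $wait) = sync_prepare () unless $cb;
   push @queue, { errno => $errno, cb => $cb };
   $wait->() if $wait;   # only set when no callback was supplied
}

# Without a callback the call returns only after the request has completed.
fake_request (ENOENT);
my $sync_errno = $! + 0;
print "synchronous call returned with errno $sync_errno\n";

# With a callback the request stays queued until something polls.
my $async_errno;
fake_request (0, sub { $async_errno = $! + 0 });
my $queued = @queue;
print "asynchronous call returned with $queued request still queued\n";
fake_poll ();
```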
501    
502 root 1.1 =head3 STATISTICAL INFORMATION
503    
504 root 1.3 =over 4
505    
506 root 1.2 =item BDB::nreqs
507 root 1.1
508     Returns the number of requests currently in the ready, execute or pending
509     states (i.e. for which their callback has not been invoked yet).
510    
511     Example: wait till there are no outstanding requests anymore:
512    
513 root 1.2 BDB::poll_wait, BDB::poll_cb
514     while BDB::nreqs;
515 root 1.1
516 root 1.2 =item BDB::nready
517 root 1.1
518     Returns the number of requests currently in the ready state (not yet
519     executed).
520    
521 root 1.2 =item BDB::npending
522 root 1.1
523     Returns the number of requests currently in the pending state (executed,
524     but not yet processed by poll_cb).
525    
526     =back
527    
528     =cut
529    
530 root 1.3 set_sync_prepare {
531     my $status;
532     (
533     sub {
534     $status = $!;
535     },
536     sub {
537     BDB::poll while !defined $status;
538     $! = $status;
539     },
540     )
541     };
542    
543 root 1.1 min_parallel 8;
544    
545     END { flush }
546    
547     1;
548    
549     =head2 FORK BEHAVIOUR
550    
551     This module should do "the right thing" when the process using it forks:
552    
553     Before the fork, BDB enters a quiescent state where no requests
554     can be added in other threads and no results will be processed. After
555     the fork the parent simply leaves the quiescent state and continues
556     request/result processing, while the child frees the request/result queue
557     (so that the requests started before the fork will only be handled in the
558     parent). Threads will be started on demand until the limit set in the
559     parent process has been reached again.
560    
561     In short: the parent will, after a short pause, continue as if fork had
562     not been called, while the child will act as if BDB has not been used
563     yet.
564    
565     =head2 MEMORY USAGE
566    
567     Per-request usage:
568    
569     Each aio request uses - depending on your architecture - around 100-200
570     bytes of memory. In addition, stat requests need a stat buffer (possibly
571     a few hundred bytes), readdir requires a result buffer and so on. Perl
572     scalars and other data passed into aio requests will also be locked and
573     will consume memory till the request has entered the done state.
574    
575     This is not awfully much, so queuing lots of requests is not usually a
576     problem.
577    
578     Per-thread usage:
579    
580     In the execution phase, some aio requests require more memory for
581     temporary buffers, and each thread requires a stack and other data
582     structures (usually around 16k-128k, depending on the OS).
583    
584     =head1 KNOWN BUGS
585    
586     Known bugs will be fixed in the next release.
587    
588     =head1 SEE ALSO
589    
590     L<Coro::AIO>.
591    
592     =head1 AUTHOR
593    
594     Marc Lehmann <schmorp@schmorp.de>
595     http://home.schmorp.de/
596    
597     =cut
598