=head1 NAME

BDB - Asynchronous Berkeley DB access

=head1 SYNOPSIS

   use BDB;

   my $env = db_env_create;

   mkdir "bdtest", 0700;
   db_env_open
      $env,
      "bdtest",
      BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL
      | BDB::INIT_TXN | BDB::RECOVER | BDB::USE_ENVIRON | BDB::CREATE,
      0600;

   $env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);

   my $db = db_create $env;
   db_open $db, undef, "table", undef, BDB::BTREE,
           BDB::AUTO_COMMIT | BDB::CREATE | BDB::READ_UNCOMMITTED, 0600;
   db_put $db, undef, "key", "data", 0, sub {
      db_del $db, undef, "key";
   };
   db_sync $db;

   # when you also use Coro, management is easy:
   use Coro::BDB;

   # automatic event loop integration with AnyEvent:
   use AnyEvent::BDB;

   # automatic result processing with EV:
   my $WATCHER = EV::io BDB::poll_fileno, EV::READ, \&BDB::poll_cb;

   # with Glib:
   add_watch Glib::IO BDB::poll_fileno,
             in => sub { BDB::poll_cb; 1 };

   # or simply flush manually
   BDB::flush;

=head1 DESCRIPTION

See the BerkeleyDB documentation (L).
The BDB API is very similar to the C API (the translation has been very
faithful).

See also the example sections in the document below and possibly the eg/
subdirectory of the BDB distribution. Last but not least, see the IO::AIO
documentation, as that module uses almost the same asynchronous request
model as this module.

I know this is woefully inadequate documentation. Send a patch!

=head1 REQUEST ANATOMY AND LIFETIME

Every request method creates a request, which is a C data structure not
directly visible to Perl.

During their existence, bdb requests travel through the following states,
in order:

=over 4

=item ready

Immediately after a request is created it is put into the ready state,
waiting for a thread to execute it.

=item execute

A thread has accepted the request for processing and is currently
executing it (e.g. blocking in read).

=item pending

The request has been executed and is waiting for result processing.
While request submission and execution are fully asynchronous, result
processing is not and relies on the perl interpreter calling C<poll_cb>
(or another function with the same effect).

=item result

The request results are processed synchronously by C<poll_cb>.

The C<poll_cb> function will process all outstanding aio requests by
calling their callbacks, freeing memory associated with them and managing
any groups they are contained in.

=item done

Request has reached the end of its lifetime and holds no resources anymore
(except possibly for the Perl object, but its connection to the actual
aio request is severed and calling its methods will either do nothing or
result in a runtime error).

=back

=cut

package BDB;

use common::sense;

use base 'Exporter';

our $VERSION;

BEGIN {
   $VERSION = '1.91';

   our @BDB_REQ = qw(
      db_env_open db_env_close db_env_txn_checkpoint db_env_lock_detect
      db_env_memp_sync db_env_memp_trickle db_env_dbrename db_env_dbremove
      db_env_log_archive db_env_lsn_reset
      db_open db_close db_compact db_sync db_verify db_upgrade
      db_put db_exists db_get db_pget db_del db_key_range
      db_txn_commit db_txn_abort db_txn_finish
      db_c_close db_c_count db_c_put db_c_get db_c_pget db_c_del
      db_sequence_open db_sequence_close
      db_sequence_get db_sequence_remove
   );
   our @EXPORT = (@BDB_REQ, qw(dbreq_pri dbreq_nice db_env_create db_create));
   our @EXPORT_OK = qw(
      poll_fileno poll_cb poll_wait flush
      min_parallel max_parallel max_idle
      nreqs nready npending nthreads
      max_poll_time max_poll_reqs
   );

   require XSLoader;
   XSLoader::load ("BDB", $VERSION);
}

=head1 BERKELEYDB FUNCTIONS

All of these are functions. The create functions simply return a new
object and never block. All the remaining functions take an optional
callback as last argument. If it is missing, then the function will be
executed synchronously. In both cases, C<$!> will reflect the return value
of the function.
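
The calling convention described above (optional trailing callback, status reported through C<$!>) can be pictured with a small pure-Perl toy model. This is only a sketch: C<toy_request> and C<toy_flush> are hypothetical names that do not exist in BDB, the "work" runs immediately instead of in a thread, and the real module does all of this in C.

```perl
use strict;
use warnings;

# Toy model only: none of these names are part of the BDB module.
my @pending;    # executed requests waiting for result processing

# Hypothetical request function: the callback is simply the last argument.
sub toy_request {
   my $cb = (@_ && ref ($_[-1]) eq 'CODE') ? pop @_ : undef;
   my ($work) = @_;

   # "execute" the request; the work returns an errno-style status
   my $errno = $work->();

   unless ($cb) {
      # no callback given: behave synchronously, status goes into $!
      $! = $errno;
      return;
   }

   # callback given: defer result processing until toy_flush is called
   push @pending, [$cb, $errno];
   return;
}

# Hypothetical stand-in for result processing (the role poll_cb/flush play).
sub toy_flush {
   while (my $req = shift @pending) {
      my ($cb, $errno) = @$req;
      $! = $errno;    # the status is visible in $! inside the callback
      $cb->();
   }
}

# synchronous use: the status is available right after the call
toy_request (sub { 0 });
my $sync_status = $! + 0;

# asynchronous use: the callback only runs once results are processed
my $async_status;
toy_request (sub { 2 }, sub { $async_status = $! + 0 });
my $ran_before_flush = defined $async_status ? 1 : 0;
toy_flush ();
```

With the real module the same pattern applies: check C<$!> directly after a call made without a callback, and check it inside the callback otherwise.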

BDB functions that cannot block (mostly functions that manipulate
settings) are method calls on the relevant objects, so the rule of thumb
is: if it's a method, it's not blocking, if it's a function, it takes a
callback as last argument.

In the following, C<$int> signifies an integer return value,
C<bdb_filename> is a "filename" (octets on unix, madness on windows),
C<U32> is an unsigned 32 bit integer, C<int> is some integer, C<NV> is a
floating point value.

Most C<SV *> types are generic perl scalars (for input and output of data
values).

The various C<DB_ENV> etc. arguments are handles returned by
C<db_env_create>, C<db_create>, C<txn_begin> and so on. If they have an
appended C<_ornull> this means they are optional and you can pass C<undef>
for them, resulting in a NULL pointer on the C level.

The C<SV *callback> is the optional callback function to call when the
request is completed. This last callback argument is special: the callback
is simply the last argument passed. If there are "optional" arguments
before the callback they can be left out. The callback itself can be left
out or specified as C<undef>, in which case the function will be executed
synchronously.

For example, C<db_env_txn_checkpoint> usually is called with all integer
arguments zero.
These can be left out, so all of these specify a call to
C<< DB_ENV->txn_checkpoint >>, to be executed asynchronously with a
callback to be called:

   db_env_txn_checkpoint $db_env, 0, 0, 0, sub { };
   db_env_txn_checkpoint $db_env, 0, 0, sub { };
   db_env_txn_checkpoint $db_env, sub { };

While these all specify a call to C<< DB_ENV->txn_checkpoint >> to be
executed synchronously:

   db_env_txn_checkpoint $db_env, 0, 0, 0, undef;
   db_env_txn_checkpoint $db_env, 0, 0, 0;
   db_env_txn_checkpoint $db_env, 0;

=head2 BDB functions

Functions in the BDB namespace, exported by default:

   $env = db_env_create (U32 env_flags = 0)
      flags: RPCCLIENT

   db_env_open (DB_ENV *env, bdb_filename db_home, U32 open_flags, int mode, SV *callback = 0)
      open_flags: INIT_CDB INIT_LOCK INIT_LOG INIT_MPOOL INIT_REP INIT_TXN RECOVER RECOVER_FATAL USE_ENVIRON USE_ENVIRON_ROOT CREATE LOCKDOWN PRIVATE REGISTER SYSTEM_MEM
   db_env_close (DB_ENV *env, U32 flags = 0, SV *callback = 0)
   db_env_txn_checkpoint (DB_ENV *env, U32 kbyte = 0, U32 min = 0, U32 flags = 0, SV *callback = 0)
      flags: FORCE
   db_env_lock_detect (DB_ENV *env, U32 flags = 0, U32 atype = DB_LOCK_DEFAULT, SV *dummy = 0, SV *callback = 0)
      atype: LOCK_DEFAULT LOCK_EXPIRE LOCK_MAXLOCKS LOCK_MAXWRITE LOCK_MINLOCKS LOCK_MINWRITE LOCK_OLDEST LOCK_RANDOM LOCK_YOUNGEST
   db_env_memp_sync (DB_ENV *env, SV *dummy = 0, SV *callback = 0)
   db_env_memp_trickle (DB_ENV *env, int percent, SV *dummy = 0, SV *callback = 0)
   db_env_dbremove (DB_ENV *env, DB_TXN_ornull *txnid, bdb_filename file, bdb_filename database, U32 flags = 0, SV *callback = 0)
   db_env_dbrename (DB_ENV *env, DB_TXN_ornull *txnid, bdb_filename file, bdb_filename database, bdb_filename newname, U32 flags = 0, SV *callback = 0)
   db_env_log_archive (DB_ENV *env, SV *listp, U32 flags = 0, SV *callback = 0)
   db_env_lsn_reset (DB_ENV *env, bdb_filename db, U32 flags = 0, SV *callback = 0)

   $db = db_create (DB_ENV *env = 0, U32 flags = 0)
      flags: XA_CREATE

   db_open (DB *db, DB_TXN_ornull *txnid, bdb_filename file,
            bdb_filename database, int type, U32 flags, int mode, SV *callback = 0)
      flags: AUTO_COMMIT CREATE EXCL MULTIVERSION NOMMAP RDONLY READ_UNCOMMITTED THREAD TRUNCATE
   db_close (DB *db, U32 flags = 0, SV *callback = 0)
      flags: DB_NOSYNC
   db_verify (DB *db, bdb_filename file, bdb_filename database = 0, SV *dummy = 0, U32 flags = 0, SV *callback = 0)
   db_upgrade (DB *db, bdb_filename file, U32 flags = 0, SV *callback = 0)
   db_compact (DB *db, DB_TXN_ornull *txn = 0, SV *start = 0, SV *stop = 0, SV *unused1 = 0, U32 flags = DB_FREE_SPACE, SV *unused2 = 0, SV *callback = 0)
      flags: FREELIST_ONLY FREE_SPACE
   db_sync (DB *db, U32 flags = 0, SV *callback = 0)
   db_key_range (DB *db, DB_TXN_ornull *txn, SV *key, SV *key_range, U32 flags = 0, SV *callback = 0)
   db_put (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = 0)
      flags: APPEND NODUPDATA NOOVERWRITE
   db_exists (DB *db, DB_TXN_ornull *txn, SV *key, U32 flags = 0, SV *callback = 0) (v4.6)
   db_get (DB *db, DB_TXN_ornull *txn, SV *key, SV *data, U32 flags = 0, SV *callback = 0)
      flags: CONSUME CONSUME_WAIT GET_BOTH SET_RECNO MULTIPLE READ_COMMITTED READ_UNCOMMITTED RMW
   db_pget (DB *db, DB_TXN_ornull *txn, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = 0)
      flags: CONSUME CONSUME_WAIT GET_BOTH SET_RECNO MULTIPLE READ_COMMITTED READ_UNCOMMITTED RMW
   db_del (DB *db, DB_TXN_ornull *txn, SV *key, U32 flags = 0, SV *callback = 0)
   db_txn_commit (DB_TXN *txn, U32 flags = 0, SV *callback = 0)
      flags: TXN_NOSYNC TXN_SYNC
   db_txn_abort (DB_TXN *txn, SV *callback = 0)

   db_c_close (DBC *dbc, SV *callback = 0)
   db_c_count (DBC *dbc, SV *count, U32 flags = 0, SV *callback = 0)
   db_c_put (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = 0)
      flags: AFTER BEFORE CURRENT KEYFIRST KEYLAST NODUPDATA
   db_c_get (DBC *dbc, SV *key, SV *data, U32 flags = 0, SV *callback = 0)
      flags: CURRENT FIRST GET_BOTH GET_BOTH_RANGE GET_RECNO JOIN_ITEM LAST NEXT NEXT_DUP NEXT_NODUP PREV PREV_DUP PREV_NODUP SET SET_RANGE SET_RECNO
             READ_UNCOMMITTED MULTIPLE MULTIPLE_KEY RMW
   db_c_pget (DBC *dbc, SV *key, SV *pkey, SV *data, U32 flags = 0, SV *callback = 0)
   db_c_del (DBC *dbc, U32 flags = 0, SV *callback = 0)

   db_sequence_open (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, SV *key, U32 flags = 0, SV *callback = 0)
      flags: CREATE EXCL
   db_sequence_close (DB_SEQUENCE *seq, U32 flags = 0, SV *callback = 0)
   db_sequence_get (DB_SEQUENCE *seq, DB_TXN_ornull *txnid, int delta, SV *seq_value, U32 flags = DB_TXN_NOSYNC, SV *callback = 0)
      flags: TXN_NOSYNC
   db_sequence_remove (DB_SEQUENCE *seq, DB_TXN_ornull *txnid = 0, U32 flags = 0, SV *callback = 0)
      flags: TXN_NOSYNC

=head3 db_txn_finish (DB_TXN *txn, U32 flags = 0, SV *callback = 0)

This is not actually a Berkeley DB function but a BDB module
extension. The background for this extension is: It is very annoying to
have to check every single BDB function for error returns and provide a
codepath out of your transaction. While the BDB module still makes this
possible, it contains the following extensions:

When a transaction-protected function returns any operating system
error (errno > 0), BDB will set the C<TXN_DEADLOCK> flag on the
transaction. This flag is also set by Berkeley DB functions themselves
when an operation fails with LOCK_DEADLOCK, and it causes all further
operations on that transaction (including C<db_txn_commit>) to fail.

The C<db_txn_finish> request will look at this flag, and, if it is set,
will automatically call C<db_txn_abort> (setting errno to C<LOCK_DEADLOCK>
if it isn't set to something else yet). If it isn't set, it will call
C<db_txn_commit> and return the error normally.

How to use this? Easy: just write your transaction normally:

   my $txn = $db_env->txn_begin;
   db_get $db, $txn, "key", my $data;
   db_put $db, $txn, "key", $data + 1 unless $! == BDB::NOTFOUND;
   db_txn_finish $txn;
   die "transaction failed" if $!;

That is, handle only the expected errors.
If something unexpected happens (EIO, LOCK_NOTGRANTED or a deadlock in
either db_get or db_put), then the remaining requests (db_put in this
case) will simply be skipped (they will fail with LOCK_DEADLOCK) and the
transaction will be aborted.

You can use the C<< $txn->failed >> method to check whether a transaction
has failed in this way and abort further processing (excluding
C<db_txn_finish>).

=head2 DB_ENV/database environment methods

Methods available on DB_ENV/$env handles:

   DESTROY (DB_ENV_ornull *env)
      CODE:
         if (env)
           env->close (env, 0);

   $int = $env->set_data_dir (const char *dir)
   $int = $env->set_tmp_dir (const char *dir)
   $int = $env->set_lg_dir (const char *dir)
   $int = $env->set_shm_key (long shm_key)
   $int = $env->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
   $int = $env->set_flags (U32 flags, int onoff = 1)
   $int = $env->log_set_config (U32 flags, int onoff = 1) (v4.7)
   $int = $env->set_intermediate_dir_mode (const char *modestring) (v4.7)
   $env->set_errfile (FILE *errfile = 0)
   $env->set_msgfile (FILE *msgfile = 0)
   $int = $env->set_verbose (U32 which, int onoff = 1)
   $int = $env->set_encrypt (const char *password, U32 flags = 0)
   $int = $env->set_timeout (NV timeout_seconds, U32 flags = SET_TXN_TIMEOUT)
   $int = $env->set_mp_max_openfd (int maxopenfd);
   $int = $env->set_mp_max_write (int maxwrite, int maxwrite_sleep);
   $int = $env->set_mp_mmapsize (int mmapsize_mb)
   $int = $env->set_lk_detect (U32 detect = DB_LOCK_DEFAULT)
   $int = $env->set_lk_max_lockers (U32 max)
   $int = $env->set_lk_max_locks (U32 max)
   $int = $env->set_lk_max_objects (U32 max)
   $int = $env->set_lg_bsize (U32 max)
   $int = $env->set_lg_max (U32 max)
   $int = $env->mutex_set_increment (U32 increment)
   $int = $env->mutex_set_tas_spins (U32 tas_spins)
   $int = $env->mutex_set_max (U32 max)
   $int = $env->mutex_set_align (U32 align)

   $txn = $env->txn_begin (DB_TXN_ornull *parent = 0, U32 flags = 0)
      flags: READ_COMMITTED READ_UNCOMMITTED TXN_NOSYNC TXN_NOWAIT TXN_SNAPSHOT TXN_SYNC TXN_WAIT TXN_WRITE_NOSYNC
   $txn =
      $env->cdsgroup_begin; (v4.5)

=head3 Example:

   use AnyEvent;
   use BDB;

   our $FH; open $FH, "<&=" . BDB::poll_fileno;
   our $WATCHER = AnyEvent->io (fh => $FH, poll => 'r', cb => \&BDB::poll_cb);

   BDB::min_parallel 8;

   my $env = db_env_create;

   mkdir "bdtest", 0700;
   db_env_open
      $env,
      "bdtest",
      BDB::INIT_LOCK | BDB::INIT_LOG | BDB::INIT_MPOOL | BDB::INIT_TXN | BDB::RECOVER | BDB::USE_ENVIRON | BDB::CREATE,
      0600;

   $env->set_flags (BDB::AUTO_COMMIT | BDB::TXN_NOSYNC, 1);

=head2 DB/database methods

Methods available on DB/$db handles:

   DESTROY (DB_ornull *db)
      CODE:
         if (db)
           {
             SV *env = (SV *)db->app_private;
             db->close (db, 0);
             SvREFCNT_dec (env);
           }

   $int = $db->set_cachesize (U32 gbytes, U32 bytes, int ncache = 0)
   $int = $db->set_flags (U32 flags)
      flags: CHKSUM ENCRYPT TXN_NOT_DURABLE
             Btree: DUP DUPSORT RECNUM REVSPLITOFF
             Hash:  DUP DUPSORT
             Queue: INORDER
             Recno: RENUMBER SNAPSHOT
   $int = $db->set_encrypt (const char *password, U32 flags)
   $int = $db->set_lorder (int lorder)
   $int = $db->set_bt_minkey (U32 minkey)
   $int = $db->set_re_delim (int delim)
   $int = $db->set_re_pad (int re_pad)
   $int = $db->set_re_source (char *source)
   $int = $db->set_re_len (U32 re_len)
   $int = $db->set_h_ffactor (U32 h_ffactor)
   $int = $db->set_h_nelem (U32 h_nelem)
   $int = $db->set_q_extentsize (U32 extentsize)

   $dbc = $db->cursor (DB_TXN_ornull *txn = 0, U32 flags = 0)
      flags: READ_COMMITTED READ_UNCOMMITTED WRITECURSOR TXN_SNAPSHOT
   $seq = $db->sequence (U32 flags = 0)

=head3 Example:

   my $db = db_create $env;
   db_open $db, undef, "table", undef, BDB::BTREE, BDB::AUTO_COMMIT | BDB::CREATE | BDB::READ_UNCOMMITTED, 0600;

   for (1..1000) {
      db_put $db, undef, "key $_", "data $_";

      db_key_range $db, undef, "key $_", my $keyrange;
      my ($lt, $eq, $gt) = @$keyrange;
   }

   db_del $db, undef, "key $_" for 1..1000;

   db_sync $db;

=head2 DB_TXN/transaction methods

Methods available on DB_TXN/$txn handles:

   DESTROY (DB_TXN_ornull *txn)
      CODE:
         if (txn)
           txn->abort (txn);

   $int = $txn->set_timeout (NV timeout_seconds, U32 flags =
      SET_TXN_TIMEOUT)
      flags: SET_LOCK_TIMEOUT SET_TXN_TIMEOUT

   $bool = $txn->failed
   # see db_txn_finish documentation, above

=head2 DBC/cursor methods

Methods available on DBC/$dbc handles:

   DESTROY (DBC_ornull *dbc)
      CODE:
         if (dbc)
           dbc->c_close (dbc);

   $int = $cursor->set_priority ($priority = PRIORITY_*) (v4.6)

=head3 Example:

   my $c = $db->cursor;

   for (;;) {
      db_c_get $c, my $key, my $data, BDB::NEXT;
      warn "<$!,$key,$data>";
      last if $!;
   }

   db_c_close $c;

=head2 DB_SEQUENCE/sequence methods

Methods available on DB_SEQUENCE/$seq handles:

   DESTROY (DB_SEQUENCE_ornull *seq)
      CODE:
         if (seq)
           seq->close (seq, 0);

   $int = $seq->initial_value (db_seq_t value)
   $int = $seq->set_cachesize (U32 size)
   $int = $seq->set_flags (U32 flags)
      flags: SEQ_DEC SEQ_INC SEQ_WRAP
   $int = $seq->set_range (db_seq_t min, db_seq_t max)

=head3 Example:

   my $seq = $db->sequence;

   db_sequence_open $seq, undef, "seq", BDB::CREATE;
   db_sequence_get $seq, undef, 1, my $value;

=head1 SUPPORT FUNCTIONS

=head2 EVENT PROCESSING AND EVENT LOOP INTEGRATION

=over 4

=item $msg = BDB::strerror [$errno]

Returns the string corresponding to the given errno value. If no argument
is given, use C<$!>.

Note that the BDB module also patches the C<$!> variable directly, so you
should be able to get a bdb error string by simply stringifying C<$!>.

=item $fileno = BDB::poll_fileno

Return the I<request result pipe file descriptor>. This filehandle must be
polled for reading by some mechanism outside this module (e.g. Event or
select, see below or the SYNOPSIS). If the pipe becomes readable you have
to call C<poll_cb> to check the results.

See C<poll_cb> for an example.

=item BDB::poll_cb

Process some outstanding events on the result pipe. You have to call this
regularly. Returns the number of events processed. Returns immediately
when no events are outstanding. The number of events processed depends on
the settings of C<BDB::max_poll_reqs> and C<BDB::max_poll_time>.

If not all requests were processed for whatever reason, the filehandle
will still be ready when C<poll_cb> returns.

Example: Install an Event watcher that automatically calls
BDB::poll_cb with high priority:

   Event->io (fd => BDB::poll_fileno,
              poll => 'r', async => 1,
              cb => \&BDB::poll_cb);

=item BDB::max_poll_reqs $nreqs

=item BDB::max_poll_time $seconds

These set the maximum number of requests (default C<0>, meaning infinity)
that are being processed by C<BDB::poll_cb> in one call, respectively the
maximum amount of time (default C<0>, meaning infinity) spent in
C<BDB::poll_cb> to process requests (more correctly the minimum amount of
time C<poll_cb> is allowed to use).

Setting C<max_poll_time> to a non-zero value creates an overhead of one
syscall per request processed, which is not normally a problem unless your
callbacks are really really fast or your OS is really really slow (I am
not mentioning Solaris here). Using C<max_poll_reqs> incurs no overhead.

Setting these is useful if you want to ensure some level of
interactiveness when perl is not fast enough to process all requests in
time.

For interactive programs, values such as C<0.01> to C<0.1> should be fine.

Example: Install an EV watcher that automatically calls
BDB::poll_cb with low priority, to ensure that other parts of the
program get the CPU sometimes even under high load.

   # try not to spend much more than 0.1s in poll_cb
   BDB::max_poll_time 0.1;

   my $bdb_poll = EV::io BDB::poll_fileno, EV::READ, \&BDB::poll_cb;

=item BDB::poll_wait

If there are any outstanding requests and none of them in the result
phase, wait till the result filehandle becomes ready for reading (simply
does a C