path: root/lib/tdb/common/transaction.c
Age | Commit message | Author | Files | Lines
2011-12-21 | tdb: don't free old recovery area when expanding if already at EOF. | Rusty Russell | 1 | -17/+30
We allocate a new recovery area by expanding the file. But if the recovery area is already at the end of file (as shown in at least one client case), we can simply expand the record, rather than freeing it and creating a new one. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User: Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date: Wed Dec 21 06:25:40 CET 2011 on sn-devel-104
2011-12-21 | tdb: use same expansion factor logic when expanding for new recovery area. | Rusty Russell | 1 | -1/+5
If we're expanding because the current recovery area is too small, we expand only the amount we need. This can quickly lead to exponential growth when we have a slowly-expanding record (hence a slowly-expanding transaction size). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-12-19 | tdb: be more careful on 4G files. | Rusty Russell | 1 | -5/+6
I came across a tdb which had wrapped to 4G + 4K, and the contents had been destroyed by processes which thought it only 4k long. Fix this by checking on open, and making tdb_oob() check for wrap itself. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User: Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date: Mon Dec 19 07:52:01 CET 2011 on sn-devel-104
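The wrap described above can be illustrated with a small sketch. The helper names and the use of 32-bit offsets are assumptions for illustration; this is not the actual tdb_oob() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [off, off + len) lies inside a file of
 * file_size bytes, using 32-bit offsets. */
bool range_in_bounds(uint32_t off, uint32_t len, uint32_t file_size)
{
	if (off > file_size)
		return false;
	/* Compare against the space remaining instead of computing
	 * off + len, which could wrap past 2^32. */
	return len <= file_size - off;
}

/* Naive version for comparison: the 32-bit sum silently wraps, so a
 * range that runs past 4G can look like a small, valid offset. */
bool range_in_bounds_naive(uint32_t off, uint32_t len, uint32_t file_size)
{
	return (uint32_t)(off + len) <= file_size;
}
```

With an offset near 4G and a length that pushes the sum past 2^32, the naive check accepts the range while the wrap-aware check rejects it.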
2011-04-19 | tdb: make sure we skip over recovery area correctly. | Rusty Russell | 1 | -15/+29
If it's really the recovery area, we can trust the rec_len field, and don't have to go groping for bitpatterns. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User: Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date: Tue Apr 19 14:15:22 CEST 2011 on sn-devel-104
2011-04-18 | tdb: tdb_repack() only when it's worthwhile. | Rusty Russell | 1 | -6/+31
tdb_repack() is expensive and consumes memory, so we can spend some effort to see if it's worthwhile. In particular, tdbbackup doesn't need to repack: it started with an empty database! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-18 | tdb: fix transaction recovery area for converted tdbs. | Rusty Russell | 1 | -2/+4
This is why macros are dangerous; these were converting the pointers, not the things pointed to! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
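A minimal sketch of this class of bug, assuming a conversion macro that takes the address of its argument. The names `swap_words`, `SWAP_IN_PLACE` and `struct header` are hypothetical and are not the tdb macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-place byte swap of every 32-bit word in a buffer. */
void swap_words(void *buf, size_t size)
{
	uint32_t *p = buf;
	size_t i;

	assert(size % sizeof(uint32_t) == 0);
	for (i = 0; i < size / sizeof(uint32_t); i++)
		p[i] = ((p[i] & 0xFFu) << 24) | ((p[i] & 0xFF00u) << 8) |
		       ((p[i] >> 8) & 0xFF00u) | (p[i] >> 24);
}

/* The macro takes the address and size of whatever it is given. */
#define SWAP_IN_PLACE(x) swap_words(&(x), sizeof(x))

struct header {
	uint32_t magic;
	uint32_t len;
};

void convert_header(struct header *h)
{
	/* SWAP_IN_PLACE(h) would also compile, but it would swap the
	 * bytes of the pointer variable h, not the struct it points to. */
	SWAP_IN_PLACE(*h);
}
```

Because the macro hides the `&` and `sizeof`, both spellings compile cleanly, which is what makes the mistake easy to miss.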
2010-10-21 | tdb: Set _PUBLIC_ in C file rather than header files (Debian bug 600898) | Jelmer Vernooij | 1 | -7/+7
Autobuild-User: Jelmer Vernooij <jelmer@samba.org> Autobuild-Date: Thu Oct 21 11:47:22 UTC 2010 on sn-devel-104
2010-07-01 | tdb: fix the build on mac os x 10.6.4. | Günther Deschner | 1 | -0/+4
Guenther
2010-03-26 | tdb: Add a non-blocking version of tdb_transaction_start | Volker Lendecke | 1 | -2/+15
2010-03-25 | Fix some nonempty blank lines | Volker Lendecke | 1 | -11/+11
2010-02-24 | tdb: handle processes dying during transaction commit. | Rusty Russell | 1 | -0/+25
tdb transactions were designed to be robust against the machine powering off, but interestingly were never designed to handle the case where an administrator kill -9's a process during commit. Because recovery is only done on tdb_open, processes with the tdb already mapped will simply use it despite it being corrupt and needing recovery.
The solution to this is to check for recovery every time we grab a data lock: we could have gained the lock because a process just died. This has no measurable cost: here is the time for tdbtorture -s 0 -n 1 -l 10000:
Before: 2.75 2.50 2.81 3.19 2.91 2.53 2.72 2.50 2.78 2.77 = Avg 2.75
After: 2.81 2.57 3.42 2.49 3.02 2.49 2.84 2.48 2.80 2.43 = Avg 2.74
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: don't truncate tdb on recovery | Rusty Russell | 1 | -10/+0
The current recovery code truncates the tdb file on recovery. This is fine if recovery is only done on first open, but is a really bad idea as we move to allowing recovery on "live" databases. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: remove lock ops | Rusty Russell | 1 | -21/+0
Now the transaction code uses the standard allrecord lock, that stops us from trying to grab any per-record locks anyway. We don't need to have special noop lock ops for transactions. This is a nice simplification: if you see brlock, you know it's really going to grab a lock. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: rename tdb_release_extra_locks() to tdb_release_transaction_locks() | Rusty Russell | 1 | -2/+1
tdb_release_extra_locks() is too general: it carefully skips over the transaction lock, even though the only caller then drops it. Change this, and rename it to show it's clearly transaction-specific. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: cleanup: remove ltype argument from _tdb_transaction_cancel. | Rusty Russell | 1 | -17/+13
Now the transaction allrecord lock is the standard one, and thus is cleaned in tdb_release_extra_locks(), _tdb_transaction_cancel() doesn't need to know what type it is. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 | tdb: tdb_allrecord_lock/tdb_allrecord_unlock/tdb_allrecord_upgrade | Rusty Russell | 1 | -7/+5
Centralize locking of all chains of the tdb; rename _tdb_lockall to tdb_allrecord_lock and _tdb_unlockall to tdb_allrecord_unlock, and tdb_brlock_upgrade to tdb_allrecord_upgrade. Then we use this in the transaction code. Unfortunately, if the transaction code records that it has grabbed the allrecord lock read-only, write locks will fail, so we treat this upgradable lock as a write lock, and mark it as upgradable using the otherwise-unused offset field. One subtlety: now the transaction code is using the allrecord_lock, the tdb_release_extra_locks() function drops it for us, so we no longer need to do it manually in _tdb_transaction_cancel. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-22 | tdb: use tdb_nest_lock() for open lock. | Rusty Russell | 1 | -12/+5
This never nests, so it's overkill, but it centralizes the locking into lock.c and removes the ugly flag in the transaction code to track whether we have the lock or not. Note that we have a temporary hack so this places a real lock, despite the fact that we are in a transaction. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: cleanup: tdb_release_extra_locks() helper | Rusty Russell | 1 | -17/+1
Move locking intelligence back into lock.c, rather than open-coding the lock release in transaction.c. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 | tdb: cleanup: tdb_have_extra_locks() helper | Rusty Russell | 1 | -2/+2
In many places we check whether locks are held: add a helper to do this. The _tdb_lockall() case has already checked for the allrecord lock, so the extra work done by tdb_have_extra_locks() is merely redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 | tdb: cleanup: tdb_nest_lock/tdb_nest_unlock | Rusty Russell | 1 | -1/+1
Because fcntl locks don't nest, we track them in the tdb->lockrecs array and only place/release them when the count goes to 1/0. We only do this for record locks, so we simply place the list number (or -1 for the free list) in the structure. To generalize this:
1) Put the offset rather than list number in struct tdb_lock_type.
2) Rename _tdb_lock() to tdb_nest_lock, make it non-static and move the allrecord check out to the callers (except the mark case which doesn't care).
3) Rename _tdb_unlock() to tdb_nest_unlock(), make it non-static and move the allrecord out to the callers (except mark again).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
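The counting scheme described in this message can be sketched as follows. This is an illustrative model only: the struct layout, the fixed-size table and the two counters standing in for real fcntl calls are all assumptions, not the tdb->lockrecs implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LOCKS 8

struct nested_lock {
	uint32_t off;	/* file offset the lock covers */
	int count;	/* how many times we hold it */
};

struct lock_table {
	struct nested_lock locks[MAX_LOCKS];
	int num;
	int os_locks_placed;	/* stand-in for real fcntl lock calls */
	int os_locks_released;	/* stand-in for real fcntl unlock calls */
};

int nest_lock(struct lock_table *t, uint32_t off)
{
	int i;

	for (i = 0; i < t->num; i++) {
		if (t->locks[i].off == off) {
			t->locks[i].count++;	/* already held: just count */
			return 0;
		}
	}
	if (t->num == MAX_LOCKS)
		return -1;
	t->os_locks_placed++;		/* count goes 0 -> 1: place it */
	t->locks[t->num].off = off;
	t->locks[t->num].count = 1;
	t->num++;
	return 0;
}

int nest_unlock(struct lock_table *t, uint32_t off)
{
	int i;

	for (i = 0; i < t->num; i++) {
		if (t->locks[i].off != off)
			continue;
		assert(t->locks[i].count > 0);
		if (--t->locks[i].count == 0) {
			t->os_locks_released++;	/* 1 -> 0: release it */
			t->num--;
			t->locks[i] = t->locks[t->num];
		}
		return 0;
	}
	return -1;	/* not held */
}
```

Keying the table on the offset rather than a list number is what lets the same bookkeeping serve locks that are not hash-chain locks.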
2010-02-17 | tdb: cleanup: rename global_lock to allrecord_lock. | Rusty Russell | 1 | -5/+5
The word global is overloaded in tdb. The global_lock inside struct tdb_context is used to indicate we hold a lock across all the chains. Rename it to allrecord_lock. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 | tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK. | Rusty Russell | 1 | -12/+12
The word global is overloaded in tdb. The GLOBAL_LOCK offset is used at open time to serialize initialization (and by the transaction code to block open). Rename it to OPEN_LOCK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 | tdb: make _tdb_transaction_cancel static. | Rusty Russell | 1 | -1/+1
Now tdb_open() calls tdb_transaction_cancel() instead of _tdb_transaction_cancel, we can make it static. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 | tdb: cleanup: split brlock and brunlock methods. | Rusty Russell | 1 | -26/+39
This is taken from the CCAN code base: rather than using tdb_brlock for locking and unlocking, we split it into brlock and brunlock functions. For extra debugging information, brunlock says what kind of lock it is unlocking (even though fcntl locks don't need this). This requires an extra argument to tdb_transaction_unlock() so we know whether the lock was upgraded to a write lock or not. We also use a "flags" argument to tdb_brlock:
1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).
2) TDB_LOCK_MARK_ONLY replaces setting TDB_MARK_LOCK bit in ltype.
3) TDB_LOCK_PROBE replaces the "probe" argument.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
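A rough sketch of the split, using plain POSIX fcntl() byte-range locks. The flag names follow the commit message, but their numeric values, the function signatures and the handling of the mark-only case are assumptions rather than the real tdb code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Names from the commit message; the values here are assumed. */
#define TDB_LOCK_NOWAIT    1
#define TDB_LOCK_MARK_ONLY 2

/* Place a byte-range lock of rw_type (F_RDLCK or F_WRLCK). */
int brlock(int fd, short rw_type, off_t offset, off_t len, int flags)
{
	struct flock fl = {0};

	if (flags & TDB_LOCK_MARK_ONLY)
		return 0;	/* bookkeeping only, no system call */

	fl.l_type = rw_type;
	fl.l_whence = SEEK_SET;
	fl.l_start = offset;
	fl.l_len = len;

	/* F_SETLK fails at once on conflict; F_SETLKW blocks. */
	return fcntl(fd, (flags & TDB_LOCK_NOWAIT) ? F_SETLK : F_SETLKW, &fl);
}

/* Release a byte-range lock. */
int brunlock(int fd, off_t offset, off_t len)
{
	struct flock fl = {0};

	fl.l_type = F_UNLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = offset;
	fl.l_len = len;
	return fcntl(fd, F_SETLKW, &fl);
}
```

Folding the wait/no-wait choice into a flags word keeps the fcntl command selection in one place instead of at every call site.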
2010-02-13 | tdb: use fdatasync() instead of fsync() in transactions | Andrew Tridgell | 1 | -1/+1
This might help on some filesystems
2010-02-13 | tdb: Apply some const, just for clarity | Volker Lendecke | 1 | -1/+1
2010-02-10 | tdb: fix recovery reuse after crash | Rusty Russell | 1 | -4/+10
If a process (or the machine) dies just after writing the recovery head (pointing at the end of file), the recovery record will be filled with 0x42. This will not invoke a recovery on open, since rec.magic != TDB_RECOVERY_MAGIC. Unfortunately, the first transaction commit will happily reuse that area: tdb_recovery_allocate() doesn't check the magic. The recovery record has length 0x42424242, and it writes that back into the now-valid-looking transaction header for the next comer (which happens to be tdb_wipe_all in my tests). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
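One way to picture the missing check is a guard that refuses to reuse an area whose header does not carry a recognised magic. The struct layout, the helper name and the placeholder magic value below are assumptions; only the idea of validating the magic before trusting rec_len comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value, not the real TDB_RECOVERY_MAGIC. */
#define EXAMPLE_RECOVERY_MAGIC         0x52454356u
/* The log above names 0 as the invalid recovery area constant. */
#define EXAMPLE_RECOVERY_INVALID_MAGIC 0u

struct recovery_header {
	uint32_t magic;
	uint32_t rec_len;
};

/* Hypothetical guard: only trust rec_len if the magic is one we wrote. */
bool recovery_area_reusable(const struct recovery_header *rec, uint32_t needed)
{
	assert(rec != NULL);
	if (rec->magic != EXAMPLE_RECOVERY_MAGIC &&
	    rec->magic != EXAMPLE_RECOVERY_INVALID_MAGIC)
		return false;	/* e.g. 0x42424242 filler left by a crash */
	return rec->rec_len >= needed;
}
```

A header full of 0x42 bytes then fails the magic test, so its bogus 0x42424242 length is never consulted.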
2010-02-10 | tdb: give a name to the invalid recovery area constant (0) | Rusty Russell | 1 | -3/+3
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-01 | tdb: fix an early release of the global lock that can cause data corruption | Volker Lendecke | 1 | -5/+10
There was a bug in tdb where the tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1); call (ending the transaction "mutex") was done before the /* remove the recovery marker */ step. This means that when a transaction is committed there is a window where another opener of the file sees the transaction marker while the transaction committer is still fully functional and working on it. This led to the transaction being rolled back by that second opener of the file while transaction_commit() gave no error to the caller. This patch moves the F_UNLCK to after the recovery marker is removed, closing this window.
2009-11-20 | tdb: add TDB_DISALLOW_NESTING and make TDB_ALLOW_NESTING the default behavior | Stefan Metzmacher | 1 | -3/+11
We need to keep TDB_ALLOW_NESTING as default behavior, so that existing code continues to work. However we may change the default together with a major version number change in future. metze
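The nesting policy can be modelled with a depth counter. This is a simplified sketch: the context struct, the function names, the return codes and the flag value are assumptions, and it only models the default-allow behaviour described in this entry.

```c
/* Flag name from the log; the numeric value is assumed. */
#define TDB_DISALLOW_NESTING 1024u

enum {
	TXN_OK = 0,
	TXN_ERR_NESTING = -1,	/* stands in for TDB_ERR_NESTING */
	TXN_ERR_NO_TXN = -2
};

/* Hypothetical, much-reduced transaction state. */
struct txn_ctx {
	unsigned flags;
	int depth;		/* 0 means no transaction is open */
	int real_commits;	/* times data would actually be written */
};

int txn_start(struct txn_ctx *c)
{
	/* Nesting is allowed unless the caller opted out. */
	if (c->depth > 0 && (c->flags & TDB_DISALLOW_NESTING))
		return TXN_ERR_NESTING;
	c->depth++;
	return TXN_OK;
}

int txn_commit(struct txn_ctx *c)
{
	if (c->depth == 0)
		return TXN_ERR_NO_TXN;
	if (--c->depth > 0)
		return TXN_OK;	/* inner commit: nothing is written yet */
	c->real_commits++;	/* outermost commit does the real work */
	return TXN_OK;
}
```

The inner commit returning success without writing anything is the implicit semantics an application has to be able to cope with before relying on nesting.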
2009-11-20 | New attempt at TDB transaction nesting allow/disallow. | Ronnie Sahlberg | 1 | -0/+11
Make the default be that transaction nesting is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING. If an application can cope with transaction nesting and the implicit semantics of tdb_transaction_commit(), it can enable transaction nesting by using the TDB_ALLOW_NESTING flag. (cherry picked from ctdb commit 3e49e41c21eb8c53084aa8cc7fd3557bdd8eb7b6) Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-10-23 | tdb: rename 'struct list_struct' into 'struct tdb_record' | Stefan Metzmacher | 1 | -6/+6
metze
2009-10-22 | lib/tdb: wean off TDB_ERRCODE. | Rusty Russell | 1 | -1/+2
It was a regrettable hack which I used to reduce line count in tdb; in fact it caused confusion as can be seen in this patch. In particular, ecode now needs to be set before TDB_LOG anyway, and having it exposed in the header is useless (the struct tdb_context isn't defined, so it's doubly useless). Also, we should never set errno, as io.c was doing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 | lib/tdb: TDB_TRACE support (for developers) | Rusty Russell | 1 | -20/+35
When TDB_TRACE is defined (in tdb_private.h), verbose tracing of tdb operations is enabled. This can be replayed using "replay_trace" from http://ccan.ozlabs.org/info/tdb. The majority of this patch comes from moving internal functions to _<funcname> to avoid double-tracing. There should be no additional overhead for the normal (!TDB_TRACE) case. Note that the verbose traces compress really well with rzip. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-15 | tdb: allow reads after prepare commit | Andrew Tridgell | 1 | -8/+0
We previously only allowed a commit to happen after a prepare commit. It is in fact safe to allow reads between a prepare and a commit, and the s4 replication code can make use of that, so allow it.
2009-06-01 | auto-repack in transactions that expand the tdb | Andrew Tridgell | 1 | -0/+12
The idea behind this is to recover from badly fragmented free lists. Choosing the point where the file expands is fairly arbitrary, but seems to work well.
2009-05-28 | make TDB_NOSYNC affect all the fsync/msync calls in transactions | Andrew Tridgell | 1 | -5/+7
During a transaction commit tdb normally uses fsync/msync calls to make it crash safe. This can be disabled using the TDB_NOSYNC flag, but it wasn't disabling all the code paths that caused a fsync/msync.
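One way to make sure no code path is missed is to route every sync through a single helper that checks the flag. The helper name and the flag value below are assumptions, and the real code also has msync paths that this sketch leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Flag name from the log; the numeric value is assumed. */
#define TDB_NOSYNC 64u

/* Hypothetical single choke point for transaction syncs. */
int transaction_sync(int fd, unsigned flags)
{
	if (flags & TDB_NOSYNC)
		return 0;	/* caller accepted the loss of crash safety */
	return fdatasync(fd);
}
```

With the flag set, the helper never reaches the system call, so even an invalid descriptor returns 0; without it, the result of fdatasync() is passed straight back to the caller.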
2009-03-31 | tdb: Remove unused variable | Tim Prouty | 1 | -1/+0
2009-03-31 | Add tdb_transaction_prepare_commit() | Howard Chu | 1 | -52/+124
Using tdb_transaction_prepare_commit() gives us 2-phase commits. This allows us to safely commit across multiple tdb databases at once, with reasonable transaction semantics. Signed-off-by: tridge@samba.org
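The two-phase pattern across several databases can be sketched generically. Everything here is hypothetical scaffolding: `struct participant`, `commit_all` and the toy `fake_db` callbacks are illustration only. A real caller would presumably wire the callbacks to tdb_transaction_prepare_commit(), tdb_transaction_commit() and tdb_transaction_cancel().

```c
#include <stddef.h>

/* Hypothetical wrapper around one database taking part in the commit. */
struct participant {
	void *db;
	int (*prepare)(void *db);
	int (*commit)(void *db);
	void (*cancel)(void *db);
};

/* Commit every participant, or none if any prepare step fails. */
int commit_all(struct participant *p, size_t n)
{
	size_t i, j;
	int rc = 0;

	/* Phase 1: everyone does the work that can still fail. */
	for (i = 0; i < n; i++) {
		if (p[i].prepare(p[i].db) != 0) {
			for (j = 0; j < n; j++)
				p[j].cancel(p[j].db);
			return -1;
		}
	}
	/* Phase 2: all prepared, so finish each commit. */
	for (i = 0; i < n; i++) {
		if (p[i].commit(p[i].db) != 0)
			rc = -1;
	}
	return rc;
}

/* Toy in-memory participant so the sketch can be exercised. */
struct fake_db {
	int fail_prepare;
	int committed;
	int cancelled;
};

int fake_prepare(void *db)
{
	return ((struct fake_db *)db)->fail_prepare ? -1 : 0;
}

int fake_commit(void *db)
{
	((struct fake_db *)db)->committed = 1;
	return 0;
}

void fake_cancel(void *db)
{
	((struct fake_db *)db)->cancelled = 1;
}
```

The point of the split is that the failure-prone work happens in the prepare phase, while every database can still be cancelled; a failure in the second phase is still possible in principle, which is why the sketch reports it rather than assuming it away.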
2008-09-17 | Move common libraries from root to lib/. | Jelmer Vernooij | 1 | -0/+1119