Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds two lock functions used by CTDB to perform asynchronous
locking. These functions do not actually perform any fcntl operations,
but only increment internal counters.
- tdb_transaction_write_lock_mark()
- tdb_transaction_write_lock_unmark()
It also exposes two internal functions
- tdb_lock_nonblock()
- tdb_unlock()
These functions are NOT exposed in include/tdb.h to prevent any further
uses of these functions. If you ever need to use these functions, consider
using tdb2.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
|
|
Autobuild-User: Jelmer Vernooij <jelmer@samba.org>
Autobuild-Date: Thu Oct 21 11:47:22 UTC 2010 on sn-devel-104
|
|
We saw tdb_lockall() take 71 seconds under heavy load; this is because Linux
(at least) doesn't prevent new small locks being obtained while we're waiting
for a big log.
The workaround is to do divide and conquer using non-blocking chainlocks: if
we get down to a single chain we block. Using a simple test program where
children did "hold lock for 100ms, sleep for 1 second" the time to do
tdb_lockall() dropped signifiantly. There are ln(hashsize) locks taken in
the contended case, but that's slow anyway.
More analysis is given in my blog at http://rusty.ozlabs.org/?p=120
This may also help transactions, though in that case it's the initial
read lock which uses this gradual locking routine; the update-to-write-lock
code is separate and still tries to update in one go.
Even though ABI doesn't change, minor version bumped so behavior change
can be easily detected.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
|
|
|
|
tdb transactions were designed to be robust against the machine
powering off, but interestingly were never designed to handle the case
where an administrator kill -9's a process during commit. Because
recovery is only done on tdb_open, processes with the tdb already
mapped will simply use it despite it being corrupt and needing
recovery.
The solution to this is to check for recovery every time we grab a
data lock: we could have gained the lock because a process just died.
This has no measurable cost: here is the time for tdbtorture -s 0 -n 1
-l 10000:
Before:
2.75 2.50 2.81 3.19 2.91 2.53 2.72 2.50 2.78 2.77 = Avg 2.75
After:
2.81 2.57 3.42 2.49 3.02 2.49 2.84 2.48 2.80 2.43 = Avg 2.74
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
|
|
Now the transaction code uses the standard allrecord lock, that stops
us from trying to grab any per-record locks anyway. We don't need to
have special noop lock ops for transactions.
This is a nice simplification: if you see brlock, you know it's really
going to grab a lock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
tdb_release_extra_locks() is too general: it carefully skips over the
transaction lock, even though the only caller then drops it. Change
this, and rename it to show it's clearly transaction-specific.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Centralize locking of all chains of the tdb; rename _tdb_lockall to
tdb_allrecord_lock and _tdb_unlockall to tdb_allrecord_unlock, and
tdb_brlock_upgrade to tdb_allrecord_upgrade.
Then we use this in the transaction code. Unfortunately, if the transaction
code records that it has grabbed the allrecord lock read-only, write locks
will fail, so we treat this upgradable lock as a write lock, and mark it
as upgradable using the otherwise-unused offset field.
One subtlety: now the transaction code is using the allrecord_lock, the
tdb_release_extra_locks() function drops it for us, so we no longer need
to do it manually in _tdb_transaction_cancel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Records themselves get (read) locked by the traversal code against delete.
Interestingly, this locking isn't done when the allrecord lock has been
taken, though the allrecord lock until recently didn't cover the actual
records (it now goes to end of file).
The write record lock, grabbed by the delete code, is not suppressed
by the allrecord lock. This is now bad: it causes us to punch a hole
in the allrecord lock when we release the write record lock. Make this
consistent: *no* record locks of any kind when the allrecord lock is
taken.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
We were previously inconsistent with our "global" lock: the
transaction code grabbed it from FREELIST_TOP to end of file, and the
rest of the code grabbed it from FREELIST_TOP to end of the hash
chains. Change it to always grab to end of file for simplicity and
so we can merge the two.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This was redundant before this patch series: it mirrored num_lockrecs
exactly. It still does.
Also, skip useless branch when locks == 1: unconditional assignment is
cheaper anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Use our newly-generic nested lock tracking for the active lock.
Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This never nests, so it's overkill, but it centralizes the locking into
lock.c and removes the ugly flag in the transaction code to track whether
we have the lock or not.
Note that we have a temporary hack so this places a real lock, despite
the fact that we are in a transaction.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Rather than a boutique lock and a separate nest count, use our
newly-generic nested lock tracking for the transaction lock.
Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Factor out two loops which find locks; we are going to introduce a couple
more so a helper makes sense.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Move locking intelligence back into lock.c, rather than open-coding the
lock release in transaction.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
In many places we check whether locks are held: add a helper to do this.
The _tdb_lockall() case has already checked for the allrecord lock, so
the extra work done by tdb_have_extra_locks() is merely redundant.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
tdb_transaction_lock() and tdb_transaction_unlock() do nothing if we
hold the allrecord lock. However, the two locks don't overlap, so
this is wrong.
This simplification makes the transaction lock a straight-forward nested
lock.
There are two callers for these functions:
1) The transaction code, which already makes sure the allrecord_lock
isn't held.
2) The traverse code, which wants to stop transactions whether it has the
allrecord lock or not. There have been deadlocks here before, however
this should not bring them back (I hope!)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Because fcntl locks don't nest, we track them in the tdb->lockrecs array
and only place/release them when the count goes to 1/0. We only do this
for record locks, so we simply place the list number (or -1 for the free
list) in the structure.
To generalize this:
1) Put the offset rather than list number in struct tdb_lock_type.
2) Rename _tdb_lock() to tdb_nest_lock, make it non-static and move the
allrecord check out to the callers (except the mark case which doesn't
care).
3) Rename _tdb_unlock() to tdb_nest_unlock(), make it non-static and
move the allrecord out to the callers (except mark again).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
The word global is overloaded in tdb. The global_lock inside struct
tdb_context is used to indicate we hold a lock across all the chains.
Rename it to allrecord_lock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This is taken from the CCAN code base: rather than using tdb_brlock for
locking and unlocking, we split it into brlock and brunlock functions.
For extra debugging information, brunlock says what kind of lock it is
unlocking (even though fnctl locks don't need this). This requires an
extra argument to tdb_transaction_unlock() so we know whether the
lock was upgraded to a write lock or not.
We also use a "flags" argument tdb_brlock:
1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).
2) TDB_LOCK_MARK_ONLY replaces setting TDB_MARK_LOCK bit in ltype.
3) TDB_LOCK_PROBE replaces the "probe" argument.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
It was a regrettable hack which I used to reduce line count in tdb; in fact it caused confusion as can be seen in this patch.
In particular, ecode now needs to be set before TDB_LOG anyway, and having it exposed in
the header is useless (the struct tdb_context isn't defined, so it's doubly useless).
Also, we should never set errno, as io.c was doing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
When TDB_TRACE is defined (in tdb_private.h), verbose tracing of tdb operations is enabled.
This can be replayed using "replay_trace" from http://ccan.ozlabs.org/info/tdb.
The majority of this patch comes from moving internal functions to _<funcname> to
avoid double-tracing. There should be no additional overhead for the normal (!TDB_TRACE)
case.
Note that the verbose traces compress really well with rzip.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
54a51839ea65aa788b18fce8de0ae4f9ba63e4e7 "Make tdb transaction lock
recursive (samba version)" was broken: I "cleaned it up" and prevented
it from ever unlocking.
To see the problem:
$ bin/tdbtorture -s 1248142523
tdb_brlock failed (fd=3) at offset 8 rw_type=1 lck_type=14 len=1
tdb_transaction_lock: failed to get transaction lock
tdb_transaction_start failed: Resource deadlock avoided
My testcase relied on the *count* being correct, which it was. Fixing that
now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
|
|
This patch replaces 6ed27edbcd3ba1893636a8072c8d7a621437daf7 and
1a416ff13ca7786f2e8d24c66addf00883e9cb12, which fixed the bug where traversals
inside transactions would release the transaction lock early.
This solution is more general, and solves the more minor symptom that nested
traversals would also release the transaction lock early. (It was also suggestd in
Volker's comment in 6ed27ed).
This patch also applies to ctdb, if the traverse.c part is removed (ctdb's tdb
code never received the previous two fixes).
Tested using the testsuite from ccan (adapted to the samba code). Thanks to
Michael Adam for feedback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
|
|
|