Age | Commit message (Collapse) | Author | Files | Lines |
|
This skips update of the __db_sequence_number__ record when nothing else has
been written. There are transactions that are just openend and then nothing
is written until transaction_commit is called. This is for instance the case
with registry initialization routines: They start a transaction and only
write somthing when the registry has not been initialized yet.
So this change will skip many db_seqnum bumps and TRANS3_COMMIT roundtrips.
Michael
|
|
I carefully prepared the return value only to "return 0;" at the bottom. :-(
This may well have hit us for instance in the nested cancel case
and produced random errors.
Michael
|
|
The logic bug was that if a record was found in the marshall buffer,
then always the ctdb header of tha last record in the marshall buffer
was returned, and not the ctdb header of the last occurrence of the
requested record.
This is fixed by introducing an additional temporary variable.
Michael
|
|
Michael
|
|
Michael
|
|
Don't treat this as an error but return seqnum 0 instead.
Michael
|
|
For persistent databases, 64bit integer is kept in a special record
__db_sequence_number__. This record is incremented with each completed
transaction.
The retry mechanism for failing TRANS3_COMMIT controls inside the
db_ctdb_transaction_commit() function now relies one a modified
behaviour of ctdbd's treatment of persistent databases in recoveries.
Recently, a special treatment for persistent databases had been
introduced in ctdb (1.0.108) to work around the problems with the
orinal design of persistent transactions.
Now with the rewrite we need to revert to the old behaviour that
ctdb always takes the newest copies of all records.
This change also paves the way for a next step, which will make
recovery use the db seqnum to tell which node has the newest copy
of a persistent db and use that node's copy. This will greatly
reduce the amount of data transferred with each recovery.
Michael
|
|
The return values calculated by the callers were wrong anyways since
the new marshalling code does not set the local tdbs tdb error code.
Michael
|
|
Michael
|
|
This simplifies the transaction code a lot:
* transaction_start essentially consists of acquiring a global lock.
* No write operations at all are performed on the local database
until the transaction is committed: Every store operation is just
going into the marshall buffer.
* The commit operation calls a new simplified TRANS3_COMMIT control
in ctdb which rolls out thae changes to all nodes including the
node that is performing the transaction.
Michael
|
|
|
|
This is the basis to implement global locks in ctdb without depending on a
shared file system. The initial goal is to make ctdb persistent transactions
deterministic without too many timeouts.
|
|
This changes the meaning of the ->prev pointer in our doubly linked
lists to point at the end of the list from the front of the list. That
allows us to implement DLIST_ADD_END() and related functions in O(1)
time, which can be a huge saving in many places in Samba.
This also means that the 'type' argument to various DLIST_*() macros
is no longer needed, but I have left it in for now to keep the
patchset small, which will make it easier to revert if any problems
are found. In the future we should remove the 'type' arguments.
(jra. Move the one use of DLIST_TAIL over to the new macros).
|
|
we don't need a separate lru pointer any more
(cherry picked from commit 4ffd7aca3e38728077bd80c2a65c4efbcfd216fc)
|
|
(cherry picked from commit a7d8bfd373392eecf4fff33d39b85e1b55ad901d)
|
|
uses of (list)->prev are moved over to DLIST_PREV. This will be replaced
when the final (new) version of the dlinklist.h header is added.
Jeremy.
|
|
tevent ensures that a timed event is only called once. The old events
code relied on the called handler removing the event itself. If the
handler removed the event after calling a function which invoked the
event loop then the timed event could loop forever.
This change makes the two timed event systems more compatible, by
allowing the handler to free the te if it wants to, but ensuring it is
off the linked list of events before the handler is called, and
ensuring it is freed even if the handler doesn't free it.
|
|
These have been replaced with the min timeout in blocking.c
|
|
respond to a read or write.
Only works on Linux kernels 2.6.26 and above. Grants CAP_KILL capability
to allow Linux threads under different euids to send signals to each other.
Jeremy.
|
|
|
|
This reverts commit dff03b61fd5d923562711b38cc7dbe996dc07283.
|
|
|
|
|
|
|
|
All but one call were pointless, so I think this API should go
|
|
|
|
|
|
|
|
|
|
|
|
metze
Signed-off-by: Stefan Metzmacher <metze@samba.org>
(cherry picked from commit c992127f8a96c37940a6d298c7c6859c47f83d9b)
|
|
|
|
Bjoern, please check.
Guenther
|
|
we already get them from lib/util/time.h
|
|
|
|
|
|
we have nt_time_equal doing the same in lib/util/
|
|
|
|
Thanks to Dan Cox for initial patch for 3.0. This closes #2350.
|
|
this also fixes an error in NTSTATUS handling
|
|
this is in preparation for other preallocation methods to be introduced.
|
|
If the check fails we try to clear the tdb and start
with an empty cache.
metze
|
|
metze
|
|
This is to cope with timeouts when recoveries and transactions collide.
Maybe 100 is too hight, but 10 or even 20 have been too low in a
very busy environment.
Michael
|
|
so that it is correctly handled by recoveries.
Also set the dmaster explicitly.
Michael
|
|
|
|
|
|
right. The previous bugs were due to the fact that get_nt_acl_internal()
could return an NTSTATUS error if there was no stored ACL blob, but
otherwise would return the underlying ACL from the filysystem. Fix
this so it always returns a valid acl if it can, and if it does not
its an error to be reported back to the client. This then changes
the inherit acl code. Previously we were trying to match Windows
by setting a minimal ACL on a new file that didn't inherit anything
from a parent directory. This is silly - the returned ACL wouldn't
match the underlying UNIX permissions. The current code will correctly
inherit from a parent if a parent has any inheritable ACE entries
that apply to the new object, but will return a mapping from the
underlying UNIX permissions if the parent has no inheritable entries.
This makes much more sense for new files/directories.
Jeremy.
|
|
posix_fallocate is more efficient than manual zero'ing the file. When
preallocation in kernel space is supported it's extremely fast. Support for
preallocation at fs layer via posix_fallocate and fallocate at kernel site
can be found in Linux kernel 2.6.23/glibc 2.10 with ext4, XFS and OCFS2. Other
systems that I know of which support fast preallocation in kernel space are
AIX 6.1 with JFS2 and recent Solaris versions with ZFS maybe UFS2, too.
People who have a system with preallocation in kernel space might want to set
"strict allocate = yes". This reduces file fragentation and it's also safer for
setups with quota being turned on.
As of today most systems still don't have preallocation in kernel space, and
that's why "strict allocate = no" will stay the default for now.
|
|
|