Age | Commit message (Collapse) | Author | Files | Lines |
|
When a samba server process dies hard, it has no chance to clean up its entries
in locking.tdb, brlock.tdb, connections.tdb and sessionid.tdb.
For locking.tdb and brlock.tdb Samba is robust by checking every time we read
an entry from the database if the corresponding process still exists. If it
does not exist anymore, the entry is deleted. This is not 100% failsafe though:
On systems with a limited PID space there is a non-zero chance that between the
smbd's death and the fresh access, the PID is recycled by another long-running
process. This renders all files that had been locked by the killed smbd
potentially unusable until the new process also dies.
This patch is supposed to fix the problem the following way: Every process ID
in every database is augmented by a random 64-bit number that is stored in a
serverid.tdb. Whenever we need to check if a process still exists we know its
PID and the 64-bit number. We look up the PID in serverid.tdb and compare the
64-bit number. If it's the same, the process still is a valid smbd holding the
lock. If it is different, a new smbd has taken over.
I believe this is safe against an smbd that has died hard and the PID has been
taken over by a non-samba process. This process would not have registered
itself with a fresh 64-bit number in serverid.tdb, so the old one still exists
in serverid.tdb. We protect against this case by the parent smbd taking care of
deregistering PIDs from serverid.tdb and the fact that serverid.tdb is
CLEAR_IF_FIRST.
CLEAR_IF_FIRST does not work in a cluster, so the automatic cleanup does not
work when all smbds are restarted. For this, "net serverid wipe" has to be run
before smbd starts up. As a convenience, "net serverid wipedbs" also cleans up
sessionid.tdb and connections.tdb.
While there, this also cleans up overloading connections.tdb with all the
process entries just for messaging_send_all().
Volker
|
|
The change notify code registered a separate message handler for each
tree connect. This registration uses the global messaging context.
The messaging code would consider a 2nd registration for the same
messaging type as being an 'update' of the handler, rather than a new
handler. It also would only call the first handler in the linked list
for a given message type when dispatching messages.
This patch changes the messaging code to allow for multiple
registrations of the same message type, and allow for multiple calls
to different messaging handler for one incoming message.
This fixes the problem with the test_notify_tcon() test that I
recently committed to the S4 smbtorture
|
|
This prepares the change to use signal events in the tdb backend.
metze
|
|
Leave level 0 messages to higher level callers.
Michael
(This used to be commit 7bbf29137bf051044cbf0db8d9fe564a7c9d7a29)
|
|
metze
(This used to be commit ee6325495f48bab43a37d740a6eca57192004d57)
|
|
bugs in various places whilst doing this (places that assumed
BOOL == int). I also need to fix the Samba4 pidl generation
(next checkin).
Jeremy.
(This used to be commit f35a266b3cbb3e5fa6a86be60f34fe340a3ca71f)
|
|
and receives messages to other nodes... :-)
(This used to be commit 3e9e9a3f28763500a1c5e551a808a14661d7d9fa)
|
|
(This used to be commit b0132e94fc5fef936aa766fb99a306b3628e9f07)
|
|
Jeremy.
(This used to be commit 407e6e695b8366369b7c76af1ff76869b45347b3)
|
|
I'm 100% certain I've forgotten to merge something, but the main code
should be in. It's mainly in dbwrap_ctdb.c, ctdbd_conn.c and
messages_ctdbd.c.
There should be no changes to the non-cluster case, it does survive make
test on my laptop.
It survives some very basic tests with ctdbd enables, I did not do the
full test suite for clusters yet.
Phew...
Volker
(This used to be commit 15553d6327a3aecdd2b0b94a3656d04bf4106323)
|
|
(This used to be commit 0014ee44b87a4a109c897ffec5f9c38eea442571)
|
|
branch, please check if it fulfils your needs.
Two changes: The validation is not done inside the brlock.c traverse_fn,
it's done as a separate routine.
Secondly, this patch does not call the checker routines in smbcontrol
directly but depends on a running smbd.
(This used to be commit 7e39d77c1f90d9025cab08918385d140e20ca25b)
|
|
(This used to be commit 80a1f43825063bbbda896175d99700ede5a4757a)
|
|
This removes message_block / message_unblock. I've talked to Jeremy and
Günther, giving them my reasons why I believe they have no effect.
Neither could come up with a counter-argument, so they go :-)
(This used to be commit a925e0991ffbaea4a533bab3a5d61e5d367d46c8)
|
|
is now
replaced by MSG_FLAG_LOWPRIORITY or'ed into the msg_type. To enable this,
changed the msg_type definitions to hexadecimal.
This way we could theoretically add the MSG_FLAG_NODUPLICATES again, but I
would rather not do this, because that one is racy and can't be guaranteed at
all.
(This used to be commit 3f5eb8a9600839a9f9c44c553f0bda6df22b30b0)
|
|
(This used to be commit 72ed8388252bed07627b3a4636744dc8acf1c98b)
|
|
doing this because for the clustering the marshalling is needed in more
than one place, so I wanted a decent routine to marshall a message_rec
struct which was not there before.
Tridge, this seems about the same speed as it used to be before, the
librpc/ndr overhead in my tests was under the noise.
Volker
(This used to be commit eaefd00563173dfabb7716c5695ac0a2f7139bb6)
|
|
(This used to be commit d3f16722b2c3c68b03e55b5100d979921c3f284d)
|
|
message_send_pid is used anymore. Two users of duplicates_allowed: winbind and
the printer notify system.
I don't thing this really changes semantics: duplicates_allowed is hell racy
anyway, we can't guarantee that we don't send the same message in sequence
twice, and I think the only thing we can harm with the print notify is
performance.
For winbind I talked to Günther, and he did not seem too worried.
Volker
(This used to be commit 75b3ae6a761c33ae6a70799340a1e8c1edea265e)
|
|
replaces
the timeouts on the individual message send calls with an overall timeout on
all the calls.
The timeout in message_send_pid_with_timeout() did not make much sense IMO
anyway, because the tdb_fetch() for the messages_pending_for_pid was blocking
in a readlock anyway, we "just" did the timeout for the write lock.
This new code goes through the full wait for the write lock once and then
breaks out of sending the notifies instead of running into the timeout per
target.
Jerry, please check this!
Thanks,
Volker
(This used to be commit 697099f06e1aa432187f802b9c2632607e3de46e)
|
|
(This used to be commit 782ee7291683d061767688b23f93ac0865e46331)
|
|
(This used to be commit a8082a3c7c3d1e68c27fc3bf42f3d44402cc6f9f)
|
|
(This used to be commit e3d985c581ffc597aea932858d27c421643d2868)
|
|
(This used to be commit cc92ce665dcfe9054d09429219883b18a4cab090)
|
|
(This used to be commit 27224922cf964cc70aad7cf529ab6c03fb277a89)
|
|
(This used to be commit 330946ad2307ca34f0a8d068a0193fcb8a0d6036)
|
|
tomorrow.
(This used to be commit 74fa57ca5d7fa8eace72bbe948a08a0bca3cc4ca)
|
|
locking/locking.c we have to send retry messages to timed lock holders.
The majority of this patch passes a "struct messaging_context" down
there. No functional change, survives make test.
(This used to be commit bbb508414683eeddd2ee0d2d36fe620118180bbb)
|
|
connections_traverse
and connections_forall. This centralizes all the routines that did individual
tdb_open("connections.tdb") and direct tdb_traverse.
Volker
(This used to be commit e43e94cda1ad8876b3cb5d1129080b57fa6ec214)
|
|
patch.
This changes "struct process_id" to "struct server_id", keeping both is
just too much hassle. No functional change (I hope ;-))
Volker
(This used to be commit 0ad4b1226c9d91b72136310d3bbb640d2c5d67b8)
|
|
Jeremy.
(This used to be commit cc4face3bc269afa7af19bde534cacc04f9510a9)
|
|
messages.c. Refactor to use become_root() instead and
make it local to messages.c
Jeremy.
(This used to be commit f3ffb3f98472b69b476b702dfe5c0575b32da018)
|
|
and fix all compiler warnings in the users
metze
(This used to be commit 3a28443079c141a6ce8182c65b56ca210e34f37f)
|
|
(This used to be commit f3421ae4cfa263c0e7a8e934b40342ee9885d239)
|
|
crashed. So
it needs the specific error message.
Make messages.c return NTSTATUS and specificially NT_STATUS_INVALID_HANDLE if
sending to a non-existent process.
Volker
(This used to be commit 3f620d181da0c356c8ffbdb5b380ccab3645a972)
|
|
(This used to be commit 4a99fa266672e2989f9e62baf9090eb45df750ea)
|
|
messaging wrapper
and tdb_wrap_open.
Volker
(This used to be commit c01f164dcaf88fb7f3bed8f69b210ba8fab326d1)
|
|
void message_register(int msg_type,
void (*fn)(int msg_type, struct process_id pid,
- void *buf, size_t len))
+ void *buf, size_t len,
+ void *private_data),
+ void *private_data)
{
struct dispatch_fns *dfn;
So this adds a (so far unused) private pointer that is passed from
message_register to the message handler. A prerequisite to implement a tiny
samba4-API compatible wrapper around our messaging system. That itself is
necessary for the Samba4 notify system.
Yes, I know, I could import the whole Samba4 messaging system, but I want to
do it step by step and I think getting notify in is more important in this
step.
Volker
(This used to be commit c8ae60ed65dcce9660ee39c75488f2838cf9a28b)
|
|
then terminate the traversal once we've done that.
Jeremy.
(This used to be commit da3d0b62340b545950ea2081f30c252fd4609981)
|
|
per type - this is all we use right now and makes
re-entrancy problems with deleting handlers with
a message dispatch loop go away.
Jeremy.
(This used to be commit 2e9b6faeae1474d65a5c12f5a19b0ae045d47c2e)
|
|
in tdb message processing. If we're inside a dispatch
function and we delete our own handler we'd walk onto
the next pointer from a deleted memory block. Fixes
crash bug in winbindd (and goodness knows where else).
Jeremy.
(This used to be commit 27a4c1121404e346432d90b97b518861e038e9f2)
|
|
(This used to be commit 1e4ee728df7eeafc1b4d533240acb032f73b4f5c)
|
|
calls make it :
become_root_uid_only()
operation
unbecome_root_uid_only()
saving errno across the second call. Most of our internal
change calls can be replaced with these simple calls.
Jeremy
(This used to be commit 4143aa83c029848d8ec741d9218b3fa6e3fd28dd)
|
|
fix the messaging code to call the efficient calls :
save_re_uid()
set_effective_uid(0);
messaging_op
restore_re_uid();
instead of using heavyweight become_root()/unbecome_root()
pairs around all messaging code. Fixup the messaging
code to ensure sec_init() is called (only once) so that non-root
processes still work when sending messages.
This is a lighter weight solution to become_root()/unbecome_root()
(which swaps all the supplemental groups) and should be more
efficient. I will migrate all server code over to using this
(a similar technique should be used in the passdb backend
where needed).
Jeremy.
(This used to be commit 4ace291278d9a44f5c577bdd3b282c1231e543df)
|
|
(This used to be commit 736e55101b1e7cc22f043b836be877afbb031edf)
|
|
if len != 0.
Jeremy.
(This used to be commit e99cedfb0cabe3863797c8bd4594ee0826022d2e)
|
|
ensure that global memory is freed when unloading pam_winbind.so (needs more testing on non-linux platforms)
(This used to be commit 1e0b79e591d70352a96e0a0487d8f394dc7b36ba)
|
|
Sync with trunk as off r13315
(This used to be commit 17e63ac4ed8325c0d44fe62b2442449f3298559f)
|
|
* \PIPE\unixinfo
* winbindd's {group,alias}membership new functions
* winbindd's lookupsids() functionality
* swat (trunk changes to be reverted as per discussion with Deryck)
(This used to be commit 939c3cb5d78e3a2236209b296aa8aba8bdce32d3)
|
|
might be
a big relief on messages.tdb contention as ignoring processes with >1000
messages in printing/notify.c should work correctly now.
Jeremy, Jerry told me to ask you about printer scalability torture tests, this
might be a reason why you implemented the message_send_pid_with_timeout
using the signal (shudder) in the first place. :-)
While looking at that... Wouldn't it be better to not use the signal but have
an overall timeout for print_notify_send_messages using GetTimeOfDay & friends
and not use the alarm signal deep inside tdb.c?
Volker
(This used to be commit b5e82bb512d3425839e061e78f4d6fe0bd05b708)
|