Dan Shearer
November 1997
&author.jelmer;
Samba Architecture
Introduction
This document gives a general overview of how Samba works
internally. The Samba Team has tried to come up with a model which is
the best possible compromise between elegance, portability, security
and the constraints imposed by the very messy SMB and CIFS
protocol.
It also tries to answer some of the frequently asked questions such as:
Is Samba secure when running on Unix? The xyz platform?
What about the root privileges issue?
Pros and cons of multithreading in various parts of Samba
Why not have a separate process for name resolution, WINS, and browsing?
Multithreading and Samba
People sometimes tout threads as a uniformly good thing. They are very
nice in their place but are quite inappropriate for smbd. nmbd is
another matter, and multi-threading it would be very nice.
The short version is that smbd is not multithreaded, and alternative
servers that take this approach under Unix (such as Syntax, at the
time of writing) suffer tremendous performance penalties and are less
robust. nmbd is not threaded either, but this is because it is not
possible to do it while keeping code consistent and portable across 35
or more platforms. (This drawback also applies to threading smbd.)
The longer version is that there are very good reasons for not making
smbd multi-threaded. Multi-threading would actually make Samba much
slower, less scalable, less portable and much less robust. The fact
that we use a separate process for each connection is one of Samba's
biggest advantages.
Threading smbd
A few problems that would arise from a threaded smbd are:
It is not just a matter of creating threads instead of processes: you
must also check every variable to see whether it has to become thread
specific (currently they are all global).
if one thread dies (e.g. from a seg fault) then all threads die. We can
immediately throw robustness out the window.
many of the system calls we make are blocking. Non-blocking
equivalents of many calls are either not available or are awkward (and
slow) to use. So while we block in one thread, all clients are
waiting. If one share is a slow NFS filesystem and the others are
fast, we would end up slowing all clients to the speed of NFS.
you can't run as a different uid in different threads. This means
we would have to switch uid/gid on _every_ SMB packet. It would be
horrendously slow.
the per process file descriptor limit would mean that we could only
support a limited number of clients.
we couldn't use the system locking calls as the locking context of
fcntl() is a process, not a thread.
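The robustness point above can be illustrated with a minimal sketch of the process-per-connection model. This is not smbd's code: the names conn_handler, handle_ok, handle_crash and serve_in_child are hypothetical, and the crash is simulated with raise(SIGSEGV). It only shows that a fatal fault in one connection's process is seen by the parent as an exit status, not as its own death.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection handler type; not a Samba API. */
typedef void (*conn_handler)(void);

/* A well-behaved client session: serve requests, then return. */
void handle_ok(void)
{
}

/* A session that hits a fatal bug, simulated here with SIGSEGV. */
void handle_crash(void)
{
    raise(SIGSEGV);
}

/*
 * Run one "connection" in its own process and report how it ended:
 * 0 for a clean exit, the signal number if the child was killed by a
 * signal, -1 on error.
 */
int serve_in_child(conn_handler handler)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: a real server could change uid/gid once here, for the
         * whole life of the connection, instead of on every packet. */
        handler();
        _exit(0);
    }

    /* Parent: only observes the child's fate; its own state is intact. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling serve_in_child(handle_crash) returns the signal number to the caller, which is still running and free to accept the next connection. In a threaded design the same fault would take every client down with it.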
Threading nmbd
This would be ideal, but gets sunk by portability requirements.
Andrew tried to write a test threads library for nmbd that used only
ANSI C constructs (using setjmp and longjmp). Unfortunately some OSes
defeat this by restricting longjmp to calling addresses that are
shallower than the current address on the stack (apparently AIX does
this). This makes a truly portable threads library impossible. So to
support all our current platforms we would have to code nmbd both with
and without threads, and as the real aim of threads is to make the
code clearer we would not have gained anything. (It is a myth that
threads make things faster. Threading is like recursion: it can make
things clear, but the same thing can always be done faster by some
other method.)
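The longjmp restriction can be made concrete with a small sketch. It is not Andrew's test library. The names scheduler_ctx, worker_step and run_steps are hypothetical. It shows the one direction that C guarantees: jumping back to a function invocation that is still active. Resuming a worker where it left off would mean jumping into a frame that has already been abandoned, which the C standard leaves undefined.

```c
#include <setjmp.h>

/* Hypothetical names; a sketch of the portable direction only. */
jmp_buf scheduler_ctx;   /* saved context of the "scheduler" frame */
int steps_run;           /* how many worker steps have executed */

/* One unit of work. It gives control back by unwinding to the
 * scheduler, whose frame is still live, so this is well defined. */
void worker_step(void)
{
    steps_run++;
    longjmp(scheduler_ctx, 1);
}

/* Run worker steps until 'limit' of them have executed. */
int run_steps(int limit)
{
    steps_run = 0;

    /* Every longjmp from worker_step() lands back here. */
    setjmp(scheduler_ctx);

    if (steps_run < limit)
        worker_step();   /* does not return normally */

    return steps_run;
}
```

Each worker step has to run to a point where it can be abandoned, because its stack frame is gone once control returns to the scheduler. A thread library needs the opposite, a way back into the suspended frame, and that is the part that cannot be done portably.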
Chris tried to spec out a general design that would abstract threading
vs separate processes (vs other methods?) and make them accessible
through some general API. This doesn't work because of the data
sharing requirements of the protocol (packets in the future depending
on packets now, etc.) At least, the code would work but would be very
clumsy, and besides the fork() type model would never work on Unix. (Is there an OS that it would work on, for nmbd?)
A fork() is cheap, but not nearly cheap enough to do on every UDP
packet that arrives. Having a pool of processes is possible but is
nasty to program cleanly due to the enormous amount of shared data (in
complex structures) between the processes. We can't rely on each
platform having a shared memory system.
nmbd Design
Originally Andrew used recursion to simulate a multi-threaded
environment, which used the stack heavily and made for really
confusing debugging sessions. Luke Leighton rewrote it to use a
queuing system that keeps state information on each packet. The
first version used a single structure which was used by all the
pending states. Because this structure was initialised by adding
arguments as the functionality developed, it got
pretty messy. So, it was replaced with a higher-order function
and a pointer to a user-defined memory block. This suddenly
made things much simpler: large numbers of functions could be
made static, and modularised. This is the same principle as used
in NT's kernel, and achieves the same effect as threads, but in
a single process.
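The "higher-order function plus pointer to a user-defined memory block" idea can be sketched as follows. This is an illustration under assumptions, not nmbd's queue: response_fn, struct pending, pending_queue, queue_request, dispatch_reply, struct name_query_state and on_name_reply are all hypothetical names, and the reply is reduced to a single int.

```c
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical callback type: invoked when the awaited reply arrives. */
typedef void (*response_fn)(int reply, void *userdata);

struct pending {
    int id;            /* transaction id we are waiting on */
    response_fn fn;    /* what to do when the reply arrives */
    void *userdata;    /* caller-owned state, opaque to the queue */
    int in_use;
};

struct pending pending_queue[MAX_PENDING];

/* Register interest in a reply. Returns 0 on success, -1 if full. */
int queue_request(int id, response_fn fn, void *userdata)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending_queue[i].in_use) {
            pending_queue[i] = (struct pending){ id, fn, userdata, 1 };
            return 0;
        }
    }
    return -1;
}

/* Called from the single main loop for each incoming reply.
 * Returns 1 if a pending entry matched, 0 otherwise. */
int dispatch_reply(int id, int reply)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending_queue[i].in_use && pending_queue[i].id == id) {
            struct pending p = pending_queue[i];
            pending_queue[i].in_use = 0;   /* free the slot first */
            p.fn(reply, p.userdata);
            return 1;
        }
    }
    return 0;
}

/* Example of caller-defined state and its handler. */
struct name_query_state {
    int answers;
    int last_reply;
};

void on_name_reply(int reply, void *userdata)
{
    struct name_query_state *st = userdata;

    st->answers++;
    st->last_reply = reply;
}
```

A caller would register with queue_request(42, on_name_reply, &state) and go back to the main loop. When a reply with id 42 arrives, dispatch_reply(42, ...) runs the handler with that caller's own state. The queue never needs to know what the state contains, which is why each handler and its state can live in its own module.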
Then Jeremy rewrote nmbd. The packet data in nmbd isn't what's on the
wire. It's a nice format that is very amenable to processing but still
keeps the idea of a distinct packet. See "struct packet_struct" in
nameserv.h. It has all the detail but none of the on-the-wire
mess. This makes it ideal for using in disk or memory-based databases
for browsing and WINS support.
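To show the difference between the wire format and a parsed form, here is a minimal sketch that decodes the 12-byte NetBIOS name service header (layout as in RFC 1002) into a plain C structure. It does not reproduce struct packet_struct from nameserv.h. The names struct parsed_header, get_be16 and parse_header are hypothetical and only a few of the header fields are extracted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed form; the real one is struct packet_struct. */
struct parsed_header {
    uint16_t trn_id;    /* NAME_TRN_ID */
    int response;       /* R bit: 0 = request, 1 = response */
    int opcode;         /* 4-bit operation code */
    int broadcast;      /* B bit of NM_FLAGS */
    int rcode;          /* 4-bit result code */
    uint16_t qdcount;
    uint16_t ancount;
    uint16_t nscount;
    uint16_t arcount;
};

/* Read a 16-bit big-endian (network order) value. */
uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the fixed header. Returns 0 on success, -1 if too short. */
int parse_header(const uint8_t *buf, size_t len, struct parsed_header *out)
{
    uint16_t flags;

    if (len < 12)
        return -1;

    flags = get_be16(buf + 2);

    out->trn_id    = get_be16(buf);
    out->response  = (flags >> 15) & 0x1;
    out->opcode    = (flags >> 11) & 0xF;
    out->broadcast = (flags >> 4) & 0x1;
    out->rcode     = flags & 0xF;
    out->qdcount   = get_be16(buf + 4);
    out->ancount   = get_be16(buf + 6);
    out->nscount   = get_be16(buf + 8);
    out->arcount   = get_be16(buf + 10);
    return 0;
}
```

Once a packet is in a structure like this, the rest of the code can work with named fields in host byte order and never touch bit masks or network byte order again.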
Samba's subsystems
Samba's source/ directory contains quite a few subdirectories. Here is a short explanation of what each of them contains.
aparser - Obsolete
auth - The authentication subsystem, maintained by Andrew Bartlett
bin - Output directory for all the binary files
client - Contains 'plain' SMB client sources: smbclient and
some mount help utilities
groupdb - Group database and mapping code
include - All of Samba's include files
intl - Internationalization files. Not used at the moment.
lib - General C helper functions. Not SMB-specific.
libads - Library with Active Directory related functions.
libsmb - Library with SMB specific functions.
locking - Locking functions!
modules - Source files for various modules (VFS and charset).
msdfs - Microsoft DFS (MS-DFS) code
nmbd - Code for the nmbd daemon
nsswitch - Winbind source code
pam_smbpass - Source code for the PAM module for authenticating against Samba's passdb
param - smb.conf parsing code
passdb - User database (SAM) code with the various backends
po - Internationalisation code. Not used at the moment.
popt - Samba's internal copy of the popt library
printing - Printing stuff
profile - Profiling support
python - Python bindings for various libsmb functions
registry - Registry backend
rpc_client - RPC Client library for making remote procedure calls
rpc_parse - Functions for parsing RPC structures (???)
rpc_server - Functions for being an RPC server
rpcclient - Command-line client that is basically a front end to rpc_client/
sam - Code for the new (but unused) SAM
script - Various scripts
smbd - Source code for the smbd daemon
smbwrapper - Source code for library that overloads VFS function calls, for usage with LD_PRELOAD=...
stf - Testsuite system?
tdb - Source code of Samba's Trivial Database (much like gdbm)
tests - Source code for the larger tests used by configure
torture - 'Torture' utilities, used for testing Samba and other CIFS servers
ubiqx - The ubiqx library from Chris Hertel
utils - Various small utilities (pdbedit, net, etc.)
web - SWAT source code
wrepld - Source code of the WINS replication daemon