From d4b35b895cdf157e49609b59ec89ab648dafb524 Mon Sep 17 00:00:00 2001
From: John Terpstra <jht@samba.org>
Date: Wed, 13 Apr 2005 04:04:36 +0000
Subject: More updates. (This used to be commit
 20f8bde1d0a2b2e42efedcdac21778fe34c0ab79)

---
 .../TOSHARG-HighAvailability.xml                   | 414 +++++++++++++++++++++
 1 file changed, 414 insertions(+)
 create mode 100644 docs/Samba-HOWTO-Collection/TOSHARG-HighAvailability.xml

(limited to 'docs/Samba-HOWTO-Collection/TOSHARG-HighAvailability.xml')
diff --git a/docs/Samba-HOWTO-Collection/TOSHARG-HighAvailability.xml b/docs/Samba-HOWTO-Collection/TOSHARG-HighAvailability.xml
new file mode 100644
index 0000000000..385646d91f
--- /dev/null
+++ b/docs/Samba-HOWTO-Collection/TOSHARG-HighAvailability.xml
@@ -0,0 +1,414 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
+<chapter id="SambaHA">
+<chapterinfo>
+	&author.jht;
+	&author.jeremy;
+</chapterinfo>
+
+<title>High Availability</title>
+
+<sect1>
+<title>Features and Benefits</title>
+
+<para>
+Network administrators are often concerned about the availability of file and print
+services. Network users are inclined toward intolerance of the services they depend
+on to perform vital task responsibilities.
+</para>
+
+<para>
+A sign in a computer room served to remind staff of their responsibilities. It read:
+</para>
+
+<blockquote>
+<para>
+All humans fail, in both great and small ways we fail continually. Machines fail too.
+Computers are machines that are managed by humans, the fallout from failure
+can be spectacular. Your responsibility is to deal with failure, to anticipate it
+and to eliminate it as far as is humanly and economically wise to achieve.
+Are your actions part of the problem or part of the solution?
+</para>
+</blockquote>
+
+<para>
+If we are to deal with failure in a planned and productive manner, then first we must
+understand the problem. That is the purpose of this chapter.
+</para>
+
+<para>
+Parenthetically, in the following discussion there are seeds of information on how to
+provision a network infrastructure against failure. Our purpose here is not to provide
+a lengthy dissertation on the subject of high availability. Additionally, we have made
+a conscious decision to not provide detailed working examples of high availability
+solutions; instead we present an overview of the issues in the hope that someone will
+rise to the challenge of providing a detailed document that is focused purely on
+presentation of the current state of knowledge and practice in high availability as it
+applies to the deployment of Samba and other CIFS/SMB technologies.
+</para>
+
+</sect1>
+
+<sect1>
+<title>Technical Discussion</title>
+
+<para>
+The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
+conference that was held at Goettingen, Germany, in April 2003. Material has been added
+from other sources, but it was Jeremy who inspired the structure that follows.
+</para>
+
+	<sect2>
+	<title>The Ultimate Goal</title>
+
+	<para>
+	All clustering technologies aim to achieve one or more of the following:
+	</para>
+
+	<itemizedlist>
+		<listitem><para>Obtain the maximum affordable computational power.</para></listitem>
+		<listitem><para>Obtain faster program execution.</para></listitem>
+		<listitem><para>Deliver unstoppable services.</para></listitem>
+		<listitem><para>Avert points of failure.</para></listitem>
+		<listitem><para>Exact most effective utilization of resources.</para></listitem>
+	</itemizedlist>
+
+	<para>
+	A clustered file server ideally has the following properties:
+	</para>
+
+	<itemizedlist>
+		<listitem><para>All clients can connect transparently to any server.</para></listitem>
+		<listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
+		<listitem><para>All servers server out the same set of files.</para></listitem>
+		<listitem><para>All file changes are immediately seen on all servers.</para>
+			<itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
+		<listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
+	</itemizedlist>
+
+	</sect2>
+
+	<sect2>
+	<title>Why Is This So Hard?</title>
+
+	<para>
+	In short, the problem is one of <emphasis>state</emphasis>.
+	</para>
+
+	<itemizedlist>
+		<listitem>
+			<para>
+			All TCP/IP connections are dependent on state information.
+			</para>
+			<para>
+			The TCP connection involves a packet sequence number. This
+			sequence number would need to be dynamically updated on all
+			machines in the cluster to effect seamless TCP fail-over.
+			</para>
+		</listitem>
+		<listitem>
+			<para>
+			CIFS/SMB (the Windows networking protocols) uses TCP connections.
+			</para>
+			<para>
+			This means that from a basic design perspective, fail-over is not
+			seriously considered.
+			<itemizedlist>
+				<listitem><para>
+				All current SMB clusters are fail-over solutions
+				&smbmdash; they rely on the clients to reconnect. They provide server
+				fail-over, but clients can lose information due to a server failure.
+				</para></listitem>
+			</itemizedlist>
+			</para>
+		</listitem>
+		<listitem>
+			<para>
+			Servers keep state information about client connections.
+			<itemizedlist>
+				<listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
+				<listitem><para>Every file open must be compared with other file opens
+						to check share modes.</para></listitem>
+			</itemizedlist>
+			</para>
+		</listitem>
+	</itemizedlist>
+
+		<sect3>
+		<title>The Front-End Challenge</title>
+
+		<para>
+		To make it possible for a cluster of file servers to appear as a single server that has one
+		name and one IP address, the incoming TCP data streams from clients must be processed by the
+		front end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
+		layer level and then feed the SMB packet to different servers in the cluster.
+		</para>
+
+		<para>
+		One could split all IPC$ connections and RPC calls to one server to handle printing and user
+		lookup requirements. RPC Printing handles are shared between different IPC4 sessions &smbmdash; it is
+		hard to split this across clustered servers!
+		</para>
+
+		<para>
+		Conceptually speaking, all other servers would then provide only file services. This is a simpler
+		problem to concentrate on.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>De-multiplexing SMB Requests</title>
+
+		<para>
+		De-multiplexing of SMB requests requires knowledge of SMB state information,
+		all of which must be held by the front-end <emphasis>virtual</emphasis> server.
+		This is a perplexing and complicated problem to solve.
+		</para>
+
+		<para>
+		Windows XP and later have changed semantics so state information (vuid, tid, fid)
+		must match for a successful operation. This makes things simpler than before and is a
+		positive step forward.
+		</para>
+
+		<para>
+		SMB requests are sent by vuid to their associated server. No code exists today to
+		affect this solution. This problem is conceptually similar to the problem of
+		correctly handling requests from multiple requests from Windows 2000
+		Terminal Server in Samba.
+		</para>
+
+		<para>
+		One possibility is to start by exposing the server pool to clients directly.
+		This could eliminate the de-multiplexing step.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>The Distributed File System Challenge</title>
+
+		<para>
+<indexterm><primary>Distributed File Systems</primary></indexterm>
+		There exists many distributed file systems for UNIX and Linux.
+		</para>
+
+		<para>
+		Many could be adopted to backend our cluster, so long as awareness of SMB
+		semantics is kept in mind (share modes, locking and oplock issues in particular).
+		Common free distributed file systems include:
+<indexterm><primary>NFS</primary></indexterm>
+<indexterm><primary>AFS</primary></indexterm>
+<indexterm><primary>OpenGFS</primary></indexterm>
+<indexterm><primary>Lustre</primary></indexterm>
+		</para>
+
+		<itemizedlist>
+			<listitem><para>NFS</para></listitem>
+			<listitem><para>AFS</para></listitem>
+			<listitem><para>OpenGFS</para></listitem>
+			<listitem><para>Lustre</para></listitem>
+		</itemizedlist>
+
+		<para>
+		The server pool (cluster) can use any distributed file system backend if all SMB
+		semantics are performed within this pool.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>Restrictive Constraints on Distributed File Systems</title>
+
+		<para>
+		Where a clustered server provides purely SMB services, oplock handling
+		may be done within the server pool without imposing a need for this to
+		be passed to the backend file system pool.
+		</para>
+
+		<para>
+		On the other hand, where the server pool also provides NFS or other file services,
+		it will be essential that the implementation be oplock aware so it can
+		interoperate with SMB services. This is a significant challenge today. A failure
+		to provide this will result in a significant loss of performance that will be
+		sorely noted by users of Microsoft Windows clients.
+		</para>
+
+		<para>
+		Last, all state information must be shared across the server pool.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>Server Pool Communications</title>
+
+		<para>
+		Most backend file systems support POSIX file semantics. This makes it difficult
+		to push SMB semantics back into the file system. POSIX locks have different properties
+		and semantics from SMB locks.
+		</para>
+
+		<para>
+		All <command>smbd</command> processes in the server pool must of necessity communicate
+		very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
+		uses is not suitable for use across a network. Clustered <command>smbd</command>'s must use something else.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>Server Pool Communications Demands</title>
+
+		<para>
+		High speed inter-server communications in the server pool is a design prerequisite
+		for a fully functional system. Possibilities for this include:
+		</para>
+
+		<itemizedlist>
+			<listitem><para>
+			Proprietary shared memory bus (example: Myrinet or SCI [Scalable Coherent Interface]).
+			These are high cost items.
+			</para></listitem>
+		
+			<listitem><para>
+			Gigabit ethernet (now quite affordable).
+			</para></listitem>
+		
+			<listitem><para>
+			Raw ethernet framing (to bypass TCP and UDP overheads).
+			</para></listitem>
+		</itemizedlist>
+
+		<para>
+		We have yet to identify metrics for  performance demands to enable this to happen
+		effectively.
+		</para>
+
+		</sect3>
+
+		<sect3>
+		<title>Required Modifications to Samba</title>
+
+		<para>
+		Samba needs to be significantly modified to work with a high-speed server inter-connect
+		system to permit transparent fail-over clustering.
+		</para>
+
+		<para>
+		Particular functions inside Samba that will be affected include:
+		</para>
+
+		<itemizedlist>
+			<listitem><para>
+			The locking database, oplock notifications,
+			and the share mode database.
+			</para></listitem>
+
+			<listitem><para>
+			Failure semantics need to be defined. Samba behaves the same way as Windows.
+			When oplock messages fail, a file open request is allowed, but this is 
+			potentially dangerous in a clustered environment. So how should inter-server
+			pool failure semantics function and how should this be implemented?
+			</para></listitem>
+
+			<listitem><para>
+			Should this be implemented using a point-to-point lock manager, or can this
+			be done using multicast techniques?
+			</para></listitem>
+
+		</itemizedlist>
+
+		</sect3>
+	</sect2>
+
+	<sect2>
+	<title>A Simple Solution</title>
+
+	<para>
+	Allowing fail-over servers to handle different functions within the exported file system
+	removes the problem of requiring a distributed locking protocol.
+	</para>
+
+	<para>
+	If only one server is active in a pair, the need for high speed server interconnect is avoided.
+	This allows the use of existing high availability solutions, instead of inventing a new one.
+	This simpler solution comes at a price &smbmdash; the cost of which is the need to manage a more
+	complex file name space. Since there is now not a single file system, administrators
+	must remember where all services are located &smbmdash; a complexity not easily dealt with.
+	</para>
+
+	<para>
+	The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
+	servers. Backend file space integrity is the responsibility of the administrator.
+	</para>
+
+	</sect2>
+
+	<sect2>
+	<title>High Availability Server Products</title>
+
+	<para>
+	Fail-over servers must communicate in order to handle resource fail-over. This is essential
+	for high availability services. The use of a dedicated heartbeat is a common technique to
+	introduce some intelligence into the fail-over process. This is often done over a dedicated
+	link (LAN or serial).
+	</para>
+
+	<para>
+<indexterm><primary>SCSI</primary></indexterm>
+	Many fail-over solutions (like Red Hat Cluster Manager, as well as Microsoft Wolfpack)
+	can use a shared SCSI of Fiber Channel disk storage array for fail-over communication.
+	Information regarding Red Hat high availability solutions for Samba may be obtained from:
+	<ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com.</ulink>
+	</para>
+
+	<para>
+	The Linux High Availability project is a resource worthy of consultation if your desire is
+	to build a highly available Samba file server solution. Please consult the home page at
+	<ulink url="http://www.linux-ha.org/">www.linux-ha.org/.</ulink>
+	</para>
+
+	<para>
+	Front-end server complexity remains a challenge for high availability as it needs to deal
+	gracefully with backend failures, while at the same time it needs to provide continuity of service
+	to all network clients.
+	</para>
+	
+	</sect2>
+
+	<sect2>
+	<title>MS-DFS: The Poor Man's Cluster</title>
+
+	<para>
+<indexterm><primary>MS-DFS</primary></indexterm>
+<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
+	MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
+	complexity back to the network client, something already included by Microsoft.
+	MS-DFS creates the illusion of a simple, continuous file system name space, that even
+	works at the file level.
+	</para>
+
+	<para>
+	Above all, at the cost of complexity of management, a distributed (pseudo-cluster) can
+	be created using existing Samba functionality.
+	</para>
+
+	</sect2>
+
+	<sect2>
+	<title>Conclusions</title>
+
+	<itemizedlist>
+		<listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
+		<listitem><para>Client fail-over is the best we can do today.</para></listitem>
+		<listitem><para>Much more work is needed before a practical and manageable high
+				availability transparent cluster solution will be possible.</para></listitem>
+		<listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
+	</itemizedlist>
+
+	</sect2>
+
+</sect1>
+</chapter>
-- 
cgit