diff options
Diffstat (limited to 'docs/howto/HighAvailability.xml')
-rw-r--r-- | docs/howto/HighAvailability.xml | 412 |
1 files changed, 412 insertions, 0 deletions
diff --git a/docs/howto/HighAvailability.xml b/docs/howto/HighAvailability.xml new file mode 100644 index 0000000000..9af1abe9cd --- /dev/null +++ b/docs/howto/HighAvailability.xml @@ -0,0 +1,412 @@ +<chapter id="SambaHA"> +<chapterinfo> + &author.jht; + &author.jeremy; +</chapterinfo> + +<title>High Availability</title> + +<sect1> +<title>Features and Benefits</title> + +<para> +Network administrators are often concerned about the availability of file and print +services. Network users are inclined toward intolerance of the services they depend +on to perform vital task responsibilities. +</para> + +<para> +A sign in a computer room served to remind staff of their responsibilities. It read: +</para> + +<blockquote> +<para> +All humans fail, in both great and small ways we fail continually. Machines fail too. +Computers are machines that are managed by humans, the fallout from failure +can be spectacular. Your responsibility is to deal with failure, to anticipate it +and to eliminate it as far as is humanly and economically wise to achieve. +Are your actions part of the problem or part of the solution? +</para> +</blockquote> + +<para> +If we are to deal with failure in a planned and productive manner, then first we must +understand the problem. That is the purpose of this chapter. +</para> + +<para> +Parenthetically, in the following discussion there are seeds of information on how to +provision a network infrastructure against failure. Our purpose here is not to provide +a lengthy dissertation on the subject of high availability. Additionally, we have made +a conscious decision to not provide detailed working examples of high availability +solutions; instead we present an overview of the issues in the hope that someone will +rise to the challenge of providing a detailed document that is focused purely on +presentation of the current state of knowledge and practice in high availability as it +applies to the deployment of Samba and other CIFS/SMB technologies. +</para> + +</sect1> + +<sect1> +<title>Technical Discussion</title> + +<para> +The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003 +conference that was held at Goettingen, Germany, in April 2003. Material has been added +from other sources, but it was Jeremy who inspired the structure that follows. +</para> + + <sect2> + <title>The Ultimate Goal</title> + + <para> + All clustering technologies aim to achieve one or more of the following: + </para> + + <itemizedlist> + <listitem><para>Obtain the maximum affordable computational power.</para></listitem> + <listitem><para>Obtain faster program execution.</para></listitem> + <listitem><para>Deliver unstoppable services.</para></listitem> + <listitem><para>Avert points of failure.</para></listitem> + <listitem><para>Exact most effective utilization of resources.</para></listitem> + </itemizedlist> + + <para> + A clustered file server ideally has the following properties: + </para> + + <itemizedlist> + <listitem><para>All clients can connect transparently to any server.</para></listitem> + <listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem> + <listitem><para>All servers server out the same set of files.</para></listitem> + <listitem><para>All file changes are immediately seen on all servers.</para> + <itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem> + <listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem> + </itemizedlist> + + </sect2> + + <sect2> + <title>Why Is This So Hard?</title> + + <para> + In short, the problem is one of <emphasis>state</emphasis>. + </para> + + <itemizedlist> + <listitem> + <para> + All TCP/IP connections are dependent on state information. + </para> + <para> + The TCP connection involves a packet sequence number. This + sequence number would need to be dynamically updated on all + machines in the cluster to effect seamless TCP fail-over. + </para> + </listitem> + <listitem> + <para> + CIFS/SMB (the Windows networking protocols) uses TCP connections. + </para> + <para> + This means that from a basic design perspective, fail-over is not + seriously considered. + <itemizedlist> + <listitem><para> + All current SMB clusters are fail-over solutions + &smbmdash; they rely on the clients to reconnect. They provide server + fail-over, but clients can lose information due to a server failure. + </para></listitem> + </itemizedlist> + </para> + </listitem> + <listitem> + <para> + Servers keep state information about client connections. + <itemizedlist> + <listitem><para>CIFS/SMB involves a lot of state.</para></listitem> + <listitem><para>Every file open must be compared with other file opens + to check share modes.</para></listitem> + </itemizedlist> + </para> + </listitem> + </itemizedlist> + + <sect3> + <title>The Front-End Challenge</title> + + <para> + To make it possible for a cluster of file servers to appear as a single server that has one + name and one IP address, the incoming TCP data streams from clients must be processed by the + front end virtual server. This server must de-multiplex the incoming packets at the SMB protocol + layer level and then feed the SMB packet to different servers in the cluster. + </para> + + <para> + One could split all IPC$ connections and RPC calls to one server to handle printing and user + lookup requirements. RPC Printing handles are shared between different IPC4 sessions &smbmdash; it is + hard to split this across clustered servers! + </para> + + <para> + Conceptually speaking, all other servers would then provide only file services. This is a simpler + problem to concentrate on. + </para> + + </sect3> + + <sect3> + <title>De-multiplexing SMB Requests</title> + + <para> + De-multiplexing of SMB requests requires knowledge of SMB state information, + all of which must be held by the front-end <emphasis>virtual</emphasis> server. + This is a perplexing and complicated problem to solve. + </para> + + <para> + Windows XP and later have changed semantics so state information (vuid, tid, fid) + must match for a successful operation. This makes things simpler than before and is a + positive step forward. + </para> + + <para> + SMB requests are sent by vuid to their associated server. No code exists today to + affect this solution. This problem is conceptually similar to the problem of + correctly handling requests from multiple requests from Windows 2000 + Terminal Server in Samba. + </para> + + <para> + One possibility is to start by exposing the server pool to clients directly. + This could eliminate the de-multiplexing step. + </para> + + </sect3> + + <sect3> + <title>The Distributed File System Challenge</title> + + <para> +<indexterm><primary>Distributed File Systems</primary></indexterm> + There exists many distributed file systems for UNIX and Linux. + </para> + + <para> + Many could be adopted to backend our cluster, so long as awareness of SMB + semantics is kept in mind (share modes, locking and oplock issues in particular). + Common free distributed file systems include: +<indexterm><primary>NFS</primary></indexterm> +<indexterm><primary>AFS</primary></indexterm> +<indexterm><primary>OpenGFS</primary></indexterm> +<indexterm><primary>Lustre</primary></indexterm> + </para> + + <itemizedlist> + <listitem><para>NFS</para></listitem> + <listitem><para>AFS</para></listitem> + <listitem><para>OpenGFS</para></listitem> + <listitem><para>Lustre</para></listitem> + </itemizedlist> + + <para> + The server pool (cluster) can use any distributed file system backend if all SMB + semantics are performed within this pool. + </para> + + </sect3> + + <sect3> + <title>Restrictive Constraints on Distributed File Systems</title> + + <para> + Where a clustered server provides purely SMB services, oplock handling + may be done within the server pool without imposing a need for this to + be passed to the backend file system pool. + </para> + + <para> + On the other hand, where the server pool also provides NFS or other file services, + it will be essential that the implementation be oplock aware so it can + interoperate with SMB services. This is a significant challenge today. A failure + to provide this will result in a significant loss of performance that will be + sorely noted by users of Microsoft Windows clients. + </para> + + <para> + Last, all state information must be shared across the server pool. + </para> + + </sect3> + + <sect3> + <title>Server Pool Communications</title> + + <para> + Most backend file systems support POSIX file semantics. This makes it difficult + to push SMB semantics back into the file system. POSIX locks have different properties + and semantics from SMB locks. + </para> + + <para> + All <command>smbd</command> processes in the server pool must of necessity communicate + very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba + uses is not suitable for use across a network. Clustered <command>smbd</command>'s must use something else. + </para> + + </sect3> + + <sect3> + <title>Server Pool Communications Demands</title> + + <para> + High speed inter-server communications in the server pool is a design prerequisite + for a fully functional system. Possibilities for this include: + </para> + + <itemizedlist> + <listitem><para> + Proprietary shared memory bus (example: Myrinet or SCI [Scalable Coherent Interface]). + These are high cost items. + </para></listitem> + + <listitem><para> + Gigabit ethernet (now quite affordable). + </para></listitem> + + <listitem><para> + Raw ethernet framing (to bypass TCP and UDP overheads). + </para></listitem> + </itemizedlist> + + <para> + We have yet to identify metrics for performance demands to enable this to happen + effectively. + </para> + + </sect3> + + <sect3> + <title>Required Modifications to Samba</title> + + <para> + Samba needs to be significantly modified to work with a high-speed server inter-connect + system to permit transparent fail-over clustering. + </para> + + <para> + Particular functions inside Samba that will be affected include: + </para> + + <itemizedlist> + <listitem><para> + The locking database, oplock notifications, + and the share mode database. + </para></listitem> + + <listitem><para> + Failure semantics need to be defined. Samba behaves the same way as Windows. + When oplock messages fail, a file open request is allowed, but this is + potentially dangerous in a clustered environment. So how should inter-server + pool failure semantics function and how should this be implemented? + </para></listitem> + + <listitem><para> + Should this be implemented using a point-to-point lock manager, or can this + be done using multicast techniques? + </para></listitem> + + </itemizedlist> + + </sect3> + </sect2> + + <sect2> + <title>A Simple Solution</title> + + <para> + Allowing fail-over servers to handle different functions within the exported file system + removes the problem of requiring a distributed locking protocol. + </para> + + <para> + If only one server is active in a pair, the need for high speed server interconnect is avoided. + This allows the use of existing high availability solutions, instead of inventing a new one. + This simpler solution comes at a price &smbmdash; the cost of which is the need to manage a more + complex file name space. Since there is now not a single file system, administrators + must remember where all services are located &smbmdash; a complexity not easily dealt with. + </para> + + <para> + The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend + servers. Backend file space integrity is the responsibility of the administrator. + </para> + + </sect2> + + <sect2> + <title>High Availability Server Products</title> + + <para> + Fail-over servers must communicate in order to handle resource fail-over. This is essential + for high availability services. The use of a dedicated heartbeat is a common technique to + introduce some intelligence into the fail-over process. This is often done over a dedicated + link (LAN or serial). + </para> + + <para> +<indexterm><primary>SCSI</primary></indexterm> + Many fail-over solutions (like Red Hat Cluster Manager, as well as Microsoft Wolfpack) + can use a shared SCSI of Fiber Channel disk storage array for fail-over communication. + Information regarding Red Hat high availability solutions for Samba may be obtained from: + <ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com.</ulink> + </para> + + <para> + The Linux High Availability project is a resource worthy of consultation if your desire is + to build a highly available Samba file server solution. Please consult the home page at + <ulink url="http://www.linux-ha.org/">www.linux-ha.org/.</ulink> + </para> + + <para> + Front-end server complexity remains a challenge for high availability as it needs to deal + gracefully with backend failures, while at the same time it needs to provide continuity of service + to all network clients. + </para> + + </sect2> + + <sect2> + <title>MS-DFS: The Poor Man's Cluster</title> + + <para> +<indexterm><primary>MS-DFS</primary></indexterm> +<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm> + MS-DFS links can be used to redirect clients to disparate backend servers. This pushes + complexity back to the network client, something already included by Microsoft. + MS-DFS creates the illusion of a simple, continuous file system name space, that even + works at the file level. + </para> + + <para> + Above all, at the cost of complexity of management, a distributed (pseudo-cluster) can + be created using existing Samba functionality. + </para> + + </sect2> + + <sect2> + <title>Conclusions</title> + + <itemizedlist> + <listitem><para>Transparent SMB clustering is hard to do!</para></listitem> + <listitem><para>Client fail-over is the best we can do today.</para></listitem> + <listitem><para>Much more work is needed before a practical and manageable high + availability transparent cluster solution will be possible.</para></listitem> + <listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem> + </itemizedlist> + + </sect2> + +</sect1> +</chapter> |