SQL Server Clusters

Windows 2000 Advanced Server and Datacenter Server both provide Cluster Services including Network Load Balancing and Fault Tolerance. SQL Server 2000 is a fully “cluster aware” application that takes full advantage of the Fault Tolerant services that are provided by Windows. However Network Load Balancing (or NLB) is really not designed for read/write services like SQL Server 2000 – Network Load Balancing is designed for services such as ISS, HTTP, FTP, etc.

Each server in a Fault Tolerant Cluster is referred to as a node. Each node in the cluster must have access to a shared SCSI bus. Although each node’s operating system is located on the individual servers, the shared applications (In this case SQL Server) are located on the SCSI bus that is accessible by all the nodes in the cluster.

To determine when a node has failed, nodes frequently send out what are called “heartbeat messages” to each other over the network. In the event that heartbeat messages are no longer detected from a specific node, the other servers in the cluster treat the node as failed and transfer the services from the failed node to another node. The actual process of detecting a failed node and transferring the services is a little more complicated, but I could write a hole other article on just that topic (and I may do that).

There are two types of nodes in a Fault Tolerant Cluster:

  • Primary Node: One per cluster
  • Backup Node: One per cluster using Advanced Server or up to three if using Datacenter Server

The Primary Node is the system that does all the “work” in the cluster. The primary also controls the drives on the SCSI bus that contain the shared application. Unlike the Primary Node, Backup Nodes are only standbys who check to see if the Primary Node is still active. Backup Nodes have physical access to the SCSI bus, but do not access the drives unless they become the Primary Node.

The configuration of one primary doing all the work with one or more standbys is known as an Active/Passive configuration. If you are running large mission critical applications that can’t afford any downtime or loss of performance, this is the configuration for you. However, to use an Active/Passive configuration on a large system is very expensive.

If you don’t like the idea of having one server doing nothing most of the time or you can’t afford all the servers you would need, you can use an Active/Active configuration. An Active/Active configuration is more like an Active/Passive and Passive/Active put together. In this configuration each server in the cluster acts like a Primary Node controlling part of the shared SCSI bus and also acts like a Backup Node. Using multiple instances of SQL Server you can then create a load balancing effect (but not true load balancing). The downside to an Active/Active configuration is that in the event that one node fails the backup node must perform the work of both nodes. This can result in degradation in performance especially if you don’t have the hardware needed to support both applications on one server.

I have just scratched the surface on clustering. Clustering is a neat topic and if you would like to learn more have a look at the following links:

http://www.microsoft.com/windows2000/techinfo/planning/server/clustersteps.asp

http://www.microsoft.com/windows2000/techinfo/howitworks/cluster/clusterarch.asp