Fault Tolerance
Fault tolerance is an important feature to consider when securing a server, especially a mission-critical one. A system with well-established fault tolerance can survive many kinds of hardware failures, including the failure of hard drives, network interface cards, controllers, and motherboards. Fault tolerance means that a system keeps running when one or more of its components fail: if a component goes bad, a redundant one is available to take its place. This section covers the most common kinds of fault tolerance and how they work.
Mirroring/RAID
Mirroring and RAID (Redundant Array of Inexpensive Disks) are ways to make the hard drives on a server fault tolerant. If one drive fails, the others hold enough duplicate information to keep the server going. Mirroring, also known as RAID 1, is the most basic implementation of RAID. It simply maintains an identical copy of one disk on another. If one disk fails, the mirror copy is available to take over. Other RAID levels, such as RAID 5, instead distribute the data, together with parity information, across three or more disks, so that the contents of a failed disk can be rebuilt from the disks that remain. Reads can be faster than from a single disk because the information is spread across multiple disks that can be accessed at the same time.
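The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, assuming in-memory lists stand in for disks and XOR parity stands in for striping with parity (the approach used by RAID levels such as RAID 5). The names `mirror_write` and `xor_blocks` are hypothetical and do not come from any real RAID tool.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def mirror_write(disks, block):
    """Mirroring: write the same block to every disk in the mirror set."""
    for disk in disks:
        disk.append(block)

# Mirroring: both "disks" end up with identical contents,
# so either one can serve the data if the other fails.
disk_a, disk_b = [], []
mirror_write([disk_a, disk_b], b"DATA")

# Striping with parity: each data block lives on a different disk,
# and one extra block holds the XOR of all the data blocks.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second block, then rebuild
# its contents from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

Mirroring spends a whole extra disk on the copy, while the parity scheme spends one disk's worth of space to protect several data disks, at the cost of extra computation when rebuilding.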
Hardware Mirroring/RAID
Hardware mirroring or RAID is generally more stable and reliable than software mirroring or RAID. It uses a special hard drive controller, called a RAID controller, that has its own BIOS settings. The hard drives are configured through the controller, which then passes hard drive information and connectivity along to the operating system. If the hard drives are configured as one logical disk, then this is all the operating system sees. Even though there may be ten separate physical disks, the controller organizes them all as one disk and passes only that view along to the operating system.
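As a rough illustration of how several physical disks can appear as one logical disk, the sketch below maps a logical block number to a physical location. It assumes simple round-robin striping with no redundancy; real controllers use their own layouts, and the function name `locate` is hypothetical.

```python
def locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes blocks are dealt out round-robin across the disks.
    """
    return logical_block % num_disks, logical_block // num_disks

# The operating system only asks for "block 13" of the one logical disk.
# The controller works out which physical disk actually holds it.
disk_index, offset = locate(13, num_disks=10)
print(disk_index, offset)
```

The operating system never sees the disk index or offset; it addresses a single logical disk, and the translation stays inside the controller.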
Software Mirroring/RAID
Software mirroring or RAID is cheaper because it doesn't require a RAID controller. It is called "software mirroring" because the operating system is responsible for configuring and managing the RAID or mirrored configuration of the hard disks. However, software mirroring is less reliable. Because the operating system does the mirroring, it must first boot without fault tolerance, so it must be stored on a single disk, and that disk is usually not fault tolerant. There are newer versions of software mirroring, such as the "dynamic disk mirroring" that comes with Windows 2000, that are more reliable; however, hardware mirroring and RAID are still generally the more reliable choice.
Clustering
Clustering is similar to mirroring/RAID, but it applies fault tolerance to the entire computer. A special clustering card is placed in each of two or more servers to link them. If one server fails, another identical server is still up and running to take its place. The simplest form is failover, in which the system switches over to the backup server when the primary fails.
Clustering is an expensive solution: it requires special hardware and multiple computers, and the operating system must support it. Various UNIX server systems and Windows 2000 Advanced Server are capable of clustering.
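The failover behavior can be sketched as follows. This is a minimal in-process illustration, not how any particular clustering product works: the `Server` and `FailoverPair` classes are hypothetical, and a raised `ConnectionError` stands in for a detected server failure.

```python
class Server:
    """A stand-in for one machine in the cluster."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverPair:
    """Send requests to the active server; switch to the standby on failure."""

    def __init__(self, primary, backup):
        self.active = primary
        self.standby = backup

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # The active server failed: swap roles and retry once on the backup.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


primary, backup = Server("primary"), Server("backup")
pair = FailoverPair(primary, backup)
print(pair.handle("request 1"))   # served by the primary
primary.healthy = False           # simulate a hardware failure
print(pair.handle("request 2"))   # served by the backup after failover
```

In this sketch the client only ever talks to the pair, so the switch to the backup is invisible to it. If the backup is also down, the error is passed on to the caller.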
Load Balancing
Load balancing provides fault tolerance, but it serves a second purpose as well. As its name implies, load balancing distributes requests across multiple servers so that they all share the load. Load balancing is most common in environments with multiple web servers. A load balancer sits in front of the web servers and directs traffic to whichever web server is least busy. All web servers are configured the same, so any of them can handle a given request.
Load balancing allows for a very scalable and fault-tolerant environment. If traffic increases, an administrator simply adds more web servers and configures the load balancer to include them. If a web server fails, the others are still available to do the job.
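A "least busy" load balancer of the kind described above might be sketched like this. It is a simplified model, assuming "busy" means the number of requests currently in progress; the `LoadBalancer` class and the server names are hypothetical.

```python
class LoadBalancer:
    """Route each request to the server with the fewest requests in progress."""

    def __init__(self, servers):
        # Map each server name to its count of requests in progress.
        self.in_progress = {name: 0 for name in servers}

    def add_server(self, name):
        """Scale out: start sending traffic to a new server."""
        self.in_progress[name] = 0

    def remove_server(self, name):
        """Stop sending traffic to a failed server."""
        self.in_progress.pop(name, None)

    def route(self):
        """Pick the least busy server and count the new request against it."""
        if not self.in_progress:
            raise RuntimeError("no web servers available")
        name = min(self.in_progress, key=self.in_progress.get)
        self.in_progress[name] += 1
        return name

    def finish(self, name):
        """Record that a request on the given server has completed."""
        if self.in_progress.get(name, 0) > 0:
            self.in_progress[name] -= 1


balancer = LoadBalancer(["web1", "web2"])
first = balancer.route()          # both idle, so the first server is chosen
second = balancer.route()         # web1 is now busier, so web2 is chosen
balancer.remove_server("web1")    # web1 fails; only web2 remains
third = balancer.route()
balancer.add_server("web3")       # traffic grows; an idle server is added
fourth = balancer.route()
print(first, second, third, fourth)
```

Both benefits appear in the same mechanism: removing a failed server from the pool gives fault tolerance, and adding a server to the pool gives scalability.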
