Part III: Cluster Theory and Practice
Chapter 10: How to Build a Linux Enterprise Cluster
So far this book has focused on how to build a highly available pair of servers that can support mission-critical applications with no single point of failure. In this part of the book, we will look at how to build a cluster of servers that are all capable of running the same services to support the end users.
A cluster is more than a highly available pair of servers, because all of the nodes in the cluster can share the processing load. When one node goes down, only the users who were logged on to that node are affected, and they can simply log back on again to reconnect to a working cluster node.
Steps for Building a Linux Enterprise Cluster
To build a Linux Enterprise Cluster, you need to do several things, each of which is outlined in this chapter:
Decide which NAS server you will use.
Understand the basic concepts of the Kernel Netfilter and kernel packet routing.
Learn how to clone a Linux machine.
Decide on a naming scheme for your cluster.
Learn how to apply system configuration changes to all cluster nodes.
Build a Linux Virtual Server Network Address Translation (LVS-NAT) cluster that uses a separate physical network for the cluster.
Build an LVS Direct Routing (LVS-DR) cluster.
Install software to automatically remove failed cluster nodes.
Install software to monitor the cluster.
Learn how to monitor the performance of the cluster nodes.
Learn how to update software packages on your cluster nodes and servers using an automated tool.
Decide which method you will use to centralize user account administration.
Install a printing system that supports the cluster.
Install a highly available batch job-scheduling system.
Purchase the cluster nodes.
NAS Server
If your cluster will run legacy mission-critical applications that rely on normal Unix lock arbitration methods (discussed in detail in Chapter 16), the Network Attached Storage (NAS) server will be the most important performance bottleneck in your cluster, because all filesystem I/O operations will pass through the NAS server.
You can build your own highly available NAS server on inexpensive hardware using the techniques described in this book, but an enterprise-class cluster should be built on top of a NAS device that can commit write operations to a nonvolatile RAM (NVRAM) cache while guaranteeing the integrity of the cache even if the NAS system crashes or the power fails before the write operation has been committed to disk.
For testing purposes, you can use a Linux server as a NAS server. (See Chapter 16 for a discussion of asynchronous Network File System (NFS) operations and why you can normally use asynchronous NFS only in a test environment and not in production.)
Kernel Netfilter and Kernel Packet Routing
Before you can build a cluster load balancer, you'll need to understand how you can alter the fate of network packets as they pass through the Linux kernel. This ability allows you to build a cluster load balancer that distributes incoming requests for services across all cluster nodes. The command-line tools used to alter the fate of a packet are iptables, route, and the ip utility. You should be familiar with these tools before using them to build an enterprise-class cluster. (See Chapter 2 for more information on these tools.)
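The idea of distributing incoming requests across all cluster nodes can be sketched in user space. This is a toy illustration only: a real load balancer makes this decision inside the kernel, and the round-robin policy, the function name, and the node names used here are assumptions chosen for the example.

```python
from itertools import cycle


def round_robin_dispatch(requests, nodes):
    """Pair each incoming request with the next node in a repeating rotation."""
    if not nodes:
        raise ValueError("at least one cluster node is required")
    rotation = cycle(nodes)
    return [(request, next(rotation)) for request in requests]


if __name__ == "__main__":
    # Hypothetical request labels and node names, for illustration only.
    pairs = round_robin_dispatch(["req1", "req2", "req3"], ["clnode1", "clnode2"])
    for request, node in pairs:
        print(f"{request} -> {node}")
```

Round-robin ignores how busy each node is, so it is only the simplest way to picture the distribution of requests.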
Cloning a Linux Machine
Chapters 4 and 5 describe a method for cloning a Linux machine using the SystemImager package. It would not be practical to build a cluster of nodes that are all configured the same way without using system-cloning software.
Cluster Naming Scheme
To automate the cloning process, each node's host name should begin with a common string of characters followed by a sequential cluster-node number. For example, the first cluster-node host name could be clnode1, the second could be clnode2, and so forth.
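A naming scheme of this kind is easy to generate in a script. The helper below is a minimal sketch; the function name and its parameters are hypothetical, and only the clnode prefix comes from the example above.

```python
def cluster_node_names(prefix, count, start=1):
    """Return host names made of a common prefix plus a sequential node number."""
    if count < 0:
        raise ValueError("count must not be negative")
    return [f"{prefix}{number}" for number in range(start, start + count)]


if __name__ == "__main__":
    # Prints clnode1, clnode2, and clnode3, one per line.
    for name in cluster_node_names("clnode", 3):
        print(name)
```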
Applying System Configuration Changes to All Nodes
A cluster administrator needs to know how to automatically apply system administration changes to all cluster nodes. This can be done using the SystemImager package's updateclient command. Before the cluster goes into production, practice using the updateclient command to apply changes made on the Golden Client to all of the cluster nodes (see Chapter 5 for more information).
Building an LVS-NAT Cluster
Building a Linux Virtual Server Network Address Translation (LVS-NAT) cluster will help you understand how the Linux Virtual Server (LVS) software works. It will also help you ensure that your load balancer is in charge of allocating nodes inside the cluster for inbound connection requests. (See Chapter 11 for an introduction to load balancing and Chapter 12 for an introduction to LVS-NAT.)
Building an LVS-DR Cluster
Once you know how to build an LVS-NAT cluster, you are ready to convert it to a Linux Virtual Server Direct Routing (LVS-DR) cluster, as described in Chapter 13. An enterprise-class cluster that is based on LVS-DR is superior to an LVS-NAT cluster for mission-critical applications for several reasons:
The LVS-DR cluster is easier to administer. LVS-DR cluster nodes can be administered from outside the cluster network using telnet and ssh to make a connection to the cluster nodes. In an LVS-NAT cluster, the physical network cabling or VLAN configuration prevents you from making direct connections to the cluster nodes.
The LVS-DR cluster can send replies from the cluster nodes back to the client computers without passing the packets through an intermediate machine (the load balancer).
The LVS-DR cluster load balancer can malfunction without rendering all of the cluster nodes useless. In contrast, if the primary and backup LVS-NAT cluster load balancers both crash at the same time, the entire LVS-NAT cluster is down. In an LVS-DR cluster, if the primary and backup LVS load balancers crash at the same time, the cluster nodes can still be used as separate or distributed servers. (See Chapter 13 for the details of how to build an LVS-DR cluster.) In practice, however, this is a selling point to management and not a "feature" of an LVS-DR cluster.
The Linux Enterprise Cluster should be protected from an outside attack by a firewall. If you do not protect your cluster nodes from attack with a firewall, or if your cluster must be connected to the Internet, shell access to the cluster nodes should be physically restricted. You can physically restrict shell access to the cluster nodes by building a separate network (or VLAN) that connects the cluster nodes to an administrative machine. This separate physical network is sometimes called an administrative network.
Installing Software to Remove Failed Cluster Nodes
In this book we will use the ldirectord software package (included on the CD-ROM) to automatically remove nodes from the cluster when they fail. Chapter 15 will describe how to install and configure ldirectord.
As cluster administrator, you will also want to know how to manually remove a node from the cluster for maintenance purposes without affecting users currently logged on to the system. We'll look at how to do this in Chapter 19.
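To make the idea of automatically removing failed nodes concrete, here is a minimal sketch of a health check that keeps only the nodes that accept a TCP connection. It is not how ldirectord is implemented or configured; the function names, the port, and the timeout are assumptions made for illustration.

```python
import socket


def tcp_port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def healthy_nodes(nodes, port, check=tcp_port_is_open):
    """Return the nodes that pass the health check, preserving their order."""
    return [node for node in nodes if check(node, port)]


if __name__ == "__main__":
    # Hypothetical node names and service port.
    print(healthy_nodes(["clnode1", "clnode2"], 80))
```

Passing the check as a parameter makes it possible to try the filtering logic without a live network.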
Installing Software to Monitor the Cluster Nodes
You cannot wade through the log files on every cluster node every day. You need monitoring software capable of sending email messages, electronic pages, or text messages to the administrator when something goes wrong on one of the cluster nodes.
Many open source packages can accomplish this task, and Chapter 17 describes a method of doing this using the Simple Network Management Protocol (SNMP) and the Mon software package. SNMP and the Mon monitoring package together allow you to monitor the cluster nodes and send an alert when a threshold that you have specified is violated.
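The threshold idea can be illustrated with a few lines of code. This is a sketch of the general principle only, not of SNMP or of the Mon package; the data layout, the metric names, and the function name are assumptions.

```python
def threshold_alerts(readings, thresholds):
    """Return one alert message for each reading that exceeds its threshold.

    readings:   {node name: {metric name: measured value}}
    thresholds: {metric name: maximum acceptable value}
    """
    alerts = []
    for node, metrics in sorted(readings.items()):
        for metric, value in sorted(metrics.items()):
            limit = thresholds.get(metric)
            # Metrics without a configured threshold are ignored.
            if limit is not None and value > limit:
                alerts.append(f"{node}: {metric}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical readings and limits.
    sample = {"clnode1": {"load": 0.5}, "clnode2": {"load": 9.0}}
    for message in threshold_alerts(sample, {"load": 4.0}):
        print(message)
```

In a real deployment the returned messages would be handed to whatever delivers email, pages, or text messages to the administrator.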
Monitoring the Performance of Cluster Nodes
In addition to monitoring the cluster nodes for problems, you will also want to monitor the processing load on each node to see whether the workload is being balanced properly. The Ganglia package is an excellent tool for doing this. Chapter 18 will describe how to use Ganglia and will discuss a few of the performance metrics (such as the system load average) that Ganglia collects from all of the cluster nodes.
Managers and operations staff can use web pages created with the Ganglia software package to watch the processing load on the cluster in real time. The day your cluster goes into production, this will be one of the most important tools you have to see what is going on inside the cluster.
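Once load averages have been collected from every node, a simple comparison against the cluster mean can show whether the workload is balanced. The sketch below assumes the load averages are already available as a dictionary; it does not use Ganglia, and the tolerance value and function name are arbitrary assumptions.

```python
def overloaded_nodes(load_averages, tolerance=1.5):
    """Return nodes whose load average exceeds tolerance times the cluster mean."""
    if not load_averages:
        return []
    mean = sum(load_averages.values()) / len(load_averages)
    return sorted(
        node for node, load in load_averages.items() if load > tolerance * mean
    )


if __name__ == "__main__":
    # Hypothetical load averages for three nodes.
    print(overloaded_nodes({"clnode1": 1.0, "clnode2": 1.0, "clnode3": 4.0}))
```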
Updating Software on Cluster Nodes and Servers
You will also need a method of automatically downloading and installing packages to fix security holes as they are found and plugged. The automated tool Yum, described in Appendix D, is one way of doing this. You'll want to learn how to use Yum or another of the many automated package update utilities before going into production to make sure you have built a system that can continue to evolve and adapt to change.
You'll also need to take into account subtle problems, like the fact that the SystemImager updateclient command may overwrite the package registration information or the list of software (RPM) packages stored on the cluster node's disk drive. (To resolve this problem, you may want to install software on one node—the SystemImager Golden Client—and then just use the updateclient command to upgrade the remaining cluster nodes.)
Centralizing User Account Administration
You will need to somehow centralize the administration of user accounts on your cluster. See Chapters 1 and 19 for discussions of the possible account-administration methods, such as NIS, LDAP, Webmin, OPIUM (part of the OSCAR package), or a cron job that copies information stored on a central server to the local /etc/passwd file on each cluster node. (Whichever method you select, you will also want to decide if this is the right method for centralizing group and host information as well.)
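The cron-job approach can be pictured as a merge step that runs on each node. The sketch below keeps local system accounts and takes ordinary user accounts from the central copy. It is an illustration under stated assumptions: the UID cutoff of 500 and the function name are hypothetical, the function only returns text and never writes /etc/passwd, and shadow passwords are not handled.

```python
def merge_passwd(local_text, central_text, min_user_uid=500):
    """Combine local system accounts with centrally managed user accounts.

    Local entries with a UID below min_user_uid are kept as they are; every
    other account comes from the central copy. Central entries below the
    cutoff are ignored so a central file cannot replace root or daemon users.
    """
    def entries(text):
        for line in text.splitlines():
            fields = line.split(":")
            # A passwd entry has seven colon-separated fields; skip anything else.
            if len(fields) == 7 and fields[2].isdigit():
                yield int(fields[2]), line

    system = [line for uid, line in entries(local_text) if uid < min_user_uid]
    users = [line for uid, line in entries(central_text) if uid >= min_user_uid]
    return "\n".join(system + users) + "\n"


if __name__ == "__main__":
    # Hypothetical account entries.
    local = "root:x:0:0:root:/root:/bin/bash\n"
    central = "alice:x:600:600::/home/alice:/bin/bash\n"
    print(merge_passwd(local, central), end="")
```

The same pattern could be applied to group and host information if you decide to centralize those the same way.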
Installing a Printing System
You will need to set up a printing system that allows you to have a single point of control for all print jobs from the cluster without creating a single point of failure. The use of the LPRng package for this purpose in a cluster environment will be briefly discussed in Chapter 19.
Installing a Highly Available Batch Job-Scheduling System
Building a highly available cluster will not improve the reliability of your system if your batch job scheduling system is not highly available. We'll look at how to build a batch job scheduling system with no single point of failure in Chapter 18.
Purchasing the Cluster Nodes
Clusters built to support scientific research can sometimes contain thousands of nodes and fill the data centers of large research institutions (see http://clusters.top500.org). For the applications that run on these clusters, there may be no practical limit to the number of CPU cycles they can use; as more cycles become available, more work gets done. By contrast, an enterprise cluster has a practical upper limit on the number of processing cycles that an application can use. An enterprise workload will have periods of peak demand, when processing needs may be double, triple, or more compared to processing needs during periods of low system activity. At some point, however, more processing power does not translate into more work getting done, because external factors (such as the number and ability of the people using the cluster) set the limit.
A Linux Enterprise Cluster will therefore have an optimal number of nodes, determined by the requirements of the organization using it, and by the costs of building, maintaining, and supporting it. In this section, we'll look at two basic design considerations for finding the optimal number of nodes: cluster performance and the impact of a single node failure.
Capacity Planning and Cluster Performance
Applications spend most of their idle time in one of four states: waiting for user input, waiting for the CPU, waiting for filesystem I/O, or waiting for network I/O. When you build a cluster, you greatly reduce only one of these: the amount of time applications spend waiting for the CPU. By spreading users out over several cluster nodes, you also reduce the likelihood of several CPU-bound processes competing for the same CPU at the same time.
Most organizations can easily afford to purchase enough cluster nodes to eliminate the CPU as the performance bottleneck, and building a cluster doesn't help you prevent the other three performance bottlenecks. Therefore, the second cluster design consideration (the impact of a single node failure) is likely to influence capacity planning in most organizations more than performance.
Capacity Planning and the Impact of a Single Node Failure
The second and more significant design consideration for most organizations is the business impact of a node failure, and the ability to perform routine (planned) maintenance. When you are deciding how many nodes to purchase, the business impact of a single node failure on the enterprise or on the user community may be the single most important design consideration.
Purchasing more nodes than the number needed for peak CPU processing may make sense, because the extra nodes will reduce the impact of a single node failure on the end-user community and will also make cluster maintenance easier. (The cluster administrator can remove a node from the cluster for maintenance, and the cluster will still have enough processing power to continue to get the job done.)
Assuming your budget will allow you to purchase more nodes than you need to adequately meet your CPU processing requirements, you'll need to examine how your workload will be distributed across the cluster nodes and determine the business impact of a node failure (for example, how many users would be affected or how many user sessions would be affected). You can then provide management with the total cost of ownership for each additional cluster node and explain the benefits of purchasing additional nodes.
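The arithmetic behind this decision can be sketched as follows. The model is deliberately simple and rests on assumptions that are not part of the discussion above: sessions are spread evenly across nodes, each node can carry a fixed number of sessions, and the function and parameter names are hypothetical.

```python
import math


def plan_cluster(peak_sessions, sessions_per_node, spare_nodes=1):
    """Estimate node count and the number of sessions lost when one node fails."""
    if peak_sessions < 0 or sessions_per_node <= 0 or spare_nodes < 0:
        raise ValueError("inputs must be non-negative, with sessions_per_node > 0")
    nodes_for_peak = max(1, math.ceil(peak_sessions / sessions_per_node))
    total_nodes = nodes_for_peak + spare_nodes
    # With sessions spread evenly, one failed node interrupts 1/total_nodes of them.
    sessions_hit = math.ceil(peak_sessions / total_nodes)
    return {
        "nodes_for_peak": nodes_for_peak,
        "total_nodes": total_nodes,
        "sessions_hit_by_one_failure": sessions_hit,
    }


if __name__ == "__main__":
    # Hypothetical workload: 400 peak sessions, 50 sessions per node, 2 spares.
    print(plan_cluster(400, 50, spare_nodes=2))
```

Running the function with different values of spare_nodes gives a rough picture of how each additional node reduces the number of sessions affected by a single failure, which can be set against the cost of that node.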