While many businesses keep clinging to existing Windows, NetWare, and OS/390 solutions, more and more are turning to Linux clusters for high-performance computing (HPC), high availability (HA), and Web farm applications. To get the most out of Linux clusters, however, you need to know the ins and outs of installation and maintenance, as well as the best software and hardware configurations for specific types of clustering implementations.
So what exactly is a cluster, anyway? "More than one machine, working cooperatively on one or more tasks," says Sean Dague of the IBM Linux Technology Center.
Linux clusters are "like 1,000,000 ants vs. one elephant," Dague illustrated, speaking during the Linux Boot Camp at the recent PCExpo/TechNYXpo conference in New York City.
Yet despite advantages ranging from speed to cost effectiveness, Linux clusters can be a tough solution to sell, according to some audience members at the boot camp in Manhattan. One IT consultant attending the show said that, after years of trying, he is just now starting to convince some of his customers of the benefits.
"My customers have been willing to adopt Linux — but not Linux clusters."
Over the past month, though, one of his customers, a large insurance firm, has decided to replace its previous Novell NetWare implementation with Linux clusters, instead of Microsoft Windows clusters as originally planned. "The firm could have saved $65,000 in consulting costs if they'd followed my first advice," the consultant said.
Closer than Windows to Unix and NetWare
Linux also bears a much closer resemblance to legacy OSes such as Unix and NetWare than Microsoft .NET does, some observers say.
One of the IT consultant's insurance firm customers actually tried out Windows clusters for several months before taking the consultant's advice and moving to Linux instead. The flat file database formerly used with NetWare is working much better under a Linux architecture than on Windows.
"Lawyers at the company like to be able to share files without dealing with Windows' proprietary things such as UNC," added the consultant.
Reliability and performance of Linux
Other frequently cited advantages of Linux clusters include reliability, modularity, and fast performance.
The IT consultant told a story about the CEO of another client company that has moved to Linux. When the CEO was about to deliver a presentation at an industry conference, he asked for statistics on the availability of the company's Linux systems. The company chief was astounded to get the answer: "364 days a year."
"Jobs that [would otherwise take us] about a month to complete can now be run in about 10 to 12 hours," says Alex Bogdan, a principal developer at Electro-Optical Sciences (EOS). EOS is now operating a Red Hat Linux-based cancer diagnosis application called MelaFind on PC-based eCluster systems at IBM's "Deep Computing on demand" facility in Poughkeepsie, NY.
The cost benefit of clusters
The cost effectiveness of open source software is a no-brainer. "No one ever needs a cluster, but it is often the most cost effective way to get the job done," claims Dague. That holds especially when clusters are coupled with Linux.
Clusters can be run on inexpensive PC hardware, as well as on Linux distributions operating on top of z/OS- or OS/390-driven mainframes.
But keep 'Hidden Costs' in mind, too
On the downside, however, clusters can be accompanied by some hidden costs, including:
- Administration costs – The increased cost of administration as clustering scales upward must be taken into account. How many administrators will it take to run 100 nodes? 1000 nodes?
- Facility costs – Large numbers of CPUs add a great deal of weight and heat to an installation. "Are you sure your building can handle the weight?" questions Dague. "AC (air conditioning) is critical."
If your organization faces one or more of these barriers, outsourced hosting represents an alternative.
Tips and Tricks: Installation
Linux clustering is also a relatively new phenomenon, a fact that might help explain lingering reluctance by some businesses to get started.
Moreover, in some ways, Linux is still sort of a land unto itself. Methods of initial installation, for example, can vary from one Linux distribution to another.
For "lights out" installation, Dague recommends administrators use the following:
- Kickstart for Red Hat distributions
- AutoYaST for SuSE distributions
- FAI for Debian distributions
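As a rough illustration of what "lights out" installation means in practice, the Python sketch below renders a Kickstart fragment for each node from a single template, so that no one has to answer installer prompts by hand. The hostnames and package names are hypothetical, the fragment is deliberately incomplete (it sets no root password or install source, for example), and directive support varies between Red Hat releases.

```python
# Illustrative only: a partial Kickstart file, rendered once per node.
KICKSTART_TEMPLATE = """\
# Kickstart fragment for {hostname} (not a complete configuration)
text
lang en_US.UTF-8
keyboard us
network --bootproto=dhcp --hostname={hostname}
clearpart --all --initlabel
autopart
reboot

%packages
@core
{packages}
%end
"""


def render_kickstart(hostname, packages):
    """Return the Kickstart text for one node, with packages sorted by name."""
    return KICKSTART_TEMPLATE.format(
        hostname=hostname,
        packages="\n".join(sorted(packages)),
    )


# Hypothetical node names and package selection.
for node in ["node01", "node02"]:
    print(render_kickstart(node, ["openssh-server", "httpd"]))
```

Each rendered file would typically be served over the network to the installer, which is what lets a new machine join the cluster without a CD or a keyboard.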
"New machines must be brought from bare members of the cluster with minimal effort," says Dague. "CDs don't cut it."
What to do about 'version skew'
Software maintenance is another big issue. Sometimes, mass updates are performed on some nodes, while other nodes are down. Certain administrators apply hot fixes only to individual nodes that are in particular need of maintenance.
"Before long, though, it becomes unclear what [software] is on any given node," says Dague. Unless you keep careful documentation, this situation leads to "version skew."
Dague offers three options for staving off version skew:
- Apply updates only as packages. Even if you're merely making configuration file changes, apply updates only as packages. That way, documentation will be "all in the packages."
- Use imaging software such as System Installation Suite. With this approach, image and documentation are one and the same. "Make sure you back up the image, though," advises Dague. Otherwise, if you lose the image, the documentation will go down the drain, too. System Installation Suite is "distro-independent," he notes.
- Create a "diskless" environment. Run nodes off a network file system. All nodes attached to the file system will be running the same software versions. Under this scenario, too, backup is key.
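Whichever option you choose, it helps to be able to check for skew directly. The Python sketch below is one possible approach, not something Dague described: it compares per-node package lists and reports every package whose version differs across nodes. The node names and version strings are made up, and the code assumes the package lists have already been gathered from each node.

```python
from collections import defaultdict


def find_version_skew(manifests):
    """Return {package: {version: [nodes]}} for packages that differ across nodes.

    A package that is missing from a node is reported under the version None.
    """
    all_packages = set()
    for packages in manifests.values():
        all_packages.update(packages)

    skew = {}
    for package in sorted(all_packages):
        by_version = defaultdict(list)
        for node in sorted(manifests):
            by_version[manifests[node].get(package)].append(node)
        if len(by_version) > 1:  # more than one version in use: skew
            skew[package] = dict(by_version)
    return skew


# Hypothetical manifests, e.g. collected from each node's package database.
manifests = {
    "node01": {"kernel": "2.4.20", "openssl": "0.9.7a"},
    "node02": {"kernel": "2.4.20", "openssl": "0.9.6b"},
    "node03": {"kernel": "2.4.20"},
}

for package, versions in find_version_skew(manifests).items():
    print(package, versions)
```

In this made-up example only openssl is flagged, because every node runs the same kernel version.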
HA clusters – Software needed for switching and failure detection
Software and hardware configurations vary according to type of cluster. High Availability clusters are used when systems need to be "always on," observes Dague. "In the real world, things break. Disks wear out. Memory goes bad. CPUs overheat."
As with other OSes, hot spare systems are required, along with HA software for detecting failures and for switching over to the backup node.
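The two jobs of HA software, detecting a failure and switching to the spare, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than how any particular HA package works: the class name, node names, and timeout are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow (a real monitor would read a clock and receive heartbeats over the network).

```python
class FailoverMonitor:
    """Track heartbeats and promote the hot spare when the active node goes silent."""

    def __init__(self, primary, spare, timeout, now=0.0):
        self.active = primary
        self.spare = spare
        self.timeout = timeout  # seconds of silence tolerated before failover
        self.last_seen = {primary: now, spare: now}

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def check(self, now):
        """Return the node that should be serving at time `now`."""
        silent_for = now - self.last_seen[self.active]
        if silent_for > self.timeout and self.spare is not None:
            # Failure detected: switch over to the backup node.
            self.active, self.spare = self.spare, None
        return self.active


monitor = FailoverMonitor("node-a", "node-b", timeout=3.0)
monitor.heartbeat("node-a", now=1.0)
print(monitor.check(now=2.0))  # node-a has been silent for 1 second
monitor.heartbeat("node-b", now=4.5)
print(monitor.check(now=5.0))  # node-a has been silent for 4 seconds
```

Once the spare has been promoted, this sketch has no further backup to fall back on, which is one reason real deployments pay close attention to repairing the failed node quickly.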
Web farm clusters – Load balancing is key
Web farm implementations come into play when extra CPU and network capacity is needed for handling large volumes of data. In this type of clustering implementation, content is distributed over multiple machines, with load balancing at the front end. Typically, Apache is used as the Web server.
Administrators can consult a couple of Web sites for Linux-based load balancing software:
- Linux Virtual Server – http://www.linuxvirtualserver.org
- mod_backhand – http://www.backhand.org/mod_backhand/
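To make the idea of front-end load balancing concrete, here is a minimal Python sketch of round-robin scheduling, one of the simplest balancing policies, with unhealthy servers skipped. It is an illustration only and does not reflect the internals of either package listed above; the class and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        # At least one backend is healthy, so this loop always returns.
        for backend in self._cycle:
            if backend in self.healthy:
                return backend


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
print([balancer.pick() for _ in range(4)])
balancer.mark_down("web2")
print([balancer.pick() for _ in range(2)])
```

Round-robin treats every server as equal. Production balancers usually offer weighted or connection-aware policies as well, for farms where the machines differ in capacity.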
Too small for clusters?
Several boot camp attendees contended that their networks are too small to benefit from Linux clusters.
However, regardless of their size, companies with HPC needs – such as those in scientific and technical fields – are seeing advantages.
In HPC solutions, many computers are tied together through a high-speed network. Applications are then rewritten into "parallelizable chunks."
"Running a simulation, for example, on one node could actually take many years," Dague maintained during the Linux bootcamp.
EOS's Bogdan says it took his company only a couple of hours to port 10 or 12 miniclusters from a homegrown Linux system to IBM's supercomputing architecture in Poughkeepsie, and to optimize the existing MelaFind application for IBM's Parallel Virtual Machine (PVM).
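What rewriting an application into "parallelizable chunks" looks like can be shown with a toy Python example. The sketch below splits one large calculation into independent pieces, hands each piece to a worker, and combines the partial results. It makes several simplifying assumptions: the function names are hypothetical, the workload (a sum of squares) is a stand-in for a real simulation, and threads in a single process stand in for cluster nodes. A real HPC job would use a message-passing layer such as PVM to spread the chunks across machines, and in standard Python these threads will not actually speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(start, stop, n_chunks):
    """Split range(start, stop) into n_chunks contiguous, nearly equal ranges."""
    base, extra = divmod(stop - start, n_chunks)
    chunks = []
    lo = start
    for i in range(n_chunks):
        hi = lo + base + (1 if i < extra else 0)
        chunks.append(range(lo, hi))
        lo = hi
    return chunks


def sum_of_squares(chunk):
    """The work done independently on each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(n, n_workers=4):
    """Compute the sum of x*x for x in range(n) by combining per-chunk results."""
    chunks = split_into_chunks(0, n, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


print(parallel_sum_of_squares(1000))
```

The key property is that no chunk depends on another chunk's result, so the pieces can run in any order and on any node. Work that cannot be divided this way gains little from adding machines.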
HPC clusters – Messaging and dispatch software
Dague suggests the following open source software for HPC messaging and dispatch:
"HPC in a box"
- OSCAR – http://oscar.sourceforge.net/
- OpenPBS – http://www.openpbs.org
- Maui Scheduler – http://supercluster.org/projects/maui
- Grid Engine – http://gridengine.sunsource.net
So whether you're interested in HPC, HA, or Web farm applications, Linux clusters could hold a lot of promise for your organization. However, unless you plan to outsource the whole ball of wax, you need to get familiar with open source solutions for installation, maintenance, and specific types of clustering applications.