"Due to the complexity of climate systems and current limitations of climate models, it may take 10 or 20 years to develop clear observational or modeling proof of global warming causes," says Dr. Jonathan H. Jiang, a scientist from the Microwave Limb Sounder (MLS) team at NASA's Jet Propulsion Laboratory (JPL) in Pasadena, Calif. (See sidebar for explanation of what the team does.) "By that point it might be too late for us to prevent the climate changes."
To speed the process, MLS is now hooked into the national TeraGrid, which connects large-scale Linux clusters, using 64-bit Intel Itanium processors, at the Argonne National Laboratory, Caltech, the National Center for Supercomputing Applications and the San Diego Supercomputer Center. The sites are linked via 30-40 Gbps connections and have a combined computing capacity of 15 (soon to be 21+) teraflops and storage capacity of more than seven petabytes.
|What Is A 'Microwave Limb Sounder'?|
|It is an instrument that detects naturally occurring microwaves in Earth's upper atmosphere, or "limb." ("Limb" is an astronomy term meaning "the apparent outer edge of a celestial object.") The first MLS was part of the Upper Atmosphere Research Satellite and helped identify how chlorine compounds were depleting the ozone layer. The next MLS will be part of the Aura satellite, launching in January 2004. It will address a broad range of climate change issues. For more information go to the MLS site.|
The MLS team has also boosted its own local computing power. The team members were already outfitted with Linux workstations with dual 1 GHz Pentium III processors and 2 GB of RAM. Since these systems are on the same subnet, the 25 workstations were set up, using homegrown software, to act as a single parallel computing system. Applications are developed in house using Fortran 95.
Given the high-intensity computing needed to convert the raw satellite data into usable information, however, that still wasn't enough power. Rather than adding a mainframe, the MLS team decided to add a cluster using the same software it was using to run the grid.
"We were quite pleased with the results we had gotten with our grid," says Navnit Patel, a contractor from ERC Inc. who operates as the team's Science Computing Facility Manager.
"We also wanted to have clusters of nodes acting as a parallel system so that if one more client node went down it wouldn't affect the cluster," he said.
A key point in selecting a vendor for the cluster was that it had to operate seamlessly with the software already in place. The cluster would be supplementing rather than replacing the existing workstation grid, so JPL wanted it to run on the same distributed computing software. After testing the units from several manufacturers, they awarded the contract to IBM.
The new hardware consisted of 64 IBM x330 series 1U rack-mount servers, model 8674-11x, each with dual 1 GHz CPUs and 2 GB of RAM. The operating system was Red Hat Linux 7.3. IBM had already installed and configured the hardware in three racks and had conducted extensive burn-in tests on all the systems.
An IBM technician came out to JPL to finish the installation. Since all the wiring had already been done at the factory, he just had to connect the three racks to the master node, install the management software, and configure the master node and the clients.
"It was quite a smooth operation," says Patel.
Users have access to the master node right from their desktops. A user tells the master node how many nodes or CPUs to use for a job, and the master node then allocates the job to the appropriate number of available client nodes.
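The allocation step described above can be pictured with a small Python sketch. It is an illustration only, not the MLS team's software, which the article does not describe in detail. The class names, the whole-node allocation policy, and the node names are all assumptions made for the example. The only figures taken from the article are the 64 client servers and their two CPUs each.

```python
from dataclasses import dataclass, field


@dataclass
class ClientNode:
    name: str
    cpus: int = 2        # the article's client servers are dual-CPU
    up: bool = True      # False if the node has gone down
    busy: bool = False   # True while a job holds the node


@dataclass
class MasterNode:
    clients: list = field(default_factory=list)

    def available(self):
        """Client nodes that are running and not already assigned."""
        return [n for n in self.clients if n.up and not n.busy]

    def allocate(self, cpus_requested):
        """Reserve whole nodes until the CPU request is covered."""
        chosen, granted = [], 0
        for node in self.available():
            if granted >= cpus_requested:
                break
            chosen.append(node)
            granted += node.cpus
        if granted < cpus_requested:
            raise RuntimeError("not enough free client nodes")
        # Mark nodes busy only once the full request can be met.
        for node in chosen:
            node.busy = True
        return chosen

    def release(self, nodes):
        """Return a finished job's nodes to the free pool."""
        for node in nodes:
            node.busy = False


# Hypothetical 64-node cluster with one client node down.
master = MasterNode([ClientNode(f"node{i:02d}") for i in range(64)])
master.clients[3].up = False

job = master.allocate(cpus_requested=16)
print([n.name for n in job])
```

In this sketch a node that is down is simply skipped, so a 16-CPU request is still satisfied from the remaining machines. That is the behavior Patel describes wanting: losing a client node does not affect the cluster as a whole.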
"Everyone is using this cluster to its maximum capacity," says Patel. "So far we are happy with the performance of the cluster."
Of course, as Patel's "maximum capacity" and "so far" imply, an upgrade is on its way. JPL has just ordered an additional 64 client servers for the cluster. This time they are coming from Linux NetworX Corp. (based in Salt Lake City) and come with dual 2 GHz processors. Once these arrive, they will work together with the existing IBM servers, making a total of 256 CPUs available.
Patel attributes the success of his cluster to requiring that the equipment be benchmarked with the software that would actually run on it. This is particularly applicable in a situation like JPL's, in which all the software was written by the MLS scientists.
"The best advice I can give is to insist on a successful benchmark," he says. "Any software you intend to run on cluster should be benchmarked."