NASA has just revamped the storage environment at the NASA Advanced Supercomputing (NAS) facility at Ames Research Center at Moffett Field, Calif. The revamp pairs a new 450 TB storage network with the massive Columbia supercomputer, which runs on Intel Itanium 2 processors and Linux.
“All of the storage is within one big SAN fabric,” said Bob Ciotti, Terascale Systems Lead at NASA. “We now have 200 TB of data on Fibre Channel and 250 TB on SATA.”
The various system elements include:
- Two 128-port SilkWorm switches from Brocade Communications
- Disk drives from Engenio Information Technologies
- Memory technology from Dataram Corporation and Micron Technology
- A 288-port InfiniBand switch from Voltaire
- The 10,240-processor Columbia supercomputer, built from 20 Silicon Graphics Inc. (SGI) Altix systems, each powered by 512 Intel Itanium 2 processors
- A 450 TB SGI InfiniteStorage SAN
- A further 800 TB of existing data managed by Data Migration Facility (DMF), SGI’s InfiniteStorage data lifecycle management solution, which consists of a storage server and a tape farm built from StorageTek tape silos
- The CXFS shared file system from SGI as the software layer that sits on top of the two operating systems in use: IRIX (a UNIX variant) and Linux
“The shared file system handles access to metadata through a proxy system so it is possible to read and write to FC storage directly,” said Ciotti.
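To make that design concrete, here is a minimal conceptual sketch in C, and emphatically not the actual CXFS API: a client asks a metadata server where a file’s blocks live, then performs the data I/O directly against the Fibre Channel LUN over the SAN. The device path, the extent structure, and the lookup_extent() helper are illustrative assumptions.

```c
/* Conceptual sketch of a SAN shared file system (not the CXFS API):
 * metadata requests go to a metadata server over the LAN, while file
 * data is read directly from the shared Fibre Channel device. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reply from the metadata server: where the blocks live. */
struct extent { off_t offset; size_t length; };

/* Stand-in for the metadata round trip; a real client would make an
 * RPC to the metadata server here. */
static struct extent lookup_extent(const char *path) {
    (void)path;
    struct extent e = { .offset = 0, .length = 4096 };
    return e;
}

int main(void) {
    struct extent e = lookup_extent("/cxfs/vol1/results.dat");

    /* Data path: read the extent straight from the SAN device,
     * bypassing the metadata server entirely. */
    int fd = open("/dev/sdb", O_RDONLY);   /* illustrative FC LUN */
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(e.length);
    if (!buf) { close(fd); return 1; }

    ssize_t n = pread(fd, buf, e.length, e.offset);
    printf("read %zd bytes directly from the SAN\n", n);

    free(buf);
    close(fd);
    return 0;
}
```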
Why Itanium 2 and Linux?
Why base the platform on Intel Itanium 2 processors and Linux rather than traditional RISC chips and UNIX? According to Ciotti, Itanium 2 has now reached the point where it can compete well with RISC.
“We get high bandwidth off our Itanium 2 chips,” he said. “It also has a large cache that suits our applications.”
Itanium 2 is based on the EPIC (explicitly parallel instruction computing) architecture. The chips come with a 32 KB Level 1 cache, a 256 KB L2 cache, up to 9 MB of L3 cache, and clock speeds between 1.3 and 1.6 GHz. They pair with the Intel E8870 chipset, use a 400 MHz or 533 MHz system bus, and deliver 6.4 GB/sec of I/O bandwidth.
The operating system for Columbia is Red Hat Linux 2.4. Previously, NASA had used IRIX for supercomputing on SGI Origin servers; it moved to Red Hat when SGI brought out its more powerful, Linux-based Altix server line. SGI is now shifting to SUSE Linux, and NASA will follow suit.
“Linux doesn’t have all the features of IRIX quite yet, but it is getting there rapidly,” said Ciotti. “Linux is maturing more rapidly than IRIX.”
A total of 20 SGI Altix servers make up Columbia, including eight of the latest model, the Altix 3700 Bx2, which doubles processor density and available bandwidth. As well as Linux and Itanium 2, each Bx2 system offers 9 MB of L3 cache, packs 64 processors per rack, and scales to 256 processors in a single system image (SSI).
“Such a large SSI creates a really good environment for code development,” said Walt Brooks, chief of NASA’s Advanced Supercomputing Division. “It offers low latency and is user friendly.”
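To illustrate what a single system image means for users, the short sketch below (illustrative only, not NASA code) simply asks the one running Linux instance how many processors it can see; on a 256-processor Altix SSI, that answer covers the whole partition, with no cluster middleware involved.

```c
/* On a single system image, one OS instance exposes every CPU,
 * so an ordinary process can see, and be scheduled across, all of them. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    printf("processors visible to this one OS instance: %ld\n", cpus);
    return 0;
}
```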
SGI’s NUMAlink 4 interconnect technology, introduced with this Altix generation, delivers 6.4 GB/sec of bandwidth. Altix also uses a shared memory architecture called NUMAflex that overcomes bottlenecks: all nodes operate on one large shared memory space, which eliminates data passing between nodes, lets big data sets fit entirely in memory, and reduces the memory required per node.
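The practical payoff of that shared memory space is the programming model. The sketch below uses OpenMP in C to show the general idea, with an arbitrary array and computation rather than any NASA code: every thread works directly on the same in-memory array, so no data has to be explicitly passed between nodes as it would be in a message-passing program.

```c
/* Shared-memory sketch: one large array lives in the single shared
 * address space, and threads operate on it directly, with no explicit
 * message passing between nodes. Compile with: cc -fopenmp demo.c */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10000000L   /* array size chosen only for illustration */

int main(void) {
    double *data = malloc(N * sizeof(double));
    if (!data) return 1;

    double sum = 0.0;

    /* Each thread updates its slice of the same shared array. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++) {
        data[i] = (double)i * 0.5;
        sum += data[i];
    }

    printf("threads available: %d, checksum: %g\n", omp_get_max_threads(), sum);
    free(data);
    return 0;
}
```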
Another feature of the revamped environment is enhanced system cooling. The eight Altix 3700 Bx2s have double the density of previous models, raising concerns about overheating.
“As we were stretching the limits of cooling, the server doors are specially designed to have cool air fed through them,” said Brooks. “Chilled water is brought in and coupled into radiator loops within the door to keep them cool.”
Columbia recently smoked the competition in LinPack benchmark tests, achieving 42.7 trillion calculations per second and ranking as the fastest supercomputer in the world at the time.
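As a rough sanity check on that number, the snippet below works out a theoretical peak under two assumptions that are not from the article: that each Itanium 2 retires four floating-point operations per cycle and that Columbia’s processors run at about 1.5 GHz.

```c
/* Back-of-the-envelope peak and efficiency calculation.
 * The flops-per-cycle and clock-speed values are assumptions. */
#include <stdio.h>

int main(void) {
    double processors      = 10240.0;
    double flops_per_cycle = 4.0;    /* assumed: two fused multiply-add units */
    double clock_ghz       = 1.5;    /* assumed clock speed */
    double peak_tflops     = processors * flops_per_cycle * clock_ghz / 1000.0;
    double linpack_tflops  = 42.7;   /* measured figure cited in the article */

    printf("theoretical peak: %.1f TFLOPS\n", peak_tflops);
    printf("LinPack efficiency: %.0f%%\n", 100.0 * linpack_tflops / peak_tflops);
    return 0;
}
```

Under those assumptions the peak works out to roughly 61 TFLOPS, which would put the measured 42.7 TFLOPS at around 70 percent efficiency.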
“The science that already has been produced has been extraordinary,” said Brooks. “Instead of scientists queuing up for supercomputing resources, we now have enough capacity for everybody.”
He cites the example of simulations showing a decade’s worth of changes in ocean temperatures and sea levels, which used to take a year to model. Using the modernized environment, scientists can now run them in a day or two.