The filesystem of choice for many of the top 500 supercomputers is the open source Lustre filesystem. Lustre started off as a Sun Microsystems effort and is now under the stewardship of Oracle. Though Oracle is the leader of the Lustre open source project, other vendors also participate in Lustre development. One such vendor is startup Whamcloud.
Oracle has had somewhat mixed success with its leadership of open source efforts that it acquired from Sun, with forks of OpenSolaris and OpenOffice now in progress. According to Whamcloud, however, Oracle is playing nice with the Lustre open source community as development work continues to move the project forward.
"Oracle has all of the current gatekeepers for Lustre," Brent Gorda, CEO of Whamcloud, told InternetNews.com. "We have hired some former Oracle staff and one of them was the gatekeeper for the Lustre 2.0 version that was just released."
Gorda added that Whamcloud is collaborating with Oracle as closely as possible. He noted that his company has a number of former Oracle employees who are still on good terms with current Oracle staff.
"So when we embark on a development project we do it in full visibility of the geeks at Oracle," Gorda said. "They know what we're working on and they know what to expect in terms of code contributions."
The contributions that could be coming from Whamcloud are noteworthy for a number of reasons. At the top level, Whamcloud is working with two U.S. national laboratories, Lawrence Livermore and Oak Ridge, on Lustre support and customizations for some of the most powerful computers on the planet.
"Whamcloud has come together to focus on Lustre and its applicability and scalability for the high-performance computing marketplace," Gorda said.
Gorda noted that Oak Ridge National Laboratory had the number one supercomputer in the world until it was unseated by a Chinese competitor last month. He added that the Lustre filesystem is one of the key components needed by Oak Ridge to make supercomputers work.
At Oak Ridge, Whamcloud is helping the national lab build a more efficient petascale system. With Lawrence Livermore National Laboratory, Whamcloud is also working on performance-related issues for Lustre deployments.
One of the key problem areas that Whamcloud is working on is the inclusion of solid-state drive (SSD) storage.
"Will solid state storage help absorb the data deluge that these systems produce?" Gorda said.
Gorda explained that high-performance computing applications run in a bulk synchronous manner. In this model, all of the processors move in lockstep: they compute, hit a barrier and then do all of their I/O at the same time.
"So you can imagine on a big system with several hundred thousand CPUs, if a big system stops and does a checkpoint there are a few hundred thousand clients jumping on the filesystem and all are trying to do a large write at the same time," Gorda said.
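The bulk synchronous pattern Gorda describes can be illustrated with a small sketch. This is not Lustre or Whamcloud code; it is a toy Python simulation in which threads stand in for compute nodes, an in-memory list stands in for the shared filesystem, and the function name `run_bulk_synchronous` and its parameters are invented for illustration.

```python
import threading


def run_bulk_synchronous(num_workers, num_steps):
    """Toy model: workers compute, meet at a barrier, then all write at once."""
    barrier = threading.Barrier(num_workers)
    write_log = []            # stands in for the shared filesystem (assumption)
    log_lock = threading.Lock()

    def worker(rank):
        for step in range(num_steps):
            # Compute phase: each worker does its own independent work.
            partial = sum(range(rank + step + 1))

            # Barrier: nobody starts I/O until every worker has finished computing.
            barrier.wait()

            # I/O phase: every worker writes its checkpoint in the same window.
            with log_lock:
                write_log.append((step, rank, partial))

            # Second barrier: all writes for this step land before the next compute phase.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return write_log


if __name__ == "__main__":
    for step, rank, value in run_bulk_synchronous(num_workers=4, num_steps=3):
        print(f"step {step}: worker {rank} wrote {value}")
```

With four workers the burst is trivial, but the shape is the same one Gorda points to: every write for a given step arrives in a single window, so the filesystem sees the load of all clients at once rather than spread over time.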
Gorda added that a key feature of Lustre is that it has been architected so it can withstand the massive flood of data.
Building tools for Lustre is another area that Whamcloud is working on.
"One of the contracts we have with Lawrence Livermore National Laboratory is to take the community-based LMT (Lustre Monitoring Tool), and bring it into the Lustre code base itself," Gorda said. "One of the weak points for Lustre against competitors in this space is a lack of tools, so we're looking to help out with that."
The open source Lustre project is currently at the 2.0 release, though Gorda noted that the vast majority of users are either on the 1.8 or 1.6 release.
"Lustre 2.0 is mostly a reference release at this point and Lustre 2.1 will be coming out soon," Gorda said. "Lustre 2.1 is the next significant release that people are expected to jump on."