Saturday, May 15, 2021

IBM To Help Build Massive Data Grid To Recreate ‘Big Bang’

IBM is joining the CERN openlab for DataGrid applications to help create a massive data-management system built on Grid computing.

IBM’s storage virtualization and file management technology will play a key role in the collaboration, which aims to create a data file system “far larger than exists today” to help scientists at the renowned particle physics research center understand some of the most fundamental questions about the nature of matter and the universe, according to IBM and CERN, the European Organization for Nuclear Research.

Conceived in IBM Research, the storage virtualization technology, known as Storage Tank, is designed to provide scalable, high-performance and highly available management of huge amounts of data using a single file namespace regardless of where or on what operating system the data reside.

IBM and CERN will work together to extend Storage Tank’s capabilities so it can manage and provide access from any location worldwide to the unprecedented amounts of data – billions of gigabytes a year – that CERN’s Large Hadron Collider (LHC) is expected to produce when it goes online in 2007. The LHC is the next-generation particle accelerator designed to recreate the conditions shortly after the Big Bang to help researchers understand the initial seconds when the universe was formed.

The CERN community – which is credited with inventing the World Wide Web in 1990 – hopes to push the Internet even further with Grid computing and the massive data processing requirements for the LHC. CERN openlab is a collaboration between CERN and leading industrial partners that will create and implement data-intensive Grid-computing technologies to aid LHC scientists. Because the same issues facing CERN are becoming increasingly important to the IT industry, the CERN openlab and its partners – which include Enterasys, HP and Intel – are working together to explore advanced computing and data management solutions.

By 2005, the CERN openlab collaboration with IBM is expected to be able to handle up to a petabyte (a million gigabytes) of data.

“We are delighted that IBM is joining the CERN openlab for DataGrid applications,” said Wolfgang von Ruden, Information Technology Division leader and head of the CERN openlab. “Together with IBM, we aim to achieve a one petabyte storage solution and integrate it with the Grid that CERN is building to handle the extreme data challenges of the LHC project.”

“CERN’s scientists and colleagues want to be able to get to their data wherever it may be – local or remote, and regardless of the operating system on which it resides,” said Jai Menon, a fellow at IBM’s Almaden Research Center in San Jose and co-director of IBM’s Storage Systems Institute, a joint program between IBM Research and the company’s product division. “This is the perfect environment for us to enhance Storage Tank to meet the demanding requirements of large-scale Grid computing systems.”

As part of the agreement, several leading storage management experts from IBM’s Almaden and Haifa (Israel) Research Labs will work with the CERN openlab team. IBM will also give CERN the system’s initial 20 terabytes of high-performance disk storage and six Linux-based servers, and IBM Switzerland will provide additional support.

Storage Tank employs policy-based storage management and includes clustering and specialized protocols that detect network failures to enable very high reliability and availability.
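To make the idea of policy-based storage management concrete, here is a minimal illustrative sketch. The rule attributes, pool names, and matching logic below are hypothetical – Storage Tank's actual policy language is not described in this article – but the general pattern is the same: file attributes are evaluated against an ordered list of rules, and the first match decides where the data is placed.

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    """Minimal file metadata a placement policy might inspect (illustrative)."""
    name: str
    size_bytes: int

# Hypothetical policy: an ordered list of (predicate, storage pool) rules.
# The first rule whose predicate matches determines the file's placement.
POLICY_RULES = [
    (lambda f: f.size_bytes > 10**9, "tape-archive"),       # very large files go to tape
    (lambda f: f.name.endswith(".raw"), "high-perf-disk"),  # raw detector data stays fast
    (lambda f: True, "general-disk"),                       # default pool for everything else
]

def place_file(f: FileInfo) -> str:
    """Return the storage pool chosen by the first matching policy rule."""
    for predicate, pool in POLICY_RULES:
        if predicate(f):
            return pool
    return "general-disk"
```

A real system would evaluate such policies transparently inside the file system, so clients see a single namespace while data migrates between pools behind the scenes.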

In this initiative, IBM is following a collaboration strategy initiated in 2001 with the European Union-sponsored European Data Grid project, which is also led by CERN.
