Monday, September 16, 2024

IBM Spectrum Discover: AI at Scale

Datamation content and product recommendations are editorially independent. We may make money when you click on links to our partners.

This week IBM made a massive storage announcement covering a variety of impressive offerings. This announcement included huge upgrades to its flash solution (including a noteworthy entry-level offering), Non-Volatile Memory Express over Fabrics (NVMe-oF) updates, a very interesting storage gridlock tool, a bunch of tape updates and a series of new and updated SAP HANA solutions.

But the most interesting to me was IBM Spectrum Discover. It is the first product I’ve seen (admittedly, I don’t see everything) that uses AI at scale to speed up critical data research.

Pivot to Importance

Over the last decade I’ve watched a number of powerful IT types argue that the industry’s focus on big data was wrongheaded. The most compelling talk I saw was by President Obama’s former election IT manager, who waxed eloquent on why collecting massive amounts of data before really understanding what to do with it was a colossally bad idea.

That focus led to the creation of huge repositories, like the one the NSA built in the middle of the country, which at the moment doesn’t appear to be doing much more than running up huge electric bills. The lesson was that the focus should always have been on what was going to be done with the data rather than on collecting it.

My own experience, both in market research and in internal audit, is that it is better to have a small sample of data and strong analysis than an overwhelming amount of data you can’t parse and analyze effectively.

The goal is an actionable result, not an enormous, stinking pile of unstructured data that is expensive to keep and can’t be analyzed. Yet thanks to the “big data” push, I know that a lot of enterprises, and government organizations like the NSA, are now in exactly this mess.

IBM Spectrum Discover

IBM Spectrum Discover is a brand-new, AI-driven product that appears to target this problem specifically. It can go into one of these colossal, largely unmanageable data repositories and organize it so it can be properly analyzed. The offering, which came out of IBM Research (suggesting it is cutting-edge and likely unique), enriches and then leverages the metadata attached to each file, at scale. It does this by rapidly ingesting, consolidating and indexing that metadata for the billions of files in these immense repositories, so the data can then be either sampled and analyzed or, potentially, analyzed as a whole. (The latter tends to be costly, and unnecessary if a sample will produce much the same result.)
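IBM hasn’t detailed Spectrum Discover’s internals here, so what follows is only a generic, hypothetical sketch of the underlying idea: scan the repository once, put each file’s metadata into an index, then query and sample that index instead of touching the raw data. It uses Python’s standard-library sqlite3 as a stand-in index; the function names are made up for illustration and are not IBM’s API, and the AI-driven metadata enrichment the product performs is not modeled (only plain file-system attributes are recorded).

```python
import os
import random
import sqlite3


def build_metadata_index(root):
    """Walk `root` once and record per-file metadata in an in-memory SQLite table.

    Hypothetical helper for illustration; not part of any IBM product API.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, ext TEXT, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                # Broken symlink, permission problem, or file deleted mid-scan: skip it.
                continue
            # Normalize the extension so ".CSV" and ".csv" land in the same group.
            ext = os.path.splitext(name)[1].lower() or "<none>"
            rows.append((full, ext, st.st_size, st.st_mtime))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    # Secondary index on the column we filter by, so lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_files_ext ON files (ext)")
    conn.commit()
    return conn


def summarize_by_extension(conn):
    """Consolidated view of the repository: (extension, file count, total bytes)."""
    return conn.execute(
        "SELECT ext, COUNT(*), SUM(size) FROM files GROUP BY ext ORDER BY ext"
    ).fetchall()


def sample_paths(conn, ext, k, seed=0):
    """Return a reproducible random sample of up to k paths with extension `ext`."""
    paths = [
        row[0]
        for row in conn.execute(
            "SELECT path FROM files WHERE ext = ? ORDER BY path", (ext,)
        )
    ]
    rng = random.Random(seed)
    return rng.sample(paths, min(k, len(paths)))
```

For example, `summarize_by_extension(build_metadata_index("/data"))` would give a per-type file count and byte total for a hypothetical `/data` tree, and `sample_paths(conn, ".csv", 1000)` would pick up to 1,000 CSV files to analyze in place of the whole repository. The point of the design is that only the metadata scan touches every file; every later question is answered from the index.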

What is also fascinating about this offering is that it isn’t limited to IBM storage: at launch it supports IBM Cloud Object Storage and IBM Spectrum Scale, and support for Dell EMC Isilon is planned for 2019. IBM has lately become far more aggressively platform-agnostic, which means even organizations that haven’t invested in IBM’s own storage will increasingly benefit from existing and future IBM tools like IBM Spectrum Discover.

Big Data and Big, Bad Decisions

The intense focus on big data around a decade ago led to some really bad decisions with regard to priorities. The biggest was the excessive focus on collecting data, which should have been preceded by a focus on what the heck you were going to do with it. This resulted in some huge unstructured data repositories that are wicked expensive and aren’t providing much value now. IBM’s Spectrum Discover appears to be uniquely targeted at that problem, making it the most important part of the massive IBM announcement that came out this week.

I think this also highlights a common bad practice: not fully thinking through a technology solution before you buy it. Vendors often want you focused on buying more hardware and software, but your focus needs to be on the outcome. An enormously expensive data repository you can’t effectively analyze is not a ton better than having none at all, and given the cost, I’d argue it could be a ton worse.

Information is always more important than raw data and should always remain in your sights when making a decision in this area.

