
Putting the Hurt on Legacy Data Infrastructure


Legacy data infrastructure may finally be dying out. As Cloudera co-founder Mike Olson has declared, “No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form.” Despite this obvious shift toward new-school open source data infrastructure (MongoDB, Apache Hadoop, Apache Kafka, etc.), the old-school data infrastructure giants, from Oracle to HPE Vertica to IBM, continue to mint billions of dollars in revenue (and profits).

There are signs, however, that the legacy data gravy train is slowing down.

Oracle, IBM, Teradata, and HPE, for example, have missed earnings more often than they’ve hit them over the last several years. Developers, freed from IT’s death grip, have turned to cloud services to run open source data infrastructure built for modern applications. While inertia has slowed the rot, developers are pushing enterprises into a new era of data, as conversations with some of the leading data infrastructure projects reveal.

From my cold, dead hands!

To be clear, the venerable relational database has been one of the best things to happen to technology. Relational technology unshackled developers from navigational databases like IBM’s IMS (and GE’s even earlier IDS). Those early databases offered high performance but forced developers to plan out both schema and query design from the start, making it hard to change the application midstream.

SQL, the heart of the RDBMS, broke this link between schema and query design, giving developers the ability to focus on schema design with the confidence that they could structure their later queries as they wished. It was an exceptional advance.
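To make that concrete, here is a minimal sketch of my own (using Python’s built-in sqlite3 module, not anything from the systems discussed here): the schema is declared once up front, and a query nobody anticipated at design time can still be asked later, with no navigation code and no schema redesign.

    import sqlite3

    # Declare the schema once, with no commitment to any particular query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.0), ("globex", 45.5)],
    )

    # Months later, ask a question nobody planned for. The declarative
    # query is independent of how the rows are physically stored,
    # and the schema does not have to change to answer it.
    for customer, total in conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC"
    ):
        print(customer, total)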

But it also happened thirty years ago, well before many of today’s developers were born.

For IT professionals and database administrators who grew up declaring, Charlton Heston-style, “I’ll give you my [RDBMS] when you pry it from my cold, dead hands,” a moment of truth is rapidly approaching. According to the godfather of the RDBMS, Michael Stonebraker, legacy databases (which he more than anyone else helped to create) were built for “business data processing,” but modern data is “of a much broader scope,” requiring fundamentally different approaches to data.

Even so, by IDC’s estimate, this shift is taking some time:

[Chart: IDC estimates of Hadoop and NoSQL adoption]

The reasons? Quite simply, old habits die hard. Those habits are particularly hard to kick given that, as the UK government’s former deputy CTO James Stewart told me, the legacy vendors enjoy “massive lock-in” built upon “large business process outsourcing” and “entire supply chains [that are] integrated via their tools.” There is, he continues, “incredible fear of changing the fragile businesses that result.” It doesn’t hurt, as Made Tech founder Rory MacDonald highlighted to me, that legacy vendors like Oracle often have senior executive buy-in, narrowing the options that developers would otherwise choose.

Of course, sometimes the RDBMS is the right tool for the job. Though it’s changing (see below), many of today’s enterprise workloads are transactional in nature, a good fit for the RDBMS. It also hasn’t helped that NoSQL and other modern data infrastructure have been slow to deliver tooling equal to or better than their legacy RDBMS counterparts.
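As a quick illustration of why transactional work suits the relational model (again my own sketch with sqlite3): the classic debit-and-credit transfer is exactly the shape of workload ACID transactions were built for. Either both updates commit or neither does.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100.0), ("bob", 25.0)])

    try:
        # "with conn" opens a transaction: it commits on success
        # and rolls back if anything inside raises.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'alice'")
            conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'bob'")
    except sqlite3.Error:
        pass  # on failure neither update is visible, so the books still balance

For workloads like this, the RDBMS remains hard to beat.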

If it keeps on raining, the levee is going to break

This inertia, however, won’t be able to hold back the tide of dramatically better performance. Not if Stonebraker is to be believed:

“There’s no question that with Oracle, the customers are dug in pretty deep in the traditional systems, but…there is two orders of magnitude performance difference to be had with other technology approaches, and sooner or later that will be significant. It may take a decade or longer for the legacy stuff to actually die away — there’s still a lot of IMS data in production in the real world! — but sooner or later it will get replaced.

“[If] you want to do 50 transactions per second, it doesn’t matter what technology you use, you can use whatever you want. But if you want to run 50,000 transactions per second, your current implementation is simply not going to do it. Sooner or later, you are going to be up against a technology wall that will force you to move to new technology, and it will be completely based on return on investment.”

For some, it’s already happening. As Sahir Azam, MongoDB’s vice president of cloud, told me, “There’s a move away from legacy RDBMS that is accelerating.” According to Azam, MongoDB (Disclosure: I worked for MongoDB from 2012 to 2014) is seeing this momentum away from legacy RDBMS in both application refreshes and greenfield application development, netting the company 3,000 customers across a wide range of industries, including over half of the Global Fortune 100.

While some of these customers are departmental deployments, Azam confirmed my own previous experience: small deployments targeting non-mission-critical applications often scale up and out, from single servers to clusters of more than 1,000 nodes running significant, mission-critical applications. Azam’s contention is supported by DB-Engines’ database popularity trends.

Importantly, MongoDB is doing something that the legacy vendors have not done, and perhaps cannot do: it’s cloudifying its database. Dubbed Atlas, MongoDB’s database-as-a-service offering is part of a broader shift to “erase previously distinct boundaries between application and database when used in conjunction with services like Lambda,” as RedMonk analyst Stephen O’Grady points out. It’s a double sucker punch against the RDBMS establishment, which struggles both with big data and with the cloud.
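What that boundary-erasing looks like in practice is roughly this (a hypothetical sketch: the handler shape follows AWS Lambda’s Python convention, and the ATLAS_URI environment variable, database, and collection names are placeholders of my own):

    import os
    import pymongo

    # Client for a hosted Atlas cluster; created once per container
    # and reused across Lambda invocations.
    client = pymongo.MongoClient(os.environ["ATLAS_URI"])
    events = client.myapp.events  # illustrative database/collection names

    def handler(event, context):
        # Persist the incoming event as a document: no database host
        # to provision or patch, and no upfront schema migration.
        result = events.insert_one({"payload": event})
        return {"inserted_id": str(result.inserted_id)}

There is no database server for the application team to run at all, which is precisely the operational story the legacy vendors struggle to tell.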

Similarly, newly funded Dremio is doing its part to put vendors like Teradata and HPE Vertica on shaky ground. Dremio does for data what AWS did for infrastructure: it makes it self-service. Upending the Teradatas and HPE Verticas of the world, Dremio helps data scientists and BI professionals route around IT to get access to their data at Vertica speeds without paying Vertica prices (or siloing data in Vertica or other systems). As Dremio CMO Kelly Stirman told me in an interview, “Every tool for working with data assumes that all the data is in one high-performance database, but this isn’t the case.” Open source Dremio kills the need for legacy ETL, data warehouses, cubes, and aggregation tables.
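Dremio itself is a SQL engine reached over standard connectors, so treat the following as a toy stand-in for the idea rather than Dremio’s API: joining data across two silos where it lives, with no ETL job and no warehouse load in between.

    import io
    import sqlite3
    import pandas as pd

    # Silo 1: an operational database (stand-in for an RDBMS source).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "AMER")])

    # Silo 2: a flat file from another team (stand-in for a data lake).
    clicks_csv = io.StringIO("customer_id,clicks\n1,42\n2,7\n")

    # Query both silos in place and join them, rather than ETLing
    # everything into one high-performance warehouse first.
    customers = pd.read_sql_query("SELECT id, region FROM customers", db)
    clicks = pd.read_csv(clicks_csv)
    print(customers.merge(clicks, left_on="id", right_on="customer_id")[["region", "clicks"]])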

Old dogs, new tricks

Which brings us back to the old guard, stoutly defending their billions in profits with “overhyped marketing,” as Jefferies analyst James Kisner says of IBM Watson. It has always been the case that enterprise technology companies oversell their products, but they’ve never had to contend with the twin forces of open source and cloud, both of which empower developers to see beyond the marketing spiel and try the industry’s best new tech for comparatively little.

It is absolutely true, as former MySQL executive Zack Urlocker suggested to me, that “Databases are sticky [because for] DBAs or devs there is no upside to replacing a working Oracle DBMS.” On the other hand, “[Y]our career is toast if an upgrade fails.” This, again, is a clear reason for the industry to move slowly outside its legacy data infrastructure comfort zone.

That’s true…to a point.

The point, as Stonebraker insists, is when the new tools so dramatically outperform the old guard that sticking with the legacy vendor almost becomes a firing offense. As Wall Street analyst Peter Goldmacher has said, open source and cloud offerings are driving CIOs to “the realization that not only are these products cheaper, but they are more flexible and better suited to many legacy workloads, and they offer end users the optionality to expand project scope beyond the constraints associated with legacy product, whether those constraints are cost or capabilities.”

At other times he has been even more blunt: “Every segment of Oracle’s business is under attack from a new generation of technology providers that offer equivalent or better technology at a lower cost.”

Legacy vendors like Oracle are seeing their data infrastructure products increasingly supplanted by more modern tools that are better suited to big data. For years these enterprise titans have held off the flood by tightly aligning with their CIO counterparts, but such cozy relationships are increasingly destabilized by developers tapping into open source software running in the cloud. This doesn’t mean Oracle et al. are doomed – they have the cash to buy a bevy of startups to strengthen their technology portfolios. But it does mean they’re playing defense, not offense, for perhaps the first time in 30 years, and customers stand to benefit.

Matt Asay is VP of mobile at Adobe.
