By Matt Asay
For all the hype about big data as technology’s next big Gold Rush, few Big Data vendors seem to be making money. Oh, sure, we’ve seen companies raise several hundreds of millions of dollars to make a few hundreds of millions of dollars, but none of the biggest vendors of Hadoop, Spark, and other big data essentials is anywhere near profitability. For the most part, they’re actually getting farther from it.
This isn’t how a Gold Rush is supposed to work. Either a few scattered folks get immensely rich or those selling pickaxes and shovels make a killing. In Big Data Land, however, virtually no one seems to be cashing in.
Who is making big data bucks?
Take a walk through the income statements of a variety of big data vendors and you’ll find them swimming in red ink:
● Cloudera: $261M in revenue, $187M in losses (down from $205M the year before, the only company to narrow its loss)
● Hortonworks: $184M in revenue, $251M in losses (up from $180M the year before)
● Alteryx: $85M in revenue, $24M in losses (up from $21M)
● Splunk: $950M in revenue, $355M in losses (up from $279M)
● Tableau: $827M in revenue, $144M in losses (up from $84M)
These losses keep mounting even as enterprises express frenzied interest in taming big data to tap into this “next frontier for innovation, competition, and productivity,” as McKinsey & Co. styles it. If there are profits to be made in big data, apparently we need to look elsewhere, at least for the next few years.
To be clear, vendors were never likely to be the big winners in big data, anyway. Wall Street analyst Peter Goldmacher hammered this point home five years ago on Cloudera’s own blog: the “biggest winners in the Big Data world aren’t the Big Data technology vendors, but rather the companies that will leverage Big Data technology to create entirely new businesses or disrupt legacy businesses.”
A year later he followed this up with a longer research note, arguing:
"We believe Hadoop is a big opportunity and we can envision a small number of billion dollar companies based on Hadoop. We think the bigger opportunity is Apps and Analytics companies selling products that abstract the complexity of working with Hadoop from end users and sell solutions into a much larger end market of business users. The biggest opportunity in our mind, by far, is the Big Data Practitioners that create entirely new business opportunities based on data where $1M spent on Hadoop is the backbone of a $1B business."
While acknowledging that vendors like Cloudera, Hortonworks, DataStax, and others could build impressive businesses selling otherwise open source software, Goldmacher argued that it was the mainstream enterprise that had the most to gain:
"The biggest category of winners is the Big Data practitioners. These are the business people that have identified opportunities to use data to create new opportunities or disrupt legacy business models. We think this opportunity is so profound, we believe that the dividing line between winners and losers in the business world over the next decade will hinge on a company’s ability to leverage data as an asset."
And yet we continue to wait for this to happen. There are pockets of success, with companies like Uber building billion-dollar revenue streams by using data to upset tired industries, and Amazon poised to roil the grocery business by applying its data magic to stodgy offline data, but for the most part companies remain sidelined by big data. To the extent that companies have made meaningful strides in putting data to work, they have tended to be new-school vendors that are inherently data-driven.
You can’t, in other words, buy a big data clue. It has to be part of your company’s DNA.
Turns out, it’s hard
Countless surveys have identified broad corporate ambition to derive “actionable insights” from copious quantities of data, yet a paltry 15% or fewer of companies have actually managed to get big data projects into production, according to Gartner analyst Nick Heudecker.
This number has barely budged over the years. Gartner found just 14% of companies surveyed had big data projects in production in 2015, virtually unchanged from the year before (and inching up to 15% in 2016). According to Heudecker, “Investment in big data is up, but...showing signs of slowing growth with fewer companies having a future intent to invest.” The Gold Rush may be panning out as bronze.
There are at least two major reasons for this distinct lack of big data success. The first is the persistently hazy notion of what, exactly, organizations hope to achieve with their big data initiatives. The hype around big data has been so cacophonous that many organizations have simply rushed into investments without having a clear idea as to what an acceptable return on these investments would look like. For years Gartner’s top two obstacles to big data success were “Determining how to get value from big data” and “defining our strategy,” and both obstacles remain intractable today.
The other major reason for big data failure is people. Most organizations simply aren’t set up culturally or organizationally to succeed, wishing themselves agile and hoping the data silos will disappear. Neither works.
These problems are compounded by the difficulty inherent in finding solid data science talent. Years ago Mitchell Sanders laid out a roadmap to the archetypal data scientist: someone who combines domain knowledge (i.e., they understand their particular vertical industry), math and statistics expertise, and programming skills. Ben Lorica and Mike Loukides add even more detail to the job description:
"Whatever the role, data scientists aren’t just statisticians; they frequently have doctorates in the sciences, with a lot of practical experience working with data at scale. They are almost always strong programmers, not just specialists in R or some other statistical package. They understand data ingestion, data cleaning, prototyping, bringing prototypes to production, product design, setting up and managing data infrastructure, and much more."
Given the difficulty of finding people with strength in just one of these skills, it’s not surprising Lorica and Loukides dub them “the archetypal Silicon Valley ‘unicorns’: rare and very hard to hire.” With unclear direction and the brutal difficulty of finding data science talent, it’s little wonder that so few organizations have managed to successfully deploy big data projects.
Big data’s silver lining
Still, as referenced above, there are companies making significant money from big data. However, they tend to look like tech companies, not mainstream retailers, media companies, etc. Facebook uses data to wreak havoc on media giants; Google is dismantling the advertising behemoths; and Amazon, well, Amazon is laying waste to most everyone.
In an effort to remake themselves in the image of the Googles of the world, mainstream enterprises have turned to big data vendors for help. Cloudera and Hortonworks collectively made over $400 million last year selling support subscriptions for open source software to these would-be Googles. This feels like a stopgap as such enterprises develop internal big data expertise.
Indeed, notes Paul Ramsey, the revenue flowing to big data software vendors may not have a long shelf life, given their role as packagers and supporters of open source software. What I once dubbed “the open source dilemma,” Ramsey calls a “market trap”:
"In practice, instead of being long term recurring revenue, the big customers end up being short term consulting gigs. A deal is signed, the customer’s team learns the ropes, with lots of support hours from top level devs on your team, and the deployment goes live. Then things settle down and there is a quick scaling back of support payments: year one is great, year two is OK, year three they’re backing away, year four they’re gone."
Perhaps the timing here will differ, given the complexity inherent in running Kafka or Spark at scale, but the pressure is real. It’s made worse by Amazon Web Services, Microsoft Azure, and Google Cloud Platform turning big data software into more easily consumed cloud services. The terrestrial data vendors have responded by transitioning more of their businesses to cloud, but it’s hard to imagine betting against AWS and winning.
Besides, if Ramsey is correct, the more familiar enterprises become with big data infrastructure, the more likely they’ll be to roll their own, eliminating the need for vendors altogether. This is the promise originally promulgated by Goldmacher, and it remains the optimal end goal for most enterprises: turn data into a real asset, one that isn’t dependent on third-party software vendors to achieve.
Squeezing out the middleman
Ultimately, we may see the market split between enterprises rolling their own big data projects as internal skills improve and enterprises offloading workloads to the public clouds as convenience and data gravity dictate. Even where enterprises get comfortable with data, they may determine that the advantages of elastic infrastructure outweigh other reasons for running it in-house. As AWS general manager Matt Wood told me,
"Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on. You need an environment that is flexible and allows you to quickly respond to changing big data requirements."
The more traditional big data software vendors are sprinting to respond by becoming more cloudy, but they’ll have to spend big on infrastructure to achieve it, driving potential profits even further into the future. For those investors waiting to count their big data shekels, in short, the Gold Rush may have already come...and gone.