Is There a Big Data Bubble?

Written By

Jeff Vance

Feb 25, 2014

Are we on the brink of another Netscape moment? Back in 1995, Netscape’s IPO started inflating the dotcom bubble. Before Netscape, for all but a small cadre of programmers, computers were glorified word processors. Sure, some people used them for bookkeeping or to play solitaire, but word processing was the bread and butter for general users.

Then along came Netscape, sparking an Internet gold rush and turning San Francisco, once again, into a get-rich-quick boomtown. The parallels between Big Data and the dotcom era aren’t perfect. As a saying often attributed to Mark Twain goes, “History doesn’t repeat itself, but it does rhyme.”

So, sure, there is no Netscape-equivalent startup in Big Data (although VC money is pouring into Big Data startups). And whereas the birth of the Web felt like we went from nothing to something overnight (even if that wasn’t really true), Big Data already has well-known ancestors, such as data mining, distributed computing, Business Intelligence (BI), etc.

One big parallel is certainly rhyming, if not repeating, however: democratization. The democratization of data will change how business decisions are made, who makes them, and who gains (and loses) power from this shift.

What will it mean to democratize data?

Despite the fact that telcos in the U.S. are exerting monopoly power over Internet access, the Internet was certainly a force that democratized information.

In the browser era, people could now read newspapers from all over the world without leaving the house. Information previously locked in arcane journals became available through a simple Web search. And e-commerce sites started to displace retailers, allowing people to hunt low prices in many places other than Walmart.

Granted, much of this democratization resulted in next to nothing. Amazon is the behemoth that Walmart used to be, while Walmart, when not fending off bad PR, is investing heavily in Big Data to bolster its online presence.

Similarly, I can certainly read The Times of London if I want to, but do I?

Rarely.

In fact, one-third of adults in the U.S. now get news from one source: Facebook.

What this means is that rather than being inundated with news that may be completely foreign to you and may cause you to rethink your worldview, now most of your news is delivered by sites that you have “liked,” and most of it just reinforces what you already believe.

In fact, as the Web moves further and further down the personalization rabbit hole, many of the Big Data insights that companies like Facebook and Google have made mean that we see less and less of the Web each day. Instead, searches, social media streams, and even news sites now serve up things that we’ve expressed an interest in before.

The whole, wide, wooly Web is still out there. You just never see it.

Will this same sort of phenomenon happen with Big Data? Will data democratization just give business decision makers more ammo for what they’ve already decided?

David Chaiken, CTO of Big Data infrastructure startup Altiscale, doesn’t think so. Chaiken worked at Yahoo! when the Big Data platform Hadoop was originally developed.

“When I started at Yahoo!, stacks of data were locked away in many different parts of the company. Often, access to that data meant getting on a plane and flying to a different site,” he said. “The promise of Big Data is that you can now break down those siloes and democratize the access to data.”

It’s been a goal of IT to break down data siloes for ages, but then what? Just because you can see the data doesn’t mean you can do anything with it. But to Chaiken that doesn’t matter. The simple act of opening up data is huge, he believes.

Chaiken invokes Metcalfe’s Law to drive this point home. “Simply unlocking data creates a network effect for that data. You can now take some anonymous customer data, open it up, discover trends, and all sorts of networking effects start to happen across the organization.”
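Metcalfe’s Law is usually stated as: the value of a network grows roughly with the square of the number of connected nodes. As a loose illustration only, the Python sketch below counts how many distinct pairs of data sets become available for comparison as more of them are opened up. The function name and the use of pairwise combinations as a stand-in for “value” are assumptions made for this example, not something Chaiken described.

```python
def pairwise_links(n: int) -> int:
    """Count the distinct pairs among n items: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Hypothetical counts of data sets that have been opened up company-wide.
for datasets in (2, 5, 10, 20):
    # Doubling the number of data sets roughly quadruples the possible pairings.
    print(f"{datasets} data sets -> {pairwise_links(datasets)} possible pairings")
```

The point of the sketch is the growth rate: each newly unlocked data set can be compared with every one already available, so the number of possible combinations grows much faster than the number of data sets.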

Chaiken gave the example of a water utility discovering a leak just because a usage spike showed up on some customer’s bill.

From the known unknowns to unknown unknowns

Examples such as the water utility one represent the low-hanging fruit of Big Data, and that’s probably what we’ll see for the next few years. In other words, water utilities have always wanted to spot leaks early, and someone in billing could have had an a-ha moment, realizing that if you compared billing stats with past service calls and found a high degree of correlation, you may be onto something predictive.

Before Big Data came along, that billing person could still have done the analysis, but it would have taken so long that most people would have given up before even trying.
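To make the billing example concrete, here is a minimal Python sketch of the kind of check that billing person might run: a plain Pearson correlation between usage increases on bills and later leak-related service calls. The numbers, variable names, and per-neighborhood grouping are all invented for illustration; nothing here comes from a real utility.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, one entry per neighborhood:
# month-over-month increase in billed water usage (percent) ...
usage_increase_pct = [2, 5, 1, 30, 4, 45, 3, 25]
# ... and leak-related service calls logged in the following quarter.
leak_calls = [0, 1, 0, 4, 0, 6, 1, 3]

r = pearson(usage_increase_pct, leak_calls)
print(f"correlation between usage spikes and leak calls: {r:.2f}")
```

A coefficient near 1 on real data would suggest that usage spikes are worth flagging as possible leaks. As with the orange cars, a strong correlation says nothing about why the pattern exists, and a toy sample this small proves nothing on its own.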

There’s certainly value in Big Data insights that are of the obvious variety, but where Big Data gets interesting is when it surfaces strange ideas that would have occurred to exactly no one if data patterns didn’t point the way.

For instance, predictive analytics startup Kaggle found that if you’re going to buy a used car, you should get an orange one. Why? This is just Kaggle’s educated guess, but they believe that people buying orange cars are doing so as a means of self-expression, and, therefore, they tend to take better care of their cars. So I guess orange is now the opposite of a lemon.

And the “why” doesn’t matter. If the orange data holds up over time, you’ll want to buy orange used cars, whether or not the why is understood.

Other examples include Google finding that search queries about the flu are a quicker way to predict where the flu is spreading than previous methods (such as hospital admission records); and data analytics firm Evolv learning that employees with a criminal record actually perform slightly better in the workplace than everyone else.

One of the places “unknown unknowns” are a major problem is security. There’s a reason zero-day threats are such a concern in the security community: if you don’t know what it is, how can you block it?

However, by applying Big-Data-driven pattern recognition to threat analysis, security companies are quickly closing the zero-day window.

“When a security incident happens, one question customers always have is about what happened in the moments leading up to a security event. In the past, we couldn’t always tell them, at least not right away,” said Mike Hrabik, president and CTO of managed security service provider Solutionary.

One problem with legacy security systems is that they can’t pull both structured and unstructured data into a single platform for analysis. Thus, when an event happens, it could take days or weeks of forensics work to figure out what happened.

To address this problem, Solutionary deployed a Hadoop platform from MapR and Cisco’s Unified Computing System for high-performance computing.

Hadoop has significantly increased the amount of data analysis and contextual data that Solutionary can access, which provides a greater view of attack indicators and a better understanding of attackers’ goals and techniques. This capability also enables Solutionary to more quickly identify global patterns across its client base.

If new threats are discovered, Solutionary can now detect and analyze activity across all clients within milliseconds. With the previous environment, even this seemingly simple task would be considerably more difficult and costly to do, taking as long as 30 minutes even with a preplanned, optimized environment.

“In the past, due to the sheer amount of data, our analysts were often limited to examining log data, which misses a lot. Now, our experts are unshackled. They can see the big picture and can put the incident into context,” Hrabik said.

Big-Data-driven innovation

Big Data is teaching us more about ourselves each and every day. Facebook may now know when you’ll be entering a romantic relationship before you do, simply based on your online activities. After the Super Bowl, Pornhub proudly told us what we already could have guessed: the fans of the losing team had to find other ways to entertain themselves, since the party for them was over. Less frivolously, analysis of search engine queries can now discover unknown drug side effects.

Many of the Big Data insights you’ll hear about on the news are just noise. They’re publicity tools that add little meaning to our lives.

But that probably tells us more about ourselves than about the capabilities of Big Data. (The Pornhub study showed up everywhere. Conversely, I don’t recall a single mention of the drug study; I just learned about it doing research for this story.)

Moreover, Big Data is already spawning startups looking to do everything from identifying the key influencers in social networks (in order to focus marketing efforts on those people) to understanding social value in gaming.

The main difference I see between the dotcom boom and the Big Data one is that most dotcom companies targeted consumers first (as did laptops, WiFi, smartphones, tablets, etc.). The Consumerization of IT trend has taught us that consumer dollars are often easier to capture than business ones, especially for new, unproven tech products.

Big Data is the opposite. There are no real Big Data tools for consumers. The game is in the enterprise, yet the enterprise is far more cautious than your average consumer.

What will this mean?

My guess is that it means Big Data will evolve more slowly and sanely than many preceding trends. Fewer science projects will get funded. We’ll skip a dotcom-sized bubble, and companies will soon know a heck of a lot more about us than we know about them.

History may not be repeating, but I can certainly hear plenty of rhymes.

##

Jeff Vance is a technology journalist based in Santa Monica, California. Connect with him on LinkedIn, follow him on Twitter @JWVance, or add him to your cloud computing circle on Google Plus.