Big Data has become the newest and hottest means of gaining insight into your customers and finding new ways to compete in the marketplace. But comparing Big Data solutions to choose the best one is no trivial task, and once you commit, changing course becomes very difficult and expensive.
The answer to the question of which solution to choose is, indeed, remarkably complicated. The range of Big Data products has grown rapidly in the short time since Yahoo released its Hadoop distribution in 2009 and helped make Big Data practical.
In comparing Big Data solutions, the first question to ask is the most basic one: do I even need this? Joshua Greenbaum, principal analyst for Enterprise Applications Consulting, says the answer is yes, and that it's not even up for debate.
"The answer is honestly yes, you need it," he said. "There is no shortage in any modern enterprise of big analysis questions to be solved. Regardless of your business, you are living in a global economy that is much more time sensitive and customer and quality sensitive than it has ever been. If you do not think you need to be a metrics-driven competitor, you are missing a big boat."
Don't ask what product you need; ask what problem you have to solve, argues Charles King, principal analyst with Pund-IT. "Conducting best/worst case assessments of a solution's potential impact on their business and technology organizations would be a better place to begin. So start with what they hope to achieve, what's required for that purpose, what it will cost in the short and long term, how likely it is to succeed and what to do in case of failure," he said.
Merv Adrian, research vice president with Gartner, also starts with the question of whether you need it at all. "We actually say Big Data is a fundamentally meaningless term at this point. It means so many things it doesn’t mean anything anymore. Forget the Big Data idea, what's the business outcome you are looking for? What data will be needed to feed that and what tech will be needed to find it," he said.
Like all new markets, Big Data began with new entrants: startup firms that spoke Big Data. The Apache Software Foundation took over Hadoop development from Yahoo, where the technology got its start. Spark, the in-memory analytics engine that greatly speeds up processing on Hadoop, was born in the computer science department of the University of California, Berkeley.
However, unlike in other trends, the old guard got involved very fast. Usually established vendors are caught napping for years, as with Microsoft's slow embrace of the Internet in the 1990s and of the cloud in the past decade. Now startups like Cloudera, MapR, and Hortonworks are competing with, or working with, giants like IBM, Oracle, and Microsoft.
So talk to your existing vendor-partners first, say the analysts. "Quite often, businesses begin exploring Big Data solutions without tapping into the knowledge of preferred vendors and channel resellers," said King. "That's a mistake, since those companies typically offer or work with numerous Big Data technologies, and also have insights into customers' IT organizations that will impact the success of the solutions they choose."
Gartner's Adrian agrees. "If I'm an Oracle customer and want Hadoop, I don’t need someone else. If I'm [a] Microsoft customer and need a documents database, I don’t need it from someone else. In every one of those scenarios, these guys have gotten in and would be pretty eager to keep their customers," he said.
Start small, say the analysts. Don't bite off more than you can chew, and in the beginning, you can't chew much. Stick to small projects, preferably ones where you can reasonably guess the outcome.
"If you take any business problem that's been bedeviling you and break it down from a data analysis problem, that's a good starting point," said Greenbaum. "To me it's more important to understand the process of data analysis than having the perfect product technology stack in place. That's easy compared to how to get your people to understand how to view data."
Also keep in mind that no matter what technologies you bring in, and from whom, you can expect a significant amount of engineering work to get the new system to integrate well with your existing infrastructure. Getting full value means making technologies that may be very different from one another work together: your old row-and-column RDBMS might work just fine alongside a NoSQL database, or it might not.
Working in your favor, it doesn't much matter where you start with Big Data projects, because you can expand from there into new areas of the business or new types of analytics. There is no shortage of good tools covering all forms of analytics and data. Some are good at handling logs, others at purchasing information, and still others at sensor data. So you can start with one type of data source, gain experience, and expand from there into other areas.
Gartner's Big Data glossary: helpful in comparing Big Data solutions.
In comparing Big Data solutions, one issue that seems fairly settled in the minds of the analysts is cloud vs. on-premises deployment. Here, cloud computing wins hands down.
In the cloud, it's easier to migrate and easier to stand servers up and down. You're not running on your own hardware, with all the associated expenses, and you don't have to obtain complex software licenses. And it's safer, argues Greenbaum.
"The notion that on premises is greater protection than the cloud is rapidly disappearing. Look at Target, Home Depot, the U.S. government, hospitals. [Hacks are] so rampant and ubiquitous that cyber security is being breached at the corporate IT level for two reasons: bad luck, and it's hard to maintain the tech infrastructure and even harder to find top notch people at it," he said.
It becomes easy to just let Amazon, Microsoft, or IBM do it, he argues. "Companies I talk to are saying ‘I give up, I need some big ally,’ and that ally is going to be a cloud provider. Companies once loath to put something in the cloud are now looking at it as a safe haven. They've been breached and can't figure out how to stop it," he said.
"In essence, cloud is often a great place to begin exploring big data concepts and solutions," said King. "Big Data implementations often demand investments in new software and hardware, a point that makes running initial deployments in the cloud extremely attractive."
In addition, getting the most out of Big Data typically requires data scientists, a profession that's increasingly in demand, and that talent is often hard to come by. Companies like IBM, Amazon, Google and others offer a wide variety of services and support and keep data scientists on staff, which spares you one more expense.
Gartner’s Adrian notes that Big Data draws on three types of technology: compute, memory, and storage. The three don't scale in lockstep, and how each one scales depends on the type of project. But if you do it on premises, you have to buy all three, sized for peak usage. The rest of the time, the hardware sits idle.
"You don’t necessarily want to buy as your data scales. In the cloud, you pay for it as you need it. It's a lot easier to scale them separately on the cloud than it is on premises. When you buy on premises, you get all three and Big Data has peak and off-peak times. The idea I don’t have to pay for compute when I don’t need it is very appealing," he said.
"I wouldn't do Hadoop on-premises," said Greenbaum. "The beauty of Hadoop is that it's elastic. It allows infinite scale. It allows you to distribute the process and storage across clusters of systems that are ideal for the cloud. To maximize Hadoop you'd have to build a massive data center and you are back to the twentieth century model where you build out 80% of a data center you don’t need to handle peaks."
"Overprovisioning is dead," said Adrian. "There is no more dealing with peaks. This is the thing the cloud will change for everybody. You don’t need to buy 100 servers and use them for 15 minutes a year. I just rent them in the cloud and that's a big change."
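Adrian's overprovisioning argument comes down to simple arithmetic, sketched here in Python. It compares buying enough servers for the peak against renting server-hours only when they are needed. Every figure in it (a hypothetical $3,000 per owned server per year, $0.50 per rented server-hour, a 10-server baseline plus a 90-server spike lasting 15 minutes) is an illustrative assumption, not real vendor or cloud pricing.

```python
"""Back-of-the-envelope comparison: buy for peak vs. pay per use.

All prices and workload figures are hypothetical placeholders,
not quotes from any real hardware vendor or cloud provider.
"""

HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def on_prem_annual_cost(peak_servers: int, cost_per_server_year: float) -> float:
    """On premises you size for the peak, so every server is paid for all year."""
    return peak_servers * cost_per_server_year


def cloud_annual_cost(server_hours: float, price_per_server_hour: float) -> float:
    """In the cloud you pay only for the server-hours actually consumed."""
    return server_hours * price_per_server_hour


def utilization(server_hours_used: float, peak_servers: int) -> float:
    """Fraction of owned capacity that is actually busy over a year."""
    return server_hours_used / (peak_servers * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical workload: 10 servers busy all year, plus 90 more
    # for a 15-minute (0.25-hour) peak, so the peak needs 100 servers.
    baseline_servers, spike_servers, spike_hours = 10, 90, 0.25
    peak = baseline_servers + spike_servers
    used_hours = baseline_servers * HOURS_PER_YEAR + spike_servers * spike_hours

    owned = on_prem_annual_cost(peak, cost_per_server_year=3000.0)
    rented = cloud_annual_cost(used_hours, price_per_server_hour=0.50)

    print(f"On premises (sized for peak): ${owned:,.2f}")
    print(f"Cloud (pay per use):          ${rented:,.2f}")
    print(f"Owned capacity utilization:   {utilization(used_hours, peak):.1%}")
```

Under these made-up numbers the owned fleet sits roughly 90% idle, which is the waste Adrian describes. A real comparison would also have to account for storage, data transfer, and staffing, which this sketch ignores.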