Big Data has become the newest and hottest means of gaining insight into your customers and finding new ways to compete in the marketplace. But comparing Big Data solutions to choose the best one is no trivial task, and once you commit, changing course becomes very difficult and expensive.
The answer to the question of which solution to choose is, indeed, remarkably complicated. The range of Big Data products has grown rapidly in the short time since Yahoo first released Hadoop and made Big Data possible.
Comparing Big Data Solutions: First Question
In comparing Big Data solutions, the first question you need to ask is the most basic one: do I even need this? Joshua Greenbaum, principal analyst for Enterprise Applications Consulting, says yes, it’s not even up for debate.
“The answer is honestly yes, you need it,” he said. “There is no shortage in any modern enterprise of big analysis questions to be solved. Regardless of your business, you are living in a global economy that is much more time sensitive and customer and quality sensitive than it has ever been. If you do not think you need to be a metrics-driven competitor, you are missing a big boat.”
Don’t ask what product you need; ask what problem you have to solve, argues Charles King, principal analyst with Pund-IT. “Conducting best/worst case assessments of a solution’s potential impact on their business and technology organizations would be a better place to begin. So start with what they hope to achieve, what’s required for that purpose, what it will cost in the short and long term, how likely it is to succeed and what to do in case of failure,” he said.
Merv Adrian, research vice president with Gartner, is on the side of asking: do you need this? “We actually say Big Data is a fundamentally meaningless term at this point. It means so many things it doesn’t mean anything anymore. Forget the Big Data idea, what’s the business outcome you are looking for? What data will be needed to feed that and what tech will be needed to find it,” he said.
Big Data Solutions Comparison: The Big Players
Like all new markets, Big Data began with new entrants: startup firms that spoke Big Data. The Apache Software Foundation took over Hadoop development from Yahoo, where the technology got its start. Spark, the in-memory analytics engine that greatly speeds up processing on Hadoop clusters, was born in the computer science department at the University of California, Berkeley.
However, unlike in other trends, the old guard got involved very fast. Usually incumbents are caught napping for years, as with Microsoft’s very slow embrace of the Internet in the 1990s and of the cloud in the past decade. Now, startups like Cloudera, MapR, and Hortonworks are competing with, or working with, giants like IBM, Oracle, and Microsoft.
So talk to your existing vendor-partners first, say the analysts. “Quite often, businesses begin exploring Big Data solutions without tapping into the knowledge of preferred vendors and channel resellers,” said King. “That’s a mistake, since those companies typically offer or work with numerous Big Data technologies, and also have insights into customers’ IT organizations that will impact the success of the solutions they choose.”
Gartner’s Adrian agrees. “If I’m an Oracle customer and want Hadoop, I don’t need someone else. If I’m a Microsoft customer and need a documents database, I don’t need it from someone else. In every one of those scenarios, these guys have gotten in and would be pretty eager to keep their customers,” he said.
Next Steps in Comparing Solutions: Break it Down
Start small, say the analysts. Don’t bite off more than you can chew, and in the beginning, you can’t chew much. Stick to small projects, preferably ones where you can reasonably guess the outcome.
“If you take any business problem that’s been bedeviling you and break it down from a data analysis problem, that’s a good starting point,” said Greenbaum. “To me it’s more important to understand the process of data analysis than having the perfect product technology stack in place. That’s easy compared to how to get your people to understand how to view data.”
Also keep in mind that no matter which technologies you bring in, and from whom, you can expect a significant amount of engineering work to get the new system to integrate well with your existing infrastructure. Integration is essential to getting the full value, and the technologies involved may be very different from one another. Your old row-and-column RDBMS might work just fine with a NoSQL database, or it might not.
Another point in your favor is that it doesn’t matter where you start your Big Data projects: you can expand from there into new areas of the business or new types of analytics. There is a surplus of great tools covering all forms of analytics and data. Some are good for handling logs, others are good at handling purchasing information, and still others work with sensor data. So you can start with one type of data source, gain experience, and expand from there into other areas.
Gartner’s Big Data glossary — helpful in comparing Big Data solutions.
For Big Data Solutions, Cloud is Mandatory
In comparing Big Data solutions, one issue that seems fairly settled in the minds of the analysts is cloud vs. on-premises deployment. Here, cloud computing wins hands down.
In the cloud, it’s easier to migrate and easier to stand up and stand down servers. You’re not running it on your own hardware, with all the associated expenses, and you don’t have to obtain all the complex software licenses. And it’s safer, argues Greenbaum.
“The notion that on premises is greater protection than the cloud is rapidly disappearing. Look at Target, Home Depot, the U.S. government, hospitals. [Hacks are] so rampant and ubiquitous that cyber security is being breached at the corporate IT level for two reasons – bad luck, and it’s hard to maintain the tech infrastructure and even harder to find top-notch people at it,” he said.
It becomes easy to say let Amazon or Microsoft or IBM do it, he argues. “Companies I talk to are saying ‘I give up, I need some big ally,’ and that ally is going to be a cloud provider. Companies once loath to put something in the cloud are now looking at it as a safe haven. They’ve been breached and can’t figure out how to stop it,” he said.
“In essence, cloud is often a great place to begin exploring big data concepts and solutions,” said King. “Big Data implementations often demand investments in new software and hardware, a point that makes running initial deployments in the cloud extremely attractive.”
In addition, getting the most out of big data typically requires the efforts of data scientists, a profession that’s increasingly in demand, and that talent is often hard to come by. Companies like IBM, Amazon, Google and others offer a wide variety of services and support and have data scientists on staff, which spares you one more expense.
Gartner’s Adrian notes that Big Data uses three types of technology: compute, memory, and storage. Not all three scale equally, and some scale differently depending on the type of project. But if you do it on premises, you have to buy all three, and buy for peak usage. The rest of the time, the hardware sits idle.
“You don’t necessarily want to buy as your data scales. In the cloud, you pay for it as you need it. It’s a lot easier to scale them separately on the cloud than it is on premises. When you buy on premises, you get all three and Big Data has peak and off-peak times. The idea I don’t have to pay for compute when I don’t need it is very appealing,” he said.
“I wouldn’t do Hadoop on-premises,” said Greenbaum. “The beauty of Hadoop is that it’s elastic. It allows infinite scale. It allows you to distribute the process and storage across clusters of systems that are ideal for the cloud. To maximize Hadoop you’d have to build a massive data center and you are back to the twentieth century model where you build out 80% of a data center you don’t need to handle peaks.”
“Overprovisioning is dead,” said Adrian. “There is no more dealing with peaks. This is the thing the cloud will change for everybody. You don’t need to buy 100 servers and use them for 15 minutes a year. I just rent them in the cloud and that’s a big change.”
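To make the overprovisioning argument concrete, the Python sketch below compares buying enough servers for the peak with renting only the server-hours actually used. It is a simplified model under stated assumptions: the workload shape echoes Adrian’s “100 servers for 15 minutes a year” example, while the baseline of 10 servers and the $3,000 per-server-year ownership cost are hypothetical placeholders, not real prices. Rather than declare a winner, it computes the cloud hourly price at which renting breaks even with owning.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


@dataclass
class Workload:
    peak_servers: int           # servers needed during peak bursts
    baseline_servers: int       # servers needed the rest of the year
    peak_hours_per_year: float  # total hours per year spent at peak


def used_server_hours(w: Workload) -> float:
    """Server-hours the workload actually consumes in a year."""
    off_peak = HOURS_PER_YEAR - w.peak_hours_per_year
    return w.peak_servers * w.peak_hours_per_year + w.baseline_servers * off_peak


def on_prem_utilization(w: Workload) -> float:
    """On premises you buy for peak, so owned capacity is peak_servers all year."""
    return used_server_hours(w) / (w.peak_servers * HOURS_PER_YEAR)


def on_prem_cost(w: Workload, cost_per_server_year: float) -> float:
    """Annual cost of owning enough servers to cover the peak."""
    return w.peak_servers * cost_per_server_year


def cloud_cost(w: Workload, price_per_server_hour: float) -> float:
    """Annual cost of renting only the server-hours actually used."""
    return used_server_hours(w) * price_per_server_hour


def break_even_hourly_price(w: Workload, cost_per_server_year: float) -> float:
    """Cloud hourly price at which renting costs the same as owning."""
    return on_prem_cost(w, cost_per_server_year) / used_server_hours(w)


if __name__ == "__main__":
    # Hypothetical workload: 100 servers for 15 minutes a year, 10 otherwise.
    w = Workload(peak_servers=100, baseline_servers=10, peak_hours_per_year=0.25)
    print(f"On-prem utilization: {on_prem_utilization(w):.1%}")
    # Hypothetical all-in cost of $3,000 per owned server per year.
    price = break_even_hourly_price(w, 3000)
    print(f"Break-even cloud price: ${price:.2f} per server-hour")
```

With these placeholder inputs, only about 10 percent of the purchased capacity is ever used, and renting stays cheaper than owning as long as the cloud price is below roughly $3.42 per server-hour. The model deliberately ignores data transfer, storage, and staffing costs, and it treats compute as a single resource even though, as Adrian notes, compute, memory, and storage scale differently.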
Comparing Big Data Solutions: 9 Tips
Comparing Big Data solutions with an eye toward finding the best one for your business has become critical: In recent years, big data solutions have gone from being a hot emerging technology to an essential part of everyday business.
In 2015, Gartner dropped big data from its Hype Cycle report, and explained the decision by observing, “Big data isn’t obsolete. It’s normal.”
According to the NewVantage Partners “Big Data Executive Survey 2016”, 62.5 percent of organizations now have at least one big data solution running in production. That’s more than twice as many as in 2013. And today 69.9 percent of organizations say that big data is very important or critical to their companies’ growth.
A separate survey from IDG found that the number of organizations deploying or implementing data-driven projects climbed 125 percent in 2015.
While big data solutions have become very popular, they remain very complex. And because vendors are likely to slap the “big data” label on almost any data-related product, enterprises and small businesses need to be very careful to ensure that the solutions they are purchasing will meet their needs. With data scientists in short supply, organizations also need to ensure that their existing staff has the necessary skills to make use of any software they purchase.
So how should organizations compare big data solutions for their needs? Experts offer nine key tips:
1. Focus on Your Business Objectives
Some companies have fallen into the trap of adopting a big data solution before they know how they are going to use it or why they are going to use it. To avoid this mistake, Sean Anderson, product marketing, Cloudera, recommends that companies “define and organize around a business case.” He says, “Technology can be very compelling but useless ultimately if it is not solving a business problem.”
Once everyone involved understands the use case driving the decision to purchase a solution, it becomes much easier to select the product that will be the best fit for the organization’s needs. Instead of just looking for a product that checks a lot of boxes, companies can focus on selecting a vendor that can help them accomplish their objectives.
“Many customers get into a feature and functionality bake-off, when in reality you need to think about how you are going to partner with a vendor to ensure your success in bringing an analytics offering to market,” explains Roman Stanek, CEO and founder at GoodData. He adds, “As opposed to thinking strictly about individual features, consider the wealth of expertise and knowledge a vendor can bring to your partnership.”
Stanek says that the most important question a company can ask their big data vendor is “How are you going to help me or allow me to create value from my data assets?” In addition, he advises, “Consider how you are going to productize the analytics solution to turn it into a profit center for your business. Work backwards as you would with any new product or feature you are going to introduce to your product portfolio.”
2. Look for Scalability
The fundamental issue that launched the “big data” trend is the sheer amount of data that most organizations must store and manage. According to IDC’s Digital Universe study, the volume of data stored in all of Earth’s digital systems is increasing 40 percent per year. And rapidly growing companies may experience even faster data growth.
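The 40 percent figure compounds quickly. The Python sketch below projects storage needs under constant compound growth; the 50 TB starting volume is a hypothetical example, and real growth rates vary by organization. At 40 percent a year, volume roughly doubles about every two years and grows more than fivefold in five years.

```python
import math


def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (annual_growth=0.40 means 40%)."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years for volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 50.0  # hypothetical current data volume
    rate = 0.40      # the annual growth rate cited from IDC
    for year in range(6):
        print(f"year {year}: {projected_volume(start_tb, rate, year):7.1f} TB")
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
```

A projection like this is a useful sanity check when a vendor quotes performance at today’s data volume: ask how the solution behaves at the year-three or year-five figure instead.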
With this reality in mind, Jeff Healey, director of product marketing for Vertica at Hewlett Packard Enterprise, says that it’s “important to choose a technology that can scale as you grow and position you for success – as your business changes and data insight moves beyond just experimental.”
Importantly, organizations need to make sure that the solution they choose will continue to offer the levels of performance they need as data stores get larger. “Most of the big data problems have their roots in scalability,” notes Kiran Kamreddy, senior product manager at Teradata. “The ‘big’ volumes create performance issues. A big data solution [must] scale well as the data volumes increase rapidly and deliver acceptable levels of performance.”
3. Make Sure the Big Data Solution Can Handle Many Data Types
When discussing big data, people often refer to the “three Vs”: volume, velocity and variety. Of those three, the variety of data is often the most difficult for enterprises to handle. In the NewVantage Partners report, 40 percent of those surveyed said that data variety is the primary technical driver for their big data investments, compared to just 14.5 percent who said the same about volume and 3.6 percent who selected velocity as their primary issue.
Kamreddy explains, “Big data means a large variety of data and analytical paradigms, so a big data solution must not be too rigid and be open to handle a lot of variety.” That means the solution should support both structured and unstructured data in a variety of formats, and it should support Hadoop and other common big data tools.
4. Leverage Your Existing Investments
Just because you are investing in a big data solution doesn’t mean that you will be getting rid of your existing storage, data management and analytics tools. “I think some organizations view big data as something that is radically and completely new, and they need to rip and replace their existing data investments,” says Kamreddy. “That is not completely true. It is important to evaluate their current solutions and why/how they are falling short of their big data needs and requirements and how the new solutions augment the existing ones. They should also think about the integration and configuration features of the big data solutions and how easy or difficult it is for them to integrate with existing solutions, and data types.”
Companies should also consider the burden that the new technology will place on IT staff. “Is your big data solution going to force your analysts and BI professionals to learn new tools or limit the tools they can use?” asks Anderson. “This is a surefire way to bolster low buy-in for big data projects. Be sure to choose a vendor that works with popular tooling for ETL, data visualization, data management, analytics, BI, etc. on premise and in the public cloud.”
Leveraging the resources you already have can also help to keep expenses low.
5. Think about TCO
Cost is always a factor in any technology purchase, and many big data initiatives are driven in part by a need to lower the cost of storing, managing, maintaining and analyzing data stores. However, determining the total bill can be a tricky process that involves estimating ongoing operational expenses as well as a careful analysis of the hard costs.
Anderson cautions organizations to make sure that they are calculating the complete cost of the project. “This includes the technology but also the skills, administration, and professional services/support costs.”
Kamreddy agrees, noting that overall total cost of ownership (TCO) is one of the most important considerations for selecting a big data solution. He recommends that organizations evaluate “the ROI/TCO implications of each option, in the light of solving the business case and value delivered.”
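A simple way to apply this advice is to put every cost category Anderson lists on the same multi-year basis before comparing options. The Python sketch below does that; the option names, dollar amounts, three-year horizon, and estimated business value are all hypothetical, and the model skips refinements such as discounting future costs.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One candidate solution; every figure is a hypothetical estimate in dollars."""
    name: str
    upfront: float                # one-time hard costs: hardware, licenses, setup
    annual_technology: float      # subscriptions, cloud usage, maintenance
    annual_skills: float          # training and hiring
    annual_administration: float  # ongoing operations and admin staff time
    annual_services: float        # professional services and vendor support

    def annual_recurring(self) -> float:
        """Sum of all recurring cost categories for one year."""
        return (self.annual_technology + self.annual_skills
                + self.annual_administration + self.annual_services)

    def tco(self, years: int) -> float:
        """Undiscounted total cost of ownership over `years`."""
        return self.upfront + self.annual_recurring() * years


def roi(value_delivered: float, total_cost: float) -> float:
    """Simple ROI: net gain as a fraction of total cost."""
    return (value_delivered - total_cost) / total_cost


if __name__ == "__main__":
    options = [
        Option("Option A", 200_000, 50_000, 30_000, 40_000, 20_000),
        Option("Option B", 20_000, 120_000, 20_000, 20_000, 30_000),
    ]
    value = 900_000  # hypothetical value of solving the business case over 3 years
    for opt in options:
        cost = opt.tco(years=3)
        print(f"{opt.name}: 3-year TCO ${cost:,.0f}, ROI {roi(value, cost):.0%}")
```

In this made-up comparison, Option B has the lower three-year TCO ($590,000 vs. $620,000) despite higher recurring costs, but stretch the horizon to four years and the ranking flips. That sensitivity to the time frame only shows up when all cost categories are counted over the full period.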
6. Consider Solutions Based on Open Standards
Many of the most popular big data solutions, such as Apache Hadoop and its related projects, are available under open source licenses. So organizations frequently go looking for commercially supported options that are based on these open source projects.
“Open source solutions are great because they give the ecosystem great velocity to meet new customer demands, but they are all adopted and supported to varying degrees,” notes Anderson.
Choosing a solution that is based on open standards can give organizations greater agility and the freedom to switch vendors as needs change. Healey advises organizations to look for solutions that are “based on open SQL standards with in-database analytical functions for machine learning, IOT sensor data analytics, pattern matching, and more.” He also notes, “The ability to natively integrate with open source and complementary technology helps to avoid vendor lock in and affords ultimate flexibility.”
7. Don’t Forget About Security
As big data solutions have matured, most have improved their security features. According to IDG, “Confidence in security solutions and products for company data rises, increasing from 49% in 2014 to 66% [in 2015].”
Still, organizations should make sure that any solution they purchase meets their security and compliance requirements. Anderson warns, “Most solutions address security in their own unique way but it’s very rare that a solution covers security from end to end, ensuring data is being protected during ingest, analysis, and when served via online applications.”
8. Choose a Solution That Makes Big Data Widely Accessible
Experts say that big data becomes most useful when a wide range of people within the organization can access big data insights. Look for tools that don’t require users to be experts in data science in order to use them.
“Enabling a data-driven culture across the organization means opening up these systems to self-service access and discovery of data — making data and analytics usage ubiquitous to all users,” says Ritika Gunnar, vice president of big data and analytics solutions, IBM Analytics.
“The focus is really about how to make sense and interpret value from all data – how to turn the ‘ha-dump’ of data into something meaningful and of interest,” adds Gunnar. “Organizations need to look for solutions that essentially make big data simple.”
9. Ask for References
Finally, when selecting a big data solution, it never hurts to get some customer references from the vendors you are considering. Healey recommends, “Ask vendors specifically for examples of how reference customers have started small, grown without pain, incorporated key complementary technologies, and have succeeded across multiple analytical use cases.”
If you contact other customers, they may also be able to provide advice and tips gleaned from their experience deploying a big data solution. That, in turn, can help your vendor selection and solution deployment process run more smoothly.