Big Data as a Service: Ready for Prime Time?
On the surface, Big Data as a Service seems like a natural evolution. Yet big questions surround its nascent growth.
On-demand services are one of the big technological revolutions of the 21st century, attributable to the Internet revolution that made remote servers as close as the ones in your data center. It started with software as a service, followed by platform and infrastructure as a service, with a few stray ideas like storage as a service. The next big on-demand service may very well be Big Data as a Service.
Big Data as a Service, as defined in a lengthy research piece in Service Technology Magazine by Varun Sharma, an enterprise solutions architect, is a recommended framework “to enable information availability to consumers via reports, discovery services, etc., through the reuse of data services, promoting best practices in data management using data modeling and metadata services.”
Sharma traces the migration of data stores from mainframes to the present day, when the underlying platform for the data is no longer relevant and enterprises are moving from being application-based to being data-driven. Big Data draws on sources that range from internal metrics and sales figures to Twitter feedback, so it is both internally and externally generated.
Big Data means lots of data to process, and when it gets into the petabytes, it doesn’t make sense to move it around for processing. Should you really pull several petabytes of data into your organization to process it on a Hadoop cluster? Or take all your internal data and send it up to the cloud?
No, said Nick Heudecker, research director for information management at Gartner. “A hybrid deployment makes sense,” he said. “Doing some processing in the cloud makes sense, and doing some on premises makes sense. If you have data coming in from cloud services, you can deploy a collection management infrastructure in the cloud, do analytics on it and move it through on premises services. You don’t want to move everything to the cloud.”
He’s not sold on the concept of BDaaS being a pure cloud play the way Salesforce is a pure cloud play or SoftLayer is a pure on-demand platform provider. “Big Data as a Service is nonsense from start to finish. There are too many things to do and integrate in a pure cloud play. You’re talking enterprise data warehousing, Hadoop, RDBMS, event processing, NoSQL, in-memory databases and a variety of other things. If all of that is encompassed in Big Data, how can you realistically get that as a service?” he said.
John Myers, research director for business intelligence at Enterprise Management Associates, said the definition of Big Data is evolving, as is the definition of BDaaS. But he added that BDaaS is built on Platform as a Service (PaaS). “What we’re seeing is people want to move faster. They want to be more nimble and the PaaS argument makes a lot of sense for them,” he said.
“We’ve done research over the last few years leading with the question ‘How big is your Hadoop?’ Now we ask about how they handle existing data structure. We found people using a wide range of technologies. No one has a platform with all your data online like Salesforce, because it’s almost as difficult to manage Hadoop in the cloud as if you were installing them in your own office,” he said.
BDaaS is ideal for faster deployment because people can provision the resources they need, then deprovision them after the work is done instead of being saddled with a lot of hardware, Myers noted. “You can go to [Microsoft] Azure and set up a platform based on Hortonworks and literally say give me a 100-node cluster and they build it. Now you have this platform as a service available to you,” he said.
Services provider CSC is a proponent of BDaaS, and it also advocates the hybrid model. “It’s a world where the end state is to get as much data as you possibly can,” said Jim Kaskade, vice president and general manager of Big Data and analytics at CSC.
“There’s so much in data to go after and how you store and analyze it. I think people just want to get it all in one place they can control. So they will take external and internal data all in one place where you can get an analytic and query capability quickly. Eventually you will get to a federated model where it doesn’t matter where you store it. That’s the holy grail of the future,” he said.
CSC launched its Big Data Platform as a Service (BDPaaS) offering at the end of July, using Amazon, CSC Cloud Solutions, Red Hat OpenStack and VMware vSphere private clouds to integrate client data centers with major cloud services providers. CSC BDPaaS offers batch analytics, fine-grained and interactive analytics, and real-time streaming analytics. It promises insights from data in less than 30 days, even in the most complex hybrid environments.
HP is offering its own BDaaS, called HAVEn as a Service, a cloud-based way for enterprises to subscribe to several of its Big Data analytics products on an as-needed basis. HAVEn is HP’s brand for its Hadoop, Autonomy, Vertica and other Big Data products for processing and analyzing data.
EMC has also talked up BDaaS in a white paper in which it promotes its own products, such as Greenplum and Pivotal. Its services are built on four platforms: cloud infrastructure, data fabric, data platform as a service and analytics software as a service.
Sharma added another element to BDaaS: governance. Data in the cloud must be secured. “Data governance is a must-have, and no longer merely a good-to-have,” he wrote. Ignoring data security, data quality and data access can cost an organization dearly in money, efficiency and reputation.
He also advocates breaking the operational tiers for data flow into logical groups to allow agility via loose coupling and abstraction. Finally, he says not to focus solely on the volume, variety and complexity of data. “Consider the whole cycle from the acquisition of data to the extraction of information, and consider the hygiene factors along this path,” he wrote.
At the end of the day, it’s all about storing, analyzing and querying more data from more sources. But it’s about more than the volume of data you store. It’s also about the speed at which you can acquire that data and act on it. The whole paradigm comes into play, not just storing a lot of data and analyzing it.
Kaskade believes the future of BDaaS is something that doesn’t involve a human looking at the data. That already happens in the financial services sector, he noted, where banks alert customers to surprisingly large charges. That type of instant action will roll out everywhere. “Now a whole set of industries are applying complex event processing as well. They are acting on millions of streams coming in doing analytics in real time,” he said.
As for what companies should deploy first, he advised against building a big technology sandbox, because you might not need it. “What everybody needs to do is use case first and the questions that drive use case. That will dictate what you will buy. You might just need a SaaS app. Don’t feel like you need to make a really big investment. Solve one problem and show your C-suite what you can do before going for a bigger bite,” he said.
Photo courtesy of Shutterstock.