On the surface, Big Data as a Service seems like a natural evolution. Yet big questions surround its nascent growth.
On-demand services are one of the big technological revolutions of the 21st century, made possible by an Internet that put remote servers within as easy reach as the ones in your own data center. It started with Software as a Service, followed by Platform and Infrastructure as a Service, along with a few stray ideas like Storage as a Service. The next big on-demand service may very well be Big Data as a Service (BDaaS).
Big Data as a Service, as defined in a lengthy research piece in Service Technology Magazine by Varun Sharma, an enterprise solutions architect, is a recommended framework "to enable information availability to consumers via reports, discovery services, etc., through the reuse of data services, promoting best practices in data management using data modeling and metadata services."
Sharma traces the migration of data stores from mainframes to the present day, in which the underlying platform for the data is no longer relevant and enterprises are shifting from being application-based to being data-driven. Big Data draws on sources ranging from internal metrics and sales figures to Twitter feedback, so it is both internally and externally generated.
Big Data means lots of data to process, and when it gets into the petabytes, it doesn't make sense to move it around for processing. Should you really pull several petabytes of data into your organization to process it on a Hadoop cluster? Or take all your internal data and send it up to the cloud?
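A rough back-of-envelope calculation shows why moving petabytes is impractical. The sketch below is purely illustrative: it assumes decimal petabytes and a dedicated 10 Gbit/s link running at full line rate with no protocol overhead or contention, which real transfers will not achieve, so actual times would be longer. The function name `transfer_days` is hypothetical.

```python
def transfer_days(data_petabytes: float, link_gbps: float) -> float:
    """Idealized days to move data over a link (assumes full line rate, no overhead)."""
    bits = data_petabytes * 1e15 * 8          # decimal petabytes -> bits
    seconds = bits / (link_gbps * 1e9)        # link speed in bits per second
    return seconds / 86_400                   # seconds -> days


for pb in (1, 5):
    print(f"{pb} PB over 10 Gbit/s: {transfer_days(pb, 10):.1f} days")
```

Under these optimistic assumptions, a single petabyte ties up a 10 Gbit/s link for more than nine days, and five petabytes for well over a month.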
No, said Nick Heudecker, research director for information management at Gartner. "A hybrid deployment makes sense," he said. "Doing some processing in the cloud makes sense, and doing some on premises makes sense. If you have data coming in from cloud services, you can deploy a collection management infrastructure in the cloud, do analytics on it and move it through on premises services. You don't want to move everything to the cloud."
He’s not sold on the concept of BDaaS being a pure cloud play the way Salesforce is a pure cloud play or SoftLayer is a pure on-demand platform provider. "Big Data as a Service is nonsense from start to finish. There are too many things to do and integrate in a pure cloud play. You're talking enterprise data warehousing, Hadoop, RDBMS, event processing, NoSQL, in-memory databases and a variety of other things. If all of that is encompassed in Big Data, how can you realistically get that as a service?" he said.
John Myers, research director for business intelligence at Enterprise Management Associates, said the definition of Big Data is evolving, as is the definition of BDaaS. But he added that BDaaS is built on Platform as a Service (PaaS). "What we're seeing is people want to move faster. They want to be more nimble and the PaaS argument makes a lot of sense for them," he said.
"We've done research over the last few years leading with the question 'How big is your Hadoop?' Now we ask about how they handle existing data structure. We found people using a wide range of technologies. No one has a platform with all your data online like Salesforce, because it's almost as difficult to manage Hadoop in the cloud as if you were installing them in your own office," he said.
BDaaS is ideal for faster deployment because people can provision the resources they need, then deprovision them when the work is done rather than being saddled with a lot of hardware, Myers notes. "You can go to [Microsoft] Azure and set up a platform based on Hortonworks and literally say give me a 100-node cluster and they build it. Now you have this platform as a service available to you," he said.
Services provider CSC is a proponent of BDaaS and it also advocates the hybrid model. "It's a world where the end state is to get as much data as you possibly can," said Jim Kaskade, vice president and general manager of Big Data and analytics at CSC.
"There's so much in data to go after and how you store and analyze it. I think people just want to get it all in one place they can control. So they will take external and internal data all in one place where you can get an analytic and query capability quickly. Eventually you will get to a federated model where it doesn't matter where you store it. That's the holy grail of the future," he said.
CSC launched its Big Data Platform as a Service (BDPaaS) at the end of July, using Amazon, CSC Cloud Solutions, Red Hat OpenStack and VMware vSphere private clouds to integrate client data centers with major cloud services providers. CSC BDPaaS offers batch analytics, fine-grained and interactive analytics, and real-time streaming analytics. It promises insights from data in less than 30 days, even in the most complex hybrid environments.