The rapid proliferation of the Internet, the surge in social media activity and the digitization of oceans of information have presented the IT industry with a challenge: how to deal with it all? How to sift through it and make sense of it?
The various technologies and solutions to these questions are lumped under the hot buzzword Big Data.
And as the industry works through this challenge, experts believe 2012 could be the year in which more companies try out non-relational models for data management.
A key indicator of this trend is the move away from the previously accepted application-centric approach to one that is more data-centric. In the past, applications drove the analytics and then proceeded to implementation. The approach was to work backward to the respective information sources.
However, companies now recognize the value of data from different sources, and the demand is for applications to integrate with them. The ability to work with all kinds of data now determines the success of an application. “Partly encouraged by new technologies like Hadoop, data is now the prime component,” explains Eli Collins, Software Engineer at Cloudera.
This is a recognition that management practices based on analytics have a positive impact on businesses and can, in some circumstances, even create competitive advantage. Data drives analytics, which in turn forms the basis of a good decision. The ability to capture, manage, store, quickly analyze and disseminate data to the right sources therefore becomes vitally important.
“This change has been coming about over the past five years – some of the best companies always knew about it but they may not have had all the tools at an affordable cost,” says Dan Vesset, VP, Business Analytics, IDC.
Big Data, and Hadoop in particular, get a lot of attention from all of the industry big guns – IBM, Oracle, EMC, and Microsoft to name a few – but there is still a lot of hype surrounding them. The early adopters have already distanced themselves from others through their use of Big Data analytics.
However, for all “the cutting edge work undertaken by these companies, the majority of the enterprise customers are still in the tire kicking phase,” says Herain Oberoi, Director, Product Management, Microsoft.
The industry is optimistic, nonetheless.
All the major players have already started to evaluate how to make IT simpler, manage the data and ensure that end users such as business analysts and data scientists derive value from it.
Last year Microsoft shipped Hadoop connectors for its SQL Server database and Parallel Data Warehouse appliance, and announced Hadoop-based services for Windows Azure, Hadoop on Windows as an on-premise offering, and integration with its Business Intelligence tools. The company is currently testing a second Community Technology Preview of a Hadoop-based service for Windows Azure.
Jack Norris, VP, Marketing, MapR Technologies, added that Hadoop allows data access across commodity hardware and scales linearly, which makes it easier to handle the fast-growing machine-generated data sources. Furthermore, the inclusion of enterprise-grade capabilities like snapshots and mirroring, and integration with standard protocols like Network File System (NFS), makes Hadoop easier to integrate into the existing IT environment.
Additionally, efforts by the vendors to merge the newer Big Data solutions with traditional platforms, in both data management and data warehousing and Business Intelligence, are also under way.
For instance, unstructured data requires complex algorithms. Such code does not run inside SQL relational databases, but it can be parallelized efficiently using a MapReduce programming model. The twist here, explains Martin Willcox, Director of Product & Solutions Marketing at Teradata, is that MapReduce is a parallel programming model and the most common implementation is Hadoop.
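To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using the classic word-count task. It is an illustration only: it does not use Hadoop or any vendor's product, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call depends only on its own document, which is the
    # property a real framework exploits to spread work across machines.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key is reduced independently, so this step could also be split up.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big hype", "data drives analytics"]
    print(word_count(docs))
```

In a real deployment the framework, rather than a local loop, would distribute the map and reduce calls across a cluster and handle the shuffle over the network; the sketch only shows why the work divides cleanly.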
Hadoop, despite its many strengths, also has some weaknesses. It is a batch-oriented environment in which it is difficult to maintain a high level of user concurrency. To address this, Teradata employs the Aster MapReduce appliance, which combines Aster’s relational database and the SQL MapReduce framework that allows users to store complex unstructured data through a simple SQL interface.
“If we are going to really bring this MapReduce programming model and the ability to extract really interesting information from all this new and complex data, we need to industrialize that whole process. In the same way that we have industrialized traditional BI in the past,” added Willcox.
Oracle, too, with its Oracle Big Data Appliance, enables its customers to jump-start their big data projects through a comprehensive and pre-integrated solution. “This is well integrated with our database machine Exadata, which the customer can readily deploy and support,” explains Nick Whitehead, Business Analytics Senior Director, Oracle.
The IT department would also need to understand a separate hardware model, because the deployment of open source Hadoop on Direct-Attached Storage (DAS) does not fit into any of its traditional IT practices around backup, replication or security policy. Here, EMC integrated the technologies from Isilon and Greenplum to offer an enterprise solution, where the Isilon storage system is an IT storage play with replication, snapshots and security integration – everything an organization is familiar with, on top of the Greenplum Hadoop infrastructure.
There are other concerns in dealing with massive data sets, like data policies, data protection and access to data, which, according to Michael Chui, Senior Fellow, McKinsey Global Institute, organizations need to address to capture the full potential of Big Data.
For instance, data policies for multinationals could prove to be a challenge because regulations differ from country to country. Then there is data security and privacy. Organizations will have to consider how to protect sensitive data and tackle questions around the intellectual property rights attached to an item of data, as well as other legal questions around liability. “Addressing these issues will be the real enabler towards capturing value,” said Chui.