Vendors have taken note of this and have solutions in place that make addressing these problems easier. IBM, for instance, offers comprehensive data integration, data quality and governance capabilities, which ensure that information delivered post-analysis can be trusted. IBM extends these concepts, implemented at the database and application levels, to its big data solutions. These policies are built into its Hadoop-based platform, InfoSphere BigInsights, which allows an organization to audit who accesses the data, when and from where.
“With this we are able to provide a complete picture,” says Anjul Bhambhri, VP, Big Data Products, IBM.
Still, there is some confusion over what organizations can achieve with new technologies, as they grapple with which solution is appropriate for a given problem.
Traditional relational databases, for instance, have the ability to scale, but may be inappropriate for housing the growing volume of unstructured data. “Organizations may have to think of scaling out and scaling up differently to find a cost-effective mechanism for extracting benefit from low-value unstructured data,” says David Rajan, Director of Technology, Oracle. New technologies like Hadoop and NoSQL not only help organizations derive commercial advantage from very low-value data sets but also complement existing technologies such as database infrastructure and data warehousing platforms.
Organizations might also face start-up costs. While Hadoop is a good scale-out analytics platform, it has a single point of failure (historically, the HDFS NameNode), is inefficient from a storage perspective, and requires a whole new set of tools, training and processes. “In addition, enterprises are yet to see, post deployment, the way ahead for an open source technology to become mainstream in their architecture,” says Nick Kirsch, Director of Product Management, Isilon.
Hadoop has yet to be deployed on a wider scale and, in the view of some, is difficult to use. Using a more established technology where one can simply push a button to make it work – one backed by a service and support system – can relieve headaches for a CIO. “This is the continual challenge faced by freeware in the enterprise, but innovation is not confined to the open source community,” says Fernando Lucini, Chief Architect, Autonomy.
The Hadoop mindset is already maturing, and this year the players expect far more budget to be allocated to these projects.
This exponential data growth shows no signs of slowing down. Vanessa Alvarez, Analyst, Infrastructure & Operations at Forrester Research, points to 50% data growth in organizations. The spotlight, consequently, is shifting to current storage architecture, which could limit the potential of forward-looking Big Data solutions. The need is immediate. “Storage is expensive and accounts for 17% of the IT budget,” she says.
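To put the Forrester figure in perspective, here is a back-of-the-envelope calculation (not from the article) of what sustained 50% growth implies. The 100 TB starting capacity is a made-up number purely for illustration:

```python
import math

growth_rate = 0.50   # 50% growth per period (Forrester figure above)
start_tb = 100.0     # hypothetical starting capacity in terabytes

# Capacity after n periods of compounding: start * (1 + rate)^n
for years in range(1, 6):
    capacity = start_tb * (1 + growth_rate) ** years
    print(f"year {years}: {capacity:.0f} TB")

# Doubling time: solve (1 + rate)^t = 2  =>  t = ln 2 / ln(1 + rate)
doubling = math.log(2) / math.log(1 + growth_rate)
print(f"data doubles roughly every {doubling:.1f} years")  # ~1.7 years
```

At that rate, capacity needs double about every 1.7 years, which is why the article's sources treat storage architecture as an immediate, not a future, concern.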
The two main concerns here are affording the relentless demand that data growth places on storage infrastructure, and harnessing that growth.
Indeed, high-end storage remains underutilized and increasingly takes the form of solid state disk (SSD). “But customers don’t want to have their rarely used or cold data on the expensive SSDs. And depending on the Service Level Agreement, [they] are uncomfortable about placing their data streams on tapes,” says Steve Wojtowecz, VP Software Storage, IBM.
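The hot/cold trade-off Wojtowecz describes is usually handled with a tiering policy: recently accessed data stays on fast, expensive media while cold data migrates to cheaper tiers. A minimal sketch of such a policy follows; the tier names and day thresholds are assumptions for illustration, not any vendor's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

def choose_tier(last_access: datetime, now: Optional[datetime] = None) -> str:
    """Map an object's last-access time to a storage tier.

    Thresholds (7 and 90 days) are illustrative, not a product default.
    """
    now = now or datetime.utcnow()
    age = now - last_access
    if age < timedelta(days=7):
        return "ssd"    # hot: keep on expensive, fast media
    if age < timedelta(days=90):
        return "disk"   # warm: cheaper spinning disk
    return "tape"       # cold: cheapest, slowest tier

ref = datetime(2012, 1, 1)
print(choose_tier(datetime(2011, 12, 30), ref))  # accessed 2 days ago  -> ssd
print(choose_tier(datetime(2011, 11, 1), ref))   # accessed 61 days ago -> disk
print(choose_tier(datetime(2010, 6, 1), ref))    # cold for over a year -> tape
```

The SLA concern in the quote maps to the threshold choice: the more aggressive the demotion to tape, the cheaper the storage but the slower the worst-case retrieval.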
In this scenario, customers need to pick a storage approach that works with the software tools they want to use. Carter George, Executive Director of Dell Storage Strategy, urges caution: some software tools that come with their own built-in storage could lock users into a particular storage architecture. “This market is yet to mature and the best software for this may not have been written yet, or it may just now be getting prototyped. So I’d be hesitant to commit to a storage strategy that is tied to a specific tool,” says George.
Data analytics workloads will become critical, and organizations will prefer to move them away from commodity hardware. The focus will be on reliability, whether at the hardware layer or at the file layer, which provides RAID or mirroring so systems can recover after a failure.
“Going forward, workloads will get to a point where the level of functionality and reliability of a NAS and a SAN will be very important,” added Wojtowecz.