It's time for a Big Data reality check. The hype about new databases, servers, networks and the other ingredients in the Big Data stew, and about their ability to rapidly process and present massive amounts of data, has risen to the "peak of inflated expectations" made famous by Gartner's hype cycle. After conducting a variety of surveys this year about the reality of Big Data implementations, and asking leading consultants and vendors what they and their clients have learned, I've concluded that it's time to deflate the balloon, if only slightly.
While it is too early to declare the arrival of the next phase of the hype cycle, the inevitable trough of disillusionment, early adopters have learned lessons that should be shared with the rest of us. Here are nine Big Data lessons I've collected:
1. Focus on data management. The IT department, specifically its data architects, needs to determine where the data and applications will reside: in a single on-premises system, or together in a cloud implementation? The approach of the traditional business intelligence era of 10 years ago, trying to keep everything in one data warehouse, frequently failed in the wake of numerous data marts developed by maverick departments such as finance. Thomas Davenport, co-author of the best-selling book Competing on Analytics and the upcoming Big Data at Work, warns that "while it is good to have options, multiple Big Data implementations leads to a more complex set of IT management decisions."
Michael Driscoll, CEO of Metamarkets and a longtime observer of the analytics scene, says he's seen too many large companies attempt to put all of the data, and the processors, in one place. He warns against pursuing a "one-platform" solution foisted on the organization by the CIO. "Unified data platforms are a false promise of hope," he contends. In his view, they are too big and too complex, and they will inevitably frustrate one or more departments or business units. "A federation of services approach is best," he explains. In such an arrangement, marketing, finance and other departments can each have their own Big Data implementation.
Most of the value of Big Data comes from co-locating it with knowledgeable end users, at the edges of the organization, where they can tinker with and glean insights from their own data.
2. Don't underestimate the data integration challenges. Deriving value from Big Data usually depends on processing unstructured information: video feeds from shop floors, telematics sensors in vehicles, GPS sensors in mobile devices, speech-to-text files and a host of other bits and pieces of information that are not readily processed. "Most organizations do not have experience cleaning these types of data," notes Davenport.
IBM and others promise that their semantic analytics tools can not only parse these unstructured data types but do so fast enough to support real-time decision making. Anjul Bhambhri, IBM's vice president of Big Data within its software group, advises keeping all incoming data in its raw state to preserve information that may prove useful later, when it is processed by semantic analytics.
"One of the implicit benefits of a Big Data platform is that you can preserve the raw fidelity of the data and apply multiple types of semantic analytics tools that will filter out the appropriate noise for the specific types of analysis being performed," Bhambhri explains. "This allows the same set of raw data to be applied to multiple applications and domains, without having to model the raw data upfront."
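Bhambhri's advice amounts to what is often called "schema on read": store records untouched and let each analysis impose its own structure when it reads them. The Python sketch below is a minimal illustration under stated assumptions, not IBM's implementation: the newline-delimited JSON format, the field names and both analysis functions are hypothetical.

```python
import json
from collections import Counter

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"device": "truck-17", "type": "gps", "lat": 41.88, "lon": -87.63}',
    '{"device": "truck-17", "type": "engine", "temp_c": 104, "msg": "overheat"}',
    '{"device": "truck-22", "type": "gps", "lat": 40.71, "lon": -74.01}',
    '{"device": "truck-22", "type": "engine", "temp_c": 88, "msg": "ok"}',
]


def read_raw(lines):
    """Parse each stored line only when an analysis asks for it (schema on read)."""
    for line in lines:
        yield json.loads(line)


def last_known_positions(lines):
    """Analysis 1 (hypothetical): a fleet-tracking view that keeps only GPS fields."""
    positions = {}
    for event in read_raw(lines):
        if event.get("type") == "gps":
            positions[event["device"]] = (event["lat"], event["lon"])
    return positions


def engine_alerts(lines, threshold_c=100):
    """Analysis 2 (hypothetical): a maintenance view that counts hot engine readings."""
    alerts = Counter()
    for event in read_raw(lines):
        if event.get("type") == "engine" and event.get("temp_c", 0) > threshold_c:
            alerts[event["device"]] += 1
    return alerts


if __name__ == "__main__":
    print(last_known_positions(RAW_EVENTS))
    print(engine_alerts(RAW_EVENTS))
```

Both functions read the same RAW_EVENTS list. Neither required a table design before the data was stored, and a third analysis could be added later without reloading or remodeling anything.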
3. Start with the basics. "Many of us love to wax poetic about a utopian future where you stroll into a Best Buy, and your smartphone buzzes with a coupon for the new Microsoft Surface," comments Driscoll. "The deal is offered because it is back-to-school week and Best Buy has accessed and processed information about your household, including past Microsoft purchases." Another example of utopia: "We analytics folks love to tout our ability to predict the perfect song for your current mood or movie for your weekend." "However, we need to first focus on the basics," he adds. "Big Data should first answer questions like 'How much money did my company make yesterday?' or 'Why did our revenues spike 10 percent last Thursday?'"
4. Big Data success requires scale and speed. Hadoop can process a lot of data, but it is a batch-processing system, and in many industries real-time decision making is no longer optional. Driscoll avers that putting SQL on top of Hadoop or other Big Data stores enables organizations to actually use Big Data information in a timely way. As he puts it, "I am an advocate for ‘Know SQL’ over ‘NoSQL’."
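Driscoll's basic questions are plain SQL aggregates, which is the practical argument for a SQL layer. As a minimal sketch, assuming a hypothetical sales table with sale_date, region and amount columns, the Python code below uses the standard-library sqlite3 module purely as a stand-in for a SQL engine running over a Big Data store; the table, the data and the function names are illustrative only.

```python
import sqlite3

# Hypothetical sales data; SQLite stands in for a SQL engine layered over a Big Data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2013-09-04", "east", 1000.0),
        ("2013-09-04", "west", 1000.0),
        ("2013-09-05", "east", 1000.0),
        ("2013-09-05", "west", 1200.0),
    ],
)


def revenue_on(day):
    """'How much money did my company make on a given day?' (0 if no sales)."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE sale_date = ?", (day,)
    ).fetchone()
    return total


def change_by_region(day, prior_day):
    """'Why did revenue spike?': break the change between two days down by region."""
    return conn.execute(
        """
        SELECT region,
               SUM(CASE WHEN sale_date = :day THEN amount ELSE 0 END)
             - SUM(CASE WHEN sale_date = :prior THEN amount ELSE 0 END) AS delta
        FROM sales
        WHERE sale_date IN (:day, :prior)
        GROUP BY region
        ORDER BY delta DESC
        """,
        {"day": day, "prior": prior_day},
    ).fetchall()
```

On this toy data, revenue_on("2013-09-05") returns 2200.0, 10 percent above the prior day's 2000.0, and change_by_region("2013-09-05", "2013-09-04") attributes the whole increase to the west region. Anyone who knows SQL can ask these questions without writing batch jobs.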
5. Data visualization is important for Big Data users. Front-line professionals and others who are expected to act on Big Data insights need an easily digestible way to receive them.
6. Big Data implementations belong in the cloud, insists Driscoll, because that’s where Big Data lives. Others will disagree for various regulatory or corporate-culture reasons, but he says the data and the applications should be accessible via a software-as-a-service (SaaS) approach. One of the primary reasons for putting a Big Data program in the cloud is lesson number 7.
7. Provide Big Data access via mobile devices. The latest generation of touch-enabled smartphones and tablets is driving a huge change in the way companies operate and communicate, both internally and with their partners and customers. For IT managers, ignoring users' demand to access and manipulate Big Data information and insights on their mobile devices is a career-shortening decision.
8. Don’t stop at stage one: deploying Big Data to find cost reductions. Once the technology is proven, the next stage is to identify opportunities to improve an organization's top-line growth. "Most companies tend to start on their Big Data voyage with a goal of achieving cost savings and then expand from there to add additional forms of data and perform analytics that contributes to top line revenue," IBM's Bhambhri notes. "Once they prove out these cost savings, they start to leverage the platform to bring in other sources of data to combine with the data they have off-loaded or the models they have now moved to the big data platform." She adds that such data types include, but are not limited to, telemetry data, geospatial data, additional master data from other enterprise systems, clickstream data and social media data. Adding these data types enables "other LOBs [lines of business] in the enterprise to leverage the power and scale of the platform as well as the content in it."
Bhambhri says she has seen this pattern in more than 500 implementations across industries, including the telecommunications, automotive and finance sectors.
9. If you're not in the Big Data pool now, the lifespan of your career is shrinking by the day. "If you want to stay current and in demand, it's a good idea to buy access to a Hadoop cluster and get some experience with it, as well as scripting languages," urges Davenport. "Smart IT people start to master/explore new technologies ahead of the demand and the price/performance is so much better than data appliances and data warehouses."
Indeed, Big Data projects are underway in at least a third of the large organizations responding to various surveys I've worked on, so it's clear that the trough of disillusionment has not arrived yet. If you're in IT and not already climbing the Big Data mountain, in a few years you may find yourself technologically obsolete.