It’s a major challenge facing business intelligence deployments: should data be centralized or held locally? IT analyst Wayne Kernochan discusses new technologies that impact that decision.
One of the most fundamental decisions that business intelligence implementers in IT make at the beginning of every new BI initiative is whether the new data involved should be copied into a central data mart or data warehouse, or accessed where it is.
The advent of software as a service (SaaS) and the public cloud has added a new dimension to this decision: Now the BI implementer must also decide whether to move the data physically into the cloud and “de-link” the cloud and internal data stores. In fact, this decision is no longer the purview solely of the CTO – the security concerns that arise when you move data to a public service provider mean that corporate management needs to have input as well. Fundamentally, however, the choice is the same: move the one copy of the data; keep the one copy where it is; or copy the data, move one copy, and synchronize between the copies.
Almost two decades of experience with data warehousing has shown that these decisions have serious long-term consequences. On the one hand, for customer buying-pattern insights that demand a petabyte of related data, failure to copy to a central location can mean serious performance degradation – as in, it takes hours instead of minutes to detect a major cross-geography shift in buying behavior. On the other hand, attempting to stuff Big Data such as the graphics and video involved in social networking into a data warehouse means an exceptionally long lag time before the data yields insights. The firm’s existing data warehouse wasn’t designed for this data and is not fine-tuned to perform well on it; and despite the best efforts of vendors, periodically moving or replicating such massive amounts of data to the data warehouse has a large impact on the warehouse’s ability to perform its usual tasks. Above all, these consequences are long-term – applications are written whose performance depends on the existing location of the data, and redoing all of these applications in order to move to a different database engine or a different location is beyond the powers of most IT shops.
It is time to revisit the “move or stay” decision now because business intelligence users, and therefore the IT staff that supports them, face an unprecedented opportunity and an unprecedented problem. The opportunity, now available as never before to medium-sized as well as large firms, is to gather mammoth amounts of new customer data on the Web and use rapid-fire BI on that data to drive faster “customer-of-one” and agile product development and sales. The problem is that much of this new data is large (by some estimates, even unstructured data inside the organization is approaching 90% of organizational data by size), is generated outside the organization, and changes very rapidly because of fast-moving and dangerous shifts in the company’s environment: fads, sudden brand-impacting environmental concerns, and/or competitors who are playing the same BI game as you.
Read the rest about data warehouse and data virtualization at Enterprise Apps Today.