Thursday, September 23, 2021

Data Belongs in the Cloud. Full Stop

In 2012, IBM made an oft-quoted claim that 90 percent of the world's data had been created in the previous two years. Even so, IBM grossly underestimated how fast data would keep growing.

In 2020, the world produced 49 quintillion bytes of data, a 1,860% increase over IBM's initial estimate.
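The 1,860% figure can be checked by working backward. Writing B for the baseline it is measured against (a symbol introduced here, not from the original), a 1,860% increase means the 2020 figure is 19.6 times B:

```latex
\frac{49 - B}{B} = 18.6
\;\Longrightarrow\;
\frac{49}{B} = 19.6
\;\Longrightarrow\;
B = \frac{49}{19.6} = 2.5 \text{ quintillion bytes}
```

The implied baseline of 2.5 quintillion bytes matches the number IBM's 2012 materials quoted alongside the 90 percent claim: 2.5 quintillion bytes created every day. If that is the intended baseline, the 2020 figure is presumably a daily volume as well, though the comparison above does not state the time unit.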

A 2012 data strategy cannot survive in today's world. The age of on-premises data warehouses, and of slowly evolving toward hybrid cloud architectures, is over. Companies no longer have time to debate their cloud strategy or incrementally evolve their architecture. Data belongs in the cloud today. Not tomorrow. Not some distant or hypothetical day in the future.

Compromises We Make for Traditional Data Warehouses

Many companies (mainly large organizations in highly regulated industries such as health care, financial services, and government) remain committed to their on-premises data warehouses, either because of what I believe are antiquated security concerns or because they have fallen for the sunk-cost fallacy. This complacency comes with compromises and missed opportunities.

First, on-prem data warehouses pose elasticity challenges, making it difficult to react to dynamic workloads. A company may need significant resources to handle its end-of-quarter workload, but it shouldn't have to pay for those resources during periods of inactivity. Furthermore, on-prem data warehouses were not built to handle today's volume of data or the many types of data that have become commonplace, especially semi-structured and unstructured data, which makes it difficult to blend disparate data sources for richer insights. When business intelligence (BI) is based on incomplete data, the insights are inaccurate at best and dangerous at worst.
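The elasticity argument comes down to simple arithmetic: fixed capacity must be sized for the busiest period and paid for all year, while elastic capacity is billed for what is used. The toy sketch below makes that concrete. The demand profile, unit price, and function names are all hypothetical, and it assumes the same unit price in both models, which real pricing rarely guarantees.

```python
"""Toy comparison of peak-provisioned vs. pay-per-use capacity.

All numbers are hypothetical and chosen only to illustrate the argument;
they are not real prices or workloads.
"""

# Hypothetical compute units needed each month; quarter-end months spike to 40.
MONTHLY_DEMAND = [10, 10, 40, 10, 10, 40, 10, 10, 40, 10, 10, 40]

# Hypothetical price of one compute unit for one month, assumed equal in both models.
UNIT_COST = 100.0


def fixed_capacity_cost(demand: list[int], unit_cost: float) -> float:
    """Capacity sized for the peak month and paid for in every month."""
    return max(demand) * unit_cost * len(demand)


def elastic_cost(demand: list[int], unit_cost: float) -> float:
    """Each month is billed only for the capacity it actually uses."""
    return sum(demand) * unit_cost


def utilization(demand: list[int]) -> float:
    """Average fraction of peak-sized capacity that is actually in use."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    print(f"fixed:   {fixed_capacity_cost(MONTHLY_DEMAND, UNIT_COST):,.0f}")
    print(f"elastic: {elastic_cost(MONTHLY_DEMAND, UNIT_COST):,.0f}")
    print(f"utilization of fixed capacity: {utilization(MONTHLY_DEMAND):.0%}")
```

With these made-up numbers, the fixed model costs 48,000 and the elastic model 24,000, because the peak-sized capacity sits only 50% utilized on average. The spikier the workload, the wider that gap. If pay-per-use unit prices are higher than owned capacity, the break-even point shifts accordingly.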

Second, on-prem data is often siloed. Database administrators (DBAs) typically limit the number of people who have access to the warehouse, because each additional user makes security, governance, and compliance harder to maintain and puts more strain on the system. Limited data access creates bottlenecks, constrains the discovery of new insights, and leads to data extracts or other workarounds that can put the business in jeopardy.

Finally, DBAs must spend an inordinate amount of time manually tuning on-prem warehouses for performance, performing routine maintenance, and working around the issues above. That time could go toward far more valuable work, like identifying new data sources and pulling them into the warehouse, or tackling more complex projects that create greater value for the organization as a whole.

The result of these three combined compromises is slow, inaccurate analytics, which has no place in today’s always-on, on-demand marketplace. 

Migrating to a Cloud Data Analytics Stack Doesn’t End at the Cloud Data Warehouse

The rate at which data is growing, the now-debunked myth that the cloud is less secure, and the compromises that must be made to maintain an on-prem warehouse are all contributing to its inevitable obsolescence. Approximately 50 percent of all corporate data is already stored in the cloud, and IndustryARC has forecasted the global market for cloud data warehouses to grow at a CAGR of 16.4% from 2020 to 2025, reaching $3.5 billion by 2025. 
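The IndustryARC forecast can be read through the standard compound-growth formula, end = start × (1 + CAGR)^years, inverted to recover the 2020 starting point it implies. A minimal sketch, assuming 2020 to 2025 counts as five annual compounding periods; the function names are hypothetical, and the derived 2020 value is arithmetic on the quoted numbers, not a figure IndustryARC reports.

```python
def project(start: float, cagr: float, years: int) -> float:
    """Compound growth: value after `years` periods at annual rate `cagr`."""
    return start * (1 + cagr) ** years


def implied_start(end: float, cagr: float, years: int) -> float:
    """Invert compound growth to recover the starting value."""
    return end / (1 + cagr) ** years


if __name__ == "__main__":
    # Assumption: 2020 -> 2025 is five annual compounding periods.
    start_2020 = implied_start(3.5, 0.164, 5)  # market size in $ billions
    print(f"implied 2020 market: ${start_2020:.2f}B")
    for year in range(2020, 2026):
        value = project(start_2020, 0.164, year - 2020)
        print(year, f"${value:.2f}B")
```

Run as-is, it reports an implied 2020 market of roughly $1.64 billion, so the forecast amounts to a bit more than doubling over five years.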

This growth reflects a simple fact: the most effective and efficient place to store data today is the cloud. Whether that means a cloud data lake or a cloud data warehouse depends on the company's needs. In addition to the standard cloud benefits (lower cost, minimal infrastructure maintenance, elasticity, redundancy, reliability, flexibility, access from anywhere, and so on), a cloud data lake or cloud data warehouse offers benefits specific to data and analytics, including:

  • Compute power that only the cloud can deliver, which makes analyses faster and advanced analytics and data science initiatives easier to run;
  • Near-unlimited capacity and the ability to handle a variety of structured, semi-structured, and even unstructured data in the case of a data lake, which allows nearly all of a company’s data to be centralized for a single source of truth;
  • The option to easily combine data sources for analysis, including public datasets, which leads to richer insights;
  • Centralized data and analytics governance, making it easier and safer to grant data access to more employees.

Unless new technology comes along that’s able to do all that and more, cloud data lakes and cloud data warehouses are here to stay, and the cloud is where your data should be.
