There have been many attempts to label the process of making data ready for business use. Historically, organizations considered this task to be data management. A few years ago, Tom Davenport defined the concepts of data defense and data offense, where defense was about data security, data privacy, data integrity, data quality, data compliance, and data governance.
Now, analyst firms have defined a new term altogether, a Data Fabric, with the supporting people and process efforts identified as DataOps. Together, these capture the people, process, and technology aspects of making data ready for business use.
In their book Competing in the Age of AI, Marco Iansiti and Karim Lakhani suggest an effective goal for a data fabric: to “make clean, consistent data available to applications. Allowing applications to rapidly subscribe, sample what they need, test, and deploy.” (Competing in the Age of AI, HBR Press, p. 72).
In terms of the importance attributed to this new category, analysts assert that the data fabric is a hot, emerging market focused on delivering a unified, intelligent, and integrated end-to-end platform that supports new and emerging data use cases.
Drivers for New Market Categorization
It has always seemed clear to me that business users need faster, easier means to access truly trustworthy data. This is essential to making appropriate business decisions and to implementing the edicts of digital transformation. At the same time, it has become clearer that data management itself needs to be made easier.
The problem is that integrating data from an increasingly hybrid and multi-cloud environment for real-time insights is difficult, especially when it involves large and complex data sets. At the same time, data management teams clearly need to provide faster, integrated access to data across this distributed landscape. The diversity, scale, and complexity of an organization’s data make data integration and data management design more complicated.
Clearly, traditional data integration failed to meet new business requirements, which combined real-time connected data, self-service, automation, speed, and intelligence. New and expanding data sources, batch data movement, rigid transformation workflows, growing data volume, and distribution of data across multi- and hybrid cloud environments exacerbated this issue.
Enterprises Struggle to Deliver Value From Data Lakes
As I have written multiple times, organizations are increasingly realizing that simply creating a parking lot of diverse data in the cloud, or in an on-premises data lake, won’t magically create the meaningful insights that today’s digital businesses demand. This means data lakes need to be purpose-built and the data housed within them made fit for purpose.
MIT CISR’s research has found that 51% of organizations are stuck with data silos, and another 21% have connected their data with fragile duct tape and band-aids. What is needed instead is a better means of integrating, transforming, enriching, and orchestrating data. As my research with CIOs regarding data lakes has found, unless data lakes are purpose-built, they easily become worthless data swamps.
At the same time, growing data volumes and their complexity have made delivering new business initiatives time-consuming, expensive, and challenging. And the desire to leverage data distributed across hybrid and multiple clouds only adds to this difficulty. It is time to deal with each of the following:
- Data silos
- Data sharing
- Data trustworthiness
- Self-service delivery
It is increasingly clear that traditional data integration is not meeting new business requirements from new data stakeholders. These requirements include the ability to combine real-time connected data and self-service along with a high degree of automation, speed, and intelligence. This involves dealing with new and expanding data sources, batch data movement, rigid transformation workflows, growing data volumes, and distribution of data across multi or hybrid cloud environments. While collecting data from various sources can be a straightforward exercise, enterprises struggle to integrate, process, curate, and transform data. This includes creating a comprehensive view of the customer, partner, and employee.
For these reasons, a data architecture is needed that can utilize active metadata, semantics, AI/ML algorithms, and knowledge graphs to drive augmented data integration and data management. This architecture needs to be able to handle the multiplicity of data sources and types; to address the soaring number of data silos; to accommodate a diversity of data integration requirements; and to enable business-led data modeling and schema design.
Goals for Data Fabric Category
Analysts suggest a data fabric needs to enable dynamic and augmented data integration in support of a holistic data management strategy. To this end, a data fabric needs to enable enterprises to process, transform, secure, and orchestrate data from disparate data sources. As an output, it needs to deliver a trusted and real-time view of customer and business data.
The end result should be a complete, single view of trusted business data. The impact of data quality itself cannot be emphasized enough. Dun & Bradstreet has found that “if a company has 500,000 records and 30% are inaccurate, then it would need to spend $15 million to correct the issues arising from this data versus $150,000 to prevent them.”
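To make the arithmetic behind those figures concrete, here is a quick back-of-the-envelope check (a sketch only; the per-record costs are derived from the quoted totals, not published by Dun & Bradstreet):

```python
# Back-of-the-envelope check of the quoted data-quality costs.
# The totals come from the quote above; the per-record costs are derived, not sourced.
total_records = 500_000
inaccurate_share = 0.30
inaccurate_records = int(total_records * inaccurate_share)  # 150,000 bad records

cost_to_correct = 15_000_000  # quoted cost of fixing issues after the fact
cost_to_prevent = 150_000     # quoted cost of preventing them up front

print(inaccurate_records)                    # 150000
print(cost_to_correct / inaccurate_records)  # 100.0 -> roughly $100 per bad record to remediate
print(cost_to_prevent / inaccurate_records)  # 1.0   -> roughly $1 per record to prevent
```

In other words, prevention is roughly two orders of magnitude cheaper than after-the-fact correction.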
And given this, data needs to be secured across the enterprise. Often this means, with data governance, the ability to secure data as it is discovered. In terms of business ends, a data fabric should deliver the self-service, collaborative platform that business users need while enabling real-time sharing of data with business users, customers, and partners.
What is a Data Fabric?
While there are lots of definitions for a data fabric, probably the best is that a data fabric provides a data management platform that outputs clean, consistent, integrated, and secure data. It provides data that is ready for analysis and action. To achieve this end, a data fabric needs to be agile and scalable, but even more important, it needs the ability to integrate data from disparate data sources and to secure data that is in motion or at rest.
This means that an effective data fabric integrates data sources in real time, near real time, and batch. And while doing this, it must minimize the complexity and hide heterogeneity by embodying a coherent data model that reflects business requirements rather than the details of underlying systems and sources. Doing this effectively starts by exposing metadata and creating a data catalog that complies with enterprise data policies.
As such, a data fabric needs to provide a semantic layer by automating the ingestion, curation, and orchestration of data sources. Core data fabric functionalities include data cataloging, flexible data modeling, semantic enrichment, ML-augmented data integration, and data preparation. Currently, these appear in separate, best-of-breed data management tools. However, the distinction between them is blurring and driving market confusion.
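As a rough illustration of how a catalog-backed semantic layer can hide the heterogeneity of underlying sources, here is a minimal sketch (the CatalogEntry structure and its field names are hypothetical, not any particular vendor’s API):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified catalog entry: technical metadata about a physical
# source plus the business-facing semantics a data fabric layers on top of it.
@dataclass
class CatalogEntry:
    name: str                    # business-friendly name exposed to consumers
    source_system: str           # underlying system (warehouse, SaaS app, lake, ...)
    physical_location: str       # table, topic, or file path in that system
    refresh_mode: str            # "batch", "near-real-time", or "streaming"
    business_terms: list[str] = field(default_factory=list)   # semantic tags
    quality_checks: list[str] = field(default_factory=list)   # governance policies applied

catalog = [
    CatalogEntry("Customer", "CRM", "crm.accounts", "near-real-time",
                 ["customer 360", "PII"], ["dedupe", "address validation"]),
    CatalogEntry("Orders", "ERP", "erp.sales_orders", "batch",
                 ["revenue"], ["referential integrity"]),
]

# A consumer browses by business term, not by physical source:
customer_assets = [e.name for e in catalog if "customer 360" in e.business_terms]
print(customer_assets)  # ['Customer']
```

The point of the sketch is the decoupling: consumers subscribe by business meaning, while refresh mode, physical location, and quality checks stay behind the catalog.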
Put together, a data fabric focuses on automating integration, transformation, preparation, curation, security, governance, and orchestration processes to enable analytics and insights more quickly. As a business end, a data fabric should minimize complexity by automating processes, workflows, and pipelines, generating code, and streamlining data to accelerate various business use cases.
At the same time, a data fabric should result in reusable building blocks: augmented data integration services, data pipelines, and semantics for flexible, integrated data delivery. It should optimize data management and integration technology, architecture design, and services delivered across multiple deployment and orchestration platforms. Doing this will result in faster and, in some cases, completely automated data access and sharing.
What Data Fabrics Need to Accomplish
Data fabrics should enable organizations to continuously find, integrate, catalog, and share all forms of metadata. This includes performing analytics over connected metadata in a knowledge graph. Data fabrics should also use AI/ML algorithms to deliver just-in-time data management infrastructure and processing maps for data integration use cases. In sum, this implies the ability to automate data orchestration.
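To make the connected-metadata idea more tangible, here is a toy sketch of a metadata knowledge graph (purely illustrative; the node and edge types are assumptions, not a standard or a product feature):

```python
# Toy metadata knowledge graph: nodes are datasets, columns, and business terms;
# edges connect them so simple analytics can run over the connected metadata.
edges = [
    ("dataset:crm.accounts", "has_column", "column:email"),
    ("dataset:erp.sales_orders", "has_column", "column:account_id"),
    ("column:email", "classified_as", "term:PII"),
    ("dataset:crm.accounts", "feeds", "dataset:customer_360"),
    ("dataset:erp.sales_orders", "feeds", "dataset:customer_360"),
]

def neighbors(node, relation=None):
    """Return nodes reachable from `node`, optionally filtered by edge type."""
    return [dst for src, rel, dst in edges
            if src == node and (relation is None or rel == relation)]

# Which datasets flow into the Customer 360 view?
upstream = [src for src, rel, dst in edges
            if rel == "feeds" and dst == "dataset:customer_360"]
print(upstream)  # ['dataset:crm.accounts', 'dataset:erp.sales_orders']

# Does any upstream dataset carry PII? (a simple analytic over connected metadata)
pii_sources = [d for d in upstream
               if any("term:PII" in neighbors(c, "classified_as")
                      for c in neighbors(d, "has_column"))]
print(pii_sources)  # ['dataset:crm.accounts']
```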
Clearly, this matters to organizations that want to deliver self-service capabilities and an automated data platform. In terms of business ends, these organizations typically want to create comprehensive, end-to-end data management capabilities in order to support the following types of use cases:
- Customer 360 and customer intelligence
- Real-time advanced analytics
- IoT
- 360-degree view of businesses
- Business intelligence and dashboards
- Compliance audit
Parting Words
Clearly, the data fabric is an emerging category. But its emergence demonstrates the maturing of data management as a business. It implies a graduation much like the one ERP made from best-of-breed point solutions to a single platform that addresses most customer needs for making data ready for use.
Recently, CIO Jason James said to me, “data is like oil. It has little function and much expense unless it is refined.” Data professionals can no longer just deliver raw, unprocessed data. Data has to be refined to have value for analysis or automation. The time is now for a single data platform, a data fabric, that can respond to the needs of all data consumers and users.
About the author:
Myles Suer is Principal Product Marketing Manager for Data at Dell Boomi.