Data is the currency of business, but managing it can quickly get out of hand. A recent MIT study found that big data is turning into bad data, potentially costing companies up to 25% of possible revenue, because fixing bad data eats into operating expenses.
Working with vast streams of messy data is a challenge for enterprise organizations, and it's only getting tougher as more data is created and collected. That’s why data management, or data governance, is so important.
Gartner defines master data management, its term for data governance, as “a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise's official shared master data assets.”
Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.
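To make the idea of "consistent and uniform identifiers and extended attributes" concrete, here is a minimal Python sketch of consolidating customer records from two systems into master records. Everything in it is an illustrative assumption, not any vendor's method: the `MasterCustomer` class, the `consolidate` function, the `CUST-0001` identifier format, and the choice of a normalized email address as the matching key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MasterCustomer:
    master_id: str                                   # single identifier shared by all systems
    name: str
    attributes: dict = field(default_factory=dict)   # extended attributes
    source_ids: dict = field(default_factory=dict)   # system name -> local record id


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address always matches."""
    return email.strip().lower()


def consolidate(records):
    """Merge per-system records that share a normalized email (hypothetical rule)."""
    masters = {}
    for rec in records:
        key = normalize_email(rec["email"])
        master = masters.get(key)
        if master is None:
            # Assumed identifier format; a real system would define its own.
            master = MasterCustomer(
                master_id=f"CUST-{len(masters) + 1:04d}",
                name=rec["name"].strip(),
            )
            masters[key] = master
        master.source_ids[rec["system"]] = rec["local_id"]
        # Keep the first value seen for each extended attribute.
        for attr, value in rec.get("attributes", {}).items():
            master.attributes.setdefault(attr, value)
    return list(masters.values())


records = [
    {"system": "crm", "local_id": "C-17", "name": "Ada Lovelace",
     "email": "Ada@Example.com", "attributes": {"segment": "enterprise"}},
    {"system": "billing", "local_id": "9001", "name": "Ada Lovelace ",
     "email": "ada@example.com ", "attributes": {"currency": "USD"}},
    {"system": "crm", "local_id": "C-18", "name": "Alan Turing",
     "email": "alan@example.com"},
]

masters = consolidate(records)
for m in masters:
    print(m.master_id, m.name, m.source_ids, m.attributes)
```

The two records for the same customer collapse into one master record that carries both local identifiers, which is the kind of cross-system consistency the definition describes. Real products use far more sophisticated matching and survivorship rules than this.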
Data governance is primarily an on-premises solution, and most of the leaders in the field are old-guard software companies, most of which have moved to the cloud to some degree. But Gartner believes that in the coming years data governance, too, will transition to the cloud. Just how quickly is an open question.
There are numerous firms in this space, all vying for your business, so we’ve narrowed the field to 10 major players. As noted above, most are old-guard companies that have learned a few new tricks. Others are new players.
Data Governance Solutions
1) Amazon Web Services
Starting with its Simple Storage Service (S3) and building from there, AWS data governance includes Elastic MapReduce and Athena, a metered query engine for data residing in S3. For provisioning your cloud environment, AWS CloudFormation lets you use a simple text file to model and provision all the resources your applications need. Amazon CloudWatch monitors and collects metrics on all of your resources. AWS Systems Manager lets you monitor all your resources and automate common operational tasks. Finally, there is AWS OpsWorks for configuration management, particularly if you use Chef or Puppet.
2) IBM
IBM is a veteran of data governance thanks to its mainframe heritage. It offers stand-alone DBMSs, including various versions of DB2, along with IBM PureData System for Analytics, DB2 Analytics Accelerator, Hadoop through IBM BigInsights, the DataFirst Method and the IBM Watson Data Platform. Its primary governance system is IBM Information Server, which provides unified governance of your data. It helps users find and search through assets, explore relationships between assets, and search unstructured data sources as well as structured databases, and it allows for automatic discovery of new data.
3) Microsoft
Microsoft’s data governance starts with its flagship productivity suite, Office 365. It allows customers to manage the full content lifecycle, from creating or importing data to storing it and creating policies to retain and permanently delete content. That runs on top of a set of Microsoft products repurposed for the cloud, starting with SQL Server, both on-premises and in Azure. It offers a cloud data warehouse called Azure SQL Data Warehouse, aimed at the growing interest in cloud data storage, a Hadoop distribution based on Hortonworks called Azure HDInsight, and Azure Data Lake for data collection.
4) Oracle
Oracle starts with its flagship product, Oracle Database 12c, along with the Oracle Big Data Management System, Oracle Big Data SQL and Big Data Connectors. For data governance specifically, it has Oracle Enterprise Metadata Manager (OEMM) and Oracle Enterprise Data Quality (EDQ). It also offers turnkey hardware systems for its software stack through the Oracle Exadata Database Machine and Oracle Big Data Appliance, and has cloud services such as Oracle Database as a Service, Exadata Cloud Service and Big Data Cloud Service.
5) SAP
The last Oracle competitor left standing, SAP offers its IQ DBMS and HANA for in-memory DBMS and analytics. HANA has been updated to include features such as backup and disaster recovery, analytics, integration with Apache Spark and multitenancy. SAP HANA is one component of SAP Platform. Then there is SAP Master Data Governance, which consolidates and governs data from one location to ensure data quality and consistency.
6) Teradata
Teradata is known for its analytics platforms, including a DBMS, data warehouse appliances and cloud data warehouses. It connects to Hadoop through Aster Analytics and handles streaming data via Teradata Listener, all of it designed to present information through a single unified interface. Its Master Data Management product is a complete lifecycle framework for data governance.
7) Cloudera
Cloudera is one of three major Hadoop distribution companies and very successful at that. It offers Cloudera Enterprise, a Hadoop distribution with both Hadoop for batch analytics and Spark for real-time analytics, plus Cloudera Navigator for governance, and Cloudera Manager and Cloudera Director for cluster administration, both on-premises and in the cloud, with support for AWS, Azure and Google Cloud Platform.
8) Dell Boomi
Boomi, which Dell acquired in 2010, is a business unit that specializes in master data management both on-premises and in the cloud. Boomi requires little to no coding, thanks to its Boomi Process Library, which provides examples for building governance apps. It also supports PaaS vendors and connectors to Azure, AWS and Google, offers EDI connectors for connecting with partners, and supports Docker containers for DevOps methods of development.
9) SAS
SAS’s whole business is built on analytics. It offers a master data management solution called SAS Data Governance to help organizations prepare and manage both traditional and big data sources. It lets you maintain and manage data attributes through a common data model, flag changes in metadata and create snapshots, store and manage lists and hierarchies, and create reports on data health and any remediation needed.
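As a rough illustration of what "flagging changes in metadata" between snapshots can look like, here is a short Python sketch. It is not SAS code and does not use any SAS API. The `diff_metadata` function and the snapshot format (a dictionary mapping column names to type names) are assumptions made purely for the example.

```python
def diff_metadata(old, new):
    """Compare two metadata snapshots (column name -> type) and list the changes."""
    changes = []
    for column in sorted(set(old) | set(new)):
        if column not in new:
            changes.append(("removed", column, old[column], None))
        elif column not in old:
            changes.append(("added", column, None, new[column]))
        elif old[column] != new[column]:
            changes.append(("changed", column, old[column], new[column]))
    return changes


# Two hypothetical snapshots of the same table's metadata.
snapshot_v1 = {"customer_id": "int", "email": "varchar(100)", "region": "char(2)"}
snapshot_v2 = {"customer_id": "bigint", "email": "varchar(100)", "signup_date": "date"}

flags = diff_metadata(snapshot_v1, snapshot_v2)
for kind, column, before, after in flags:
    print(f"{kind:8} {column:12} {before} -> {after}")
```

A governance tool would typically attach each flagged change to a steward or a remediation task. The sketch stops at detection.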
10) TIBCO Software
TIBCO MDM specializes in offering a unified view of company data that is stored in different silos, which lets companies get a clear view of their business data and act on it quickly. TIBCO MDM offers visualization of data workflow throughout the company, allowing companies to observe processes and make improvements as needed. It is available both on-premises and in the cloud via TIBCO Clarity Cloud Edition.