Wednesday, September 22, 2021

Top 10 Data Catalog Software Solutions

Data catalog software solutions are geared to handle critical data management issues. For large enterprises that have a data lake or other big data initiative, just figuring out what data the company has available can be extremely challenging. And even if organizations know what they have, they don’t always know which of their datasets are trustworthy and which are less reliable. In these situations, sometimes a data lake becomes more like a data swamp.

A data catalog tool automates the discovery of data sources throughout an enterprise’s systems. It then uses metadata management capabilities to organize that data, show the relationships among different pieces of data, enable search, and track data lineage (that is, where the data originated). Many also include data governance capabilities and enable self-service by business users, and some include glossaries so that users share a common understanding of terms.
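As a rough illustration of the ideas in the previous paragraph, the Python sketch below models a toy in-memory catalog that stores metadata, supports keyword search and traces lineage. It does not reflect how any vendor in this list implements its product; the class names, fields and sample datasets are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    """Metadata about one dataset (hypothetical structure)."""
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)
    upstream: List[str] = field(default_factory=list)  # datasets this one is derived from


class Catalog:
    """A toy catalog: register entries, search metadata, trace lineage."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Return names of entries whose name, description or tags contain the term."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in tag.lower() for tag in e.tags)
        )

    def lineage(self, name: str) -> List[str]:
        """Return every upstream dataset, nearest first (breadth-first walk)."""
        seen: List[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


# Hypothetical sample data
catalog = Catalog()
catalog.register(CatalogEntry("raw_orders", "Order events exported from the point-of-sale system", {"sales"}))
catalog.register(CatalogEntry("clean_orders", "Deduplicated orders", {"sales", "curated"}, ["raw_orders"]))
catalog.register(CatalogEntry("revenue_report", "Monthly revenue by region", {"finance"}, ["clean_orders"]))

print(catalog.search("orders"))           # datasets matching the keyword
print(catalog.lineage("revenue_report"))  # where the report's data originated
```

Commercial tools add automated discovery, access controls and ML-based recommendations on top of this kind of metadata store, but the core questions they answer are the same: what data exists, and where did it come from?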

Most modern data catalog tools rely heavily on artificial intelligence (AI) and machine learning (ML) capabilities. Often ML provides a score that shows how reliable data is. ML can also provide other types of recommendations and enable some basic analytics.

How to Select Data Catalog Software

If you are in the market for data catalog software, keep these tips in mind:

  • Think about who will use your data catalog software. Data scientists have very different needs than chief data officers (CDOs), who have very different needs than business analysts and chief financial officers (CFOs). When selecting a tool, make sure that the software or service is designed to meet the needs of your users.
  • Consider your deployment needs. Many data catalog tools are available as a cloud-based service, but that isn’t always the best option if you have unique security or compliance needs, or if your data resides in a wide range of cloud and on-premises locations.
  • Make sure it will support your workflows. Your data catalog software will need to integrate with the other software you use for your data lake, and it will need to fit in with your current processes. If you purchase a tool that will require you to make huge changes in the way you conduct day-to-day activities, you may find that it gets limited use or provides limited value.
  • Ask for a demo and detailed pricing. Some vendors offer upfront pricing, but many do not. Conduct a thorough total cost of ownership (TCO) analysis to make sure that you are comparing apples to apples when evaluating your options.

With those tips in mind, here are ten data catalog vendors you might want to consider:

Best Data Catalog Software

Alation

 

A pure-play data governance and data catalog vendor, Alation claims to be “the industry’s leading data catalog.” It boasts more than 300,000 subscribers in 64 countries, and its customers include Finnair, Blackstone, the Australian Government Department of Defence, Dow, Albertsons, PepsiCo, Expedia, PNC, American Express, General Mills and many others. It has received numerous accolades, including a KMWorld Readers’ Choice Award for 2020, Gartner Peer Insights Customers’ Choice 2020, and being named a Leader in the Forrester Wave for machine learning data catalogs in 2020.

Key features of the Alation Data Catalog include behavioral intelligence, seamless collaboration, guided navigation, data governance capabilities, and connections to popular big data and BI tools, as well as APIs and an Open Connector SDK. It also offers tailored solutions for finance, healthcare, insurance, manufacturing, retail and technology companies. In addition, it has a large partner ecosystem that includes systems integrators, resellers, and complementary technology vendors.

Pricing is available on request. The company offers a weekly live demo, as well as the opportunity to request a personalized demo.

Pros

  • Alation offers exceptionally good machine learning capabilities, including those integrated into its Behavioral Analysis Engine.
  • Enterprises also give high ratings to Alation’s collaboration capabilities, which are particularly useful for remote teams.
  • The company was one of the early pioneers of data catalog technology and continues to set itself apart as a technology leader.

Cons

  • Some customers complain about Alation’s licensing terms and say that the tool can be very expensive.
  • Others also say that the tool is sometimes buggy when the company pushes out new releases.
  • Alation hasn’t always done the best job with data lineage, but it has recently improved its capabilities in this area.

Alex Solutions

 

Australia-based Alex Solutions describes its product as a metadata management solution that incorporates both data catalog and data governance capabilities. It primarily serves enterprises in the financial services, telecommunications, retail and utilities sectors, and it has customers in Australia, Europe, America and Asia. Gartner and Forrester have named it a Leader in the market.

Alex offers data catalog, business glossary, policy-driven data quality, intelligent tagging, technology-agnostic metadata scanners and workflow capabilities. Its metadata management capabilities are useful for data inventory, enrichment, usage analysis, sensitivity detection, data lineage support, risk management and more. Its machine learning capabilities are highly advanced, and it has an intuitive interface.

Demos and pricing are available on request.

Pros

  • The Alex Solutions product has a very broad range of capabilities.
  • Customers say it is very easy to deploy and use.
  • Alex’s lineage profiling is particularly noteworthy.

Cons

  • The Alex product is harder to integrate with data science and BI solutions than some other data catalog products.
  • The product’s collaboration capabilities could be improved.
  • Some customers say that they would like to see better training for business users.

Collibra

 

Collibra aims to make data meaningful with its Data Intelligence Cloud, Platform, Data Catalog, Data Governance, Data Lineage, and Data Privacy products. Its customers include Adobe, AXA XL, DNB, Equifax, Honeywell, NetApp, AstraZeneca, Credit Suisse, Dell, T-Mobile, JPMorgan Chase, Progressive, Cigna, Lockheed Martin, Verizon and others. Forrester and Gartner have both named the company a Leader, and it has also won awards from Forbes, Business Insider, Datanami, Battery Ventures and others.

Collibra’s Data Catalog product includes wide-ranging native connectivity, ML-powered automation, data scoring and embedded data governance abilities. Data catalog capabilities are also included in the company’s flagship Data Intelligence Cloud.

Pricing and demos are available on request.

Pros

  • Users give Collibra high marks for its data intelligence capabilities and graph technology.
  • It’s a good option for large enterprises with complicated data governance needs and a wide range of data sources.
  • The company has a strong ecosystem of third-party partners and peer-support user groups.

Cons

  • When users complain about Collibra, it is usually the interface that they find the most fault with.
  • Some customers dislike the company’s recent emphasis on its cloud product because that can make it difficult to comply with security policies.
  • While the company gets good marks for service overall, a few customers have reported bad experiences with some service representatives.

Data.world

 

Like many of the other vendors included in this list, Data.world is a pure-play vendor focused on data catalog capabilities. Its customers include AP, Mirum, WPP, Yonder and others. Forrester has named it a Strong Performer, and Gartner calls it a Challenger.

A cloud-native product, Data.world offers contextual data cataloging that includes metadata, dashboards, analysis, code, docs, project management and social collaboration capabilities. It also incorporates knowledge graph technology and provides real-time integration capabilities. In addition, the company follows agile development processes, continually releasing updates and feature improvements.

Unlike many of the other data catalog vendors, Data.world posts its pricing on its website. The community version of the product is available in two tiers (free or $12 per month), and the enterprise version of the product comes in Essentials ($50,000 per year and up), Standard ($100,000 per year and up), Premier ($150,000 per year and up) and Premier Plus (custom pricing) plans. Demos and free trials are also available.

Pros

  • This product’s upfront pricing makes it easy to see how much this tool will cost and evaluate whether the value will be worth the cost.
  • This tool’s user interface is one of the easiest to use.
  • Data.world is a public benefit corporation devoted to providing social benefits, including providing free access to many datasets, supporting data journalism and making education and community resources freely available.

Cons

  • As a younger product, Data.world’s feature set is not quite as mature as some of the other offerings on the market.
  • The company does not have as many third-party partners and integrations as other vendors do.
  • It doesn’t have as much support available for customers outside the U.S.

Erwin

 

Previously part of CA Technologies and recently acquired by Quest Software, Erwin focuses on products for the Enterprise Data Governance Experience (EDGE), including business process modeling, enterprise architecture, data modeling, data catalog and data literacy. It has been in business for more than three decades, and its customers include Adecco, Balfour Beatty Construction, CenturyLink, Fidelity International, Royal Bank of Scotland and others. It has earned numerous accolades, including being named a Leader by Gartner and a Contender by Forrester.

Erwin offers Data Catalog (DC) as a standalone product or as part of its Data Intelligence suite. Benefits of Erwin DC include a centralized data governance framework, a metadata-driven approach, accelerated project delivery, increased data quality, regulatory compliance and accurate analytics. It includes a metadata manager, mapping manager, reference data managers, lifecycle manager, business data profiling and data connectors.

Prices for some Erwin products are available online, but for the Data Intelligence and Data Catalog products, you will need to contact a representative. A free trial is available.

Pros

  • Erwin offers a very broad range of data governance capabilities.
  • The company has a reputation for being very good at data modeling, which has influenced its data catalog features.
  • The vendor has a large, strong ecosystem of customers, partners and resellers.

Cons

  • Initial deployment of the product can be complex and time-consuming.
  • The product can be more expensive than some others.
  • The interface isn’t as easy to use as some other options.

Google Cloud Data Catalog

 

Part of Google Cloud’s lineup of data analytics products, Google Cloud Data Catalog is a fully managed cloud service with data discovery and metadata management capabilities. It is available in 23 different regions around the world. Google also has strategic partnerships with Collibra, Tableau and Informatica.

Key features of the service include serverless architecture, metadata as a service, a central catalog, search and discovery, schematized metadata, on-prem connectors and governance capabilities. It offers a faceted-search interface, metadata syncing and tagging, easy scalability, and integration with cloud data loss prevention (DLP), cloud identity and access management (IAM) and other Google Cloud services.

Pricing is available on the website, but it is somewhat complicated. Storage of up to 1 MiB per month is free; beyond that, it costs $100 per GiB per month. The first 1 million API calls are free, and after that they cost $10 per 100,000 API calls. New customers are also eligible for Google Cloud’s free trials and introductory credits.
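To make the usage-based pricing more concrete, the Python sketch below estimates a monthly bill from the rates quoted in this article. It is only an illustration: the function name and the example workload are hypothetical, and it assumes that the free API-call allowance resets monthly, that 1 GiB equals 1,024 MiB, and that charges are prorated rather than rounded up. Check Google’s current price list before relying on any of these figures.

```python
# Rates as quoted in this article (not fetched from Google; verify before use).
FREE_STORAGE_MIB = 1.0        # first 1 MiB of storage per month is free
STORAGE_USD_PER_GIB = 100.0   # per GiB per month beyond the free allowance
FREE_API_CALLS = 1_000_000    # assumption: the free allowance resets each month
API_USD_PER_100K = 10.0       # per 100,000 calls beyond the free allowance


def estimate_monthly_cost(storage_mib: float, api_calls: int) -> float:
    """Return an estimated monthly cost in USD (hypothetical helper)."""
    billable_mib = max(0.0, storage_mib - FREE_STORAGE_MIB)
    storage_cost = billable_mib / 1024 * STORAGE_USD_PER_GIB

    billable_calls = max(0, api_calls - FREE_API_CALLS)
    api_cost = billable_calls / 100_000 * API_USD_PER_100K

    return round(storage_cost + api_cost, 2)


# Hypothetical workload: 513 MiB of storage and 3 million API calls in a month.
# Billable: 512 MiB (0.5 GiB) of storage and 2 million API calls.
print(estimate_monthly_cost(513, 3_000_000))
```

Because both charges grow with usage, it is worth running an estimate like this against projected storage and API volumes, not just current ones.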

Pros

  • Organizations that use other Google Cloud services will appreciate the easy integration capabilities.
  • It also gets high marks for its scalability.
  • The service is very affordable.

Cons

  • The Google service does not have as many features and capabilities as the offerings from most of the pure-play vendors.
  • It does not integrate with as many data sources as some of the other data catalog options.
  • Because the pricing depends on usage, it can be difficult to estimate the total cost as needs scale.

Hitachi Vantara

 

Formed from the merger of Pentaho, Hitachi Data Systems and Hitachi Insight Group, Hitachi Vantara sells storage hardware, converged and hyperconverged infrastructure, Internet of Things (IoT) solutions, video intelligence, IT operations management software and data protection software, as well as data management and analytics software. Its Lumada Data Catalog software is part of its data management and analytics portfolio, which is used by organizations like Kaiser Permanente, Fannie Mae and Johnson Controls. Forrester named Lumada Data Catalog a Strong Performer.

Based on technology acquired when Hitachi Vantara purchased Waterline Data, Lumada Data Catalog offers very advanced machine learning and behavioral intelligence capabilities. It promises faster data tagging and includes features like AI-driven discovery, end-to-end data lineage, self-service data access, sensitive data management and cross-functional collaboration.

Pricing and a demo are available on request.

Pros

  • Lumada Data Catalog has highly advanced ML and behavioral intelligence features.
  • Analysts say its lineage analysis capabilities are among the best available.
  • Customers praise its interface as user-friendly.

Cons

  • Its data governance capabilities could be improved.
  • This product does not have as many connectors to third-party applications as some of the alternatives.
  • Its collaboration capabilities also have room for improvement.

Infogix

 

Founded in 1982 as a risk and compliance software vendor called Unitech Systems, Infogix now offers a data intelligence platform called Data360 that includes data catalog, data governance, data quality and data analytics capabilities. Its customers include Total Health Care, Swedbank, Keurig and Johnson & Johnson. Gartner named the firm a Challenger, and Forrester said it is a Contender.

Key data catalog features in Data360 include automated metadata management, machine learning-based search and discovery, smart business glossary, data lineage, impact analysis and more. It integrates with the other Data360 products, and the company also offers professional services, training and support.

A demo and pricing are available on request.

Pros

  • The complete Data360 platform has a wide range of data intelligence capabilities.
  • The tool does a very good job of helping organizations quantify the value of their business data and manage data assets.
  • The software is very easy to use.

Cons

  • Some capabilities, like analytics, are not yet as advanced as those in some competing products.
  • The tool doesn’t always do a good job of handling large data volumes in complicated enterprise environments.
  • Some customers have complained about inadequate documentation.

Informatica

 

One of the most well-known data catalog vendors, Informatica offers an Intelligent Data Platform that incorporates a wide range of cloud-based enterprise data management products. Its Data Catalog customers include Avis Budget Group, AXA XL, Eli Lilly & Co., L.A. Car, AIA Singapore and Franciscan Alliance. Gartner has named the company a Leader for the last five years in a row; Forrester ranks it as a Strong Contender.

Informatica’s Enterprise Data Catalog provides enterprise-wide data discovery capabilities that make use of AI technology. It provides a holistic view of data within its business context. Key features include AI-powered automation, data provisioning, end-to-end data lineage, integrated data quality capabilities and collaboration abilities.

Pricing is available on request. Informatica provides free trials for some of its tools, but not data catalog.

Pros

  • Organizations that use other Informatica tools often find that the company’s Data Catalog service is a good fit for their needs.
  • Its metadata intelligence engine is among the best available in the market.
  • It is highly scalable, making it a good option for organizations creating a cloud-based data lake.

Cons

  • In some situations, it can be difficult to deploy Informatica’s solutions.
  • Enterprise Data Catalog does not have the data governance capabilities built into many other data catalogs.
  • Some customers cite pricing and total cost of ownership concerns.

IBM

 

In the past, IBM offered on-premises data catalog software as part of its InfoSphere line, but it is currently focusing primarily on its cloud-based IBM Watson Knowledge Catalog. Organizations that use the service include Danske Bank and Standard Bank Group. Both Gartner and Forrester named IBM a Leader in this market, and the tool also won the Gartner Peer Insights Customers’ Choice Award for 2020.

IBM Watson Knowledge Catalog can be deployed on the IBM Cloud or on a private cloud through IBM Cloud Pak for Data. Noteworthy features include intelligent discovery recommendations, an end-to-end catalog, automated data governance, data lineage, quality scores and self-service insights. It also includes data quality, collaboration and compliance capabilities.

If you want to deploy IBM Watson Knowledge Catalog on IBM Cloud Pak for Data, you will need to contact the company for pricing. If you purchase it as a service on IBM Cloud, you have the option of three different pricing tiers: Lite (free), Standard ($300 per instance, $0.50 per capacity unit-hour, and $50 for each additional user) and Professional ($7,000 per instance, $0.40 per capacity unit-hour and $300 per additional user).
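For a rough comparison of the two paid IBM Cloud tiers, the Python sketch below applies the rates quoted above to a sample workload. It is a simplified illustration, not IBM’s billing logic: the names are hypothetical, the article does not state the billing period, and the sketch assumes the instance fee and per-user fees are each charged once per period. The free Lite tier is left out because its usage limits are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Rates for one pricing tier, as quoted in this article (verify with IBM)."""
    instance_usd: float     # flat fee per instance
    cuh_usd: float          # fee per capacity unit-hour
    extra_user_usd: float   # fee per additional user


TIERS = {
    "Standard": Tier(instance_usd=300.0, cuh_usd=0.50, extra_user_usd=50.0),
    "Professional": Tier(instance_usd=7000.0, cuh_usd=0.40, extra_user_usd=300.0),
}


def estimate_cost(tier_name: str, capacity_unit_hours: float, extra_users: int) -> float:
    """Return an estimated cost in USD for one billing period (hypothetical helper)."""
    tier = TIERS[tier_name]
    total = (
        tier.instance_usd
        + tier.cuh_usd * capacity_unit_hours
        + tier.extra_user_usd * extra_users
    )
    return round(total, 2)


# Hypothetical workload: 1,000 capacity unit-hours and 4 additional users.
for name in TIERS:
    print(name, estimate_cost(name, 1000, 4))
```

Under these assumptions, the Professional tier’s lower capacity unit-hour rate only offsets its much higher instance and per-user fees at very large compute volumes.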

Pros

  • The service integrates well with other IBM products and services.
  • The Cloud Pak for Data deployment option is often a good fit for enterprises with very large, complex ecosystems.
  • The upfront pricing for IBM Cloud deployments makes it easy to estimate costs.

Cons

  • The interface for the IBM product is not as user-friendly as some of the other options available.
  • Deployment can be difficult and time-consuming.
  • Some customers complain that the Cloud Pak for Data pricing is too high.

Data Catalog Software Comparison Table

| Data Catalog Software | Pros | Cons |
| --- | --- | --- |
| Alation | Excellent ML capabilities; good collaboration features; pioneering innovation | High pricing; buggy releases; historically weak data lineage features |
| Alex Solutions | Broad capabilities; easy to deploy and use; excellent lineage profiling | Difficult to integrate with BI and data science tools; poor collaboration capabilities; needs better training |
| Collibra | Excellent data intelligence and graph technology; good for complex environments; strong partner ecosystem | Not user-friendly; cloud-focused; occasional bad service |
| Data.world | Upfront pricing; easy to use; public benefit corporation | Immature product; limited integrations; limited support outside the US |
| Erwin | Broad data governance capabilities; good data modeling capabilities; large ecosystem | Difficult deployment; high pricing; not user-friendly |
| Google Cloud Data Catalog | Integration with other Google Cloud products; highly scalable; affordable | Limited feature set; integrates with fewer data sources; difficult to estimate price accurately |
| Hitachi Vantara | Advanced ML and behavioral intelligence; excellent lineage analysis; user-friendly | Limited data governance abilities; limited connectors; poor collaboration capabilities |
| Infogix | Wide range of features; quantifies data value; easy to use | Limited analytics capabilities; poor handling of large data sets; inadequate documentation |
| Informatica | Integration with other Informatica tools; metadata intelligence engine; highly scalable | Difficult deployment; limited data governance capabilities; high TCO |
| IBM | Integration with other IBM products; flexible deployment options; upfront pricing for cloud deployment | Not user-friendly; difficult deployment; high pricing |

 
