Top 15 Data Warehouse Tools


A data warehouse tool is a key component in Big Data and data analytics. A data warehouse is an intelligent data repository that feeds analytics software, allowing users to data mine for competitive insight.

A data warehouse typically sits between large data storage repositories (like databases) and data marts. Data warehouses, often used with ETL tools, enable reporting and analytics of all kinds, from business intelligence to predictive analytics.
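As a rough illustration of where ETL fits in that flow, here is a minimal sketch of an extract, transform, load pipeline. The sample records, table name and function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an operational database.
RAW_ORDERS = [
    {"order_id": 1, "region": " east ", "amount": "120.50"},
    {"order_id": 2, "region": "WEST", "amount": "80.00"},
    {"order_id": 3, "region": "East", "amount": "not available"},  # bad row
]


def extract():
    """Extract: read rows from the source system (here, an in-memory list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: normalize region names, parse amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["order_id"], row["region"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run all three stages, then issue a reporting-style query."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

The final query is the kind of aggregate a reporting or BI layer would run against the warehouse once the data has been cleaned and loaded.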

Data warehouse tools play a critical role in managing today’s data analytics process in businesses across all sectors. These tools work with an array of technologies, including DBMS (database management systems), DMA (data management for analytics) and DMSA (data management solutions for analytics).

Increasingly, data warehouse tools use artificial intelligence and machine learning to boost performance. Today's enterprise-grade CDP (cloud data platform) is a complex technology that combines structured and unstructured data into formats that are useful for analytics.

Investment in data warehouse tools is growing dramatically. From its current size of approximately $21 billion, the data warehouse market is forecast to grow to $34 billion by 2025. The fastest growing players are Amazon Web Services Redshift and Microsoft Azure’s SQL Data Warehouse. These two data warehouse vendors command such a large chunk of growth that competitors divvy up a modest pool.

Indeed, these two market leaders have something in common: they are both cloud computing companies. Like much of what was exclusively in the data center, data warehouses are migrating to the cloud, though there are plenty of in-house and hybrid cloud data warehouse tools available as well. 


A data warehouse – including the assorted data warehouse tools – sits between the data source and the users who consume that data, enabling effective data mining.

How to Choose a Data Warehouse Tool

If ever a sector required serious homework before selecting a product, data warehouse tools is it – the complexity and number of variables in these tools are enormous. This is partly because the field of data analytics is seeing explosive investment, so there are always more product variations. It's also because data can be filtered and stored in so many different systems. Keep these four factors in mind:

1) Interoperability with your existing system

No matter how robust the data warehouse tool, if it’s not geared for your business’s needs, it’s not effective. Is the data warehouse tool optimized for your data types, say structured or unstructured? Does the required maintenance fit with your available staff? Key question: is this data warehouse part of a “product ecosystem” that includes your existing infrastructure?

2) Cloud or on-premise datacenter

In a sense, the cloud vs. on-premise debate is already settled: nearly all data warehouse tools are available in the cloud. So if it’s cloud-based you want, you’ll get it. But many businesses – likely yours, if you work for a large enterprise – straddle the cloud and on-premise worlds. In this case, you might want a data warehouse appliance, with software and hardware together. Or, you might seek one of the solutions that has traditionally been on-premise, like a classic DBMS (database management system) or some kind of niche analytics DBMS.

3) Cost

The issue of pricing in the data warehouse market is – like the overall data warehouse market – quite complicated. Although you’ll see pricing listed in the vendor comparison chart below, truly doing an “apples to apples” comparison by price is unlikely.

This is because the overall platforms of two vendors that both charge, say, “$0.25 per hour” are almost always quite different. One, for instance, may be very strong in machine learning, while the other has focused on, say, offering the largest number of features. It’s this context – and how it fits with your business – that determines a data warehouse's real ROI.
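To make the point concrete, here is a small sketch of how an identical hourly rate can lead to different monthly bills. The node counts, storage volume and the 730-hour month are illustrative assumptions, not vendor quotes, and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # assumption: an always-on cluster, roughly 24 * 365 / 12


def monthly_cost(hourly_rate, nodes=1, hours=HOURS_PER_MONTH,
                 storage_tb=0.0, storage_rate_per_tb=0.0):
    """Estimate a monthly bill from per-node compute plus optional storage."""
    compute = hourly_rate * nodes * hours
    storage = storage_tb * storage_rate_per_tb
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Two hypothetical vendors, both advertising $0.25 per hour.
    # Vendor A needs four nodes for the workload and bundles storage.
    vendor_a = monthly_cost(0.25, nodes=4)
    # Vendor B needs two nodes but bills 10 TB of storage separately.
    vendor_b = monthly_cost(0.25, nodes=2, storage_tb=10, storage_rate_per_tb=23.0)
    print(f"Vendor A: ${vendor_a:,.2f} per month")
    print(f"Vendor B: ${vendor_b:,.2f} per month")
```

Under these assumptions the two bills differ by more than $100 a month even though the headline rate is the same, which is why the sticker price alone says little.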

4) Overall Vendor Focus

In truth, this is the big one. For most data warehouse buyers, even important concerns like price and data type are secondary to a vendor’s overall market strategy, and how that strategy fits with their own. For instance, is your strategy so cloud-focused that you want a major cloud provider who also sells data warehouses?  Or is your datacenter so core to your infrastructure that you prefer a vendor with a legacy stance built on in-house solutions?

Top Data Warehouse Tools


Data warehouse tools like ETL work in tandem with the many elements of the data flow, enabling far more efficient data analytics.

Amazon Web Services

Amazon Redshift is a good fit for enterprises that need top-level advanced functionality, have the budget for a top tool, and have the in-house staff to manage AWS’s complex menu of solutions.

If sheer number of data warehouses in the cloud determined the market leader, then Amazon Redshift would likely be the top data warehouse tool. Furthermore, given AWS’s gigantic footprint in the cloud market – clearly it’s the top cloud company – the company has an extensive tool set to complement Redshift’s functionality.

These related data warehouse tools include Redshift Spectrum, which offers advanced serverless functionality that queries data in local storage as well as the voluminous Amazon S3. Also available is Amazon Elasticsearch, a cloud-based search engine, and Amazon Kinesis, a data analytics service. Also notable is AWS Glue, a metadata catalog service.

Pros:

  • AWS’s deep pockets and hyper-aggressive commitment to building out its solution menu mean that customers will always have a solution with top functionality.
  • Scalability is a top priority at AWS – the company is a leader even among the “hyper-scalers” (the top cloud providers)
  • The number of third party vendors and solutions built to interoperate with AWS’s offering is seemingly limitless. Any customer seeking any manner of niche data warehouse tool will almost certainly find it. 

Cons:

  • The AWS interface is known for its complexity
  • In addition to its complexity, AWS’s penchant for ceaseless growth – fast even by the standards of the tech industry – means that in-house staff will need to constantly do their homework.

Oracle

There’s no argument that Oracle is a dominant market leader in database tools, and this strength carries over into the closely related market for data warehouse tools. The vendor’s data management products are widely seen as highly capable and as sophisticated as anything on the market.

In short, for those large enterprises with a robust budget, Oracle is often the default choice – many Fortune 500 companies consider Oracle standard infrastructure. In the cloud, Oracle has caught up from a slow start – very much so. In contrast to its early questioning of the cloud, Oracle has since invested a vast sum in becoming cloud proficient – and has succeeded in this. The company’s Autonomous Data Warehouse, which is cloud-based, enables the typically lower overhead expected of cloud-based products.

Any organization using Oracle data warehouse tools will have a plethora of robust tools. These include the Oracle Big Data Management System and the well-regarded Oracle Exadata Database Machine. The company’s now extensive menu of cloud-based choices typically has a complement that is available as an on-premise data tool.

Pros:

  • Top quality functionality built into its data warehouse tools
  • Given that Oracle is so widely deployed, there is a huge cohort of accredited experts

Cons:

  • Can be expensive
  • Licensing issues generate user complaints
  • Has played catch up in the cloud, but it was a late entrant

Microsoft

Microsoft is a leader in the overall cloud market, and its data warehouse capabilities are nimble and sophisticated, making great use of the cloud’s scalability and flexibility. Important for many companies in the current multicloud climate is Microsoft’s focus on hybrid cloud; its menu of data warehouse tools functions in this heterogeneous environment, in the cloud or on-premise.

Given the flexibility of its data warehouse tool product line, Microsoft is a suitable choice for large enterprise or SMB with a significant budget. The company’s data management solutions are known for top workload management, and the ability to handle major data repositories.

Particularly noteworthy is the vendor’s deep commitment to data governance – an important aspect of data warehouse tools and one that is growing more critical over time.

The company’s data warehouse tools include Azure Databricks, Analytics Platform System, Azure HDInsight, Azure Data Factory and – an old classic – SQL Server.

Pros:

  • Superior use of cloud-based functionality for data warehouse tools
  • A large and robust complementary set of data warehouse tools
  • Known to generate significant customer loyalty

Cons:

  • Some users complained of confusion around pricing
  • Can be challenging for extensive deployments

IBM

IBM is a top choice for large enterprise customers, with a host of data warehouse and data management tools used by a major install base. The company is well regarded for its vertical data models and – particularly important for the data warehouse market – in-database analytics and real time analytics.

With IBM’s strength in the cloud, nearly all of its data warehouse solutions can be leveraged on-premise or in the cloud; the portfolio is designed for a hybrid scenario. This is key for large enterprise customers that are still migrating core workloads to the cloud. The company sells systems that feature hyper-scale capabilities for data analytics; it also enables ML algorithms to support cognitive analytics.

Historically among all the legacy tech vendors, IBM has been the most focused on interoperability and open standards – a key advantage in a data warehouse sector that includes products like Hadoop and Spark. Also notable in its product line is the managed data warehouse solution DB2 Warehouse on Cloud.

Pros:

  • Exceptionally wide array of data warehouse offerings means essentially any use case is covered
  • Geared for a multicloud and hybrid cloud scenario
  • The robust functionality expected of a vendor focused on high end Fortune 500 customers.

Cons:

  • Given the sophistication of the data warehouse tools, self service capabilities are not considered a strength.

Teradata

Founded in 1979, Teradata is clearly a well respected legacy vendor in data warehouse tools. Its product portfolio is well regarded, sophisticated and mature, and definitely geared for today’s high end enterprise that needs advanced solutions and has the budget to pay for them.

To its credit, Teradata’s data warehouse solutions are designed for today’s hybrid cloud and multicloud enterprise landscape. The company’s broad mix of tools works on-premise, in the public cloud, or using the private-public mix that is so popular with larger companies currently.

Its product offering includes the IntelliBase and IntelliFlex appliances, for enterprises that want a hardware-supported solution. Yet its true strength rests in its software offerings – sold under the flagship Vantage name – including an analytics platform supported by a SQL engine and machine learning capability. Teradata also offers in-database analytics, automation features for AI and ML, and full compute processing functions.

Pros:

  • Strong for hybrid and multicloud scenarios.
  • Automation for AI and ML is key advantage as these technologies keep growing.
  • Ease of use is considered a plus.

Cons:

  • Geared for large, advanced enterprise users instead of SMB.
  • Not known for pricing and contract flexibility.

SAP

The German software giant SAP, founded in 1972, is one of the top legacy vendors, yet no one suggests it’s stuck in the past. Even among leaders in the data warehouse tool market, SAP is a dominant player – arguably the dominant vendor.

As a forward-looking vendor, SAP incorporates machine learning and artificial intelligence functions within its flagship SAP HANA solution, a leading data warehouse tool. For instance, HANA can leverage algorithms from TensorFlow. HANA also features an in-memory database management system (DBMS).

Also impressive: SAP has built an extensive network of alliances with major cloud providers, including the leaders AWS, Azure and Google Cloud, as well as with aggressive growing players like Alibaba. In essence, SAP is a cloud company by default. Among its many cloud-based data warehouse solutions are the SAP Data Warehouse Cloud and the SAP Cloud Platform Big Data Services, which uses Hadoop.

Pros:

  • Leverages AI and ML to enhance performance in its flagship solution.
  • Uses a full network of cloud providers to offer its solutions over the cloud
  • Clearly one of the top vendors even among data warehouse leaders – HANA is known for scalability and performance.

Cons:

  • Some users have complained about customer support
  • Not an inexpensive solution

Google

Given its dominant position as a search engine, it’s no surprise that Google is known for its ability to manage data. Its data warehouse tools reflect its leading edge – really, next generation – abilities in data management for analytics.

To be sure, many of its tools were developed to service its consumer search business, yet the company has invested heavily in making them enterprise-class tools for demanding clients. Its data warehouse products are known for ease of use, as well as high performance.

Particularly noteworthy is Google’s BigQuery platform, which capably handles a wide array of advanced enterprise use cases. Organizations can leverage the machine learning functions of BigQuery, enabling them to service everything from data science queries to legacy data warehouse needs. Other important data warehouse tools include Cloud Bigtable and Google Sheets.

Pros:

  • Google cloud is growing its market share, after a slow start – suggesting still greater investment in its data warehouse tools.
  • Market-leading AI and ML capabilities.
  • Highly scalable platform, possibly one of the most scalable even among data warehouse leaders.

Cons:

  • While many Google products are intuitive, BigQuery can require a significant investment in learning.
  • Some users have voiced concern with support.

Snowflake

Founded in 2014, Snowflake is the new hip contender in the data warehouse tools arena – but one whose product portfolio holds its own with more mature contenders. In essence it’s been able to survey the competitors and launch a platform that’s more contemporary. This new player is already considered a market leader, and is known for its reasonable pricing.

The company offers an automated, cloud-based platform. Its flagship solution is a fully managed data warehouse on leading clouds like AWS and Azure. Impressively, its system is built on a separation of resources, enabling each element to respond and scale based on its own workload demands. The net effect is a robust ability to handle an ever-changing, heterogeneous infrastructure. Some customers note that Snowflake allows them to service a greater variety of use cases and more total workloads.

For data warehouses, being ACID-compliant (atomicity, consistency, isolation, durability) means transactions are processed with fewer hiccups. In addition to being ACID-compliant, Snowflake supports a wide array of formats, including Parquet and Optimized Row Columnar (ORC). The company touts a handful of key partnerships to extend its product offering.
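To show what atomicity buys in practice, here is a minimal sketch of an all-or-nothing transaction. It uses Python's built-in SQLite driver purely as a stand-in; it is not Snowflake's API, and the table and function names are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both updates were undone together


def demo():
    """Run one valid and one invalid transfer and report the final balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    first = transfer(conn, "a", "b", 60)   # valid transfer
    second = transfer(conn, "a", "b", 60)  # would overdraw account "a"
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    return first, second, balances


if __name__ == "__main__":
    print(demo())
```

The second transfer fails partway through, yet neither account is left half-updated. That is the "fewer hiccups" behavior that ACID compliance is meant to guarantee.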

Pros:

  • Automated offering takes full advantage of a mixed multicloud environment.
  • Known for rapid and highly flexible scalability
  • Good choice for cost-conscious organizations

Cons:

  • Among data warehouse vendors, Snowflake is a newer vendor, which might offer some growing pains as it continues its rapid growth.
  • While its product portfolio is impressive, the company doesn’t have the legacy experience of many of the household names in this space.

Cloudera

With strong roots in open source, Cloudera's data warehouse portfolio features some open source classics: Spark, Impala, Kudu and of course Hadoop. This sense of openness is reflected in the many environments in which Cloudera operates: on-premise and fully multicloud, private cloud (for those enterprises that still rely on them), and bare metal. Cloudera is considered strong in data lake deployments.

To its credit, Cloudera makes significant use of artificial intelligence and machine learning – it’s a key part of the company’s strategy. Workloads can be automatically scaled up, powered by ML intelligence. Cloudera adapts to a wide array of use cases in a number of industries.

The company’s Cloudera Data Platform is geared to be cost effective as it handles data from the edge, structured or unstructured data. In an innovative twist, it shifts workloads between cloud and in-house for analytics, with a key AI component. Included in the focus, along with Cloudera’s openness, is consistency across security and data governance.

In a major market move, Cloudera merged with competitor Hortonworks. The two companies combined offer synergy in terms of product portfolio – not to mention eliminating a competitor. 

Pros:

  • Cloudera’s emphasis on AI and ML should keep its data warehouse portfolio on the advanced edge.
  • Flexible and growing product portfolio.
  • Known for good training help.

Cons:

  • Cloudera has encountered some growing pains as it merged with Hortonworks, even as the two firms have combined resources.

Micro Focus

To interoperate with the changing world of multicloud computing, Micro Focus’s flagship Vertica Analytics solution is designed to work with AWS, Microsoft Azure or Google Cloud. It’s even optimized for VMware clouds, for those larger enterprises that depend on a hybrid cloud to straddle the public and private cloud. Vertica offers MPP (massively parallel processing) and is capable of hyperscaling to respond to larger analytic workloads.

Additionally, for the open source elements of your infrastructure, Vertica interconnects with Spark and Kafka and Hadoop. Built into its MPP functionality is a machine learning component that helps mine upper level analytics use cases.

Based in the UK, Micro Focus uses a 'shared nothing' architecture, which leverages a distributed computing design. One of the advantages of shared nothing is that it reduces single points of failure in a system, which increases a system’s overall availability. Micro Focus is known for its high availability – a valuable feature for any data warehouse tool.
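The shared-nothing idea can be sketched in a few lines: each node owns a disjoint slice of the data and aggregates only that slice. This is a generic illustration, not Vertica's implementation; threads stand in for separate machines, and the function names and the CRC32 partitioning rule are assumptions.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Assign each (key, value) row to exactly one node by hashing its key."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        shards[node].append((key, value))
    return shards


def local_aggregate(shard):
    """Each node sums only the rows it owns; no shared state is touched."""
    totals = defaultdict(float)
    for key, value in shard:
        totals[key] += value
    return dict(totals)


def distributed_sum(rows, num_nodes=3):
    """Partition the rows, aggregate each shard in parallel, merge the results."""
    shards = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    # A given key lives on exactly one node, so partial results never overlap.
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    print(distributed_sum(sales))
```

Because no node depends on another node's memory or disk, losing one node affects only its own shard, which is the availability benefit described above. Real systems typically add replication so that a lost shard can be served from a copy.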

Pros:

  • Interoperates with essentially any cloud platform and IT infrastructure.
  • Solid use of data science and machine learning technology.
  • Micro Focus is known to be easy to work with.

Cons:

  • Some users look for greater self service automation

Additional Market Leaders: Data Warehouse Tools

MarkLogic

Known for a clear and efficient data hub strategy – using automation – that effectively combines data from a wide array of sources and helps mine it. Definitely a top vendor.

MongoDB

Exceptionally popular choice, used by thousands of companies. One of the true success stories in open source, MongoDB is found in some of the largest data centers in the world. Strong in hybrid cloud.

Talend

With an open source element, Talend touts its portfolio as offering data integration of many types – this is a core tool to help a data warehouse. The Talend Open Studio is available as a free download.

Informatica

Founded in 1993, Informatica offers a dazzlingly comprehensive array of data warehouse tools, from cloud data integration to data engineering quality tools. The company is very strong in its ability to support cloud-based data warehouse deployments.

Arm Treasure Data

Treasure Data Customer Data Platform is designed for granular customer targeting to better personalize the marketing approach. It is used widely across an array of sectors, from gaming to retail. Impressively, its strength in IoT helps mine torrents of data from these diverse sectors. 

Data Warehouse Tool Vendor Comparison Chart

| Company | Key Data Warehouse Tool | Differentiator | Pricing |
|---|---|---|---|
| Amazon Web Services | Redshift | Part of the leading cloud computing platform | Pay as you go pricing starts at $0.25 per hour |
| Oracle | Autonomous Data Warehouse | The top legacy name in the database market | Monthly flex at $1.68 per OCPU per hour; pay as you go starts at $2.52 |
| Microsoft | Azure Synapse SQL | Many businesses are Windows/Microsoft focused | Starter service level at $1.20 per hour; lower with longer contract |
| IBM | DB2 Warehouse | Strong in-database analytics and real time analytics | $0.68 per instance hour |
| Teradata | Teradata Vantage | Designed for advanced, high end enterprise users | Pay as you go pricing; figures available upon request |
| SAP | SAP Data Warehouse Cloud | Incorporates ML and AI functionality in its data warehouse solution | Offers a pricing calculator; based on level of usage |
| Google | Google BigQuery | Versatile and powerful use of machine learning | Varies based on on-demand or flat rate; Storage API is $1.10 per TB |
| Snowflake | Snowflake on Demand | Known as the "contemporary, forward looking" data warehouse | $23/TB per month for storage, plus a fraction of a cent per second |
| Cloudera | Enterprise Data Platform | A key player for data lake deployments | Annual subscription $10,000 |
| Micro Focus | Vertica | Works with all major cloud platforms, and VMware | Available upon request |


