ETL tools play a key role in today’s data-driven enterprises. Every major big data analytics project requires collecting data from disparate sources, getting it into the right format and then loading it into the analytics software. So it’s no surprise that ETL, short for “Extract, Transform, Load,” is used daily.
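To make the three steps concrete, here is a minimal sketch in Python using only the standard library. The sample data, the field names and the in-memory SQLite database standing in for "the analytics software" are all assumptions made for illustration; commercial ETL tools do far more than this.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources: a CSV export and a JSON feed
# that describe the same kind of record with different field names.
CSV_SOURCE = "customer,amount_usd\nAcme,120.50\nGlobex,80\n"
JSON_SOURCE = '[{"name": "Initech", "total": "42.25"}]'


def extract():
    """Extract: read the raw records from each source as-is."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both layouts onto one (customer, amount) schema."""
    unified = [(r["customer"], float(r["amount_usd"])) for r in csv_rows]
    unified += [(r["name"], float(r["total"])) for r in json_rows]
    return unified


def load(rows, conn):
    """Load: write the unified rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for an analytics database
    load(transform(*extract()), conn)
    return conn


conn = run_etl()
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(f"Loaded total: {total:.2f}")
```

Keeping extract, transform and load as separate functions mirrors how the steps are usually described, and it makes each one easy to replace when a source or target changes.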
This ETL process can be a major headache for enterprises. In one Forrester report, customers said that up to 80 percent of their data warehouse workloads were ETL jobs. In addition to consuming valuable computing resources, this work also requires a lot of time on the part of business analysts and highly paid data scientists. As a result, both enterprises and vendors have been looking for ways to streamline and simplify the ETL process.
In the early days of the big data trend, most ETL software solutions were standalone products that really only did one thing: ETL jobs. Today, however, the market has evolved and most ETL products are part of larger data integration solutions. These solutions now boast highly advanced capabilities and aim to make ETL as easy as possible.
Unsurprisingly, the market for ETL software is booming. According to MarketsandMarkets, “The data integration market is expected to grow from $6.44 billion in 2017 to $12.24 billion by 2022, at a compound annual growth rate (CAGR) of 13.7 percent.”
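For readers who want to check the arithmetic behind that forecast, the short sketch below recomputes the growth rate from the two quoted dollar figures. The function name `cagr` is our own, and the five-year span is an assumption based on the 2017 and 2022 endpoints in the quote.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.137 means 13.7 percent)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $6.44 billion (2017) to $12.24 billion (2022).
rate = cagr(6.44, 12.24, 2022 - 2017)
print(f"Implied CAGR: {rate:.1%}")
```

The result should land very close to the quoted 13.7 percent, which suggests the forecast figures are internally consistent.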
Many vendors have jumped into the market with ETL and data integration solutions. The list below features some of the largest ETL vendors and their key products.
- Informatica Intelligent Data Platform
- Dell Boomi
- Denodo Platform
- IBM InfoSphere DataStage
- Microsoft Azure Data Factory
- SAS Data Management
- SAP Data Services
- Oracle Data Integration Platform
- Talend Data Management Platform
- Microsoft Azure Data Factory
Informatica Intelligent Data Platform
Gartner named Informatica a Leader in its Magic Quadrant for Data Integration Tools, giving the vendor its highest overall score. For ETL, the company offers its Intelligent Data Platform, which includes its Advanced Data Transformation, B2B Data Exchange, Connectors, Integration Hub, PowerCenter (automation) and Real-Time Integration products.
Informatica’s biggest strengths are its large customer base and its extensive product line. Gartner estimated that 9,000 organizations use Informatica’s data integration products and that 75 percent of the contract review calls it receives in relation to data integration mention Informatica. The company also has an ecosystem with more than 500 partners, and it offers training, certification, services and support.
However, the company has a reputation for high prices. Also, Informatica offers so many different products that it can be confusing for customers.
Dell Boomi
Purchased by Dell in 2010, Boomi offers an integration platform as a service (iPaaS) that includes ETL capabilities. Gartner named Boomi a Leader in its Magic Quadrant for Enterprise Integration Platform as a Service and noted that the company experienced more than 50 percent revenue growth in 2017, adding 1,500 clients.
The Boomi platform offers high flexibility with the ability to integrate both cloud-based and on-premises data and applications, and it supports real-time, event-based and batch processing. The company says that it has more than 7,500 customers, including Novartis, LinkedIn, and Kelly-Moore Paints.
When it comes to weaknesses, Gartner cautions that some users, particularly those who describe themselves as “citizen integrators,” found the Dell Boomi interface difficult to use.
Denodo Platform
Data virtualization is, technically speaking, different from ETL, but it is a closely complementary solution. ETL applications copy, prep and standardize huge data sets – physically, in real time. Data virtualization, in contrast, can federate (that is, provide unified access to) various data sets – and entire data warehouses – and provide a virtual data offering to assist the work of ETL.
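The difference between copying data and virtualizing it can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not how Denodo or any other product is implemented: the two "source systems" are plain in-memory lists, and the names `etl_copy` and `virtual_view` are hypothetical.

```python
# Hypothetical source systems, modeled as in-memory lists of records.
crm_orders = [{"customer": "Acme", "amount": 100.0}]
web_orders = [{"customer": "Globex", "amount": 50.0}]
SOURCES = [crm_orders, web_orders]


def etl_copy(sources):
    """ETL-style: physically copy every record into a new store at load time."""
    return [dict(record) for source in sources for record in source]


def virtual_view(sources):
    """Virtualization-style: copy nothing; read the sources when queried."""
    def query():
        return [record for source in sources for record in source]
    return query


warehouse = etl_copy(SOURCES)   # snapshot taken now
orders = virtual_view(SOURCES)  # no data moved yet

# A new order arrives in one source after the ETL job has run.
web_orders.append({"customer": "Initech", "amount": 25.0})

print(len(warehouse))  # the copy is unchanged until the next ETL run
print(len(orders()))   # the view reflects the sources at query time
```

The copied warehouse is fast to query but goes stale between loads, while the virtual view is always current but depends on the sources being reachable at query time. That trade-off is one reason the two approaches are often used together.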
This concept of data virtualization is at the core of Denodo’s approach. Denodo Platform offers logical data fabric functionality. In essence, a logical data fabric uses data virtualization to transcend the limitations of physically stored data, making it more widely – and more efficiently – available. Additionally, the Denodo solution enables companies to automate aspects of the logical data fabric. A major upside of Denodo is its extensive list – literally hundreds – of system integrators, including plenty of blue chip vendors. Furthermore, the company is well regarded for its support and customer training.
IBM InfoSphere DataStage
IBM describes InfoSphere DataStage as “a leading ETL platform that integrates data across multiple enterprise systems.” Named a Leader by Gartner, it runs on-premises or in the cloud and on either distributed or mainframe systems. IBM also offers a number of other data integration tools, including many that are sold under the InfoSphere brand name and its iPaaS called Application Integration Suite on Cloud.
Like Informatica and Dell Boomi, IBM enjoys a very large customer base. But unlike many of the other products on this list, DataStage is focused primarily on ETL, although it integrates with other IBM tools and the Hadoop ecosystem. IBM offers flexpoint licensing.
On the downside, Gartner said that some customers find it difficult to understand IBM’s pricing, and that they often need technical help to use DataStage with other IBM data and analytics tools.
Microsoft Azure Data Factory
In a classically Microsoft effort to appeal to the widest possible client base, the software giant’s Azure Data Factory caters to businesses ranging from large enterprises to SMBs. Key components of Microsoft’s data integration portfolio include SQL Server Integration Services (SSIS) and the Azure Synapse tool. The latter helps provide a holistic, cloud-based data environment. Naturally, that cloud is Azure (the Microsoft solution is sometimes faulted for not embracing other clouds). In keeping with the times, the data integration toolset supports a low-code/no-code approach.
Not surprisingly given Microsoft’s long tenure as a software vendor, the company’s data integration is well regarded for usability. It also is considered relatively fast to deploy, and of course it works well with the rest of Microsoft’s extensive portfolio of data products. The total cost of the solution (not just purchase-license price, but associated upkeep and monitoring) is reported to be reasonable. Microsoft is bulking up its capabilities in metadata functionality – which should be a significant plus for customers of its data integration applications.
SAS Data Management
Known for its business intelligence and analytics solutions, SAS calls its Data Management Platform “the industry’s leading integration technology.” It features support for Hadoop and legacy data platforms, ETL capabilities, a drag-and-drop GUI, data federation, data governance, metadata management and much more. It has also been named a Leader in the Gartner Magic Quadrant.
Strengths of the Data Management Platform include its ability to integrate data from a very broad array of sources. It has strong support for the Hadoop ecosystem, including Impala, Pivotal HAWQ, MapReduce, Pig and Hive, and it promises a truly integrated solution that is easy for business users to utilize without assistance from IT.
The biggest weakness of the SAS solution is its high price.
SAP Data Services
SAP Data Services promises universal data access, native-text data processing, intuitive business user interfaces, data quality dashboards, simplified data governance, high performance and high scalability. It deploys on-premises only, and it includes data quality as well as data integration capabilities. The company also offers a number of other on-premises and cloud-based data integration and analytics tools.
Customers who use other SAP software will likely be attracted to its data management and ETL products because they make it easy to incorporate data from other SAP applications. And its clean interface provides a good experience for business users.
On the con side, SAP, like many of the vendors in this list, gets negative reviews for its pricing. SAP also has such a large product lineup that customers sometimes get confused about which products they really need.
Oracle Data Integration Platform
Part of Oracle’s relatively new Autonomous Cloud services, Data Integration Platform Cloud offers artificial intelligence and machine learning capabilities, as well as automated data migration and data warehouse building. According to Oracle it is “self-driving, self-securing and self-repairing.” It can access both Oracle and non-Oracle data, and it promises an intuitive user experience.
Oracle’s claim to fame in this space is really its autonomous capabilities, which make the solution easier to manage. The company also promises zero downtime for its data migration and ETL capabilities. Its cloud pricing is also fairly straightforward.
According to Gartner, however, some customers have complained about the solution’s lack of documentation. Customers have also said that it is difficult to find skilled personnel to use the solution because it is so new to the market.
Talend Data Management Platform
In addition to ETL capabilities, the Talend Data Management Platform offers data quality, data governance, data profiling and more. It claims that it can help customers “develop and deploy end-to-end data integration jobs 10 times faster than hand coding, at 1/5th the cost of competitors.” It was named a Leader and received the highest score in the Forrester Wave: Big Data Fabric, Q2 2018 report. It also has an open source solution called Talend Open Studio.
Talend’s strengths include its strong support for Hadoop, Spark, containers and serverless computing. Thanks to its open source roots, it also costs less than some competing solutions, and it has an Eclipse-based development environment that the company says helps customers develop and deploy integrations 10 times faster than competing solutions.
However, the Gartner report noted that some customers have reported stability issues with Talend software, and it is difficult to find staff or support partners who are familiar with the solution.
Microsoft Azure Data Factory
Microsoft offers a fully managed, cloud-based ETL service called Azure Data Factory. It has connectors for more than 70 different data services, features an easy-to-use drag-and-drop interface, supports multiple programming languages and is highly scalable. Customers include Pier1 Imports, Rockwell Automation and the Real Madrid soccer team.
One of the biggest strengths of the Microsoft service is its low total cost of ownership. Also, the interface should be familiar to users of other Microsoft software. It integrates easily with other Azure services, and it boasts strong security.
On the other hand, Gartner cautioned that the solution is not yet well known because it is relatively new and untested.
Additional Market Leaders: ETL Software
Several other ETL vendors that were not included in our list above are also worthy of mention. They include Information Builders Data Management Platform, Hitachi Vantara Pentaho Data Integration, Alooma Platform and CloverDX.