Monday, May 27, 2024

Top 5 Open Source Data Integration Tools

Businesses seeking to improve their data integration know that today’s data integration software performs complex tasks. These tools enable applications to access data associated with other applications and to migrate data from one platform to another, transforming it as necessary. Given this sophistication, selecting the best data integration tool is far from easy.

Adding to the complexity of the selection process, the category itself has broadened. Early data integration tools focused on extract, transform and load (ETL) processes. However, most of today’s data integration products have much more advanced capabilities and can generally connect both on-premises and cloud-based data. Many also integrate with other data management products, such as business intelligence (BI), big data analytics, master data management (MDM), data governance and data quality solutions.
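For readers new to the term, the three ETL steps can be sketched in a few lines of Python. This is only an illustration using the standard library, not the approach of any tool on this list; the sample CSV data, the customers table and the function names are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from another application.
SOURCE_CSV = """id,name,signup_date
1, Alice ,2024-01-05
2,BOB,2024-02-10
3,,2024-03-15
"""


def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, and drop rows with no name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if name:
            cleaned.append((int(row["id"]), name, row["signup_date"]))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text=SOURCE_CSV):
    """Run the three steps in order against an in-memory SQLite target."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT id, name, signup_date FROM customers ORDER BY id"):
        print(row)
```

The data is cleaned before it reaches the target database, which is the defining trait of ETL. Dedicated tools add connectors, scheduling and visual design on top of this basic pattern.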

To help sort through the complex options, the list below highlights five of the best open source data integration tools, based on vendor profile and completeness of their data integration tool set.

Talend Open Studio

Talend includes file management, data flow orchestration and both ETL and ELT capabilities. It connects with most popular data sources, including Oracle database, Teradata, Microsoft SQL Server, Salesforce, NetSuite, SAP, Microsoft Dynamics, Dropbox, Amazon S3, JIRA and more. For developers, it has an Eclipse-based job design environment that is easy to use. It runs on Windows, macOS and many popular flavors of Linux, including CentOS, Red Hat, SUSE and Ubuntu.
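The difference between ETL and ELT is where the transformation runs: in ELT, raw data is loaded first and then transformed inside the target database. The following Python sketch illustrates that ordering with an in-memory SQLite database. It is not Talend code or Talend's API; the sample records, table names and the run_elt function are hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " widget ", "19.99"),
    ("1002", "GADGET", "5.00"),
    ("1003", "widget", "not-a-number"),
]


def run_elt(raw=RAW_ORDERS):
    conn = sqlite3.connect(":memory:")

    # Load: land the raw data unchanged in a staging table (all TEXT columns).
    conn.execute(
        "CREATE TABLE staging_orders (order_id TEXT, product TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw)

    # Transform: let the target database's SQL engine clean and type the data.
    # The GLOB filter keeps only amounts that look like decimal numbers.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(product))      AS product,
               CAST(amount AS REAL)      AS amount
        FROM staging_orders
        WHERE amount GLOB '[0-9]*.[0-9]*'
        """
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = run_elt()
    for row in conn.execute("SELECT * FROM orders ORDER BY order_id"):
        print(row)
```

Because the raw rows stay in the staging table, the transformation can be rerun or revised later without going back to the source system.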

Customers looking for a commercially supported version of the software can try Talend Data Management Platform, which is designed to let enterprises develop and deploy end-to-end data integration jobs far faster than hand coding. The company’s other free products include Data Streams and Data Preparation, and its paid products include Talend Cloud, Talend Real-Time Big Data Platform, Talend Data Quality and the Talend MDM Platform.

The company was founded in 2005 and has its headquarters in Redwood City, California. In 2017, it reported 100 percent growth in its cloud and big data products with total revenue of $148.5 million. It is publicly traded on the NASDAQ exchange under the symbol TLND.

Jaspersoft ETL

Owned by TIBCO, Jaspersoft offers several open source data integration, business intelligence and analytics tools, including the popular JasperReports reporting library. Jaspersoft ETL (also known as JETL), the company’s data integration platform, comes in both community and commercial editions. The open source Community version is based on the Talend code base and includes many of the same capabilities. It includes a job designer, ETL and ELT support, versioning, wizards and community support.

The paid version comes in the standard JETL or the JETL Extended Big Data version. These editions add capabilities like multiple shared repositories, data viewer, dynamic schema, data lineage, joblets, job compare, Amazon EC2 lifecycle control, schedulers, Hadoop components, AMC Studio for monitoring, dashboards and more. Both offer the option of professional standard or professional premium support.

The Jaspersoft company was originally founded in 2001 as Panscopic. In 2004, it acquired the JasperReports code base and changed its name to Jaspersoft. TIBCO purchased the company in 2014 for approximately $185 million. Jaspersoft is headquartered in San Francisco, while parent company TIBCO is nearby in Palo Alto, California.

Earlier this year, the TIBCO Connected Intelligence Cloud platform was named the best Big Data Tool and Platform of 2018 by the Software and Information Industry Association (SIIA), and Gartner listed TIBCO as a Niche Player in its most recent Magic Quadrant for Data Integration Tools. TIBCO also recently acquired Scribe Software, which offered a cloud-based data integration service. The company is privately held.

Hitachi Vantara Pentaho

Pentaho is a business intelligence platform with integrated data integration capabilities. Its most well-known component is the Kettle ETL engine, and it also includes reporting, OLAP data discovery and analysis, dashboards, visualizations and data mining based on the Weka project.

The commercial version of Pentaho, which is available as a free 30-day trial, adds Internet of Things (IoT) analytics and big data analytics capabilities. It counts large companies like BT and Nasdaq among its customers. A cloud-based online version is also available.

The Pentaho Corporation was founded in 2004. In 2015, Hitachi Data Systems acquired Pentaho for an estimated $550 million. In 2017, the company created a new subsidiary called Hitachi Vantara that combined operations of Pentaho, Hitachi Data Systems and Hitachi Insight Group. The company offers IoT, big data integration and analytics, converged systems, cloud services, storage, data protection and data center management products. Gartner named it a Niche Player in the Magic Quadrant for Data Integration Tools.

Hitachi Vantara plans to open a new headquarters in Santa Clara, California, next year. Parent company Hitachi is a Japanese conglomerate headquartered in Tokyo. In 2017, it reported revenue of ¥9.162 trillion, or approximately $83.2 billion. It is publicly traded on the Tokyo Stock Exchange.


CloverETL

CloverETL is a pure data integration software suite making rapid development and enterprise capabilities available in a light footprint package. Its flagship data integration automation platform includes Designer, a visual development tool; Server, its enterprise-grade data integration runtime platform; and Cluster, which enables parallel data processing on multiple data nodes.

The open source Community Edition includes only the Designer component from the full platform. Also, it’s a stripped-down version that offers only about 20 of the 130 components in the full commercial edition. In addition, it allows only 20 components per transformation. It supports Windows, macOS and Linux.

The CloverETL software has been around since 2002, when it debuted as a Freshmeat project. In 2005, the team behind the open source project formed a company called Javlin Data Solutions, which continues to oversee CloverETL development. Javlin also provides related development, outsourcing, consulting and training services.

Javlin is headquartered in Prague, in the Czech Republic, where it was founded, and also has offices in London, Frankfurt and the Washington, D.C. area. The company is privately held.


Apatar

Rather than a full data integration platform, Apatar is an ETL platform only. Unlike some of the other tools on this list, the full version of Apatar is available under an open source license. It includes a visual job designer and transformation mapper, and it is completely customizable. It can be deployed as a desktop application, on a server or as an embedded application within other software. It connects to many open source and commercial data sources, including Oracle, Microsoft SQL Server, Sybase, DB2, PostgreSQL, Excel, Salesforce and others. And because it is Java-based, it can run on any operating system with a Java runtime.

The company offers a range of related services, including training, consulting, data integration assessment and support, as well as Salesforce consulting. Apatar support packages are available for purchase on case-based or hour-based plans.

The Apatar project began in 2005, and the Apatar company was founded in 2007. It is a subsidiary of Altoros, a software development company headquartered in Sunnyvale, California, with offices in Norway, Denmark, the UK, Finland, Sweden, Argentina and Belarus. Altoros is a frequent contributor to the Cloud Foundry Project and is a silver member of the Cloud Foundry Foundation. It often partners with Pivotal, HPE, Google Cloud, Amazon Web Services and SUSE. Altoros is privately held and has approximately 250 employees.

Top Open Source Data Integration Tools Compared

| Open Source Data Integration Tool | Commercial Support? | License | Key Features |
|---|---|---|---|
| Talend Open Studio | Yes | Apache | ETL and ELT; connectors for most popular data sources; Eclipse-based job design environment; file management; dataflow orchestration |
| Jaspersoft ETL | Yes | Apache | Based on Talend; job designer; ETL and ELT support; additional capabilities in commercial editions |
| Hitachi Vantara Pentaho | Yes | GPLv2 | Full business intelligence platform; Kettle ETL engine; Weka data mining; OLAP data discovery; embeddable dashboards and analytics |
| CloverETL | Yes | LGPL 2.1 | Only development tools are available under open source license; support for all major operating systems; light footprint |
| Apatar | Yes | GPLv2 | Full version available under open source license; ETL only; connects to many popular data sources; Java based |

