Hadoop and Big Data: 60 Top Open Source Tools: Page 3

These Hadoop and Big Data applications are helping enterprises manage and analyze large stores of data.



(Page 3 of 3)

41. SpagoBI

Called an "open source leader" by market analysts, Spago offers BI, middleware and quality assurance software, as well as a Java EE application development framework. All of the software is free and open source, but paid support, consulting, training and other services are available. Operating System: OS Independent.


42. KNIME

Short for "Konstanz Information Miner," KNIME is an open source analytics and reporting platform. Several commercial and open source extensions are available to increase its capabilities. Operating System: Windows, Linux, OS X.

43. BIRT

BIRT stands for "Business Intelligence and Reporting Tools." It offers a platform for creating visualizations and reports that can be embedded into applications and websites. It is part of the Eclipse community and is supported by Actuate, IBM and Innovent Solutions. Operating System: OS Independent.

Data Mining


44. DataMelt

The successor to jHepWork, DataMelt can do mathematical computation, data mining, statistical analysis and data visualization. It supports Java and related programming languages including Jython, Groovy, JRuby and Beanshell. Operating System: OS Independent.

45. KEEL

Short for "Knowledge Extraction based on Evolutionary Learning," KEEL is a Java-based machine learning tool that provides algorithms for a variety of big data tasks. It's also helpful for assessing the effectiveness of algorithms for regression, classification, clustering, pattern mining and similar tasks. Operating System: OS Independent.

46. Orange

Orange believes data mining should be "fruitful and fun," whether you have years of experience or are just getting started in the discipline. It offers visual programming and Python scripting tools for data visualizations and analysis. Operating System: Windows, Linux, OS X.

47. RapidMiner

RapidMiner boasts more than 250,000 users, including PayPal, Deloitte, eBay, Cisco and Volkswagen. It comes in a range of open source and paid versions, but note that the free, open source versions only support data in CSV or Excel formats. Operating System: OS Independent.

48. Rattle

Rattle stands for "R Analytical Tool To Learn Easily." It provides a graphical interface for the R programming language, simplifying the processes of creating statistical or visual summaries of data, creating models and performing data transformations. Operating System: Windows, Linux, OS X.

49. SPMF

SPMF now includes 93 algorithms for sequential pattern mining, association rule mining, itemset mining, sequential rule mining and clustering. It can be used on its own or incorporated into other Java-based programs. Operating System: OS Independent.
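To make "itemset mining" concrete, here is a minimal sketch in plain Python. It is not SPMF code and does not use SPMF's algorithms; it simply counts every itemset up to a given size by brute force. The function name, the basket data and the support threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size and keep those seen at least min_support times."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # de-duplicate and fix an order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(baskets, min_support=2)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Brute-force counting like this grows quickly with basket size, which is one reason dedicated libraries provide more efficient algorithms.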

50. Weka

The Waikato Environment for Knowledge Analysis, or Weka, is a set of Java-based machine-learning algorithms for data mining. It can perform data pre-processing, classification, regression, clustering, association rules and visualization. Operating System: Windows, Linux, OS X.

Query Engines

51. Drill

This Apache project allows users to query Hadoop, NoSQL databases and cloud storage services using SQL-based queries. It can be used for data mining and ad hoc queries, and it supports a wide variety of databases, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage and Swift. Operating System: Windows, Linux, OS X.

Programming Languages

52. R

Similar to the S language and environment, R was designed to handle statistical computing and graphics. It includes an integrated suite of big data tools for manipulation, calculation and visualization. Operating System: Windows, Linux, OS X.

53. ECL

Enterprise Control Language, or ECL, is the language developers use for creating big data applications on the HPCC platform. An IDE, tutorials and a variety of related tools for working with the language are available on the HPCC Systems website. Operating System: Linux.

Big Data Search

54. Lucene

Java-based Lucene performs full-text searches very quickly. According to the website, it can index more than 150GB per hour on modern hardware, and it includes powerful and efficient search algorithms. Development is sponsored by the Apache Software Foundation. Operating System: OS Independent.
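Full-text search engines typically rely on an inverted index, which maps each term to the documents that contain it. The sketch below illustrates that general idea in plain Python. It is not Lucene's API or implementation, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Hypothetical documents, invented for this example.
docs = {
    1: "Lucene performs full-text searches",
    2: "Solr is based on Lucene",
    3: "Ignite is an in-memory platform",
}

index = build_index(docs)
print(search(index, "lucene"))
print(search(index, "based lucene"))
```

Because lookups go from term to documents, a query only touches the entries for its own terms rather than scanning every document.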

55. Solr

Based on Apache Lucene, Solr is a highly reliable and scalable enterprise search platform. Well-known users include eHarmony, Sears, StubHub, Zappos, Best Buy, AT&T, Instagram, Netflix, Bloomberg and Travelocity. Operating System: OS Independent.

In-Memory Technology

56. Ignite

This Apache project describes itself as "a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies." The platform includes data grid, compute grid, service grid, streaming, Hadoop acceleration, advanced clustering, file system, messaging, events and data structure capabilities. Operating System: OS Independent.

57. Terracotta

Calling its BigMemory technology "the world's premier in-memory data management platform," Terracotta boasts 2.1 million developers and 2.5 million deployments of its software. The company also offers commercial versions of its software, plus support, consulting and training services. Operating System: OS Independent.

58. Pivotal GemFire/Geode

Earlier this year, Pivotal announced that it would be open-sourcing key components of its Big Data Suite, including the GemFire in-memory NoSQL database. It has submitted a proposal to the Apache Software Foundation to manage the core engine for the GemFire database under the name "Geode." A commercial version of the software is also available. Operating System: Windows, Linux.

59. GridGain

Powered by Apache Ignite, GridGain offers an in-memory data fabric for fast processing of big data and a Hadoop Accelerator based on the same technology. It comes in a paid enterprise version and a free community edition, which includes free basic support. Operating System: Windows, Linux, OS X.

60. Infinispan

A Red Hat JBoss project, Java-based Infinispan is a distributed in-memory data grid. It can be used as a cache, as a high-performance NoSQL database, or to add clustering capabilities to frameworks. Operating System: OS Independent.



