It seems that Hadoop, by offering lower-cost distributed computing, did as much to advance Big Data as any other software solution, so any list of open source Big Data platforms naturally starts with it. Yet as the rise of Spark shows, Hadoop may be a pioneer, and may well remain the foundation of Big Data, but it will not be the field's sole cornerstone. Think of this list (which does indeed start with Hadoop) as a glimpse of the pioneering days, the true infancy, of Big Data. The solutions on this list all look, to a greater or lesser extent, to Hadoop as the standard against which to compare their own performance. But the range of the list shows that this comparison is just a springboard, and that many other open source Big Data solutions are sure to evolve in the years ahead.
-
5 Open Source Big Data Analysis Platforms and Tools
A brief survey of some of the leading open source platforms that are gaining adoption in today's booming Big Data marketplace.
-
Hadoop
You simply can't talk about big data without mentioning Hadoop. The Apache distributed data processing software is so pervasive that often the terms "Hadoop" and "big data" are used synonymously. The Apache Foundation also sponsors a number of related projects that extend the capabilities of Hadoop, and many of them are mentioned below. In addition, numerous vendors offer supported versions of Hadoop and related technologies. Operating System: Windows, Linux, OS X.
-
MapReduce
MapReduce was originally developed by Google. The MapReduce website describes it as "a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes." It's used by Hadoop, as well as by many other data processing applications. Operating System: OS Independent.
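To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this sketch, and a real framework would run each phase in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in order on a list of text records (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data tools", "open source big data"])
print(counts)
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, the work can in principle be split across many machines, which is the property the description above refers to.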
-
GridGain
GridGain offers an alternative to Hadoop's MapReduce that is compatible with the Hadoop Distributed File System. It offers in-memory processing for fast analysis of real-time data. You can download the open source version from GitHub or purchase a commercially supported version. Operating System: Windows, Linux, OS X.
-
HPCC Systems
Developed by LexisNexis Risk Solutions, HPCC Systems is short for "high performance computing cluster." It claims to offer superior performance to Hadoop. Both free community versions and paid enterprise versions are available. Operating System: Linux.
-
Storm
Open sourced by Twitter and now an Apache project, Storm offers distributed real-time computation capabilities and is often described as the "Hadoop of realtime." It's highly scalable, robust and fault-tolerant, and it works with nearly all programming languages. Operating System: Linux.