5 Open Source Big Data Analysis Platforms and Tools

A brief survey of some of the leading open source platforms that are gaining adoption in today's booming Big Data marketplace.
You simply can't talk about big data without mentioning Hadoop. The Apache distributed data processing software is so pervasive that the terms "Hadoop" and "big data" are often used synonymously. The Apache Software Foundation also sponsors a number of related projects that extend the capabilities of Hadoop, some of which are mentioned below. In addition, numerous vendors offer supported versions of Hadoop and related technologies. Operating System: Windows, Linux, OS X.
MapReduce was originally developed by Google. The MapReduce website describes it as "a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes." It's used by Hadoop, as well as many other data processing applications. Operating System: OS Independent.
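To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using the classic word-count example. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example, and everything runs sequentially rather than across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster the map calls would run in parallel on different
    # nodes; in this sketch they simply run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(docs))
```

Because each map call looks at only one record and each reduce call at only one key, a framework can spread both steps across many machines, which is where the "in parallel on large clusters" part of the description comes from.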
GridGain offers an alternative to Hadoop's MapReduce that is compatible with the Hadoop Distributed File System. It offers in-memory processing for fast analysis of real-time data. You can download the open source version from GitHub or purchase a commercially supported version from the vendor. Operating System: Windows, Linux, OS X.
Developed by LexisNexis Risk Solutions, HPCC Systems takes its name from "high performance computing cluster." Its developers claim it offers better performance than Hadoop. Both free community versions and paid enterprise versions are available. Operating System: Linux.
Open-sourced by Twitter after its acquisition of BackType and now an Apache project, Storm offers distributed real-time computation capabilities and is often described as the "Hadoop of realtime." It's highly scalable, robust and fault-tolerant, and it works with nearly all programming languages. Operating System: Linux.
By offering lower cost distributed computing, Hadoop did as much to advance Big Data as any other software solution, so any list of open source Big Data platforms will naturally start with it. Yet as the rise of Spark shows, Hadoop may be a founding pioneer, and may well remain the foundation of Big Data, but it will not be the sole cornerstone. So think of this list (which does indeed start with Hadoop) as a glimpse of the pioneering days, the true infancy, of Big Data. The solutions on this list all look, to a greater or lesser extent, to Hadoop as the standard against which to compare their own performance. But the range of the list shows that this comparison is just a springboard, and that many other open source Big Data solutions are sure to evolve in the years ahead.