Hadoop and Big Data: 60 Top Open Source Tools: Page 2

These Hadoop and Big Data applications are helping enterprises manage and analyze large stores of data.
(Page 2 of 3)

21. Lumify

Owned by Altamira, which is known for its national security technologies, Lumify is an open source big data integration, analytics and visualization platform. You can see it in action by trying the demo at Try.Lumify.io. Operating System: Linux.

22. Pandas

The Pandas project includes data structures and data analysis tools based on the Python programming language. It allows organizations to use Python as an alternative to R for big data analysis projects. Operating System: Windows, Linux, OS X.
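
As a rough illustration of the kind of analysis Pandas supports, the short sketch below builds a DataFrame (the project's main tabular data structure) and totals one column by group. The sample data and column names are invented for this example and do not come from the Pandas project.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "units": [10, 4, 6, 8],
    }
)

# Group rows by region and total the units sold in each group.
units_by_region = sales.groupby("region")["units"].sum()
print(units_by_region)
```

The result is a Series indexed by region, which can be filtered, plotted or joined with other data in the same way.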

23. Storm

Now an Apache project, Storm offers real-time processing of big data (unlike Hadoop, which only provides batch processing). Its users include Twitter, The Weather Channel, WebMD, Alibaba, Yelp, Yahoo! Japan, Spotify, Groupon, Flipboard and many other companies. Operating System: Linux.

Databases/Data Warehouses

24. Blazegraph

Formerly known as "Bigdata," Blazegraph is a highly scalable, high-performance graph database. It is available under an open source or a commercial license. Operating System: OS Independent.

25. Cassandra

Originally developed by Facebook, this NoSQL database is used by more than 1,500 organizations, including Apple, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit and others. It can support very large clusters; for example, Apple's deployment includes more than 75,000 nodes with more than 10 PB of data. Operating System: OS Independent.

26. CouchDB

"A database that completely embraces the Web," CouchDB stores data in JSON documents that can be queried through a Web browser and manipulated with JavaScript. It's easy-to-use, highly available and highly scalable across distributed systems. Operating system: Windows, Linux, OS X, Android.

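Because CouchDB exposes its JSON documents over plain HTTP, storing a document comes down to a single PUT request. The sketch below only builds that request and does not send it. The server address (a local instance on CouchDB's default port, 5984), the database name and the document ID are all assumptions made for illustration.

```python
import json
import urllib.request

# Assumed values: a local CouchDB on its default port, plus a made-up
# database name and document ID.
BASE_URL = "http://localhost:5984"
DATABASE = "articles"
DOC_ID = "big-data-tools"

document = {"title": "Big Data Tools", "page": 2, "tags": ["hadoop", "open source"]}


def build_put_request(base_url, database, doc_id, doc):
    """Build (but do not send) the HTTP PUT that stores one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request(BASE_URL, DATABASE, DOC_ID, document)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running server, which would typically also require credentials.
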
27. FlockDB

Developed by Twitter, FlockDB is a very fast, very scalable graph database that is good at storing social networking data. While it is still available for download, the open source version of this project has not been updated in quite a while. Operating System: OS Independent.

28. Hibari

This Erlang-based project describes itself as "a distributed, ordered key-value store with strong consistency guarantee." It was first developed by Gemini Mobile Technologies and is used by several telecommunications carriers in Europe and Asia. Operating System: OS Independent.
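
To show what "ordered" adds to a key-value store, here is a toy, single-process sketch in Python. It is not Hibari's implementation or API, and it ignores distribution and consistency entirely; the class and method names are hypothetical. The point is only that keeping keys sorted makes range scans cheap.

```python
import bisect


class OrderedKeyValueStore:
    """Toy in-memory store that keeps keys sorted; not Hibari's implementation."""

    def __init__(self):
        self._keys = []    # keys, kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKeyValueStore()
for key, value in [("user:3", "carol"), ("user:1", "alice"),
                   ("user:2", "bob"), ("order:9", "book")]:
    store.put(key, value)

# All keys that begin with "user:", returned in sorted order.
print(store.scan("user:", "user:~"))
```

An unordered store could answer get() just as quickly, but would have to examine every key to answer the scan.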

29. Hypertable

Used by eBay, Baidu, Groupon, Yelp and many other Internet companies, Hypertable is a Hadoop-compatible big data database that promises fast performance. Commercial support is available. Operating System: Linux, OS X.

30. Impala

Cloudera claims that its SQL-based Impala database is "the leading open source analytic database for Apache Hadoop." It can be downloaded as a standalone product and is also part of Cloudera's commercial big data products. Operating System: Linux, OS X.

31. InfoBright Community Edition

Designed for analytics, InfoBright is a column-oriented database with a high compression rate. InfoBright.com offers paid, supported products based on the same code. Operating System: Windows, Linux.
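
One reason column-oriented databases tend to compress well is that values within a single column often repeat. The sketch below pivots a few invented rows into columns and run-length encodes them. Run-length encoding is used here purely as an illustration; this article does not say which compression techniques InfoBright actually uses.

```python
from itertools import groupby

# Hypothetical rows: (order_id, country, status).
rows = [
    (1, "US", "shipped"),
    (2, "US", "shipped"),
    (3, "US", "pending"),
    (4, "DE", "pending"),
    (5, "DE", "pending"),
    (6, "DE", "shipped"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, countries, statuses = to_columns(rows)
print(run_length_encode(countries))
print(run_length_encode(statuses))
```

Stored row by row, the same values would be interleaved with order IDs and would form no long runs to collapse.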

32. MongoDB

Downloaded more than 10 million times, MongoDB is an extremely popular NoSQL database. An enterprise version, support, training and related products and services are available at MongoDB.com. Operating System: Windows, Linux, OS X, Solaris.

33. Neo4j

Calling itself the "fastest and most scalable native graph database," Neo4j promises massive scalability, fast Cypher query performance and improved developer productivity. Users include eBay, Pitney Bowes, Walmart, Lufthansa and CrunchBase. Operating System: Windows, Linux.

34. OrientDB

This multi-model database combines some of the capabilities of a graph database with some of the capabilities of a document database. Paid support, training and consulting are available. Operating System: OS Independent.

35. Pivotal Greenplum Database

Pivotal boasts that Greenplum is a "best-in-class, enterprise-grade analytical database" that can perform powerful analytics on very large volumes of data very quickly. It's part of the Pivotal Big Data Suite. Operating System: Windows, Linux, OS X.

36. Riak

"Full of great stuff," Riak comes in two versions: KV is the distributed NoSQL database, and S2 provides object storage for the cloud. It's available in open source or commercial editions, with add-ons for Spark, Redis and Solr. Operating System: Linux, OS X.

37. Redis

Now sponsored by Pivotal, Redis is a key-value cache and store. Paid support is available. Note that while the project doesn't officially support Windows, Microsoft has a Windows fork on GitHub. Operating System: Linux.

Business Intelligence

38. Talend Open Studio

Downloaded more than 2 million times, Talend's open source software offers data integration capabilities. The company also makes paid big data, cloud, data integration, application integration and master data management tools. It counts organizations like AIG, Comcast, eBay, GE, Samsung, Ticketmaster and Verizon among its users. Operating System: Windows, Linux, OS X.

39. Jaspersoft

Used by organizations like Groupon, CA Technologies, USDA, Ericsson, Time Warner Cable, Olympic Steel, The University of Nebraska and General Dynamics, Jaspersoft offers flexible, embeddable BI tools. In addition to the open source community edition, it comes in paid reporting, AWS, professional and enterprise versions. Operating System: OS Independent.

40. Pentaho

Owned by Hitachi Data Systems, Pentaho offers a variety of data integration and business analytics tools. The link above will take you to the free community version; see Pentaho.com for information on paid, supported versions. Operating System: Windows, Linux, OS X.


Tags: Hadoop, open source, big data
