Key differentiators: 100 percent open source solution and a major contributor to the Apache Hadoop project.
Hortonworks is always on the leading edge of Hadoop and is committed to the use of open platforms for enterprise solutions surrounding big data and big data analytics. Forrester Research named Hortonworks as a technology leader and ecosystem builder for the entire Hadoop industry. Hortonworks has more volunteers involved in the Hadoop Project than any other commercial entity.
Hortonworks and a consortium of other companies developed Apache Atlas to meet the needs of working with metadata and data governance. “Atlas enhances governance capabilities in Hadoop for both prescriptive and forensic models enriched by taxonomical metadata.” Atlas is designed to exchange metadata with other tools and processes inside and outside of the Hadoop stack. It also enables platform-agnostic governance controls that address enterprise compliance requirements. Atlas currently holds incubator status in Apache’s project list.
Key differentiators: If there’s a name that’s synonymous with big data, it’s IBM. The IBM Open Platform (IOP) uses a 100 percent open source solution and is 100 percent free.
IOP contains 16 different Apache projects and has full support for the Open Data Platform, which is a shared effort to promote and to advance Hadoop for the enterprise. IBM offers multiple tiers of its product. You can download the IOP free of charge or select a supported offering and use it on premises. You can also use IBM’s Hadoop-as-a-Service on its SoftLayer cloud infrastructure to alleviate the pain of managing your own hardware and networking components. On top of the basic underlying infrastructure and software, IBM offers its BigInsights for Apache Hadoop product for your advanced analytical needs.
IBM’s BigInsights includes: Hadoop, SQL-on-Hadoop, business analytics tools, advanced analytics, accelerators, optimized performance, management, seamless data integration, and real-time streaming analytics.
Key differentiators: MapR is the only distribution that allows Hadoop to be accessed via the Network File System, or NFS. NFS allows faster data management and system administration without requiring multiple steps to move or to access data.
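To illustrate what NFS access means in practice, here is a minimal Python sketch, assuming the cluster's file system has already been mounted over NFS. The function name `copy_into_cluster` and the directory names are hypothetical, and a temporary directory stands in for the real mount point so the sketch runs anywhere. The point is only that, once mounted, cluster storage can be handled with ordinary file operations instead of a separate ingest step.

```python
import os
import shutil
import tempfile


def copy_into_cluster(local_file: str, mount_point: str, dest_dir: str) -> str:
    """Copy a local file into a directory under an NFS-mounted path.

    `mount_point` is assumed to be wherever an administrator mounted the
    cluster's NFS export; nothing here is specific to any vendor API.
    """
    target_dir = os.path.join(mount_point, dest_dir)
    os.makedirs(target_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(local_file, target_dir)


# Demo: a temporary directory plays the role of the NFS mount point.
with tempfile.TemporaryDirectory() as fake_mount:
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as src:
        src.write("id,value\n1,42\n")
    copied = copy_into_cluster(src.name, fake_mount, "landing/raw")
    with open(copied) as f:
        print(f.read())
    os.remove(src.name)
```

On a real deployment, the only change would be passing the actual mount path in place of the temporary directory.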
MapR provides a production-ready distribution that runs both online and analytical processing applications on a single platform. This means that you can run more applications on one Hadoop cluster and minimize your operational costs.
MapR runs the world’s largest single production Hadoop clusters, with capabilities that include:
Linear scalability that exceeds the 100 million files limit in the Hadoop Distributed File System (HDFS); a distributed metadata architecture that scales to trillions of files and tables and can store and process thousands of petabytes per cluster; and processing of files and tables in one distributed storage layer. This allows NoSQL and Hadoop applications to work seamlessly on a single platform.
MapR also provides Hadoop high availability everywhere across all Apache Hadoop projects and demonstrates 99.999 percent availability.
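To put the 99.999 percent figure in perspective, the following short Python sketch converts an availability percentage into the maximum downtime it permits per year. It is generic arithmetic under the assumption of a 365-day year, not a measurement of any vendor's cluster, and the function name is made up for illustration. At "five nines," the allowance works out to roughly 5.26 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget (in minutes per year) for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare three, four, and five nines.
for pct in (99.9, 99.99, 99.999):
    minutes = max_downtime_minutes_per_year(pct)
    print(f"{pct}% availability allows about {minutes:.2f} minutes of downtime per year")
```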
Key differentiators: Pentaho combines data integration with analytics and features a unique “in-Hadoop” execution that results in extremely fast performance.
Pentaho’s offering connects natively to Hadoop, to NoSQL, and to analytic databases, features a visual designer for MapReduce jobs, allows you to model and explore unstructured data sets, provides a multi-threaded data integration engine, and supports cluster nodes.
Pentaho’s solution also includes what it calls its adaptive big data layer that gives you the capability to access data once, process it, combine it, and consume it anywhere. It supports Hadoop distributions from Cloudera, Hortonworks, and MapR.
Key differentiators: Pivotal produces its own components for big data analytics, including Pivotal HD, Pivotal Greenplum Database, Pivotal GemFire, and Pivotal HAWQ.
Pivotal’s Hadoop distribution, Pivotal HD, is 100 percent Apache compliant, uses other Apache components, and is based on the Open Data Platform. Pivotal GemFire is a distributed data management platform designed for diverse data management situations, but it is optimized for high-volume, latency-sensitive, mission-critical transactional systems. The Pivotal Greenplum Database is a shared-nothing, massively parallel processing (MPP) database used for business intelligence processing as well as for advanced analytics. Pivotal’s HAWQ is an ANSI-compliant SQL dialect that supports application portability and the use of data visualization tools such as SAS and Tableau.
Key differentiators: Supermicro’s differentiator in this group is that it is a provider of the underlying commodity hardware that your Hadoop clusters run on.
Supermicro has partnered with Cloudera and Hortonworks to provide turnkey Hadoop cluster solutions to your business, should you decide to host your own Hadoop infrastructure. Your Hadoop implementation won’t even be worth its free price if your hardware is a bottleneck to your large data set processing.
Supermicro takes the guesswork and the opinions out of deciding which software works best with which hardware for the best performance at the best price.
Key differentiators: Zettaset offers Hadoop cluster management software and encryption software for Hadoop data.
Zettaset tackles two very tough jobs for Hadoop: management and encryption. Encryption, while great for security, usually comes at a cost to performance, but Zettaset’s BDEncrypt solution boasts high-performance, standards-based encryption for your valuable Hadoop data.
Zettaset Orchestrator is a Hadoop management solution that enables you to address requirements for security, high availability, manageability, and scalability in a distributed computing environment. It includes encryption, role-based access controls, automation options, and interoperability with BI and analytics platforms, and it maintains your database availability and reliability.