Hadoop and Big Data analytics are popular topics, perhaps only overshadowed by security talk. Apache’s Hadoop and its other 15 related Big Data projects are enterprise-class and enterprise-ready. Yes, they’re open source and yes, they’re free, but that doesn’t mean that they’re not worthy of your attention. For businesses that want commercial support, here are 15 companies ready to serve you and your Hadoop needs.
This list of Hadoop/Big Data vendors is in alphabetical order.
Key differentiators: Integration with Amazon’s Elastic Compute Cloud (EC2), S3, and DynamoDB, plus an inexpensive and flexible pay-as-you-use plan. An added bonus is that EMR plays nicely with Apache Spark and the Presto distributed SQL query engine.
Amazon Elastic MapReduce (Amazon EMR) is a web service within Amazon Web Services (AWS) that allows you to manage your big data sets. EMR promises to securely and reliably handle your big data workloads, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
Amazon’s pricing model is simple. With flat per-hour rates, you can accurately predict your monthly fees, which makes it easy to plan next year’s budget. Since Amazon’s cloud computing prices keep falling, your budget shrinks while your revenues pile up. Per-hour prices range from $0.011 to $0.27 ($94/year to $2367/year), depending on the size of the instance you select and on the Hadoop distribution.
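To make the budgeting arithmetic concrete, here is a minimal sketch that converts an hourly rate into a yearly cost. It assumes an instance running around the clock for a 365-day year; the function name and the `instances` parameter are illustrative and are not part of any AWS tool. The results land close to, though not exactly on, the yearly figures quoted above, which may have been rounded or computed with a slightly different hour count.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes a non-leap year of continuous use


def annual_cost(hourly_rate: float, instances: int = 1) -> float:
    """Return the yearly cost in dollars of running `instances` nonstop."""
    return hourly_rate * HOURS_PER_YEAR * instances


# The two ends of the per-hour price range quoted in the article.
low = annual_cost(0.011)
high = annual_cost(0.27)
print(f"low: ${low:,.2f}/year, high: ${high:,.2f}/year")
```

Multiplying by the number of instances in a cluster gives a rough yearly ceiling; actual bills depend on how many hours each instance really runs.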
The downside of Amazon’s services is that they’re somewhat difficult to use. They’re easier to use now than they were a few years ago, but to use AWS and its associated services, you need intermediate-level system administration skills to understand all of the options and to handle key pairs and permissions.
Key differentiators: Attunity automates data transfer into Hadoop from any source and also automates data transfer out of Hadoop, for both structured and unstructured data. Attunity has forged strategic partnerships with Cloudera and Hortonworks (both included in this article).
It’s hard to pinpoint exactly what Attunity Replicate does for big data until you see the process in action. Replicate takes data from one platform and translates it for another. For example, if you have multiple data sources and want to combine them all into a single data set, you would otherwise have to struggle with grabbing or dumping the data from all your source platforms and transforming that data for your desired target platform. You might have sources in Oracle, MySQL, IBM DB2, and SQL Server, with MySQL as your target.
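The manual consolidation described above can be sketched in a few lines. This is not Attunity’s API; it is a hedged illustration of the hand-written work a replication tool is meant to replace. In-memory SQLite databases stand in for the real source platforms, and the `customers` table, the `origin` column, and both function names are hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one source platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn


def consolidate(sources, target):
    """Copy every customers row from each source into the target, tagged by origin."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customers (origin TEXT, id INTEGER, name TEXT)"
    )
    copied = 0
    for origin, conn in sources.items():
        rows = conn.execute("SELECT id, name FROM customers").fetchall()
        target.executemany(
            "INSERT INTO customers VALUES (?, ?, ?)",
            [(origin, row_id, name) for row_id, name in rows],
        )
        copied += len(rows)
    target.commit()
    return copied


# Two stand-in sources and one stand-in target.
sources = {
    "oracle_stand_in": make_source([(1, "Ada"), (2, "Grace")]),
    "db2_stand_in": make_source([(1, "Linus")]),
}
target = sqlite3.connect(":memory:")
total = consolidate(sources, target)
print(f"copied {total} rows")
```

Even this toy version has to decide how to tag each row’s origin and how to handle overlapping IDs. With real platforms you would also face differing data types and SQL dialects.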
Attunity’s Click2Replicate allows you to graphically select your source, graphically select your target and then click to replicate the data. You can filter the data by table or other criteria, but the process is simple and you don’t have to worry about the transformation process.
Attunity supports a wide range of sources and targets, but check closely before you purchase because not all databases are both source and target capable.
3. Cloudera CDH
Key differentiators: CDH is a distribution of Apache Hadoop and related products. It is Apache-licensed, open source, and is the only Hadoop solution that offers unified batch processing, interactive SQL, interactive search, and role-based access controls.
Cloudera claims that enterprises have downloaded CDH more than all other distributions combined. CDH offers the standard Hadoop features but adds its own user interface (Hue), enterprise-level security, and integration with more than 300 vendor products and services.
Cloudera offers multiple choices for getting started with Hadoop, including an Express version, an Enterprise version, a Director (cloud) version, four Cloudera Live options, and a Cloudera demo. Additionally, if you want to test in your own environment, you can download the Cloudera QuickStart VM.
Key differentiators: The first big data analytics platform for Hadoop-as-a-Service designed for department-specific requirements.
Datameer Professional allows you to ingest, analyze, and visualize terabytes of structured and unstructured data from more than 60 different sources including social media, mobile data, web, machine data, marketing information, CRM data, demographics, and databases to name a few. Datameer also offers you 270 pre-built analytic functions to combine and analyze your unstructured and structured data after ingest.
Datameer focuses on big data analytics in a single application built on top of Hadoop. Datameer features a wizard-based data integration tool, iterative point-and-click analytics, drag-and-drop visualizations, and scales from a single workstation up to thousands of nodes. Datameer is available for all major Hadoop distributions.
Key differentiators: DataStax uses Apache Cassandra as its database engine and Apache Hadoop as its analytics platform, a combination that is highly scalable, fast, and capable of real-time and streaming analytics.
DataStax delivers powerful integrated analytics to 20 of the Fortune 100 companies and well-known companies such as eBay and Netflix. DataStax is built on open source software technology for its primary services: Apache Hadoop (analytics), Apache Cassandra (NoSQL distributed database), and Apache Solr (enterprise search).
DataStax made the choice to use Apache Cassandra, which provides an “always-on” capability for DataStax Enterprise (DSE) Analytics. DataStax OpsCenter also offers a web-based visual management system for DSE that allows cluster management, point-and-click provisioning and administration, secured administration, smart data protection, and visual monitoring and tuning.
Key differentiators: The recently purchased Statistica Big Data Analytics platform features natural language processing, entity extraction, interactive visualizations and dashboards, databases, database appliances, and distributed advanced analytic models across Hadoop.
Dell’s Statistica Big Data Analytics is an integrated, configurable, cloud-enabled software platform that you can easily deploy in minutes. You can harvest sentiments from social media and the web and combine that data to better understand market traction and trends. Dell leverages Hadoop, Lucene/Solr search, and Mahout machine learning to bring you a highly scalable analytic solution running on Dell PowerEdge servers.
Dell summarizes its hardware and software requirements for your Hadoop cluster simply: 2 to 100 Linux servers for the Hadoop cluster, each with 6GB RAM, 2+ cores, and a 1TB HDD. The point is that entry into a Hadoop solution is simple and inexpensive. And as Dell puts it, “Gain robust big data analytics on an open and easily deployed platform.”
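As a rough sizing aid, the sketch below totals the minimum per-server figures quoted above across a cluster. The `ClusterSpec` class is hypothetical and is not a Dell tool. The usable-disk estimate assumes HDFS’s default replication factor of 3 and ignores space needed for the operating system, logs, and temporary data.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Minimum per-server figures from the article; the class itself is illustrative."""
    servers: int
    ram_gb_per_server: int = 6
    cores_per_server: int = 2
    disk_tb_per_server: float = 1.0

    def __post_init__(self):
        # The article quotes a range of 2 to 100 Linux servers.
        if not 2 <= self.servers <= 100:
            raise ValueError("server count must be between 2 and 100")

    def totals(self, replication: int = 3) -> dict:
        """Return cluster-wide totals; usable disk divides raw disk by replication."""
        raw_tb = self.servers * self.disk_tb_per_server
        return {
            "ram_gb": self.servers * self.ram_gb_per_server,
            "cores": self.servers * self.cores_per_server,
            "raw_disk_tb": raw_tb,
            "usable_disk_tb": raw_tb / replication,
        }


small = ClusterSpec(servers=2).totals()
large = ClusterSpec(servers=100).totals()
print(small)
print(large)
```

Under these assumptions, a 100-server cluster at the minimum spec offers about a third of its 100TB of raw disk as usable HDFS capacity.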
Key differentiators: The FICO Decision Management Suite includes the FICO Big Data Analyzer, which provides an easy way for companies to use big data analytics for decision management solutions.
FICO’s Big Data Analyzer provides purpose-built analytics for business users, analysts, and data scientists from any type of data on Hadoop. Part of FICO’s Big Data Analyzer appeal is that it masks Hadoop’s complexity, allowing any user to gain more business value from any data.
FICO provides an end-to-end analytic modeling lifecycle solution for extracting and exploring data, creating predictive models, discovering business insights, and using this data to create actionable decisions.
Key differentiators: Hadapt was recently purchased by Teradata and has patent-pending technology featuring a hybrid architecture that brings the latest relational database research to the Hadoop platform.
Hadapt 2.0 delivers interactive applications on Hadoop through Hadapt Interactive Query, the Hadapt Development Kit for custom analytics, and integration with Tableau software. Hadapt’s hybrid storage engine features two different approaches to storage for structured and unstructured data. Structured data uses a high-performance relational engine and unstructured data uses the Hadoop Distributed File System (HDFS). Hadapt has a lot of trademarked products as part of its Adaptive Analytical Platform plus its pending patent for its complete technology solution.
Hadapt diverges from the Hadoop crowd in that it uses a relational database for its analytics and integrates data without the need to ingest Hadoop data. The advantage is that you can have simultaneous operational and analytical processing on the same data sources. This greatly improves speed and efficiency in big data analysis and requires fewer steps.