Big Data companies come in many shapes and flavors. Indeed, any list of Big Data companies necessarily includes vendors with sharply contrasting strategies, a clear sign that the data analytics market is in rapid flux.
Standards? Kind of. Not exactly. Depends who you ask.
It has been only about a decade since Hadoop emerged from Yahoo, and in that time Big Data, the concept Hadoop was built to serve, has exploded in popularity as more and more firms launch pilot programs to gain insight from the massive amounts of data at their disposal.
Big Data has matured differently from most technologies, however. First, no single leader has emerged after nearly a decade. The analytics industry is still in growth mode, and leaders typically emerge only when an industry consolidates.
Second, the big names entered the market early and in a big way. That is also unusual, because established vendors have traditionally been slow to embrace new technology. Yet IBM, Microsoft, SAP, HP and Oracle are already in the game.
So, which tools and platforms should you choose? Here are 25 of the top companies to consider in the Big Data world.
Please note: this list is NOT a ranking, because the strategies are too different. A company near the top of the list, for instance, is not a “better” Big Data vendor than one near the bottom.
The companies on this list take approaches aimed at a wide range of IT sectors.
Big Data Companies: The Leaders
Tableau began as a Stanford University research project. It started out offering visualization techniques for exploring and analyzing relational databases and data cubes, and it has since expanded into Big Data analysis. Unlike some visualization products that work only with certain sources, Tableau can visualize data from a wide range of sources, from Hadoop to Excel files, and it runs on everything from a PC to an iPhone.
New Relic uses a SaaS model to monitor Web and mobile applications in real time, whether they run in the cloud, on-premises or in a hybrid mix. More than 50 plug-ins from technology partners connect to its monitoring dashboard, covering PaaS/cloud services, caching, databases, Web servers and queuing. Its Insights analytics software works across the entire New Relic product line, and the company offers Insights Data Explorer, which is designed to make it easier for everyone on a software team to explore Insights events.
Alation crawls an enterprise's data sources to catalog the information it finds. It then centralizes the organization's knowledge of that data, automatically capturing what the data describes, where it comes from, who uses it and how it is used. In other words, it builds a metadata catalog of all your data and supports fast searches using plain English words rather than query strings. The company's products provide collaborative analytics for faster insight, a unified means of search, a better-optimized structure for the company's data, and stronger data governance.
Teradata has built a portfolio of Big Data apps into what it calls its Unified Data Architecture, which includes Teradata QueryGrid, Teradata Listener, Teradata Unity and Teradata Viewpoint. QueryGrid provides a seamless data fabric across new and existing analytic engines, including Hadoop. Listener is an ingestion framework for organizations with multiple data streams. Unity is a portfolio of four integrated products for managing data flow throughout the process. Viewpoint is a customizable Web-based dashboard of tools for managing the Teradata environment.
VMware brings Big Data support to vSphere, its flagship virtualization product, through VMware vSphere Big Data Extensions (BDE). BDE is a virtual appliance that lets administrators deploy and manage Hadoop clusters under vSphere. It supports a number of Hadoop distributions, including Apache, Cloudera, Hortonworks, MapR and Pivotal.
Splunk Enterprise started out as a log analysis tool but has since broadened into machine data analytics, with the goal of making that information usable by anyone. It can monitor online transactions end to end, study customer behavior and service usage in real time, watch for security threats, and spot trends and analyze sentiment on social platforms.
Besides its mainframe and Power systems, IBM offers cloud services for massive compute scale through its SoftLayer subsidiary. On the software side, its DB2 and Informix databases support Big Data analytics, while its Cognos and SPSS analytics software specializes in BI and data insight. IBM also offers InfoSphere, its core platform for data integration and data warehousing in Big Data scenarios.
Formerly known as WebAction, Striim is a real-time streaming data analytics platform. It reads data from multiple sources, such as databases, log files, applications and IoT sensors, and lets customers react instantly. Enterprises can filter, transform, aggregate and enrich data as it arrives, organizing it in memory before it ever lands on disk.
SAP's main Big Data tool is HANA, its in-memory relational database, which the company says can run analytics on 80 terabytes of data and integrates with Hadoop. Although HANA is a relational database with both row and column storage, it can also perform advanced analytics such as predictive analytics, spatial data processing, text analytics, text search, streaming analytics and graph data processing. It also has ETL (Extract, Transform and Load) capabilities.
While some companies specialize in one or a few sources of data, SAP handles data from a wide range of sources. These include machine-generated data from sensors, logs and other equipment. They also include human-generated data such as social media, point-of-sale (POS) records, ERP, emails, documents and the rest of what makes up enterprise data.
Created by Greenplum employees, Alpine Data Labs puts an easy-to-use advanced analytics interface on Apache Hadoop. The result is a collaborative, visual environment for building analytics workflows and predictive models that anyone can use, rather than requiring a high-priced data scientist to program the analytics.
Oracle's Big Data Appliance combines Intel-based servers with a software stack built largely from Oracle products. The stack includes:

- Oracle NoSQL Database
- Apache Hadoop
- Oracle Data Integrator with Application Adapter for Hadoop
- Oracle Loader for Hadoop
- Oracle R Enterprise, which uses the R programming language and software environment for statistical computing and publication-quality graphics
- Oracle Linux
- the Oracle Java HotSpot Virtual Machine
Alteryx, which calls itself the leader in self-service data analytics, designs its software for business users rather than data scientists. It lets them blend data from multiple, potentially disparate sources, analyze it and share the results so that action can be taken. Queries can draw on anything from a history of sales transactions to social media activity.
Splice Machine bills itself as the only provider of a relational database management system (RDBMS) on Hadoop. It can act as a general-purpose database, replacing Oracle, MySQL or SQL Server databases for various workloads on Hadoop. The latest version, 2.0, added Spark, which processes data largely in memory rather than on disk. Version 2.0 also added the ability to route work to one of two processing engines, depending on whether the workload is OLTP or OLAP.
Pentaho is a suite of open source-based tools for business analytics that has expanded to cover Big Data. The suite offers data integration, OLAP services, reporting, a dashboard, data mining and ETL capabilities.
Pentaho for Big Data is a data integration tool designed specifically for running ETL jobs into and out of Big Data environments. These include Apache Hadoop and the Hadoop distributions from Amazon, Cloudera, EMC Greenplum, MapR and Hortonworks. It also supports NoSQL data sources such as MongoDB and HBase. The company was acquired by Hitachi Data Systems in 2015 but continues to operate as a separate subsidiary.
SiSense sells its Prism product to large enterprises and SMBs alike, largely on the strength of ElastiCube, a lightweight, high-performance analytical database tuned for real-time analytics. ElastiCubes are fast data stores designed for extensive querying, and they are positioned as a cheaper alternative to HP's Vertica systems.
ThoughtWorks applies Agile software development principles to building Big Data applications through its Agile Analytics offering. Agile Analytics helps companies build data warehousing and business intelligence applications using the fast-paced Agile process. That process emphasizes quick, continuous delivery of new applications for extracting insight from data.
Tibco's Jaspersoft subsidiary has introduced an hourly offering on Amazon's cloud, where you can buy analytics starting at $0.48 per hour. The company also emphasizes embedded analytics. Its software is embedded in 130,000 production applications worldwide, used by organizations such as Red Hat, CA, Verizon, Tata, Groupon, British Telecom, Virgin and the U.S. Navy.
Amazon has a number of enterprise Big Data platforms, including:

- the Hadoop-based Elastic MapReduce
- Kinesis Firehose, for streaming massive amounts of data into AWS
- Kinesis Analytics, for analyzing that data
- the DynamoDB NoSQL database
- HBase
- the Redshift massively parallel data warehouse

All of these services work within its broader Amazon Web Services offerings.
Most significantly, AWS is trying to woo legacy database customers to its newer offerings. Experts disagree on how successful AWS will be, but it is clearly a highly aggressive competitive move.
Microsoft's Big Data strategy is broad and has grown fast. It has a partnership with Hortonworks and offers HDInsight, a service based on Hortonworks Data Platform for analyzing structured and unstructured data. Microsoft also offers the iTrend platform for dynamic reporting on campaigns, brands and individual products. SQL Server 2016 includes a connector to Hadoop for Big Data processing. Microsoft also recently acquired Revolution Analytics, which built a Big Data analytics platform around R, the open source programming language for statistical computing.
Google continues to expand its Big Data analytics offerings, starting with BigQuery, a cloud-based analytics platform for quickly analyzing very large datasets. BigQuery is serverless, so there is no infrastructure to manage and no need for a database administrator. It uses a pay-as-you-go pricing model.
Google also offers several other services:

- Dataflow, a stream and batch data processing service
- Dataproc, a Hadoop/Spark-based service
- Pub/Sub, a messaging service for connecting your applications and services
- Genomics, which is focused on genomic sciences
Mu Sigma offers an analytics services framework that sifts through large numbers of data tables to answer business questions, such as how to improve sales and marketing. It cleans client data down to what is relevant, analyzes it, generates insights and makes recommendations to the client. Mu Sigma aims to understand how the business actually works and then pinpoint where the problem lies.
Hewlett Packard Enterprise (HPE) has built up a considerable portfolio of Big Data products in a short time. Its main product is the Vertica Analytics Platform, designed to manage large, fast-growing volumes of structured data and deliver very fast query performance, including SQL analytics on Hadoop that scales to petabytes.
HPE IDOL software provides a single environment for structured, semi-structured and unstructured data. It supports hybrid analytics leveraging statistical techniques and Natural Language Processing (NLP).
HPE also has a number of hardware products. HPE Moonshot provides ultra-converged workload servers. The HPE Apollo 4000 is a server purpose-built for Big Data, analytics and object storage. HPE ConvergedSystem is designed for SAP HANA workloads. HPE 3PAR StoreServ 20000 stores analyzed data, addressing existing workload demands and future growth.
BigPanda offers an algorithm-driven data science platform for IT and DevOps staff, geared specifically toward alert overload. Logs are one of the many sources of Big Data, and they can quickly get out of hand with redundant or false alerts. The company noticed that developers were overwhelmed by alerts from their logs and could not tell which were real and which were false alarms. BigPanda filters that flood down to the meaningful alerts, letting IT react more quickly to real problems.
Cogito Dialog is a highly vertical but important service. It applies behavioral analytics to everything from customer emails and social media to the human voice. The goal is to help phone support staff communicate better while on calls with customers and to help organizations better manage agent performance.
Datameer claims its end-to-end data analytics solution for Hadoop enables business users to discover insights in any data via wizard-based data integration, iterative point-and-click analytics, and drag-and-drop visualizations, regardless of the data type, size, or source.