The Big Data market is rapidly undergoing the contortions that define market maturity; namely, consolidation. When a new market forms, there is a flood of startups all looking for a slice of the pie.
Eventually leaders emerge and stragglers fall away. Such is the case with the data analytics market, which is entering its second decade. The merger of leaders Hortonworks and Cloudera and the financial struggles of MapR are both signs of a ripening, maturing market that is now in consolidation.
Hence the need for this revised list of top Big Data companies. Players have fallen for a variety of reasons while new companies have taken their place. The list of Big Data companies below emphasizes companies with products you can buy - not leading users. So while outfits like Twitter and Facebook make heavy use of Big Data, they don't offer a marketed product to purchase.
Big Data has matured differently than most technologies. First, no single leader has emerged after nearly a decade. Big Data software is still in growth mode, with big advances in predictive analytics tools and data mining tools, along with next-gen artificial intelligence. Given that the technology itself is still maturing, it follows that consolidation is far from complete.
Second, the big names got in the market early in a big way. That's also unprecedented, because established vendors have traditionally been notoriously slow to embrace a new technology. But already, IBM, Microsoft, SAP, HP, and Oracle are in the game. The blue-chip companies' earlier presence in business intelligence software paved the way.
So, which tools and platforms should you choose? Here are 25 of the top companies to consider in the Big Data world.
Please note: this list is NOT a ranking. The companies on this list serve different aspects of the market, making ranking them in any order beyond revenue impossible and unfair.
While many Big Data companies offer tools for any number of industry verticals and/or functions, most companies offer a platform that works across sectors.
Big Data Companies: The Big Data Leaders
Having established itself as a SaaS leader in office productivity and CRM tools, Zoho offers a versatile data analytics platform geared for both professional data scientists and mid-level staffers who want a self-service option. The application has an intuitive drag and drop interface as well as a classic spreadsheet-style interface. Zoho Analytics is geared for organizations that want to provide actionable data analytics insight to staffers at every level.
Salesforce, the king of SaaS, became a Big Data vendor when it announced plans to purchase Tableau Software, a data visualization firm that has expanded from its original mission to include Big Data research. Tableau offers visualization of data from any source, from Hadoop to Excel files. Salesforce also has its own Big Data tools in joined reports, which let customers compare different data sets in the hope of getting insights from customer data.
Cloudera recently merged with Hortonworks, in a marriage of the two largest Hadoop providers. While both focused on the Hadoop market they took different approaches. Hortonworks targeted more technical users and took a pure open source approach, while Cloudera went for the IT market and offered some proprietary tools. Combined, the firm says it will offer a broad spectrum of Hadoop products.
Long known for its data warehouse products, Teradata also has a portfolio of Big Data apps called its Unified Data Architecture. Teradata QueryGrid provides a seamless data fabric across new and existing analytic engines, including Hadoop. Teradata Listener is the primary data intake framework for organizations with multiple data streams. Teradata Unity is a portfolio of four integrated products for managing data flow throughout the process. Teradata Viewpoint is a custom Web-based dashboard of tools to manage the Teradata environment.
IBM supports Big Data analytics through a number of databases, including DB2, Informix, and InfoSphere. It also has popular analytics applications such as Cognos and SPSS. In terms of pure Big Data, IBM has its own Hadoop distribution, Stream Computing to perform real-time data processing, IBM BigInsights for Apache Hadoop, and IBM BigInsights on Cloud offering Hadoop as a service through IBM Cloud.
Hewlett Packard Enterprise’s (HPE) main Big Data product is Vertica Analytics Platform, designed to manage a large volume of structured data with fast query performance on Hadoop and SQL Analytics. It also has Vertica Advanced Analytics for deployment across multiple clouds, commodity hardware, and on any Hadoop distribution system. HPE also has HAVEn, a Big Data platform available on demand focused on machine learning.
HPE has a number of hardware products, including the HPE Moonshot ultra-converged workload servers and the HPE Apollo 4000, a purpose-built server for Big Data, analytics, and object storage. HPE ConvergedSystem is designed for SAP HANA workloads, and HPE 3PAR StoreServ 20000 stores analyzed data, addressing existing workload demands and future growth.
SAP's main Big Data tool is its HANA in-memory relational database that works with Hadoop. HANA is a traditional row-and-column database, but it can perform advanced analytics, such as predictive analytics, spatial data processing, text analytics, text search, streaming analytics, and graph data processing. It also has ETL (Extract, Transform, and Load) capabilities. SAP also offers data warehousing to manage all of your data from a single platform, cloud services, and data management tools for governance, orchestration, cleansing, and storage.
Oracle has a dedicated Big Data Appliance server preloaded and configured with a number of Oracle software products. These include Oracle Autonomous Data Warehouse, Oracle NoSQL Database, Apache Hadoop, Oracle Data Integrator with Application Adapter for Hadoop, and Oracle Loader for Hadoop. Oracle also has a number of on-premises and cloud-based analytics products as well as integration platforms and streaming analytics to handle data as it comes in.
The Apache Hadoop software library remains the framework for Big Data, although many vendors have taken the framework and built their own proprietary and unique functions on it. The base system provides an outline to do your own customization and is designed to scale up from a single server to thousands. Apache also offers Spark, which does in-memory, real-time processing, and Storm, a real-time, fault-tolerant processing system designed to run parallel calculations across a cluster of machines.
SiSense sells its Prism product to the largest enterprises and some SMBs alike on the strength of ElastiCube, a small, high-performance analytical database tuned specifically for real-time analytics. ElastiCubes are super-fast data stores that are specifically designed for extensive querying. They are positioned as a cheaper alternative to HP's Vertica systems.
HPCC stands for High-Performance Computing Cluster and was developed by LexisNexis Risk Solutions. It delivers a single platform and a single architecture, built in C++, along with a data-centric programming language known as ECL (Enterprise Control Language) for data processing. Its Thor platform is designed for high-performance parallel batch processing.
Alteryx brings Big Data analytics processing to a wide variety of popular databases, including Amazon Redshift, Apache Hive, Cloudera Impala, multiple Microsoft databases, SAP HANA, Teradata, Oracle, and more, to perform analytics within the database. With no coding required, users can select, filter, create formulas, and build summaries where the data lies. Queries can be made against anything from a history of sales transactions to social media activity.
Microsoft's Big Data strategy – helped by its Azure cloud platform – is fairly broad and has grown fast. It has a partnership with Hortonworks and offers the HDInsight tool, based on Hortonworks Data Platform, for analyzing structured and unstructured data. Microsoft also offers the iTrend platform for dynamic reporting of campaigns, brands and individual products.
SQL Server 2016 comes with a connector to Hadoop for Big Data processing. Microsoft also recently acquired Revolution Analytics, which made the only Big Data analytics platform written in R, a statistical programming language, with the aim of letting users build Big Data apps without requiring the skills of a data scientist.
Thoughtworks is the employer of many executives integral to the creation of Agile software development principles and has built Agile development processes into its tools for building Big Data applications. Its Agile Analytics products apply Agile principles for building data warehousing and business intelligence applications, using continuous integration and continuous delivery.
Talend Platform for Big Data is an open source data integration platform that connects to Hadoop, NoSQL, MapReduce, and Spark to perform extract, transform, and load (ETL) processes on large and diverse data sets for better insights or process optimization. For real-time ETL, Talend supports Spark Streaming, machine learning, and IoT.
Amazon Web Services
Amazon Web Services offers an array of Big Data products, the main one being the Hadoop-based Elastic MapReduce (EMR), plus Athena for basic database analytics, Kinesis and Storm for real-time analytics, and a number of databases, including the DynamoDB NoSQL database and the Redshift data warehouse.
Naturally, AWS benefits greatly in the data market from its overwhelming cloud presence. Many clients turn to their existing cloud provider to purchase Big Data services, which creates an enormous natural funnel for AWS.
Splunk Enterprise started as a log analysis tool but has since expanded its focus to include machine data analytics for monitoring end-to-end transactions for any threats or unusual behavior. Splunk’s Big Data solutions include Splunk Analytics for Hadoop to do the analytics, the Splunk ODBC Driver for connecting to enterprise applications like Tableau, and Splunk DB Connect for connecting to a variety of data sources.
Google continues to expand on its Big Data analytics offerings, starting with BigQuery, a cloud-based analytics platform for quickly analyzing very large datasets. BigQuery is serverless, so there is no infrastructure to manage and you don't need a database administrator. It uses a pay-as-you-go model.
Google also offers Dataflow, a real time data processing service, Dataproc, a Hadoop/Spark-based service, Pub/Sub to connect your services to Google messaging, and Genomics, which is focused on genomic sciences.
TIBCO has a variety of offerings, starting with Spotfire for performing visual analytics, Statistica for moving data around through complex pipelines for processing, and the Alpine Data Labs advanced analytics interface on Apache Hadoop that provides a collaborative, visual environment for building analytics workflows and predictive models. TIBCO's Jaspersoft subsidiary has introduced an hourly offering on Amazon's Cloud where you can buy analytics starting at $0.48 per hour.
Pentaho is a suite of open source-based tools for business analytics that offers data integration, OLAP services, reporting, a dashboard, data mining and ETL capabilities. Pentaho for Big Data is a data integration tool specifically designed for executing ETL jobs in and out of Big Data environments such as Apache Hadoop or Hadoop distributions on Amazon, Cloudera, EMC Greenplum, MapR, and Hortonworks. It also supports NoSQL data sources such as MongoDB and HBase. The company was acquired by Hitachi Data Systems in 2015 but continues to operate as a separate subsidiary.
Datameer offers a unified platform for end-to-end data analytics solutions on Hadoop. It enables business users to discover insights in any data via wizard-based data integration. Datameer’s platform covers the entire data lifecycle, from ingestion through preparation and exploration to consumption. This allows analysts to create and manage their own analytic data pipelines for point-and-click analytics and drag-and-drop visualizations, regardless of the data type, size, or source.
Alation crawls an enterprise to catalog every bit of information it finds and then centralizes the organization's knowledge of data, automatically capturing information on what the data describes, where the data comes from, who's using it and how it's used. In other words, it turns all your data into metadata, and allows for fast searches using English words and not computer strings. The company's products provide collaborative analytics for faster insight, a unified means of search, and a more optimized structure for the company's data, and they assist in better data governance.
BigPanda’s Autonomous Operations platform helps IT, networking, and DevOps teams detect, investigate, and resolve IT incidents faster by monitoring logs, with an emphasis on alert overloads. Logs are a chief source of data but it’s easy for a team to be overwhelmed with redundant or false alerts. BigPanda correlates IT noise into insights, automates incident management, and unifies fragmented IT operations.
Splice Machine bills itself as the provider of the only Hadoop-based relational database management system (RDBMS) that performs online and offline batch analysis as well as real-time analysis. It can act as a general-purpose database that can replace SQL databases like Oracle, MySQL, or SQL Server in a Hadoop environment. The Splice Machine RDBMS executes operational workloads on Apache HBase and analytical workloads on Apache Spark.
Formerly known as WebAction, Striim is a real-time, data streaming analytics software platform that reads in data from multiple sources such as databases, log files, applications and IoT sensors and allows customers to react instantly. Enterprises can filter, transform, aggregate and enrich data as it is coming in, organizing it in-memory before it ever lands on disk.
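To make the filter, transform, enrich, and aggregate steps concrete, here is a minimal Python sketch of an in-memory streaming pipeline. It is not Striim's API or implementation. The event fields, the SENSOR_LOCATIONS lookup table, and every function name are hypothetical, chosen only to illustrate the general pattern of processing records as they arrive rather than after they are stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table used to enrich events; not part of any vendor API.
SENSOR_LOCATIONS = {"s1": "plant-a", "s2": "plant-b"}


def filter_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop events that carry no reading."""
    for event in events:
        if event.get("value") is not None:
            yield event


def to_celsius(events: Iterable[dict]) -> Iterator[dict]:
    """Transform: convert Fahrenheit readings to Celsius."""
    for event in events:
        celsius = round((event["value"] - 32) * 5 / 9, 1)
        yield {**event, "value": celsius}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich: attach a location from the in-memory lookup table."""
    for event in events:
        location = SENSOR_LOCATIONS.get(event["sensor"], "unknown")
        yield {**event, "location": location}


def average_by_location(events: Iterable[dict]) -> Dict[str, float]:
    """Aggregate: average reading per location, kept entirely in memory."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for event in events:
        sums[event["location"]] += event["value"]
        counts[event["location"]] += 1
    return {loc: sums[loc] / counts[loc] for loc in sums}


def run_pipeline(events: Iterable[dict]) -> Dict[str, float]:
    """Chain the stages; generators pass one event at a time between them."""
    return average_by_location(enrich(to_celsius(filter_valid(events))))
```

Because each stage is a generator, events flow through one at a time and nothing is written to disk along the way. A production streaming platform would add windowing, fault tolerance, and connectors to real sources, none of which this sketch attempts.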
Mu Sigma offers an analytics services framework, built specifically for large enterprises, that is designed to improve sales and marketing. It cleans up client data to show only relevant information, analyzes that data, generates insights from it, and gives recommendations to the client. It offers marketing analytics to cover sales patterns and customer engagement along with risk analytics, such as predictive modeling of claims, credit scoring, fraud detection and prediction, and so on.
Alpine Data Labs
A creation of Greenplum employees, Alpine Data Labs puts an easy-to-use advanced analytics interface on Apache Hadoop to provide a collaborative, visual environment for building analytics workflows and predictive models that anyone can use, rather than requiring a high-priced data scientist to program the analytics.
A highly vertical but important service, Cogito Dialog uses behavioral analytics technology, including analysis of customer interactions ranging from emails to social media to human voice analysis during service calls. The goal is to help phone support personnel improve their communications while on the phone with customers and to help organizations better manage agent performance. Cogito’s software evaluates hundreds of behavioral signals through voice to provide live conversation coaching for agents and a real-time measure of customer experience for every call.
New Relic is one of the few Big Data vendors that uses a SaaS model rather than on-premises. Its services monitor, in real time, Web and mobile applications that run in the cloud, on-premises, or in a hybrid mix. It monitors the app for any potential problems with the user experience, and New Relic Insights provides a dashboard for user behavior and application performance.
VMware offers a Big Data extension for its flagship virtualization product, called VMware vSphere Big Data Extensions (BDE), designed to ease the deployment and management of Hadoop clusters under vSphere. It supports a number of Hadoop distributions, including Apache, Cloudera, Hortonworks, MapR and Pivotal. Through vCenter, Hadoop clusters can be managed and scaled as demand increases.