The Big Data market is enjoying dramatic growth, based on the surging interest in the competitive advantage offered by Big Data analytics. Indeed, Big Data software is still in sharp growth mode, with big advances in predictive analytics tools and data mining tools, along with next-gen artificial intelligence.
The Big Data industry itself is the very picture of swirling change. New companies, new products, new approaches and methodologies – change is a constant. Big Data startups are looking for a piece of the pie, doing their best to steal market share from the blue chip companies that won the business intelligence software market.
So, which tools and platforms should you choose? Here are 25 of the top Big Data companies to consider in the Big Data world.
Please note: this list is not a ranking. The companies on this list serve different aspects of the market, making ranking them in any order beyond revenue impossible and unfair.
Big Data companies are forecast to see dramatic revenue increases in the years ahead.
Big Data Companies: The Big Data Leaders
Truly a cloud-native company, Snowflake offers a cloud-based data platform that features a cloud data lake and a data warehouse as a service. In essence it’s a platform that leverages the best of Big Data and cloud in combination, enabling users to mine vast quantities of data using the cloud. Founded in 2012, it runs on Microsoft Azure, AWS and Google Cloud. Among its most important features, the company’s Data Exchange helps companies share data in a secure environment. A star of the data community, Snowflake went public in September 2020.
Intellias has expertise across a wide array of verticals, including FinTech, retail, telecom and insurance. Touting itself as “intelligent software engineering,” the company works across a variety of sectors, from pure technology to consulting services. Intellias has had success leveraging Big Data in location-based services and geospatial initiatives – definitely a growing market as IoT lays sensors across an ever widening area. The company has also done data analytics projects for gaming, which suggests that it understands the needs of the consumer market.
Visual BI Solutions
Visual BI offers cloud-based or in-house business intelligence software that enables easy-to-understand visualizations of a company’s data trends. In fact, the company lives up to its name: if you want a colorful, visual representation of your data, that’s exactly what Visual BI is geared for. Bar charts, tables, line graphs – the company’s solution represents data visually in plenty of different ways. The product is considered a good value for the money. Targeting some of the more commonly used platforms, Visual BI has products designed for Microsoft Power BI and SAP business intelligence software. That is, Visual BI offers custom visual extensions to these SAP and Microsoft BI platforms. For custom support, Visual BI maintains a global help desk.
Salesforce, the king of SaaS, became a data analytics software vendor when it announced plans to purchase Tableau Software, a data visualization firm that has expanded from its original mission to include Big Data research. Tableau offers visualization of data from any source, from Hadoop to Excel files. Salesforce also has its own Big Data tool in joined reports, which let customers compare different data sets in the hopes of getting insights from customer data.
Long known for its data warehouse products, Teradata also has a portfolio of Big Data apps called its Unified Data Architecture. Teradata QueryGrid provides a seamless data fabric across new and existing analytic engines, including Hadoop. Teradata Listener is the primary data intake framework for organizations with multiple data streams. Teradata Unity is a portfolio of four integrated products for managing data flow throughout the process. Teradata Viewpoint is a custom Web-based dashboard of tools to manage the Teradata environment.
Microsoft's Big Data strategy – helped by its Azure cloud platform – is fairly broad and has grown fast. It has a partnership with Hortonworks and offers the HDInsight tool for analyzing structured and unstructured data on the Hortonworks Data Platform. Microsoft also offers the iTrend platform for dynamic reporting of campaigns, brands and individual products.
SQL Server 2016 comes with a connector to Hadoop for Big Data processing, and Microsoft recently acquired Revolution Analytics, which made the only Big Data analytics platform written in R, a programming language for building Big Data apps without requiring the skills of a data scientist.
Managers and executives can’t mine insights from Big Data platforms without a quality data source. That’s where XPlenty comes in: the company offers a cloud-based ETL solution. To be sure, the process of extract, transform and load is at the very core of an efficient Big Data process. The XPlenty platform is geared to provide a sophisticated toolkit for building data pipelines, connecting a diverse array of data storehouses and cloud-based applications. The company’s client list includes Deloitte, Accenture, Caterpillar, Abbott, and PwC.
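For readers new to the term, the extract, transform and load steps can be sketched in a few lines of Python. This is only an illustration of the general ETL pattern, not XPlenty's product or API: the sample CSV data, the `orders` table and every function name below are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with the kinds of defects a pipeline has to handle:
# stray whitespace, a missing customer name and a non-numeric amount.
RAW_CSV = """order_id,customer,amount
1001,Acme Corp, 250.00
1002,,75.50
1003,Globex,not_a_number
1004,Initech,1200
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize types and drop rows that fail basic quality checks."""
    clean = []
    for row in rows:
        customer = (row["customer"] or "").strip()
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows whose amount is not a number
        if not customer:
            continue  # skip rows with no customer name
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(records, conn):
    """Load: write the cleaned records into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages and return the number of rows loaded."""
    records = transform(extract(text))
    load(records, conn)
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    loaded = run_pipeline(RAW_CSV, conn)
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    print(f"loaded {loaded} rows, total amount {total}")
```

Keeping the three stages as separate functions mirrors how pipeline tools are usually described: each stage can be swapped out (a different source, a stricter quality rule, a different target) without touching the others.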
SiSense sells its Prism product to the largest enterprises and some SMBs alike, largely on the strength of its small ElastiCube product, a high-performance analytical database tuned specifically for real-time analytics. ElastiCubes are super-fast data stores that are specifically designed for extensive querying. They are positioned as a cheaper alternative to HP's Vertica systems.
Cloudera merged with Hortonworks in a marriage of the two largest Hadoop providers. While both focused on the Hadoop market, they took different approaches – and the combined company continues to pivot. The backstory: Hortonworks targeted more technical users and took a pure open source approach, while Cloudera went for the IT market and offered some proprietary tools. In 2020, Cloudera touts itself as offering "an enterprise data cloud for data, from the Edge to AI."
With an exclusive focus on data analytics, Datafactz has a number of vertically-based analytics solutions for key sectors, including retail, automotive, manufacturing, healthcare, insurance and banking. The company’s social media analytics, for instance, is geared to monitor, analyze and report on user generated content, using sentiment analysis – negative to positive – with the goal of delivering insights in an easily digestible format. Founded in 2002, the company has 950 employees and claims 125 clients. Datafactz’s client list includes GAP, Coca-Cola, The Cheesecake Factory, and AAA.
IBM supports Big Data analytics through a number of databases, including DB2, Informix, and InfoSphere. It also has popular analytics applications such as Cognos and SPSS. In terms of pure Big Data, IBM has its own Hadoop distribution, Stream Computing to perform real-time data processing, IBM BigInsights for Apache Hadoop, and IBM BigInsights on Cloud offering Hadoop as a service through IBM Cloud.
Among HPE’s data offerings is HPE GreenLake for Big Data. Designed as an as-a-service solution, GreenLake’s goal is to offer faster data mining by lowering the challenges and costs of the Hadoop platform, among other advantages. It does this by offering software-hardware combinations for in-house installation, along with tools to monitor and manage data activity.
HPE has a number of hardware products, including HPE Moonshot, the ultra-converged workload servers, and the HPE Apollo 4000 purpose-built server for Big Data, analytics and object storage. HPE ConvergedSystem is designed for SAP HANA workloads, and HPE 3PAR StoreServ 20000 stores analyzed data, addressing existing workload demands and future growth. HPE also has HAVEn, a Big Data platform available on-demand focused on machine learning.
SAP's main Big Data tool is its HANA in-memory relational database, which works with Hadoop. HANA is a traditional row-and-column database, but it can perform advanced analytics such as predictive analytics, spatial data processing, text analytics, text search, streaming analytics, and graph data processing. It also has ETL (Extract, Transform, and Load) capabilities. SAP also offers data warehousing to manage all of your data from a single platform, cloud services, and data management tools for governance, orchestration, cleansing, and storage.
Oracle has a dedicated Big Data Appliance server preloaded and configured with a number of Oracle software products. This includes Oracle Autonomous Data Warehouse, Oracle NoSQL Database, Apache Hadoop, Oracle Data Integrator with Application Adapter for Hadoop, and Oracle Loader for Hadoop. It also has a number of on-premises and cloud-based analytics products as well as integration platforms and streaming analytics to handle data as it comes in.
The Apache Hadoop software library remains the framework for Big Data although many vendors have taken the framework and built their own proprietary and unique functions on it. The base system provides an outline to do your own customization and is designed to scale up from a single server to thousands. Apache also offers Spark, which does in-memory, real-time processing. Apache also offers Storm, a real-time, fault-tolerant processing system designed to run parallel calculations that run across a cluster of machines.
HPCC stands for High-Performance Computing Cluster and was developed by LexisNexis Risk Solutions. It delivers on a single platform, a single architecture and a single programming language (C++) as well as a data-centric programming language known as ECL (Enterprise Control Language) for data processing. Its Thor platform is designed for high-performance parallel batch processing.
Having established itself as a SaaS leader in office productivity and CRM tools, Zoho offers a versatile data analytics platform geared for both professional data scientists and mid-level staffers who want a self-service option. The application has an intuitive drag and drop interface as well as a classic spreadsheet-style interface. Zoho Analytics is geared for organizations that want to provide actionable data analytics insight to staffers at every level.
Alteryx brings Big Data analytics processing to a wide variety of popular databases, including Amazon Redshift, Apache Hive, Cloudera Impala, multiple Microsoft databases, SAP HANA, Teradata, Oracle, and more, to perform analytics within the database. With no coding required, users can select, filter, create formulas, and build summaries where the data lies. Queries can be run against anything from a history of sales transactions to social media activity.
Thoughtworks is the employer of many executives integral to the creation of Agile software development concepts and has built Agile development processes into its tools for building Big Data applications. Its Agile Analytics products apply Agile principles for building data warehousing and business intelligence applications, using continuous integration and continuous delivery.
Talend Platform for Big Data is an open source software integration platform that connects Hadoop, NoSQL, MapReduce and Spark to perform extract, transform, and load (ETL) processes on large and diverse data sets (including on MapR) for better insights or process optimization. For real-time ETL, Talend supports Spark streaming, machine learning, and IoT.
Amazon Web Services
Amazon Web Services offers an array of Big Data products. The main one is the Hadoop-based Elastic MapReduce (EMR), joined by Athena for basic database analytics, Kinesis and Storm for real-time analytics, and a number of databases, including the DynamoDB Big Data database, Redshift, and NoSQL.
Naturally, AWS benefits greatly in the data market from its overwhelming cloud presence. Many clients turn to their existing cloud provider to purchase Big Data services, which creates an enormous natural funnel for AWS.
Splunk Enterprise started as a log analysis tool but has since expanded its focus to include machine data analytics for monitoring end-to-end transactions for any threats or unusual behavior. Splunk’s Big Data solutions include Splunk Analytics for Hadoop to do the analytics, the Splunk ODBC Driver for connecting to enterprise applications like Tableau, and Splunk DB Connect for connecting to a variety of data sources.
Google continues to expand on its Big Data analytics offerings, starting with BigQuery, a cloud-based analytics platform for quickly analyzing very large datasets. BigQuery is serverless, so there is no infrastructure to manage and you don't need a database administrator, and it uses a pay-as-you-go model.
Google also offers Dataflow, a real-time data processing service; Dataproc, a Hadoop/Spark-based service; Pub/Sub, to connect your services to Google messaging; and Genomics, which is focused on genomic sciences.
TIBCO has a variety of offerings, starting with Spotfire for performing visual analytics, Statistica for moving data around through complex pipelines for processing, and the Alpine Data Labs advanced analytics interface on Apache Hadoop that provides a collaborative, visual environment for building analytics workflows and predictive models. TIBCO's Jaspersoft subsidiary has introduced an hourly offering on Amazon's cloud where you can buy analytics starting at $0.48 per hour.
Pentaho is a suite of open source-based tools for business analytics that offers data integration, OLAP services, reporting, a dashboard, data mining and ETL capabilities. Pentaho for Big Data is a data integration tool specifically designed for executing ETL jobs in and out of Big Data environments such as Apache Hadoop or Hadoop distributions on Amazon, Cloudera, EMC Greenplum, MapR, and Hortonworks. It also supports NoSQL data sources such as MongoDB and HBase. The company was acquired by Hitachi Data Systems in 2015 but continues to operate as a separate subsidiary.
Datameer offers a unified platform for end-to-end data analytics solutions on Hadoop. It enables business users to discover insights in any data via wizard-based data integration. Datameer’s platform covers the entire data lifecycle, from ingestion and preparation through exploration and, finally, consumption. This allows analysts to create and manage their own analytic data pipelines for point-and-click analytics and drag-and-drop visualizations, regardless of the data type, size, or source.
Alation crawls an enterprise to catalog every bit of information it finds and then centralizes the organization's knowledge of data, automatically capturing information on what the data describes, where the data comes from, who's using it and how it's used. In other words, it turns all your data into metadata, and allows for fast searches using English words and not computer strings. The company's products provide collaborative analytics for faster insight, a unified means of search, and a more optimized structure for the company's data, and they assist in better data governance.
BigPanda’s Autonomous Operations platform helps IT, networking, and DevOps teams detect, investigate, and resolve IT incidents faster by monitoring logs, with an emphasis on alert overloads. Logs are a chief source of data but it’s easy for a team to be overwhelmed with redundant or false alerts. BigPanda correlates IT noise into insights, automates incident management, and unifies fragmented IT operations.
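To make the idea of collapsing redundant alerts concrete, here is a minimal Python sketch of one common correlation approach: alerts that share a host and a check name and arrive within a short time window are folded into a single incident. This is an assumption-laden illustration, not BigPanda's algorithm; the alert fields (`host`, `check`, `ts`), the `correlate` function and the five-minute window are all hypothetical.

```python
def correlate(alerts, window_seconds=300):
    """Fold repeated alerts into incidents.

    Two alerts belong to the same incident when they share (host, check)
    and the newer one arrives within `window_seconds` of the incident's
    most recent alert. Field names here are illustrative assumptions.
    """
    incidents = []      # incidents in order of first occurrence
    latest_by_key = {}  # most recent incident seen for each (host, check)

    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["host"], alert["check"])
        incident = latest_by_key.get(key)
        if incident is not None and alert["ts"] - incident["last_ts"] <= window_seconds:
            # Redundant alert: update the existing incident instead of paging again.
            incident["count"] += 1
            incident["last_ts"] = alert["ts"]
        else:
            # New problem, or the old one went quiet long enough to count as new.
            incident = {
                "host": alert["host"],
                "check": alert["check"],
                "first_ts": alert["ts"],
                "last_ts": alert["ts"],
                "count": 1,
            }
            incidents.append(incident)
            latest_by_key[key] = incident
    return incidents


if __name__ == "__main__":
    # Timestamps are seconds since an arbitrary start, for readability.
    sample = [
        {"host": "db1", "check": "disk_full", "ts": 0},
        {"host": "db1", "check": "disk_full", "ts": 60},
        {"host": "web1", "check": "high_cpu", "ts": 30},
        {"host": "db1", "check": "disk_full", "ts": 120},
        {"host": "db1", "check": "disk_full", "ts": 1000},
    ]
    for incident in correlate(sample):
        print(incident)
```

Real products weigh many more signals (topology, change events, alert text), but even this simple grouping shows how a stream of raw alerts shrinks into a shorter list of incidents for a team to work through.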
Splice Machine bills itself as the provider of the only Hadoop-based relational database management system (RDBMS), which performs online and offline batch analysis as well as real-time analysis. It can act as a general-purpose database that can replace SQL databases like Oracle, MySQL, or SQL Server in a Hadoop environment. The Splice Machine RDBMS executes operational workloads on Apache HBase and analytical workloads on Apache Spark.
Formerly known as WebAction, Striim is a real-time, data streaming analytics software platform that reads in data from multiple sources such as databases, log files, applications and IoT sensors and allows customers to react instantly. Enterprises can filter, transform, aggregate and enrich data as it is coming in, organizing it in-memory before it ever lands on disk.
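The filter, transform, enrich and aggregate steps described above can be illustrated with plain Python generators, which process each event as it arrives rather than waiting for a full batch on disk. This is a generic sketch of in-memory stream processing, not Striim's API; the sensor readings, the `SENSOR_LOCATIONS` lookup table and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical reference data used to enrich incoming events.
SENSOR_LOCATIONS = {"s1": "warehouse", "s2": "loading-dock"}


def parse(lines):
    """Transform: turn raw 'sensor_id,value' text into structured events."""
    for line in lines:
        sensor_id, value = line.split(",")
        yield {"sensor": sensor_id.strip(), "celsius": float(value)}


def keep_valid(events, low=-50.0, high=150.0):
    """Filter: drop readings outside a plausible range."""
    for event in events:
        if low <= event["celsius"] <= high:
            yield event


def enrich(events, locations):
    """Enrich: attach a location looked up from reference data."""
    for event in events:
        yield {**event, "location": locations.get(event["sensor"], "unknown")}


def running_average(events):
    """Aggregate: emit the running average per location after each event."""
    totals = defaultdict(lambda: [0.0, 0])  # location -> [sum, count]
    for event in events:
        stats = totals[event["location"]]
        stats[0] += event["celsius"]
        stats[1] += 1
        yield event["location"], stats[0] / stats[1]


def pipeline(lines, locations=SENSOR_LOCATIONS):
    """Chain the stages; nothing is materialized until the caller iterates."""
    return running_average(enrich(keep_valid(parse(lines)), locations))


if __name__ == "__main__":
    incoming = ["s1, 20.0", "s2, 30.0", "s1, 999.0", "s1, 22.0", "s3, 10.0"]
    for location, average in pipeline(incoming):
        print(location, average)
```

Because each stage is a generator, an event flows through all four steps before the next one is read, which is the same shape as reacting to data before it lands in storage.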
Mu Sigma offers an analytics services framework, built specifically for large enterprises, that is designed to improve sales and marketing. It cleans up client data to show only relevant information, analyzes that data, generates insights from it and gives recommendations to the client. It offers marketing analytics to cover sales patterns and customer engagement along with risk analytics, such as predictive modeling of claims, credit scoring, fraud detection and prediction, and so on.
Alpine Data Labs
A creation of Greenplum employees, Alpine Data Labs puts an easy-to-use advanced analytics interface on Apache Hadoop to provide a collaborative, visual environment for building analytics workflow and predictive models that anyone can use, rather than requiring a high-priced data scientist to program the analytics.
A highly vertical but important service, Cogito Dialog uses behavioral analytics technology, including analysis of customer interactions ranging from emails to social media to human voice analysis during service calls. The aim is to help phone support personnel improve their communications while on the phone with customers and to help organizations better manage agent performance. Cogito’s software evaluates hundreds of behavioral signals through voice to provide live conversation coaching for agents and a real-time measure of customer experience for every call.
New Relic is one of the few Big Data vendors that use a SaaS model rather than on-premises software. Its services monitor, in real time, Web and mobile applications that run in the cloud, on-premises, or in a hybrid mix. It monitors the app for any potential problems with the user experience, and New Relic Insights provides a dashboard for user behavior and application performance.
VMware offers a Big Data extension to its flagship virtualization product, called VMware vSphere Big Data Extensions (BDE), designed to ease the deployment and management of Hadoop clusters under vSphere. It supports a number of Hadoop distributions, including Apache, Cloudera, Hortonworks, MapR and Pivotal. Through vSphere vCenter, Hadoop clusters can be managed and scaled as demand increases.