Big Data is no longer an experiment; it is an essential part of doing business. IDC estimates that worldwide revenues for Big Data and business analytics (BDA) will reach $150.8 billion in 2017, an increase of 12.4% over 2016. By 2020, revenues will be more than $210 billion.
Much of that spending goes to hardware and services. For Big Data software, needs are often unique to each company's industry vertical. Even within the same industry, like retail or manufacturing, needs differ from one company to the next, so it's hard to develop packaged software that serves all potential customers across industries.
The key to success is providing the base applications and tools that let companies build their own custom applications. That is where we see the real action in what qualifies as Big Data application software. Below is a list of 20 such firms, each specializing in one form of Big Data building block or another. Many of these firms have roots in business intelligence, which predates Big Data by years and is essentially the same thing, just not as comprehensive (nor as real-time) as Big Data tries to be.
A surprising number are big-name, old-guard firms, showing you can teach an old dog new tricks. There are, however, some notable startups included as well.
This list is in no particular order.
1) Domo
Former Omniture CEO Josh James founded Domo in 2010 to give businesses a way to visualize data from disparate silos. It automatically pulls in data from spreadsheets, social media, on-premises storage, databases, cloud-based apps, and data warehouses, and presents the information on a customizable dashboard. It has been lauded for its ease of use: it can be set up and used by pretty much anyone, not just a data scientist. It comes with a number of preloaded chart designs and data-source connections to get moving quickly.
2) Teradata
Starting with Teradata Database 15, the company added new Big Data capabilities like the Teradata Unified Data Architecture, enabling companies to run analytic queries across multiple systems, including bidirectional data import and export with Hadoop. It also added 3D representation and processing of geospatial data, along with enhanced workload management and system availability. A cloud-based version supporting AWS and Azure, called Teradata Everywhere, features massively parallel processing analytics spanning public cloud and on-premises data.
3) Big Data by Hitachi Vantara
Hitachi Vantara’s Big Data products are built on popular open source tools. Formed in 2017, Hitachi Vantara combines the Hitachi Data Systems storage and data center infrastructure business, the Hitachi Insight Group IoT business, and Hitachi’s Pentaho Big Data business into a single company. Pentaho is based on the Apache Spark in-memory computing framework and the Apache Kafka messaging system. Pentaho 8.0 also added support for the Apache Knox Gateway to authenticate users and enforce access rules for Big Data repositories, along with support for building analytics apps via Docker containers.
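Kafka's role in a stack like Pentaho's is easier to picture with its core abstraction in mind. The following is a minimal, illustrative pure-Python sketch (not Pentaho or Kafka code; the `Topic` class and names are invented for illustration) of how a Kafka-style topic spreads records across partitions by key, so that records sharing a key keep their relative order within one partition:

```python
from collections import defaultdict

# Illustrative sketch only, not Kafka's implementation: a topic is split
# into partitions, and a record's key decides which partition it lands in.

class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)  # partition id -> ordered records

    def produce(self, key, value):
        # The same key always hashes to the same partition, so all
        # records for that key stay in one append-ordered log.
        partition = hash(key) % self.num_partitions
        self.partitions[partition].append((key, value))
        return partition

    def consume(self, partition):
        # A consumer reads one partition in append order.
        return list(self.partitions[partition])

events = Topic("sensor-readings")
p1 = events.produce("sensor-42", 20.1)
p2 = events.produce("sensor-42", 20.4)
assert p1 == p2  # same key -> same partition, so per-key order is preserved
```

This per-key ordering guarantee is what lets a streaming consumer (Spark, Pentaho, or otherwise) process each key's events in the order they were produced.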
4) TIBCO Statistica
TIBCO’s Statistica is predictive analytics software for businesses of all sizes. It uses Hadoop technology to perform data mining on structured and unstructured data, addresses IoT data, can deploy analytics to devices and gateways anywhere in the world, and supports in-database analytics on platforms such as Apache Hive, MySQL, Oracle, and Teradata. It uses templates for designing complete analyses, so less technical users can do their own analysis, and the resulting models can be exported from PCs to other devices.
5) Panoply
Panoply sells what it calls the Smart Cloud Data Warehouse, using AI to eliminate the development and coding needed to transform, integrate, and manage data. The company claims the product essentially provides data management as a service, able to consume and process up to a petabyte of data without any intervention. Its machine learning algorithms can examine data from any source and perform queries and visualizations on it.
6) IBM Watson Analytics
Watson Analytics is IBM’s cloud-based analytics service. When you upload data, Watson presents the questions it can help answer based on its analysis of that data and provides key visualizations immediately. It also does simple analysis, predictive analytics, and smart data discovery, and offers a variety of self-service dashboards. IBM has another analytics product, SPSS, which can be used to uncover patterns in data and find associations between data points.
7) SAS
SAS (originally Statistical Analysis System) was founded in 1976, long before the term Big Data was coined, for the purpose of handling large data volumes. It can mine, alter, manage, and retrieve data from a variety of sources, perform statistical analysis on that data, and then present it in a range of forms, such as statistics and graphs, or write it out to other files. It also comes with forecasting tools for analyzing and predicting processes.
8) Sisense
Sisense claims to offer the only business intelligence software that makes it easy for users to prepare, analyze, and visualize complex data drawn from multiple sources on commodity server hardware. Sisense’s high-performance In-Chip data engine can run queries on a terabyte of data in under one second, and it comes with a batch of templates for different industries.
9) Big Data Studio by Talend
Talend has always focused on generating clean, native code for Hadoop, eliminating the need to manually code everything. It provides interfaces to a variety of Big Data repositories, like Cloudera, MapR, Hortonworks, and Amazon EMR. It recently added a Data Preparation app that lets customers create a common dictionary and, using machine learning, automates the data cleansing process to get data ready for processing in less time.
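The "common dictionary" idea behind this kind of data preparation can be sketched in a few lines. This is an illustration of the concept only, not Talend's API; the `COMMON_DICTIONARY` mapping and `cleanse` function are invented for the example:

```python
# Illustrative sketch, not Talend code: a shared dictionary maps messy
# field variants to one canonical value before downstream processing.

COMMON_DICTIONARY = {
    "u.s.": "United States",
    "usa": "United States",
    "united states": "United States",
    "u.k.": "United Kingdom",
}

def cleanse(records, field, dictionary=COMMON_DICTIONARY):
    """Return copies of records with `field` normalized via the dictionary.
    Values not in the dictionary are left unchanged."""
    cleaned = []
    for record in records:
        raw = record.get(field, "")
        canonical = dictionary.get(raw.strip().lower(), raw)
        cleaned.append(dict(record, **{field: canonical}))
    return cleaned

rows = [{"customer": "Acme", "country": " USA "},
        {"customer": "Bolt", "country": "U.K."}]
print(cleanse(rows, "country"))
# -> [{'customer': 'Acme', 'country': 'United States'},
#     {'customer': 'Bolt', 'country': 'United Kingdom'}]
```

In a real tool the dictionary would be maintained centrally and, as the article notes, machine learning can suggest which variants map to which canonical value.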
Cloudera is the most popular provider and supporter of Apache Hadoop, with partnerships including Dell, Intel, Oracle, SAS, Deloitte, and Capgemini. Its platform consists of five primary applications: Cloudera Essentials, the core data management platform; Cloudera Enterprise Data Hub, the full data management platform; Cloudera Analytic DB for BI and SQL-based analytics; Cloudera Operational DB, its highly scalable NoSQL database; and Cloudera Data Science and Engineering, the data processing, data science, and machine learning services that run on top of the core Essentials platform.
Big Data databases are traditionally unstructured, meaning any kind of data can be stored in them. Micro Focus’s Vertica Analytics Platform takes the traditional column-oriented, relational database format, but is specifically designed to handle modern analytical workloads coming from a Hadoop cluster. The platform uses a clustered approach to storing data, with full support for SQL, JDBC, and ODBC. It uses a columnar store rather than a row store because analytical queries typically group and aggregate over a few columns, which a column layout can read without touching the rest of each row.
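The column-store advantage can be seen in a tiny sketch. This is a conceptual illustration (not Vertica internals): the same table laid out both ways, where the aggregate on the columnar layout scans only the one array it needs:

```python
# Sketch of row vs column layout; the data and names are invented.
rows = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 150},
    {"region": "east", "sales": 50},
]

# Row store: an aggregate must walk every whole row, even though it
# only cares about one field.
total_row_store = sum(r["sales"] for r in rows)

# Column store: each column is its own contiguous array, so the same
# aggregate scans just one array -- the access pattern analytic
# databases like Vertica are built around.
columns = {
    "region": [r["region"] for r in rows],
    "sales": [r["sales"] for r in rows],
}
total_col_store = sum(columns["sales"])

assert total_row_store == total_col_store == 300
```

On disk the difference matters even more: a columnar engine reads (and compresses) only the columns a query references, rather than every byte of every row.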
13) SAP Vora
On its own, SAP HANA isn’t meant for Big Data; it’s an in-memory RDBMS. But when you add HANA Vora, a Big Data interface, it becomes more viable. Vora allows HANA to connect to Hadoop repositories and extends the Apache Spark execution framework for interactive analytics on enterprise and Hadoop data, so data scientists get the power of HANA with support for Big Data stores.
14) Oracle
Oracle, the database giant, has a full suite of Big Data integration products, such as its Data Integration Platform Cloud, which supports real-time data streaming, batch data processing, enterprise data quality, and data governance capabilities, as well as Stream Analytics, IoT support, and support for Apache Kafka through the Oracle Event Hub Cloud Service.
15) Apache Cassandra
While MongoDB is the leading NoSQL database, Cassandra has the edge in scalability. Originally developed at Facebook, it scales across a massive number of commodity servers with no single point of failure and strong fault tolerance.
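Cassandra's "no single point of failure" claim rests on key-based partitioning with replication. The following is a heavily simplified conceptual sketch, not Cassandra's implementation (the node names, `replicas` helper, and replication factor are invented for illustration): each key hashes to a position on a ring of nodes and is copied to the next few nodes, so losing any one node loses no data.

```python
import hashlib

# Conceptual sketch of ring-style placement (not Cassandra code).
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3

def replicas(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Pick rf consecutive nodes starting from the key's ring position.
    Using a stable hash (md5) keeps placement deterministic across runs."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

owners = replicas("user:1001")
assert len(set(owners)) == 3            # three distinct replicas hold the key
# If the first replica dies, two full copies of the data remain.
survivors = [n for n in owners if n != owners[0]]
assert len(survivors) == 2
```

Real Cassandra layers consistency levels and virtual nodes on top of this idea, but the placement principle is the same: every key has multiple owners, so reads and writes can continue around a failed server.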
17) Wolfram Alpha
Want to calculate something or learn about nearly any topic? Wolfram Alpha is an excellent tool for looking up information about just about everything. Doug Smith from Proessaywriting says his company uses the platform for advanced research in financial, historical, social, and other professional areas. For example, if you type “Microsoft,” you receive an input interpretation, fundamentals and financials, the latest trade, price history, performance comparisons, data return analysis, a correlation matrix, and much more.
18) Tibco Spotfire
Spotfire is an in-memory analytics platform that has been upgraded to support Big Data repositories and perform predictive analytics. It features a connector for Apache Hadoop, which lets users perform data mashups, data discovery, and analytics tasks on Big Data the way they do with Oracle, SAP, and other traditional data sources. It also supports real-time, data-driven event visualization and has an AI-driven recommendation engine to shorten data discovery time.
19) AnswerRocket
AnswerRocket specializes in natural language search for data discovery, making it a tool for business users rather than an esoteric tool for data scientists. It can provide answers in minutes rather than making users wait days for a query to be written. AnswerRocket users ask questions in everyday language and get visualizations in seconds, then drill down on a particular chart or graph for further insight.
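To make the idea of natural-language querying concrete, here is a deliberately tiny toy sketch; it bears no relation to AnswerRocket's actual engine (the `answer` function, its keyword matching, and the sample table are all invented). It just shows the general shape of the task: pick a metric and a grouping column out of an everyday-language question, then run the matching aggregation.

```python
# Toy illustration of NL-to-query mapping; real products use far more
# sophisticated language understanding than keyword spotting.

SALES = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 150},
    {"region": "east", "sales": 50},
]

def answer(question, table=SALES):
    """Map a question onto a grouped sum, or return None if unrecognized."""
    question = question.lower()
    group_col = "region" if "region" in question else None
    if "sales" in question and group_col:
        totals = {}
        for row in table:
            totals[row[group_col]] = totals.get(row[group_col], 0) + row["sales"]
        return totals
    return None

print(answer("What are total sales by region?"))
# -> {'east': 150, 'west': 150}
```

The hard part such products solve is the first step, reliably turning free-form phrasing into the right columns, filters, and aggregates, not the aggregation itself.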
20) Tableau
Tableau specializes in drawing from multiple data silos and integrating them into a single dashboard with just a few clicks, creating interactive, flexible dashboards with custom filters and drag-and-drop connections. Tableau also supports natural language queries, so you can ask business questions, not technology questions.