For Big Data companies, this is a critical period for competitive jockeying. These are the early days of Big Data, which means there is still a plethora of companies, a mix of new firms and old-guard Silicon Valley firms, looking to stay current. Like everything else, the Big Data market will mature and consolidate. In five years, you can bet that many of the Big Data companies on this list will be gone, either out of business or merged into or acquired by a larger player.
This isn't meant as a Big Data buyer's guide. Instead, it's an overview of 30 big and small companies in the field of Big Data analytics. We're not looking at hardware players unless they have a software story to go with the hardware (and some do). The one thing all of these companies have in common is analytics.
Big Data Companies: Large Players
In addition to its big iron, IBM offers DB2, Informix and InfoSphere database software, Cognos and SPSS analytics applications, and of course its well-known Global Services division. IBM also supports the Hadoop analytics platform.
HP is a major hardware vendor and services provider, but its big analytics platform is Vertica, which it acquired in 2011. The Vertica Analytics Platform is designed to manage large, fast-growing volumes of structured data and provide very fast query performance and petabyte scalability on commodity enterprise servers. HP also has the Autonomy unit, with its HAVEn software for analyzing and finding meaning in petabytes of structured and unstructured information.
EMC specializes in storage and its Big Data analytics are built around that. It has a Big Data group that covers hardware and software and a number of verticals, like high performance computing, enterprise and oil and gas exploration. EMC also has a Marketing Science Lab to help companies use Big Data analytics in their marketing department.
Teradata's Aster platform offers a mix of analytics: the Discovery Platform; a database; a discovery portfolio with pre-built functions for a broad set of Big Data applications; the Aster SQL-GR next-generation graph analytics engine; the SNAP Framework, which provides integration and a unified SQL interface across multiple analytic engines and data sources; and its own MapReduce implementation.
Oracle has its Big Data Appliance, which combines an Intel server with a number of Oracle software products. They include Oracle NoSQL Database; Apache Hadoop; Oracle Data Integrator with Application Adapter for Hadoop; Oracle Loader for Hadoop; the Oracle R Enterprise tool, which uses the R programming language and software environment for statistical computing and publication-quality graphics; Oracle Linux; and the Oracle Java HotSpot Virtual Machine.
SAP's best Big Data tool is its HANA in-memory database, which the company says can run analytics on 80 terabytes of data, integrate with Hadoop, search text content, harness the power of real-time predictive analytics, and more.
Microsoft is probably not the first company you would think of, but its Big Data strategy is fairly broad. It has a partnership with Hortonworks and offers the HDInsight tool, based on the Hortonworks Data Platform, for analyzing structured and unstructured data. Microsoft also offers the iTrend platform for dynamic reporting of campaigns, brands and individual products.
Amazon has a number of enterprise Big Data platforms, including the Hadoop-based Elastic MapReduce, DynamoDB big data database, and the Redshift massively parallel data warehouse. All of these services work within its greater Amazon Web Services offerings.
VMware is best known for its virtualization hypervisor, but it's building on that platform to offer Big Data software, such as its recent VMware vSphere Big Data Extensions, which lets vSphere control Hadoop deployments and makes it easier for enterprises to launch Big Data projects.
Google is more of a cloud services company but it is making a push into Big Data analytics by offering BigQuery, a cloud-based Big Data analytics platform for quickly analyzing very large datasets. Unlike most services, you send data up to BigQuery rather than store it in the cloud.
Big Data Companies: More Players
Splunk Enterprise was originally a log analysis tool, but after partnering with Tableau Software to use Tableau's visual analytics package, Splunk has been reborn as a machine data analytics company. It can monitor online end-to-end transactions; study customer experience, behavior and usage of services in real time; and spot trends and perform sentiment analysis on social platforms.
MemSQL develops an in-memory relational database that can run mixed workloads and analytics at the same time. It is a highly scalable, in-memory transactional database management system with an increased focus on historical analysis.
If you can get past the creepy factor, CIA-funded Palantir has two Big Data analytics products: Palantir Gotham, which integrates structured and unstructured data for search and discovery; and Palantir Metropolis, which handles data integration, information management and quantitative analytics. The software connects to a variety of public data sets and discovers trends, relationships and anomalies, including through predictive analytics.
Trifacta bridges the gap between collecting data and transforming it into something useable, usually a two-step process. Trifacta's data transformation software automates the process of transforming data from database sources like Hadoop into something that can be used by software visualization and business intelligence tools.
Datameer claims its Datameer Analytics Solution (DAS) is the only end-to-end Hadoop solution for analytics. DAS is a business integration platform for Hadoop that includes data source integration, an analytics engine with a spreadsheet-like interface and more than 200 analytic functions, and visualization functions.
Tamr is a very new startup whose product gathers data from a company's databases and uses machine intelligence to provide a single view across all of the systems. Tamr is like a search indexing tool, in that it gathers all of the data fields and produces a report on all of the data sources that a human then evaluates.
17. Neo Technology
This company makes a NoSQL graph database. That means graph, not graphics: think of a chart of a company's executive structure, where each box is connected to others. A graph database stores information about how each entry is related to other entries, so unlike a standard database, it shows the relationship one item has to another.
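To make the idea of storing relationships concrete, here is a minimal sketch in plain Python. It is not Neo Technology's product or API; the `TinyGraph` class, the `MANAGES` relationship type and the job titles are all hypothetical, chosen to mirror the executive-structure example above.

```python
class TinyGraph:
    """Toy in-memory graph: nodes plus typed, directed relationships.

    Illustrative only; a real graph database adds persistence,
    indexing and a query language on top of this basic idea.
    """

    def __init__(self):
        # Maps each node to a list of (relationship_type, target_node) pairs.
        self._out = {}

    def relate(self, source, rel_type, target):
        """Record that `source` has a `rel_type` relationship to `target`."""
        self._out.setdefault(source, []).append((rel_type, target))
        self._out.setdefault(target, [])  # register the target as a node too

    def neighbors(self, node, rel_type):
        """Nodes directly connected to `node` by `rel_type`."""
        return [t for r, t in self._out.get(node, []) if r == rel_type]

    def reachable(self, start, rel_type):
        """All nodes reachable from `start` by following `rel_type` edges."""
        seen, stack = [], [start]
        while stack:
            current = stack.pop()
            for nxt in self.neighbors(current, rel_type):
                if nxt not in seen:
                    seen.append(nxt)
                    stack.append(nxt)
        return seen


# Hypothetical executive structure.
org = TinyGraph()
org.relate("CEO", "MANAGES", "CTO")
org.relate("CEO", "MANAGES", "CFO")
org.relate("CTO", "MANAGES", "VP Engineering")

direct_reports = org.neighbors("CEO", "MANAGES")
all_reports = sorted(org.reachable("CEO", "MANAGES"))
print(direct_reports)
print(all_reports)
```

The point of the sketch is that the relationship itself is stored as data, so a question like "who ultimately reports to the CEO?" becomes a walk along edges rather than a series of table joins.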
This company offers its own version of the Apache Cassandra database, providing a massively scalable enterprise NoSQL platform for mission-critical business applications. It is fully distributed and always available, and it provides real-time scalable analytics.
Infobright's Knowledge Grid architecture is a standard RDBMS with a focus on machine-generated data, and its high-performance analytic database is particularly geared to support the Internet of Things. It rapidly analyzes machine-generated data, enabling applications to perform complex queries.
This Indian company uses data analytics to help companies better understand, predict and influence consumer behavior. The analytics allow companies to identify new market opportunities as they emerge so they can be first to market. Its products help retailers, packaged goods companies, insurance firms and other consumer-facing firms improve the effectiveness of marketing, pricing and supply chain management.
21. Metric Insights
Metric Insights is a rare push platform: rather than waiting for users to go looking, it delivers the data they need to them. The platform alerts the user when and why key business metrics have changed. It draws on data from the business intelligence, SaaS, Big Data and data visualization tools the customer already uses to produce a personalized report and keep the user up to date as data changes.
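As a rough illustration of the push idea, the sketch below compares two snapshots of business metrics and produces an alert only for metrics that moved significantly. It is not Metric Insights' implementation; the function name `metric_alerts`, the 10 percent threshold and the sample figures are assumptions made for the example.

```python
def metric_alerts(previous, current, threshold=0.10):
    """Return alert messages for metrics whose relative change meets the threshold.

    `previous` and `current` map metric names to numeric values.
    The 10% default threshold is an arbitrary choice for this sketch.
    """
    alerts = []
    for name, new_value in current.items():
        old_value = previous.get(name)
        if old_value is None or old_value == 0:
            continue  # no usable baseline to compare against
        change = (new_value - old_value) / abs(old_value)
        if abs(change) >= threshold:
            direction = "up" if change > 0 else "down"
            alerts.append(
                f"{name} is {direction} {abs(change):.0%} ({old_value} -> {new_value})"
            )
    return alerts


# Hypothetical daily snapshots pulled from a company's reporting tools.
yesterday = {"signups": 200, "revenue": 1000, "churn": 40}
today = {"signups": 260, "revenue": 1020, "churn": 30}

alerts = metric_alerts(yesterday, today)
for message in alerts:
    print(message)
```

Revenue moved only 2 percent here, so it generates no alert; the user hears about signups and churn without having to open a dashboard.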
The long-time analytics firm founded by an ex-Oracle VP has moved into Big Data as well with five products: PowerCenter Big Data Edition, which allows developers to integrate almost any type of data at any scale without having to learn Hadoop; HParser, a codeless data parsing transformation environment; Data Quality Big Data Edition, which delivers data of any type and volume using pre-built data quality rules processed natively on Hadoop; Vibe Data Stream for Machine Data, which provides highly available, reliable, real-time streaming data collection for Big Data analytics; and Data Masking, which delivers policy-based data security for applications running on Hadoop and other Big Data platforms.
Synasta offers what it calls Analytics as a Service for the online retail-marketing sector. Built on Hadoop, Synasta handles real-time ingestion of data, algorithm execution and output to dramatically change the speed and understanding of the business decision process. This helps retailers quickly customize the power and speed of a product offering, driving consumer awareness, acquisition and retention.
Chartio supports a myriad of data sources, including MySQL, PostgreSQL, Amazon Web Services, Amazon Relational Database Service, Rackspace Cloud, Heroku, Google Analytics and Oracle, and offers users a simple dashboard to visualize their data. It runs on both PCs and tablets and has a variety of filters and sliders for manipulating the data in real time.
A software delivery and development company, Thoughtworks incorporated Agile software development principles into what it calls Agile Analytics, a style of building data warehouses, data marts, business intelligence applications and analytics applications that focuses on the early and continuous delivery of business value throughout the development lifecycle.
Platfora works with Hadoop clusters, including Cloudera, MapR, and Amazon EMR, to turn huge amounts of data into dimensional and predictive dashboards, reports and insights. The company’s server architecture enables immediate delivery, analytics overlay, and the option to drill down into specific areas.
A part of supercomputer maker Cray, YarcData makes a Big Data appliance called Urika, which can be purchased or rented from Cray. It performs graphical searches across disparate data sets and optimizes the findings for real-time queries to find the relationships, identify the patterns and uncover the linkages hiding within them.
SiSense sells its Prism product to the largest enterprises and some SMBs alike, thanks to its small ElastiCube, a high-performance analytical database tuned specifically for real-time analytics. ElastiCubes are super-fast data stores designed for extensive querying. They are positioned as a cheaper alternative to HP's Vertica systems.
Hadoop and many other Big Data applications are open source, and as is so often the case in new markets, the rush is for features and security tends to come later. ZettaSet Orchestrator makes Hadoop more secure and hardened for enterprise-level performance and analytics through real-time encryption, even when data is being moved.
30. ClearStory Data
ClearStory Data offers a scalable application for data discovery and analysis across many sources and is specifically geared toward tracking and analyzing activity across the customer life cycle. The platform uses an in-memory database to process multiple types of data on the fly and then combine the info with a modern user interface.