The big data market is strong and thriving — although it isn't always called "big data" these days.
The term "big data" first became part of the tech lexicon in the late 1990s, when people like John Mashey at SGI began using the phrase to describe the enormous and growing stores of enterprise data that were difficult to store and analyze using the technology available at the time.
In 2001, analyst Doug Laney suggested a definition of big data that included three Vs: volume, velocity and variety. Over the next few years, Laney's definition became something of an industry standard, and some people added a fourth V — variability — to the definition.
In 2005, big data technology took a dramatic step forward when Yahoo debuted the Hadoop open source distributed data store. The project became the linchpin for an entire ecosystem of commercial and open source data storage and analytics solutions.
In 2014, IDC and EMC released their most recent digital universe study, which revealed that the amount of data stored by the world's digital systems is growing by 40 percent per year. The companies predicted that by 2020, the digital universe would include 44 zettabytes of information. That's nearly as many bits as there are stars in the universe, and it's enough information to fill a stack of 2014-era tablets stretching to the moon 6.6 times.
Today, big data certainly hasn't become any smaller, but the size of growing data stores no longer gets as much attention as it once did. Instead, most organizations are focused on analytics, data science and machine learning. They have accepted that managing big data is simply a part of doing business; if they want to compete and succeed, they need to find ways to turn those big data stores into valuable insights.
Big Data Market Overview
Enterprise spending on big data technologies continues to climb, as it has for the past decade. According to IDC, worldwide revenues for big data and business analytics are likely to grow from $150.8 billion in 2017 to $210 billion in 2020. IDC puts the compound annual growth rate at 11.9 percent through 2020.
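As a quick check on growth figures like these, a compound annual growth rate can be recomputed from a start value, an end value and the number of years between them. The sketch below is illustrative only: `cagr` is a made-up helper name, and it assumes the 2017 and 2020 revenue figures quoted above are the endpoints of a three-year span. Computed that way, the implied rate comes out slightly below the 11.9 percent cited, which may mean the cited figure is measured over a different forecast window.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 == 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Revenue figures quoted in the text, in billions of US dollars.
# Assumption: 2017 and 2020 are treated as the endpoints of a 3-year span.
implied_rate = cagr(150.8, 210.0, 2020 - 2017)
print(f"Implied CAGR, 2017-2020: {implied_rate:.1%}")  # prints 11.7%
```

The same helper can be applied to the other market forecasts in this article whenever both a start and an end figure are given.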
"After years of traversing the adoption S-curve, big data and business analytics solutions have finally hit mainstream," said Dan Vesset, an IDC group vice president. "BDA as an enabler of decision support and decision automation is now firmly on the radar of top executives. This category of solutions is also one of the key pillars of enabling digital transformation efforts across industries and business processes globally."
And organizations are reporting that their big data initiatives are having a positive impact on their bottom line. In the NewVantage Partners Big Data Executive Survey, 80.7 percent of respondents said that their big data investments had been successful, and 48.4 percent said that they had realized measurable benefits as a result of their big data initiatives.
Those sorts of results are likely to encourage enterprises to continue investing in big data, but the types of big data solutions they are adopting are shifting. According to Forrester Research, "The shift to the cloud for big data is on. In fact, global spending on big data solutions via cloud subscriptions will grow almost 7.5 times faster than on-premise subscriptions." The firm added, "Furthermore, public cloud was the number one technology priority for big data according to our 2016 and 2017 surveys of data analytics professionals."
The cloud is particularly popular for big data analytics that rely on machine learning technologies. Machine learning requires advanced (and expensive) computing hardware, but running machine learning in the cloud lets organizations access this technology at a fraction of what it would cost to install it in their own data centers. Although organizations face some challenges related to cloud analytics, experts say the trend is likely to accelerate in the coming years.
Big Data Technologies: Market Breakdown
As the big data market has matured, vendors have developed a wide variety of big data technologies to meet enterprises' needs. This is a very broad market, but most big data solutions fall into one of the following categories:
- Business intelligence (BI): Business intelligence solutions provide analytics and reporting capabilities on business data typically stored in a data warehouse. According to Gartner, the BI and analytics market is forecast to increase from $18.3 billion in 2017 to $22.8 billion in 2020. However, this is slower growth than in the past.
- Data mining: Data mining is a broad category that encompasses a wide variety of techniques for finding patterns in big data. While many big data solutions still offer data mining capabilities, the term has fallen somewhat out of favor as vendors instead are using terms like "predictive analytics" and "machine learning" to describe their solutions.
- Data integration: One of the big challenges with big data analytics is gathering all the relevant data from disparate sources and converting it into a format that can be analyzed easily. This has led to a whole crop of data integration solutions, which are sometimes also called ETL (short for "extract, transform, load") solutions. According to MarketsandMarkets, data integration revenues could be worth $12.4 billion by 2022.
- Data management: This category of solutions includes tools that help organizations integrate, clean, store, secure and assure the quality of their digital data. MarketsandMarkets predicted that this category of big data tools could generate $105.2 billion in revenue by 2022.
- Open source technologies: Many of the most widely used big data technologies are available under open source licenses. In particular, technologies like Hadoop and Spark, which are managed by the Apache Software Foundation, have become very popular. Many vendors offer commercially supported versions of these open source big data technologies.
- Data lakes: A data lake is a repository that ingests data from a wide variety of sources and stores it in its native format. This differs from a data warehouse, which stores data that has been cleaned and formatted for analytics. Data lakes are popular with organizations that want to perform analytics on both structured and unstructured data.
- NoSQL databases: Unlike relational database management systems (RDBMSes), NoSQL databases don't store information in traditional tables with rows and columns. Instead, they use other models, such as key-value pairs, wide columns, documents or graphs, for tracking data. Many enterprises use NoSQL databases for storing unstructured data for analytics.
- Predictive analytics: Currently one of the most popular forms of big data analytics, predictive analytics looks at historical trends in order to offer a good estimate about what might happen in the future. Many modern predictive analytics solutions incorporate machine learning capabilities so that their forecasts become more accurate over time. A Zion Market Research report said spending on predictive analytics could climb from $3.49 billion in 2016 to $10.95 billion by 2022.
- Prescriptive analytics: Prescriptive analytics goes a step further than predictive analytics. In addition to telling organizations what is likely to happen in the future, these solutions also suggest courses of action for achieving desired results. Experts say few (if any) big data analytics solutions currently on the market have true prescriptive capabilities, but this is an area of intense research for vendors.
- In-memory databases: In-memory technology can speed up big data analytics considerably. In any computer system, accessing data in memory (RAM) is much faster than reading it from a hard drive or solid state drive. In-memory databases let users keep vast quantities of data in memory, yielding dramatic speed boosts.
- Artificial intelligence and machine learning: Many next-generation big data analytics tools incorporate machine learning, which is a subcategory of artificial intelligence (AI). Machine learning uses algorithms to help systems get better at tasks over time without explicit programming. This is one of the fastest-growing areas of the big data market.
- Data science platforms: Many vendors have begun labeling their big data analytics solutions as "data science platforms." Products in this category typically combine many different capabilities in a unified platform. Nearly all of them have some analytics and machine learning features, and many also have data integration or data management features.
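To make the data integration (ETL) category above more concrete, here is a minimal sketch of an extract, transform, load flow using only Python's standard library. Everything in it is hypothetical: the two sample sources, the field names and the `sales` table are invented for illustration, and an in-memory SQLite database stands in for a real data warehouse. Commercial integration tools handle far more sources, volumes and error cases than this.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export with amounts stored as text.
CSV_SOURCE = "customer,amount\nAcme,100.50\nGlobex,20\n"
# Hypothetical source 2: a JSON feed that uses different field names.
JSON_SOURCE = '[{"client": "Initech", "total": 75.25}]'


def extract():
    """Read raw records from both sources without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Convert both record shapes into one common (customer, amount) format."""
    unified = [(r["customer"].lower(), float(r["amount"])) for r in csv_rows]
    unified += [(r["client"].lower(), float(r["total"])) for r in json_rows]
    return unified


def load(rows, conn):
    """Write the unified rows into a single table that is ready for analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(*extract()), conn)

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(f"Loaded {row_count} rows, total amount: {total:.2f}")
```

Keeping the three stages as separate functions mirrors how the category is usually described: each source gets its own extraction logic, while the transform step is where mismatched field names and data types are reconciled before loading.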
Big Data Companies
Given that the market includes so many different types of big data solutions, it should be no surprise that an extremely long list of companies offer big data products. The list below includes some of the best-known big data companies, but there are many others.
- Amazon Web Services — offers cloud storage, databases, data warehouse, analytics and machine learning services
- Alpine Data Labs — now owned by Tibco; offers a data science and machine learning platform
- Alteryx — offers a self-service big data analytics platform
- BigPanda — offers analytics for monitoring and managing IT event data
- Cloudera — offers a Hadoop distribution, plus data science and big data analytics tools
- Databricks — founded by the team behind Apache Spark; offers a unified analytics platform powered by Spark
- Dataiku — offers a collaborative data science platform
- Datameer — offers an agile data pipeline management platform
- DataStax — founded by the team behind the Apache Cassandra database; offers a distributed cloud database based on Cassandra
- Domino — offers a data science platform
- FICO — offers data analytics tools, including AI and machine learning software and solutions for fighting fraud and cybercrime
- Google Cloud — offers cloud-based storage, data warehouse, analytics, machine learning, and more
- GridGain — offers an in-memory computing platform based on Apache Ignite
- H2O.ai — offers data science and machine learning platforms based on open source technology
- Hitachi Vantara — formed by the merger of Hitachi Data Systems, Hitachi Insight Group and Pentaho; offers data integration, big data analytics, storage and related products
- Hortonworks — offers a popular Hadoop distribution, as well as other big data tools and services
- HPCC — offers a distributed big data platform that is an alternative to Hadoop
- HPE — offers big data hardware and services
- IBM — offers big data cloud services, as well as database, data warehouse, analytics and machine learning software
- Informatica — offers a cloud-based data management platform with a wide variety of big data solutions
- KNIME — offers data mining and analytics software
- MapR — offers a converged data platform, plus big data storage, analytics, machine learning and NoSQL database
- MarkLogic — offers a NoSQL database and data integration tools
- Microsoft Azure — offers cloud-based storage, big data analytics, machine learning, data warehouse, data lake and more
- MongoDB — offers a NoSQL database and a cloud service based on the same technology
- Mu Sigma — offers big data analytics and decision science solutions
- Oracle — offers cloud-based and on-premises database, data integration, data management, analytics and more
- Palantir — offers data integration and data management solutions
- Pivotal — offers in-memory technology and a multi-cloud analytics platform
- Qlik — offers business intelligence and analytics software
- RapidMiner — offers data mining, data science, predictive analytics and machine learning solutions
- SAP — offers in-memory data management, analytics, artificial intelligence and machine learning tools
- SAS — offers analytics, business intelligence and data management solutions
- Sisense — offers business intelligence and analytics
- Splice Machine — offers a combination database, data warehouse and machine learning platform
- Splunk — offers analytics for log and security data
- Striim — offers streaming analytics
- Sumo Logic — offers analytics for log and security data
- Tableau — offers business intelligence and big data analytics
- Talend — offers big data integration tools
- Tibco Jaspersoft — offers business intelligence and analytics
- Teradata — offers data warehouse, data lake and business analytics