Big Data just keeps getting bigger, in a popularity sense. A new IDC report predicts that the Big Data and business analytics market will grow to $203 billion by 2020, nearly double the $112 billion in 2015.
The banking industry is projected to lead the drive and spend the most, which is not surprising, while IT and business services will lead most of the tech investing. Overall, IDC finds that the banking, discrete manufacturing, process manufacturing, federal/central government, and professional services industries will account for about 50% of the overall spending.
Not surprisingly, some of the biggest big data analytics spending -- about $60 billion -- will go toward reporting and analysis tools. That's what analytics is all about, after all. Hardware investment will reach nearly $30 billion by 2020.
So as Big Data grows, what will be the major trends? In talking to experts and surveying the research reports, a few patterns emerged.
1) Simplified analytics: While the work of Big Data will grow increasingly complex, the software behind it will advance to better handle multiple varied and complex data sets, without needing a data science degree.
2) Machine Learning: Big Data solutions will increasingly rely on automated analysis using machine learning techniques like pattern identification and anomaly detection to sort through the vast quantities of data.
3) Predictive analytics: Machine learning is not just for historical analysis, but also can be used to predict future data points. That will start with basic ‘straight-line’ prediction, deducing B from A. But it will eventually grow and become more sophisticated by detecting patterns and anomalies that are about to happen too.
4) Security analytics: To some degree this already has a significant presence. Security software, especially intrusion detection, has learned to spot suspicious and anomalous behavior. Big Data, with all of its source inputs, needs to be secured, and there will be greater emphasis on securing the data itself. The same processing power and software analytics used to analyze the data will also be used for rapid detection and adaptive responses.
5) The bar is raised: Traditional programmers will have to add data science skills to their repertoire in order to stay relevant and employable. But just as many programmers are self-taught, there will be a rise in data scientists from nontraditional professional backgrounds, including self-taught data scientists.
6) The old guard fades: A 2015 report from Gartner found Hadoop was fading in popularity in favor of real-time analytics like Apache Spark. Hadoop was, after all, a batch process run overnight. People want answers in real time. So Hadoop, MapReduce, HBase and HDFS are all going to continue to fade in favor of faster technologies.
7) No more hype: Big Data has faded as a buzzword and is now just another technology like RDBMS and CRM. That means the technology has settled into the enterprise as another tool brought to bear. It’s now a maturing product, free of the hype that can be distracting.
8) More Data Scientists: The Data Scientist is probably the most in-demand technologist out there, with people who qualify commanding a significant salary. Nature abhors a vacuum and you will see more people trying to gain Data Scientist skills. Some will go the self-taught route, which is how many programmers acquired their skills in the first place, while others will get training via crowdsourcing.
9) IoT + BD = soulmates: Millions of Internet-connected devices, from wearables to factory equipment, will generate massive amounts of data. This will lead to all kinds of feedback, like machine performance, which in turn will lead to optimized performance and earlier warnings before failure, reducing downtime and expenses.
10) The lake gains power: Data lakes, massive repositories of information, have been around for a while, but most have served as little more than storage, with little idea of how to use what is in them. But as organizations demand quicker answers, they will turn to the data lake for those answers.
11) Real time is hot: In a survey of data architects, IT managers, and BI analysts, nearly 70% of the respondents favored Spark over MapReduce. The reason is clear: Spark offers in-memory, real-time stream processing, while MapReduce is batch processing, usually done overnight or during off-peak hours. Real-time is in, hours-old data is out.
12) Metadata catalogs: You can gather a lot of data with Hadoop but you can't always process it, or even find what you need in all that information. Enter Metadata Catalogs, a simple concept where aspects of Big Data analytics, like data quality and security, are stored in a catalog. They catalog files using tags, uncover relationships between data assets, and even provide query suggestions. There are a number of companies offering data cataloging software for Hadoop, plus there is an open source project, Apache Atlas.
13) AI explodes: Artificial intelligence, and its cousin machine learning, will see tremendous growth because there is simply too much data coming in to be analyzed to wait for human eyes. More must be automated for faster responses. This is especially true with the massive amounts of data generated by IoT devices.
14) Dashboard maturity: With Big Data still in its early years, there are a lot of technologies that have yet to mature. You just can't rush some things. One of them is the right tools to easily translate the data into something useful. Analysts predict that dashboards will finally get some attention from startups like DataHero, Domo, and Looker, among others, that will offer more powerful tools of analysis.
15) Privacy clash: With all the data being gathered, some governments may put the brakes on things for a variety of reasons. There have been numerous government agency hacks and questions about the 2016 Presidential election. This may result in restrictions from the government on how data is gathered and used. Plus, the EU has set some tough new privacy laws regarding how data is used and how models are built, set to take effect in May 2018. The impact is not yet known, but in the future, data might be harder to come by or use.
16) Digital assistants: Digital voice assistants like Amazon Echo and Alexa and Google Home and Chromecast will be the next generation of data gathering, along with Apple Siri and Microsoft Cortana. Don't think they won't. These are “always listening” devices used to help people make purchasing and other consumption decisions. They will become a data source, at least for their providers.
17) In-memory everything: Memory has up to now been relatively cheap, and since 64-bit processors can in theory address up to 16 exabytes of memory, server vendors are cramming as much DRAM into these things as possible. Whether in the cloud or on-premises, memory footprints are exploding, and that's making way for more real-time analytics like Spark. Working in memory is roughly three orders of magnitude faster than going to disk, and everyone wants more speed.
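To make trends 2 and 3 a little more concrete, here is a minimal Python sketch of 'straight-line' prediction and simple anomaly detection. It is an illustration only, not how any particular Big Data product works: the function names, the sample readings, and the threshold of two standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict_next(ys):
    """'Straight-line' prediction: extend the fitted line one step ahead."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept


def find_anomalies(ys, threshold=2.0):
    """Return the indexes of points that sit far from the fitted line.

    A point is flagged when its distance from the line exceeds
    `threshold` standard deviations of all the distances (an assumed,
    deliberately simple rule).
    """
    slope, intercept = fit_line(ys)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(ys)]
    spread = stdev(residuals)
    if spread == 0:  # every point is exactly on the line
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical sensor readings: a steady climb with one spike at index 7.
readings = [10, 12, 14, 16, 18, 20, 22, 60, 26, 28]

print(predict_next([10, 12, 14, 16, 18]))  # 20.0
print(find_anomalies(readings))            # [7]
```

Real systems replace the straight line with far richer models and run over streams rather than a ten-item list, but the two steps are the same: learn the expected pattern, then either extend it forward or flag whatever departs from it.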