Streaming analytics platforms give companies a way to extract business value from data in motion in the same manner that traditional analytics tools have allowed them to do with data at rest.
Instead of historical analysis, the goal with streaming analytics is to enable near real-time decision making by letting companies inspect, correlate and analyze data even as it streams into applications and databases from myriad different sources.
CEP on Steroids
One way to imagine streaming analytics is to think of it as Complex Event Processing (CEP) on steroids. Banks, Wall Street firms, airline companies and telecommunication organizations have used CEP tools for years in a variety of applications. For example, by inspecting and correlating different events as they happen – a card swipe at a terminal, the geographic location of the terminal, time of day and other factors – credit card companies have for years been able to quickly detect fraudulent transactions.
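To make the card-swipe example concrete, here is a minimal Python sketch of one kind of rule a CEP engine might evaluate: flag a card when two consecutive swipes are too far apart to be physically plausible. Everything in it is an assumption made for illustration, including the Swipe fields, the ImpossibleTravelRule name and the 900 km/h threshold; it is not how any particular vendor's product works.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Swipe:
    """One card-swipe event (hypothetical fields chosen for this sketch)."""
    card_id: str
    timestamp: float  # seconds since some epoch
    lat: float
    lon: float


def distance_km(a: Swipe, b: Swipe) -> float:
    """Great-circle distance between two swipes, using the haversine formula."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


class ImpossibleTravelRule:
    """Correlates each swipe with the previous swipe of the same card.

    Assumes events for a given card arrive in timestamp order.
    """

    def __init__(self, max_speed_kmh: float = 900.0):
        self.max_speed_kmh = max_speed_kmh  # assumed threshold, roughly airliner speed
        self.last_swipe = {}  # card_id -> most recent Swipe

    def on_event(self, swipe: Swipe) -> bool:
        """Return True if this swipe looks suspicious given the previous one."""
        previous = self.last_swipe.get(swipe.card_id)
        self.last_swipe[swipe.card_id] = swipe
        if previous is None:
            return False
        # Clamp the gap to at least one second to avoid dividing by zero.
        hours = max(swipe.timestamp - previous.timestamp, 1.0) / 3600.0
        return distance_km(previous, swipe) / hours > self.max_speed_kmh
```

The rule keeps only the latest swipe per card in memory, which is the general idea behind event correlation: the decision is made as each event arrives rather than after the data has been stored and queried.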
Streaming analytics takes the concept one step further by allowing companies to perform such event processing against massive volumes of data streaming into the enterprise at extremely high velocity.
Bloor Research analyst Philip Howard says stream processing is really an evolution of CEP. Both CEP and streaming analytics technologies enable action based on an analysis of a series of events that have just happened.
The main difference is the support for much larger data volumes and the more sophisticated query processing available with modern stream analytics tools. Instead of thousands or tens of thousands of events per second, a stream analytics platform can process millions and even tens of millions of events per second, he says.
Because data in a streaming analytics environment is processed before it lands in a database, the technology supports much faster decision making than possible with traditional data analytics technologies, Howard says.
Mike Gualtieri, an analyst at Forrester Research, describes streaming analytics as a way for organizations to gain “perishable” insights, or insights that companies can only detect and act on at a moment’s notice.
“With traditional analytics you gather information, store it and do analytics on it later. We call that at-rest analytics.” With streaming technologies the analysis is done as the data arrives. “It could be a piece of farm equipment that has a lot of sensors on it emitting data on temperature and pressure. You want to analyze that in real-time to see if there is a risk of the engine blowing up.”
Similarly, a cable company that has millions of set-top boxes emitting fault codes can benefit from having a way to quickly look at the data, figure out what’s going on and act on it. “We call those insights ‘perishable’. There is a time frame in which you have to act on that.”
Streaming Analytics Use Cases
Forrester lists several use cases for streaming analytics. For instance, dashboards and visualization software on top of streaming analytics platforms can help enterprises visualize and monitor their business in real-time. Such tools can be used to monitor social sentiment and changing customer attitudes.
Similarly, streaming analytics capabilities can be used to enable real-time alerts or leverage new business opportunities – like making promotional offers to customers based on where they might be at a specific time. Streaming analytics capabilities are also vital in the security-monitoring context because they give organizations a way to quickly correlate seemingly disparate events to detect threat patterns and their risks. Government agencies have used these capabilities to do security monitoring of both network and physical assets.
Stream processing requires two specific technology capabilities, Gualtieri says. First, in order to do stream processing, organizations need to have a way to ingest data from multiple sources. Often, the data types and sources can be highly varied. Any technology that is used for stream processing needs to be able to not just consume different data types but also very high volumes of it from varying sources.
The second component is an analytics engine capable of filtering, aggregating and correlating streaming data in order to find useful patterns. Sometimes, in order to detect patterns and enable useful insights, there may be a need to enrich the data stream with data from an existing database, Gualtieri said.
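The two capabilities Gualtieri describes, ingestion and an analytics engine that filters, enriches and aggregates, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: EQUIPMENT_DB is a hypothetical lookup table standing in for an existing database, the field names are invented, and readings are assumed to arrive in timestamp order. Real platforms handle out-of-order data, scale and fault tolerance, which this sketch ignores.

```python
# Hypothetical reference data standing in for an existing database.
EQUIPMENT_DB = {
    "tractor-1": {"model": "T100", "max_temp_c": 110.0},
    "tractor-2": {"model": "T200", "max_temp_c": 120.0},
}


def filter_valid(readings):
    """Filter: drop readings with no temperature or an unknown device."""
    for r in readings:
        if r.get("temp_c") is not None and r["device_id"] in EQUIPMENT_DB:
            yield r


def enrich(readings):
    """Enrich: attach the model and temperature limit from the reference data."""
    for r in readings:
        yield {**r, **EQUIPMENT_DB[r["device_id"]]}


def _flush(window, totals, window_s):
    """Emit one summary record per device for a finished window."""
    for device_id, (total, count, limit) in totals.items():
        avg = total / count
        yield {
            "window_start": window * window_s,
            "device_id": device_id,
            "avg_temp_c": avg,
            "overheating": avg > limit,
        }


def window_averages(readings, window_s=60):
    """Aggregate: average temperature per device over fixed, non-overlapping windows."""
    current = None
    totals = {}  # device_id -> [sum of temps, count, limit]
    for r in readings:
        window = int(r["ts"] // window_s)
        if current is not None and window != current:
            yield from _flush(current, totals, window_s)
            totals = {}
        current = window
        entry = totals.setdefault(r["device_id"], [0.0, 0, r["max_temp_c"]])
        entry[0] += r["temp_c"]
        entry[1] += 1
    if current is not None:
        yield from _flush(current, totals, window_s)


def run_pipeline(readings, window_s=60):
    """Chain the three stages; each one consumes the stream lazily."""
    return window_averages(enrich(filter_valid(readings)), window_s)
```

Because each stage is a generator, a reading flows through filtering, enrichment and aggregation as soon as it arrives, and nothing has to be stored before a result is produced.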
Many companies combine stream processing with batch processing to derive optimal value from their data, he says. In these situations, once queries have been run against the streaming data, the data is stored in Hadoop or another data store for later retrieval and analysis. Some vendors have begun using the term Lambda architecture to describe this sort of hybrid streaming analytics and batch analytics data environment, he says.
The manner in which streaming data is used after the initial querying up front can vary quite a bit, Howard says. “You could go and land it all in Hadoop or somewhere else to do subsequent analysis. Or you might throw away a lot of the data and just keep the aggregated data,” he says.
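The choice Howard describes, landing everything for later analysis or keeping only the aggregates, can be illustrated with a small Python sketch. The StreamSink class, its keep_raw flag and the event fields are hypothetical names made up for this example, and the in-memory list is only a stand-in for Hadoop or another store.

```python
class StreamSink:
    """Updates running aggregates per key and optionally lands the raw events."""

    def __init__(self, keep_raw: bool):
        self.keep_raw = keep_raw
        self.raw_store = []   # stand-in for Hadoop or another store
        self.aggregates = {}  # key -> {"count": int, "total": float}

    def process(self, event: dict) -> None:
        # Streaming path: fold the event into the running aggregate right away.
        agg = self.aggregates.setdefault(event["key"], {"count": 0, "total": 0.0})
        agg["count"] += 1
        agg["total"] += event["value"]
        # Batch path: keep the raw event only if later analysis is wanted.
        if self.keep_raw:
            self.raw_store.append(event)


def batch_recompute(raw_store: list) -> dict:
    """Later, at-rest analysis: rebuild the same aggregates from the stored events."""
    result = {}
    for event in raw_store:
        agg = result.setdefault(event["key"], {"count": 0, "total": 0.0})
        agg["count"] += 1
        agg["total"] += event["value"]
    return result
```

With keep_raw set to True the stored events can be re-analyzed later with different questions; with it set to False only the summaries survive, which saves storage but rules out any analysis that was not planned up front.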
Vendors of Real-Time Processing Tools
Multiple tools currently offer these sorts of capabilities, including many from CEP platform vendors. The big enterprise players in this space include SAP, Tibco, IBM, Software AG and Oracle.
Open source streaming analytics projects such as Apache Storm and Apache Spark have also generated a lot of buzz in recent times, says Gualtieri. Early users of these technologies have included major Internet companies like Twitter and Spotify and others like the Weather Channel. And then there are the pure-play technology vendors like DataTorrent and Continuuity that offer streaming analytics technologies as well, Forrester says.
Many of these tools come preconfigured and support, out of the box, query processing capabilities that would have involved a lot of tedious programming work back when CEP platforms first started becoming available. Many offer relatively easy-to-use and intuitive visual interfaces for running queries.
In addition, many vendors offer hosted streaming analytics services that are ideal for companies with cloud applications. Amazon’s Kinesis, Google’s Cloud Dataflow and Microsoft’s Azure, for instance, all support real-time processing and data analytics.
Streaming analytics requires organizations to think a little bit differently about how they analyze data, Gualtieri says. Traditional analytical tools are optimized for request and response from static data. “You request something and here’s the response from the database,” he said.
With streaming analytics the data is flying in continuously and you don’t know what’s in that data. “App developers need to stop thinking about request response and start thinking about detecting interesting events as they come in.”
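The shift Gualtieri describes, from asking a database a question to reacting to events as they arrive, can be sketched in Python as follows. The EventDetector class and the fault-code fields are hypothetical and exist only to illustrate the pattern; they do not represent any vendor's API.

```python
class EventDetector:
    """Holds standing rules and evaluates them against every incoming event."""

    def __init__(self):
        self.rules = []  # list of (predicate, action) pairs

    def when(self, predicate, action):
        """Register a rule: call action(event) whenever predicate(event) is true."""
        self.rules.append((predicate, action))

    def push(self, event):
        """Called once per incoming event; there is no stored data to query."""
        for predicate, action in self.rules:
            if predicate(event):
                action(event)


# Usage: collect alerts for an assumed set-top box fault code "E42".
alerts = []
detector = EventDetector()
detector.when(lambda e: e.get("fault_code") == "E42", alerts.append)

for event in [
    {"box_id": "stb-1", "fault_code": "E10"},
    {"box_id": "stb-2", "fault_code": "E42"},
    {"box_id": "stb-3", "fault_code": None},
]:
    detector.push(event)
```

In a request-response design the application would store the events and query them later. Here the question is registered once and the data is pushed past it, so an interesting event triggers action the moment it shows up.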
Streaming analytics offers organizations an opportunity to ingest and glean instant insight from real-time data pouring in via transactions, cloud applications, web interactions, mobile devices, and machine sensors.
Howard from Bloor Research predicts that the emerging IoT will also fuel demand for streaming analytics capabilities in the near term. Sensor data from thousands of Internet-connected devices can give companies valuable insights into the health of a network or a system.
The best way to look for use cases is to consider the most challenging business processes and walk through them at each stage to identify situations where additional data might help. “Ask yourself if there are data sources available that would give me information in real-time to make this process more efficient,” Gualtieri said.
“On the customer side, walk through the customer journey and see if additional data can improve or detect something in real-time.”