Optimizing the Value of Streaming Data

Written By

James Maguire

Jul 15, 2020

10 minute read

Edge computing, IoT and a world consumed by data are creating a challenge for businesses: many are overwhelmed by streaming data. It’s cumbersome – some would say impossible – to handle this continuous torrent of data with legacy Big Data architectures and the cloud. Instead, a new method of “continuous intelligence” is called for.

To address this, in this webcast we discussed:

Can value be derived from this “continuous data stream?” How can it be mined for insight?

Given the potential value, what’s the best way to mine value from this data stream? That is, to get value from it before it’s analyzed later by data scientists equipped with data analytics software?

What are some challenges that companies encounter when they try to get value from their edge computing and/or streaming data?

What’s the future of streaming, real time data, and how can businesses prepare for it? How will this future influence edge computing and/or cloud computing in general?

To provide insight into streaming data, I spoke with a leading expert, Simon Crosby, CTO of Swim.ai.

An Overview: Streaming Data

Crosby: “[In streaming data], the data is infinite but also it’s of short value, short term value. And so then you have this problem which is that if you store it, it’s probably already out of date. And also you have this desire to move to what we call continuous intelligence, which is the ability to make decisions on the fly, not wait for the next batch-run. Okay? So you have to make decisions from data of limited lifespan, quickly, which means you have to do it all the time. And that’s a big challenge.”

When will streaming data be mainstream?

“It is relatively early and I think of streaming analytics as very much a top-down view and it’s kinda being sold for some verticals. So for example, if you look at application performance management, it’s pretty good there, right, you can launch all your community gunk in AWS and track it, that’s cool. And there’s a bunch of companies who are solving that vertical problem. In general, the broader problem is much bigger than this.”

“So let me give you this really cool example from Dubai where we do smart city work. When a truck with bad braking behavior is approaching an inspector, [it] tells the inspector. This is not analytics in the sense of a top-down view. This is not a city manager saying, ‘How many bad trucks do I have?’ This is a need to respond, in real time, to every inspector in the city for potentially every truck. And so this notion of continuous intelligence is also based on the idea that information is situationally relevant. It’s highly contextually bound, right?”

“It’s time-based and geo-based, if it’s real world. And so information streams from sources that are contextually related to one another, and they are probably related in time doing other things. And as anybody who’s in the data science domain knows, you have to find this stuff out. You have to figure it out, and now we have to figure it out on the fly.”

“Okay, so think of two things, like the truck and the inspector. When the truck is in range of the inspector, they link, right? And then the inspector can see where the truck is. And I’m talking digital twins, obviously. And we can then figure out what to do. And so we’re talking about graph structures, not necessarily just big data or NoSQL storage. Graph structures, which are inherently fluid and where the analysis is continuous and on the fly. And that’s kinda what we do, in summary.”
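
The linking idea Crosby describes can be sketched in a few lines of Python. This is an illustrative toy, not Swim's implementation: the `Twin` class, the `update_links` function, the flat x/y coordinates and the 500-meter range are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Twin:
    """Hypothetical digital twin of a real-world entity with a position."""
    name: str
    x: float  # position in meters (assumed flat coordinates)
    y: float
    links: set = field(default_factory=set)


def update_links(inspector, trucks, range_m=500.0):
    """Link the inspector to every truck currently in range; drop the rest."""
    inspector.links = {
        t.name for t in trucks
        if math.hypot(t.x - inspector.x, t.y - inspector.y) <= range_m
    }
    return inspector.links


inspector = Twin("inspector-1", 0.0, 0.0)
trucks = [Twin("truck-a", 300.0, 400.0), Twin("truck-b", 900.0, 0.0)]
update_links(inspector, trucks)   # only truck-a is within the assumed range

trucks[1].x = 100.0               # truck-b drives into range
update_links(inspector, trucks)   # the links change as the world changes
```

The point of the sketch is that links are recomputed as positions change, so the graph is a moving picture of who is near whom rather than a stored table queried later.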

How To Glean Value from Streaming Data?

“Yes, I think that is the problem. And I have some good news, and based on experience, that the problems may not be quite as hard as we think they are.”

“Okay. And so let me describe an application, one of the smart city applications, in which intersections in a bunch of US cities predict the future of their behavior two minutes ahead. We are in about 20 cities and if you just go to traffic.swim.ai, you can see Palo Alto. The digital twin of every intersection is predicting its own phases and everything else two minutes out.”

“Now, you would think, ‘Wow, that’s pretty hard’ and everything else. But in fact, it isn’t. Okay, every intersection has maybe 100 sensors, and sensors are of three or four types. There are inner loops, there are pedestrian push buttons and there are lights, and that’s kind of it. And the problem, which is the learn and predict problem, is whenever you want a digital twin of an intersection to predict, which is about once a second, take all your own data and link to all of the neighbors, all your neighboring intersections within a thousand yards and use their data too. And then continually guess and refine your guesses based on what actually happens in the real world. So we engage in this unsupervised learning algorithm, which is straightforward.”
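
The "guess and refine" loop can be illustrated with a small online learner. The interview does not say which algorithm Swim uses, so the sketch below swaps in a plain least-mean-squares update purely for illustration; the `IntersectionTwin` class, the learning rate and the toy data stream are all hypothetical.

```python
class IntersectionTwin:
    """Hypothetical twin that predicts a value from its own and neighbors' readings."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate

    def predict(self, inputs):
        """Guess the outcome as a weighted sum of the current inputs."""
        return sum(w * x for w, x in zip(self.weights, inputs))

    def refine(self, inputs, actual):
        """Compare the guess with what actually happened and adjust the weights."""
        error = actual - self.predict(inputs)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, inputs)
        ]
        return error


# Toy stream: the outcome is the average of two inputs (own reading + one neighbor)
twin = IntersectionTwin(n_inputs=2)
stream = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.5), ([1.0, 1.0], 1.0)] * 200
errors = [abs(twin.refine(inputs, actual)) for inputs, actual in stream]
```

Each reading is used once, at the moment it arrives, and then discarded; the only thing kept in memory is the small set of weights, which is why no stored history or city-specific model is needed in this toy version.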

“It matches our intuition in the sense that we think of the traffic around these intersections really just being dependent on the neighborhood. The self-training unsupervised learning algorithms work very well for small numbers of inputs, so I don’t need a huge amount of data science knowledge to go off and do this. So the same code that runs in Palo Alto, runs in Las Vegas and Houston and Jacksonville and everywhere else. And I didn’t have to get a data scientist to build me a model for that city.”

What are some of the biggest problems with mining this streaming data? What challenges will companies run into?

“So the code to do this, to solve this problem, is very short. It’s a couple of thousand lines of Java, not millions of lines of code. In general, what it requires is a slightly different way of thinking. So the received wisdom today is get a whole bunch of data, put it in your Cloud, or your data lake and then find data scientists and build big models in some framework.”

“And our approach is perhaps the exact opposite of that. That is, learn and predict on the fly. And in general, our approach, which I guess is a slightly different way of looking at the world, is one in which algorithms can be adapted to continuously process data, analyze, predict, do whatever.”

“Now, if you can deal with this volume of data, there are certain things you have to do. And number one is stateful computing. So Swim OS is stateful; if you know the computer science world, it’s a stateful implementation of the actor model, where these little things called web agents, which are stateful processes, like Java objects but they’re also actually concurrent objects, each process their own data and statefully represent it in memory.”

“So we end up building this graph, which is effectively a graph of all these things which link to each other. So an intersection links all its sensors and it links to its neighbors. And this graph can be fluid and then linking is the process by which we get to see state and compute all the time, okay? So it’s a concurrently executing implementation of this framework.”
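
As a rough picture of what a stateful agent looks like, here is a minimal Python sketch. It is not the Swim OS API: the `Agent` class and its methods are invented for illustration, and the mailbox is drained sequentially for simplicity, whereas the agents Crosby describes run concurrently.

```python
from collections import deque


class Agent:
    """Hypothetical stateful agent: keeps its own state in memory and
    processes its own messages one at a time from a mailbox."""

    def __init__(self, name):
        self.name = name
        self.state = {}        # latest known value per sensor
        self.mailbox = deque()
        self.neighbors = []    # links to other agents

    def send(self, sensor, value):
        """Queue a sensor reading for this agent."""
        self.mailbox.append((sensor, value))

    def step(self):
        """Drain the mailbox, updating in-memory state; no database involved."""
        while self.mailbox:
            sensor, value = self.mailbox.popleft()
            self.state[sensor] = value

    def view(self):
        """Own state plus the current state of every linked neighbor."""
        return {self.name: self.state,
                **{n.name: n.state for n in self.neighbors}}


a = Agent("intersection-a")
b = Agent("intersection-b")
a.neighbors.append(b)          # linking: a can now see b's state
a.send("light", "red")
b.send("loop-1", "occupied")
for agent in (a, b):
    agent.step()
snapshot = a.view()
```

The contrast with the stateless pattern is that nothing here round-trips to storage: each agent holds the current state of its own piece of the world, and a link is simply a reference to a neighbor's state.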

“Computing in memory is literally a million times faster than going to a database. It’s literally a million times faster.”

What’s the biggest challenge, say, sheer number of inputs? Is it managing the system?

“Well, I think part of the problem is that we, as an industry, are a little bit confused about it, because we’ve heard about all these wonderful things, like AWS Lambda. Just like this, or Kafka or Pulsar or whatever. All these wonderful projects. And they’re all adopting a model which is based on this Cloud. So the model which has made the Cloud so successful, which is REST, stateless computing and databases. So what do you do? You send an event to something which is stateless, and all it can do is put that in a database.”

“But you know what? Good luck looking at four petabytes per day. Good luck. Seriously. And you know we’re still pretty early on in this whole process of making everything smart and everything center stage all the time. So people tend to hang on today with this idea that they’ll look at it later, they never do, they never do.”

“And so what’s much better to store is something which is a stateful model of the system. So for example, in the traffic scenario, instead of getting voltage fluctuation as a car goes over a loop and saving that, let’s just say I just save the fact that there was a car on the loop. Or instead of getting a register and some of the voltage from a light transition, I’ll just say it was a red light, okay? That’s a factor of ten thousand or more reduction in data volume already. So the idea here is that you’re taking raw data, this model sensibly and continuously transforms data to state and then state to insights, and then streams those insights.”
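
The data-to-state step can be shown with a toy example. The function name, the 2.5-volt threshold and the sample values below are assumptions for illustration only; real loop detectors and their signal levels will differ.

```python
def to_state_changes(voltages, threshold=2.5):
    """Turn raw loop voltages into (sample_index, occupied) events, emitting
    an event only when the occupancy state changes. The threshold is an
    assumed value, not a real detector specification."""
    events = []
    occupied = False
    for i, v in enumerate(voltages):
        now_occupied = v >= threshold
        if now_occupied != occupied:
            events.append((i, now_occupied))
            occupied = now_occupied
    return events


# 1,000 raw samples: a car sits on the loop for samples 400 through 599
raw = [0.1] * 400 + [4.8] * 200 + [0.1] * 400
events = to_state_changes(raw)
reduction = len(raw) / len(events)
```

In this toy run, 1,000 raw samples collapse into two state changes (car arrived, car left). The actual reduction in a deployment depends on the sampling rate and on how often the state really changes.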

“So it’s literally this, in the traffic scenario which we are supporting today, the predictions for Palo Alto, and all these other cities stream continuously from the cloud, to providers of routing apps. So Uber, or whoever, right? They just get predictions of what’s gonna happen in the next two minutes in Palo Alto.”

What’s the Future of Streaming Data?

“People have lots of data today, they just don’t know what to do with it, and there are several problems. In general, people who are managing the large pieces of real estate, say, I don’t know, an oil rig. They want a better oil rig, but they don’t necessarily have the skills, which are cloud-native skills to go off and build better, to get better insights, and so on. And so the challenge is to get people from what has been a traditional approach, which is just stick everything on a hard disk, into a more cloud-native approach where they can think about using newer technologies and tools to solve their problems and get real-time insights.”

“It’s a bit of a journey, but we’re on a path, we definitely cannot go any further down the big data path.”

“What I mentioned is that streaming analytics, in my view, is a particular use case; it tends to be top-down, tends to be manager-centric, looking down at all my assets. A key use case is the one where I have to tell every rider, your bus is about to arrive at this bus station. Which is millions of responses delivered in real-time. And real-time has a real notion here, I’m not allowed to tell somebody to go and get their bus when it’s already left. So the notion of real-time here is very closely tied to the evolution of the real world. The bus will come and go, whether I tell the user or not, but I better tell the user in time. In fact, I have to tell every user in the city in the same time frame that it’s gonna work. And so concurrent processing of huge amounts of data is a requirement.”

James Maguire

James Maguire is Datamation's Senior Managing Editor and has been reporting on technology topics for more than 15 years. He has covered the gamut of enterprise and consumer technology, and regularly communicates with leading IT newsmakers, vendors and analysts.
