Data warehouse tools – now often based in the cloud – don’t get as many headlines in the tech world as high-profile technologies like AI and data analytics. Yet data warehouse tools are the workhorses that support the more glamorous tech advances in AI and analytics.
In this live video panel discussion, we’ll discuss:
- What’s the current state of the cloud data warehouse sector?
- What are the key trends that will shape the future of data warehouse tools?
- Most important, how can companies prepare for these future trends today?
To provide insight into the future of cloud-based data warehouse tools, I’ll speak with these leading experts:
Neil McGovern, Vice President, Product Marketing, SAP HANA
Mike Matchett, Principal Analyst, Small World Big Data
Philip On, Head of Product Marketing, Data Warehouse Cloud, SAP
James Maguire, Managing Editor, Datamation – moderator
The Data Warehouse Sector
See below for edited highlights from the discussion
Neil McGovern: “Five years ago, most of the analysts like Mike [Matchett] started saying to me, ‘Yeah, people are not investing heavily in data centers any longer. They’re starting to say the future is the cloud.’ So I think that what we saw five years ago was the realization that this was happening… I think what we’ve seen in the last year, 18 months, is really the enactment of that. Most of the interest that we hear about future data warehouses is in their being cloud-based.”
Philip On: “Well, I think certainly the last couple of years we’ve seen [the move to cloud] accelerate. And in this current environment, where people really can’t go into the office anymore, cloud has accelerated even more, right? So it gives businesses the flexibility to basically go online and have access to all of these capabilities that, in the past, they would have had to acquire hardware or software for, and get people from IT to implement and deploy. And so I think that SAP is in a unique position, having 77% of the world’s transactions run on SAP systems. We’re one of the top 10 cloud companies in the world from a revenue perspective. So we’re working with companies in all different lines of business and industries to help them deliver greater insights, and using cloud as that enabler to speed their ability to respond to the market.”
Data Warehouse Innovations
Neil McGovern: “Well, I’d have to say that the one thing that most people are talking about is bringing in more real-time data streams and more… I hate to say operational and transactional, but really bringing in data that’s fresher and getting it in there the same day in order to make decisions not just about historical data that’s a day or a week old, but about data that’s just happened and the data that’s been a week or a month old.
“You mentioned machine learning, I think ML accelerates that, because you wanna train and apply models, you wanna give them the biggest footprint of data you can to look at to get the best recommendation.”
Mike Matchett: “I think that you’re gonna find increasing amounts of more modern analytics: ‘We have predictive analytics, we have advanced analytics, we have other things.’ Sometimes that means machine learning. Most often, it means just a better job of doing analysis on the data they have and being able to look backwards and pro-rate that forwards, at least some linear regression and such. But I think, increasingly, we’re finding, in the cloud, you can marry things up.
“So I can take my data warehouse, and if it doesn’t have machine learning necessarily embedded in it, I can go to another cloud service that does machine learning and integrate those things pretty quickly. So it does accelerate, and I think there is a lot of machine learning going on in that environment, but not necessarily part and parcel of the single product.”
Neil McGovern: “So I think Mike hit the nail on the head with the real-time. I think it’s a change in the process of maintaining and building the datasets, the idea of migrating data from external systems into the data warehouse. People are trying their hardest to actually run against the live data. Sometimes, this is where, basically, the data warehouse and the transactional database are the same dataset. We see a lot of people doing that with our data, but then, real-time links to remote data sources, being able to handle streams from IoT.
“As far as advanced analytics, you see graph databases, you see text processing, spatial, and so on. What we’re seeing is that the more advanced companies are looking to have one single solution which can bring… That can store the data in these different formats and also bring engines to bear on that data as part of a single query engine.
“So they don’t want to have a separate copy of the data for their spatial database and another one for their graph database. They wanna have one copy of the data stored in a format that engines can… A single multi-purpose engine can get value from. It’s a very fast-changing world.”
Data Warehouses in Business / Data Warehouse Technology
James Maguire: How does a data warehouse help the business folks?
Philip On: “The lines of business are usually the drivers of information demands on IT for a solution like a data warehouse. And actually, quite honestly, they’re in the best position to own that data, to know what good-quality information looks like, and what the data definitions should be for various terms like customers and products.
“And so, what we see, especially in the cloud data warehousing environment, is an innovation shift: not only providing database as a service in the cloud, but technologies that actually provide a holistic solution that allows a business persona, a business user, to connect to their data sources virtually or through batch bulk movement. And then to transform that data, model it, and then even visualize it with analytics in the cloud.”
Neil McGovern: “One of the things about the cloud that really scared people in IT governance early on was that they discovered people were downloading their whole customer database or transact… order history and then uploading it to a cloud database, and IT didn’t even know about it.
“So what we did was… With cloud data warehouse, we allowed the underlying database to do all the heavy lifting, the technical stuff. We created these things called spaces where you can create a sandpit of curated data. Then IT can curate the data and create a sandpit for the business user that looks and feels like their own data, and you can either just create a copy of it so they can mess around with it, or you can go live with it.
“And that’s one of the big innovations that the SAP Data Warehouse Cloud product brings in.”
Mike Matchett: “I talk to people really concerned on that side of the problem, a lot of IT folks who are saying, ‘Hey, how do I do data protection? How do I handle it even today, with everyone working from home or working from anywhere, and secure that perimeter when there’s no more perimeter, when everything is in the cloud?’ So the answer is putting it all in one environment, providing a sandbox, and what I might call sort of a spreadsheet approach for businesspeople to actually make use of the data themselves directly rather than have to go through a development team.”
Neil McGovern: “Put it in a spreadsheet. Excel is still the number one data analytics tool on the planet.”
Data Warehouse Technologies
James Maguire: Why is the data management layer important to a successful data warehouse initiative?
Philip On: “I use the term ‘an analytical system’. And that’s a more accurate description because to deliver on the data that the business needs, you need to have the data integration technologies to connect to the sources. Then you need the database to store the data or, in memory, virtualize that.
“Then you need data warehousing functionalities to create a model, to aggregate the information, summarize it so that it’s ready in the shape, in the form that the business can run queries on it. Then, you have analytics to basically report on the data.
“If you change something at the source table, how does that impact the analytics downstream, as well as the semantics of the data? So IT has all this complexity. They have to stitch all of these technologies together to get from point A, which is where the data comes from, to the final stage, where the business people can make decisions. And so, I think that layer, the data management layer, is begging for unification and simplification.
“And as more and more business users are empowered to do self-service, they want to be able to handle the different components. Otherwise, they can only handle the visualization part, but they don’t know where the data came from. They have no role in the ability to confirm if the source is truly where it should be coming from as well as the transformation and the logic applied to that data.
“So I think that, in short, the answer is having a solution in the cloud, and cloud really gives us that framework, that environment to do radical simplification and drive really fast change for organizations. And one of the solutions that we have recently delivered, SAP Data Warehouse Cloud, is exactly that: an all-in-one solution that can do data integration, data quality management within that database, data warehousing, and analytics, all in one environment.”
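The question raised above, how a change to a source table affects the analytics downstream, is commonly handled with a lineage graph. The following is only a minimal sketch of that idea in Python, not a description of how SAP Data Warehouse Cloud works; the object names and the `LINEAGE` mapping are hypothetical and invented for illustration.

```python
from collections import deque

# Hypothetical lineage graph (assumption, not from the discussion):
# each key feeds the objects listed in its value.
LINEAGE = {
    "source.orders": ["staging.orders_clean"],
    "source.customers": ["staging.customers_clean"],
    "staging.orders_clean": ["model.sales_by_region"],
    "staging.customers_clean": ["model.sales_by_region", "model.customer_churn"],
    "model.sales_by_region": ["dashboard.regional_sales"],
    "model.customer_churn": ["dashboard.retention"],
}


def downstream_of(changed, lineage):
    """Return every object reachable from `changed`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already recorded via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


if __name__ == "__main__":
    # List everything that may need re-validation if the orders table changes.
    for obj in downstream_of("source.orders", LINEAGE):
        print(obj)
```

With a graph like this, a business user could at least see which models and dashboards depend on a given source before trusting a report built on it.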
James Maguire: How should a data warehouse be architected for low-latency queries?
Neil McGovern: “Real-time is a huge driver but it comes with a cost. Lower latency tends to be more expensive. Obviously, the lowest latency usually has the data in memory. So not only is it local, but it’s in the fastest store that you can get the data from, so you can get it back in seconds or sub-seconds. And as you move to much larger datasets, data that might be five years old, that you’re not going to use frequently, you might put that in a sort of data lake-type environment: disk-based, lower-cost storage.
“Remember, with a cloud database, you’re paying a storage fee every month, whether you use that data or not, so you’re looking to manage those expenses. Response time versus cost is really what you’re paying for there. And obviously, with HANA, which is what cloud data warehouse is based on, that’s an in-memory database, so you’ve got that performance. And we also have a data lake technology at the other end, and everything in between.”
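The latency-versus-cost trade-off described above can be made concrete with a small tiering rule. This is a hedged sketch only: the tier names, prices, latencies, and thresholds below are all illustrative assumptions, not SAP pricing, HANA behavior, or measured benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_gb_month: float   # illustrative price, not a real quote
    typical_latency_ms: float  # illustrative latency, not a benchmark


# Hypothetical tiers, ordered from fastest and most expensive to slowest and cheapest.
IN_MEMORY = Tier("in_memory", 10.0, 5)
DISK = Tier("disk", 1.0, 200)
DATA_LAKE = Tier("data_lake", 0.05, 5000)


def choose_tier(age_days, reads_per_month):
    """Pick a tier from data age and read frequency (thresholds are assumptions)."""
    if age_days <= 90 or reads_per_month >= 1000:
        return IN_MEMORY   # fresh or hot data: pay for low latency
    if age_days <= 730 or reads_per_month >= 10:
        return DISK        # warm data: moderate cost and latency
    return DATA_LAKE       # old, rarely read data: cheapest storage


def monthly_storage_cost(size_gb, tier):
    """Storage fee is charged every month, whether or not the data is queried."""
    return size_gb * tier.cost_per_gb_month


if __name__ == "__main__":
    five_years = 5 * 365
    tier = choose_tier(age_days=five_years, reads_per_month=1)
    print(tier.name, monthly_storage_cost(1000, tier))
```

The point of the sketch is the shape of the decision: the same dataset costs very different amounts per month depending on which tier it sits in, so rarely used history is a natural candidate for the cheaper end.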
James Maguire: So, let’s talk about what is the best strategy for a data warehouse – is it bringing all the data across the enterprise into the data warehouse, or leaving all the data in place for a virtual data warehouse?
Mike Matchett: “I’m gonna approach that from a couple of directions very quickly. One is, from a user perspective, I want the data close to where I am for good query performance. From a data perspective, the data has got gravity and probably just wants to sit where it was created, if it had a preference, because the easiest way to manage it and to delete it is to not move it at all.
“And so you then get an IT governance perspective, which says, ‘I wanna centralize it and get it all offline, [chuckle] and protect it and put my hands around it, and I don’t care what performance is.’ So, you got a bunch of different perspectives there, and I don’t think there’s any one strategy. I just would say, ‘At the end of the day, from a user perspective, they aren’t going to care where the data actually is as long as they’re getting good query performance about it.’
“Another angle in there is how volatile the data is. A table with the zip codes of the US isn’t gonna change forward forever, but an order history dataset could be changing in milliseconds. So, a downloaded copy could get out of date very, very quickly. So, there’s that angle as well, as Mike was saying.”
Future of Data Warehouses
Philip On: “As my colleagues talked about, I think it’s gonna be shifting to address the business user persona, giving them more empowerment to own their data, to build analytical solutions so they can achieve business results very rapidly. And the cloud environment can help break down a lot of the barriers of acquiring hardware and software on-premises.
“And then, I think that another future trend would be just the simplification of a true analytical project from integrating the data, to storing the data, to modeling and visualizing, more vendors will start to collapse these capabilities into a unified solution using cloud as that framework. And we’re very excited because SAP Data Warehouse Cloud is one of the early-movers in this space to actually unify a single solution that delivers a true end-to-end data and analytics all in one.”
Neil McGovern: “One of the visions that we have at SAP is what happened to photographs? It used to be you’d take a photograph on your camera and you send it off and you’d have a hard copy of the photograph. Think of that as the 1970s. Then we went through digital photography, and at first, that was great, but you stored it on your laptop, and if you lost your laptop, you lost your pictures.
“Then it moved up to the cloud, which is an analog of where we are right now, so you’re explicitly moving and saving your pictures, and so on. But the next generation, what is happening now with my Nikon and my iPhone, is that when I take a picture, it just automatically goes up to the cloud and, literally, my mom in Scotland can watch me taking pictures and watch pictures I’m taking of the family pretty much as they happen.
“As Mike says, it’s tokenized, it could be anywhere. It’s handled by the… I think that’s what we are aiming for: how photography is now will be how data warehouses are… And obviously there’s much more complexity with data warehouses than with online pictures, but as a simple model, as a sort of model for the future, that’s the model I see.”
“I’ll say that I think the space is gonna grow tremendously. I think people are unlocking their cloud transitions, or digital transformation initiatives, if you want the buzzword. But what I’m seeing with this whole 2020 onslaught of working from home, working from anywhere, is that even IT folks are getting a lot more comfortable with the cloud, overnight in a lot of cases, and really moving some key corporate workloads.
“It used to be thought of as something that had to be in the data center. You had to put your arms around it and hold it tight. And now people are saying, ‘That’s just not true. We’re everywhere. Our data can be in the cloud because that’s the easiest place to get access to it and serve access to it, to all the people who are using it.’
“So, I think, by 2024, we will see that long-awaited tilt to most folks thinking cloud-first about things like data warehouses, in general. We’re just gonna go there. But I will also say computational storage and storage class memory are coming soon, faster than most people think.”