Wednesday, June 23, 2021

Snowflake and the Enterprise Data Platform

A new report entitled Data’s Evolution in the Cloud: The Lynchpin of Competitive Advantage explores executives’ attitudes toward the essential – and challenging – process of data mining. Based on a survey conducted by The Economist and sponsored by Snowflake, the report details an industry in rapid flux, with big stakes and big challenges in current data analytics practice – focusing on the myriad innovations enabled by the cloud.

To provide insight into the intersection of data analytics and cloud computing, I spoke with Kent Graziano, Chief Technical Evangelist, Snowflake.

The State of the Enterprise Data Platform

  1. Before the report, let’s talk about the current move toward data in the cloud. It appears that this move has reached a new velocity recently, even before Covid. What’s driving this trend?
  2. The report finds that “87% of executives agree: Data is the most important competitive differentiator in the business landscape today.” However, the core challenge that businesses face – a stubborn challenge – is extracting value from their data. Why is this such a challenge? And how does the cloud enable this?
  3. An even bigger challenge is gathering data from the wider business ecosystem – partners, suppliers, and customers. How does a cloud-based approach help glean value from this diverse data environment? And what are the potential rewards?
  4. How does a cloud data platform like Snowflake transform or expand the work that developers, data scientists and data analysts can do?
  5. For data professionals and developers specifically, what are the advantages of using the Snowflake Data Cloud?
  6. What is the future of the enterprise data platform? Specifically, do you have any sense of the future evolution of Snowflake as a data platform?

Edited highlights from the conversation: 

Snowflake is definitely getting a lot of buzz. Warren Buffett, of all people, bought into Snowflake pre-IPO.

That certainly created a lot of buzz, because my family members, who really only barely understand what it is I do, were commenting on it and calling us and going, “Wow, I saw Warren Buffett’s investing in the company that you work for!” You know it’s made an impact when you have people outside the industry noticing.

Why is Snowflake so hot? It’s cloud native, but beyond that even.

The world’s changed so much with all of the data that’s out there, and companies need a way to innovate and be more agile. And what we’re seeing with our platform is that people are able to do that.

They can come in, they can start really, really small, and grow to massive size going into petabytes of data with no management overhead, really. It’s made it so much easier than when I started in the industry 30 years ago, where you had to pre-plan everything.

And you really had to know, where are we gonna go? What’s our three-year, five-year plan? How much data do we think we’re gonna have? How many users do we think we’re gonna have? We don’t have to do that anymore.

And that’s one of the things that I loved about Snowflake, because I came in, I was really a data architect, and a modeler and designer, and it’s like, “This is great, I can actually now work with the business, figure out what data do we really need, what kind of a model should it go into, and very quickly get that up and running without having to worry about, are we gonna have enough disk space?

Are we gonna have enough compute? How many users will we really have?” And I have to size for all that. I don’t have to do any of that with Snowflake, so that really allows me to accelerate the delivery of the value to the business.

You’re saying it’s in contrast to the old days where a large data mining or data analytics application would have been in-house, and that would have been far less scalable than Snowflake?

Yeah, yeah. The on-premises world by definition, you were constrained to a box. It’s a server. It’s got so many CPUs in it, and it’s got so much disk space when you initially buy it. And yes, you can plug… You could get to the point where we could plug in SANs and we could add more disk.

But you still had to plan for that, and then you had to go through a procurement process. I had times when I was building large data warehouses where we told the infrastructure team, “We’re gonna need 10 terabytes.” And they laughed at us and said, “No, you won’t.” And they got us two terabytes, and then three months later we were out of space. And then we had to wait six weeks to get more disk space.

And so that obviously slowed down our ability to deliver to the business, because we just physically didn’t have the infrastructure. With Snowflake, you add data in and it elastically just grows. You don’t have to pre-allocate it, it’s just there on demand, and I don’t have to be a DBA or a system administrator to do anything. I just load the data in and it’s automagically there.

What about the multi-cloud piece? Is it that it works with any of the clouds? And part two of that question is, the cloud providers themselves offer data applications, many of them. Why not just use the data application already offered by one of the hyperscalers?

To answer the first part of the question, it works on AWS, Azure and GCP. So Snowflake is cloud-agnostic, so when you’re in Snowflake and you’re in the data cloud, you’re in the data cloud. And it doesn’t matter what the underpinnings are, and what that is giving people the ability to do is build a true network of data that is location-independent and cloud-independent.

Does that mean the data actually exists there [in various places], or does the data exist in other places and is being virtualized by Snowflake, as a platform?

The data has a home in a particular physical location, and the Snowflake software is managing the… I don’t like the word replication, but replication, if you will, under the covers. So it’s not virtualization.

When we talked about virtualization software, we’re talking about, “Okay, the data is over here and we’re just… We’re looking over there.” And we still have to pull it somewhere, but with Snowflake, our global data mesh is allowing that data to be replicated seamlessly to where it needs to be, where you want it to be, so it’s localized.

So you’re not in London, querying data in Australia, though it looks like that is what you’re doing. The data originated in Australia, but you don’t have to care now. This is the beauty of the cloud: you don’t have to care where the hardware is or where the data is. And when you throw Snowflake’s data cloud on top of it, now you really don’t need to care, right? It’s handling all of that for you.

Fascinating. To wrap things up, I’d love to get your sense of the future and the future of the enterprise data platform. Maybe even more interesting, the future evolution of Snowflake. And as you answer, I’m gonna be listening to hear you say the words Artificial Intelligence.

Yeah, so I really see the future of data platforms is obviously, it’s the cloud, but it’s going way beyond what we traditionally thought of as just your basic analytics and dashboards. It is growing into that world of machine learning and artificial intelligence as the source for all that information. And one of the things we’ve learned about machine learning is the more data you have, the more accurate the results are going to be.

And now we have that ability to scale to multiple petabytes in the data cloud. So you have so much more data available to start feeding machine learning and AI types of applications. And the sharing aspects – the network of the data – make it easy to take your third-party and your partner data and incorporate that all into the data that your organization creates itself. Then you can massage that, do your algorithms and projections off that, and perhaps produce a data product that others don’t have and then share that right back. And it becomes a virtuous cycle.

We’re really evolving into basically the world-wide web of data, where you’re gonna be able to find the data you need to do the job you need to do, and to make the predictions and forecasts, and work with your customers and provide better customer service and provide more value to your stakeholders.

And to me that’s way beyond. It is probably the vision that we had 20 years ago, 30 years ago, but it took a lot of work to really make that happen, and only the largest organizations could ever afford to do it.

Now smaller organizations can do it because of the power that we have with the data cloud in particular. We’re talking about the cloud, the sky is the limit, right?

You mentioned artificial intelligence: we’re gonna get smarter. The Snowflake engine is already pretty smart, and it runs off of metadata. We have an advanced optimization engine, we are metadata-driven, and I think over time we are gonna see more machine learning and AI involved under the covers to make it a more seamless experience.

And to make it a more performant experience as the volumes of data grow and grow. I wrote about this a couple of years ago, like, “When you have all of this data available and you know how it’s being used, then it’s just a matter of time before we can be even more predictive about what data do you need, what data…[and] how are you gonna use it?”

Our search optimization feature that just came out is another really smart way of being able to query the data to get the performance that you need, again – reduce that time to value even more.

So in essence, the data becomes far faster, far more flexible to shape and imagine and mold as an individual sees fit, and at the same time is also democratized for smaller players to get on board.

Exactly, that’s exactly right. Yes.
