In this webinar we took an in-depth look at data governance, including:
- What’s the current state of data governance?
- What are the key trends and forces shaping data governance?
- How are emerging technologies like automation influencing data governance practices?
- What’s the future of data governance over the next few years?
To provide insight into data governance, I spoke with two leading experts:
- Danny Sandwell, Solutions Evangelist, Erwin
- Mike Matchett, Principal IT industry analyst, Small World Big Data
A statistic that tracks the number of companies that are actively involved with data governance is really low. Are so few companies really managing data governance?
Sandwell: Oh, I would not want to comment on the accuracy or veracity of the number. We do surveys every once in a while and it does amaze me. Our results generally sit around just under 50%. 98% think it’s important and know they have to do it, but just under 50% are actually actively pursuing it at some level.
Matchett: Yeah, and it does really depend on how you ask the question and how the survey is phrased. I do think a lot of people are doing parts of data governance, probably most people are doing something about data security and making sure their data has integrity and looking at some data quality at some level.
I would tend to believe the number is actually lower if you ask people who has a formal data governance program that they can actually document and show to a regulatory authority if they needed to. I would guess it’s probably in the 20s, if even that.
Sandwell: “Following up on what Mike said about how you ask the question and how people define it, it’s not like the old days of data warehousing, where you had a methodology that was blessed and that everybody had bought into: these are the six steps and that’s what you get. So I agree, I think people are doing pieces of it, whether or not they’ve formalized it. And it really depends on where they are in that data-driven journey.
“With a lot of technology, you go to the end result first. So people are investing huge amounts in things like analytics and stuff like that to really take advantage of the data. And they have to go through that wave of realizing that maybe they’re not getting everything that they expected to get out of those types of approaches if they don’t do some of the hard work upfront in terms of getting their arms around the data that they have: where is it, what does it mean to the business, what should we watch out for in this data? As you said, what’s the level of quality?
“And it’s absolutely not a small task to try to do that. So, it really depends on the maturity. Do they have things like metadata management in place or do they really have to start from ground up?
“And then, of course, you get the people that go forward with an approach and then realize that maybe that approach, or the scope of that approach, or the drivers for that approach have changed, or it doesn’t deliver exactly what they hoped it would, and then they have to go back and take a look at it. We recently asked another question in the survey, and we do these every couple of years. And there was actually a backwards slide in terms of how many people considered that they were doing data governance successfully.
“And it correlated with a major shift in drivers. So two years ago, GDPR out of Europe was coming out and compliance was head and shoulders above all the rest in terms of what was driving their data governance initiative or efforts. In that second survey, that was overtaken by better decision making and more effective analytics. So, I think that maybe they got… A lot of people got their arms around GDPR or CCPA…
“And now they’re saying, “Well… But we’re doing all this, but we’re not getting the return on opportunity from our data.” So now, they’re gonna look at that garbage in, garbage out type problem and see if they can start to address that because a lot of the same techniques and things that you might put in place to be able to answer and assure your compliance requirements, there’s a benefit there.
“There’s a lot of information there that can be again re-purposed and leveraged to a point, as I call it, above the line versus avoiding fines and saving money to really getting more of an opportunity from that data.
What’s your sense of why companies are slower and what’s your overall sense of the industry in terms of what would be the reluctance to do such an important thing as data governance?
Matchett: “I don’t see it as a reluctance, I just see it as something that doesn’t get the priority all the time that it should and I think, Dan, you mentioned you’re looking for the opportunity, not just the pain point, I think is what you’re getting at. How do I actually leverage this for this?
“So I think, for me, if I was just boiling it down it’s like, what does success look like with data governance, how do I quantify that and justify that both to my boss and the business, and say like, “Hey, if we do this the right way, and we do this well, what do we get out of it?”
“And it can’t be… “Well, we don’t get broken, we don’t lose the bit.” It’s like this insurance argument which tends to get de-prioritized. And I think if you can change the argument to saying, “Well, look, if we got a handle on metadata, we got a handle on our data catalogs and that, we could do machine learning and AI ops better.
“We could bring things back to the business because now we know where that information is from and now we can use it and leverage it in a positive, to the bottom line, kind of way.” And that I think is probably gonna be the way forward for data governance, is how do we bring it into something that’s not just defensive, but moves the ball down the field, if you will.
You may have mentioned literally what is the golden question: How do we define success in data governance?
Matchett: “We can go back and ask what data governance is in the first place, and what’s all under that umbrella, because I talk to a lot of different vendors and everyone defines it just a little bit differently. I’m interested in Danny’s take on that as well from their perspective, but generally, you are looking to get a handle on the data assets that the company has and treat data as an asset.
“And if you’re gonna do that, you’ve gotta understand where it comes from, where it’s going, how good is it, how you’re protecting it, how are you keeping it from being manipulated and used against you? How do you even know what’s in the bucket when you look at that? So you get the catalogs and the metadata, you have to go in there and say, “I know what this is.”
“And how do you squeeze value out of it? How good am I at taking that vast amount of data I have now and repurposing it, if you will, but how do I mine it for insight, how do I mine it for value and not just accumulate it? So I think a lot of people build, they even build data lakes and things like that because, “Hey, this is a great thing to do, we’re putting all our data together. Now, look at it, it’s all sitting there.”
“You got a petabyte of data somewhere. I can’t use it because I don’t know what’s in it or where it came from or anything that’s solid about it. And I’ve even done projects for people where… This is going a little sideways, but where we say, “Alright, let’s look at this cool data, give me a data cut from six months ago.”
“And we look at that and we say, “Okay, we’ve got this cool algorithm now we can apply to this if we just trusted this data a little bit more and knew what was coming in. By the way, can we get new updated versions of that?” And they kinda raise their hands and go, “What is that data?” Six months later: “I don’t know. You gave it to me.” “Yeah, we don’t know where it came from.”
“Wait a minute. There was no process in place to even produce a reliable data set out of something. So I guess coming back to data governance, those are the kinds of problems you can solve and get back to developing opportunities for growth and opportunities for advancement with what you have.
“And I guess that’s the only answer I can give. What success would look like to me is that you’ve got data governance people who are sought out in a company on a daily basis for their advice and help with solving problems and moving the needle and not simply, “Oh, here comes the data governance person. Let’s run and hide.” [chuckle]
Danny, how do you define success in data governance, that moving target? What is success?
Sandwell: “I think as Mike did, kinda go back to what is data governance and there’s a million definitions based on primarily people’s perspective and the role that they play around data because to a technician, it means this, to an end user, it means that, to a lawyer or risk management person, it means this.
“For us, how we try to explain it to people is that it’s visibility, control that leads to capability that leads to value and really getting more. So talking about data as an asset, one of the trends we’re starting to see is people trying to put a balance sheet number on data, not data that’s for sale commercially, but data that’s… So, and again, how do you get to that number? There is no common framework, but some of the indicators are what does it cost you to maintain it?
“Who’s using it, how much is it being used? And then you sort of dig down into it, but really success is, are you getting the results from your data that you want? And again, there’s two sides to that, there’s the defensive, you’re not getting burned by it whether it’s through fines or being known as the poster child of the latest breach.
“But then what are your goals with your data, which then goes to what are your goals with your business? What are you trying to do? Everybody uses the term digital transformation. Well, now it’s data-driven digital transformation, and it’s really about how do you prepare yourself for what you wanna be next week, next month or the year after that. Is your data helping you do that or is it hurting you?
“So, and again, there’s, you know, like I said, so many people involved in so many aspects to the data in an organization, especially with the volumes, the varieties and all of that. You know, now we wanna get to veracity, all right? Which is we wanna make sure not just veracity in the asset itself, but veracity and the confidence in all the processes around that because data is flying here, moving there, being transformed, it’s going into the lake, it’s ending up in the swamp.
“Never to be found again. What’s, again, the lost opportunity based on that inability to do that. And you know, then you get some very basic ROI numbers. The scientists… It’s an exciting new field. I’m thinking of ending my career as a data scientist, if I can only, you know, find myself a lab coat. But the data scientists, high price, high expectations in a lot of organizations, but them and a lot of other people that are using data for the betterment of the business are spending 80%, over 80% of their time just looking for it and seeing if it’s the right stuff, and then 20% delivering actionable insights to the business.
“And they’re trying to, you know, that’s how a lot of companies are measuring it. Can they shift that ratio, and you know, have people delivering value to the organization more often and with more impact based on that data and spending less time saying, “Well, you know, when Joe gets back to me to tell me what transformations this data did. You know is this the gold master data set that I wanna use for this use case.” So lots of different ways to measure it, but it’s really about your capability to make data all that it can be and none of the things that you don’t want it to be.
What are the technologies influencing data governance? Obviously the world of machine learning and the world of data governance are so intertwined.
Matchett: “Certainly automation is a big piece of this. And I think if we think of data governance as sort of the center point of making sure all the data management stuff gets done, right? Which is that you’re really talking about how do I implement policies and procedures that are consistent everywhere across business units, across the company and so on.
“So any kind of technology or solution that allows someone to exercise a, I hate to say, but a centralized authority and push it out consistently to constituents is going to help the data governance. And I think that’s a real problem, especially in any company that’s gotten big enough to have data management issues where they’ve got different lines of business doing different things, right?
“How do you centralize that? So, I think automation is probably the right way to really move forward. So in technologies that can, again, apply policies, use AI to look at data to make sure things are not abnormal, to ingest data, to auto recognize, automatically recognize schemas and look at that and identify sources of data and perhaps even go and work backwards where the data came from and keep track of that.
“You know, there are tools out there that will keep track of the history of where the data came from and allow you to re-create it if you need to. So those are the kinds of things I’d look for in someone who says, “I wanna do something really aggressive in data governance.” It’s like, “How are you really gonna implement that and who’s gonna be able to do that and how are you gonna leverage them across a larger organization?”
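The automatic schema recognition and policy checking Matchett describes can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the sample rows, and the `consent_flag` policy field are all hypothetical and do not come from any tool mentioned in the webinar.

```python
from collections import defaultdict


def infer_schema(records):
    """Infer a field -> type-name mapping from a list of dict records.

    A field observed with more than one type is reported as "mixed",
    a cheap signal that the source may need a closer look.
    """
    seen = defaultdict(set)
    for record in records:
        for field_name, value in record.items():
            seen[field_name].add(type(value).__name__)
    return {
        field_name: next(iter(types)) if len(types) == 1 else "mixed"
        for field_name, types in seen.items()
    }


def missing_required_fields(schema, required_fields):
    """Return the required fields that the inferred schema does not contain."""
    return sorted(set(required_fields) - set(schema))


# Hypothetical sample rows, as if ingested from an undocumented source.
rows = [
    {"customer_id": 1, "email": "a@example.com", "spend": 10.5},
    {"customer_id": 2, "email": "b@example.com", "spend": "n/a"},
]

schema = infer_schema(rows)
# Hypothetical central policy: every customer data set must carry these fields.
missing = missing_required_fields(schema, ["customer_id", "email", "consent_flag"])
```

Here the `spend` field would be flagged as `mixed` and `consent_flag` as missing, which is the sort of consistent, centrally defined check that automation can push out to every line of business.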
Sandwell: “Yeah, I think you really hit it in a nutshell. And in fact, going back to the origin of the question, AI and machine learning, you know, that’s gonna be I think one of the biggest drivers for people really getting serious and successful with data governance.
“Because as I put on my slide decks when I go out and talk to people, you’ve got the picture of the Terminator, where the machines took over the – they took over and got rid of the humans because if you’re relying on AI and machine learning, if you’re feeding it junk, you’re going to get junk.
“So AI and machine learning now is becoming a big part of succeeding in data governance and leveraging that internally to make a better mouse trap to help you govern that data, and a lot of that is automation. A lot of it is combined with automation, where you have, you know, AI and machine learning aspects to it.
“But Mike is absolutely right, if you automate something, then you’re taking a standardized approach. The results that you get are going to be very predictable if you’ve done your automation correctly and you don’t have strange logic loops that take you into other places. That’s a big part of what we do just because that’s what customers are looking for.
“And the other piece that it goes to is efficiencies, reducing latency around data, so automation is going to speed up the process, and then keep data governance harmonized and then aligned with what’s really going on in the world is a huge challenge, and automation is being applied there as well.
“Just imagine what it’s like to have a dashboard with a field on it and a number. And you wanna know where that came from, all right? And it probably went through maybe potentially five, six different places with potential transformations happening to it along the way, that’s a lot of work to figure that out for that one field. Now how many fields are on dashboards across your organization? So lineage is a huge piece that people are looking for.
“And automating that lineage, whether it’s using machine learning to infer the lineage, is one way to go. We take a different approach where we actually automate the interpretation of all of the code that’s out there that’s moving data around and doing things and manipulating data.
“And we auto document that, bring it into a place that you can actually centralize it, and then start to automate the creation of it. So now that whole idea of data in motion or data movement in your organization, which has been one of the biggest challenges for organizations, it’s great. I know what this field is, I know what this field is, but I can’t tie them all together.
“That piece in itself is a huge place where automation will make sure that what you have is up to date, it’s in sync, but also allow you to get data to the consumers in a much faster way by leveraging that centralized metadata approach and then activating that metadata.
“The processes that you need to get data where it needs to go. So, and then again, as I said, if you automate something, it’s easier to govern because if you’ve done the automation right, governance is built into it, right? Because only certain things can happen within those parameters. And you know those are the things that you want to happen. So it is a big thing, and you’re going to see a lot more automation and a lot more AI machine learning-driven automation in the solutions that are delivering governance so that people can do AI machine learning much more effectively than they are today.
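Sandwell's dashboard example, a single field that passed through five or six places before it was displayed, can be made concrete with a small Python sketch. It assumes the lineage has already been captured as a simple mapping from each field to the fields it is derived from. The field names and the `trace_sources` helper are hypothetical, and this is not a description of how any vendor's product works.

```python
# Hypothetical lineage map: each field lists the upstream fields it is derived from.
LINEAGE = {
    "dashboard.revenue": ["warehouse.sales_total"],
    "warehouse.sales_total": ["staging.orders_amount", "staging.refunds_amount"],
    "staging.orders_amount": ["crm.orders.amount"],
    "staging.refunds_amount": ["billing.refunds.amount"],
}


def trace_sources(field_name, lineage):
    """Walk the lineage map upstream and return the original source fields."""
    sources = set()
    visited = set()
    stack = [field_name]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guards against cycles in a badly formed map
        visited.add(current)
        upstream = lineage.get(current, [])
        if not upstream:
            sources.add(current)  # nothing upstream: treat as an origin
        stack.extend(upstream)
    return sources


origins = trace_sources("dashboard.revenue", LINEAGE)
```

With a centralized map like this, answering "where did this number come from?" becomes a lookup rather than a manual investigation, which is the point of automating lineage capture.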
What’s the near term future of data governance? Will it become so interwoven with machine learning and AI that it won’t be much of a challenge?
Matchett: “I don’t wanna ever give the impression that this is going to become a robotic task. It’s not a repetitive task. It is, in some ways, probably the opposite. It’s very creative. Governing people, process, and technology is an ongoing management task that’s gonna require the brightest people and the most creativity that a person can bring to the job in some ways.
“I think where AI and ML come into play is in giving that person leverage, reach and scope. Being able to implement things at a large scale across different domains is really where that comes in to make it possible. Where I think this is going, though, is we’re going to see more and more of our data streams plug and play and come with their own meta for governance.
“And so, right now, you can get data and generate data from all sorts of applications and pipe them together and put them in different places. And it’s really just a programmer’s dream, right? You can do stuff. I think in the future, we’re gonna start to see, look, the data comes with its governance and its meta, so it’s like a management ride along. And when you plug things together, even at a code level, it’s gonna pass the data and its governance requirements and information as well. And those will be standards that will have to emerge over time.
“But I think as we automate things and become more connected, that’s just what has to happen. If you think about it, there’s a data plane and a management plane to most control mechanisms and IT things. And I don’t think we’ve got the management plane for data down quite yet. And that’s where I see the technology is gonna come.
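Matchett's "management ride along" idea, data that carries its governance information with it when it is passed between pieces of code, might look something like the following Python sketch. No such standard exists yet, as he notes, so every name here (`Governance`, `GovernedData`, `transform`, and the `pii` label) is a hypothetical illustration rather than an emerging specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Governance:
    """Governance record that travels with the data (the management plane)."""
    source: str
    classification: str  # illustrative labels such as "public" or "pii"
    steps: tuple = ()    # names of the transformations applied so far


@dataclass(frozen=True)
class GovernedData:
    """Payload (the data plane) bundled with its governance record."""
    values: list
    governance: Governance


def transform(data, func, step_name):
    """Apply func to each value and carry the governance record forward."""
    updated = Governance(
        source=data.governance.source,
        classification=data.governance.classification,
        steps=data.governance.steps + (step_name,),
    )
    return GovernedData([func(v) for v in data.values], updated)


raw = GovernedData([1, 2, 3], Governance(source="crm_export", classification="pii"))
doubled = transform(raw, lambda v: v * 2, "double")
```

After the call, `doubled` still knows it came from `crm_export`, is still classified as `pii`, and records that a `double` step was applied, so downstream code receives the governance requirements together with the values.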
Sandwell: “In the next 24 months, I think that you’re going to start to see some of the marketing and the architecture that’s out there turning into the fruition of real useful solutions. So when I say marketing, meaning AI and machine learning to look after, artificial intelligence to govern your data, those types of things. You’re going to start to see a lot more reality in that and a lot more measurable value.
“I think overall, as those platforms that produce data and how people configure their data architecture and their data infrastructure moving to the cloud, a lot more things as a service taking out that basement of gremlins that are coding and doing all the things that… And I can say that because I was in the basement as a gremlin doing all that. That’s been my career. That governance will be built in as part of the service.
“And then now what it’s going to be about is really orchestrating that for the business level, for the senior management level, to not just get the most out of their data, but also to start to garner operational intelligence and insights around those different things that you have and how they mix together.
“With less and less of the IT expert, that you’re going to see a lot more social type applications around this where you’re really starting to leverage a community around your data and taking advantage of that without having to have, Joe, the architect, there to tell you what the data is, now it’s self-service, now you’re sharing and building on that.
“And then, some better capabilities around once you have this pane of glass that tells you all of the data possibilities in your organization, better tools to start ideating around that data and figuring out what are we using it for today or what could we potentially be using it for. And more guidance in that direction as people, again, they’re looking for more of a positive sustainable impact on data governance that can be sold to the people that hold the check book and sign those checks.
Matchett: “So if we’re doing more shorter term, I think things are also getting more real time and even BI and warehousing and things that we tended to fork off before in the historical different datasets are a little bit more solid and provable. We’ve got realtime data streams feeding into those and then we got BI being done off of realtime data going this way.
“And that’s gonna bring a bit of a convergence in the world as well for good reasons, ’cause we get fresher data and more current data, but also I think data governance is gonna have to come back into the mainstream with it a little bit as we go and that’ll help drive that a bit.
The world may run on data, but data without proper data governance is a major missed opportunity. Consequently, understanding what trends and forces currently shape today’s data governance is essential to running a competitive business.