by Loraine Lawson, IT Business Edge
IT Business Edge's Loraine Lawson spoke with Evan Levy, a partner at Baseline Consulting and an instructor at The Data Warehousing Institute, about why metadata matters and the business problems that arise when it is missing.
Lawson: Could you explain what metadata is?
Levy: The whole premise of metadata is, give me the information and context of the data that I’m looking at or want to use.
So you might get a five-digit ZIP code, might get a street address, but ultimately what you want to understand is tell me about this data. Where did it come from? If it’s a field in a database or field in the computer system, where is it located in that computer system? When was it created? So if I want to talk to someone about this information, I can describe what it is, because it’s not really tangible.
I go to the grocery store and I ask for Red Delicious apples. When I hold one in my hand, what’s the metadata about a Red Delicious apple? Its type, its variety, how much it weighs, where it came from, when it was picked off the tree. It’s the same thing with data. Metadata is the information about the data.
Lawson: Now when you're designing Web pages and you put in metadata, you put it in the code. Is that how it works for a database too?
Levy: Actually, it’s funny. There are several different ways that metadata is stored in technology. You just mentioned I could put metadata or comments in the actual Web page, but I think you have to consider one thing. And I’m being a little esoteric, so pardon me, metadata is the content or the information about the contents you're talking about.
Now, where it’s stored is usually the challenge, because there aren’t standards for how that information is always stored. It varies depending on the type of technology you're talking about. In a database, there’s something called a dictionary, but realistically that information isn’t always filled in.
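To illustrate the database dictionary Levy mentions, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table and column names are hypothetical, and other database products expose their dictionaries in different ways.

```python
import sqlite3

# In-memory database with a hypothetical "customer" table
# (table and column names are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " street_address TEXT NOT NULL,"
    " zip_code TEXT"
    ")"
)

# SQLite's PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(customer)").fetchall()

for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name}: type={col_type}, required={bool(notnull)}, primary_key={bool(pk)}")

conn.close()
```

The dictionary here records names, types and constraints, but nothing about where a value came from or who is responsible for it, which is consistent with Levy's point that the information is not always filled in.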
That’s actually part of the premise of master data management. Metadata is almost boundless. If you consider the concept of a Social Security number, one would say, well, what about the rules of who is allowed to look at a Social Security number? Someone might say, “Hey, that’s metadata also.” The premise of master data management is being able to decouple all those rules and details about data from where people typically store them, which is in a database or in an application, and in fact to have a mechanism for coupling that information with the data itself. Does that make sense?
Lawson: Yes. And do you want it coupled with the data or do you want it decoupled?
Levy: Usually, with master data management, you’ll hear about the whole premise of decoupling. That part is about the places where applications are coupled to the data.
In fact, you would like metadata to be coupled or attached to the data itself. Let’s go back to our apples example. You go buy a jar of applesauce. You want to know what the brand is. You want to know how much it weighs. And you want that right on the jar. You don’t want to go look someplace else. I mean, how annoying is it that you can’t look up the price of the item on the item?
The biggest challenge with metadata is that there’s not a good way to attach that information about the content to the content. That’s part of the rationale behind XML, by the way. Extensible Markup Language not only gives you the value, but it gives you all the details about the value. If you rip open a Web page and look at the HTML, which I think is what you're referring to, you see, “Okay, here’s the value five,” and there are tags for the font color and all the other stuff that describes that five. You can also add other tags: where did it come from, who is responsible for it, security details and so forth.
The challenge is metadata tends to vary based upon the way it’s delivered.
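The XML idea Levy describes, a value that carries its own descriptive details, can be sketched roughly as follows. This is only an illustration using Python's standard xml.etree.ElementTree module. The element name, attribute names and all values are hypothetical and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag name, attribute names and values
# are invented for illustration only.
doc = """
<customer>
  <zip_code source="crm_system" owner="data_steward"
            created="2018-01-15" access="internal">94105</zip_code>
</customer>
""".strip()

root = ET.fromstring(doc)
field = root.find("zip_code")

value = field.text               # the data itself
metadata = dict(field.attrib)    # the data about the data

print("value:", value)
for key, val in metadata.items():
    print(f"  {key}: {val}")
```

Because the attributes travel with the element, anyone who receives the value also receives its origin, owner and access details, which is the "label on the jar" Levy asks for.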
Lawson: And does that create business problems? Or just technology issues?
Levy: Enormous. You know, the real issue is exacerbated by technology, it’s not created by technology.
Because data is not tangible - you can’t touch it, it’s not physical - it’s kind of hard to attach information to it. But the fact is, you see those problems all the time when someone fills out a form or prints a report: Where did this piece of paper come from? So what happens is that people set a business standard and say, “You always have to put the date on the bottom left, the page number on the bottom right, who is responsible for it on there too.” So you have all of these business conventions. Sometimes people follow them and sometimes they don’t, and the conventions can’t always be enforced.
Lawson: So what kind of business problems does it create? Can you give some examples?
Levy: Sure, sure. There’s a zillion of them. The first example is, say you want to attach a board to a tree outside. I need a screw. Okay, go get a drywall screw. But one screw is not like any other screw. There are brass screws and steel screws that won't rust. A drywall screw will. So you put the board on the tree and six months later, the screw rusts and the board falls off. Why? Because the information about what you were using wasn’t available.
From a more practical perspective, when it comes to data and metadata as opposed to metadata about objects, Verizon launched a marketing campaign where they advertised they would sell long distance to New Jersey. What they didn’t realize was that those names weren’t approved to be sold to; the data didn’t have that metadata. It's really more of a business rule, but for all of those customers that opted into marketing, what they didn’t know was that the regulatory approval wasn’t there. That's not data. That's actually data about data.
There are other instances, and you see this more often than not, where someone comes in with a report and someone else says, "Well, where did this come from?" It's fairly common in companies for people to run reports from two different places, with no way of knowing that the reports came from two different places.
My favorite is when people want a cash report and an accrual report, but because the reports weren’t labeled correctly, they don't know that both are accurate even though they show two different numbers.
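The report examples above come down to missing labels. As a rough sketch of what such labels could look like, the snippet below attaches an accounting basis and a source system to each report and uses them to explain a mismatch. The Report class, its field names, the helper function and all figures are hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A report total plus the metadata needed to interpret it (hypothetical fields)."""
    title: str
    total: float
    basis: str    # accounting basis, e.g. "cash" or "accrual"
    source: str   # system the figures were pulled from


def explain_difference(a: Report, b: Report) -> str:
    """Use the labels on two reports to explain why their totals differ."""
    if a.total == b.total:
        return "Totals match."
    reasons = []
    if a.basis != b.basis:
        reasons.append(f"different accounting basis ({a.basis} vs. {b.basis})")
    if a.source != b.source:
        reasons.append(f"different source systems ({a.source} vs. {b.source})")
    if reasons:
        return "Totals differ, and the labels suggest why: " + "; ".join(reasons) + "."
    return "Totals differ despite identical labels; worth investigating."


# Invented figures for illustration only.
cash = Report("Q1 revenue", 120_000.0, basis="cash", source="bank_ledger")
accrual = Report("Q1 revenue", 150_000.0, basis="accrual", source="billing_system")

message = explain_difference(cash, accrual)
print(message)
```

Without the basis and source fields, the two totals would simply look contradictory. With them, a reader can see that both reports may be accurate.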