Getting the Bigger Picture: Dealing with Unstructured Data

Most of your data resides outside of analysis-friendly databases. A little understanding of unstructured data, and the tools to tame it, goes a long way in filling that business intelligence blind spot.
Posted September 13, 2004
By Drew Robb


(Page 1 of 2)

Companies have never suffered from a lack of data. They have warehouses of file boxes and terabytes of storage. What is missing is actionable intelligence that they can use to improve business results. Using data mining tools helps to convert database stores into business intelligence.

But that gives only part of the picture, since about 85 percent of an organization's knowledge isn't in databases. To get at the rest, a new generation of text mining tools allows companies to discover relationships and summarize information from large stores of previously unanalyzed data.

Structured and Unstructured

Information breaks down into two broad categories: structured and unstructured. Structured data is what we find in databases, where every piece of information has an assigned format and significance.

Companies have been using data mining software for years to extract business intelligence from their structured data. Because the database fields are clearly defined, it is easy to run queries and formulas that extract meaningful information, not just raw data. Computers are great at handling massive quantities of structured information, something people have a hard time doing.
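To make the point concrete, here is a minimal sketch of the kind of query that well-defined fields make possible. The `orders` table, its columns and the sample figures are hypothetical, invented purely for illustration; the sketch uses Python's built-in sqlite3 module rather than any particular data mining product.

```python
import sqlite3

def revenue_by_region(rows):
    """Return (region, total) pairs, highest total first.

    The orders table and its columns are hypothetical; they stand in
    for any database whose fields have a defined format and meaning.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) AS total FROM orders "
        "GROUP BY region ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result

# Made-up sample data, not taken from any real company.
sample_orders = [
    ("East", "widget", 120.0),
    ("West", "widget", 80.0),
    ("East", "gadget", 200.0),
    ("West", "gadget", 50.0),
]

for region, total in revenue_by_region(sample_orders):
    print(region, total)
```

Because every row says which value is the region and which is the amount, one line of SQL turns raw records into a summary. Nothing comparable exists for a folder of memos or voice mail.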

Unstructured data is what we find in emails, reports, PowerPoint presentations, voice mail, phone notes, agendas and photographs. Shaku Atre, president of Atre Group, a business intelligence consultancy in Santa Cruz, Calif., points out that much of this type of information is better described as semi-structured, since it contains structured metadata such as email headers or the revision dates in Word documents. For simplicity, we will group the entire spectrum of data that is less structured than database entries under the term "unstructured."
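The semi-structured nature of email can be seen in a short sketch like the one below, which uses Python's standard email module. The message, addresses and the `split_message` helper are all made up for illustration. The headers come out as clean, queryable fields, while the body remains free text that a person (or a text mining tool) still has to interpret.

```python
from email import message_from_string

# A fabricated message used only for illustration.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 supplier delays
Date: Mon, 13 Sep 2004 09:30:00 -0400

The shipment from the plant slipped again, and the customer is unhappy.
"""

def split_message(raw):
    """Separate the structured headers from the unstructured body."""
    msg = message_from_string(raw)
    headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    body = msg.get_payload()  # plain string for a non-multipart message
    return headers, body

headers, body = split_message(RAW_MESSAGE)
print(headers["Subject"])
print(body.strip())
```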

This data typically makes up about 85 percent of an organization's knowledge stores, but it is not always easy to find, access, analyze or put to use.

"We are drowning in information but are starving for knowledge," says Mani Shabrang, technical leader in research and development at The Dow Chemical Company's business intelligence center in Midland, Michigan. "That information is only useful when it can be located and then synthesized into knowledge."

Running full-text queries to find keywords is one way to locate text information, but it is severely limited. It still relies on a human to read that information, spot the relationships and convert it into useful knowledge. One problem lies in determining the true meaning and importance of language.
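A small sketch shows how quickly plain keyword matching runs into that problem. The notes and the `keyword_search` helper below are invented for illustration and do not represent any vendor's search tool.

```python
def keyword_search(documents, keyword):
    """Return every document containing the keyword, ignoring case."""
    needle = keyword.lower()
    return [doc for doc in documents if needle in doc.lower()]

# Fabricated notes: two use "charge" in unrelated senses, and one
# describes a billing complaint without using the word at all.
notes = [
    "Customer says the charge on her card was wrong.",
    "Engineering will charge the battery overnight before testing.",
    "Client disputes the fee billed in August.",
]

hits = keyword_search(notes, "charge")
for hit in hits:
    print(hit)
```

A search for "charge" returns the battery note, which has nothing to do with billing, and misses the disputed fee, which does. Sorting out which results matter is still left to a human reader.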

Continued on Page 2.
