Drilling Down With A Data Mining Pioneer

Written By

Nathan Segal

Nov 6, 2002

Dr. Usama Fayyad is a data mining pioneer who began working in the field in 1989. He got his start at NASA’s Jet Propulsion Laboratory, compiling data on astronomical phenomena such as volcanoes and star systems. From there, he went on to work for Microsoft Research and then, frustrated by problems he was seeing in the data mining industry, left Microsoft and started digiMine to deal with the issues of data mining and data warehousing. In this article, he shares his thoughts about the industry and how to get the most out of your data.

“There are two sides to data mining, descriptive and predictive,” says Dr. Fayyad. “Descriptive data mining reorganizes the data, digging deeper into it and pulling out patterns, such as customer similarity, which allows you to create a short description about that group of customers.

“Predictive data mining looks for the best prediction, such as the best product to pitch to a customer. You won’t get much insight, but it increases the performance, the ROI. Using both techniques will give you the best results.
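As a rough illustration of the two sides Fayyad describes, the sketch below runs both on a tiny purchase log. The data, the function names and the similarity threshold are all invented for this example; it is a toy under those assumptions, not digiMine's method or any particular product's algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase log of (customer, product) pairs, invented for illustration.
purchases = [
    ("ann", "phone"), ("ann", "case"),
    ("bob", "phone"), ("bob", "case"), ("bob", "charger"),
    ("cat", "laptop"), ("cat", "mouse"),
    ("dan", "laptop"),
]

def build_baskets(rows):
    """Collect the set of products each customer has bought."""
    baskets = defaultdict(set)
    for customer, product in rows:
        baskets[customer].add(product)
    return dict(baskets)

def similar_pairs(baskets, threshold=0.5):
    """Descriptive: list customer pairs whose baskets overlap strongly.

    Similarity is the Jaccard index: shared products / all products.
    """
    pairs = []
    for a, b in combinations(sorted(baskets), 2):
        shared = len(baskets[a] & baskets[b])
        total = len(baskets[a] | baskets[b])
        if total and shared / total >= threshold:
            pairs.append((a, b))
    return pairs

def best_pitch(baskets, customer):
    """Predictive: pick the product most often bought by overlapping customers.

    Returns None when no overlapping customer owns anything new.
    """
    owned = baskets[customer]
    votes = Counter()
    for other, items in baskets.items():
        if other != customer and items & owned:
            votes.update(items - owned)
    return votes.most_common(1)[0][0] if votes else None

baskets = build_baskets(purchases)
print(similar_pairs(baskets))     # groups you could describe in a sentence
print(best_pitch(baskets, "dan")) # a single recommendation, with little insight
```

The first function yields groups a person can read and label; the second yields only an answer, which matches the trade-off between insight and performance described above.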

“An important issue today is SQL, the standard interface for databases, which has proven to be the wrong interface,” Fayyad says. “As an example, let’s say you worked for a telecommunications company, and you want to find records about cell phone fraud. Well, guess what? These naturally asked questions cannot be answered by today’s databases, because the interface was designed to address problems where you know the target and you want the database to quickly retrieve the result. If you don’t have an exact description of the target, you’re lost with a database today. This is why data mining is seeing a lot of demand.
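To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the phone numbers and the "five times the median" rule are assumptions made up for this example; the rule is a crude stand-in for a mining step, not a real fraud detection technique.

```python
import sqlite3
from statistics import median

# Hypothetical call records, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (phone TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("555-0101", 12), ("555-0102", 9), ("555-0103", 15),
        ("555-0104", 11), ("555-0105", 240),
    ],
)

# Known target: you can name the record, so the database retrieves it directly.
known = conn.execute(
    "SELECT minutes FROM calls WHERE phone = ?", ("555-0102",)
).fetchall()

def flag_unusual(rows, factor=5.0):
    """Unknown target: nobody can say in advance which rows are fraud,
    so flag rows that deviate sharply from typical usage instead."""
    typical = median(minutes for _, minutes in rows)
    return [phone for phone, minutes in rows if minutes > factor * typical]

rows = conn.execute("SELECT phone, minutes FROM calls ORDER BY phone").fetchall()
suspects = flag_unusual(rows)
print(known, suspects)
```

The lookup needs an exact description of what to fetch. The second step starts from the data and lets the unusual records surface, which is the kind of question the passage above says plain retrieval was not designed for.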

“When I started in this field back in 1989, there were many people in large corporations struggling with large data sets. And even though there’s a lot of data out there, it’s not necessarily the right kind. Also, there’s a big difference between the ability to store data and the ability to access it in a useful way.

“In response, companies began building data warehouses, which convert transactional database data into a format that allows for more analytically oriented queries to go against it. In theory, a warehouse contained all the details for data mining. But in reality, it was a huge challenge. Today, industry analysts are recording an 85% failure rate on data warehouses.

Data Warehousing Woes

“The big question today is: Where’s the data? If there was a data warehouse, most likely it failed or it’s not working. I saw this so consistently that I started digging into it and discovered three problems:

Invariably, data warehouses were built by a company such as IBM or NCR, but once that company left, the data warehouse started dying;

In some cases companies tried to build it themselves, but it became so complicated that they couldn’t maintain it anymore;

The data in the warehouse became stale or dead after the employee who built it left the company.”

At this point, Fayyad left Microsoft to create digiMine. He says he realized that “you cannot mine if you can’t have access to the data. And you can’t have the right data in the right format unless you ensure that there’s a successful data warehouse. At digiMine, we build and host data warehouses for companies; then we embed data mining on top as a solution.”

Here are some guidelines for collecting data:

The data needs to be in the right format, one that captures not only the data itself but also enough detail.

The data should not be aggregated in any way, because if the detail is lost, you cannot do data mining on top of it.

There is no single format for collecting data, as each data mining tool has its own. At a base level, if the data involves customers, you record all the data about each customer and bring it into one data record, but that’s not how databases represent data today.
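The guidelines above can be sketched in a few lines of Python. The events, field names and amounts below are hypothetical, chosen only to show what aggregation throws away and what a one-record-per-customer layout might look like; real mining tools each expect their own format.

```python
from collections import defaultdict

# Hypothetical transaction events, invented for illustration.
events = [
    {"customer": "c1", "day": "2002-10-01", "amount": 20.0},
    {"customer": "c1", "day": "2002-10-02", "amount": 400.0},
    {"customer": "c2", "day": "2002-10-01", "amount": 210.0},
    {"customer": "c2", "day": "2002-10-02", "amount": 210.0},
]

def aggregate_totals(events):
    """Aggregated view: one total per customer, with the detail discarded."""
    totals = defaultdict(float)
    for event in events:
        totals[event["customer"]] += event["amount"]
    return dict(totals)

def customer_records(events):
    """Detailed view: one record per customer that keeps every event."""
    records = defaultdict(list)
    for event in events:
        records[event["customer"]].append((event["day"], event["amount"]))
    return dict(records)

totals = aggregate_totals(events)
records = customer_records(events)
print(totals)   # both customers look identical at this level
print(records)  # the detail shows c1's sudden jump in spending
```

Both customers total 420.0, so the aggregated table cannot tell them apart, while the per-customer records still show that one spent steadily and the other spiked on a single day.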

Fayyad says digiMine begins from the other end, asking the client what data needs to be mined and how to apply the algorithms. From there, digiMine sets up the data warehouse and the technology to grab the data from a variety of formats. The customer installs their software, which they maintain and run from their data center.

Fees depend on the subscription. They start at $7,000 to $10,000 per month for a customer who wants pure data mining and not much warehousing, and rise to $30,000 to $40,000 per month for a customer who wants data warehouse hosting, maintenance and enterprise solutions.

Then there is the issue of purchasing software. According to Fayyad, “There are many tools available from companies such as SAS or IBM, but in order to use them properly, you had better be an expert, preferably a Ph.D. in the area of data mining or statistics. If you’re not, you just bought a bunch of shelfware.

“For most users, data mining tools offer the wrong interface. You need data mining solutions. If you have a large staff of experts who know data mining very well, data mining tools will do the job,” he says. “However, this department of experts is now acting as the interface between the tools and the ultimate user.”

If you’re considering purchasing data mining software, Fayyad recommends you look at applications that package the data mining inside the software, or that you purchase a service option.
