Datamation content and product recommendations are
editorially independent. We may make money when you click on links
to our partners.
Learn More
Dr. Usama Fayyad is a data mining pioneer who began working in the field in 1989. He got his start at NASA’s Jet Propulsion Laboratory, compiling data on astronomical phenomena such as volcanoes, star systems, etc. From there, he went on to work for Microsoft research and then, frustrated by problems he was seeing in the data mining industry, he left Microsoft and started digiMine to deal with the issues of data mining and data warehousing. In this article, he shares his thoughts about the industry and how to get the most out of your data.
“There are two sides to data mining, descriptive and predictive,” says Dr. Fayyad. “Descriptive data mining reorganizes the data, digging deeper into it and pulling out patterns, such as customer similarity, which allows you to create a short description about that group of customers.
“Predictive data mining looks for the best prediction, such as the best product to pitch to a customer. You won’t get much insight, but it increases the performance, the ROI. Using both techniques will give you the best results.
“An important issue today is SQL, the standard interface for databases, which has proven to be the wrong interface,” Fayyad says. “As an example, let’s say you worked for a telecommunications company, and you want to find records about cell phone fraud. Well, guess what? These naturally asked questions cannot be answered by today’s databases, because the interface was designed to address problems where you know the target and you want the database to quickly retrieve the result. If you don’t have an exact description of the target, you’re lost with a database today. This is why data mining is seeing a lot of demand.
“When I started in this field back in 1989, there were many people in large corporations struggling with large data sets. And even though there’s a lot of data out there, it’s not necessarily the right kind. Also, there’s big difference in the ability to store data and the ability to access it in a useful way.
“In response, companies began building data warehouses, which convert transactional database data into a format that allows for more analytically oriented queries to go against it. In theory, it contained all the details for data mining. But in reality, it was a huge challenge. Today, industry analysts are recording an 85% failure rate on data warehouses.
Data Warehousing Woes
“The big question today is: Where’s the data? If there was a data warehouse, most likely it failed or it’s not working. I saw this so consistently that I started digging into it and discovered three problems:
Invariably, data warehouses were built by a company such a IBM or NCR, but once that company left, the data warehouse started dying;
In some cases companies tried to build it themselves, but it became so complicated that they couldn’t maintain it anymore;
The data in the warehouse became stale or dead, after the employee who built it left the company.”
At this point, Fayyad left Microsoft to create digiMine. He says he realized that “you cannot mine if you can’t have access to the data. And you can’t have the right data in the right format unless you ensure that there’s a successful data warehouse. At digiMine, we build and host data warehouses for companies; then we embed data mining on top as a solution.”
Here are some guidelines for collecting data:
The data needs to be in the right format, which not only collects the data but also collects it with enough detail.
The data should not be aggregated in any way, because if the detail is lost, you cannot do data mining on top of it.
There is no specific format for collecting data, as each data mining tool has its own format. At a base level, if the data involves customers, you record all the data and bring it into one data record, but that’s not how databases represent data today.
Fayyad says digiMine begins from the other end, asking the client what data needs to be mined and how to apply the algorithms. From there, digiMine sets up the data warehouse and the technology to grab the data from a variety of formats. The customer installs their software, which they maintain and run from their data center.
Fees charged depend on the subscription, ranging from $7,000-$10,000/month for a customer who wants pure data mining and not much warehousing, to $30,000-$40,000/month if the customer wants data warehouse hosting, maintenance and enterprise solutions.
Then there is the issue of purchasing software. According to Fayyad, “There are many tools available from companies such as SAS or IBM, but in order to use them properly, you had better be an expert, preferably a Ph.D. in the area of data mining or statistics. If you’re not, you just bought a bunch of shelfware.
“For most users, data mining tools offer the wrong interface. You need data mining solutions. If you have a large staff of experts who know data mining very well, data mining tools will do the job,” he says. “However, this department of experts is now acting as the interface between the tools and the ultimate user.”
If you’re considering purchasing data mining software, Fayyad recommends you look at applications that package the data mining inside the software, or that you purchase a service option.
-
Huawei’s AI Update: Things Are Moving Faster Than We Think
FEATURE | By Rob Enderle,
December 04, 2020
-
Keeping Machine Learning Algorithms Honest in the ‘Ethics-First’ Era
ARTIFICIAL INTELLIGENCE | By Guest Author,
November 18, 2020
-
Key Trends in Chatbots and RPA
FEATURE | By Guest Author,
November 10, 2020
-
Top 10 AIOps Companies
FEATURE | By Samuel Greengard,
November 05, 2020
-
What is Text Analysis?
ARTIFICIAL INTELLIGENCE | By Guest Author,
November 02, 2020
-
How Intel’s Work With Autonomous Cars Could Redefine General Purpose AI
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
October 29, 2020
-
Dell Technologies World: Weaving Together Human And Machine Interaction For AI And Robotics
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
October 23, 2020
-
The Super Moderator, or How IBM Project Debater Could Save Social Media
FEATURE | By Rob Enderle,
October 16, 2020
-
Top 10 Chatbot Platforms
FEATURE | By Cynthia Harvey,
October 07, 2020
-
Finding a Career Path in AI
ARTIFICIAL INTELLIGENCE | By Guest Author,
October 05, 2020
-
CIOs Discuss the Promise of AI and Data Science
FEATURE | By Guest Author,
September 25, 2020
-
Microsoft Is Building An AI Product That Could Predict The Future
FEATURE | By Rob Enderle,
September 25, 2020
-
Top 10 Machine Learning Companies 2020
FEATURE | By Cynthia Harvey,
September 22, 2020
-
NVIDIA and ARM: Massively Changing The AI Landscape
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
September 18, 2020
-
Continuous Intelligence: Expert Discussion [Video and Podcast]
ARTIFICIAL INTELLIGENCE | By James Maguire,
September 14, 2020
-
Artificial Intelligence: Governance and Ethics [Video]
ARTIFICIAL INTELLIGENCE | By James Maguire,
September 13, 2020
-
IBM Watson At The US Open: Showcasing The Power Of A Mature Enterprise-Class AI
FEATURE | By Rob Enderle,
September 11, 2020
-
Artificial Intelligence: Perception vs. Reality
FEATURE | By James Maguire,
September 09, 2020
-
Anticipating The Coming Wave Of AI Enhanced PCs
FEATURE | By Rob Enderle,
September 05, 2020
-
The Critical Nature Of IBM’s NLP (Natural Language Processing) Effort
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
August 14, 2020
SEE ALL
ARTICLES