"There are two sides to data mining, descriptive and predictive," says Dr. Fayyad. "Descriptive data mining reorganizes the data, digging deeper into it and pulling out patterns, such as customer similarity, which allows you to create a short description about that group of customers.
"Predictive data mining looks for the best prediction, such as the best product to pitch to a customer. You won't get much insight, but it increases the performance, the ROI. Using both techniques will give you the best results.
"An important issue today is SQL, the standard interface for databases, which has proven to be the wrong interface," Fayyad says. "As an example, let's say you worked for a telecommunications company, and you want to find records about cell phone fraud. Well, guess what? These naturally asked questions cannot be answered by today's databases, because the interface was designed to address problems where you know the target and you want the database to quickly retrieve the result. If you don't have an exact description of the target, you're lost with a database today. This is why data mining is seeing a lot of demand.
"When I started in this field back in 1989, there were many people in large corporations struggling with large data sets. And even though there's a lot of data out there, it's not necessarily the right kind. Also, there's big difference in the ability to store data and the ability to access it in a useful way.
"In response, companies began building data warehouses, which convert transactional database data into a format that allows for more analytically oriented queries to go against it. In theory, it contained all the details for data mining. But in reality, it was a huge challenge. Today, industry analysts are recording an 85% failure rate on data warehouses.
Data Warehousing Woes
"The big question today is: Where's the data? If there was a data warehouse, most likely it failed or it's not working. I saw this so consistently that I started digging into it and discovered three problems:
At this point, Fayyad left Microsoft to create digiMine. He says he realized that "you cannot mine if you can't have access to the data. And you can't have the right data in the right format unless you ensure that there's a successful data warehouse. At digiMine, we build and host data warehouses for companies; then we embed data mining on top as a solution."
Here are some guidelines for collecting data:
Fayyad says digiMine begins from the other end, asking the client what data needs to be mined and how to apply the algorithms. From there, digiMine sets up the data warehouse and the technology to grab the data from a variety of formats. The customer installs their software, which they maintain and run from their data center.
Fees charged depend on the subscription, ranging from $7,000-$10,000/month for a customer who wants pure data mining and not much warehousing, to $30,000-$40,000/month if the customer wants data warehouse hosting, maintenance and enterprise solutions.
Then there is the issue of purchasing software. According to Fayyad, "There are many tools available from companies such as SAS or IBM, but in order to use them properly, you had better be an expert, preferably a Ph.D. in the area of data mining or statistics. If you're not, you just bought a bunch of shelfware.
"For most users, data mining tools offer the wrong interface. You need data mining solutions. If you have a large staff of experts who know data mining very well, data mining tools will do the job," he says. "However, this department of experts is now acting as the interface between the tools and the ultimate user."
If you're considering purchasing data mining software, Fayyad recommends you look at applications that package the data mining inside the software, or that you purchase a service option.