Datamining poised to go mainstream: Page 3


The Microsoft OLE DB for DM endeavor will likely spawn compliant datamining products sometime in 2000. But that doesn't mean you can't do datamining against SQL Server (or any other database) today. In fact, Microsoft's Site Server 3.0 already includes features such as an intelligent "cross-sell" based on historical sales baskets in stores, the contents of the current shopper basket, and the browsing behavior of the shopper. Site Server ranks products that are likely to be most interesting to the shopper.
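To make the cross-sell idea concrete, here is a minimal Python sketch that ranks products by how often they shared a historical basket with items in the current basket. It is an illustration only: the article does not describe Site Server's actual algorithm, the function names and sample baskets are invented for the example, and browsing behavior is left out entirely.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """Count how often each ordered pair of products shared a basket."""
    pair_counts = Counter()
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def cross_sell(pair_counts, current_basket, top_n=3):
    """Rank products not yet in the basket by co-occurrence with its contents."""
    in_basket = set(current_basket)
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in in_basket and b not in in_basket:
            scores[b] += count
    # Sort by score (highest first), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]

# Hypothetical sales history; each inner list is one past basket.
history = [
    ["sneakers", "socks", "laces"],
    ["sneakers", "socks"],
    ["sandals", "sunscreen"],
    ["sneakers", "insoles"],
]
counts = build_cooccurrence(history)
suggestions = cross_sell(counts, ["sneakers"])
print(suggestions)
```

A production system would likely weight recent baskets more heavily and normalize for product popularity; raw counts are used here only to keep the sketch short.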

Lessons learned about datamining
Don't try to do everything at once.
Focus delivery on immediate tactical as well as long-term strategic value.
Use consultants with track records in your industry.
Make it easy for end users.
Microsoft isn't the only firm with interdependent products. IBM Corp.'s SurfAid Analytics relies on the company's own Intelligent Miner for Data to deliver sophisticated Web site analytics for a fixed monthly fee that ranges from under $1,000 to about $30,000. SurfAid is a small, entrepreneurial e-business within IBM Global Services, which is based in Somers, N.Y. Clients upload daily Web log files to the SurfAid FTP site. RS/6000 AIX scripts handle preprocessing, which includes "stitching back together" navigation paths of individual Web visitors. Then, one of SurfAid's RS/6000s runs the IBM Intelligent Miner datamining tool kit against the customer file, which may contain over 150 million hits per day. The result is a daily report that customers can access at a private URL. Because IBM DB2 for OLAP is running behind the scenes, users can "slice and dice" the data starting with almost a dozen different reports.
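The "stitching" step can be sketched in a few lines of Python. This is not SurfAid's preprocessing code, which the article doesn't show; it assumes each log hit has already been parsed into a (visitor, timestamp in seconds, URL) tuple, and it assumes a 30-minute inactivity cutoff to separate one visit from the next.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cutoff, not SurfAid's setting

def stitch_paths(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, url) hits into ordered per-visit paths."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in hits:
        by_visitor[visitor].append((ts, url))

    paths = []
    for visitor, events in by_visitor.items():
        events.sort()  # restore chronological order within each visitor
        current = []
        last_ts = None
        for ts, url in events:
            # A long gap since the previous hit starts a new visit.
            if last_ts is not None and ts - last_ts > timeout:
                paths.append((visitor, current))
                current = []
            current.append(url)
            last_ts = ts
        paths.append((visitor, current))
    return paths

# Hypothetical, already-parsed log records (interleaved, as in a real log).
hits = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.2", 5, "/home"),
    ("10.0.0.1", 40, "/products"),
    ("10.0.0.1", 90, "/checkout"),
    ("10.0.0.1", 7200, "/home"),
]
paths = stitch_paths(hits)
for visitor, path in paths:
    print(visitor, " -> ".join(path))
```

Identifying a visitor by IP address alone is a simplification; real logs usually need cookies or other identifiers to tell visitors apart.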

IBM, by the way, shipped its first datamining tool kit in 1995. Today, the company's Intelligent Miner for Data and Intelligent Miner for Text are used by customers with large DB2 databases. IBM has also developed a graphical query language, query by image content (QBIC), which lets users make queries of large image databases based on visual image content--properties such as color percentages, color layout, and textures occurring in the images. It is used with Digital Library to do graphical datamining.
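A query on color percentages can be pictured with a toy Python example. It does not use QBIC's interface, which the article doesn't describe; images are reduced to lists of coarse color labels, and the distance measure (a sum of absolute differences between percentage profiles) is simply one plausible choice.

```python
from collections import Counter

def color_percentages(pixels):
    """Return the fraction of pixels that fall in each color bin."""
    total = len(pixels)
    return {color: n / total for color, n in Counter(pixels).items()}

def histogram_distance(p, q):
    """Sum of absolute differences between two color-percentage profiles."""
    colors = set(p) | set(q)
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in colors)

def query_by_color(images, target, top_n=2):
    """Rank image names by how closely their color mix matches the target."""
    profiles = {name: color_percentages(px) for name, px in images.items()}
    ranked = sorted(
        profiles,
        key=lambda name: (histogram_distance(profiles[name], target), name),
    )
    return ranked[:top_n]

# Hypothetical image database: each image is a list of coarse color labels.
images = {
    "sunset": ["red"] * 6 + ["orange"] * 3 + ["blue"] * 1,
    "ocean": ["blue"] * 8 + ["white"] * 2,
    "forest": ["green"] * 7 + ["brown"] * 3,
}
# "Find images that are roughly three-quarters blue and one-quarter white."
matches = query_by_color(images, {"blue": 0.75, "white": 0.25})
print(matches)
```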

Shortly after Microsoft parted the curtains on its datamining spec, Oracle Corp. announced its purchase of leading datamining vendor Thinking Machines Corp. and its Darwin product family. The Redwood City, Calif.-based company hasn't made any announcements about how Darwin will be integrated into its product line. Although Oracle already has its own text mining product called Oracle ConText, it's likely that the company will weave Darwin into its marketing campaign and Oracle Applications product line. In another significant move toward consolidation, SPSS Inc. acquired Integral Solutions Ltd. (ISL) and its popular Clementine product.

Darwin and Clementine are two of six datamining tool suites that Stamford, Conn.-based Gartner Group, in an August 1999 report on datamining, identified as key players in the generic datamining market. The other four are Angoss' Knowledge Suite, IBM's Intelligent Miner for Data, SAS's Enterprise Miner, and SGI's MineSet.

In the audio mining field, speech vendors such as Dragon Systems and Virage Inc. are working with all the major database vendors--including IBM--to support the technique, which is scheduled to be available later this year. Audio mining might be used to monitor call center traffic, customer service calls, or company voice mail (privacy issues aside) looking for anything from profanity to recurring customer service complaints to suspected industrial espionage.

E-commerce, CRM, and data warehousing will all help propel the datamining market forward. Standards such as extensible markup language (XML), the predictive modeling markup language (PMML), the cross-industry standard process for datamining (CRISP-DM), as well as Microsoft's OLE DB for DM, will help, too. The evolving technology combined with such success stories as Just for Feet and Fingerhut will certainly drive the market into the mainstream.

Karen Watterson is an independent San Diego-based consultant who specializes in database and data warehouse design. She's an editor of industry newsletters and has just completed a book on SQL Server, "10 Projects you can do with Microsoft SQL Server." She can be reached at

Datamining: How it's done
Datamining overlaps with many fields, including statistics, artificial intelligence, data visualization, machine learning, expert systems, and neural networks. One way to demonstrate the breadth of the field is to categorize datamining into six families of techniques (see "Techniques used in datamining").

To get a feeling for what's involved in datamining, imagine that you're a bank and that you want to identify your most profitable customers. In most cases, that information is buried inside reams of transaction data that's probably spread out over multiple divisions (loans, savings, asset management, etc.). Let's assume your bank already has a data warehouse in place.

First, you want to determine whether the data warehouse contains all the data you need--you might want to add external demographic data, for example. Once you're satisfied with the contents of the data warehouse, you identify the data to be extracted and examine it for quality and completeness. You're likely to find at least some data that's incomplete or of poor quality. Then you must decide whether you have the time and money to clean up the offending data; if not, you simply eliminate it from the model.
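The quality-and-completeness check might look something like the following Python sketch. The record layout and the list of required fields are invented for the bank example; a real warehouse would have far more columns and subtler rules (out-of-range values, duplicates, stale records).

```python
# Hypothetical schema: the fields a record must have to be usable in the model.
REQUIRED_FIELDS = ("customer_id", "division", "balance")

def is_complete(record):
    """A record is complete when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)

def audit(records):
    """Split records into usable and rejected sets and report completeness."""
    clean = [r for r in records if is_complete(r)]
    rejected = [r for r in records if not is_complete(r)]
    completeness = len(clean) / len(records) if records else 0.0
    return clean, rejected, completeness

# Hypothetical extract from the data warehouse.
extract = [
    {"customer_id": 1, "division": "loans", "balance": 12000.0},
    {"customer_id": 2, "division": "savings", "balance": None},
    {"customer_id": 3, "division": "savings", "balance": 800.0},
    {"customer_id": 4, "division": "asset management", "balance": 55000.0},
]
clean, rejected, completeness = audit(extract)
print(f"{completeness:.0%} of records are usable; {len(rejected)} set aside")
```

Keeping the rejected records in their own list, rather than discarding them silently, makes it easier to decide later whether cleanup is worth the time and money.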

Next you figure out the best algorithms and methods to use. You buy (or obtain an evaluation copy of) potential tools and use them to develop predictive models. After many "runs" you'll probably uncover some trends and patterns that can be used to forecast which customers' business would be most profitable.
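As a stand-in for what a commercial tool would do, the Python sketch below fits a one-variable least-squares line to past customers and uses it to rank prospects by predicted profit. Real datamining products typically use richer techniques such as decision trees or neural networks; the single predictor (account balance), the figures, and the function names here are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rank_by_predicted_profit(prospects, slope, intercept):
    """Score each prospect with the fitted line and sort best-first."""
    scored = {name: slope * balance + intercept
              for name, balance in prospects.items()}
    return sorted(scored, key=lambda name: (-scored[name], name))

# Hypothetical training data: balance and annual profit, both in $ thousands.
balances = [1.0, 2.0, 3.0, 4.0]
profits = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(balances, profits)

# Hypothetical prospects, keyed by name, with their balances in $ thousands.
prospects = {"A": 5.0, "B": 2.0, "C": 8.0}
ranking = rank_by_predicted_profit(prospects, slope, intercept)
print(ranking)
```

In practice the "many runs" mentioned above would mean trying several model types and predictors and validating each against data the model hasn't seen.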

Then you refine the predictive model and run it to generate a list of profitable customers. Sales or marketing runs its campaign against that list, and, if the system worked, you get a high return rate at reduced marketing costs. --K.W.
