|Who's who in datamining|
There are dozens of datamining vendors, although some industry consolidation has begun. For now, there's no clear market leader, and most of the products are expensive and complex to use. Most were developed for the UNIX workstation market and aimed at mathematicians and statisticians rather than database folks.
Herb Edelstein's market analysis of datamining tools, "Data Mining '99: Technology Report" [available at www.twocrows.com], is 1999's single best source of information about the datamining market. Edelstein provides analyses of the following vendors and their tools:
AbTech Software (ModelQuest MarketMiner)
*Angoss Software (KnowledgeSEEKER, KnowledgeSTUDIO)
Attar Software (XpertRule Miner)
Business Objects (BusinessMiner)
Cognos Software (4Thought, Scenario)
Group 1 (Model 1)
HNC Software Inc. (DataBase Mining Marksman)
Integral Solutions (Clementine, acquired by SPSS in 1998)
IBM (Intelligent Miner)
NeoVista Software (Decision Series)
Salford Systems (CART, MARS)
*SAS Institute (Enterprise Miner)
*Silicon Graphics (MineSet)
*SPSS (Base, AnswerTree, Neural Connection)
Tandem Division of Compaq
Thinking Machines (Darwin, acquired by Oracle in 1999)
Torrent Systems (Orchestrate Analytics)
Unica Technologies (PRW)
Urban Science Applications (GainSmarts)
* These vendors collaborated with Microsoft to create the OLE DB for DM spec. Two more vendors, E.piphany and Datasage, also helped draft the initial spec.
And then there are companies like Fingerhut Companies Inc. (fingerhut.com), the $2 billion firm known for its catalog, direct marketing, and telemarketing ventures, that have spent years honing the process of datamining. The Minnetonka, Minn.-based company's marketing analytics group maintains several hundred generic models that are used to build targeted segmentation models that generate mailing lists for catalogs.
Typically, the datamining team combines four models: a response model (will the customer respond?), a purchase model (how much will the customer buy?), a return model (is the customer likely to return merchandise?), and a payment model (is the customer a credit risk?). The company maintains data (almost 1,400 variables per customer) on more than 30 million customer households in a data warehouse that tops 7 terabytes.
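One way to picture how four such models might be combined is to fold their outputs into a single expected-value figure per household and mail only where that figure is positive. The sketch below is purely illustrative and is not Fingerhut's actual method: the class and function names, the formula, the mailing cost, and the assumption that the four outcomes are independent are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelScores:
    """Outputs of the four models for one household (all values hypothetical)."""
    p_response: float      # response model: probability the customer orders
    expected_order: float  # purchase model: expected order value in dollars
    p_return: float        # return model: probability the merchandise comes back
    p_default: float       # payment model: probability the customer does not pay


def expected_net_revenue(s: ModelScores, mailing_cost: float) -> float:
    """Combine the four scores into one expected-value figure per catalog mailed.

    Assumes the four outcomes are independent, which is a simplification.
    """
    kept_and_paid = (1.0 - s.p_return) * (1.0 - s.p_default)
    return s.p_response * s.expected_order * kept_and_paid - mailing_cost


def select_for_mailing(households: dict[str, ModelScores], mailing_cost: float) -> list[str]:
    """Return household IDs whose expected net revenue is positive, best first."""
    scored = {hid: expected_net_revenue(s, mailing_cost) for hid, s in households.items()}
    keep = [hid for hid, value in scored.items() if value > 0]
    return sorted(keep, key=lambda hid: scored[hid], reverse=True)
```

Calling `select_for_mailing` with a dictionary of household IDs and scores returns the mailing list for one catalog. A real segmentation model would likely weight or calibrate the four scores rather than simply multiplying them.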
The players, new and old
Although datamining isn't new technology, it has only recently emerged from academia, research labs, and several dozen vendors. The availability of data warehouses and cheap storage has certainly contributed to the trend, but today's keen interest in datamining is largely driven by the explosive growth of e-commerce. Sales and marketing departments want to leverage the data gleaned from Web traffic patterns to do one-to-one marketing.
If the prospect of mining customer data to increase revenues, reduce risk, or detect fraud isn't enough to propel datamining into the mainstream, there's always the Microsoft factor. Microsoft Corp. ventured into datamining when the Redmond, Wash., software maker announced work on the OLE DB Extensions for Data Mining specification in May 1999. The project is a joint effort between the Microsoft SQL Server group and Microsoft Research's Data Mining & Exploration group, led by Usama Fayyad, in consultation with a select group of vendors (see "Who's who in datamining"). OLE DB is a specification for a set of data access interfaces designed to enable access to heterogeneous data sources. It's considered the successor to Open Database Connectivity (ODBC) and has already been "extended" for online analytical processing (OLAP) and a variety of vertical markets.
|Techniques used in datamining|
Statistics: Identifies instances where one variable causes or influences others. It's good for spotting trends and confirming hunches.
Induction techniques: Generates hypotheses from patterns found in the data
Neural networks: Sifts through large amounts of data to find unexpected patterns
Visualization techniques: Helps nontechnical people understand the meaning of the data through graphic displays
OLAP: Helps confirm hypotheses using flexible, slice-and-dice techniques
SQL and similar query languages: Answers specific questions. (Purists usually don't consider this true datamining.)
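The difference between answering a specific question and generating a hypothesis can be sketched in a few lines. In the example below, the first function runs a SQL query whose question the analyst already knows, while the second uses a simplified version of the 1R rule learner to discover which single attribute best predicts the outcome. The customer records, column names, and function names are all made up for illustration.

```python
import sqlite3
from collections import Counter, defaultdict

# Hypothetical customer records: (region, owns_home, responded_to_mailing)
ROWS = [
    ("north", "yes", "yes"),
    ("north", "no", "no"),
    ("south", "yes", "yes"),
    ("south", "no", "no"),
    ("south", "yes", "yes"),
    ("north", "yes", "no"),
]
COLUMNS = ("region", "owns_home")


def count_responders_in(region: str) -> int:
    """SQL style: the analyst already knows the question and asks it directly."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (region TEXT, owns_home TEXT, responded TEXT)")
    con.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)
    (n,) = con.execute(
        "SELECT COUNT(*) FROM customers WHERE region = ? AND responded = 'yes'",
        (region,),
    ).fetchone()
    con.close()
    return n


def induce_one_rule(rows, columns):
    """Induction style (simplified 1R): find the attribute that best predicts the outcome.

    For each attribute, predict the majority outcome for each attribute value
    and count how many rows that rule classifies correctly. Returns the best
    attribute, its value-to-prediction mapping, and the number of correct rows.
    """
    best = None
    for i, name in enumerate(columns):
        outcomes = defaultdict(Counter)
        for row in rows:
            outcomes[row[i]][row[-1]] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in outcomes.items()}
        correct = sum(counts.most_common(1)[0][1] for counts in outcomes.values())
        if best is None or correct > best[2]:
            best = (name, rule, correct)
    return best
```

On this toy data the query simply reports a count for the region it was asked about, whereas the induced rule proposes something nobody asked for: that home ownership, not region, is the better predictor of response.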