The Democratization of Datamining

No longer an esoteric exercise practiced solely by statisticians and analysts, datamining is fast becoming an end-user tool. And that move promises to realize long-sought-after financial returns on expensive data-warehousing solutions.

Don't tell Stephen Cole the return on datamining is modest or yet to be proven. As assistant director of database marketing and analysis with $70 billion U.S. investment firm American Century Investments of Kansas City, Cole contends that datamining reaps big rewards.

Cole says that in the first year of using a datamining system to cross-sell financial products to existing customers, "Our models tell us to expect an immediate 12% increase in profits through effective targeting." He adds, "That can go to 30% and even higher."

American Century isn't the only company starting to see huge and immediate financial returns from datamining. Financial services institutions, as well as telecom, insurance, and retail companies, were early users of software tools that let them dig deep into customer data for nuggets of information to help them detect fraud, predict loan delinquencies, and keep customers from taking their business elsewhere. Now, as datamining vendors make their products more accessible to the average business analyst who has no analytical or statistical proficiency, more organizations are making money out of these expensive and complex implementations.

A growing field

AT A GLANCE: American Century Investments

The company: American Century Investments is a $70 billion U.S. investment firm based in Kansas City.

The problem: The company wants to keep existing customers and acquire new ones.

The solution: SPSS Base suite of datamining tools.

The IT infrastructure: Hewlett-Packard V2800 server, HP 300 Pentium Plus workstations, and Oracle 8.x RDBMS.

Until recently, datamining was an esoteric exercise practiced only by statisticians with strong mathematical backgrounds. The lack of specific business applications has also hindered the datamining market, according to analysts.

Today, however, as vendors attempt to simplify their tools and offer industry-specific vertical applications, datamining is a growing field. Meta Group, a consulting firm in Stamford, Conn., predicts revenue from new datamining tools and services will reach $8.4 billion in 2000, up from $3.3 billion in 1996.
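To put the Meta Group forecast in perspective, the two revenue figures imply a compound annual growth rate of roughly 26% over the four years. The short sketch below shows the arithmetic; the function name is illustrative only, and the calculation assumes smooth year-over-year compounding, which the forecast itself does not state.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate that takes start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Meta Group figures quoted in the article, in billions of dollars.
revenue_1996 = 3.3
revenue_2000 = 8.4

rate = implied_annual_growth(revenue_1996, revenue_2000, years=2000 - 1996)
print(f"Implied annual growth: {rate:.1%}")
```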

Lessons learned

Senior executive support and total IT commitment are required for successful datamining projects.
Business issues drive project development.
Datamining often yields specific results rather than general rules.
User-friendly GUIs mask the complexity of systems.
Complex results are not always easily understood or interpreted.
Successful use requires statistical, database analyst, and business analyst skillsets.
Data quality is critical to results.

But, experts caution, don't be fooled by glitzy front ends and easy-to-decipher outputs. Dataminers still require knowledge about statistics and the mathematical algorithms, neural networks, decision trees, and data visualization techniques that enable the tools to detect relationships and patterns in the data they grab from databases. Most important, people have to understand why they're using them. "You need three skills: good statistical knowledge, relational database management skills, and business knowledge," says Jan Mrazek, chief datamining specialist with Bank of Montreal in Toronto, Ontario. "Whatever you deliver has to be actionable [to meet the goals and objectives of the business]."

Datamining software can also be costly. Datamining tools like Cognos's Scenario and Business Objects' BusinessMiner, which supplement those vendors' on-line analytical processing (OLAP) tools, range from $495 to $1,295 per user. But enterprise-wide software that runs on parallel servers and mainframes can cost from several thousand dollars into the millions for customized datamining systems that also require outside consultants.

Emily Kay writes about technology as a principal with Choice Communications, an editorial consulting firm in Chelmsford, Mass.

Datamining product profiles

IBM
Intelligent Miner, DecisionEdge, Discovery Series
Armonk, N.Y.

Pricing: Intelligent Miner pricing is based on computing platform. For example, $25,000 for an AS/400 includes a complete set of functions. DecisionEdge starts at about $1.5 million, but varies according to customization level. Discovery Series ranges from $50,000 to $150,000, plus the cost of Intelligent Miner, which is required to run it.

Unique features: Intelligent Miner offers a complete set of datamining algorithms and data-preparation capabilities, as well as a high degree of scalability and parallelism. Customizable datamining apps are available for relationship marketing and fraud detection. Discovery Series is a prepackaged app for telco, banking, and insurance companies; DecisionEdge is a complete database marketing application for insurance, telco, and utilities companies.

Operating environments: IBM AIX, AS/400, O, and MVS/ESA servers; and Windows and NT clients.

Databases supported: DB2 tables or flat files; through Intelligent Miner facilities, users can extract data from Oracle and Sybase.

Algorithms supported: Neural networks, decision trees, predictive modeling, association discovery, logistic and linear regression, sequential pattern discovery, and database segmentation.

Strengths: According to IBM, Intelligent Miner supports several different algorithms in one package and is scalable across large databases.

Weaknesses: Two-dimensional presentation functionality is improved over the last release, but Intelligent Miner still lacks 3-D visualization capabilities. It also requires users to devise their own methodologies for combining different variables. The software lacks good documentation, a particular problem because few consultants are trained on Intelligent Miner. When error messages pop up, for example, users have no way to know what the messages refer to, says Jan Mrazek, Bank of Montreal's chief datamining specialist. IBM notes that servers are limited to IBM hardware, but says support for Windows NT and Solaris is planned.

What users say: Bank of Montreal uses Intelligent Miner to analyze data on the bank's existing credit card, mortgage, and loan customers to depict behavior and to determine bankruptcy and delinquency risks. It also uses the product to uncover fraud and to predict which customers are most likely to leave the bank. Intelligent Miner is "a robust tool with great potential, but you have to learn all the tricks and there isn't much help, even in the consulting community," says Mrazek.

NeoVista Software
Decision Series
Cupertino, Calif.

Pricing: $50,000 to $250,000 for Decision Series.

Unique features: Decision Series is a high-end, enterprise-level set of tools built using object-oriented technology and designed to work in parallel, scalable environments. Retail Decision Suite (RDS) Profile and RDS Assort are retail industry applications.

Operating environments: HP/UX, Sun Solaris, Digital UNIX servers; and Windows NT and 95 clients.

Databases supported: ODBC-compliant databases and native connections to Oracle and Informix.

Algorithms supported: Neural networks, decision networks, decision trees, decision clusters, and association rules.

Strengths: Highly scalable for parallel environments. Strong services associated with the software.

Weaknesses: Runs only on UNIX servers. The company is taking steps to make its software easier to use, but successful use requires a statistician with quantitative analysis experience. "We unfortunately believe that datamining is a nontrivial subject," requiring specialized expertise, says John Harte, NeoVista chief executive.

What users say: NeoVista's services were important to Safeco, a $6 billion insurer in Seattle. "They're willing to do whatever it takes to make our project successful," says Doug Gillette, senior business systems analyst with Safeco.

SAS Institute
Enterprise Miner
Cary, N.C.

Pricing: From $80,000 to $160,000, based on server platform. The price includes five clients.

Unique features: SAS offers an end-to-end data warehouse, datamining, and report-writing solution for nonstatisticians.

Operating environments: Windows NT, Sun Solaris, IBM AIX, and HP/UX servers; and Windows NT and 95 clients.

Databases supported: DB2, Excel, SAS Data Store, Informix, Oracle, Sybase, and others.

Algorithms supported: Decision trees, neural networks, classification and regression trees, and market basket analysis.

Strengths: Supports several major algorithms so users don't have to mix and match software from different vendors. Strong technical support.

Weaknesses: The beta version of the latest release of Enterprise Miner has some deficiencies, says Jianmin Liu, Bank of America's vice president of mortgage credit risk management. For example, it could not print a GUI-based graph comparing results from several different algorithms, including logistic regression, decision tree, and neural network.

What users say: As a beta test site for Enterprise Miner, Bank of America's mortgage division uses the software to determine and predict customers' mortgage payment patterns so it can forecast foreclosure, bankruptcy, and nondelinquency probabilities, and establish appropriate loan loss reserves. The variety of algorithms is helpful because the type required depends on the data set, data quality, and business need. "We use SAS to design the database and do the statistical model, and we can use all these different [algorithm] alternatives on the same platform so it reduces the hassle of data conversion," says Liu. "They put different alternatives into one package so I don't have to."

Silicon Graphics
MineSet
Mountain View, Calif.

Pricing: Starts at $23,000 for one user.

Unique features: Decision tables let users drill up and down into data for quick results.

Operating environments: SGI Irix 6.0 and higher.

Databases supported: Native support for Informix, Oracle, and Sybase RDBMSs.

Algorithms supported: Association rules, decision trees, evidence, option trees, regression, and clustering.

Strengths: Offers a wide variety of visualization techniques, including map and scatter plots, as well as the ability to create customized visuals. Takes advantage of SGI's scalable hardware and offers a user-friendly interface "designed for a non-statistician," says Dan Stevens, a scientist with Procter & Gamble, a 106,000-employee consumer products and healthcare manufacturer in Mason, Ohio.

Weaknesses: Runs only on SGI hardware. Other packages offer more robust statistical techniques. The software also needs an industry-standard API, which the company is developing, to ease development of vertical applications. It is built to present three-dimensional scatter plots; in a simple two-dimensional scatter plot where the x-axis starts in the bottom left-hand corner, the numbers show up backwards or not at all.

What users say: The software helps his company "understand the data better at a faster rate than using traditional methods," says Afshin Goodarzi, managing director of Risk Monitors, a seven-person data modeling subsidiary of General Motors Acceptance Corp. in White Plains, N.Y. MineSet lets Goodarzi communicate visually with customers. "We use a two-pronged approach," says Goodarzi, whose unit models prepayment risk for the mortgage industry. "We're using it as our first line of data analysis and to communicate to customers why we came up with the results." Still, the software could use newer visualization techniques. "People out there are ready for more complicated plotting and visualization," Goodarzi notes. "I'd like to see more of that in MineSet."

SPSS
SPSS Base, AnswerTree, Neural Connection
800-525-4980; 312-329-2400

Pricing: SPSS Base is $795 per user. AnswerTree is $995 per user. Neural Connection is $995 per user. Add-on modules such as advanced statistics and trends are $495 each per user.

Unique features: Companies can purchase datamining algorithms and modules as needed.

Operating environments: Windows 95 and NT, UNIX, and OpenVMS.

Databases supported: ODBC-compliant databases.

Algorithms supported: Decision trees, neural networks, answer trees, clustering, discriminant analysis, and logistic regression.

Strengths: Offers straightforward Windows-based interface to make it easy for marketing professionals to analyze data without having to build models or know about statistical analysis. The online help system assists users in determining which model to build and in interpreting the results.

Weaknesses: Desktop-only product, so scalability is limited. The company expects to provide a client/server product next year. SPSS could improve on some of its data management techniques, like the ability to manipulate and join files in complex merges and joins. Though easier to use than other software, SPSS still requires external consulting help.

What users say: SPSS's GUI-driven environment lets American Century Investments choose which data to analyze in the development phase, get it into the platform, and model it effectively "without writing a lot of code," says Stephen Cole, American Century's assistant director of database marketing and analysis. "It writes its own SQL query language in the background and users don't have to do that. It lets analysts be analysts and not programmers." Cole says companies are more apt to use SAS for complex merges and joins. "SPSS knows it doesn't have the capability that SAS has, but it's correcting that as we speak," he says.

Thinking Machines
Darwin, LoyaltyStream
Burlington, Mass.

Pricing: Starts at $50,000 for a server component, $995 for each Windows client, and $20,000 for each additional CPU. A typical four-CPU system is priced at $125,000. LoyaltyStream is a custom-built and custom-priced application.
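The profile does not break down the $125,000 figure, but the listed unit prices can be combined in a rough sketch. The function below is hypothetical and rests on two assumptions that are not in the article: that the $50,000 server component covers the first CPU, and that the remainder of a typical configuration is spent on Windows clients.

```python
SERVER_BASE = 50_000     # server component (assumed here to include the first CPU)
ADDITIONAL_CPU = 20_000  # each additional CPU
WINDOWS_CLIENT = 995     # each Windows client


def darwin_list_price(cpus: int, clients: int) -> int:
    """Estimate a Darwin list price from the unit prices quoted in the profile."""
    if cpus < 1 or clients < 0:
        raise ValueError("need at least one CPU and a non-negative client count")
    return SERVER_BASE + (cpus - 1) * ADDITIONAL_CPU + clients * WINDOWS_CLIENT


# Four CPUs with no clients, then with a hypothetical 15 Windows clients.
server_only = darwin_list_price(cpus=4, clients=0)
with_clients = darwin_list_price(cpus=4, clients=15)
print(server_only, with_clients)
```

Under those assumptions, a four-CPU server alone comes to $110,000, and about 15 clients would bring the total close to the quoted $125,000.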

Unique features: A high-end tool for large databases, Darwin uses a variety of datamining algorithms, operates on parallel-processing computers, and is targeted largely at telecom, financial, insurance, and database marketing companies. LoyaltyStream is an extension of Darwin, including the integration of business intelligence into real-time marketing and customer-interaction apps.

Operating environments: Sun Solaris, HP/UX, IBM AIX servers; and Windows 95 and NT clients.

Databases supported: Ingres, Informix, Oracle, and other ODBC-compliant relational databases.

Algorithms supported: Neural networks, decision trees, and memory-based (k-nearest-neighbor) reasoning.

Strengths: A highly scalable system that can handle enormous databases.

Weaknesses: Thinking Machines does not yet offer a PC-based solution, but plans to by the end of this year. A business analyst with a statistical background is required to use the system.

What users say: Ease of use should not be the ultimate goal of datamining tools, says Peter Milne, ACSys datamining program coordinator with CSIRO Sciences, an Australian research organization funded by the Australian government. Milne believes Darwin should retain its sophistication and complexity. "I don't believe it's a good idea to try and put 'sharp tools in the hands of children,'" he says.
