The new face of data warehousing

As more and more companies begin data warehouse projects, IT departments are looking for product innovations that will make implementation better, faster, cheaper.
(Page 1 of 3)

In this article:
Data warehouse software use will grow
What the analysts say...
Evolving into knowledge management
The origins of Migration Architect
What would you rather move your family across country in, a beat up old rental trailer or a shiny new Peterbilt 18-wheeler? The obvious answer is the 18-wheeler, and that's exactly what the Automobile Club of Southern California opted for when it chose to implement a new datamart.

The auto club decided in late 1997 to liberate the data buried in its two VSAM (Virtual Storage Access Method) files--and more than 200 transaction types--that contained its cashiering system records. And with records dating back to 1900, 40 million AAA members throughout the United States and Canada today, and total annual revenues of about $1.2 billion, an 18-wheeler was exactly what the automobile club needed for easier access to its data.

"We found Social Security numbers and alien ID numbers in the 'fishing' field," recalls Ralph Nordstrom, data warehouse architect for the Automobile Club of Southern California (ACSC is the largest U.S. affiliate of AAA). "The club used to sell fishing licenses. And, of course, it has always sold travelers checks." So when the federal government required notification of transactions worth $10,000 or more, the club decided to appropriate the then-unused fishing license field to store the required Social Security number or alien ID information.


As data warehouses move from being storage facilities to monitoring the business, the market for data-warehouse software and tools will grow into the next century, according to GartnerGroup projections.
Trying to make sense of fields that have taken on different meanings over the years is just one of the myriad data identification, reconciliation, and transformation problems that plague the early phase--the ETL (extract, transform, and load) tasks--of any data warehouse project. In response to these issues, today's data warehouse implementation teams are looking for tools that are fast, industry-specific, function-specific, or a combination thereof.
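The "fishing field" problem is exactly what column profiling is designed to catch. As a rough illustration (not Migration Architect's actual algorithm, and with invented sample values), a profiler can reduce every value in a column to a shape pattern; a column that yields several distinct shapes is a strong hint that the field has carried more than one meaning over the years:

```python
import re
from collections import Counter

def value_pattern(value: str) -> str:
    """Reduce a raw value to a shape: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    pattern = re.sub(r"[A-Za-z]", "A", pattern)
    return pattern

def profile_column(values):
    """Tally the distinct value shapes found in one column of a legacy file."""
    return Counter(value_pattern(v) for v in values if v)

# A made-up "fishing license" column whose use has drifted over the years:
fishing_field = [
    "FL-1042",        # an actual license number
    "123-45-6789",    # a Social Security number
    "A12345678",      # an alien ID
    "FL-2210",
]
print(profile_column(fishing_field))
# Three distinct shapes in one column: a red flag for overloaded meaning.
```

A human still has to interpret the shapes, but the profiler turns a needle-in-a-haystack hunt into a short list of suspects.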

ACSC's datamart works with sales data (mainframe point of sale [POS] combined with general ledger and cost systems) and is used to generate sales reports for local ACSC offices. Prior to implementation of the datamart, reports were put together manually and often arrived late. ACSC's business goal is to get sales reports to branches fast.

Nordstrom, who is responsible for back-end data movement issues, remembers the painful 10 months of ACSC's first datamart project spent going back and forth with Passport, an ETL tool from Carleton Corp. of Minnetonka, Minn., just trying to move the legacy data into a datamart. ACSC spent almost a year trying to set up extraction rules in Passport, but the source data was too "dirty" and convoluted. At one point, Nordstrom's team had created a series of complex lookups that traversed four or five large tables, all in an attempt to "fix" the data. Finally the initial Passport project was killed--done in by the combination of too many data sources (customer, cashiering/POS, employee, organization, and membership files), the evolutionary use of fields (the fishing license story), and other typical data quality problems--and ACSC began looking for a way to tame the data warehouse design and data quality beasts.

Don't call it artificial intelligence

What the analysts say...

The Standish Group, a Dennis, Mass.-based research firm, says that 15,000 data warehouse projects with budgets greater than $3 million will begin in 1999 at a cost of $95 billion. Of these projects, 88% will either overrun their budgets or fail.

Framingham, Mass.-based market research firm International Data Corp. expects firms will be spending $24 billion on the total data warehousing market by 2001.

Nordstrom credits colleague Francine Burick, the auto club's data warehouse coordinator in charge of the end-user information access side of things, with discovering Evoke Software Corp.'s Migration Architect at a data warehousing conference. Typical of the newer generation of data warehousing tools, Migration Architect does the upfront profiling and mapping against data design rules that Carleton's Passport can't do. It adds value through the parsing it performs to "guess" dependencies and spot similar fields.

Migration Architect, from Evoke (formerly DBStar), implements a process whereby the content and structure of legacy data sources are examined and understood in detail and mapping specs are produced (see textbox, "The origins of Migration Architect"). Migration Architect consists of six sequential steps: three for data profiling and three for data mapping. The resulting transformation maps can be used in conjunction with "traditional" ETL tools like Passport--which was ACSC's approach, since it already had Passport in house and was familiar with it.

So instead of a group of people getting together and trying to figure out the content and structure of legacy data by looking at COBOL copybooks, outdated documentation, comments in code, and the sample data itself, Migration Architect profiles data in three dimensions: down columns, across rows, and across tables. Redundancies (duplicate copies of addresses, for example) usually surface in the across-tables analysis. Migration Architect then uses the output of the three profiling steps to build normalized, third normal form (3NF) data models. A widely accepted industry standard for "good enough" relational design, 3NF avoids the update and deletion anomalies associated with poor design.
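The across-rows dimension of profiling comes down to testing functional dependencies: does one column's value reliably determine another's? A minimal sketch of that check (the record layout and values here are invented for illustration, not ACSC's actual cashiering data) looks like this:

```python
def holds(rows, lhs, rhs):
    """Test the functional dependency lhs -> rhs on a sample of rows:
    every distinct lhs value must map to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False          # same lhs value, two different rhs values
        seen.setdefault(key, row[rhs])
    return True

# Hypothetical flattened cashiering records:
rows = [
    {"member_id": "M1", "branch": "Costa Mesa", "branch_mgr": "Lee",  "amount": 40},
    {"member_id": "M2", "branch": "Costa Mesa", "branch_mgr": "Lee",  "amount": 55},
    {"member_id": "M3", "branch": "Glendale",   "branch_mgr": "Cruz", "amount": 40},
]

print(holds(rows, "branch", "branch_mgr"))   # branch determines its manager
print(holds(rows, "amount", "branch"))       # amount does not determine branch
```

A discovered dependency like branch -> branch_mgr is precisely the kind of evidence that drives a 3NF design: branch attributes belong in their own table, stored once, rather than repeated on every transaction row.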


