Download the authoritative guide: Cloud Computing 2019: Using the Cloud for Competitive Advantage
|In this article:|
|AT A GLANCE: Monsanto
|ReliaStar: Scores of data sources|
|Comcast: Accessing financial data|
|The bottom line?|
Make no bones about it, data warehouses and datamarts are going mainstream. They are being built to serve increasing numbers of sophisticated, demanding end users who typically know how to drill down into details and perform roll-up summaries at varying levels of granularity. This new generation of data warehouse users are comfortable dealing with multidimensional data cubes and even performing--imagine this--cross tabs! They know enough about push and pull to expect up-to-date data via the Web.
Improved data warehousing tools allow Bipin Junnarkar, a Monsanto senior director, to focus "more on the analytical and less on the data-gathering."
Photo by Greg Kiger
But it's not just end users who are more sophisticated. Organizations are working smarter, too. Heeding past lessons of doomed, multiyear attempts to create The Ultimate Enterprise Data Warehouse--a daunting task often intertwined with reengineering and information architecture business-process issues--most organizations are opting for a phased approach to building their data warehouses.
One such organization is Monsanto Co., headquartered in St. Louis. Last September, Monsanto spun off its $4 billion chemical business into a new company called Solutia Inc., leaving the $6 billion Monsanto to concentrate its efforts on life sciences. Robert B. Shapiro, chairman and chief executive officer of Monsanto, has a vision for the company based on the belief that the evolution of science and technology is going to drive the marketplace to a much closer integration of agriculture, food, nutrition, and health.
Consequently, it has been a busy year for Bipin Junnarkar, Monsanto's senior director of Knowledge Management Architecture (KMA). Junnarkar is responsible not only for making sure the data flows among Monsanto's five business sectors (Agricultural, Nutrition & Consumer Products, Pharmaceutical, Health & Wellness, and Sustainable Development), but also for setting up a knowledge management architecture that will help institutionalize the discovery and knowledge-sharing processes.
Junnarkar sees the dynamics of the new Monsanto life sciences enterprise as having a lot in common with both software and biotech firms. The upside potential is enormous, but the marketplace generally only rewards the nimblest players. Hence the need for innovative uses of information technology and knowledge management in general.
Not surprisingly, data warehouses play a fundamental role in Junnarkar's vision of Monsanto's knowledge management architecture. In fact, during the past two and a half years, Monsanto's KMA has built five different enterprisewide data warehouses and built a couple business unit-specific data warehouses. The first one, which has been operational since December 1995, was a financial warehouse. It's still relatively small (about 10 gigabtyes), and it's only updated once a month after the monthly closings. Based on the success of that first data warehouse, which provided executives, financial managers, and others with almost instant access to income statements, balance sheets, and budgets, Monsanto has built another data warehouse for procurement, one stocked with logistics and inventory information, and two business unit-specific data warehouses containing regional sales and marketing data.
Junnarkar recalls April 1995 when he received the mandate to create that first data warehouse. He and his team looked at three OLAP (on-line analytical processing) vendors: MicroStrategy, the Stanford Group (the company that created MetaCube, which was later acquired by Informix), and IRI (Oracle subsequently acquired IRI and renamed its IRI Express product Oracle Express). Each vendor was given a data model and about a month to come up with a solution.
The company chose MicroStrategy's solution. Today, Monsanto uses MicroStrategy's DSS Server, DSS Web, and DSS Agent to create, distribute, and provide access to the data warehouse. Hundreds of executives, financial managers and support staff access information ranging from income statements to the status of a customer's order. Some end users use DSS Web via their browser; others use the original "fat client" DSS Agent. All of them benefit, however, from being able to focus "more on the analytical and less on the data-gathering" aspects of their jobs, says Junnarkar.
Monsanto's Data Architecture team (which includes Monsanto CIO Dr. Patrick Fortune) aren't resting on their laurels, despite the fact that CFO Robert Hoffman now strongly values data warehousing. Monsanto, which is in the process of converting to SAP, plans to warehouse some of the SAP data and gradually implement more supply-chain management into its IT systems. Monsanto currently warehouses a variety of data streams from partners and external data sources, and in the future, it will share some of its warehoused data with partners. Junnarkar is also aware of the challenge of integrating unstructured data to Monsanto's warehouses, and that's one reason he's happy with MicroStrategy--its flexibility. The other plus: It's scalable.
To provide a broader view of how leading-edge IT organizations are scaling data warehouses, Datamation presents the following case studies.
Ochsner Health Plan: The phased approach
Award-winning Ochsner Health Plan, headquartered in New Orleans, faced challenges that were similar to Monsanto. With multiple data sources that too often resulted in reports that simply "didn't foot," according to Estal Fain, vice president and CIO of the 175,000 member HMO, Ochsner decided last June to take a phased approach to building its data warehouse. By November, at the end of Phase I, Ochsner offered 35 reports that had previously been only available directly from Amisys, a popular managed healthcare package running on Ochsner's HP 3000.
The data warehouse itself is an Oracle8 database running on a Sequent NUMA-Q 2000 configured with a single quad of four processors and a 200G storage system. Fain says Ochsner considered other hardware (including Sun Microsystems' SMP-based UE 10000) and other software (including database and warehousing technologies from Sybase and Informix), but is pleased with his choices. Part of Oracle's appeal was the availability of Oracle's Express OLAP server. Currently being deployed as part of Phase II, scheduled for completion this year, Oracle Express will give end users more flexibility to analyze metrics of interest to HMOs such as patterns of practice (given a medical episode, what are the standard sets of services associated with it?), episodes of care, and counts and amounts.
Ochsner, like virtually every other company that has built a data warehouse or datamart, found that some of its most fundamental concepts (member months, for example) were defined in different ways by different systems. "We had seven different ways to calculate member months," recalls Fain ruefully. Dirty data, along with staffing issues (especially cross-training current staff for the new technologies) have been Fain's biggest challenges. In fact, as a transitional strategy, Fain hired contract consultants to install and set up the basic Oracle warehouse.
Overall, though, Ochsner's data warehouse, already being used by dozens of employees in finance, medical management, and sales, has been hugely successful. Fain attributes part of Ochsner's success to its incremental approach to building the data warehouse. Phases III and IV will integrate more of Ochsner's financials and sales and marketing functions into the data warehouse. Beyond that, Fain envisions rolling provider relations and call-center data into the data warehouse. Ochsner's incremental plans for growth seem to be the formula for a winning strategy.
ReliaStar: Scores of data sources
ReliaStar's data warehouses represent another impressive success story, especially when you consider the fact that its application development and data utility teams have to deal with between 70 and 120 discrete data sources for each of the data warehouses they build.
ReliaStar Financial Corp., a Minneapolis-based financial services company with more than $20 billion in assets under management, is a family of companies that offer a variety of products: individual life insurance and annuities, employee benefits, retirement plans, life and health reinsurance, mutual funds, residential mortgages, and personal finance education. As a growing company that has acquired many smaller firms--five in the last two years alone--it's no surprise that it has scores of data sources.
That, along with substantial in-house familiarity with SAS--especially its ability to work with just about any data source under the sun--led Data Utility Strategist Fen Hiew and his team to choose SAS Warehouse Administrator as their primary tool for building their data warehouses. SAS Warehouse Administrator edged out a half dozen other contenders including Carleton Passport, Prism, ETI, and Informatica. Hiew liked the fact that the SAS tool employs a virtual warehouse model, but admits that he has had to create some "real" cubes and staged marts to deliver performance users demand.
ReliaStar started building its first operational data store (ODS) that houses their client and contract information in late 1994, and ultimately delivered Phase I of the data store in conjunction with its Customer Service Workbench application in mid-1996. Rather than do everything in house, ReliaStar sought external expertise for the data extraction and cleansing procedure from Cambridge Technology Partners. Today, Phase II of the ODS collects and consolidates data on individual clients and their respective contract information from more than 120 data sources and is used by more than 100 analysts and customer service representatives at ReliaStar. In the last year, Hiew and his group have also responded to requests from senior management and business areas such as marketing, sales management, internal audit, and field technology to build subject area datamarts by connecting them from the client and contract ODS and other subject area data stores. These datamarts are housed in a Sybase database, and several dozen analysts use Cognos business intelligence tools or Brio Query tools to slice and dice the data.
ReliaStar is a complex organization, so one of Hiew's main roles now is to serve as a resource for other divisions as they begin to build their own data warehouses and marts. Hiew is beginning to publish repository data models and templates, along with descriptions of successful data development processes. He plans to step up the pace of his team's own internal marketing and education and continue to publish ReliaStar data warehousing "best practices" for use across the enterprise.
On the datamining front, Joel Portice, director of the firm's Integrity Plus division, opted for IBM's fraud management package as the basis for a data warehouse for detecting false insurance claims. By going the pre-packaged route, Portice estimates he saved about six months of development time and $500,000 cash.
Comcast: Accessing financial data
It doesn't take a mental giant to realize how brutally competitive the cable and telecommunications industries are. But if you're sitting in Jim Scott's seat, you'd probably be more aware of the complexity of the information net than anything else. Scott is applications manager at Comcast, a diversified global leader in entertainment services and telecommunications.
Headquartered in Philadelphia, Comcast is involved not only in wired (including cable) and wireless telecommunciations, but also content through principal ownership of the QVC television shopping channel and C3 (Comcast Content & Communications) and a controlling interest in the E! Entertainment channel
No, Comcast didn't need a tool to give its 10 million customers access to its data warehouse, just several hundred staffers spread across 70 sites throughout North America. Of particular interest was access to financial data, including open purchase orders and capital information--data that was buried in a variety of data stores including Oracle Financials. A mainframe-based reporting system routinely delivered some 70 different reports to the remote sites using a bursting mechanism so that each office only received relevant data, and Comcast could have used Oracle's SQLPlus programming to tackle access to the Oracle Financials data, but a warehousing solution seemed smarter and easier.
With the help of outside consultants who reviewed the current and potential reporting systems, Comcast finally settled on Brio Software's suite of tools in lieu of competitors Business Objects and Cognos. Comcast wanted tools that offered IT low administration time and end users both query and analysis capabilities. According to Scott, users find Brio extremely easy to use. A few have requested range-based grouping, but he is confident that Brio will deliver that in a forthcoming version or point release. "Brio has been very responsive," he observes.
Scott likes the fact that Brio's Web-based distribution system is basically zero maintenance and distributes data cubes to end users, employing good compression algorithms and on-demand reconnection. Scott and his staff of four are already delivering capital expenditure, general ledger, purchasing, and invoice information to more than a hundred end users and are in the process of scaling up their data warehouse by adding work-order history, channel information, and operational statistics. Comcast's change-management team, which meets every other week, helps prioritize demands for additional data access.
The bottom line?
The old vision of building a single, monolithic, dump-and-run style data warehouse just doesn't cut it in today's environment. Users are savvier and more demanding than ever. Mergers, acquisitions, and downsizing both dictate the need for a flexible architecture that can both be scaled and quickly re-architected to meet changing (hint: growing) demands.
Smart organizations are consolidating not only their data, but also their expertise in the form of internally published best practices. They're using contract help or consultants to get them off to a running start. Almost all are deploying data over the Web and letting end users (which typically include everyone from executives and MBAs to staffers in sales and marketing and customer service) choose from a variety of tools--avoiding a one-tool-fits-all mentality. //
Karen Watterson is an independent San Diego-based writer and consultant specializing in database design and data warehousing issues. She has written several books, including Visual Basic Database Programming and Client/Server Technology for Managers.
Brio Enterprise Server 5.5
Palo Alto, Calif.
Category: OLAP server, clients
Pricing: Brio Enterprise Server $19,950
Options: BrioQuery Designer $2995/seat; BrioQuery Explorer $795/seat; BrioQuery Navigator $795/seat. Plug-ins: Brio.Insight $395/seat; Brio.Quickview $49/seat.
Server operating environments:Windows NT, Solaris, HP-UX, AIX.
BrioQuery operating environments: Mac, Windows 3.1, 95, NT
Good points:"Zero admin" download of plug-ins. French, Italian, and Kanji versions available. Supports dynamic on-demand and broadcast (off-line) reports.
Bad points:No thin client support. No support for OLAP calculation engine or hypercube on middle tier. Weak administration tools.
Cognos PowerPlay 5.x
Category:OLAP server, client
Server pricing:PowerPlay 5.0 $4,995 (NT), $14,995 (UNIX).
Server operating environments: NT, Solaris, HP-UX, AIX
Client operating environments: Windows 3.1, 95, NT
Good points: Large installed base of PowerPlay desktop OLAP (and Impromptu client query and reporting) tools. PowerPlay can be used in reporter or explorer modes.
Bad points:Slow to market with new features.
DSS Server, DSS Web 5.0, DSS Agent 5.1
Category:ROLAP server, clients
Server pricing: DSS Server 5.0 $37,500 for 20 named users; DSS Web 5.0 (requires DSS Server 5.0) $17,500 for five users; DSS Architect 5.0 $3,995 (one copy required for DSS Server or DSS Web); DSS Administrator (optional) $35,000.
Client pricing: DSS Agent 5.1 (original "fat" client) $995/named user; DSS Objects 5.0 (OLE API) $595/named user.
Server operating environments:DSS Server and DSS Web: Windows NT 4.0; DSS Agent: Windows 3.1, 95, NT, OS/2.
Good points: Local- and server-caching options. DSS Agent supports auto-export to Access. Good report wizards both for DSS Web clients and DSS Agent clients. CEO Michael Saylor, often cited as "visionary," is an excellent promoter.
Bad points:DSS Web requires ActiveX browser for best performance (HTML display of grids and graphs is less functional), and it is an expensive, complex set of tools.
Oracle Express Server 6.x
Redwood Shores, Calif.
Category: OLAP server, clients
Pricing:$3995/concurrent user (includes Express Web Agent)
Options: Personal Express $695; Express Analyzer $595; Express Objects $3995/developer; Express Financial Analyzer $3995/concurrent user; Express Sales Analyzer $4995/concurrent user.
Express server operating environments: Windows NT, AIX, Solaris, HP-UX
Good points: High-end OLAP server that can be used to develop customized decision-support applications. Convenient off-the-shelf financial and sales analyzer options for end users.
Bad points:Complex and proprietary (requires programming with Express language). Only slowly becoming better intregrated with Oracle family of tools following purchase from IRI Software in 1995.
Category: Data warehouse design, extraction, transformation, and maintenance
Operating environments: UNIX workstations and servers (Solaris, HP-UX, AIX, Digital UNIX), and PCs (Windows95, OS/2, Windows NT).
Good points: Can be used to access and integrate scores (50+) of different file types, which is probably more legacy file support than is offered by any other product. The product is tightly integrated with SAS end-user tools, including the new SAS Miner. The Scheduler API lets programmers write to operating system schedulers (e.g. CRON, NT Agent), to schedule jobs for automatic execution.
Bad points: Creates a "virtual" warehouse, not a physical data store that contains data. Job dependency related to scheduling; the data warehouse refreshes are slow.