Monday, May 20, 2024

Virtues of a virtual data warehouse

Traditional data warehouses vs. virtual data warehouses

A traditional data warehouse:

provides a central repository for information

requires the development of metadata definitions

provides data cleansing facilities

A virtual data warehouse:

uses middleware as data hubs, allowing access to the corporate data stored in heterogeneous data sources

uses middleware to build direct connections among disparate applications

relies on the creation of an independent metadata definition of the corporate data

accesses only data in its raw form unless data access middleware is used to populate OLAP databases directly

One of the key factors governing a company’s business success moving into the 21st century is its ability to continue disseminating useful information to decision makers throughout the organization. Today, most large companies already have in place e-mail and collaborative computing systems, such as IBM Corp.’s Lotus Notes, which distribute all kinds of messages and documents throughout the enterprise. In fact, this technology is so widely used that many of us are overwhelmed with information, most of which does little to help us make decisions.

To achieve success in this new millennium, companies must begin to provide true business intelligence (BI). According to market-research firm the Gartner Group Inc. in Stamford, Conn., BI comprises systems that enable a business professional to access the information that describes the enterprise, to analyze it, to gain insight into its workings, and to take action based on its findings, via integration with other office functions. BI systems also present this information in an easy-to-digest form that facilitates effective decision making.

Recently, there has been movement toward virtual data warehouses, which has implications for both information dissemination and improved decision making. Virtual data warehouses allow users to distill the most important pieces of data from disparate legacy applications, without the time, expense, and risk to data required by traditional data warehousing.

Now, as more companies are using Web architectures as the backbone for their enterprise networks, they are moving back toward developing their own information-presentation applications. Many organizations are building or at least considering enterprise portals. An enterprise portal is a single information gateway, typically browser-based, that can be used to navigate and examine both internal and external data, via the Web, and that can have information pushed to it on a regular basis.

While the applications that fit into the portal model are still being explored, it is very likely that a BI application will work very well with this technology. For example, a single enterprise portal could be used to push key business performance information to a firm’s management, while at the same time pushing the latest share price of key competitors, along with any related published information from Web sites around the world.

Most of the major BI vendors have enabled their products to be deployed and used over a Web-based architecture. Indeed, they provide a quick solution for deploying Web-based BI applications that are easy to implement. The Web-based development environments and the growing strength of eXtensible Markup Language (XML), a standard Web-based format for describing and exchanging data, make the development and deployment of custom-built applications feasible for most organizations. Many companies are finding that a continuous program of rapidly deploying specific business information monitoring applications can provide key data to allow the company to respond to competitive pressures or market needs.

But first you have to get the data

Whether you are focusing on custom-built, Web-based business information systems or packaged online analytic processing (OLAP) tools, the fundamental requirement is to have access to the corporate data to populate these applications. A data warehouse provides an effective method for extending legacy assets while leveraging today’s information distribution technology.

Most enterprises employ a wide array of heterogeneous systems, frequently using various databases and file systems. More often than not, in order to provide meaningful business intelligence, data must be mined from more than one of these application data sources. Also, the core business transaction information and the business history are often maintained and stored by legacy applications using legacy data structures.

Because data residing in multiple sources must often be combined for effective decision making, many companies have implemented data warehouses. This essentially creates a hub-and-spoke model whereby each enterprise application sends information to the warehouse, where it can then be accessed throughout the enterprise.

Data warehousing provides many benefits to the process of disseminating business intelligence into the enterprise. First, by acting as a single repository for information from many applications, a data warehouse removes the burden of disparate data access from the BI application. Data is presented to the applications in an easy-to-access, typically relational database format. Also, the act of populating the data warehouse provides the opportunity to cleanse the data. In other words, before the data is put in the warehouse, it can be checked and altered to better suit the intended use. This data cleansing function is not an inherent feature of a data warehouse, but a facility that is provided by most tools used to populate data warehouses.
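As a rough illustration of cleansing during the load step, the sketch below normalizes rows before inserting them into a relational store. The feed, the field names, and the cleansing rules are all hypothetical, and an in-memory SQLite database stands in for the warehouse; real loading tools offer far richer facilities.

```python
import sqlite3

# Hypothetical rows as they might arrive from two legacy applications.
RAW_ORDERS = [
    {"customer": "  acme corp ", "state": "ga", "amount": "1,250.00"},
    {"customer": "Acme Corp", "state": "GA", "amount": "300"},
    {"customer": "", "state": "NY", "amount": "75.50"},  # no customer name
]

def cleanse(row):
    """Return a normalized copy of the row, or None if it is unusable."""
    customer = " ".join(row["customer"].split()).title()
    if not customer:
        return None  # illustrative rule: reject rows without a customer
    return {
        "customer": customer,
        "state": row["state"].strip().upper(),
        "amount": float(row["amount"].replace(",", "")),
    }

def load_warehouse(rows):
    """Cleanse the rows, then load them into an in-memory relational store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, state TEXT, amount REAL)")
    clean = [c for c in map(cleanse, rows) if c is not None]
    conn.executemany(
        "INSERT INTO orders VALUES (:customer, :state, :amount)", clean
    )
    conn.commit()
    return conn

conn = load_warehouse(RAW_ORDERS)
totals = conn.execute(
    "SELECT customer, state, SUM(amount) FROM orders GROUP BY customer, state"
).fetchall()
```

Because the two spellings of the customer name are normalized before loading, the BI query sees a single customer rather than two.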

Moreover, the installation of a data warehouse requires the development of metadata, which is used to describe the data. These metadata definitions are essential to making the data useful to the general user community.

But data warehousing is no easy task. Building a data warehouse can be extremely expensive. And once built, data warehouse systems can be complex to manage and maintain.

There is another way: the middleware approach

The great strides made in the area of enterprise middleware now provide an interesting alternative to traditional data warehousing. Depending on your company’s requirements, middleware can act as data hubs, allowing access to the corporate data stored in heterogeneous data sources. Whereas a traditional data warehouse provides a central repository for information, a virtual data warehouse uses middleware to build direct connections among disparate applications. This virtual approach requires less time and expense to develop, and entails less risk of data being lost.

Like data warehousing, this middleware approach to direct data access relies on the creation of an independent metadata definition of the corporate data and, therefore, provides the same ease-of-use advantages. Layering data access middleware over the corporate data allows you to create a virtual data warehouse, providing access to information without the complexity of building a traditional data warehouse system.
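The idea can be sketched in a few lines, under heavy simplifying assumptions. Here two hypothetical sources (a relational table and a flat file from a legacy billing system) are read at query time, and a small metadata dictionary maps logical field names to each source's physical fields. The schemas, names, and the `query` function are invented for illustration and do not represent any particular middleware product.

```python
import csv
import io
import sqlite3

# Source 1: a relational application database (hypothetical schema).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE cust (cust_no INTEGER, cust_nm TEXT)")
crm.executemany(
    "INSERT INTO cust VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")]
)

# Source 2: a flat file from a legacy billing system (hypothetical layout).
BILLING_FILE = "CUSTID,BAL\n1,1550.00\n2,80.25\n"

# Independent metadata: logical field name -> (source, physical field name).
METADATA = {
    "customer_id": ("crm", "cust_no"),
    "customer_name": ("crm", "cust_nm"),
    "balance": ("billing", "BAL"),
}

def read_crm():
    """Read customer rows directly from the relational source."""
    cur = crm.execute("SELECT cust_no, cust_nm FROM cust")
    return {row[0]: {"cust_no": row[0], "cust_nm": row[1]} for row in cur}

def read_billing():
    """Read balances directly from the flat file, keyed by customer id."""
    reader = csv.DictReader(io.StringIO(BILLING_FILE))
    return {int(r["CUSTID"]): r for r in reader}

def query(fields):
    """Answer a query phrased in logical names by reading the sources live."""
    sources = {"crm": read_crm(), "billing": read_billing()}
    result = []
    for cust_id in sorted(sources["crm"]):
        record = {}
        for name in fields:
            source, physical = METADATA[name]
            record[name] = sources[source].get(cust_id, {}).get(physical)
        result.append(record)
    return result

rows = query(["customer_name", "balance"])
```

Nothing is copied into a central store: each call to `query` goes back to the sources. The balance comes back as the text held in the flat file, since this layer passes values through unchanged.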

However, data access middleware generally will give you access only to data in its raw form. It doesn’t provide the data cleansing that is a major benefit of building an actual data warehouse. This can be a significant disadvantage to the virtual data warehouse, but the importance of it depends greatly on the BI application being deployed.

Using data access middleware to populate OLAP databases directly means that you can use the functionality in the OLAP model to do data cleansing. In addition, with the creation of intranet-based business information portals, it is likely that these applications will have a much more static, less ad hoc query requirement.

In other words, users of these Web-based functions will have a fixed view of the data, but the data must be current. The wide user population is less likely to generate continuously new or changing queries on the corporate data. MIS departments will be asked to create the necessary query models, for example deciding whether customer locations are defined by city, state, or both, by rapidly enhancing the functionality of the enterprise business intelligence portal. Given this model for disseminating data, the middleware data access approach can frequently provide the best solution. Importantly, it reduces the requirement for the data to be massaged before it can be used.
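A fixed set of query models might look something like the sketch below, in which the portal exposes named queries rather than accepting ad hoc ones. The table, the model names, and the `run_model` helper are hypothetical, and an in-memory SQLite database stands in for the data reached through middleware.

```python
import sqlite3

# Predefined query models maintained by the MIS department (hypothetical).
# Portal users choose a model by name; they do not write queries themselves.
QUERY_MODELS = {
    "sales_by_state": (
        "SELECT state, SUM(amount) FROM sales "
        "GROUP BY state ORDER BY state"
    ),
    "sales_by_city": (
        "SELECT city, SUM(amount) FROM sales "
        "GROUP BY city ORDER BY city"
    ),
    "sales_by_city_state": (
        "SELECT city, state, SUM(amount) FROM sales "
        "GROUP BY city, state ORDER BY city, state"
    ),
}

def run_model(conn, name):
    """Run one predefined model against current data; reject anything else."""
    if name not in QUERY_MODELS:
        raise KeyError(f"unknown query model: {name}")
    return conn.execute(QUERY_MODELS[name]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Atlanta", "GA", 100.0),
        ("Savannah", "GA", 50.0),
        ("Albany", "NY", 25.0),
        ("Albany", "GA", 10.0),
    ],
)

by_state = run_model(conn, "sales_by_state")
by_city = run_model(conn, "sales_by_city")
by_city_state = run_model(conn, "sales_by_city_state")
```

The sample data shows why the choice of location definition matters: grouping by city alone merges the two different Albanys, while grouping by city and state keeps them apart.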

Is access to the data enough?

The concept of enterprisewide business intelligence took root with the creation of Executive Information Systems or Enterprise Information Systems in the early 1990s. Many early adopters of this concept attempted to utilize the new graphical presentation technologies and to build their own specific applications to meet this need.

Soon, this build-it-yourself approach was superseded by a wide variety of specialized “packaged” applications that quickly dominated the market. These applications all make use of the graphical user interface (GUI) desktop environment to deliver the information to the user. The applications range from the ad hoc user-defined reporting tools, such as Seagate Crystal Reports from the Seagate Software subsidiary of Seagate Technology Inc., Cognos Impromptu from Cognos Inc., and IQ Objects from Sterling Software Inc., to the more sophisticated OLAP tools, such as Cognos PowerPlay, Seagate Holos, or SRC Software Inc.’s Information Advisor.

If we look to the future of business intelligence applications as intranet-based solutions that can be rapidly developed and updated to provide corporate information to a large community of users, then the architecture of these applications is going to be different from the business intelligence tools of the past. Most BI tools on the market today were initially developed as PC-based, “thick” client applications. If the future framework for BI applications is the intranet, which is a thin-client architecture, then this old design model doesn’t fit. Today’s BI applications need to be built as browser-based, distributed applications.

Given this model, it is feasible that new BI applications may not just be portals onto pure data but will require access to the business logic and processes that create or manipulate the data. In other words, business intelligence is an extension of the existing application infrastructure in the organization rather than simply a discrete application layer processing pure data. If today’s BI applications are to fulfill this broader role, then they must become applications that comply with the application infrastructure being created within the organization.

This approach centers once again on the middleware layer. Application infrastructures built around a middleware application server provide access not only to data but also to application services. It is reasonable to expect BI applications to increasingly fit into this enterprise information systems architecture, once again removing the need for intermediate data warehousing.
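To make the distinction between data access and application services concrete, here is a toy sketch in which a BI portal view calls both kinds of service through one middleware hub. The `ApplicationServer` class, the service names, the sample orders, and the credit rule are all invented for illustration; a real application server would add transport, security, and transaction handling.

```python
from typing import Any, Callable, Dict, List

class ApplicationServer:
    """Toy middleware hub: a registry of named services."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._services[name] = func

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._services[name](**kwargs)

# Hypothetical data held by an existing application.
ORDERS: Dict[str, List[float]] = {"A-100": [120.0, 80.0], "B-200": [40.0]}

def list_orders(customer: str) -> List[float]:
    """Data service: return the order amounts for a customer."""
    return list(ORDERS.get(customer, []))

def credit_status(customer: str) -> str:
    """Application service: business logic owned by the application.

    The 150.0 threshold is a made-up rule for this example.
    """
    total = sum(ORDERS.get(customer, []))
    return "review" if total > 150.0 else "ok"

server = ApplicationServer()
server.register("orders.list", list_orders)
server.register("credit.status", credit_status)

def portal_view(customer: str) -> Dict[str, Any]:
    """Build a BI portal view from both data and business logic."""
    return {
        "customer": customer,
        "orders": server.call("orders.list", customer=customer),
        "credit": server.call("credit.status", customer=customer),
    }

view = portal_view("A-100")
```

The portal never reimplements the credit rule; it asks the application that owns the rule, which is the sense in which BI becomes an extension of the existing application infrastructure.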

One issue with this model is the old problem that much of the data, and now the business logic, exists in old legacy applications. However, this problem is being resolved by legacy extension and integration tools that allow access to legacy data and also enable legacy application services to be packaged and made available as part of a standard application infrastructure environment.

Using middleware layers to look at the total enterprise data set as a virtual data warehouse doesn’t remove the need for classic data warehousing. In many large organizations, data volumes are so enormous that isolating information in a dedicated warehouse is the only option. Indeed, the virtual data warehouse simply augments traditional data warehouses. It provides a rapid solution to disseminating information throughout the organization. Using the latest middleware technology, Web-based applications can be built to easily sit on top of the existing corporate infrastructure and have complete access to the information locked up inside these applications.

As such, virtual data warehouses provide a key to sharing the right information with the right people, all as directly as possible. By extending legacy assets and providing access to Web-based, distributed applications, virtual data warehousing allows users to leverage the information they have spent years developing with the strongest decision-making technologies of today.

Paul Holland is chief executive officer of Transoft Inc., an Atlanta-based provider of data connectivity middleware and application integration products. He has worked in the legacy extension market for 13 years, with such clients as The Home Depot Inc., Genuine Parts Corp., and Georgia Pacific Corp. Holland began his career in the United Kingdom and has headed Transoft’s U.S. operations since 1996.
