
Distributed databases, distributed headaches

Written By

Karen D. Schwartz

Jun 1, 1998

Chuck Shellhouse knows really big databases. Shellhouse, the Colorado Springs-based director of service operations in the information technology division of MCI, is responsible for managing more than 40 terabytes of data located in datacenters around the country. Those databases, which primarily run DB2 and Adabas on IBM and Hitachi mainframes, contain all of the business applications needed to run the telecommunications company’s entire $20+ billion revenue stream.

Even though MCI’s database dwarfs the databases of most corporations, MCI’s computing model has become increasingly common. With companies generating and keeping more data to make better decisions on how to make money, many organizations now rely on the model of geographically dispersed, multiterabyte databases.

But today’s forward-thinking companies are changing the definition of distributed computing. No longer are they managing hundreds of distributed environments running small servers. By and large, they’ve found such set-ups, which involve installing software and backing up data at each location, to be time-consuming and expensive.

Instead, these companies have consolidated their data into just a few datacenters, housing much larger amounts of data at each center. At MCI, Shellhouse and his staff used to manage numerous datacenters in many locations around the country. But with managerial problems and costs spiraling–the datacenters required on-site support personnel, operational personnel, and systems programmers at each location–Shellhouse and his team devised a plan to replace those datacenters with “megacenters” on the backbone of MCI’s network. Today, the company has just four datacenters. They operate in an automated, lights-out environment.

Finding profits in data

MCI’s progression toward its current four large megacenters mirrors that of many Fortune 1,000 companies. Cost is a primary reason.

Mentis recently surveyed U.S. banks and thrift institutions that have at least one currently operational data warehouse (or plans, with funding, for implementing a data warehouse) and asked each one to specify the expected size of its primary data warehouse database (including all data, indexes, and metadata). Nearly a third of the databases were projected to be 500 gigabytes or more.
Source: Mentis

Among other companies consolidating datacenters are Chase Manhattan and Norcross, Ga.-based CheckFree. Banc One, based in Columbus, Ohio, has gone through a major transition from decentralization to centralization, Citibank is migrating from multiple data networks to a common network, and BankBoston has consolidated from two datacenters to one.

Consolidating a dozen datacenters into a few makes a lot of sense for most large companies, says Daniel Graham, a Somers, N.Y.-based strategy and operations executive in IBM’s Global Business Intelligence Solutions group.

“Having [distributed datacenters] is like having children. Two are wonderful, four are a lot of fun, six start to be a drain on your resources, and 20 are impossible,” Graham says. “Every time you have another child, you have bought yourself a certain amount of overhead cost.”

Marketing is driving this rapid growth in database size. Companies are consolidating internal databases, purchasing additional market-research data, and holding onto data longer so they can target their marketing more precisely.

“The more behavioral characteristics companies can analyze about their customers, the better they can serve them,” says IBM’s Graham. “If I can start tracking more about Dan Graham’s purchasing behavior, the kinds of things he likes and doesn’t like, the stores he frequents, and the kind of clothes he buys, I can start using the company’s datamining tools and data warehouses to target-market to him. I’d send him a lot less junk mail, but send him the kind of deals he cares about.”

As a result of efforts such as these, MCI’s 40-terabyte database won’t be considered unusually large for very long, experts believe. Today, many Fortune 1,000 companies handle 300 to 500 gigabytes of raw data, which translates to about 1.5 terabytes of DASD, and the average database size is expected to more than triple in the next few years (see chart, “Coming soon to a database near you: Godzilla!”).

Inadequate tools

While building very large, geographically dispersed databases containing just the kind of information needed to focus a company’s marketing is a great concept, many IT departments have found the tools for managing such behemoths to be inadequate, immature, and in some cases nonexistent.

It’s quite common to hear IT managers complain about the lack of adequate tools. Jim Johnson, New York City-based chief technologist and system architect in the Human Resources group at Xerox, says he prefers to use off-the-shelf tools whenever possible, but has resorted to writing his own tools for some tasks. Johnson is in charge of Xerox’s 120 gigabytes of human resources and personnel data, which is housed on eight Oracle servers distributed around the country.

Johnson uses BMC Patrol to monitor the system’s hardware, operating system, and Oracle database, but has written his own tools to perform tasks Patrol does not perform. For example, Johnson’s team wrote a distributed load process to load large data feeds from external sources into each of its eight servers, which guarantees that they are always in sync.

“In some cases, we preferred not to use commercial tools, because we had specific needs and because we felt we could keep a better handle on things that way,” Johnson says. “I always prefer to buy off-the-shelf tools rather than write things in-house, but in this case, it just wasn’t possible.”

Graham of IBM agrees that effective tools to manage large, geographically dispersed databases are scarce. There are tools available to do some of the necessary tasks, such as transferring data from one server to another, but there’s no tool, for example, that coordinates all of the destinations for the data and synchronizes schedules in a way that recognizes all other servers on the network. In addition, there is no single tool or collection of products that provides a complete solution for managing large and geographically dispersed databases.

Shellhouse and his MCI team gave up in frustration after trying to find the tools they needed. “The tools out there today either didn’t fit our needs, or they couldn’t handle our volumes,” he says. What he needed, he says, simply doesn’t exist today. Shellhouse and his team are working to build a database-management tool that can manage large, geographically dispersed databases effortlessly, dealing with alarms, recoveries, and backups.

New toys, new challenges

Lack of tools aside, there are other pressing issues and challenges in managing large, geographically dispersed databases. In order to keep the data on his servers in sync, Xerox’s Johnson has implemented a series of checks and balances to make sure the data is correctly loaded on all servers and that nobody accesses the data until it’s all in sync. This stressful nightly dance promises to become even more complicated as the system loads more data from external systems, and as the database grows to more than double its current size within a few years, he says.
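The article does not describe how Xerox’s in-house load process works internally. As a rough illustration only, the "load everywhere, verify, then open for access" idea could be sketched as follows. The `Replica` class, its `stage`, `checksum`, and `publish` methods, and the `load_all_or_nothing` function are all hypothetical names invented for this sketch; they stand in for real database servers and are not Xerox’s or any vendor’s API.

```python
import hashlib


def _digest(rows):
    """Return a SHA-256 hex digest over the repr of each row, in order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


class Replica:
    """Hypothetical stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.staged = None  # loaded, but not yet visible to users
        self.live = None    # what users are allowed to query

    def stage(self, rows):
        self.staged = list(rows)

    def checksum(self):
        return _digest(self.staged or [])

    def publish(self):
        self.live = self.staged


def load_all_or_nothing(replicas, rows):
    """Stage the feed on every replica, verify it, then publish everywhere.

    Returns True if all replicas were published, False if any replica's
    staged copy failed verification (in which case nothing is published).
    """
    rows = list(rows)
    expected = _digest(rows)

    for replica in replicas:
        replica.stage(rows)

    # Check: every server must hold exactly the same feed.
    if any(replica.checksum() != expected for replica in replicas):
        return False  # previous live data stays in place on all servers

    # Only now is the new data opened to users, on all servers together.
    for replica in replicas:
        replica.publish()
    return True


servers = [Replica(f"hr{i}") for i in range(8)]
ok = load_all_or_nothing(servers, [("emp", 1), ("emp", 2)])
```

The design choice in this sketch is that users keep seeing the previous night’s data until every server has passed the check, which trades freshness for consistency. A production process would also need to handle failures that occur partway through publishing.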

Lessons Learned

	Move from dozens or even hundreds of distributed computing sites to a handful of automated, lights-out database facilities to generate significant savings.
	Give your vendors and suppliers the flexibility to customize your applications.
	Understand that moving from a distributed environment with hundreds of locations to one with three or four lights-out locations is a big change. Give your employees time to get used to the change.
	Recognize what you need skills-wise, and hire for the future, not the past. The skills needed to manage datacenters remotely are different from those required to manage older-style datacenters, so you may need to hire systems administrators with a different background than in the past.
	If you can’t find tools to meet your needs, consider modifying an off-the-shelf tool or even writing your own. Tools to manage large, geographically dispersed databases are still immature.

Today’s dispersed, hands-off databases present challenges that did not exist even a few years ago. Managing large amounts of data remotely is a culture unto itself, and takes special skills, MCI’s Shellhouse says.

“In a typical datacenter in the old days, the technical-support people could see and touch the hardware. That isn’t the case anymore,” Shellhouse says, because of the lights-out nature of his datacenters. “The biggest challenge we have is keeping our technical people current with the changing technology when they don’t have a lab where they can see it and play with it. And they have to manage new databases without hands-on experience. It is difficult to refresh and train your personnel when they have never had the opportunity to see this stuff first-hand.”

Reducing head count

Consolidation of large databases often entails new investments in hardware, software, and network infrastructure, but it can improve the bottom line by reducing personnel costs.

“We’ve found that the cost actually goes down as the databases get larger because of centralization and consolidation, which gives us economies of scale,” says Shellhouse. “Before we moved to our megacenter concept, we had people at each location responsible for the day-to-day operations and technical support. There may have been only 10 terabytes of data in those days and 100 people. Today, we have 40 terabytes of DASD that is more centrally located and one-fourth the number of people. We’ve seen our headcount go down consistently year after year, yet our database growth rate has been in the 30% to 40% range.”
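To make the economies of scale concrete, the rough figures in Shellhouse’s quote can be run through a short calculation. This is a back-of-the-envelope sketch: the 10-terabyte and 100-person figures are his own approximations ("there may have been"), and the function names are made up for illustration.

```python
import math


def terabytes_per_person(terabytes, staff):
    """Data managed per staff member, in terabytes."""
    return terabytes / staff


def years_to_grow(start_tb, end_tb, annual_rate):
    """Years of compound growth needed to go from start_tb to end_tb."""
    return math.log(end_tb / start_tb) / math.log(1 + annual_rate)


# Before consolidation: roughly 10 TB and 100 people.
before = terabytes_per_person(10, 100)

# After consolidation: 40 TB and one-fourth the number of people.
after = terabytes_per_person(40, 100 / 4)

# Each person now looks after roughly 16 times as much data.
ratio = after / before

# At 30% to 40% annual growth, quadrupling takes roughly four to five years.
slow = years_to_grow(10, 40, 0.30)
fast = years_to_grow(10, 40, 0.40)
```

On these numbers, data per person rises from about 0.1 terabytes to about 1.6 terabytes, which is the kind of gain that lets headcount fall while the database keeps growing.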

As IBM’s Graham puts it: “Every time you put a distributed database out there, you have just bought at least one systems programmer. You are replicating skills you have at your central hub.”

Johnson’s experience at Xerox is that the size of the database doesn’t have much impact on maintenance costs. The real cost for Xerox is the price of each server the system adds. To keep costs down, Johnson tries to keep each of the group’s eight production servers as similar to the others as possible so that the staff doesn’t have to manage several different server configurations.

The Internet: a new paradigm

Database experts have seen the future of distributed computing, and it is the Internet. The Internet provides IT managers with an easier mechanism for distributing data to end users. By simplifying and consolidating on one universal client, they can contact their customers and work with their business partners much more easily.

The Internet changes the whole paradigm of distributed computing, says Carl Olofson, research director for database management systems at International Data Corp., the Framingham, Mass., consulting firm.

“Ultimately, instead of an organization having a fixed topology of networks that have to be connected together, they can employ a much more flexible scheme using the Internet instead of allowing regional offices to connect through their system,” Olofson says. In addition, the Internet enables companies to connect to each other and create virtual enterprises, he notes.

Olofson says security, Java standards, and other issues are temporarily preventing the Internet from becoming the principal backbone for most organizations. But once those issues are resolved, companies will experience dramatic changes in the way their databases are used.

The Internet will make it simpler for organizations to centralize the management of geographically distributed databases and organizations, says Ken Jacobs, Oracle’s vice president of data server marketing. “It will be extremely easy to consolidate your data into a central server and provide it to users anywhere. And you can do that in a way that preserves the integrity of the data, security, and transaction semantics of the data. The economics become compelling to move toward consolidation.”

Although the Internet has not yet made a big impact on the way MCI manages its large distributed databases, Shellhouse expects that to change soon. Eventually, MCI customers will be able to accomplish a variety of tasks by accessing the company’s databases via the Internet. They’ll be able to obtain billing information and change the format of invoices and the way their calls are routed. Some consumer customers already can access their MCI accounts via the Internet.

Cost may be the biggest reason for companies to make their large distributed databases available via the Internet.

“A lot of organizations are interested in recentralizing the IT management operations because database administrators are so expensive,” says IDC’s Olofson. “Rather than have a database administrator for each regional office, you can have one DBA team in the central office that can manage all the regional databases. The Internet will move that paradigm along because in the world of the Internet, physical location is more or less irrelevant.”

Karen D. Schwartz is a freelance business and technology writer based in Washington, D.C.

Banks scramble to de-Balkanize their information

Unlike pharmaceutical companies or manufacturing organizations, financial firms don’t sell tangible products. In the financial industry, information is the name of the game, and the processing of information helps these institutions, perhaps more than any other, run their businesses efficiently and profitably.

Banking and investment houses came relatively late to the game of distributed computing. Because each segment of a financial institution traditionally has run independently, and because the need for security is so important, many financial databases were created as stand-alone applications, creating numerous islands of information. Eventually, with the advent of client/server computing, trading desks and other financial areas started building their own systems, but many financial institutions are still coping with the legacy of the Balkanization of information.

“Data warehousing in the financial services industry tends to be a somewhat more complex process than in other industries,” says Mary Knox, research manager at Durham, N.C.-based Mentis Corp. (http://www.mentis.com), a firm specializing in evaluating information and communications technology in the financial services industry. “Within banking, information systems were originally developed along product-centric lines, so there has been very little thought given to standardization and the ability to merge data from disparate systems.”

Even today, it isn’t unusual for a large bank to have one host mainframe system serving as its check-processing system, a second machine for CD business, a third for credit card information, and a fourth to handle mortgage loans.

But combining data from different groups within financial institutions has become very much a necessity in today’s competitive financial environment. Large databases are needed to better understand the company’s relationships with its customers. That is very different from the original use of such databases, which were developed for processing transactions.

A recent survey by Mentis asked financial institutions to specify the level of approved funding for external expenditures and capital investments for the first 12 months of their data warehouse projects. The average was $1.5 million, with nearly a third of organizations spending $2 million or more. The survey was of U.S. commercial banks and thrift institutions that had at least one currently operational data warehouse or firm plans, with funding, for implementation of a data warehouse solution.
Source: Mentis

Banks today are moving aggressively to develop complete customer profiles, which Knox says is key to many banks’ relationship-management strategies. But because of the complexity of the data and systems involved, this development is a long and arduous process.

Banks that are successful in merging their data and developing complete customer information will have a valuable competitive tool. Without this information, banks are limited in their ability to identify customer profitability, and thus to develop strategies and tactics for retaining and growing profitable relationships.

In a recent survey, Mentis found that funding for external expenditures and capital investments for the first 12 months of data warehouse projects ranged from $200,000 to $7.5 million, with the average being $1.5 million (see chart, “Big bucks up front”). Although these numbers seem high, banks have no choice but to take the plunge, Knox says.

Despite the cost, many major financial institutions are doing just that. First Union, for example, is in the process of developing an enterprisewide database to allow the company a full view of customer relationships across products. The system will replace previous departmental systems that had inconsistent and incomplete customer data. Capital One, a credit card issuer based in Falls Church, Va., has had great success in using large customer databases for rapid product development and marketing.

And cost, as in all large database projects, is relative. At Boston-based CS First Boston, revamping the trading database support systems saved the institution about $750,000 annually in the cost of database administrators alone, notes Sergey Fradkov, chief technology officer at UNIF/X, a consultancy that helped CS First Boston revamp its database systems.
–Karen D. Schwartz
