Database modeling's fashionable new tools

Database design and modeling through the new millennium will combine object and distributed computing technologies.

In this article:
Historical perspective: upper CASE vs. lower CASE
Schemas and models
The future of data modeling tools
For more information
Although some developers currently use the proverbial "back of a napkin"--or even Excel's drawing tools--to sketch the logical design for a new database, most IT shops use tools like Platinum Technology's ERwin or Silverrun's multiplatform Relational Data Modeler (RDM) for databases with 20 or more tables.

But the combination of object technology and the increasing complexity of distributed computing (it's not uncommon for SAP implementations to reside in databases that contain over 8,000 tables) is beginning to change the way databases are built. As you might expect, this has also created the need for a new generation of database modeling tools.

Modeling is good for you


One simple component: Today's application modeling software sees data modeling as simply one component (data services) of a classic three-tier model.

Using tools like ERwin (formerly from Logic Works) and Silverrun-RDM, and similar products from Oracle (Oracle Designer), Popkin Software (System Architect), and Sybase (PowerDesigner), database designers can design and document new databases and even reverse-engineer existing ones. These tools are fine for generating the DDL (data definition language) SQL statements you can use to create a database schema (see sidebar, "Schemas and Models"), but they're widely viewed by the developer community as burdensome ("necessary evils" is a phrase you hear over and over) and boring nuisances. What was acceptable in yesterday's mainframe shops and waterfall method lifecycles (see sidebar, "Historical perspective: upper CASE vs. lower CASE") simply doesn't cut it in today's object-oriented shops that work in Internet time.

Everyone will conceptually agree that modeling is a good thing, but not everyone has the discipline to use it. Almost all programmers would like to sit down and immediately start coding rather than suffer through boring meetings where a system's basic design is developed and then reviewed. CASE tools, and even their modern-day equivalents such as Rational Software's Rational Rose '98 (see screenshot, "One simple component"), that support component-based development are perceived as imposing too much time-consuming overhead. There's also the problem of what comes first: designing an application's classes and components or the database/data warehouse that it will use.

As Paul Munkenbeck, the database administrator with marketing services company Maritz UK, says, "I think data architects have gone out of fashion over the last few years with 'quick and dirty' fixes, but I think they may start making a comeback." In a current data warehousing project that involves integrating two different (large) European systems, Munkenbeck and his colleagues are starting from the data analysis side looking for common items. Data warehousing projects aren't the only things driving IT back to the kinds of design tools they routinely used when developing mainframe-scale applications. There's also object technology and the increasing use of object-oriented application development tools.

Historical perspective: upper CASE vs. lower CASE

Products like Platinum's ERwin (See figure below, "Today's most popular data modeling tool") and Oracle Designer are sometimes called computer-aided software engineering--or CASE--tools and generate code that corresponds to a database schema. "Lower" CASE tools like ERwin or Oracle Designer use input (table and column names, for example) either from an existing database "system catalog" or as typed in by a database designer. "Upper" CASE tools such as Andersen's Foundation and Texas Instruments' Composer, which were popular in mainframe shops a decade ago, were more "soup to nuts" frameworks or methodologies for complete lifecycle application development. In an era of five-year projects developed under the so-called "waterfall" model of structured analysis and design, upper CASE packages worked fine and produced well-documented systems.

Today's most popular data-modeling tool: ERwin

Object technology-- and the distributed applications that tend to use it --drastically increase the need for modeling, and even developers are beginning to see the need to spend up-front time developing formal database and system architectures. Darwin Sanoy, a veteran consultant with Info- Evolution based in Limerick, Penn., who has worked extensively with databases running under OS/400, AIX, OS/2, and Windows NT agrees, adding that "the simultaneous movement to network-based computing and the componentization of development models is driving complexity into the systems management space." Sanoy worries that IT hasn't recognized this trend and feels that developers also need to think in terms of process and system design earlier in the game.

Vendors also recognize the need for a more holistic approach, and we're beginning to see suites of tools that are a lot like modern-day upper CASE tools. Today's lifecycle tools, including IBM's Visual Age product line, Platinum Technology's repository-based family of tools, Rational Software's products anchored by Rational Rose, and the COOL family from Sterling Software, aren't likely to use the much-maligned CASE term in their product literature. But that's basically what they are: code-generating tools that let designers and developers do more visual modeling and programming.

CASE renaissance?

Components aren't the only reason we need new ways to design databases. Not only are we designing and building more data warehouses and datamarts, but we're also facing the business realities of globalization, increasing merger and acquisition activity, Y2K-driven systems reengineering, and the influence of giant enterprise resource planning systems from Baan, PeopleSoft, SAP, and other vendors.

"The key to data warehouses is getting the data out. You've got to focus on the business logic--design first, code later," says Mark Simons, vice president at Convista, a consulting firm located in Silver Spring, Md., that specializes in system redesign. Convista has helped dozens of clients build large data warehouses by relying heavily on Sterling Software's COOL:Biz for business-process modeling. Simons sees some of the discipline of mainframe-style system development coming back into vogue for today's big projects. "When you built production systems, you followed certain procedures. You had design and code reviews. You had testing and QA. Developers don't like to do modeling, but you need some structure and discipline to build production-caliber systems," he says.

In the short term, Simons would like to see vendors' tools with tighter integration between the data and the process diagrammers in their toolsets. "Today we use a variety of tools to bring together people, processes, technology, and organizations, so they need to be fully integrated." In two years, he expects to see mature object-oriented design tools that will automatically generate not only the databases (something today's tools can do), but also the application from the models. Five years from now (2003), concludes Simons, "we just hope the design tools can live up to the big promises [of upper CASE and its methodologies] made in the '80s. This will only happen if the tool vendors stay focused on their current products and not chase the next paradigm shift."

Indeed, we're even seeing the "M" word (for methodologies) creep back into the vernacular, even though today's methodologies are more object-oriented than were the structured ones of yesteryear. Instead of drawing flowcharts and ER diagrams (see figure, "Microsoft's SQL Server 7.0"), you're more likely to represent systems using UML (Unified Modeling Language). Platinum Technology, for example, recently acquired Tom Vayda's consulting firm for its effective Complete Lifecycle Incremental Parallel Process (CLIPP). Based on the "Catalysis" methodology created by Icon Computing's Desmond d'Souza (who also joined Platinum) and Alan Wills, CLIPP relies on developing USE cases, identifying business rules, and designing UI specs and programming contracts before a single line of code is written.

Vayda, now chief technology officer for Platinum, describes how CLIPP routinely generates half the code required by a project, which brings programmer productivity up to a range of 120 to 180 LOC (lines of code) per day. When you compare that to the standard 12 to 18 lines of fully debugged code often cited for programmer productivity, you can probably understand why Platinum was interested in CLIPP. Under Vayda's leadership, Platinum customers should expect to see tighter integration among the company's business, process, and data-modeling tools, which in 1999 will share the Model Mart repository acquired with Platinum's purchase of Logic Works. Within the two-year timeframe, Vayda expects Platinum's ERwin, BPwin, and Paradigm Plus products to interoperate with industry-standard repositories and frameworks and do an even better job generating code from the design model.

One company committed to Platinum's CLIPP methodology is Caldwell-Spartin, a software specialist for the staffing industry based in Marietta, Ga. Platinum was extremely effective with the technology transfer to Caldwell-Spartin employees, says Jeff Morrison, director of Product Development at Caldwell-Spartin. When Platinum was called in, Caldwell-Spartin was looking for a consulting firm to take it into the next millennium with a new, scalable, object-oriented version of its product line. By following the CLIPP methodology, not only were Caldwell-Spartin's existing programmers trained in object-oriented development, but the firm also laid the foundations for benefiting from a repeatable development process.

A documentation specialist at Caldwell-Spartin commented that it was the first system she had ever worked on where tech writers didn't have to interrupt the programmers to find out details needed for online help systems. "It was all there in the specifications," she summarized.

Schemas and models

The three-layer ANSI/SPARC architecture, which dates back to 1975, still exerts influence. Its goal was to define an architecture for database-centric systems that supported physical data independence so that changes in a database's conceptual design could be handled separately from changes in physical data storage.

In ANSI/SPARC jargon, there are three models: the internal, conceptual, and external or user view. The internal model corresponds to the physical implementation (files, indexes, and so on). The conceptual model is the more abstract model of the information that's being modeled. And the external model is more like a series of views of the different applications of the user of the data. Later, during the client/server era, the ANSI/SPARC terminology was largely replaced with similar terms that divided the database world into logical and physical models and the user interface.

Today's changing needs for database storage that accommodates more than simply text and numbers have resulted in an evolving relational database model as classic relational RDBMSs morph into object/relational systems and additionally into a new "pure object" object database model.

But what about the productivity claims that Platinum and other suite vendors make? Did Caldwell-Spartin enjoy a ten-fold improvement in productivity? Yes, according to Morrison, who cited an incredible six-month project lifecycle. (The firm's initial attempt to port the original character-based VAX/VMS system to a more object-oriented one running on NT was abandoned when external consultants failed to deliver even a pilot after a year's work.)

Of course, not everyone's motivated to seek those kinds of productivity gains. Rao Adiraju, a consultant with Curtin-Act, Australia-based Distributed Systems Consultants P/L, asks rhetorically how many project managers/IT directors really want to deliver their IT projects early, under budget, or with fewer staff. They know full well that next year's budget will be linked to the current year's actuals and that their promotion depends on the number of people they have managed.

Push-button coding?

The future of data modeling tools

Year 1

In the short term, vendors' tools should have tighter integration between the data and the process diagrammers in their toolsets, according to Mark Simons, vice president of Silver Springs, Md., consultancy Convista. "Today we use a variety of tools to bring together people, processes, technology, and organizations so they need to be fully integrated."
Because today's projects are component- and model-driven, we're seeing database design subsumed into the much larger process of application development.

Year 2

Convista's Simons expects to see mature object-oriented design tools that will automatically generate not only the databases (something today's tools can do), but also the application from the models as well.
We'll see a convergence of the top-down repositories used in designing data warehouses and the bottom-up repositories assembled and used by developers, especially those building Microsoft's COM objects.
Within two years object-oriented suites will be used for the majority of new development, and the IBM, Microsoft, and Oracle repositories will be interoperable standards.

Year 5

In 2003 "we just hope the design tools can live up to the big promises [of upper CASE and its Methodologies] made in the '80s," says Simons from Convista. "And this will only happen if the tool vendors stay focused on their current products and not chase the next paradigm shift."
"It's going to be very interesting to see the uptake in Oracle's 8i with its change of focus to Web-based and nondatabase objects," says DBA Paul Munkenbeck. "This may well provide a major shift in design techniques--not only wlll DBAs have to learn how to use browser-type interfaces in their admin tools, they're also going to have more Web-style features in the database itself."
Further down the road, we'll see tools that help visualize and manage our complex, interrelated enterprise architectures.
Yet another "new" capability that is really old is the idea of a modeling tool that generates code. Most lower CASE tools generate error-free SQL code, but that doesn't mean the code will be optimized for a given environment. One can hardly expect a single code generator to come up with designs that are equally optimized for transaction processing and decision support, for example. As database administrator Paul Munkenbeck notes, "I think the idea of pressing a button and out comes a generated application is seductive, but I'm not convinced. I haven't seen a tool yet that can go through the decision-making processes that I do when designing the implementation of a database."

Info-Evolution's Sanoy echoes Munkenbeck's cynicism. According to Sanoy, "The biggest obstacle to complete database design by CASE is the need to truly optimize design beyond table definitions and into DBMS and operating system optimizations." To build today's systems, says Sanoy, you need to have better communication among database designers, developers, and the systems and network administrators.

Of course, today's suites do more than generate database schemas. They'll crank out forms and other UI features, and even build error-handling routines into their code. And suites aren't the only kind of package that offer "ready to run" code. Today, you can buy "frameworks" such as IBM's San Francisco Project or "best of breed" templates that claim to have over half the code you need to build a back office application or industry-specific datamart.

Don't be surprised, then, to see more conferences on Enterprise Architecture and Distributed Systems Design. It's a theme that's dear to John Zachman's heart. Zachman published his Zachman's Framework in the midst of his 26-year career with IBM. It helps organizations establish a common set of perspectives for describing today's complex enterprise systems. Today, Zachman's Framework, like UML and ER diagrams, has entered the mainstream vocabulary as a standard way of helping describe distributed enterprise systems, component-based or not.

Looking down the database road

Yesterday's projects tended to be data-driven or process-driven. Because today's projects are component- and model-driven, we're seeing database design subsumed into the much larger process of application development. Details about an organization's many databases and their designs (or details about how legacy databases have simply been "wrapped" for easy access) are being stored in central repositories such as those from IBM, Microsoft, Oracle, Platinum, or Unisys. The increasing popularity--and visibility--of repositories isn't only due to Microsoft weighing in with desktop-based repositories, but it's also because of its role in designing data warehouses and marts.



"I haven't seen a tool yet that can go through the decision-making processes that I do when designing the implementation of a database," says Paul Munkenbeck, Maritz UK's database administrator.
So what's missing today? In the database arena, we still don't have easy to use object/relational databases, and lower CASE tools to support their proprietary extensions are lagging. Nor will today's database schema generators warn us when to denormalize or make other schema changes required for decent performance. There's no tool that automatically identifies what needs to be done to create a data warehouse or datamart. We're not sure whether it's a safe long-term strategy to start replacing stored procedures written in proprietary PL/SQL or T-SQL, for example, with ones written in Java. There are too many standalone, noninteroperable metadata repositories. We're not even sure whether it's a smarter long-term strategy to build Enterprise Java Beans (EJBs) or ActiveX controls--or whether it matters.

One way or another, tomorrow's databases will handle complex, unstructured data as easily as today's OLTP systems handle customer ID fields. In the next two years, we'll see a convergence of the top-down repositories used in designing data warehouses and the bottom-up repositories assembled and used by developers, especially those building Microsoft's COM objects. The two-year timeframe will also see object-oriented suites used for the majority of new development, and the IBM, Microsoft, and Oracle repositories will be interoperable standards. Further down the road, we'll see tools that help visualize and manage our complex, interrelated enterprise architectures. DBA Munkenbeck wonders about the impact of Oracle's 8i with its change of focus to Web-based and nondatabase objects. "This may well provide a major shift in design techniques. Not only will DBAs have to learn how to use browser-type interfaces into their admin tools, they are also going to have more Web-style features in the database itself. This could be the most significant change over the next five years," he says.

What are your thoughts on the future of database modeling tools? E-mail us and tell us about your experience.
Databases are now becoming part of the infrastructure, so it only makes sense that today's tools should focus less on the relatively simplistic one-time design activity and more on understanding them within the context of an enterprise architecture. //

Karen Watterson, karen_watterson@msn.com, is an independent San Diego-based writer and consultant specializing in database design and data warehousing issues. She has written several books, including Visual Basic Database Programming and Client/Server Technology for Managers and is completing one on Microsoft's SQL Server 7.0.

For more information

BOOKS

Data Warehouse Design Solutions
By Christopher Adamson and Michael Venerable
John Wiley & Sons, 1998
Based on their consulting experience (their independent consulting firm has been acquired by Sagent Technology), the authors provide data warehouse/mart "recipes" for standard business environments.

Vendors, analysts, and consultants, oh my!

VENDORS

IBM
VisualAge

Intersystems
Cache

Microsoft
Microsoft Repository

NCR
Scalable Data Warehousing

Oracle
Oracle Designer

Platinum Technology
Platinum ERwin

Popkin Software
System Architect

Rational Software
Rational Rose

Silverrun Technologies
Silverrun RDM

Sterling Software
COOL: Biz

SELECT Software Tools

ANALYSTS

Bloor Research

Hurwitz Group

Doug Barry's ODBMS Central

Ovum Research

CONSULTANTS

Convista

Platinum Global Consulting

Zachman Institute for Framework Advancement

MCSD: SQL Server 6.5 Database Design Study Guide
By Kevin Hough
Sybex, 1998

The Data Model Resource Book: A Library of Logical Data and Data Warehouse Designs
By Len Silverston, Kent Graziano, and William H. Inmon
John Wiley & Sons, 1997
Another good source of recipes.

Designing Databases: An Object-Oriented Approach to Database Design
By Robert J. Muller
Morgan Kaufmann Publishers, 1998
Robert Muller is another author with years of "real world" experience (mainly Oracle) under his belt. This is the best book I've read for database designers and DBAs who need to work within the new object-oriented development process.

Designing Quality Databases With IDEF1X Information Models
By Thomas A. Bruce
Dorset House, 1991
The classic book for anyone using the IDEF1X methodology.

Objects, Components and Frameworks with UML: The Catalysis Approach
By Desmond d'Souza and Alan Wills
Addison Wesley, 1998
Recommended for anyone contemplating Platinum's suite.

WEB SITES

DAMA (Data Management Association)
Meta Data Conference.

TDAN (The Data Administrator's Newsletter).
Top-notch site with contributed articles covering data architecture, repositories, and data warehousing.

TEN (The Enterprise Newsletter).
Newsletter on information engineering published by well-known data modeling consultant Clive Finkelstein.

NEWSGROUPS AND DISCUSSION LISTS

Promising forum with useful message threads.

CASE-L (Computer Aided Software Engineering) newsgroup
You can join this relatively inactive group by sending the message "sub CASE-L your name" to listserv@vm.cc.purdue.edu

ITA-L (Issues in Information Technology Architecture) newsgroup
You can join this group by sending a message "sub ITA-L your name" to listserv@tc.umn.edu

JSD-L (Discussion of Jackson System Development) mailing list
You can join this group by sending the message "sub JSD-L your name" to listserv@itssrv1.ucsf.edu

CNI-Architecture and Standards newsgroup
You can join this group by sending the message "sub CNI-ARCH your name" to listserv@uccvma.ucop.edu

CISIRG-L (Common Information System Implementation Resource Group) mailing list
You can join this group by sending the message "sub CISIRG-L your name" to listserv@admin.humberc.on.ca