Knowing what systems to test is hard enough. But with many systems, you can't even get started until you have enough test data to make it meaningful. And if you need thousands or millions of data records, you've got a whole new problem.
For most large-scale commercial applications, assembling the test data is most of the effort behind designing a comprehensive test environment. While sampling data from production is often employed for regression testing of existing systems, it's obviously not available for those under development. Furthermore, it's difficult to know how large a sample is necessary to achieve coverage of all critical states and combinations, and it's also complicated to maintain referential integrity between related tables or files.
Fortunately, test-data generators are becoming more powerful and more widely available. These products offer features that can expedite the tiresome job of populating files and databases with enough data to support complex test scenarios. A few examples include Datatect from Banner Software of Sacramento, Calif. (www.datatect.com), Platinum TestBytes from Computer Associates International Inc. of Islandia, N.Y. (www.cai.com), The Generator from Data Generation LLC of Reston, Va., and TestBase from Tenerus Corp. in Reston, Va. (www.tenerus.com).
But what do these tools do? How well do they work? What do they test?
Who, what, when, where, and why
As you might imagine, test-data generators begin with the description of the file or database that is to be created. In most cases, the tools can read the database tables directly to determine the fields and their type, length, and format. The user can then add the rules, relationships, and constraints that govern the generation of valid data.
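To make the idea concrete, here is a minimal sketch in Python of that two-step flow: read the column definitions from the database itself, then let the user layer rules on top. It is not how any of the products named above work internally. The `customer` table, the `describe_table` and `generate_rows` helpers, and the `credit_limit` rule are all hypothetical, and SQLite stands in for whatever database is actually under test.

```python
import random
import sqlite3

def describe_table(conn, table):
    """Return (column name, declared type) pairs read from the database itself."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2].upper()) for row in rows]

def generate_rows(columns, rules, count, seed=0):
    """Build `count` rows, using a user rule for a column when one is supplied."""
    rng = random.Random(seed)
    # Fallback generators keyed by declared type (only two types in this sketch).
    defaults = {
        "INTEGER": lambda r: r.randint(0, 9999),
        "TEXT": lambda r: "".join(r.choice("abcdefghij") for _ in range(8)),
    }
    result = []
    for _ in range(count):
        row = tuple(
            rules.get(name, defaults[col_type])(rng) for name, col_type in columns
        )
        result.append(row)
    return result

# Hypothetical schema standing in for the system under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, credit_limit INTEGER)")

columns = describe_table(conn, "customer")

# A user-supplied constraint: credit limits come from a fixed set of valid values.
rules = {"credit_limit": lambda r: r.choice([500, 1000, 5000])}

rows = generate_rows(columns, rules, count=100)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
```

The point of the split is that the schema supplies types and lengths for free, so the tester only has to describe what the schema cannot express, such as which values are valid for the business.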
Standard "profiles" are also offered, which can automatically produce billions of names, addresses, cities, states, zip codes, Social Security numbers, test dates, and other common data values, as well as random values, ranges, and type mixes. Most products also support user-customizable data types, which can be used to generate unique Standard Industrial Classification (SIC) business codes, e-mail addresses, and other data types.
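A profile can be thought of as a small function that returns one plausible value each time it is called. The Python sketch below illustrates that idea only. The profile names, the tiny name lists, the date range, and the `example.com` domain are assumptions made for the example and are not taken from any of the products mentioned. The Social Security profile produces the familiar 3-2-4 digit layout and makes no attempt to follow real issuing rules.

```python
import datetime
import random

# Tiny illustrative lists; a real profile would draw on far larger ones.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara"]
LAST_NAMES = ["Lopez", "Smith", "Nguyen", "Okafor"]

def profile_name(rng):
    """A first and last name chosen independently."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

def profile_zip(rng):
    """A five-digit zip code, zero-padded."""
    return f"{rng.randint(0, 99999):05d}"

def profile_ssn(rng):
    """A value in the 3-2-4 digit layout; the format only, not a valid number."""
    return f"{rng.randint(1, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"

def profile_date(rng, start=datetime.date(1990, 1, 1), end=datetime.date(1999, 12, 31)):
    """A date drawn uniformly from the inclusive range start..end."""
    span_days = (end - start).days
    return start + datetime.timedelta(days=rng.randint(0, span_days))

def profile_email(rng):
    """A user-defined type built on top of the standard name profile."""
    first, last = profile_name(rng).lower().split()
    return f"{first}.{last}@example.com"

rng = random.Random(42)  # fixed seed so a test run can be reproduced
records = [
    {
        "name": profile_name(rng),
        "zip": profile_zip(rng),
        "ssn": profile_ssn(rng),
        "opened": profile_date(rng),
        "email": profile_email(rng),
    }
    for _ in range(1000)
]
```

Seeding the random generator is a deliberate choice here: the same seed yields the same records, which makes a failing test repeatable.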
A more critical feature, and a more difficult one to implement, is support for parent/child and other relationships in complex databases. For example, a parent record, such as a customer account master, must be linked with multiple child records, such as different accounts and transactions. This type of functionality is essential for relational database environments where referential integrity is key.
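The customer, account, and transaction example can be sketched in Python as follows. This is one simple way to keep generated keys consistent, not a description of how any commercial generator does it: each child row is created only after its parent, and it copies the parent's key. The table names, column names, and the row counts per parent are hypothetical, and SQLite is used only so the foreign keys can be enforced and checked.

```python
import random
import sqlite3

def generate_hierarchy(n_customers, seed=0):
    """Generate customers, their accounts, and each account's transactions."""
    rng = random.Random(seed)
    customers, accounts, transactions = [], [], []
    account_id = 0
    txn_id = 0
    for customer_id in range(1, n_customers + 1):
        customers.append((customer_id, f"Customer {customer_id}"))
        # Every customer gets one to three accounts that carry the parent's key.
        for _ in range(rng.randint(1, 3)):
            account_id += 1
            accounts.append((account_id, customer_id))
            # Every account gets zero to five transactions that carry its key.
            for _ in range(rng.randint(0, 5)):
                txn_id += 1
                amount_cents = rng.randint(-50000, 50000)
                transactions.append((txn_id, account_id, amount_cents))
    return customers, accounts, transactions

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute(
    "CREATE TABLE txn ("
    "id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES account(id), "
    "amount_cents INTEGER)"
)

customers, accounts, transactions = generate_hierarchy(50)

# Load parents before children so every foreign key already has a target.
conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
conn.executemany("INSERT INTO txn VALUES (?, ?, ?)", transactions)

# Count accounts whose customer is missing; consistent data yields zero.
orphans = conn.execute(
    "SELECT COUNT(*) FROM account a "
    "LEFT JOIN customer c ON a.customer_id = c.id "
    "WHERE c.id IS NULL"
).fetchone()[0]
```

Because foreign-key enforcement is switched on, loading a child row with no matching parent would raise an error rather than silently producing an orphan.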
Steve Pearson, test automation lead at Network Associates Inc. (NAI), implemented Datatect to assist him in testing Magic Total Service Desk, an intranet help desk application developed by NAI. Headquartered in Santa Clara, Calif., NAI is an independent network security and management software company. "We use Datatect in conjunction with the Segue Silk test automation tool to simulate hundreds of virtual Web users engaged in creating and updating help desk tickets and other modules in the product," Pearson says.