Knowing what systems to test is hard enough. But with many systems, you can’t even get started until you have enough test data to make it meaningful. And if you need thousands or millions of data records, you’ve got a whole new problem.
For most large-scale commercial applications, assembling the test data is most of the effort behind designing a comprehensive test environment. While sampling data from production is often employed for regression testing of existing systems, it’s obviously not available for those under development. Furthermore, it’s difficult to know how large a sample is necessary to achieve coverage of all critical states and combinations, and it’s complicated to maintain referential integrity between related tables or files.
Fortunately, test-data generators are becoming more powerful and more widely available. These products offer features that can expedite the tiresome job of populating files and databases with enough data to support complex test scenarios. A few examples include Datatect from Banner Software of Sacramento, Calif. (www.datatect.com), Platinum TestBytes from Computer Associates International Inc. of Islandia, N.Y. (www.cai.com), The Generator from Data Generation LLC of Reston, Va., and TestBase from Tenerus Corp. in Reston, Va. (www.tenerus.com).
But what do these tools do? How well do they work? What do they test?
Who, what, when, where, and why
As you might imagine, test-data generators begin with the description of the file or database that is to be created. In most cases, the tools can read the database tables directly to determine the fields and their type, length, and format. The user can then add the rules, relationships, and constraints that govern the generation of valid data.
Standard “profiles” are also offered, which can automatically produce billions of names, addresses, cities, states, zip codes, Social Security numbers, test dates, and other common data values, along with random values, ranges, and type mixes. Most products also support user-customizable data types, which can be used to generate unique Standard Industrial Classification (SIC) business codes, e-mail addresses, and other data types.
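To make the idea of profiles concrete, here is a minimal sketch in Python. It does not reproduce how any of the products named above work; the name lists, the field names, and the functions `make_record` and `generate` are all assumptions made up for illustration. Each field is drawn from a small profile (a list of values, a formatted number, or a range), and a seed makes the output repeatable.

```python
import random

# Hypothetical value lists standing in for a product's built-in profiles.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dita", "Eli"]
LAST_NAMES = ["Garcia", "Ito", "Jones", "Khan", "Lopez"]
STATES = ["CA", "NY", "TX", "VA", "WA"]


def make_record(rng):
    """Build one customer record from simple field profiles."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "state": rng.choice(STATES),
        "zip": f"{rng.randint(0, 99999):05d}",      # always five digits
        "balance": round(rng.uniform(0, 5000), 2),  # range rule: 0 to 5000
    }


def generate(count, seed=0):
    """Return `count` records; the same seed reproduces the same data."""
    rng = random.Random(seed)
    return [make_record(rng) for _ in range(count)]


if __name__ == "__main__":
    for record in generate(3, seed=42):
        print(record)
```

Seeding the generator is a deliberate choice in this sketch: a test that fails against generated data is much easier to investigate if the same data can be produced again.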
A more critical feature, and a more difficult one to implement, is support for parent/child and other relationships in complex databases. For example, a parent record, such as a customer account master, must be linked with multiple child records, such as different accounts and transactions. This type of functionality is essential for relational database environments where referential integrity is key.
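The parent/child requirement can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not a description of any vendor's approach: the `customer` and `account` tables, their columns, and the `build_database` function are hypothetical. The point is the ordering: each parent row is inserted first, and every child row is generated with a key that already exists, so the database's foreign-key check never fires.

```python
import random
import sqlite3


def build_database(customers=5, max_accounts=3, seed=0):
    """Create an in-memory database with parent rows and linked child rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only if asked
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " balance REAL NOT NULL)"
    )
    for customer_id in range(1, customers + 1):
        # Parent first, so each child below refers to an existing key.
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (customer_id, f"Customer {customer_id}"),
        )
        for _ in range(rng.randint(1, max_accounts)):
            conn.execute(
                "INSERT INTO account (customer_id, balance) VALUES (?, ?)",
                (customer_id, round(rng.uniform(0, 1000), 2)),
            )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_database()
    print(db.execute("SELECT COUNT(*) FROM account").fetchone()[0], "accounts")
```

With foreign keys switched on, an attempt to insert an account for a customer that does not exist raises `sqlite3.IntegrityError`, which is the kind of refusal a generator has to work around or respect.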
Steve Pearson, test automation lead at Network Associates Inc. (NAI), implemented Datatect to assist him in testing Magic Total Service Desk, an intranet help desk application developed by NAI. Headquartered in Santa Clara, Calif., NAI is an independent network security and management software company. “We use Datatect in conjunction with the Segue Silk test automation tool to simulate hundreds of virtual Web users engaged in creating and updating help desk tickets and other modules in the product,” Pearson says.
All data, all the time
The goal of using a test-data generation product, of course, is data, and tons of it. Thousands or even millions of records (some claim billions) containing variations of the described data can be generated and placed in the database or file format of choice. Most major databases, including Informix, Microsoft Access, Oracle, SQL Server, and Sybase, are supported, as well as fixed and delimited flat files.
These days, it’s not always quite that easy. Databases can contain more than just data, such as stored procedures or derived foreign keys that link other tables or databases. In these cases, it is not feasible to generate data directly into the tables.
“Ironically,” notes NAI’s Pearson, “it was Datatect’s stubborn refusal [warnings, error dialogs, etc.] to violate database integrity that actually gave us in QA [quality assurance] a much better understanding of the table dependencies and business logic of our application.”
But Pearson employed a creative workaround. “For tables like the help desk table that fire stored procedures and triggers to ancillary tables, we use Datatect to create flat files. These flat files are supplied as input to our automated test scripts, which in turn simulate user input to create the data in the database through the browser.” This approach takes advantage of the software being tested to create the actual database entries from the generated data.
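The flat-file workaround can be sketched as follows. This is not NAI's actual setup: the ticket fields, the functions `write_tickets` and `replay`, and the `submit` callback are hypothetical, and `submit` merely stands in for whatever step in a test script drives the application's user interface. Generated rows go into a delimited file, and a separate loop reads them back and hands each one to the application under test.

```python
import csv
import io
import random

FIELDS = ["requester", "priority", "summary"]  # hypothetical ticket fields


def write_tickets(stream, count, seed=0):
    """Write `count` generated help desk tickets as comma-delimited rows."""
    rng = random.Random(seed)
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for n in range(1, count + 1):
        writer.writerow({
            "requester": f"user{rng.randint(1, 50)}",
            "priority": rng.choice(["low", "medium", "high"]),
            "summary": f"Generated ticket {n}",
        })


def replay(stream, submit):
    """Pass each flat-file row to `submit` and return how many were sent."""
    submitted = 0
    for row in csv.DictReader(stream):
        submit(row)  # in a real script, this would drive the application
        submitted += 1
    return submitted


if __name__ == "__main__":
    buffer = io.StringIO()  # an in-memory stand-in for a file on disk
    write_tickets(buffer, 3)
    buffer.seek(0)
    replay(buffer, print)
```

Because the rows enter through the application rather than straight into the tables, any stored procedures and triggers run exactly as they would for a real user.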
The synergy between test-data generation and test-automation tools is natural, and in some cases, the test data generation capability is being embedded in test execution products.
Putting data to work
Generated test data can obviously be used to create databases with enough information to approximate real-world conditions for testing capacity and performance. If you need to ensure that your database design can support millions of customers or billions of transactions and still deliver acceptable response times, you need some practical means of creating these volumes.
Functional testing is a different animal, however. If you are testing business rules, such as whether an overdue account balance may permit additional credit purchases to be posted, then you must know precisely which account number contains the condition and which transactions may be entered against it. It may be easy to generate huge volumes of accounts with balances that are all over the map in terms of their amounts and due dates, but it is not as simple to know exactly which accounts satisfy which business rules.
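The gap between volume and knowledge can be shown in a few lines. The sketch below is an assumption-laden illustration: the account fields, the definition of "overdue" (a positive balance with a due date in the past), and the functions `generate_accounts`, `is_overdue`, and `index_by_rule` are all hypothetical. Generating the accounts is trivial; a functional test only becomes possible once each account has been classified against the rule.

```python
import random
from datetime import date, timedelta


def generate_accounts(count, today, seed=0):
    """Generate accounts with balances and due dates all over the map."""
    rng = random.Random(seed)
    return [
        {
            "account_id": n,
            "balance": round(rng.uniform(0, 2000), 2),
            "due_date": today + timedelta(days=rng.randint(-60, 60)),
        }
        for n in range(1, count + 1)
    ]


def is_overdue(account, today):
    """Hypothetical rule: money is owed and the due date has passed."""
    return account["balance"] > 0 and account["due_date"] < today


def index_by_rule(accounts, today):
    """Map each condition to the account numbers that satisfy it."""
    index = {"overdue": [], "current": []}
    for account in accounts:
        key = "overdue" if is_overdue(account, today) else "current"
        index[key].append(account["account_id"])
    return index


if __name__ == "__main__":
    today = date(2000, 1, 31)
    index = index_by_rule(generate_accounts(1000, today), today)
    print(len(index["overdue"]), "overdue accounts available for testing")
```

A single rule over one table is the easy case. When the condition depends on several variables spread across tables or files, the classification step grows accordingly.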
The same issues exist for sampling production data. Even if you are comfortable that your data sample represents the types of conditions to be tested, it’s another matter altogether to know which accounts meet which requirements. Testing complex business rules may require knowing the exact state of several variables spread over multiple databases, tables, and/or files, and finding that precise combination may be like looking for a needle in a haystack.
Can you benefit from a test-data generation tool? Probably. If you can, how do you select the best one for your needs? “The most important step is to carefully research the data population needs required by your testing efforts,” says Pearson. Not only will this help you evaluate the various products, but it will also add to your education about how your software and its data are designed. And that information is useful whether you generate data or not!
Linda Hayes is CEO of WorkSoft Inc. She was one of the founders of AutoTester. She can be reached at [email protected]