In Part 1 of this column, Frank Cohen described a data center framework architected around many small devices, enabling live scalability and performance testing of a data center. Here, in Part 2, he defines criteria for good Web performance and provides information on tools for data center capacity testing.
Defining Criteria For Good Web Performance
Test results show a data center's readiness when measured against predetermined criteria. Before the Internet, client/server systems operated on private networks and were accessed at fairly predictable times, in simple patterns, by a well-known and predictable group of people. The network usually ran in a single office data center, perhaps with a few remote locations. Advances in routers, gateways, bridges and other network equipment mean that any device on the data center network may receive requests from people anywhere in the world, at any time.
A data center network may not be exposed to the open Internet, but it is very likely that employees will access the network from home, remote offices will access the data center through a network bridge, and the network may also be handling traffic for your phone system using VoIP protocols. Web applications on the network are subjected to highly unpredictable load patterns by a widely heterogeneous and unpredictable group of users.
The criteria for good Web application performance have changed over the years. Unfortunately, adopting the criteria in common use today is a good way to end up with outdated criteria. The old techniques of ping tests, click-stream measurement tools and services, and HTML content checking often produce inconclusive data.
- Ping tests – use the ICMP protocol to send a “ping” request to a server. If the “ping” returns, the server is assumed to be alive and well. The downside: a Web server host will usually continue to answer ping requests even after the Web application has crashed.
- Click-stream measurements – request Web pages and record statistics, including total page views per hour, total hits per week, total user sessions per week and derivatives of these numbers. The downside: If a Web application takes twice as many pages as it should to complete a user’s task, the click-stream test will show the Web site is popular, but to the user the Web site is frustrating.
- HTML content checking – request a Web page, find its hyperlinks and request those pages to record each link as working or broken. The downside: the hyperlinks in a Web application are dynamic and change depending on the user. Checking only the links' validity is meaningless, if not misleading.
Choosing criteria for good data center performance should be based on many factors.
Are the basic features working?
Assemble a short list of basic features. If your company is marketing the software, then develop this list from the marketing materials. Put yourself into the shoes of a user who just read your marketing literature and is ready to use your Web application. The basic features list for Inclusion.net software read like this:
- Sign-in and sign-out
- Navigate to a discussion message
- Download a document
- Post a message
- Search for a message using key words
While the Inclusion.net software had 480 server objects, 110,000 lines of code and a versatile user interface, it came down to these 5 basic features to guarantee the application was working at all.
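A short list like this translates naturally into a smoke test. The following is a minimal sketch, not the Inclusion.net test code: the function names and the placeholder checks are assumptions made for illustration, and in practice each check would drive the Web application over HTTP.

```python
from typing import Callable, Dict

def run_basic_features(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every basic-feature check and record pass/fail by feature name."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed feature, not a crashed suite.
            results[name] = False
    return results

def application_is_working(results: Dict[str, bool]) -> bool:
    """The application counts as working only if every basic feature passes."""
    return all(results.values())

# Hypothetical placeholder checks named after the five features listed above.
example_checks = {
    "sign-in and sign-out": lambda: True,
    "navigate to a discussion message": lambda: True,
    "download a document": lambda: True,
    "post a message": lambda: True,
    "search for a message": lambda: False,  # simulated failure
}
example_results = run_basic_features(example_checks)
print(application_is_working(example_results))
```

With one simulated failure in the list, the application as a whole is reported as not working, which matches the idea that all of the basic features must pass.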
Is performance acceptable?
Check with 3 to 5 users to determine how long they will wait for a Web application to perform one of the basic features before they abandon the feature and move on to another. Take some time to watch each user directly, and record how many seconds the user takes to perform a basic feature.
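Once those user observations exist, the comparison can be automated. This sketch assumes hypothetical wait limits gathered from three users and uses the least patient user as the threshold, which is one possible policy rather than a rule from this column. The timed call is a stand-in for a real request.

```python
import time
from typing import Callable

def time_feature(feature: Callable[[], object]) -> float:
    """Return the elapsed wall-clock seconds taken by one run of a feature."""
    start = time.perf_counter()
    feature()
    return time.perf_counter() - start

def is_acceptable(elapsed_seconds: float, abandon_after_seconds: float) -> bool:
    """True if the feature finished before users said they would give up."""
    return elapsed_seconds <= abandon_after_seconds

# Hypothetical limits, in seconds, observed by watching three users.
observed_limits = [8.0, 5.0, 12.0]
abandon_after = min(observed_limits)  # assumption: satisfy the least patient user

elapsed = time_feature(lambda: time.sleep(0.05))  # stand-in for a real request
print(is_acceptable(elapsed, abandon_after))
```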
How often does it fail?
Web application logs, if formatted with useful data, can show the mean time between failures. At first, expressing failures as a percentage of users may seem appealing, but such a percentage is meaningless. The time between failures is an absolute number. Estimate the number first, and then look in the real logs for the real answer.
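As a rough illustration of pulling that number out of a log, the sketch below averages the gaps between consecutive failures. The log format (an ISO timestamp, a level and a message) is an assumption made for the example. Real logs will need their own parsing.

```python
from datetime import datetime
from typing import List

def mean_time_between_failures(failure_times: List[datetime]) -> float:
    """Average gap in seconds between consecutive failures."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    ordered = sorted(failure_times)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical log lines in the form "<ISO timestamp> <LEVEL> <message>".
log_lines = [
    "2001-05-01T09:00:00 INFO sign-in ok",
    "2001-05-01T09:10:00 ERROR search failed",
    "2001-05-01T09:40:00 ERROR post failed",
    "2001-05-01T10:40:00 ERROR download failed",
]
failures = [datetime.fromisoformat(line.split()[0])
            for line in log_lines if line.split()[1] == "ERROR"]
mtbf_seconds = mean_time_between_failures(failures)
print(mtbf_seconds)  # gaps of 1800 s and 3600 s average to 2700.0
```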
Understanding and developing the criteria for good Web application performance draws on everyday experience. Companies that have relied solely on ping tests, click-stream measurements or HTML content checking for their Web applications get useless, and sometimes misleading, test results. These tools were meant to test public Web sites, not the Web services running in your data center. The best measurements for a Web application include:
- Mean time between failures in seconds
- Time in seconds each user-session took
- Web application availability and peak usage periods
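The session and availability measurements can be derived from session records in a few lines. The record layout and sample values below are hypothetical, and the peak period is approximated as the hour of day in which the most sessions started, which is only one way to define it.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple

# Hypothetical session record: (user name, session start, session end)
Session = Tuple[str, datetime, datetime]

def session_durations(sessions: List[Session]) -> List[float]:
    """Seconds each user session took."""
    return [(end - start).total_seconds() for _, start, end in sessions]

def peak_hour(sessions: List[Session]) -> int:
    """Hour of day (0-23) in which the most sessions started."""
    starts = Counter(start.hour for _, start, _ in sessions)
    return starts.most_common(1)[0][0]

def availability(period_seconds: float, downtime_seconds: float) -> float:
    """Fraction of the period during which the application was up."""
    return (period_seconds - downtime_seconds) / period_seconds

sample_sessions = [
    ("ann", datetime(2001, 5, 1, 9, 5), datetime(2001, 5, 1, 9, 15)),
    ("bob", datetime(2001, 5, 1, 9, 30), datetime(2001, 5, 1, 9, 32)),
    ("cho", datetime(2001, 5, 1, 14, 0), datetime(2001, 5, 1, 14, 30)),
]
print(session_durations(sample_sessions), peak_hour(sample_sessions))
```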
Web applications are tested very differently from desktop software. At any time, a Web application may be serving anywhere from 1 to 5,000 users. Learning the performance and scalability characteristics of a Web service under the load of hundreds of users is important for managing software development projects, building sufficient data center capacity and guaranteeing a good user experience. The interoperating modules of a Web application often do not show their interdependent nature until loaded with user activity.
Scalability and performance are the measures needed to determine how well a Web application will serve users in production environments. Taken individually, the results may not show the true nature of the Web application or, even worse, may be misleading. Taken together, scalability and performance testing show the true nature of a Web application.
Moving a Web application into a data center production environment requires assurances of high availability and consistently good performance. Here is a checklist that developers, QA managers and IT managers should keep in mind when choosing test tools to test a Web application:
- Stateful testing. When you use a Web application to set a value, does the server respond correctly later on?
- Privilege testing. What happens when the everyday user tries to access a control that is authorized only for administrators?
- Speed testing. Is the Web application taking too long to respond?
- Boundary timing testing. What happens when a request to your Web application times out, or takes a really long time to respond?
- Regression testing. Did a new build break an existing function?
These are fairly common tests for any software application. Since this is a Web application, though, the testing arena expands into a matrix, as the following table of a Web application test suite describes:
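|Web Application Test|1 User|50 Users|500 Users|
|---|---|---|---|
|Stateful Testing| | | |
|Privilege Testing| | | |
|Speed Testing| | | |
|Boundary Timing Testing| | | |
|Regression Testing| | | |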
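One way to picture the matrix is as a loop over tests and user levels, with each cell filled in by running the test concurrently for that many simulated users. The sketch below uses Python's standard thread pool. The test functions are hypothetical stubs, and a real harness would issue HTTP requests inside them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple

def run_matrix(tests: Dict[str, Callable[[int], bool]],
               user_levels: Sequence[int] = (1, 50, 500)
               ) -> Dict[Tuple[str, int], bool]:
    """Run each test at each user level; a cell passes only if all users pass."""
    matrix = {}
    for name, test in tests.items():
        for users in user_levels:
            # One worker per simulated user; each call receives its user index.
            with ThreadPoolExecutor(max_workers=users) as pool:
                outcomes = list(pool.map(test, range(users)))
            matrix[(name, users)] = all(outcomes)
    return matrix

# Hypothetical stubs: the speed test pretends to degrade past 100 users.
example_tests = {
    "stateful": lambda user_id: True,
    "speed": lambda user_id: user_id < 100,
}
matrix = run_matrix(example_tests)
for (name, users), passed in sorted(matrix.items()):
    print(name, users, "pass" if passed else "fail")
```

A test that passes for 1 and 50 users but fails at 500 is exactly the kind of result a single-user run would never reveal.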
Tools For Data Center Capacity Testing
Many engineers and IT managers are skeptical when evaluating commercial test tools. The thinking goes: Why should the company pay all this money for a tool, with a new set of instructions or language to learn, when I could just write the test suite myself?
The problem with writing a test suite comes down to one word: maintenance. Just like every other software application, the test agent will need to be maintained.
A brief, unscientific survey of Silicon Valley developers who wrote their own test agents found that maintaining the test agents grew from a minor irritation to a huge problem. The typical developer’s first attempt at writing test agents resulted in a fairly robust set of Java classes or Visual Basic methods that issue HTTP requests to a Web server and do some simple analysis of the results. Writing a test agent consisted of writing Java code that sequentially called the correct test object.
For example, an agent that reads through an online discussion forum message-base looks like this:
1. Read the first page of the site (which included URLs to the
2. Randomly read a discussion message.
3. If the message has a reply, then read the reply message.
4. Repeat step 3 until there are no more reply messages.
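The four steps can be sketched in a few lines. This is an illustration in Python rather than the Java agent described in this column. It assumes, hypothetically, that reading the first page yields a list of message identifiers and that reading a message returns the identifier of its reply, if any. An in-memory forum stands in for the real site.

```python
import random
from typing import Callable, Dict, List, Optional

def forum_agent(read_first_page: Callable[[], List[str]],
                read_message: Callable[[str], Optional[str]],
                rng: random.Random) -> List[str]:
    """Follow the four steps; return the ids of the messages read, in order."""
    message_ids = read_first_page()       # step 1: read the first page
    current = rng.choice(message_ids)     # step 2: pick a message at random
    visited: List[str] = []
    # Steps 3 and 4: keep following replies until there are none left.
    # The membership check guards against a reply chain that loops.
    while current is not None and current not in visited:
        visited.append(current)
        current = read_message(current)   # returns the reply id, or None
    return visited

# Hypothetical in-memory forum: message id -> id of its reply (or None).
replies: Dict[str, Optional[str]] = {"m1": "m2", "m2": "m3", "m3": None, "m4": None}
path = forum_agent(lambda: ["m1", "m4"], lambda mid: replies[mid], random.Random(7))
print(path)
```

Passing the page-reading functions in as parameters keeps the calling sequence separate from the code that fetches pages.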
The resulting test agent is a small Java application. At one point during development, the format of the URL used to read the first page of the site changed, and the test agent needed maintenance.
As a matter of fact, every new change to the data center required some subtle change in the test agent. Each and every change brought the developer back to the test agent code. While the test objects – getting Web pages, signing-in, testing for the correct return values – stayed the same from test suite to test suite, the calling sequence was always different.
Looking at the test agent code, does it make sense to add a scripting language to assemble test agents from a library of test objects? The test objects perform the individual test routines, routines that rarely need to change. And a scripting language would be used to assemble and control the parameters of the objects, a scripting language that would be easier to alter for different levels and types of Web application testing.
The benefit to all developers, QA managers and IT managers: A programmer writes the more labor-intensive test objects only once and adapts the scripting language as needed. The scripting language lets engineers who are more comfortable with scripts than with hard-core test object code write their own test agents. The scripts are readable by mere human beings and can easily be shared and swapped with others.
The key to building data center test agents in this form lies in the ability of a language to standardize test agent scripting: to provide a common reference for the way notations, variables and program flow are expressed. The test object library grows as your data center needs change, and the script library is more easily maintained.
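To make the split between test objects and scripts concrete, here is a minimal sketch of the idea. The script syntax, the `run_script` function and the three test objects are all invented for illustration and do not represent the scripting language of any particular tool.

```python
from typing import Callable, Dict, List

def run_script(script: str, library: Dict[str, Callable[..., bool]]) -> List[bool]:
    """Run one test object per script line: '<object name> [arguments...]'."""
    results = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = line.split()
        # An unknown object name raises KeyError, which flags a script typo.
        results.append(library[name](*args))
    return results

calls: List[str] = []  # records what the stub test objects were asked to do

# Hypothetical test objects, written once and reused by every script.
def sign_in(user: str) -> bool:
    calls.append(f"sign_in {user}")
    return True

def get_page(path: str) -> bool:
    calls.append(f"get_page {path}")
    return True

def sign_out() -> bool:
    calls.append("sign_out")
    return True

library = {"sign_in": sign_in, "get_page": get_page, "sign_out": sign_out}

agent_script = """
# hypothetical agent: sign in, read one page, sign out
sign_in alice
get_page /forum
sign_out
"""
script_results = run_script(agent_script, library)
print(script_results)
```

Changing the calling sequence now means editing a few lines of script, while the test objects in the library stay untouched.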
One choice for developing intelligent test agents is to use the open-source Load project. Load comes with a script language and library of test objects. Load is distributed under an Apache-style open source license. Load comes with the software, source code and example test scripts. Load enables concurrent intelligent test agents to test a data center for scalability and performance.
Load is a free open-source utility for developing intelligent test agents. Details are at http://www.pushtotest.com.
Frank Cohen is the principal architect for three large-scale Internet systems: Sun Community Server, Inclusion.net and TuneUp.com. These are Internet messaging, collaboration and e-commerce systems, respectively. Each needed to be tested for performance and scalability. Cohen developed Load, an open-source tool featuring test objects and a scripting language. Sun, Inclusion and TuneUp put Load to work, as have thousands of developers, QA analysts and IT managers. Load is available for free download at http://www.pushtotest.com.