Measuring availability sounds easy. You simply compare the time the service actually responded with the time it should have responded, right?
Unfortunately, it is nowhere near that simple, because “availability” must first be defined before it can be understood and applied consistently.
It is worth getting right, though. Availability is a key element by which customers and users will judge IT’s ability to deliver quality services, so you had better take great care in understanding what availability really means to your organization.
First, what is your goal?
Why are you and your customers measuring availability? Your reason for tracking this information will shape what you measure and how. For example, are you tracking to understand performance relative to service levels? If so, the measure needs to be consistent across Service Level Agreements, Service Quality Plans and so on.
Second, who is your audience?
If you are dealing with the business, IT must worry about the things the business worries about. You need to understand how its business processes are affected by IT-provisioned services. The “touch points” where the business processes and IT services join should be of keen interest.
The customer is worried about whether their people can do their jobs and, at the same time, whether they are achieving their objectives. If you tell the customer that the service is available 99.99 percent of the time, what does that really mean if the 0.01 percent of downtime falls four days before Christmas, millions of dollars in website sales are lost, and alienated end users go to a competitor?
In cases like that, it’s doubtful the customer is going to be happy that they were up “most” of the time based on an API call used to see if a service was working or not.
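To put the 99.99 percent figure in perspective, the short Python sketch below converts an availability percentage into the downtime it permits. The function name and the period lengths (a 365-day year, a 30-day month) are assumptions made for illustration. Over a year, 99.99 percent allows roughly 52.6 minutes of downtime, and the figure says nothing about when those minutes fall.

```python
# Minutes in each reporting period. A 365-day year and a 30-day month
# are assumptions made for this illustration.
PERIOD_MINUTES = {
    "year": 365 * 24 * 60,
    "month": 30 * 24 * 60,
    "week": 7 * 24 * 60,
}


def allowed_downtime_minutes(availability_percent: float, period: str = "year") -> float:
    """Return the downtime, in minutes, that an availability figure permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return PERIOD_MINUTES[period] * (100 - availability_percent) / 100


if __name__ == "__main__":
    for period in PERIOD_MINUTES:
        minutes = allowed_downtime_minutes(99.99, period)
        print(f"99.99% over a {period}: {minutes:.1f} minutes of downtime")
```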
As an aside, effective IT Service Management should provide mechanisms, via Service Level Management, to ensure that IT understands the business’s critical periods and is ready to minimize the business impact of incidents.
Third, what will your metric be?
Armed with knowledge of your goal and of the audience that will be reviewing availability data, you must then arrive at the metric(s) used to monitor it. The simplistic formula of ((Agreed Service Time less Downtime) / Agreed Service Time) * 100 is insufficient on its own, yet it is relied upon too often.
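As a minimal sketch of why the simple formula falls short, the Python below implements it and applies it to two hypothetical one-hour outages in a 30-day month of round-the-clock agreed service time. The function name and the outage scenarios are assumptions made for illustration. Because the formula only sees minutes, an outage during the busiest sales hour scores exactly the same as one in the middle of a quiet night.

```python
def availability_percent(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Simple availability: ((agreed service time - downtime) / agreed service time) * 100."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed_service_minutes must be positive")
    if not 0 <= downtime_minutes <= agreed_service_minutes:
        raise ValueError("downtime_minutes must be between 0 and the agreed service time")
    return (agreed_service_minutes - downtime_minutes) / agreed_service_minutes * 100


# Hypothetical agreed service time: a 30-day month, 24 hours a day.
AGREED_MINUTES = 30 * 24 * 60

# Two hypothetical one-hour outages. The formula has no input for timing
# or business impact, so both produce the same figure.
quiet_night_outage = availability_percent(AGREED_MINUTES, 60)
peak_sales_outage = availability_percent(AGREED_MINUTES, 60)

if __name__ == "__main__":
    print(f"Outage on a quiet night:    {quiet_night_outage:.3f}%")
    print(f"Outage at peak sales hour:  {peak_sales_outage:.3f}%")
```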
If you look at availability from the perspective of your customer(s), what would you look at to assess availability? Have you ever asked them? Their answers may surprise you. The metric, or series of metrics selected, should be meaningful to the customers, not just IT.
Fourth, how will you collect the data?
The data needed to generate the metric(s) must be collected in a way that is both accurate and cost-effective. It never makes sense to spend more money collecting and generating a metric than the metric is worth.
Think of it this way: If you are monitoring a service that supports millions of dollars of business, what would you spend to measure and report on availability? What would you spend if the service supports only a few thousand dollars’ worth of business and the customer isn’t worried about it?
Make sure the right method of monitoring is used. If a service must be monitored, then it needs to be done in a way that truly reflects the state of the service.
For example, using ping to check whether servers are up can miss other subsystems that are hung, so the overall service is erroneously reported as available. The monitoring method should also correspond with how the alerting systems work. You don’t want the alerting systems to monitor one way and the reporting tools to monitor another. Ideally, the alerting and reporting functions should be handled by the same tool, or at least use an identical methodology.
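One way to picture the difference is a composite check, sketched below in Python, in which the service counts as available only when every subsystem check passes. All names here are hypothetical, and the two probe functions are stand-ins that return fixed values rather than real network calls. In practice each would be replaced by an actual probe, such as a test transaction run against the application.

```python
from typing import Callable, Dict

# A check is any zero-argument callable that returns True when its
# subsystem is healthy.
Check = Callable[[], bool]


def service_status(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every subsystem check; a check that raises counts as unhealthy."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def service_available(checks: Dict[str, Check]) -> bool:
    """The service is available only if every subsystem check passes."""
    if not checks:
        raise ValueError("at least one check is required")
    return all(service_status(checks).values())


# Hypothetical stand-ins for real probes.
def host_responds_to_ping() -> bool:
    return True  # the server itself is up


def application_completes_test_transaction() -> bool:
    return False  # simulates a hung application


PING_ONLY = {"ping": host_responds_to_ping}
END_TO_END = {
    "ping": host_responds_to_ping,
    "application": application_completes_test_transaction,
}

if __name__ == "__main__":
    print("Ping-only view:  ", service_available(PING_ONLY))
    print("End-to-end view: ", service_available(END_TO_END))
```

Feeding both the alerting and the reporting tools from the same check function is one way to keep the two views consistent.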
Fifth, how will you report it?
Once you have collected the data, you need to do something with it. How will it be presented and how often? Will it be trended over time to show whether it is improving? Will it be printed in a report and delivered during a service review meeting? Will it be updated daily on an intranet site? What will the repercussions be if a bad figure is released and the Service Level Manager is surprised by the customer who has seen the data first? These are all things to think about.
Availability is an important but nebulously defined term. IT must work with management and customers to ensure there is a common understanding of availability, including how it will be measured and reported. It is important that the requirements are understood, standardized and implemented so that alerts and reports deliver the information needed to make decisions.