Measuring High Availability

Here’s how to calculate your SLA’s availability percentage.

The term high availability is meaningless until you define how to measure it. IT shops measure high availability as the percentage of time that systems are available, but the provider and consumer of high-availability services must agree on what constitutes availability and how the time is measured. Typically, the availability percentage is calculated as follows:

x = (n - y) * 100/n

where x is the availability percentage, n is the total number of minutes in a given calendar month, and y is the total number of minutes that the service is unavailable in that month. To calculate the availability percentage, you must know the total number of minutes in the service period, as well as the minutes that you can exclude from the calculation. Typical exclusions are scheduled maintenance hours, planned downtime (e.g., to quarantine viruses, to react to a security threat), and acts of force majeure.

For example, a 31-day month contains 31 * 24 * 60 or 44,640 minutes. If a server is unavailable for 15 minutes because of an unexpected crash and automatic reboot, the availability percentage is (44,640 - 15) * 100/44,640, or 99.97 percent. If the server is also down for 3 hours (180 minutes) for a scheduled hardware replacement and you don't exclude scheduled maintenance, total downtime is 195 minutes and the availability drops to 99.56 percent. (The availability percentage remains at 99.97 percent if you exclude scheduled maintenance.) Suppose you offer a $100,000 monthly guarantee on meeting a 99.9 percent service level agreement (SLA): 99.97 percent meets that target and 99.56 percent misses it, so an ambiguity with respect to exclusions might cost you every penny of that $100,000.
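The arithmetic in this example can be reproduced with a short Python sketch. The function name `availability_percent` and the `(minutes, is_scheduled)` outage format are illustrative assumptions, not part of any standard tool. The sketch also assumes that an excluded outage is simply left out of y while n stays at the full number of minutes in the month.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(days_in_month, outages, exclude_scheduled):
    """Return x = (n - y) * 100 / n for one calendar month.

    outages is a list of (minutes, is_scheduled) pairs (a hypothetical
    format chosen for this sketch). When exclude_scheduled is True,
    scheduled outages are not counted toward y.
    """
    n = days_in_month * MINUTES_PER_DAY
    y = sum(
        minutes
        for minutes, is_scheduled in outages
        if not (is_scheduled and exclude_scheduled)
    )
    return (n - y) * 100 / n


# The month from the example: a 15-minute crash plus a 3-hour
# scheduled hardware replacement.
outages = [(15, False), (180, True)]

print(round(availability_percent(31, outages, exclude_scheduled=False), 2))  # 99.56
print(round(availability_percent(31, outages, exclude_scheduled=True), 2))   # 99.97
```

Running both calls side by side shows how the same month can land on either side of a 99.9 percent target, depending only on how the agreement treats exclusions.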

In addition to the duration of an outage, you also need to consider the frequency of outages. Suppose you want to offer a 99.5 percent SLA for a server, exclusive of scheduled maintenance. That availability percentage lets you have the server offline for at most 0.5 percent of 44,640 minutes, or about 3 hours and 43 minutes, in a 31-day month. You could have one 3-hour-43-minute outage, two 1-hour-51-minute outages, or three 1-hour-14-minute outages. The more often a server crashes, the faster you must restore it each time to meet your monthly SLA.
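As a rough illustration of the trade-off between outage frequency and restore time, the following Python sketch computes the monthly downtime budget for a given SLA and splits it evenly across a number of outages. The function names are hypothetical, and the sketch assumes that every outage lasts the same time and that partial minutes are rounded down.

```python
def downtime_budget_minutes(days_in_month, sla_percent):
    """Total minutes the service may be down in the month under the SLA."""
    n = days_in_month * 24 * 60
    return n * (100 - sla_percent) / 100


def max_minutes_per_outage(days_in_month, sla_percent, outage_count):
    """Longest each outage can last, in whole minutes, if all are equal."""
    budget = downtime_budget_minutes(days_in_month, sla_percent)
    return int(budget // outage_count)


# A 99.5 percent SLA in a 31-day month, with one, two, or three outages.
for count in (1, 2, 3):
    hours, minutes = divmod(max_minutes_per_outage(31, 99.5, count), 60)
    print(f"{count} outage(s): {hours} h {minutes} min each")
```

For a 31-day month at 99.5 percent, this gives 3 h 43 min, 1 h 51 min, and 1 h 14 min per outage, matching the figures above.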
