"Total Reliability"
Total reliability is the holy grail of any technology deployment. Everyone building, implementing, and using technology solutions would love to have guaranteed service all the time. However, we also know the reality of the situation: every kind of technology has a real failure rate, whether it is a ten-dollar calculator or a multibillion-dollar space shuttle.
So, given positive failure rates, what does "total reliability" mean in practical terms?
To measure reliability, we need a reliability metric: a numerical measure of how reliable a service is. Two common reliability metrics are percent uptime and percent serviced. Percent uptime measures the percentage of time the service is available; percent serviced measures the percentage of total customer requests that are correctly serviced. The term reliability used here will always refer to percent uptime unless otherwise stated.
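Both metrics can be sketched in a few lines of Python. The function names and the sample figures below are illustrative assumptions, not part of any particular product:

```python
def percent_uptime(total_seconds: float, downtime_seconds: float) -> float:
    """Percentage of the measurement period during which the service was available."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def percent_serviced(total_requests: int, correct_requests: int) -> float:
    """Percentage of customer requests that were serviced correctly."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * correct_requests / total_requests


# Hypothetical week: 30 minutes of downtime, 9,950 of 10,000 requests handled correctly.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
print(round(percent_uptime(SECONDS_PER_WEEK, 30 * 60), 3))
print(percent_serviced(10_000, 9_950))
```

Note that the two numbers need not agree: a service can be up almost all the time and still mishandle requests, or be down only when no one is calling.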
In the telecommunications and Internet service provider world, reliability jargon has recently centered on the number of nines in the reliability metric. "Three Nines" refers to 99.9% (.999) reliability, "Four Nines" to 99.99% (.9999), and "Five Nines" to 99.999% (.99999). Five Nines is the gold standard benchmark used by many telecommunications companies. The table below shows the various percent uptime benchmarks and the approximate downtime per week that each one allows.
"Nines" |
Percent Uptime |
Downtime per Week |
Three Nines |
99.9% |
10 minutes per week |
Four Nines |
99.99% |
1 minute per week |
Five Nines |
99.999% |
6 seconds per week |
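The downtime figures in the table follow directly from the uptime percentage. A minimal Python sketch of that arithmetic, assuming a 7-day week of 604,800 seconds (the function name allowed_downtime_seconds is ours, chosen for illustration):

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def allowed_downtime_seconds(nines: int, period_seconds: float = SECONDS_PER_WEEK) -> float:
    """Downtime budget for an uptime of `nines` nines, e.g. 3 means 99.9%."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # 3 nines -> 0.001 of the period may be down
    return period_seconds * unavailability


for n in (3, 4, 5):
    print(n, "nines:", round(allowed_downtime_seconds(n), 1), "seconds per week")
```

This gives roughly 605 seconds (about 10 minutes), 60 seconds, and 6 seconds per week, which matches the rounded values in the table.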
Measure Reliability as a Practitioner, not a Salesman
Marketing departments love to throw around the term "Five Nines" when describing their services. In reality, however, Five Nines is much more of a myth than the marketing folks would like you to believe. Consider, for example, the most common of all telecommunications devices: the telephone.
How many times in the last year or two has your phone line been out of service? For how long? Consider that if a phone is out of service for even 2 hours, its service record must span nearly 23 years with no other outage before it averages 99.999% reliability!
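The arithmetic behind that claim is easy to check. The sketch below assumes a 365-day year (leap years ignored) and a single outage; the function name is hypothetical:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def years_to_absorb_outage(target_uptime: float, outage_hours: float) -> float:
    """Length of the whole measurement window, in years, over which a single
    outage of `outage_hours` still averages out to `target_uptime`."""
    if not 0.0 < target_uptime < 1.0:
        raise ValueError("target_uptime must be a fraction between 0 and 1")
    allowed_down_fraction = 1.0 - target_uptime
    total_hours = outage_hours / allowed_down_fraction
    return total_hours / HOURS_PER_YEAR


print(round(years_to_absorb_outage(0.99999, 2.0), 1))
```

A 2-hour outage at Five Nines needs a window of 200,000 hours, or a little under 23 years, outage included. Running the same function with a Three Nines target (0.999) shrinks the window to about 2,000 hours, under three months, which shows how steeply each extra nine raises the bar.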
The phone company may say, "The phone lines went down because there was a storm." However, to the customer it does not matter why the interruption of service happened, whether the cause is a component failure, a natural disaster, vandalism, or anything else. The only thing that matters is that the customer sees an interruption of service. As technology professionals, we should understand that every interruption of service lowers the percent uptime reliability metric.
We find that typical Active Call Center customers would be happy with Two to Three Nines of reliability, and they would be absolutely thrilled to have something between Three and Four Nines. This chapter discusses techniques that can be used to push an Active Call Center system into the Three to Four Nines reliability range.
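To see where a measured uptime falls on this scale, the number of nines can be treated as a continuous quantity: the negative base-10 logarithm of the fraction of time the service is down. This is a common convention rather than something defined by the chapter itself, and the function name below is hypothetical:

```python
import math


def nines(uptime_percent: float) -> float:
    """Express an uptime percentage as a (possibly fractional) number of nines."""
    if not 0.0 <= uptime_percent < 100.0:
        raise ValueError("uptime_percent must be in the range [0, 100)")
    unavailability = 1.0 - uptime_percent / 100.0
    return -math.log10(unavailability)


# Hypothetical measurements: the middle value lands between Three and Four Nines.
for pct in (99.0, 99.95, 99.99):
    print(pct, "->", round(nines(pct), 2), "nines")
```

Under this convention, 99.95% uptime works out to about 3.3 nines, inside the range this chapter is aiming for.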