Medium-sized companies in particular underestimate the impact that IT interruptions have on business operations. A reason for this is the high reliability of standard components that are now used in corporate IT. Their availability is generally estimated to be 99.9%. This number might seem high, but if the system operates 24 hours daily for a year, this could result in a maximum downtime of almost 9 hours. If this ends up happening exactly during the peak sales period, a relatively short server failure could prove costly to the company. Highly-available IT systems with an availability of 99.99% are the standard now when it comes to supplying business-critical data and applications. In this case, a maximum downtime of 52 minutes per year is guaranteed. Some IT experts even believe 99.999% availability is possible, which would mean no more than 5 minutes of downtime annually.
The problem with the information regarding availability is that it only refers to the reliability of server hardware. According to IEEE (Institute of Electrical and Electronics Engineers)’s definition, a system is highly-available if it can ensure the availability of its IT resources despite several server components failing.
'High Availability (HA for short) refers to the availability of resources in a computer system, in the wake of component failures in the system.'
This is achieved, for example, by servers that are completely redundant. All operating components – especially processors, memory chips, and I/O units – are available twice. This prevents a defective component from paralyzing the server, but high-availability doesn’t protect against fire in the data center, targeted attacks by malicious software and DDoS attacks, sabotage or being taken over by hackers. When it comes to real operations, entrepreneurs should therefore expect significantly longer downtimes and take appropriate measures to prevent and limit damage.
Other strategies to compensate for the failure of server resources in the data center are based on standby systems and high-availability clusters. Both approaches are based on networks of two or more servers, which together provide more hardware resources than are needed for normal operation.
A standby system is a second server that is used to safeguard the primary system as soon as it fails due to a hardware or software error. This service takeover is known as failover and initiated automatically by cluster manager software without any administrator intervention. A structure like this, consisting of an active and passive server node can be considered as an asymmetric high availability cluster. If all nodes in the cluster provide services in normal operations, this is known as a symmetrical structure.
Since a time delay occurs when a service is migrated from one server to another, short-term disruption to standby systems and high-availability clusters cannot be completely prevented.