Design Tactics - Availability

The aim of these design tactics is to increase availability: the proportion of time the system is operational. We could increase availability simply by adding redundancy, but redundancy is costly, so ideally we would need no replicas at all. There are three types of redundancy:

  • data redundancy - multiple stored copies of the same data
  • communication redundancy - multiple communication routes
  • control redundancy - multiple processors

Keeping redundant copies synchronized is difficult to do well. A failure is a deviation of the system from its specification, and faults have the potential to cause failures. Availability is closely related to the ability to repair the system or recover after a failure.
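The link between availability and repair is commonly quantified as steady-state availability, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to repair. This standard formula is not stated in the original notes; the function name and the numbers below are illustrative assumptions, sketched only to show that faster repair raises availability even when failures are just as frequent.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is operational.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to repair (recover) after a failure
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Same failure rate (MTBF), different repair times.
    print(f"{steady_state_availability(1000.0, 10.0):.4f}")  # slow repair -> 0.9901
    print(f"{steady_state_availability(1000.0, 1.0):.4f}")   # fast repair -> 0.9990
```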

There are two main approaches: fail-safe and failover. Fail-safe is the automatic protection of programs and/or processing systems when a hardware or software failure is detected; the detection mechanism then puts the system into a safe state. A watchdog timer is a classic example. Failover is a backup operational mode in which the functions of a system component (such as a processor, server, network or database) are assumed by secondary components when the primary component becomes unavailable through either failure or scheduled downtime.
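The two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `Watchdog`, `enter_safe_state` and `call_with_failover` are hypothetical names, the watchdog is polled rather than driven by a hardware timer, and failover treats any exception from a component as "unavailable".

```python
import time
from typing import Callable, Optional, Sequence


class Watchdog:
    """Fail-safe (hypothetical sketch): if the monitored component stops
    kicking the watchdog within `timeout` seconds, enter the safe state once."""

    def __init__(self, timeout: float, enter_safe_state: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.enter_safe_state = enter_safe_state
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_kick = clock()
        self.tripped = False

    def kick(self) -> None:
        # Called periodically by the healthy component.
        self.last_kick = self.clock()

    def check(self) -> bool:
        # Called periodically by an independent monitor; True once tripped.
        if not self.tripped and self.clock() - self.last_kick > self.timeout:
            self.tripped = True
            self.enter_safe_state()
        return self.tripped


def call_with_failover(components: Sequence[Callable[[], str]]) -> str:
    """Failover (hypothetical sketch): try the primary first, then each
    secondary in order, returning the first successful result."""
    last_error: Optional[Exception] = None
    for component in components:
        try:
            return component()
        except Exception as exc:  # any error counts as "component unavailable"
            last_error = exc
    raise RuntimeError("all components unavailable") from last_error
```

For example, with a 5-second timeout and a fake clock, a kick at t=3 keeps the watchdog quiet at t=7 but trips it at t=9; and `call_with_failover([primary, secondary])` returns the secondary's result when the primary raises.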

At the end of the day, when a fault enters the system, it has to be either masked or repaired. Many tactics are available for controlling the availability of software:

  1. Fault Detection
    1. Ping / Echo
    2. Heartbeat
    3. Exception
  2. Recovery Preparation and Repair
    1. Voting
    2. Active Redundancy
    3. Passive Redundancy
    4. Spare
  3. Recovery Reintroduction
    1. Shadow
    2. State Resynchronisation
    3. Rollback
  4. Fault Prevention
    1. Removal from service
    2. Transactions
    3. Process monitor
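Two of the tactics listed above, heartbeat (fault detection) and voting (masking a fault through redundancy), can be sketched as follows. This is an illustrative sketch only: `HeartbeatMonitor` and `majority_vote` are hypothetical names, time is passed in explicitly rather than read from a clock, and voting assumes the replicas produce directly comparable (hashable) outputs.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


class HeartbeatMonitor:
    """Heartbeat (hypothetical sketch): each component periodically reports
    that it is alive; any component whose last report is older than
    `timeout` is suspected of having failed."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str, now: float) -> None:
        self.last_seen[component] = now

    def suspected(self, now: float) -> List[str]:
        return sorted(name for name, t in self.last_seen.items()
                      if now - t > self.timeout)


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Voting (hypothetical sketch): redundant replicas compute the same
    value; a strict majority masks a faulty minority."""
    if not outputs:
        raise ValueError("no replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority; the fault cannot be masked")
    return value
```

With a 2-second timeout, a component last heard from at t=0 is suspected at t=3 while one heard from at t=1.5 is not; and three replicas returning 42, 42 and 17 vote 42, masking the faulty third replica.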
