Tuesday, June 11, 2013

HADR (High Availability Disaster Recovery)

The HADR (High Availability Disaster Recovery) feature is a database replication feature that provides a high availability solution to protect against the failure of a database server. Changes to database structure and data are automatically replicated from the active, or primary, database to a standby database. HADR is essentially an active/passive HA solution, although as of DB2 LUW v9.7 read-only access on the standby database can optionally be enabled, making it a fully-active/limited-active HA solution.

Data replication from the primary database to the standby database can be in one of three modes: synchronous, near-synchronous and asynchronous. 
  • Synchronous replication offers the greatest protection against transaction loss, at the cost of the longest transaction response time. An update transaction is considered successful only after its log records have been written to the log files on the primary database and the standby database has confirmed that it has also written them to its own log files.
  • Near-synchronous replication provides slightly less protection against transaction loss in exchange for better transaction response time. In near-synchronous HADR mode, an update transaction is considered successful after its log records have been written to the log files on the primary database and the standby database has confirmed that it has received them in memory (though it has not yet written them to disk).
  • Asynchronous replication carries the highest risk of transaction loss in the event of failure, but also provides the shortest transaction response time. In asynchronous mode, an update transaction is considered successful after its log records have been written to the log files on the primary database and have been sent to the standby database. There is no wait for confirmation that the standby database has actually received or written them.
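
The trade-off between the three modes can be sketched with a toy latency model. This is only an illustration, not how DB2 measures or schedules anything: it assumes the steps happen strictly one after another, and the function name, parameter names and sample timings are all made up for the example.

```python
from enum import Enum


class SyncMode(Enum):
    SYNC = "synchronous"
    NEARSYNC = "near-synchronous"
    ASYNC = "asynchronous"


def commit_wait_ms(mode, primary_ms, one_way_ms, standby_work_ms):
    """Toy estimate of how long an update transaction waits before it succeeds.

    primary_ms      -- time for the primary to finish its own work
    one_way_ms      -- network time for one trip between primary and standby
    standby_work_ms -- time the standby spends on the update before it
                       acknowledges in synchronous mode
    All steps are assumed to run serially (a simplification).
    """
    if mode is SyncMode.ASYNC:
        # Sent but never acknowledged: only the primary's own work counts.
        return primary_ms
    round_trip_ms = 2 * one_way_ms
    if mode is SyncMode.NEARSYNC:
        # Wait for the standby to confirm receipt, nothing more.
        return primary_ms + round_trip_ms
    # SYNC: also wait for the standby to finish its own step.
    return primary_ms + round_trip_ms + standby_work_ms


for mode in SyncMode:
    wait = commit_wait_ms(mode, primary_ms=2.0, one_way_ms=0.5, standby_work_ms=2.0)
    print(f"{mode.value}: {wait} ms")
```

With these hypothetical timings the wait grows from asynchronous to near-synchronous to synchronous, which mirrors the ordering of protection described in the list above.
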

In the event of a failure of the primary database server, failover to the standby server is accomplished by issuing a “TAKEOVER HADR” command on the standby database server for the database in question. Once the takeover completes, the standby database becomes the primary database and can resume full and normal processing. Before issuing a takeover command, take care to confirm that the former primary database is no longer accessible. If it is still accessible, both nodes in the HADR pairing can end up functioning as primary databases, each accepting and processing update transactions. This is called a “split-brain” scenario, and it makes resynchronizing the data extremely difficult because the two databases hold different sets of updates.
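
As a rough sketch of that precaution, the helper below refuses to build a forced takeover command while the old primary still answers on its port. The function names and the database alias are hypothetical. A successful TCP connection is only a crude stand-in for "still accessible", and a failed one does not prove the old primary is really down, so treat this as an illustration rather than a fencing mechanism. The command string follows the DB2 CLP form `TAKEOVER HADR ON DATABASE <alias> BY FORCE`.

```python
import socket


def old_primary_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, etc.
        return False


def forced_takeover_command(db_alias, primary_reachable):
    """Build the CLP command to run on the standby, or refuse.

    Raises RuntimeError if the old primary still looks alive, because a
    forced takeover in that state risks a split-brain scenario.
    """
    if primary_reachable:
        raise RuntimeError(
            "old primary still reachable; forcing a takeover risks split-brain"
        )
    return ["db2", "takeover", "hadr", "on", "database", db_alias, "by", "force"]
```

On the standby server, the result of `old_primary_reachable(...)` for the old primary's host and port would be passed as the second argument, and the returned list could then be handed to something like `subprocess.run`.
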
 
An additional feature that works in conjunction with HADR is ACR (automatic client reroute), in which client applications know about both the primary and the standby servers in an HADR pairing. If the primary database stops responding, such as when the client receives a communication error, the client automatically attempts to connect to the standby server (i.e. the new primary server, after a TAKEOVER HADR command has been issued).
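
The rerouting idea can be sketched in a few lines. In DB2 the reroute is performed by the client driver itself, using an alternate server registered on the server side (for example with `UPDATE ALTERNATE SERVER FOR DATABASE`). The code below is not that implementation, only a minimal model of the behaviour, and `connect_with_reroute`, `fake_connect` and the host names are hypothetical.

```python
def connect_with_reroute(connect, servers):
    """Try each (host, port) in order and return the first connection made.

    connect -- any callable taking (host, port) that raises ConnectionError
               when the server cannot be reached
    servers -- the primary first, then the standby
    """
    last_error = None
    for host, port in servers:
        try:
            return connect(host, port)
        except ConnectionError as exc:
            # Communication error: fall through to the next known server.
            last_error = exc
    raise ConnectionError("no HADR server accepted the connection") from last_error


def fake_connect(host, port):
    """Stand-in for a real driver call; pretends the primary has failed."""
    if host == "db-primary":
        raise ConnectionRefusedError("primary is down")
    return f"connected to {host}:{port}"


print(connect_with_reroute(fake_connect, [("db-primary", 50000), ("db-standby", 50000)]))
# prints: connected to db-standby:50000
```
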
 
In its current implementation, HADR can practically be used either as a high-availability solution or as a disaster recovery solution, but not both. HA and DR standby databases differ in one mutually exclusive trait: physical proximity to the primary database. Ideally, an HA standby database is co-located in the same data center (and on the same network segment) as the primary database, so that replication between primary and standby is not affected by network latency, and so that network communication times between the application servers and the standby server degrade very little after an HADR takeover. A DR standby database, by contrast, is located at a different site from the primary database so that it is protected from a disaster at the primary data center. This limitation of “HA or DR” will be lifted in a future release of DB2 LUW, as multiple standby databases will be allowed. This will enable a true HA-DR solution, as one standby database can be located locally as an HA standby and a second standby database can be located remotely as a DR standby.
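
To put rough numbers on the proximity argument, the sketch below computes a lower bound on the network round trip from distance alone. It assumes light travels about 200 km per millisecond in optical fiber and ignores switching, queuing and protocol overhead, so real round trips will be longer. The two distances are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def min_round_trip_ms(distance_km):
    """Lower bound on one request/acknowledgement round trip, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


for label, km in [("same data center", 0.5), ("remote DR site", 1000.0)]:
    print(f"{label}: at least {min_round_trip_ms(km):.3f} ms per round trip")
```

Any replication mode that waits for an acknowledgement from the standby pays at least this round trip on each wait, which is why a nearby HA standby and a distant DR standby pull in opposite directions.
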