Shoal Automated Delegated Recovery Initiation
To better understand this document, first read the Shoal Design Document.
Shoal's GroupManagementService (GMS) provides a client API that allows client components within a process to receive callbacks when group events occur. One such group event is the automatic selection of a live member to initiate recovery operations on a failed member's resources once the failure is confirmed.
Notification of this selection is delivered only to the registered component on the selected member, while a record of the selection is shared with all members through an entry in the GMS implementation of the Distributed State Cache. For the selection to be performed and the notification delivered, one or more components in each member must register with the Group Management Service. To do so, the client component has to implement two interfaces: FailureRecoveryActionFactory and FailureRecoveryAction. FailureRecoveryActionFactory produces a FailureRecoveryAction, which consumes a FailureRecoverySignal, the selection event object that GMS hands to the selected member.
The client invokes the registration API as follows:
GroupManagementService gms =
When Failure Occurs
GMS's failure handling core determines whether the recovery member selection algorithm needs to be run and whether there are recipients for the resulting notification. If so, the algorithm is run and, in the member that was selected, the client component's FailureRecoveryActionFactory's produceAction() method is invoked, followed by the produced FailureRecoveryAction's consumeSignal() method. GMS passes in a FailureRecoverySignal implementation that packages the failed member's identity token, timestamps, etc.
The recovery selection algorithm currently depends on an identically ordered list of members on all surviving members. The list as it stood prior to the failure is consulted, and the member that came next in the list after the failed member is selected for recovery operations. Because every survivor consults the same ordered list, each arrives at the same choice independently, which makes this a deterministic, simple but scalable solution for recovery operations.
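The selection rule described above can be illustrated with a small, self-contained sketch. This is not Shoal's implementation: the class and method names are hypothetical, members are represented as plain strings, and wrapping around to the head of the list when the failed member was last is an assumption that the text above does not state.

```java
import java.util.List;

/** Hypothetical sketch of the "next member in the ordered list" rule; not Shoal code. */
final class RecoverySelection {

    /**
     * Returns the member that follows the failed member in the pre-failure view.
     * Wrapping to the first member when the failed member was last in the list
     * is an assumption of this sketch.
     */
    static String selectRecoveryMember(List<String> viewBeforeFailure, String failedMember) {
        int index = viewBeforeFailure.indexOf(failedMember);
        if (index < 0) {
            throw new IllegalArgumentException("unknown member: " + failedMember);
        }
        if (viewBeforeFailure.size() < 2) {
            throw new IllegalStateException("no surviving member to select");
        }
        return viewBeforeFailure.get((index + 1) % viewBeforeFailure.size());
    }

    public static void main(String[] args) {
        // Every survivor holds the same ordered view, so each computes the same answer.
        List<String> view = List.of("instance1", "instance2", "instance3");
        System.out.println(selectRecoveryMember(view, "instance2")); // prints "instance3"
        System.out.println(selectRecoveryMember(view, "instance3")); // prints "instance1"
    }
}
```

Because the function depends only on the shared pre-failure list and the failed member's identity, no extra coordination messages are needed for the survivors to agree on the selected member.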
Note that the implementation of FailureRecoveryAction's consumeSignal() method has to follow the precedence norms set out in the Shoal Design Document. In other words, the consumeSignal() implementation should first acquire the signal by invoking FailureRecoverySignal.acquire().
Note also that the acquire() call has special meaning here: Shoal's protective Failure Fencing kicks in when this call is made. Internally, the Distributed State Cache implementation is called and an entry is placed in it recording the recovery member identity, the failed member identity, the recovery state (appointed or in-progress) and a timestamp of the entry. Once recovery is completed, the consumeSignal() implementation should call FailureRecoverySignal.release(), which lowers the fence, i.e. the entry in the Distributed State Cache is removed.
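The acquire/recover/release ordering can be sketched as follows. To keep the example runnable without the Shoal library, RecoverySignal, RecoveryAction and RecoveryActionFactory are simplified stand-ins defined in the sketch itself; they are not the Shoal interfaces, and their exact signatures are assumptions. Releasing in a finally block is also a choice made here so that the fence is not left raised if recovery throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the acquire -> recover -> release ordering; not Shoal code. */
final class RecoveryActionSketch {

    /** Simplified stand-in for FailureRecoverySignal. */
    interface RecoverySignal {
        void acquire();              // in Shoal, this raises the fence
        void release();              // in Shoal, this lowers the fence
        String failedMemberToken();
    }

    /** Simplified stand-in for FailureRecoveryAction. */
    interface RecoveryAction {
        void consumeSignal(RecoverySignal signal);
    }

    /** Simplified stand-in for FailureRecoveryActionFactory. */
    interface RecoveryActionFactory {
        RecoveryAction produceAction();
    }

    /** Test double that records the order in which it is called. */
    static final class RecordingSignal implements RecoverySignal {
        final List<String> events = new ArrayList<>();
        private final String failedMember;

        RecordingSignal(String failedMember) { this.failedMember = failedMember; }

        @Override public void acquire() { events.add("acquire"); }
        @Override public void release() { events.add("release"); }
        @Override public String failedMemberToken() { return failedMember; }
    }

    /** Acquires the signal first, runs the recovery work, then releases. */
    static final class FencedRecoveryAction implements RecoveryAction {
        private final Consumer<String> recoveryWork;

        FencedRecoveryAction(Consumer<String> recoveryWork) { this.recoveryWork = recoveryWork; }

        @Override public void consumeSignal(RecoverySignal signal) {
            signal.acquire();
            try {
                recoveryWork.accept(signal.failedMemberToken());
            } finally {
                signal.release(); // sketch choice: always lower the fence
            }
        }
    }

    /** Plays the role of GMS on the selected member: produce an action, hand it the signal. */
    static List<String> simulate(String failedMember) {
        RecordingSignal signal = new RecordingSignal(failedMember);
        RecoveryActionFactory factory =
                () -> new FencedRecoveryAction(token -> signal.events.add("recover " + token));
        factory.produceAction().consumeSignal(signal);
        return signal.events;
    }

    public static void main(String[] args) {
        System.out.println(simulate("instance2")); // prints "[acquire, recover instance2, release]"
    }
}
```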
Failure Fencing Explained
When a remote/delegate member is performing recovery operations on a failed member's resources, this operation has to be protected from contention or race conditions. Such contention could occur when the failed member restarts before the delegate member's recovery operations are completed and attempts to regain control of its resources. To enforce this protection, GMS provides a protocol that allows processes to raise a protective fence until recovery operations are complete, at which point the fence is lowered by the recovery member. The fence is lowered automatically when the consumeSignal() implementation calls FailureRecoverySignal.release().
When a failed member restarts, the protocol requires that the member check with the group as to whether any other member is performing recovery operations on its resources. This portion of the protocol is termed Netiquette. It is provided through the GroupHandle.isFenced() API.
The client in the restarting failed member may have predetermined policies for handling such situations. One example policy is to continue startup without reclaiming control over the resources, and to reclaim ownership once recovery is complete. GMS currently does not provide notifications when the fence is raised or lowered, so clients typically call the GroupHandle.isFenced() API to make this determination. Another example policy is to block startup until the recovery operations are completed and to continue startup after regaining control of the resources. The choice is left to the client's implementation.
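Since no fence notifications are available, the blocking policy amounts to polling. The sketch below shows one way to do that under stated assumptions: the fence check is passed in as a BooleanSupplier (in a real client it would wrap a call to GroupHandle.isFenced(), whose exact parameters are not shown here), and the class name, attempt limit and pause interval are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for the "block startup until unfenced" policy; not Shoal code. */
final class FencePollingSketch {

    /**
     * Polls the fence check until it reports "not fenced" or the attempts run out.
     * Returns true if the fence was observed lowered, false on timeout or interruption.
     */
    static boolean waitUntilUnfenced(BooleanSupplier isFenced, int maxAttempts, long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!isFenced.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated fence: reported as raised for the first two checks, then lowered.
        int[] fencedChecksLeft = {2};
        boolean clear = waitUntilUnfenced(() -> fencedChecksLeft[0]-- > 0, 5, 10L);
        System.out.println(clear); // prints "true"
    }
}
```

A client following the non-blocking policy could run the same loop on a background thread and reclaim its resources when the method returns true.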
Clients Performing Recovery on Startup
There are situations where certain clients perform recovery operations on their own resources during startup, on the implicit assumption that the startup was preceded by a failure. In such cases, and particularly when the client has registered for delegated recovery notification, such clients must do the following during startup:
first check if a fence has been raised on their member identity (through GroupHandle.isFenced())
if not fenced, and if performing recovery, raise a fence (through GroupHandle.raiseFence()). This results in an entry in the Distributed State Cache recording the recovery member identity, the failed member identity, the recovery state (appointed or in-progress) and a timestamp of the entry
when recovery operations are done, lower the fence (through GroupHandle.lowerFence()). This results in the entry in the Distributed State Cache being removed
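The three startup steps above can be sketched as follows. FenceHandle is a simplified stand-in for the fencing methods on GroupHandle, reduced to a single member-token argument; the real signatures may take additional parameters, so treat these as assumptions. The in-memory implementation only models the Distributed State Cache entry as membership in a set.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of self-recovery on startup under a fence; not Shoal code. */
final class StartupRecoverySketch {

    /** Simplified stand-in for the fencing methods of GroupHandle. */
    interface FenceHandle {
        boolean isFenced(String memberToken);
        void raiseFence(String memberToken);
        void lowerFence(String memberToken);
    }

    /** In-memory model: a fenced member is simply present in the set. */
    static final class InMemoryFence implements FenceHandle {
        private final Set<String> fenced = new HashSet<>();

        @Override public boolean isFenced(String memberToken) { return fenced.contains(memberToken); }
        @Override public void raiseFence(String memberToken) { fenced.add(memberToken); }
        @Override public void lowerFence(String memberToken) { fenced.remove(memberToken); }
    }

    /**
     * Runs self-recovery only if no other member has fenced this member.
     * Returns true if recovery ran here, false if a delegate is already recovering.
     */
    static boolean recoverOnStartup(FenceHandle handle, String self, Runnable recoveryWork) {
        if (handle.isFenced(self)) {
            return false;               // step 1: someone else is recovering our resources
        }
        handle.raiseFence(self);        // step 2: announce that recovery is in progress
        try {
            recoveryWork.run();
        } finally {
            handle.lowerFence(self);    // step 3: lower the fence when done
        }
        return true;
    }

    public static void main(String[] args) {
        InMemoryFence handle = new InMemoryFence();
        boolean ran = recoverOnStartup(handle, "instance1",
                () -> System.out.println("fenced during recovery: " + handle.isFenced("instance1")));
        System.out.println("recovered locally: " + ran);
        System.out.println("fenced afterwards: " + handle.isFenced("instance1"));
    }
}
```

Note that in this sketch the check and the raise are two separate calls, so it does not by itself rule out a delegate raising a fence between them; how that window is handled is outside what the steps above specify.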