Shoal – A Generic Clustering Framework
About Shoal Framework
Shoal is a java based clustering framework that provides infrastructure to build fault tolerance, reliability and availability. The framework can be plugged into any product needing clustering and related distributed systems capabilities without tightly binding to a specific communications infrastructure. The framework can be plugged in as an in-process component. The framework will have two broad categories of public APIs, namely, a Client API, and a Group Communication Provider API. The Client API will allow in-process client components to be notified of cluster events and to send and receive messages to/from the group or individual members. The Group Communication Provider API provides for integration of any group communication technology such as JXTA, JGroups, etc. as the underlying group communications provider. The Framework thus provides a layer of abstraction for consuming applications/products without tightly binding them to a specific group communication system semantics and implementation. This allows employing applications to simply call the framework's APIs for building clustering and related enterprise distributed systems characteristics. Shoal's design considerations, approach and programming model are discussed in the Shoal Design Overview document. Please read this document to get an indepth understanding of Shoal framework.
Shoal is a part of the GlassFish Community.
The Shoal Framework, written in Java, can be used as an in-process component for cluster communications and cluster event management. As an in-process component, Shoal's core service, the Group Management Service (GMS), provides the ability for the process (JVM) to become a communicating member of a predefined cluster. As a result, within each process, clients implementing Shoal's client API can get notifications of cluster events. By becoming a communicating cluster member, the process gets the ability to
be notified of members' joining, failures, and planned shutdowns among cluster members ( see Shoal Group Event Notifications),
be notified of the process's selection as a recovery member for any recovery operations on another member on occurrence that other member's failure. (see Shoal Automated Delegated Recovery Initiation),
check, while joining, if any other process in the cluster is performing any recovery operations on its resources (see Shoal Automated Delegated Recovery Initiation),
receive messages sent by one or more members directly to it or to the group as a whole (see Shoal Messaging) and
send messages to the group as a whole or a particular member (see Shoal Messaging)
share application state data using GMS's Shared Cache implementation based on DistributedStateCache interface.
Membership in Multiple Groups
A single process can be a member of multiple groups to represent different groups. For example, the instance could participate in the pre-defined cluster that encompasses all cluster members while also participating in a different group, say, a replication group that represents a sub-group of members in that cluster to which this member would replicate its state to. For each such group, GroupManagementService instances will be created within the process and these will be independent of each other.
Core and Spectator Member Roles
A member can participate in a group either as a core member or as a spectator member. If configured to participate as a Core member, then that member's failure would be notified to all other members. If configured to participate as a Spectator, then the spectator member's failure will not be notified to all other members, however, the spectator member would receive all notifications pertaining to other Core members. This is useful for non-cluster processes such as an administration server, to be part of a defined group to get the advantages of the various notifications and messages described in the previous paragraph. Typically, this is useful for monitoring or administrative infrastructures in that the ability to communicate and receive notifications allows such non-cluster members to effect configuration changes based on certain group events and also send out meaningful messages to listening endpoints in the group.
Distributed State Cache
Shoal also provides the ability to add distributed cache implementations by exposing an interface DistributedStateCache. A default implementation is provided by GMS for lightweight replicated caching. Shoal framework users have the opportunity to develop their own implementations of the DistributedStateCache for use within their clustering environment along with GMS.
Shoal GMS Use Cases
Some real use cases are listed here. Note that these are not exhaustive and the extent to which GMS could be used is only limited by imagination.
In a Java EE (J2EE) environment, transaction service reliability can be significantly improved in a cluster setting, by using delegated recovery initiation notification that automatically picks another cluster member to perform recovery operations and notifies that member's transaction service to recover transactions from the failed member's transaction logs. Contrast this with manual transaction recovery which depends on availability of administrative personnel to perform the job.
Another example in the Java EE environment is IIOP Failover. With GMS, the ORB layer can now be dynamically notified of member failures, planned shutdowns and joins, thereby allowing the list of IOR addresses to be updated out-of-band to IIOP clients. This allows for better failover scenario compared to a static loadbalancer list wherein failover entails moving from one address to another until a live instance is found.
The same idea as the IIOP Failover applies to Http-based software Loadbalancers. By having GMS as an inprocess component and partipating as a Spectator, the loadbalancer can determine availability or lack thereof of any member in the cluster.
GMS provided Distributed State Cache can be used as an in-memory store for session data. The underlying group communication provider such as JXTA, JGroups, etc. can be tuned to run in a highly performant manner, enabling high availability of servers.
Read-only and read-mostly beans on various instances in a cluster can be notified when a cache update is discovered by one of the instances. The notification can be done through GMS's messaging API.
Monitoring Clients/Agents can now participate in the cluster as Spectator members and be notified of cluster instance lifecycle events. One can also build more monitoring information on top of GMS notifications in terms of load factors on an instance prior to failure so that appropriate service provisioning can be made.
Dynamic Provisioning services can take advantage of GMS thereby taking appropriate actions of providing more resources on occurence of failures depending on load conditions.
We would like to acknowledge and thank the design contributions of our colleague Masood Mortazavi, who played a major role in the initial design phases of this project. Masood's prior experiences influenced our thought and learning processes in building this framework to its current state.
We would like to thank the Sun Java System Application Server Architecture (asarch) Committee for their support, critique and guidance in clarifying requirements from the Sun Java System Application Server perspectives.
We would also like to thank the quality team at the application server group who built distributed testing environments, ran several test suites on our development efforts, and helped fix several significant issues that manifested only in specific distributed settings.